Hands-on Tutorials

How to Use Data to Gain Insights from Workout Routines

Matt Gray
Towards Data Science
9 min read · Nov 27, 2020

Photo by Jonathan Borba on Unsplash

The digital age provides unprecedented access to open-source data and a wealth of information. This is a potential gold mine for Data Teams that can convert raw data into insights, which adds significant value for an organization.

Unfortunately, most data science articles overemphasize the machine learning aspect of Data Science and neglect the core practice of the field: data exploration and manipulation.

As a result, many aspiring Data Scientists enter the industry with knowledge and expectations of machine learning work, but with little fundamental experience in exploring and munging data.

To emphasize the importance and value of these processes, this article demonstrates, through a practical example, how critical insights are to the field of Data Science.

Data Insights Begin with a Question to a Problem

For the past couple of years I have been (on and off) going to the gym to lift weights, and more recently (due to COVID-19) I have bought enough equipment for a barebones home gym setup.

Regardless of where I’m working out, one thing has remained consistent — recording my weightlifting numbers to track my progress (call it my love for data). Anyone who exercises knows that phone apps are a dime a dozen, but I find these apps are difficult to integrate into my workout routine while I’m in action, so I’ve got a paper trail. While some would argue having a small workout book comes with disadvantages, it is flexible in the sense of what I can write down and I have full uninhibited access to my own data that I can investigate.

Recently, I’ve found that my weightlifting progress is starting to plateau: it’s getting harder to increase the weight I am lifting month over month. By recognizing this problem, we can begin to ask questions and explore the data in search of potential solutions.

Data Gathering Process

As mentioned, since I started trying to go to the gym regularly, I’ve been adamant about recording my workouts on template sheets [1], as seen in Fig. 1 below.

Fig. 1 Scanned Image of Workout Sheet (Image By Author)

In order to manipulate the data, these booklets have to be transferred to digital form. While optical character recognition (OCR) could be used to try to automate this step, there are a number of reasons I opted not to do this immediately:

  • A significant portion of my time would be spent finding and integrating a solution
  • Many of my scanned images contain mark-ups as notes for myself and wouldn’t easily transfer to template form
  • I would still have to manually convert some of these files myself to estimate the accuracy of OCR

For these reasons, I opted to manually convert my sheets to Excel as shown in Fig. 2. While this took a few hours of effort, practically speaking this is quicker, easier, and more accurate than the alternative solution of developing OCR.

Fig. 2 Data Entered in Structured Format (Image By Author)

If I later decide to develop an app where like-minded paper users could submit scans for their own analysis, this manually entered dataset would provide a decently sized set of usable data for training an OCR model and estimating its accuracy.

Initial Assessment

For easy manipulation and visualization, I loaded the data into Pandas. At first glance, we can see how many distinct days per month I have recorded workouts since I first started entering data. Fig. 3 reveals that while my track record in 2018 and 2019 was poor (I mainly went to the gym only during warm months), the recent shift to working from home and my home gym setup have been extremely motivating. Considering that we are in mid-November at the time of writing, I have worked out at home consistently just under once every other day since March 2020.

Fig. 3 Number of Days Worked Out Each Month (Image By Author)

For any further investigation to be meaningful, we’ll filter out the 2018 and 2019 data going forward, as it is too incomplete to be useful. This step is crucial in data analysis: not all data helps answer the question at hand, and failing to recognize this before gathering results can lead to misinterpreting the data. This does not mean the remaining data is fully ready for use, however. Our next pass looks further into the individual records to see how much of the 2020 data is usable.
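The two steps above, counting distinct workout days per month and then dropping the incomplete years, can be sketched in a few lines. This is a minimal illustration only: the record layout and the sample rows are hypothetical, and it uses plain Python rather than Pandas so that it runs without any dependencies.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (workout date, activity, reps, weight in lbs).
records = [
    (date(2019, 7, 2), "Bench Press", 5, 95),
    (date(2020, 3, 16), "Bench Press", 5, 100),
    (date(2020, 3, 16), "Squats", 5, 115),
    (date(2020, 3, 18), "Deadlift", 5, 135),
    (date(2020, 4, 1), "Bench Press", 3, 105),
]


def workout_days_per_month(rows):
    """Count distinct workout dates for each (year, month)."""
    days = defaultdict(set)
    for day, *_ in rows:
        days[(day.year, day.month)].add(day)
    return {month: len(dates) for month, dates in sorted(days.items())}


def keep_year(rows, year):
    """Keep only the rows recorded in the given year."""
    return [row for row in rows if row[0].year == year]


days_per_month = workout_days_per_month(records)
records_2020 = keep_year(records, 2020)
```

Using a set per month means that several sets logged on the same day count as a single workout day, which is the quantity plotted in Fig. 3.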

Identifying and Resolving Data Quality Issues

As the data was handwritten over the course of a year, the names of the activities are, understandably, inconsistent, and I did not correct them when transferring the data from paper to Excel. While this will be beneficial as future test data if I incorporate OCR, it also means that we’ll need a way of identifying which activity names refer to the same activity.

Fig. 4 Unique Activity Names in the Workout Data

For example, “Ab Crunch”, “Ab Crunches”, and “Ab Work”, seen in Fig. 4, are different names that all refer to the same activity. However, “Tricep Curl” and “Tricep Curls” are not the same as “Tricep Dip”. Nuances like these are extremely difficult to resolve with string-similarity algorithms such as Levenshtein distance. For our purposes, the easiest and most accurate path is to create a mapping dictionary which converts activities to a consistent naming convention.
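A short sketch makes the problem concrete. It implements the standard dynamic-programming Levenshtein distance and a hand-written mapping dictionary. The canonical labels in the mapping are illustrative choices, not necessarily the ones used for the figures in this article.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Hand-written mapping from raw names to one canonical name.
# The canonical labels are illustrative assumptions.
ACTIVITY_MAP = {
    "Ab Crunch": "Ab Crunch",
    "Ab Crunches": "Ab Crunch",
    "Ab Work": "Ab Crunch",
    "Tricep Curl": "Tricep Curl",
    "Tricep Curls": "Tricep Curl",
    "Tricep Dip": "Tricep Dip",
}


def normalize(name):
    """Map a raw activity name to its canonical form."""
    cleaned = name.strip()
    # Fall back to the raw name so unmapped activities are easy to spot.
    return ACTIVITY_MAP.get(cleaned, cleaned)


same_activity = levenshtein("Ab Crunch", "Ab Work")          # 6
different_activity = levenshtein("Tricep Curl", "Tricep Dip")  # 4
```

Here the two names for the same activity are six edits apart, while the two different tricep exercises are only four edits apart. Any distance threshold loose enough to merge the first pair would therefore also merge the second, which is why an explicit mapping is the safer choice for a dataset of this size.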

Having applied a consistent naming scheme to the activities, Fig. 5 reveals which activities I’ve performed the most.

Fig. 5 Most Frequently Performed Workout Activities (Image By Author)

Since March 2020, I have been following the 5/3/1 Workout regime [2], which focuses on using comprehensive activities (activities targeting multiple muscle groups simultaneously). From the graph above, we can see that Bench Press, Shoulder Press, Deadlift, Barbell Row, and Squats are my primary lifts — four of these are comprehensive (all except barbell row). In order to focus solely on the core exercises, we’ll consider only Bench Press, Shoulder Press, Deadlift, and Squats going forward. Filtering to this point gives us a quality data set and will allow us to start trying to establish results.

Analysis of Historical Data and Early Insights

Starting with straightforward measures of progress, Fig. 6 shows the maximum weight pushed and the total weight pushed, as defined in (1), each month for the core lifts considered.

Total Weight Pushed per Set = Number of reps in set * Weight lifted in set (1)

Fig. 6 (a) Maximum Weight per Core Lift Each Month (Image By Author)
Fig. 6 (b) Total Weight Pushed per Core Lift Each Month (Image By Author)
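As a rough sketch of how both measures can be computed, the snippet below applies equation (1) to each set and aggregates by month and lift. The rows are made-up values and the tuple layout is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical sets: (year-month, lift, reps, weight in lbs).
lifting_sets = [
    ("2020-04", "Bench Press", 5, 100),
    ("2020-04", "Bench Press", 3, 110),
    ("2020-04", "Deadlift", 5, 150),
    ("2020-05", "Bench Press", 5, 105),
    ("2020-05", "Bench Press", 1, 120),
]

max_weight = defaultdict(int)    # heaviest single set per (month, lift)
total_weight = defaultdict(int)  # equation (1) summed over all sets

for month, lift, reps, weight in lifting_sets:
    key = (month, lift)
    max_weight[key] = max(max_weight[key], weight)
    total_weight[key] += reps * weight
```

For April's Bench Press in this toy data, the maximum is 110 lbs and the total is 5 × 100 + 3 × 110 = 830 lbs.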

The first thing of note in Fig. 6a is the outlier in maximum squat weight for March 2020. With my gym closing due to COVID-19 and no proper squat rack at home until recently, my weight capacity was limited in this area. The impact can also be seen in Fig. 6b, where the total weight pushed across all lifts drops steeply in March 2020 with the switch to my home gym.

Unexpected Results

An unexpected find while exploring the data, however, is that while the maximum weight I lift is generally increasing over time, the total weight pushed has been declining over the past few months. This could suggest that while I am pushing more weight at once, I am significantly reducing the number of reps in each of my sets, which may indicate a lack of strength gain. This observation cannot be taken at face value, though: total weight pushed per month depends directly on how many days per month I work out, and Fig. 3 shows that I worked out fewer times in August and September.

This once again emphasizes the importance of data exploration and munging. If I had taken the data at face value, I would have missed this and the results wouldn’t be accurate.
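One simple way to check for this kind of confounding is to divide each month's total by the number of workout days in that month. This is not the approach taken in the rest of the article, and the numbers below are invented purely to show the effect.

```python
# Hypothetical monthly totals (lbs) and distinct workout days per month.
total_weight_by_month = {"2020-07": 42000, "2020-08": 33000, "2020-09": 30000}
workout_days_by_month = {"2020-07": 14, "2020-08": 11, "2020-09": 10}

# Average weight pushed per workout day, which removes the effect of
# simply training on fewer days in a given month.
weight_per_day = {
    month: total_weight_by_month[month] / workout_days_by_month[month]
    for month in total_weight_by_month
}
```

In this made-up example the monthly totals fall from 42,000 to 30,000 lbs, yet the per-day figure stays at 3,000 lbs, so the decline would be explained entirely by workout frequency.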

Predicting Future Progress

To address the variability of workout frequency, we can exploit the details of the 5/3/1 routine. The 5/3/1 BBB routine I follow uses a 4-week cycle, with Week 3 being the most intense week (and Week 4 a recovery / deload week).

Since I don’t strictly follow the 4 workouts/week schedule and not every month is exactly 4 weeks long, it is more natural to plot my progress for every 3rd week in the 5/3/1 cycle. If I filter the data to show the maximum weight lifted for each core lift across 5/3/1 cycles, as shown in Fig. 7, the progress trend appears smooth enough that we might be able to forecast future progress.

Fig. 7 Maximum Weight per Core Lift Across 5/3/1 Cycles (Image By Author)
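The filtering step can be sketched as follows, assuming each set has already been labeled with its cycle number and its week within the cycle. Those labels and the sample rows are hypothetical; in practice they would be derived from the workout dates.

```python
from collections import defaultdict

# Hypothetical sets: (cycle number, week within cycle, lift, weight in lbs).
cycle_sets = [
    (1, 1, "Bench Press", 95),
    (1, 3, "Bench Press", 105),
    (1, 4, "Bench Press", 80),   # deload week, ignored below
    (2, 3, "Bench Press", 110),
    (2, 3, "Squats", 150),
]

PEAK_WEEK = 3  # the most intense week of the 4-week cycle

# lift -> {cycle number: heaviest weight lifted in the peak week}
max_by_cycle = defaultdict(dict)
for cycle, week, lift, weight in cycle_sets:
    if week != PEAK_WEEK:
        continue
    best_so_far = max_by_cycle[lift].get(cycle, 0)
    max_by_cycle[lift][cycle] = max(best_so_far, weight)
```

Indexing by cycle rather than by calendar month gives one comparable data point per cycle, regardless of how many days I trained in a given month.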

Predicting the Plateau

For the following, we will use Bench Press as an example, as the maximum bench press weight lifted seems to follow a smooth exponential decay (2), to which we can apply basic regression to estimate a plateau in future progress.

Where x represents the 5/3/1 cycle; and a, b, c are parameters to be optimized using linear regression.

Fig. 8 Bench Press Plateau (Image By Author)

From Fig. 8 we can approximate that, unless I find a way to change this trajectory, my Bench Press will plateau around 135 lbs on roughly my 12th cycle of 5/3/1, which is about a year into the program.
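To give a feel for how such a plateau can be estimated, here is a minimal sketch. Equation (2) is not reproduced in this text, so the form used below, y = a - b·exp(-c·x), is an assumption: a is the plateau, b the remaining gap at x = 0, and c the rate at which the gap closes. For a fixed a the model becomes linear after taking logarithms, so the sketch tries a range of candidate plateaus and fits a straight line for each one. The data is synthetic, generated from known parameters so the fit can be checked, and does not represent my actual lifts.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_plateau(xs, ys, candidates):
    """Fit y = a - b * exp(-c * x) by trying each candidate plateau a.

    For a fixed a, ln(a - y) = ln(b) - c * x is a straight line in x,
    so b and c come from a linear fit. The candidate with the smallest
    squared error on the original scale is returned.
    """
    best = None
    for a in candidates:
        if a <= max(ys):
            continue  # ln(a - y) is undefined unless a exceeds every y
        slope, intercept = fit_line(xs, [math.log(a - y) for y in ys])
        b, c = math.exp(intercept), -slope
        error = sum(
            (a - b * math.exp(-c * x) - y) ** 2 for x, y in zip(xs, ys)
        )
        if best is None or error < best[0]:
            best = (error, a, b, c)
    if best is None:
        raise ValueError("no candidate plateau exceeds the observed maximum")
    _, a, b, c = best
    return a, b, c


# Synthetic data generated from a = 135, b = 45, c = 0.3.
cycles = list(range(1, 9))
weights = [135 - 45 * math.exp(-0.3 * x) for x in cycles]

a, b, c = fit_plateau(cycles, weights, candidates=range(125, 151))
```

With noisy real data, a general nonlinear least-squares routine would be a more common choice than this grid search. The point of the sketch is only that the plateau is the value the curve approaches as the cycle number grows.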

Based on the insights we found earlier in the data, it may be beneficial to do more reps/sets in each workout so the total weight lifted continues to grow at a steady rate. Strength gain is also not entirely limited to workout activity. Significant research suggests that external factors such as eating habits and sleep play a role in building muscle; but without having captured any data myself, the ability to investigate this is outside the scope of this article.

How Does My Progress Fare Against the Competition?

As a means of motivation and of participating in the broader community (and as extra material for analysis), I entered the Garage Gym Competition for 2020 [3], which gave me access to their 2020 results dataset.

Focusing on just the Bench Press component of the competition, Fig. 9 shows a histogram of male entrants’ 1 rep maximum (1RM) bench press, grouped into 10 lb bins. While my entry at the time of competition in May (green line) is not breaking any world records, comparing it with my current best 1RM (orange) and my potential at the plateau (red) shows a significant improvement over the approximately 7-month tracked period.

Fig. 9 Histogram and Potential Progress of Garage Gym Competition (Image By Author)
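The binning behind a histogram like Fig. 9 can be sketched as below. The competition dataset is not reproduced here, so the entries are hypothetical values, and the helper names are illustrative.

```python
from collections import Counter

BIN_WIDTH = 10  # lbs, matching the bin size described for Fig. 9


def bin_start(weight):
    """Lower edge of the 10 lb bin that a weight falls into."""
    return int(weight // BIN_WIDTH) * BIN_WIDTH


def histogram(weights):
    """Number of entries per bin, keyed by the bin's lower edge."""
    return dict(sorted(Counter(bin_start(w) for w in weights).items()))


def share_at_or_below(weights, value):
    """Fraction of entrants whose 1RM is at or below the given value."""
    return sum(w <= value for w in weights) / len(weights)


# Hypothetical 1RM entries in lbs.
entries = [95, 135, 138, 185, 205, 225, 225, 245, 275, 315]

bins = histogram(entries)
share = share_at_or_below(entries, 135)
```

The `share_at_or_below` helper gives a single number for where a given lift sits in the field, which is a convenient way to compare the May entry, the current best, and the projected plateau.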

Putting It All Together

A lot of businesses have a wealth of data but lack information, which leaves a lot of value on the table for Data Scientists to provide through data-driven decisions and insights. In demonstrating this, I have emphasized how crucial the exploration and munging process is when dealing with data.

The data ultimately confirmed my hypothesis that I am beginning to plateau in my weightlifting progress, and surprisingly revealed that although I was lifting heavier weights, the total weight I was pushing over time has been decreasing. We further performed regression on my Bench Press progress to forecast a plateau of 135 lbs on my 12th cycle of 5/3/1. Comparing my bench press results to those of male entrants in the 2020 Garage Gym Competition, we have also seen that my progress since beginning to track data has shown significant growth.

Understanding and properly handling the data is the foundation of data science. If you don’t take the time to build that foundation before you engage in a machine learning project, you’ll find yourself unable to interpret the results accurately and see where the issues lie.

Employing quick and easy tactics such as those in this article can add significant business value to a company.

References

[1] TrainRite Fitness Journals, www.trainrite.ca

[2] J. Wendler, Beyond 5/3/1: Simple Training for Extraordinary Results (2013)

[3] Garage Gym Competition — Uniting the Community Through Competition, www.garagegymcompetition.com


PhD in Electrical Engineering, experienced Data Scientist, and long term tinkerer. Consultant for data driven decision making and AI adoption.