
How Tracking Data Helped Me Consistently Exercise for 2 years

Freely copy the R Code, start tracking and expand your data science portfolio with a personal project while getting fitter.

To paraphrase James Clear, the daily actions you take provide evidence for the identity you have. Thanks to tracking, I have practically changed my identity. Regular exercise is now an intrinsic part of my life.

This article is for anyone who wishes to start a new personal data science project by tracking their exercise. All the code is freely available on GitHub for you to copy and adapt.

If you are an experienced programmer, you can easily follow along even if you have a different device (you will need to make appropriate adjustments). If you are a beginner with little or no programming experience, you will still be able to implement everything you see using your Fitbit data.


Here is a summary of my 2-year journey from April 1, 2020, to May 31, 2022. The start date marks the beginning of the pandemic. It also coincides with the time when we welcomed a baby girl into this world. With all the uncertainty and a major change at home, I feared that it would take a heavy toll, both personally and professionally. Two years on, I am content with the progress, thanks to data science and tracking!

2-year summary (Image by Author)

In total, I undertook 998 sessions that amounted to 638 hours of exercise. The bulk of my exercise is elliptical training. Overall, this is equivalent to exercising for 48 minutes every single day, without fail, over the two years.
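
As a quick check of that daily average, the arithmetic can be reproduced in base R. This is a minimal sketch that assumes both the start and end dates are counted; the variable names are only illustrative.

```r
# Dates from the summary above; the window is treated as inclusive (an assumption)
period_start <- as.Date("2020-04-01")
period_end <- as.Date("2022-05-31")
n_days <- as.numeric(period_end - period_start) + 1

# 638 hours of exercise spread evenly over every day in the window
total_hours <- 638
minutes_per_day <- total_hours * 60 / n_days

n_days                  # 791
round(minutes_per_day)  # 48
```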

This data also confirms that, among the activity types I tracked, running is the most time-efficient way to burn calories (13.7 calories burnt per minute). Overall, however, the cross-trainer works out far better because you are likely to use it much more often and for longer.


Fitbit Analysis Deep Dive Walk-Through

If you already own a Fitbit device, the first thing you need to do is download all your Fitbit data. Regulations vary by country but typically, by law, you have the right to your data and can request full access. Here is how to log in to your Fitbit account and request a download of all your data.

First, go to https://accounts.fitbit.com/login and log in with your account details. Click on the Settings gear icon on the top right (shown below).

Click on Settings after logging in to Fitbit (Image by Author)

Once you are in "Settings", click on "Data Export" and choose "Request Data", which will start an export of all the data Fitbit holds about you (shown below).

Click on "Data Export" in the Fitbit dashboard and "Request Data" (Image by Author)

The request will trigger an automated email from Fitbit (screenshot below).

Confirm the export request in the email from Fitbit (Image by Author)

Once you verify that you indeed initiated the request, the data generation process will begin.

Fitbit data export processing (Image by Author)

Data Structure of Fitbit

The data downloaded from Fitbit will come as a single zip file. Once you unzip it, it will have 12 different folders. You can find more about the contents of each folder in the associated Readme file found in each folder. Depending on the use of the device, many of the folders may just be empty (this is perfectly fine!).

For the purposes of this article, we will focus on selected files in the folder named "Physical Activity". This folder contains pretty much all data concerned with physical activity including data on:

- Active Zone Minutes (one file for each month in CSV format)

- Exercise types and duration (one file for every 100 sessions in JSON format)

- Heart rate (one file for each day in JSON format)

- VO2 Max data (stored in JSON format)

- Duration in different heart rate zones (one file for each day in JSON format)

As you can see, most of the data is stored in JSON format.

The jsonlite Library and the JSON Data Format

JavaScript Object Notation (abbreviated as JSON) is an open standard file format for storing structured data. The format is human-readable and consists of attribute-value pairs and arrays. JSON files use the extension ".json". It is now very common for devices to store data in JSON. The format is language-agnostic, and most programming languages you use will have existing libraries to generate and read in (parse) JSON files.

Here is an example of the JSON file (run_vo2_max-date) that you will find in your physical activity folder.

JSON format example (Image by Author)

Before we can analyze the Fitbit data, we will need to read it into R. We will use the "jsonlite" library available freely for R. If you haven’t previously used this library, then install it first from within RStudio. Once installed, load the library.
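
To make this concrete, here is a minimal sketch of installing jsonlite (only if it is missing) and parsing a small JSON string. The field names and values are made up for illustration and are not the real Fitbit schema; fromJSON() is used the same way on a file path.

```r
# Install jsonlite once if it is missing, then load it
if (!requireNamespace("jsonlite", quietly = TRUE)) {
  install.packages("jsonlite")
}
library(jsonlite)

# A made-up JSON array of two records (NOT the real Fitbit schema)
json_text <- '[
  {"dateTime": "2020-04-01", "value": 41.5},
  {"dateTime": "2020-04-02", "value": 41.8}
]'

# fromJSON() simplifies an array of flat objects into a data frame
parsed <- fromJSON(json_text)
str(parsed)
```

Each attribute becomes a column and each object in the array becomes a row, which is why the exercise files can be handled as ordinary dataframes later on.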

The three other libraries that we will use are lubridate, dplyr and ggplot2. dplyr and ggplot2 are part of the tidyverse ecosystem and they automatically get loaded once you load the ‘tidyverse’.

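# Clear the workspace so the script starts from a clean state
rm(list=ls())
# jsonlite reads JSON; tidyverse loads dplyr and ggplot2; lubridate handles dates
library(jsonlite)
library(tidyverse)
library(lubridate)

dplyr is used for data wrangling, ggplot2 is used for data visualization, and lubridate is a convenient library for manipulating dates. You will soon see actual examples to help track exercise data!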


Import Data and Store in a DataFrame

Let’s now go through the actual code. All this code is available for you to download from GitHub.

We will be looking at the exercise files located in the "Physical Activity" folder. Fitbit automatically stores 100 sessions in each exercise file. You can look at the filenames to see how many exercise files you have. In my case, I have had 1400+ sessions in total.

# Location of the unzipped export (change this to your own path)
folder_path <- 'D:/Personal/My-2Year-Review/OxguruSomeone/'
folder_name <- 'Physical Activity/'
# Files are named exercise-0.json, exercise-100.json, ..., exercise-1400.json
file_name_part_start <- 'exercise-'
file_name_part_end <- seq(from=0, to=1400, by=100)
file_extension <- '.json'
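
If you would rather not count the files by eye, base R can find them for you. The following is a hedged sketch rather than part of the original code: it uses a temporary folder with empty stand-in files so that it runs on its own, and export_dir is a hypothetical name for your own "Physical Activity" folder.

```r
# Hypothetical folder; in practice use paste0(folder_path, folder_name)
export_dir <- file.path(tempdir(), "Physical Activity")
dir.create(export_dir, showWarnings = FALSE)

# Create empty stand-in files so the sketch runs on its own
file.create(file.path(export_dir, c("exercise-0.json", "exercise-100.json",
                                    "heart_rate-2020-04-01.json")))

# Match only files named exercise-<number>.json
exercise_files <- list.files(export_dir,
                             pattern = "^exercise-[0-9]+\\.json$",
                             full.names = TRUE)
basename(exercise_files)
```

The resulting vector of paths could be looped over directly, so the code does not depend on knowing the number of sessions in advance.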

The following for loop goes through the exercise files and loads them one by one. In the end, you get a single dataframe with all the exercise sessions.

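# Columns to keep from each file. sel_names was not defined in the snippet as published;
# this list is an assumption inferred from the columns used later in the article.
sel_names <- c('activityName','averageHeartRate','calories','duration','startTime')

data_all <- NULL
for (k_part in file_name_part_end){
  # Build the full path, e.g. '.../Physical Activity/exercise-0.json'
  file_selected <- paste0(folder_path,folder_name,file_name_part_start,k_part,file_extension)
  # Read one file and keep only the columns of interest
  data_loaded <- fromJSON(file_selected)%>%
    select(all_of(sel_names))
  # Append to the running dataframe (rbind with NULL simply returns data_loaded)
  data_all <- rbind(data_all,data_loaded)
}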

Generate a Summary Table

We can now use the dataframe to get a high-level summary of the exercise sessions. This could include things like the total number of sessions, total duration, and calorie burning rate for each exercise type.

# Passing tz makes ymd() return date-times, so they compare correctly with startTime
time_start <- ymd('2020-04-01', tz='UTC')
time_end <- ymd('2022-03-31', tz='UTC')
activity_type <- c('Elliptical','Run','Walk','Outdoor Bike')

# Keep the chosen activities within the date window, oldest first
data_selected <- data_all%>%
  filter(activityName %in% activity_type)%>%
  mutate(startTime=mdy_hms(startTime))%>%
  filter(startTime>=time_start, startTime<=time_end)%>%
  arrange(startTime)

# One row per activity type; duration is divided down from milliseconds
data_summary <- data_selected%>%
  group_by(activityName)%>%
  summarise(
    total_duration=sum(duration)/(1000*60*60),  # hours
    total_sessions=n(),
    longest_session=max(duration)/(1000*60),    # minutes
    shortest_session=min(duration)/(1000*60),   # minutes
    total_calories=sum(calories),
    mean_heartRate=mean(averageHeartRate)
  )
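
The calorie burning rate mentioned above is simply total calories divided by total minutes for each activity type. Here is a minimal sketch on a toy dataframe with made-up numbers; toy_sessions and burn_rate are illustrative names, and the column names are assumed to match those used in this article.

```r
library(dplyr)

# Toy sessions with made-up numbers; duration is in milliseconds
toy_sessions <- data.frame(
  activityName = c("Run", "Run", "Elliptical", "Elliptical"),
  duration = c(30, 20, 60, 40) * 60 * 1000,
  calories = c(400, 280, 600, 380)
)

# Calories per minute = total calories / total minutes, per activity type
burn_rate <- toy_sessions %>%
  group_by(activityName) %>%
  summarise(
    total_minutes = sum(duration) / (1000 * 60),
    total_calories = sum(calories),
    calories_per_minute = total_calories / total_minutes
  )
burn_rate
```

Dividing the totals, rather than averaging per-session rates, weights long sessions more heavily; either choice is defensible as long as it is applied consistently.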

Stacked Bar Charts, One Bar Per Day

Let’s plot a stacked bar chart that will show the total daily duration of exercise during the entire two years. Such a graph helps you eyeball your exercise routine. You can readily identify the very active periods, and the quiet periods when looking at such a graph.

To generate such a graph, I first used dplyr functions together with lubridate to add columns to the existing dataframe (data_all). As there can be multiple sessions in a single day, we need to sum the duration of all exercises on a given day. We achieve this by creating a new column that holds the day on which each session started (the start time floored to the "day" unit) and then grouping by this new column (see relevant snippet below).

data_activity<-data_all%>%
  filter(activityName %in% activity_type)%>%
  mutate(startTime=mdy_hms(startTime))%>%
  # as_datetime() makes the bounds comparable with startTime
  filter(startTime>=as_datetime(time_start) & startTime<=as_datetime(time_end))%>%
  # floor_date() keeps afternoon sessions on their own day; round() would push them to the next day
  mutate(startTime_round=floor_date(startTime,unit="day"))%>%
  group_by(startTime_round,activityName)%>%
  # One row per day and activity; duration is converted from milliseconds to minutes
  summarise(duration_per_day=sum(duration)/(1000*60),
            total_calories=sum(calories),
            mean_heartRate=mean(averageHeartRate),
            .groups="drop")

And here is the stacked bar chart showing the daily exercise duration.

Stacked Bar showing daily exercise duration during the first year of the pandemic (Image by Author)

You can, of course, produce similar plots for calories and heart rate.
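
A daily stacked bar chart of this kind can be sketched with ggplot2 roughly as follows. This is not the exact plotting code behind the figure; it uses a toy dataframe with made-up values whose column names are assumed to mirror data_activity.

```r
library(ggplot2)

# Toy daily totals with made-up values, shaped like data_activity
toy_daily <- data.frame(
  startTime_round = as.Date(c("2020-04-01", "2020-04-01", "2020-04-02")),
  activityName = c("Elliptical", "Run", "Elliptical"),
  duration_per_day = c(30, 20, 45)
)

# geom_col() stacks bars that share an x value, one colour per activity
p_daily <- ggplot(toy_daily,
                  aes(x = startTime_round, y = duration_per_day, fill = activityName)) +
  geom_col() +
  labs(x = "Date", y = "Duration (minutes)", fill = "Activity")
p_daily
```

Swapping the y aesthetic for a calories or heart rate column gives the other plots.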


Stacked Bar Charts, One Per Month

The daily stacked bar charts are useful to get an idea of your maximum daily capacity. However, it is also worthwhile to get a more aggregated measure to better understand your capacity over a longer time horizon.

The following charts show the monthly total duration over the 2 years. I got a cross-trainer in June 2020 and, the next month, I exercised quite a bit, crossing 60 hours in a month. This was obviously not sustainable, and I then gradually decreased my monthly duration.

The total duration of exercise every month during the first 2 years (Image by Author)

Besides the total duration every month, we can also compute the daily average (shown below). From the figure, I can see that I have exceeded the daily average of 60 minutes for 8 months in the 2-year period, and I have almost always exceeded the daily average of 30 minutes.

Mean duration of exercise every month during the first 2 years (Image by Author)
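
Both monthly measures can be computed by flooring each date to the first of its month. The sketch below uses toy daily totals with made-up values, and it assumes the daily average is taken over every calendar day in the month (rest days included), which may differ from how the figures above were produced.

```r
library(dplyr)
library(lubridate)

# Toy daily totals in minutes (made-up values)
toy_days <- data.frame(
  day = ymd(c("2020-06-01", "2020-06-15", "2020-07-01", "2020-07-02")),
  minutes = c(60, 30, 120, 90)
)

monthly <- toy_days %>%
  mutate(month = floor_date(day, unit = "month")) %>%
  group_by(month) %>%
  summarise(total_hours = sum(minutes) / 60) %>%
  # Average over every calendar day in the month, including rest days
  mutate(mean_minutes_per_day = total_hours * 60 / as.numeric(days_in_month(month)))
monthly
```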

Polar Plot to Identify Time of Exercise

To get a better sense of the times when you are more likely to exercise, you can plot a histogram that counts the total number of sessions in each hour. It wouldn’t make much sense if 2 short sessions (say 10 minutes each) were shown as a high bar compared to a single session of 60 minutes. To counter this, you can easily use color/shade to categorize sessions by length.

The following shows a histogram for one 10-week period where the sessions have been categorized by duration. The histogram is shown in polar coordinates which I find more intuitive.

The histogram on polar coordinates to identify the favorite hours during the day when I exercised a lot! (Image by Author)
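
A polar histogram like this can be sketched in ggplot2 by counting sessions per hour and wrapping the axis with coord_polar(). The toy data, the duration buckets, and the names toy_polar and length_class are assumptions made for illustration, not the values behind the figure.

```r
library(ggplot2)
library(lubridate)

# Toy sessions with made-up start times and durations (minutes)
toy_polar <- data.frame(
  startTime = ymd_hms(c("2020-06-01 06:10:00", "2020-06-01 23:40:00",
                        "2020-06-02 06:30:00", "2020-06-03 07:05:00")),
  duration_min = c(60, 10, 25, 45)
)

# Hour of day for the angle; duration bucket for the fill colour
toy_polar$hour_of_day <- hour(toy_polar$startTime)
toy_polar$length_class <- cut(toy_polar$duration_min,
                              breaks = c(0, 30, 60, Inf),
                              labels = c("up to 30 min", "30 to 60 min", "over 60 min"))

# A bar chart of session counts per hour, wrapped around a 24-hour clock face
p_polar <- ggplot(toy_polar, aes(x = hour_of_day, fill = length_class)) +
  geom_bar() +
  scale_x_continuous(limits = c(-0.5, 23.5), breaks = 0:23) +
  coord_polar() +
  labs(x = "Hour of day", y = "Sessions", fill = "Session length")
p_polar
```

Fixing the x limits to the full 24 hours keeps the clock face complete even when some hours have no sessions.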

From the figure, I can see that the majority of my sessions happen in the early mornings or during the night. The above graph covers the June 2020 period where I averaged ~2 hours/day. This is also when I had a decent number of sessions at around midnight. This was clearly not sustainable, and I later settled into a more manageable routine (shown below) in which most of my sessions are in the mornings.

The histogram on polar coordinates to identify the favorite hours during the day when I exercised less! (Image by Author)

Correlation Plots

Here are some additional plots to help answer more questions. The plot below investigates the correlation between duration and average heart rate for each session. The lack of correlation suggests that you can sustain the same intensity for a long period, up to ~60 minutes.

Duration vs Intensity (Image by Author)

The plot below investigates the correlation between calories and duration. There is a strong correlation. Perhaps not surprisingly, the longer you exercise, the more calories you will burn.

Duration vs Calories (Image by Author)
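
The strength of these relationships can also be put into numbers with base R's cor(). The sketch below uses five made-up sessions purely to show the calls; the values are not taken from my data.

```r
# Five made-up sessions: duration (minutes), calories, average heart rate
toy_cor <- data.frame(
  duration_min = c(20, 30, 45, 60, 75),
  calories = c(210, 330, 480, 650, 800),
  averageHeartRate = c(128, 131, 127, 130, 129)
)

# Pearson correlation: values near 1 indicate a strong linear relationship
cor_calories <- cor(toy_cor$duration_min, toy_cor$calories)
# Values near 0 indicate little or no linear relationship
cor_heart_rate <- cor(toy_cor$duration_min, toy_cor$averageHeartRate)

round(c(calories = cor_calories, heart_rate = cor_heart_rate), 2)
```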

The following figure plots the VO2 max over the 2-year period. VO2 max is a proxy measure of your cardiovascular fitness that reflects how well your body consumes oxygen when you are exercising hard. The Fitbit device estimates this measure based on the relationship between pace and heart rate. As this measure is only estimated if you run outdoors, I have long periods with no data points. Nevertheless, your fitness score still improves even if you are undertaking an activity other than running. The graph below provides clear evidence of this.

VO2 Max over time (Image by Author)

Final Thoughts

The analysis presented in this article is not exhaustive. There is more data that I have not presented here (such as the minute-by-minute heart rate and sleep tracking data).

You actually don’t need to know all the possible analysis types you can undertake a priori. You can begin tracking your own data and let your own circumstances and curiosity drive and shape this project.

In 2009 (while still in my early 20s), my housemate went out for a run. I, full of motivation to change my life, joined him and made a decision to start regular exercise. 5 minutes later, I was panting heavily, so I stopped and concluded that regular, sustained exercise was impossible for me. Back then, I was ignorant and had no awareness of my capacity. 13 years later, with well over a thousand sessions tracked, I am a different person.

Once you begin tracking your sessions, you will, over time, develop an acute awareness of your capacity. You will know how much you can exercise in a day, in a week, in a month, and in a year. You will also know what times of the day fit with your routine, and the type of exercise you prefer. You will be better able to set realistic, achievable goals to sustain or grow.

Join Medium with my referral link – Ahmar Shah, PhD (Oxford)

