Analyze Your Health and Fitness Data using Julia
An easy data science project to spice up your coding weekend
I wear a watch almost every day. My habitual partner has been a Samsung Gear S3 Frontier, which I bought way back in 2018 for my birthday. Apart from the usual features expected from a watch, it has a bunch of sensors to record various activities such as steps, distance, climbed floors and heart rate (via a photoplethysmogram). The newer Galaxy line of watches also features sensors for recording blood oxygen and EKG. As a tech nerd, this naturally makes me excited!
The data recorded by these sensors is passed on to the Samsung Health App, which already does a good job of summarizing it in nice visuals. However, the programmer in me has always been itching to do more with it. Fortunately, Samsung allows us to download the raw data (.csv files) directly from the app. If you are a Samsung watch/phone user, you can download your health data by following the instructions described in this article. There are probably ways to do this for Fitbit and Garmin users as well. If you find something useful, do let me know in the comments.
In this guide, we will make use of various packages from the Julia ecosystem. I did this exercise using Pluto.jl, which allows us to create interactive code notebooks (available here). You can also choose your own editor such as VS Code.
Setting up Pluto
In case you are new to Julia, you will first need to install the binaries suitable for your system from here. Once installed, open the Julia REPL. Press ‘]’ to enter the Pkg (Julia’s built-in package manager) prompt, then type add Pluto. This will download the Pluto package and also compile it for you. Press backspace to come back to the Julia prompt, then type using Pluto followed by Pluto.run(). A new browser window showing the Pluto homepage should open. You can go ahead and create a new notebook.
If you are familiar with Jupyter, you already know how to use Pluto. However, keep in mind the following peculiarities:
- Pluto is a reactive notebook, which means that cells are linked to each other. When you update a cell, all other cells that depend on it also get updated. That also means you cannot define the same variable name in two places, unless its scope is local (e.g. inside a function).
- When writing multiple lines in a code cell, you need to wrap them in a begin-end block. You will be given an option to do this automatically when executing (Shift + Enter) such a cell.
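As a minimal sketch of the second point, a multi-line cell could look like this. The variable names and numbers are made up purely for illustration.

```julia
# A Pluto cell holding several expressions is wrapped in begin ... end.
# Names and values below are illustrative only.
begin
    daily_steps = [4200, 6100, 7500]
    total_steps = sum(daily_steps)
    average_steps = total_steps / length(daily_steps)
end
```

Because of reactivity, any other cell that uses total_steps or average_steps would re-run automatically whenever this cell changes.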
To import all the relevant packages in your working environment, execute the following:
using PlutoUI, DataFrames, CSV, Query, VegaLite, Dates, HTTP, Statistics
Obtaining input data
We will read the data directly from my GitHub repository using CSV.jl, and store it in the form of DataFrames. For visualization, we will make use of the excellent VegaLite.jl package. Data from the Samsung Health app is available in the form of various csv files. However, we will only be using the following three:
- com.samsung.shealth.tracker.pedometer_day_summary.<timestamp>.csv
- com.samsung.shealth.tracker.heart_rate.<timestamp>.csv
- com.samsung.health.floors_climbed.<timestamp>.csv
Filenames are rather self-explanatory. We can read the data directly into a DataFrame as shown below. We set header = 2, so that the second row is used to name the columns.
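Here is a small, self-contained sketch of the header = 2 idea. The inline text is a hypothetical stand-in for a real export (assuming a metadata row followed by the column names), and the column names and values are invented for the example.

```julia
using CSV, DataFrames

# Hypothetical stand-in for a Samsung Health export:
# row 1 is metadata, row 2 holds the column names, data starts on row 3.
raw = """
com.samsung.shealth.tracker.pedometer_day_summary,0,0
step_count,distance,create_time
5234,3921.5,2021-03-01 21:15:03.000
6120,4510.0,2021-03-02 20:58:41.000
"""

# header = 2 tells CSV.jl to take the column names from the second row.
# create_time is kept as a String here so it can be parsed explicitly later.
df_demo = CSV.read(IOBuffer(raw), DataFrame; header = 2,
                   types = Dict(:create_time => String))
```

For a file hosted online, the same call should also accept the bytes of a downloaded response (for instance the body returned by HTTP.get) in place of the IOBuffer.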
Cleaning and some organization
Let’s explore what our DataFrame actually contains.
You can also try the following:
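A few standard DataFrames.jl calls are handy for a first look. This is only a sketch on a tiny made-up table; in Pluto, each call would normally sit in its own cell so that its output is displayed.

```julia
using DataFrames

# Hypothetical miniature pedometer table; a real export has many more columns.
df_demo = DataFrame(step_count = [5234, 6120, 801],
                    distance = [3921.5, 4510.0, 640.2])

table_size = size(df_demo)       # (number of rows, number of columns)
column_names = names(df_demo)    # column names as strings
head_rows = first(df_demo, 2)    # first two rows
summary_df = describe(df_demo)   # per-column summary statistics
```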
The create_time column represents the time when the data entry was recorded, and is of type “String”. We need to convert it to a “DateTime” object, which will later allow for sorting and easier plotting. Additionally, we can also convert the distance column into km (from m) and active_time into minutes. Finally, we clean the DataFrame by removing duplicate entries, and sort w.r.t. time.
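The cleaning steps could be sketched roughly as follows. The timestamp format and the assumption that active_time is stored in milliseconds are guesses about the export, so check them against your own files; the rows are invented.

```julia
using DataFrames, Dates

# Made-up rows, including one exact duplicate and out-of-order timestamps.
df = DataFrame(
    create_time = ["2021-03-02 21:10:00.000",
                   "2021-03-01 20:05:00.000",
                   "2021-03-01 20:05:00.000"],
    distance = [4200.0, 3100.0, 3100.0],            # metres
    active_time = [3_000_000, 2_400_000, 2_400_000], # assumed milliseconds
)

# Assumed timestamp layout; adjust if your export differs.
fmt = dateformat"yyyy-mm-dd HH:MM:SS.s"
df.create_time = [DateTime(s, fmt) for s in df.create_time]

df.distance = df.distance ./ 1000          # m -> km
df.active_time = df.active_time ./ 60_000  # ms -> min (assumed unit)

unique!(df)              # drop duplicate rows
sort!(df, :create_time)  # order chronologically
```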
We calculate the cumulative distance and add it to a separate column cumul_distance. For later use, it is also handy to classify days as 'weekday' or 'weekend', and add them to a separate day_type column. In the same way, we add new day and month columns.
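One possible way to build these columns with the Dates standard library is sketched below, on a two-row made-up table that is assumed to be already cleaned and sorted.

```julia
using DataFrames, Dates

# Made-up, already cleaned data: a Monday and a Saturday.
df = DataFrame(
    create_time = [DateTime(2021, 3, 1, 20, 5), DateTime(2021, 3, 6, 18, 30)],
    distance = [3.1, 4.2],  # km
)

df.cumul_distance = cumsum(df.distance)

# dayofweek returns 1 (Monday) through 7 (Sunday).
df.day_type = [dayofweek(t) >= 6 ? "weekend" : "weekday" for t in df.create_time]

df.day = dayname.(df.create_time)      # e.g. "Monday"
df.month = monthname.(df.create_time)  # e.g. "March"
```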
Filtering based on time
Using the PlutoUI.jl package, we can further enhance the interactive experience by adding buttons, sliders etc. For instance, to bind a DateTime value to our variable start_date, do the following.
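A minimal sketch of such a binding is given below. It only works inside a Pluto notebook, where @bind is available, and the default date is an arbitrary choice. The exact type of the bound value may depend on the PlutoUI version.

```julia
using PlutoUI, Dates

# Pluto-only: renders a date input and binds the picked value to start_date.
# The default is an arbitrary illustrative choice.
@bind start_date DateField(default = DateTime(2018, 1, 1))
```

An end_date variable can be bound the same way in a separate cell.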
The DataFrame is then filtered based on the time range selected above. @filter is a powerful macro provided by the Query.jl package. We keep only the rows for which create_time lies between start_date and end_date as shown below:
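A sketch of this filter on made-up data follows. Here start_date and end_date are plain variables standing in for the values bound through PlutoUI, and df_pedometer is an assumed name for the cleaned table.

```julia
using DataFrames, Dates, Query

# Made-up data; df_pedometer is an assumed name for the cleaned table.
df_pedometer = DataFrame(
    create_time = [DateTime(2019, 6, 1), DateTime(2020, 6, 1), DateTime(2021, 6, 1)],
    step_count = [7000, 3500, 6500],
)

# Stand-ins for the values selected in the notebook.
start_date = DateTime(2020, 1, 1)
end_date = DateTime(2021, 12, 31)

# Keep rows inside the range, then collect the result back into a DataFrame.
df_pedometer_filter = df_pedometer |>
    @filter(_.create_time >= start_date && _.create_time <= end_date) |>
    DataFrame
```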
Now we are ready to start visualizing the data.
Daily steps
Our filtered DataFrame df_pedometer_filter can be passed directly to the @vlplot macro provided by the VegaLite.jl package. The rest of the arguments are specific to the type of plot. Check out the VegaLite.jl tutorial for more information.
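As a rough sketch, a daily step count bar chart might be specified like this. The data, axis titles and plot size are placeholders, not the settings used for the figures in this article.

```julia
using DataFrames, Dates, VegaLite

# Made-up stand-in for the filtered pedometer table.
df_pedometer_filter = DataFrame(
    create_time = [DateTime(2021, 3, 1), DateTime(2021, 3, 2), DateTime(2021, 3, 3)],
    step_count = [5234, 6120, 801],
)

# :t marks a temporal field, :q a quantitative one.
steps_plot = df_pedometer_filter |> @vlplot(
    :bar,
    x = {"create_time:t", title = "Date"},
    y = {"step_count:q", title = "Number of steps"},
    width = 500,
    height = 250,
)
```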
It is clear that I took far fewer steps in 2020. This is likely due to the lockdowns imposed during the coronavirus outbreak. Glad to see that I have picked up the pace in 2021. The same data can also be visualized as a stacked histogram.
The default target for each day is set to 6000 steps. I always try to at least reach that, hence the peak is around that value for all three years. As expected, 2020 has a lot of days with low activity.
Daily distance
I like to go for walks on a daily basis. In addition to the step count, it would also be interesting to see how much distance I usually cover. Setting the color scale to the distance column in our DataFrame renders the bars with a gradient that is proportional to the size of each data point. Looks quite cool!
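The gradient comes from mapping the same quantitative column to both the bar height and the color channel. A minimal sketch on made-up data:

```julia
using DataFrames, Dates, VegaLite

# Made-up daily distances in km.
df_demo = DataFrame(
    create_time = [DateTime(2021, 3, 1), DateTime(2021, 3, 2), DateTime(2021, 3, 3)],
    distance = [3.1, 4.5, 0.6],
)

# Using distance for both y and color shades longer walks differently.
distance_plot = df_demo |> @vlplot(
    :bar,
    x = "create_time:t",
    y = "distance:q",
    color = {"distance:q", title = "Distance (km)"},
)
```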
Cumulative distance
It’s fun to also check how many km I have covered so far. I guess reaching up to 10,000 km would be a nice target.
Active time
This is the time spent during any activity (walking, running etc.) as detected by the watch. If you remember, we had previously added a day_type column based on whether it’s a weekday or weekend. Now, we can make use of that to group our active_time accordingly.
We can also do a per-day breakdown of active time simply by changing the parameter to color = :day in the argument list of the @vlplot macro.
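A sketch of the grouping idea: the categorical column is mapped to the color channel, so each bar is colored by its group. The bar mark and the data are assumptions made for illustration and may differ from the plots shown here.

```julia
using DataFrames, Dates, VegaLite

# Made-up data with the helper columns added during cleaning.
df_demo = DataFrame(
    create_time = [DateTime(2021, 3, 5), DateTime(2021, 3, 6), DateTime(2021, 3, 7)],
    active_time = [42.0, 18.5, 25.0],  # minutes
    day_type = ["weekday", "weekend", "weekend"],
    day = ["Friday", "Saturday", "Sunday"],
)

# :n marks a nominal (categorical) field.
active_plot = df_demo |> @vlplot(
    :bar,
    x = "create_time:t",
    y = "active_time:q",
    color = "day_type:n",  # switch to "day:n" for a per-day breakdown
)
```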
It seems I am quite active on Tuesdays and Wednesdays, and least active on Saturdays. Does that make sense?
Correlation between number of steps and calories
You can create a slider in Pluto (needs the PlutoUI.jl package), and bind its value to the select_year variable as shown below:
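A minimal sketch of such a slider is given below. It only runs inside a Pluto notebook, and the year range and default are assumptions based on the period covered by this data set.

```julia
using PlutoUI

# Pluto-only: moving the slider re-runs every cell that uses select_year.
# The range and default below are illustrative assumptions.
@bind select_year Slider(2018:2021, default = 2020, show_value = true)
```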
As expected, the number of steps and the total calories burned are directly correlated. This 2D histogram scatterplot also shows markers with size proportional to the number of data points in each bin. Fewer data points exist for higher step counts. I should try to be more active this year.
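One way to sketch such a plot is to keep the rows of the selected year, bin both axes, and map the count per bin to the marker size. The column name calorie, the bin settings and the data are assumptions, and select_year is a plain variable standing in for the slider value.

```julia
using DataFrames, Dates, VegaLite

# Made-up data; calorie is an assumed column name.
df_demo = DataFrame(
    create_time = [DateTime(2020, 5, 1), DateTime(2020, 5, 2),
                   DateTime(2020, 5, 3), DateTime(2021, 5, 1)],
    step_count = [2100, 6300, 6100, 8000],
    calorie = [80.0, 240.0, 230.0, 310.0],
)

select_year = 2020  # stand-in for the slider value

# Keep only the rows recorded in the selected year.
df_year = filter(row -> year(row.create_time) == select_year, df_demo)

# Bin both axes; marker size encodes the number of rows in each bin.
hist2d_plot = df_year |> @vlplot(
    :circle,
    x = {"step_count:q", bin = {maxbins = 10}},
    y = {"calorie:q", bin = {maxbins = 10}},
    size = "count()",
)
```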
Heat map of step count vs active time
Visualizing heart rate data
Heart rate data can also be cleaned up using a similar strategy as shown before.
Heart rate is measured by my watch at intervals of 10 minutes. I wear it almost every day. That means most of the data points are collected while I am sitting (mostly relaxed) at my desk for work. Let’s see what the distribution looks like.
Most of the data appears to be clustered around the resting heart rate range of 60–100 beats per minute (bpm) with a mean around 78–79 bpm. That’s a relief! The ones which are quite high were likely measured during a running session.
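Summary numbers like these can be checked with the Statistics standard library. The readings below are invented, so the values they produce differ from the ones quoted above.

```julia
using Statistics

# Invented heart rate readings in bpm, including one high value from exercise.
heart_rate = [62.0, 71.0, 78.0, 80.0, 85.0, 92.0, 148.0]

mean_hr = mean(heart_rate)
median_hr = median(heart_rate)  # less sensitive to the exercise outlier

# Fraction of readings inside the 60-100 bpm resting range.
share_resting = count(x -> 60 <= x <= 100, heart_rate) / length(heart_rate)
```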
Number of floors climbed
Nothing too exciting here, except for a huge spike in November 2019. I was wearing this watch during a short hike in the city of Nainital, India. An elevation change of 9 feet is recorded as one floor climbed. So, 65 floors indicates that I must have climbed about 585 feet (roughly 178 m) during that time. Phew!
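The conversion is simple enough to verify in a couple of lines, using the 9 feet per floor figure stated above:

```julia
floors = 65
feet_per_floor = 9                       # elevation change counted as one floor
feet_climbed = floors * feet_per_floor   # total climb in feet
metres_climbed = feet_climbed * 0.3048   # 1 ft is exactly 0.3048 m
```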
Conclusion
The Julia ecosystem is rapidly evolving with numerous amazing plotting packages, and VegaLite.jl happens to be one of them. The elegant grammar of graphics style and tight integration with DataFrames make it an ideal choice for any kind of Data Science/Analytics project. I hope you enjoyed going through this guide. Full code (Pluto notebook) can be found here. Thank you for your time! In case you want to connect, here’s my LinkedIn.