Analyze Your Health and Fitness Data using Julia

An easy data science project to spice up your coding weekend

Published in Towards Data Science · 7 min read · Oct 3, 2021

I wear a watch almost every day. My habitual partner has been a Samsung Gear S3 Frontier, which I bought way back in 2018 for my birthday. Apart from the usual features expected from a watch, it has a bunch of sensors to record various activities such as steps, distance, climbed floors and heart rate (via a photoplethysmogram). The newer Galaxy line of watches also features sensors for recording blood oxygen and EKG. As a tech nerd, this naturally makes me excited!

The data recorded by these sensors is passed on to the Samsung Health App, which already does a good job of summarizing it into nice visuals. However, the programmer in me has always been itching to do more with it. Fortunately, Samsung allows us to download the raw data (.csv files) directly from the app. If you are a Samsung watch/phone user, you can download your health data by following the instructions described in this article. I guess there are ways to do this for Fitbit and Garmin users as well. If you find something useful, do let me know in the comments.

In this guide, we will make use of various packages from the Julia ecosystem. I did this exercise using Pluto.jl, which allows us to create interactive code notebooks (available here). You can also choose your own editor such as VS Code.

Setting up Pluto

In case you are new to Julia, you will first need to install the binaries suitable for your system from here. Once installed, open the Julia REPL. Press ‘]’ to enter the Pkg (Julia’s built-in package manager) prompt. Then type add Pluto. This will download the Pluto package and also compile it for you. Press backspace to come back to the Julia prompt. Then type using Pluto followed by Pluto.run(). A new browser window showing the Pluto homepage should open. You can go ahead and create a new notebook.

If you are familiar with Jupyter, you already know how to use Pluto. However, keep in mind the following peculiarities:

Pluto is a reactive notebook, which means that cells are linked to each other. When you update a cell, all other cells dependent on it also get updated. That also means you cannot use the same variable names at two places, unless their scope is local (e.g. in functions).
When writing multiple lines in a code cell, you need to wrap them in a begin-end block. You will be given an option to do this automatically when executing (shift + enter) such a block.
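
As a small illustration of the second point, a multi-line cell could look like the sketch below. The variable names and numbers are made up for this example.

```julia
# A multi-line Pluto cell has to be wrapped in begin ... end.
# `steps` and `total_steps` are hypothetical names used only for illustration.
begin
    steps = [4200, 6100, 7500]
    total_steps = sum(steps)
end
```

Because of the reactivity described in the first point, defining steps or total_steps again in another cell of the same notebook would be flagged as a conflict.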

To import all the relevant packages in your working environment, execute the following:

using PlutoUI, DataFrames, CSV, Query, VegaLite, Dates, HTTP, Statistics

Obtaining input data

We will read the data directly from my GitHub repository using CSV.jl, and store it in the form of DataFrames. For visualization, we will make use of the excellent VegaLite.jl package. Data from the Samsung Health app is available in the form of various csv files. However, we will only be using the following three:

com.samsung.shealth.tracker.pedometer_day_summary.<timestamp>.csv
com.samsung.shealth.tracker.heart_rate.<timestamp>.csv
com.samsung.health.floors_climbed.<timestamp>.csv

The filenames are rather self-explanatory. We can read the data directly into a DataFrame as shown below. We set header = 2, so that the second row of the file is used to name the columns.

Read the CSV files into the respective DataFrames
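
To see what header = 2 does, here is a minimal sketch that reads a small in-memory sample instead of the real export. The sample content, its column names and the variable names are invented for illustration and are not the actual Samsung Health layout.

```julia
using CSV, DataFrames

# Hypothetical sample: row 1 holds metadata, row 2 holds the column names,
# and the actual data starts on row 3.
sample = """
com.samsung.shealth.tracker.pedometer_day_summary,1,2
step_count,distance,calorie
4200,3100.5,160.2
7500,5600.0,290.8
"""

# header = 2 tells CSV.jl to take the column names from the second row;
# everything above it is skipped.
df_sample = CSV.read(IOBuffer(sample), DataFrame; header = 2)
```

For a remote file, the same call should also accept the raw bytes of a download, for example the body of an HTTP.get response, in place of the IOBuffer.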

Cleaning and some organization

Let’s explore what our DataFrame actually contains.

You can also try the following:
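
A few standard DataFrames.jl calls are handy for this kind of first look. The sketch below uses a tiny made-up DataFrame (df_demo and its values are assumptions, not the real data).

```julia
using DataFrames

# Hypothetical stand-in for the pedometer DataFrame.
df_demo = DataFrame(
    step_count = [4200, 7500, 6100],
    distance = [3100.5, 5600.0, 4500.2],
    calorie = [160.2, 290.8, 231.0],
)

dims = size(df_demo)        # (number of rows, number of columns)
cols = names(df_demo)       # column names as strings
head = first(df_demo, 2)    # first two rows
summary_stats = describe(df_demo)  # per-column summary (mean, min, max, ...)
```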

The create_time column represents the time when the data entry was recorded, and is of type “String”. We need to convert it to a “DateTime” object, which will later allow for sorting and easier plotting. Additionally, we can also convert the distance column into km (from m) and active_time into minutes. Finally, we clean the DataFrame by removing duplicate entries, and sort it by time.

Actions for cleaning and sorting
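
These steps could be sketched as follows. The sample rows, the timestamp format and the assumption that active_time is stored in milliseconds are mine; adjust them to whatever your export actually contains.

```julia
using DataFrames, Dates

# Hypothetical raw data: string timestamps, distance in m,
# active_time assumed to be in milliseconds. One row is a duplicate.
df_raw = DataFrame(
    create_time = ["2021-01-02 08:00:00", "2021-01-01 08:00:00", "2021-01-02 08:00:00"],
    distance = [5200.0, 3100.0, 5200.0],
    active_time = [3_600_000, 1_800_000, 3_600_000],
)

# Parse the strings into DateTime objects (format is an assumption).
df_raw.create_time = DateTime.(df_raw.create_time, dateformat"yyyy-mm-dd HH:MM:SS")

# Unit conversions: m -> km and ms -> minutes.
df_raw.distance = df_raw.distance ./ 1000
df_raw.active_time = df_raw.active_time ./ 60_000

# Drop duplicate rows, then sort chronologically.
unique!(df_raw)
sort!(df_raw, :create_time)
```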

We calculate the cumulative distance and add it to a separate column cumul_distance. For later use, it is also handy to classify days as 'weekday' or 'weekend', and store the result in a separate day_type column. We do the same for the new day and month columns.

Adding new columns
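
One possible way to derive these columns with the Dates standard library is sketched below. The sample dates and distances are invented; only the column names cumul_distance, day_type, day and month come from the text.

```julia
using DataFrames, Dates

# Hypothetical cleaned data: a Friday, a Saturday and a Sunday.
df_days = DataFrame(
    create_time = [DateTime(2021, 10, 1), DateTime(2021, 10, 2), DateTime(2021, 10, 3)],
    distance = [3.0, 4.5, 2.5],   # km
)

# Running total of the distance.
df_days[!, :cumul_distance] = cumsum(df_days.distance)

# Day and month names, e.g. "Friday" and "October".
df_days[!, :day] = dayname.(df_days.create_time)
df_days[!, :month] = monthname.(df_days.create_time)

# dayofweek returns 1 (Monday) to 7 (Sunday), so 6 and 7 are the weekend.
df_days[!, :day_type] = [dayofweek(t) >= 6 ? "weekend" : "weekday" for t in df_days.create_time]
```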

Filtering based on time

Using the PlutoUI.jl package, we can further enhance the interactive experience by adding buttons, sliders etc. For instance, to bind a DateTime value to our variable start_date, do the following.

The DataFrame is then filtered based on the time range selected above. @filter is a powerful macro provided by the Query.jl package. We keep only the rows for which create_time lies between start_date and end_date as shown below:

Filtering using Query.jl
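
A minimal sketch of such a filter is given here. The sample data is invented, and start_date and end_date are hard-coded instead of being bound to widgets, so that the snippet also runs outside Pluto.

```julia
using DataFrames, Dates, Query

# Hypothetical stand-in for the cleaned pedometer data: ten consecutive days.
df_pedometer = DataFrame(
    create_time = collect(DateTime(2021, 1, 1):Day(1):DateTime(2021, 1, 10)),
    step_count = collect(1000:1000:10000),
)

# In the notebook these two would come from the date pickers.
start_date = DateTime(2021, 1, 3)
end_date = DateTime(2021, 1, 6)

# Keep rows inside the selected range; `_` refers to the current row.
df_pedometer_filter = df_pedometer |>
    @filter(_.create_time >= start_date && _.create_time <= end_date) |>
    DataFrame
```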

Now we are ready to start visualizing the data.

Daily steps

Our filtered DataFrame df_pedometer_filter can be passed directly to the @vlplot macro provided by the VegaLite.jl package. The rest of the arguments are specific to the type of plot. Check out the VegaLite.jl tutorial for more information.

Plot daily step count with line gradient
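
As a rough sketch of how such a call could look, here is a plain line chart of steps over time. It uses made-up data in a stand-in DataFrame (df_steps) and does not try to reproduce the line gradient of the original figure.

```julia
using DataFrames, Dates, VegaLite

# Hypothetical stand-in for df_pedometer_filter.
df_steps = DataFrame(
    create_time = collect(DateTime(2021, 1, 1):Day(1):DateTime(2021, 1, 7)),
    step_count = [4200, 6100, 7500, 3000, 6400, 8200, 5100],
)

# ":t" marks a temporal field, ":q" a quantitative one.
p_steps = df_steps |> @vlplot(
    :line,
    x = {"create_time:t", title = "Date"},
    y = {"step_count:q", title = "Steps"},
    width = 500,
    height = 250,
)
```

In Pluto, returning p_steps as the last value of a cell should be enough to render the chart.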

It is clear that I took far fewer steps in 2020. This is likely due to the lockdowns imposed during the Corona outbreak. Glad to see that I have picked up the pace in 2021. The same data can also be visualized as a stacked histogram.

Plot a stacked histogram of daily step count
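
A stacked histogram of this kind can be approximated by binning step_count on the x axis, counting rows on the y axis, and coloring by year. The sketch below assumes a year column and uses invented numbers; the bin count is an arbitrary choice.

```julia
using DataFrames, VegaLite

# Hypothetical daily step counts with the year each one belongs to.
df_hist = DataFrame(
    step_count = [2500, 6100, 5900, 3100, 2800, 6300, 7400, 6000, 9100],
    year = [2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
)

# Bars that share a bin are stacked by default when a color field is given.
p_hist = df_hist |> @vlplot(
    :bar,
    x = {:step_count, bin = {maxbins = 20}},
    y = "count()",
    color = "year:n",
)
```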

The default target for each day is set to 6000 steps. I always try to reach at least that, hence the peak is around that value for all three years. As expected, 2020 has a lot of days where my activity was low.

Daily distance

I like to go for walks on a daily basis. In addition to the step count, it would also be interesting to see how much distance I usually cover. Setting the color scale to the distance column in our DataFrame renders the bars with a gradient that is proportional to the size of each data point. Looks quite cool!

Plot daily distance as bars with color gradient
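
The idea of tying the color to the plotted value can be sketched like this, again with invented data in a stand-in DataFrame (df_dist). Encoding color with a quantitative field gives a continuous gradient.

```julia
using DataFrames, Dates, VegaLite

# Hypothetical daily distances in km.
df_dist = DataFrame(
    create_time = collect(DateTime(2021, 1, 1):Day(1):DateTime(2021, 1, 5)),
    distance = [3.1, 5.2, 1.4, 6.8, 4.0],
)

# The same field drives both the bar height and the bar color.
p_dist = df_dist |> @vlplot(
    :bar,
    x = "create_time:t",
    y = "distance:q",
    color = "distance:q",
)
```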

Cumulative distance

It’s also fun to check how many km I have covered so far. I guess reaching 10,000 km would be a nice target.

Plot cumulative distance as an area

Active time

This is the time spent during any activity (walking, running etc.) as detected by the watch. If you remember, we had previously added a day_type column based on whether it’s a weekday or weekend. Now, we can make use of that to group our active_time accordingly.

Plot active time grouped using day_type
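
One way to group by day_type, shown here only as a sketch with invented data, is to sum active_time per month and let the color split each bar into weekday and weekend parts. The choice of a monthly sum is my assumption about what a sensible grouping looks like.

```julia
using DataFrames, Dates, VegaLite

# Hypothetical active time in minutes with a precomputed day_type column.
df_active = DataFrame(
    create_time = [DateTime(2021, 1, 4), DateTime(2021, 1, 9), DateTime(2021, 2, 2), DateTime(2021, 2, 7)],
    active_time = [45.0, 30.0, 50.0, 20.0],
    day_type = ["weekday", "weekend", "weekday", "weekend"],
)

# timeUnit groups the timestamps by year and month; sum(...) aggregates per group.
p_active = df_active |> @vlplot(
    :bar,
    x = {"create_time:o", timeUnit = :yearmonth},
    y = "sum(active_time):q",
    color = "day_type:n",
)
```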

We can also do a per-day breakdown of active time simply by setting color = :day in the argument list of the @vlplot macro.

It seems I am quite active on Tuesdays and Wednesdays, and least active on Saturdays. Does that make sense?

Correlation between number of steps and calories

Plot a 2D histogram for step count vs calories

You can create a slider in Pluto (needs the PlutoUI.jl package), and bind its value to the select_year variable as shown below:

As expected, the number of steps and the total calories burned are positively correlated. This 2D histogram scatterplot also shows markers with size proportional to the total number of counts. Fewer data points exist for higher step counts. I should try to be more active this year.
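
The strength of this relationship can also be put into a single number with the Pearson correlation coefficient from the Statistics standard library. The values below are invented purely to show the call.

```julia
using Statistics

# Hypothetical daily step counts and the calories recorded for the same days.
step_counts = [2000, 4000, 6000, 8000, 10000]
calories = [80.0, 165.0, 240.0, 330.0, 400.0]

# Pearson correlation: values close to 1 indicate a strong positive linear relationship.
r = cor(step_counts, calories)
```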

Heat map of step count vs active time

Plot a heat map for step count vs active time

Visualizing heart rate data

Heart rate data can also be cleaned up using a similar strategy as shown before.

Plot heart rate data using circular markers and size proportional to the value

Heart rate is measured by my watch at intervals of 10 minutes. I wear it almost every day. That means most of the data points are collected while I am sitting (mostly relaxed) at my desk for work. Let’s see what the distribution looks like.

Plot heart rate distribution with color scale set to magnitude of the values

Most of the data appears to be clustered around the resting heart rate range of 60–100 beats per minute (bpm) with a mean around 78–79 bpm. That’s a relief! The ones which are quite high were likely measured during a running session.
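
The same two observations, the mean and the share of readings inside the 60 to 100 bpm range, can be checked numerically. The readings below are invented and are not my actual measurements.

```julia
using Statistics

# Hypothetical heart rate readings in bpm.
heart_rate = [62, 68, 71, 75, 78, 80, 84, 90, 110, 142]

# Average heart rate over all readings.
mean_hr = mean(heart_rate)

# Fraction of readings that fall inside the 60-100 bpm resting range.
resting_fraction = count(hr -> 60 <= hr <= 100, heart_rate) / length(heart_rate)
```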

Number of floors climbed

Plot number of floors climbed

Nothing too exciting here, except for a huge spike in Nov 2019. I was wearing this watch during a short hike in the city of Nainital, India. An elevation change of 9 feet is recorded as one floor climbed. So, 65 floors indicates that I must have climbed 585 feet (about 178 m) during that time. Phew!
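
The conversion is simple enough to verify in a couple of lines. The 9 feet per floor figure is the one quoted above, and 0.3048 is the standard feet-to-meter factor.

```julia
# Figures taken from the text above.
feet_per_floor = 9
floors_climbed = 65

# Total climb in feet, then converted to meters (1 ft = 0.3048 m).
climb_feet = floors_climbed * feet_per_floor
climb_m = climb_feet * 0.3048
```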

Conclusion

The Julia ecosystem is rapidly evolving with numerous amazing plotting packages, and VegaLite.jl happens to be one of them. Its elegant grammar-of-graphics style and tight integration with DataFrames make it an ideal choice for any kind of Data Science/Analytics project. I hope you enjoyed going through this guide. Full code (Pluto notebook) can be found here. Thank you for your time! In case you want to connect, here’s my LinkedIn.
