
How to Visualize Your Runner's High with Python and Altair

A Step-by-Step Guide to Retrieving and Visualizing Your Running Data with Python and Altair

Photo by Bruno Nascimento on Unsplash; slightly altered by author

TUTORIAL – PYTHON – RUNNING

1. Introduction

Hi, I am Gregor, a Data Scientist and a passionate non-competitive runner. I recently realized that I started using a running app on my phone ten years ago. Back then, I only recorded GPS tracks and start and end times; I had no means to record cadence, heart rate, elevation, and the like. I remember being a bad runner, slow and easily out of breath. I had just finished my Ph.D. and was working way too much at my desk, so I decided to start this journey as a runner.

Ten years is quite a long time, and I am wondering what has changed in my runs over this time frame. Did I get fitter, faster, or calmer? At which level do I train? While most fitness platforms provide some reports, I missed a view spanning my last ten years. I am especially interested in year-on-year comparisons.

In this article, I will show you (1) how to export your running data from Garmin Connect, (2) how to clean the data using Python and Pandas, and (3) how to visualize the data using Altair.

2. Data export from Garmin Connect

First of all, head to Garmin Connect and open the "All Activities" page. It is best to filter for your running activities by clicking the running icon in the top-right corner.

Garmin Connect Activity page; image by author

Before you hit the export button, it is important to scroll all the way down the list of your running activities: only the activities shown in this list can be exported. Once you have done this, click the "Export CSV" button in the top-right corner. This downloads the file Activities.csv to your computer. The data includes information about dates, distances, pace, heart rate (HR), elevation, and more.

3. Data cleaning using Python Pandas

The data is rather clean, and we only need to focus on the following aspects:

  1. Transform date information into various useful features
  2. Transform pace information into minutes
  3. Transform numbers into a number type
  4. Deal with missing values

Below you will find the code I used to achieve just that. If you want to know more about logging, I recommend the article "How to Setup Logging for Python", and if you want to know more about the way I write code, please consider reading "The Flawless Pipes of Python".

Output of runs_raw.info(); image by author

Please note that I use the function assign() to create additional variables. This has the benefit of not overwriting the original values, which is useful for later testing, and it lets you apply your own naming convention for columns. Nevertheless, I then remove all the old variables with filter(like="run"), which keeps only the columns whose names contain "run". The resulting column names are shown below.

Result of the cleaning process; Image by author
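As a rough illustration of the four cleaning steps and of the assign()/filter(like="run") pattern, here is a minimal sketch; it is not the original code. The Garmin column names (Date, Distance, Calories, Avg HR, Avg Pace, Elev Gain), the "--" placeholder for missing values, the m:ss pace format, and the helper names to_number, pace_to_minutes, and clean_runs are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt of Garmin's Activities.csv. Real exports contain many
# more columns, and names and formats may vary by app version and locale.
SAMPLE_CSV = """Activity Type,Date,Distance,Calories,Avg HR,Avg Pace,Elev Gain
Running,2021-03-14 07:30:12,10.02,"1,012",148,5:32,85
Running,2020-11-02 18:05:44,5.10,498,--,6:05,--
"""


def to_number(col: pd.Series) -> pd.Series:
    """Drop thousands separators; anything unparsable (e.g. '--') becomes NaN."""
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


def pace_to_minutes(pace: pd.Series) -> pd.Series:
    """Convert 'm:ss' pace strings to decimal minutes ('5:30' -> 5.5); others -> NaN."""
    parts = pace.astype(str).str.extract(r"^(\d+):(\d{2})$")
    return pd.to_numeric(parts[0]) + pd.to_numeric(parts[1]) / 60


def clean_runs(runs_raw: pd.DataFrame) -> pd.DataFrame:
    """Add 'run_*' columns with assign(), then keep only those via filter()."""
    return (
        runs_raw
        .assign(
            # 1. Date features (later keyword arguments may use earlier ones)
            run_date=lambda d: pd.to_datetime(d["Date"]),
            run_year=lambda d: d["run_date"].dt.year,
            run_month=lambda d: d["run_date"].dt.month,
            run_weekday=lambda d: d["run_date"].dt.day_name(),
            # 2. Pace in decimal minutes per distance unit
            run_avg_pace=lambda d: pace_to_minutes(d["Avg Pace"]),
            # 3. and 4. Numeric types, with missing values as NaN
            run_distance=lambda d: to_number(d["Distance"]),
            run_calories=lambda d: to_number(d["Calories"]),
            run_avg_hr=lambda d: to_number(d["Avg HR"]),
            run_elev_gain=lambda d: to_number(d["Elev Gain"]),
        )
        # Keep only columns whose name contains "run", i.e. the new ones
        .filter(like="run")
    )


# With real data: runs_raw = pd.read_csv("Activities.csv")
runs_raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
runs = clean_runs(runs_raw)
print(runs)
```

Note that filter(like=...) is case-sensitive, so a raw column such as "Avg Run Cadence" would not be kept by accident; if your export contains a raw column with a lowercase "run" in its name, a more specific prefix such as "run_" is safer.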

The last step is to save the cleaned data to disk.
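A minimal sketch of this step, assuming CSV as the storage format; the file name runs_clean.csv and the two-column toy table are hypothetical stand-ins for the cleaned data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data standing in for the output of the cleaning step.
runs = pd.DataFrame({
    "run_date": pd.to_datetime(["2021-03-14 07:30:12", "2020-11-02 18:05:44"]),
    "run_distance": [10.02, 5.10],
})

# A temporary folder keeps the example self-contained; in practice, choose a
# project folder. The file name is hypothetical.
out_path = Path(tempfile.mkdtemp()) / "runs_clean.csv"
runs.to_csv(out_path, index=False)
```

Because CSV stores dates as plain text, pass parse_dates=["run_date"] to pd.read_csv when loading the file again; otherwise the date column comes back as strings.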

4. Data visualization with Altair

As you might know, I started my data science career using R, and hence I am very familiar with ggplot2. When I started using Python more and more often in my customer projects, I began looking for a similar visualization package in Python. I am happy to say that Plotnine is a very good alternative.

Data Visualization in Python like in R’s ggplot2

However, since I like to try new things in general, and visualization packages in particular, I wanted to try out the Altair package. Altair is a Python data visualization package that follows a declarative approach, just like ggplot2 and Plotnine.

Please follow along with the next paragraphs, in which I evaluate some of my running data. Maybe you will pick up some tips on how to use Altair.

Setup

How much did I run during the last ten years (105 months)?

Number of runs per year; image by author
Number of runs per month; image by author

At what distances did I train in each year?

Distances by year; image by author

At what heart rates did I train each year?

Average HR by year; image by author

At what pace did I run which distances each year?

Average pace per distance per year; image by author

5. Conclusion

In this article, I showed you how to get your running data from Garmin Connect, how to clean the data using Pandas, and how to visualize it using Altair.

As you may have observed, my running started to get a little more serious in 2018. Before that, my job required a lot of travel, which made it hard for me to keep up with running.

For me, this exercise was worthwhile because it gave me a year-on-year comparison of my running data, something I could not achieve on Garmin Connect. Additionally, it allowed me to get to know the Altair package better. Since I am much more used to working with Plotnine, I will stick with Plotnine for the moment. But should a project require interactive charts, I highly recommend looking into Altair.

Please feel free to contact me with any questions and comments. Thank you. Find more articles from me here:

  1. Learn how I plan my articles for Medium
  2. Learn how to set up logging for your Python code
  3. Learn how to write clean code in Python using chaining (or pipes)
  4. Learn how to analyze your LinkedIn data using R
  5. Learn how to create charts in a descriptive way in Python using grammar of graphics

Gregor Scheithauer is a consultant, data scientist, and researcher. He specializes in the topics of Process Mining, Business Process Management, and Analytics. You can connect with him on LinkedIn, Twitter, or here on Medium. Thank you!

