TUTORIAL – PYTHON – RUNNING
1. Introduction
Hi, I am Gregor, a data scientist and a passionate non-competitive runner. I just realized that I started using a running app on my phone ten years ago. Back then, I recorded only GPS tracks and start and end times. I had no means to record cadence, heart rate, elevation, and the like. I remember being a bad runner: slow and easily out of breath. I had just finished my Ph.D. and worked way too much at my desk. So I decided to start this journey as a runner.
Ten years is quite a long time, and I wonder what changed in my runs over this period. Did I get fitter, faster, or calmer? At which level do I train? While most fitness platforms provide some reports, I missed a view over my last ten years. I am especially interested in year-on-year comparisons.
In this article, I will show you (1) how to export your running data from Garmin Connect, (2) how to clean the data using Python Pandas, and (3) how to visualize the data using Altair.
2. Data export from Garmin Connect
First of all, head to Garmin Connect and open the "All Activities" page. It is best to filter for your running activities by clicking on the running icon in the top-right corner.

Before you hit the export button, scroll all the way down the list of your running activities, because only the activities shown in this list are exported. Once you have done this, press the "Export CSV" button in the top-right corner. This downloads the file Activities.csv onto your computer. The data includes dates, distances, pace, heart rate (HR), elevation, and more.
3. Data cleaning using Python Pandas
The data is rather clean, and we only need to focus on the following aspects:
- Transform date information into various useful features
- Transform pace information into minutes
- Transform numbers into a number type
- Deal with missing values
Below you will find the code I used to achieve just that. If you want to know more about logging I recommend this article: "How to Setup Logging for Python" and if you want to know more about the way I write the code, please consider reading "The Flawless Pipes of Python".
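As an illustration of the four cleaning steps listed above, here is a minimal sketch, not the original pipeline. The sample rows are made up, and the column names ("Date", "Distance", "Avg HR", "Avg Pace"), the "--" placeholder for missing values, and the helper pace_to_minutes() are assumptions that you should check against your own Activities.csv.

```python
import io

import pandas as pd

# Made-up sample rows mimicking a Garmin Connect export. The column names
# and the "--" placeholder for missing values are assumptions.
sample_csv = io.StringIO(
    "Activity Type,Date,Distance,Avg HR,Avg Pace\n"
    "Running,2021-03-14 07:30:00,10.02,152,5:45\n"
    "Running,2021-03-16 18:05:00,5.50,--,6:10\n"
)


def pace_to_minutes(pace: str) -> float:
    """Convert a 'mm:ss' pace string into decimal minutes (hypothetical helper)."""
    minutes, seconds = pace.split(":")
    return int(minutes) + int(seconds) / 60


# Missing values: treat "--" as NaN while reading.
raw = pd.read_csv(sample_csv, na_values=["--"])

clean = raw.assign(
    # Dates: parse once, then derive features from the parsed column.
    run_date=lambda d: pd.to_datetime(d["Date"]),
    run_year=lambda d: d["run_date"].dt.year,
    run_month=lambda d: d["run_date"].dt.month,
    run_weekday=lambda d: d["run_date"].dt.day_name(),
    # Pace: "5:45" becomes 5.75 minutes.
    run_pace_min=lambda d: d["Avg Pace"].map(pace_to_minutes),
    # Numbers: coerce to a numeric type; unparsable entries become NaN.
    run_distance=lambda d: pd.to_numeric(d["Distance"], errors="coerce"),
    run_avg_hr=lambda d: pd.to_numeric(d["Avg HR"], errors="coerce"),
)

print(clean.dtypes)
```

Leaving missing heart rates as NaN, instead of filling them with a guess, keeps runs from the years without a heart rate sensor out of any averages.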

Please note that I use the function assign() to create additional variables. This has the benefit of not overwriting the original values, which stay available for later testing, and it lets you apply your own naming conventions to the new columns. In the end, I nevertheless remove all old variables with the function filter(like = "run"). Please see the resulting column names below.
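The interplay of assign() and filter(like = "run") can be sketched on a tiny, made-up frame. The raw column names and the run_ prefix are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical two-row frame; "Distance" and "Avg HR" stand in for raw export columns.
raw = pd.DataFrame({"Distance": ["10.02", "5.50"], "Avg HR": ["152", "147"]})

cleaned = (
    raw
    # assign() adds new columns and leaves the original ones untouched.
    .assign(
        run_distance=lambda d: pd.to_numeric(d["Distance"]),
        run_avg_hr=lambda d: pd.to_numeric(d["Avg HR"]),
    )
    # Keep only the columns whose label contains the substring "run".
    .filter(like="run")
)

print(list(cleaned.columns))
```

The match is a case-sensitive substring test, so a raw column such as "Avg Run Cadence" would be dropped, while any column whose name contains a lowercase "run" would be kept.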

The last step is to save your cleaned data to your system.
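Saving could look like the following sketch. The file name activities_clean.csv and the temporary directory are placeholders, and the one-row frame is made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up cleaned frame with assumed column names.
cleaned = pd.DataFrame(
    {"run_date": pd.to_datetime(["2021-03-14"]), "run_distance": [10.02]}
)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = Path(tempfile.mkdtemp()) / "activities_clean.csv"
cleaned.to_csv(out_path, index=False)

# CSV stores dates as text, so parse_dates restores the datetime type on reload.
restored = pd.read_csv(out_path, parse_dates=["run_date"])
print(restored.dtypes)
```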
4. Data visualization with Altair
As you might know, I started my data science career using R, and hence I am very familiar with ggplot2. As I started using Python more and more often in my customer projects, I looked for a similar visualization package in Python. I am happy to say that Plotnine is a very good alternative.
However, since I like to try new things in general and visualization packages in particular, I wanted to try out the Altair package. Altair is a Python data visualization package that follows a declarative approach, just like ggplot2 and Plotnine.
Please follow along with the next paragraphs where I evaluate some of my running data. Maybe you can pick up some knowledge on how to use Altair.
Setup
How much did I run during the last ten years (105 months)?


At what distances did I train in each year?

At what heart rates did I train each year?

At what pace did I run which distances each year?

5. Conclusion
In this article, I showed you how to get your running data from Garmin Connect, how to clean the data using Pandas, and how to analyze it using Altair.
As you may have observed, my running started to get a little more serious in 2018. Before that, my job required a lot of traveling, which made it hard to keep up with running.
For me, this exercise was worthwhile because it gave me a year-on-year comparison of my running data, something impossible to achieve on Garmin Connect. Additionally, it allowed me to get to know the Altair package better. Since I am much more used to working with Plotnine, I will stick with Plotnine for the moment. But should a project require interactive charts, I highly recommend looking into Altair.
Please feel free to contact me with any questions and comments. Thank you. Find more articles from me here:
- Learn how I plan my articles for Medium
- Learn how to set up logging for your Python code
- Learn how to write clean code in Python using chaining (or pipes)
- Learn how to analyze your LinkedIn data using R
- Learn how to create charts in a descriptive way in Python using grammar of graphics
Gregor Scheithauer is a consultant, data scientist, and researcher. He specializes in Process Mining, Business Process Management, and Analytics. You can connect with him on LinkedIn, Twitter, or here on Medium. Thank you!