
Create a heatmap from the logs of your activity tracker

How to import the data from your apps and devices and create a heatmap from GPX files with Python.

Heatmap (Image by author)

I have seven years' worth of recorded walking activity on my computer. Over the years it has been collected with several devices and apps, from a stand-alone GPS receiver, through SportsTracker, to Garmin. Luckily, all of them make the recorded route available in GPX format. Obtaining these files might not be as simple, though.

What is GPX?

GPS Exchange Format (GPX) is an XML-based GPS data format. It is an open, license-free format that describes waypoints, tracks and routes. It is widely adopted and has therefore become the de facto standard for interchanging location data. A location is stored as a latitude-longitude pair (decimal degrees) and optionally extended with elevation (meters), time (UTC) and vendor-specific information.

A track in GPX format (Image by author)

The example above is a track stored by Garmin. Tracks stored by, for example, SportsTracker and Fitbit have the same structure but differ in the details.

The root element is gpx. It contains a metadata element and a trk element. The metadata specifies the source and creation time of the file. The trk element contains the stored track. A track consists of one or more segments, each stored in a trkseg element. At track level there are some fields with the name and type of the track. The specification allows for more fields but these are not used by e.g. Garmin.

Each trkseg element contains a list of track points (trkpt). A segment is a continuous set of points. If tracking is interrupted, e.g. by a lost connection or a power failure, a new segment should be created.

A trkpt has the mandatory attributes lat and lon to specify the location. It has optional child elements such as the timestamp (time) and the elevation (ele). The extensions element allows each device or app to add additional information such as heart rate (hr in the Garmin example above), speed or course (not used by Garmin).

Tracks created with SportsTracker include the lat/lon attributes and the time and elevation elements. Unlike the Garmin tracks, they have no extensions element. The following data structure is assumed for parsing the GPX files:

<gpx>
  <metadata>
    <name>02-01-20 13:29</name>
  </metadata>
  <trk>
    <trkseg>
      <trkpt lat="52.12345" lon="6.31235">
        <ele>71.1</ele>
        <time>2021-08-02T12:29:18Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

Importing GPX files

Several Python libraries are available for reading GPX files, such as gpxpy. For educational purposes, however, we write our own implementation here. It uses ElementTree from Python's standard XML library. To read all GPX files from a directory:

This code retrieves all files from directory data and parses all files with the .gpx extension (from line 18). The root element is retrieved from the document (line 19) which is in this case the <gpx> element. All elements reside in a namespace that is defined in line 6.

The start time of the track log is stored in the sub-element <time> of the element <metadata>. Then (line 21) the track is selected, in this case with the .find() method. This finds the first occurrence of an element with the specified name. This is sufficient since none of my GPX files contains multiple tracks, or even multiple segments. In the case of multiple tracks, the .find() call can be replaced by iterating over the results of .findall(). The track element contains the elements name and type, which hold the name of the activity and the type of activity (e.g. walking) (lines 22–23).

From the track the first (and only) segment is selected (line 24). The segment contains all logged locations, so by iterating over the segment all logged locations are parsed (line 25). Each point (element trkpt) contains two attributes with the latitude and longitude (lines 26–27) and two child elements with the timestamp of the log entry and the elevation (lines 28–31). The elevation might be absent, so we need to check its existence before converting it (line 31).
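The parsing steps described above can be sketched as follows. This is a minimal sketch rather than the original listing, so its line numbers do not match the ones referenced in the text. It assumes GPX 1.1 files (namespace http://www.topografix.com/GPX/1/1) with a single track and a single segment per file. The function names parse_gpx and read_gpx_dir and the dictionary keys are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: the files use the GPX 1.1 namespace.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def parse_gpx(root, filename=""):
    """Return one dict per track point in the first segment of the first track."""
    # Start time of the log, stored in <metadata><time>.
    start = root.findtext("gpx:metadata/gpx:time", namespaces=NS)
    track = root.find("gpx:trk", NS)
    if track is None:
        return []
    name = track.findtext("gpx:name", namespaces=NS)
    kind = track.findtext("gpx:type", namespaces=NS)
    segment = track.find("gpx:trkseg", NS)
    if segment is None:
        return []
    rows = []
    for point in segment.findall("gpx:trkpt", NS):
        ele = point.findtext("gpx:ele", namespaces=NS)
        rows.append({
            "file": filename,
            "start": start,
            "name": name,
            "type": kind,
            "lat": float(point.get("lat")),
            "lon": float(point.get("lon")),
            "time": point.findtext("gpx:time", namespaces=NS),
            # The elevation is optional, so check before converting.
            "ele": float(ele) if ele is not None else None,
        })
    return rows


def read_gpx_dir(directory="data"):
    """Parse every .gpx file in a directory into one list of rows."""
    rows = []
    for fname in sorted(os.listdir(directory)):
        if fname.lower().endswith(".gpx"):
            root = ET.parse(os.path.join(directory, fname)).getroot()
            rows.extend(parse_gpx(root, fname))
    return rows
```

The returned list of dicts can be handed to pandas.DataFrame(rows); the time values are still strings at this point.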

All information is added to a list that is converted to a dataframe in line 33. Finally, both time fields are converted to timestamp objects, so we end up with the following dataframe (in this case almost 1 million locations):

Dataframe with all logged locations from GPX (Image by author)

Most of the activity logs are from the area where I live, with a small number recorded during holidays. Filtering on the locations around my home gives a first impression of the heatmap.

(df[(df.lat > LAT_MIN) & (df.lat < LAT_MAX) &
    (df.lon > LON_MIN) & (df.lon < LON_MAX)]
   .plot.scatter('lon', 'lat', figsize=(10,10), s=0.1))

In an (x, y) coordinate system, the longitude is the x and the latitude the y.

Tracked activities (Image by author)

Create a heatmap

The initial plot gives a nice overview of all activities, but does not provide insight into the most visited areas. For this, we are going to create a heatmap. We will make this from scratch.

The heatmap will be created in a numpy array of size <size_x, size_y>. So first we have to convert the longitude and latitude to this dimension; minimum longitude to 0, maximum longitude to size_x and the same for latitude.

First, all locations outside the longitude/latitude range are removed. Then new columns for x and y are created and assigned the appropriate values using linear transformations. The values are converted to integers since we will use them to address points in a numpy matrix.
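The filtering and the linear transformation can be sketched with numpy as below. This is an illustration under assumptions, not the original listing: the function name to_pixels and its parameters are hypothetical, and the maximum coordinate is mapped to size - 1 rather than size so that every result is a valid array index.

```python
import numpy as np


def to_pixels(lon, lat, lon_range, lat_range, size_x, size_y):
    """Map longitude/latitude to integer pixel coordinates (x, y)."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range

    # Drop all locations outside the requested range.
    inside = ((lon >= lon_min) & (lon <= lon_max) &
              (lat >= lat_min) & (lat <= lat_max))

    # Linear transformation: minimum -> 0, maximum -> size - 1.
    x = (lon[inside] - lon_min) / (lon_max - lon_min) * (size_x - 1)
    y = (lat[inside] - lat_min) / (lat_max - lat_min) * (size_y - 1)

    # Integers are needed to index the heatmap matrix.
    return x.astype(int), y.astype(int)
```

With a dataframe, the same function could be called as to_pixels(df.lon, df.lat, (LON_MIN, LON_MAX), (LAT_MIN, LAT_MAX), size_x, size_y).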

For those interested, the math behind the linear transformation:

Linear transformation explained (Image by author)

The next step is to create a matrix filled with zeros of the required size. Each tracked point will be added to this matrix. Not only the exact location is added but a square around it (based on width). This is especially useful for high resolution images where lines with a width of one pixel will be nearly invisible.

We could iterate over df2 and add each point one by one. But we optimise this by first grouping over (x,y) and counting the number of occurrences. When the same location occurs e.g. 7 times, it is added in a single step with a count of 7 instead of in seven separate steps. The addition (line 8), instead of an assignment, is still required due to the usage of width: locations next to each other overlap and therefore add up.

Heavily used routes will light up strongly because the squares added through the width accumulate. To prevent them from drowning out less used routes, the count is capped (line 11) at the number of routes in the dataset. It is possible to apply a factor to this cap to optimize the end result (line 10).
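A minimal sketch of the accumulation described above, using np.unique in place of a dataframe groupby. The function name build_heatmap and its parameters are hypothetical. Two further assumptions: the matrix is indexed as [y, x] so it can be shown directly as an image, and the cap is applied to the finished matrix.

```python
import numpy as np


def build_heatmap(x, y, size_x, size_y, width=1, max_count=None):
    """Accumulate integer pixel coordinates into a 2D count matrix."""
    heat = np.zeros((size_y, size_x))

    # Group identical (x, y) pairs and count their occurrences.
    points, counts = np.unique(np.column_stack((x, y)), axis=0,
                               return_counts=True)

    for (px, py), n in zip(points, counts):
        # Add the count to a square around the point. Addition instead of
        # assignment, because neighbouring squares overlap.
        heat[max(py - width, 0):py + width + 1,
             max(px - width, 0):px + width + 1] += n

    # Assumption: cap the values so busy routes do not drown out the rest.
    if max_count is not None:
        heat = np.minimum(heat, max_count)
    return heat
```

For the cap, max_count could be set to the number of routes in the dataset, optionally multiplied by a factor.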

Now that we have the array filled with the number of occurrences per <x,y>, we can translate it to a color scale. After normalising the values, a matplotlib colormap is applied. In this case the 'hot' colormap is used since it works well with the black background (the zeros the array started with map to black).
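Normalising and colouring can be sketched as follows. The function name to_rgba is hypothetical. The sketch assumes a non-negative 2D count matrix and normalises by dividing by the maximum value, which is one of several possible choices.

```python
import numpy as np
import matplotlib.pyplot as plt


def to_rgba(heat, cmap_name="hot"):
    """Scale a count matrix to [0, 1] and map it to RGBA colors."""
    heat = np.asarray(heat, dtype=float)
    peak = heat.max()
    # Guard against an empty heatmap to avoid dividing by zero.
    normalised = heat / peak if peak > 0 else heat
    # Calling a colormap on an array of floats in [0, 1] returns an array
    # with an extra last axis of length 4 (red, green, blue, alpha).
    return plt.get_cmap(cmap_name)(normalised)
```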

The last step is to plot the image using matplotlib:

The origin="lower" argument is required because the zero point of our image is at the bottom left, whereas matplotlib's default origin is the top left. The result of our effort:
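A minimal plotting sketch for this step. The function name plot_heatmap, the figure size and the optional filename parameter are assumptions made for illustration.

```python
import matplotlib.pyplot as plt


def plot_heatmap(image, filename=None, figsize=(10, 10)):
    """Show an RGBA image with row 0 at the bottom of the plot."""
    fig, ax = plt.subplots(figsize=figsize)
    # origin="lower" puts index [0, 0] in the bottom-left corner.
    ax.imshow(image, origin="lower")
    ax.axis("off")
    if filename is not None:
        fig.savefig(filename, bbox_inches="tight")
    return fig
```

Calling plt.show() afterwards displays the figure in an interactive session.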

The generated heatmap (Image by author)

So, here we have our own heatmap. It is possible to make the picture a bit more friendly to the eye by adding a gaussian blur to it. But for now, I am very happy with the result.

Final words

We have seen how simple it is to import a GPX file from any source. From this data we have created a heatmap to visualize the visited locations. Improvements are possible by adding a blur to the image or even adding a map as a background (a lot of work, but worth it).

In the next article I will show you how to calculate metrics like distance travelled, speed and bearing.

I hope you enjoyed this article.

Disclaimer: The views and opinions included in this article belong only to the author.

