
Analyzing My 2020 Google Location History Data


Summarizing 2020 travel trends and habits using my phone’s logged location data.

Screenshot from Google Maps Timeline

I recently watched a video in which an individual downloads a copy of his Google data and is surprised to see both the quantity and extent of data collected by Google. I was intrigued and I wanted to take a look at what kind of dirt Google had on me. I went to takeout.google.com and followed the steps to download a copy of all of my data, which turned out to be around 19 GB.

While there was a lot of data to look through, much of it turned out to be uninteresting: sparsely populated Excel spreadsheets for Google products I don’t use, search/watch history for YouTube and Chrome, and copies of pretty much every picture or document I’ve hosted through Google (and many that I haven’t uploaded, which I found particularly concerning).

Among the most interesting data I found was my location history, provided as formatted JSON files showing every location I’ve been to, which travel method I took to get there, how long I was there, and other metrics for every month from April 2015 through December 2020. I’m not sure why the data collection starts in April 2015, since I know I had Google Maps/location-enabled phones prior to that. Perhaps I had stricter location settings back then and didn’t allow Google to store that type of data.
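If you want to poke at your own export, here is a minimal sketch of pulling place visits out of one monthly file. The field names (`timelineObjects`, `placeVisit`, `location.address`, `duration.startTimestampMs`) are assumptions about how exports from around 2020 were laid out, and the sample record is made up, so check them against your own files before relying on this.

```python
import json
from datetime import datetime, timezone

# Made-up sample mimicking one monthly file. The schema is an assumption,
# not a documented contract, and may differ in your export.
SAMPLE = json.dumps({
    "timelineObjects": [
        {"placeVisit": {
            "location": {"address": "123 Example St",
                         "latitudeE7": 340522000, "longitudeE7": -1182437000},
            "duration": {"startTimestampMs": "1585742400000",
                         "endTimestampMs": "1585771200000"}}},
        {"activitySegment": {"activityType": "IN_PASSENGER_VEHICLE",
                             "distance": 8046}},
    ]
})

def extract_place_visits(raw_json):
    """Return one dict per placeVisit: address, arrival time, hours stayed."""
    visits = []
    for obj in json.loads(raw_json).get("timelineObjects", []):
        visit = obj.get("placeVisit")
        if visit is None:
            continue  # skip activitySegment entries
        start_ms = int(visit["duration"]["startTimestampMs"])
        end_ms = int(visit["duration"]["endTimestampMs"])
        visits.append({
            "address": visit["location"].get("address", "unknown"),
            "arrived": datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc),
            "hours": (end_ms - start_ms) / 3_600_000,
        })
    return visits

visits = extract_place_visits(SAMPLE)
```

Reading a real file would just mean passing the file's contents to `extract_place_visits` instead of `SAMPLE`.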

Semantic Location History files

For the purpose of this article, I’ll be focusing only on data for 2020. The COVID-19 pandemic made 2020 an interesting year to say the least, and I’m curious what sorts of trends can be drawn from my location history.

Locations Logged for 2020

The first thing I did was start to look at the data for "places," which listed the address of the location where Google said I had arrived. Not surprisingly, my home address appeared at the highest frequency each month. I also plotted here the frequency of arrivals to each of the two work locations I work out of:

As the COVID-19 pandemic began to take hold in the US, you can see my travel behavior changed, and I left the house a lot less frequently from March through May. In April, 92% of my trips, not counting arrivals home, were to one of my two work locations. The other 8% were grocery shopping. While my pre-pandemic shopping habits typically involved more frequent trips to the store (within walking distance of my house) for fewer items, I instead opted for fewer, larger grocery trips during the March-May timeframe to lower the risk of transmission.
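A share like the April figure can be computed by counting arrivals per month and address, then dropping arrivals home. The sketch below uses made-up data and hypothetical labels ("Home", "Work 1", "Work 2"), so the numbers it produces are illustrative only.

```python
from collections import Counter

# Hypothetical (month, address) pairs, one per logged arrival.
arrivals = [
    (3, "Home"), (3, "Work 1"), (3, "Work 2"), (3, "Home"),
    (4, "Home"), (4, "Work 1"), (4, "Grocery store"), (4, "Home"),
]

# Number of arrivals for each (month, address) pair.
counts = Counter(arrivals)

def share_of_non_home_trips(month, places):
    """Fraction of a month's non-home arrivals that went to the given places."""
    non_home = {addr: n for (m, addr), n in counts.items()
                if m == month and addr != "Home"}
    total = sum(non_home.values())
    if total == 0:
        return 0.0
    return sum(non_home.get(p, 0) for p in places) / total

april_work_share = share_of_non_home_trips(4, ["Work 1", "Work 2"])
```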

As more data emerged about the pandemic and health officials started to piece together the effective infection fatality rate (IFR), I began traveling more starting in mid to late May (while still taking the proper health precautions and avoiding large groups of people, of course).

Outside of logged events for home and work, I wanted to see what trends showed up for other repeating logged addresses. I generated a new dataframe with months as the index and each location as a column. Before any filtering, the dataframe gave me 329 unique locations. That sounds somewhat accurate, although I would have predicted a larger number of entries. To filter out the majority of addresses, which had only a single visit, I applied a filter on visit counts.
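A filter of this kind might look like the sketch below. The dataframe contents, the column labels, and the `min_visits` threshold are all hypothetical; the threshold that actually takes 329 locations down to 25 is not stated here.

```python
import pandas as pd

# Hypothetical arrivals: one row per logged visit.
visits = pd.DataFrame({
    "month":   [1, 1, 1, 2, 2, 3, 3, 3],
    "address": ["Trail", "Trail", "Cafe", "Trail",
                "Store", "Store", "Store", "Gym"],
})

# Months as the index, one column per address, cells hold visit counts.
monthly = pd.crosstab(visits["month"], visits["address"])

# Keep only addresses visited at least `min_visits` times over the year.
min_visits = 2  # hypothetical threshold
frequent = monthly.loc[:, monthly.sum(axis=0) >= min_visits]
```

Filtering on the yearly total rather than on any single month keeps places that were visited occasionally but repeatedly.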

The output of this gave me a new dataframe with 25 locations, or 22 if you discount work 1, work 2, and home.

Dataframe showing frequency of visits for each location per month (address names redacted for privacy).

Now let’s look at a graph showing some of the most frequently visited addresses in 2020.

Once again, it’s clear how limited my travel outside the home was in April and May. I remember heading to a different store to get larger bulk quantities of groceries in March to prepare for the COVID-19 quarantine, so Vons and Trader Joe’s were not visited then. I’m surprised to see 5 visits in January to the hiking area I enjoy near my house.

Distance Traveled in 2020

In addition to seeing where I traveled, I wanted to dig into the coordinate data provided in the Google location data to find out how far I traveled.

Google location history data provides start and end coordinates for each logged location.

To help us get the distance from coordinates, we can use the Python module vincenty, aptly named after Vincenty’s formulae for calculating the distance between two points on the surface of a spheroid. Fun fact: Vincenty’s methods are accurate to within half a millimeter on the Earth ellipsoid!
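Here is a rough sketch of turning a start/end coordinate pair into miles. It assumes the coordinates are stored as integers scaled by 1e7 (the `latitudeE7`/`longitudeE7` convention), which may not match your export. It calls `vincenty(point1, point2, miles=True)` when the third-party package is installed and otherwise falls back to a haversine formula, which treats the Earth as a sphere and is therefore a less accurate stand-in, not Vincenty's method.

```python
import math

try:
    from vincenty import vincenty  # third-party: pip install vincenty
except ImportError:
    vincenty = None

def e7_to_degrees(value_e7):
    """Convert an integer E7 coordinate to decimal degrees."""
    return value_e7 / 1e7

def haversine_miles(p1, p2):
    """Great-circle distance on a sphere; a rougher stand-in for Vincenty."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))  # 3958.8 = mean Earth radius, miles

def segment_miles(start_e7, end_e7):
    """Distance in miles between two (latitudeE7, longitudeE7) pairs."""
    p1 = tuple(e7_to_degrees(v) for v in start_e7)
    p2 = tuple(e7_to_degrees(v) for v in end_e7)
    if vincenty is not None:
        return vincenty(p1, p2, miles=True)
    return haversine_miles(p1, p2)

# Los Angeles to San Francisco, as an example pair of points.
la = (340522000, -1182437000)
sf = (377749000, -1224194000)
miles = segment_miles(la, sf)
```

Note that either formula gives the straight-line distance between the two endpoints, so a winding drive or hike will come out shorter than the distance actually covered.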

The distance traveled for March and April is consistent with the address/location data. I pretty much kept travel to work and the store, both of which are relatively short distances. April showed a total of 445.8 miles traveled. Averaged across the 30 days in April, that comes to about 14.9 miles per day for those limited activities.
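The daily average is just the monthly total divided by the number of calendar days in the month. A small sketch, using the April total quoted above (the helper name is made up for illustration):

```python
import calendar

def average_miles_per_day(total_miles, year, month):
    """Divide a monthly total by the number of calendar days in that month."""
    days = calendar.monthrange(year, month)[1]
    return total_miles / days

# April 2020 has 30 days, so this is 445.8 / 30.
april_avg = average_miles_per_day(445.8, 2020, 4)
print(f"{april_avg:.2f} miles/day")
```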

The July outlier reflects a family reunion trip and a backpacking trip, where I traveled farther from home. For the reunion trip, I drove there and flew back. The Google data algorithm provides estimates for travel type. Does my location data match up to that?

Flying is in there! Although their estimation isn’t perfect – the algorithm estimated 960 miles of flying, while the real flight was closer to 600 miles. I imagine this is due to disconnection/reconnection from location data as we took off and landed. 209 miles of walking also seems extraordinarily high. While I did a lot of hiking that month during the backpacking trip, it wasn’t anything close to 200 miles.

Google’s algorithm can even predict if the travel type is skiing, which I was doing in February. We took a bus between the lodge and the mountain, which also shows up here. Pretty powerful:

At just over 20,000 miles logged by Google, passenger vehicle was the travel type that accounted for the most distance in 2020. Compared with the increase on my odometer for the year, that number at first seems to be a massive overestimation. However, my phone is with me every time I’m in a friend’s car as well, so each of those trips is logged too. Here is a breakdown of the travel types used, according to my Google location data.

Driving made up approximately 94% of the distance I traveled, followed by flying and walking. It's important to keep in mind that the walking distance is inaccurate - there's no way I walked 209 miles in July.
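A breakdown like this comes from summing distance per travel type and dividing by the grand total. The sketch below uses invented segments, and both the activity labels and the assumption that distances are logged in meters should be checked against your own data.

```python
from collections import defaultdict

METERS_PER_MILE = 1609.344

# Hypothetical activity segments: (activity type, distance in meters).
segments = [
    ("IN_PASSENGER_VEHICLE", 40_000),
    ("IN_PASSENGER_VEHICLE", 54_000),
    ("FLYING", 4_000),
    ("WALKING", 2_000),
]

def miles_by_type(rows):
    """Total miles per activity type."""
    totals = defaultdict(float)
    for activity, meters in rows:
        totals[activity] += meters / METERS_PER_MILE
    return dict(totals)

def shares(totals):
    """Each activity type's fraction of the overall distance."""
    grand_total = sum(totals.values())
    return {activity: miles / grand_total for activity, miles in totals.items()}

totals = miles_by_type(segments)
pct = shares(totals)
```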

Wrapping Up

Google has a lot of data on us. If you’re like me and have your location data enabled on your phone, despite the fact that we’re not actively searching for most places we go, every one of those places is being logged, down to the method used to get there. Compiling this data across millions of active monthly users gives Google a lot of data to work with. The accuracy of ETAs and traffic delays down to the minute starts to make sense when thinking about the powerful models and thousands of terabytes of data Google processes daily.

Despite the ominous nature of the information being collected on us, it is intriguing to review our own contribution to that data and draw inferences on our behavior.

