AI: Track Me

The amount of data collected by my phone is astounding. Let’s look at my Google history and see what we can see. This is a two-post article. This post is about the data collection, and the next will be about machine learning on the data. If you don’t know Python, or even how to program, you should still be able to follow along. If you know Python, you should be able to replicate the steps in this article on your own data with minimal effort: simply apt-get or pip install the missing libraries as you go. That’s it.

First, we grab 133MB of my location tracking data from Google Takeout for one of my work accounts. The data is from 2014 to 2017. It does not have all my data, but it does have a lot of it.

To make the two maps below, the collected JSON files from Google Takeout were interpreted by online services. This gives us a sense of the shape and size of the dataset before we write any code.

Map of business trips to Montreal, Toronto, Florida, Texas, San Francisco, New York, L.A., Washington, Chicago, and other fun destinations. Somehow trips to San Diego, Israel, Arizona, Mexico, Cuba, etc were not tracked or were tracked under another business account. Graph generated here.

At first glance, we see that I have made some business trips using this work account. The phone tracked 483,868 trips between June 2014 and July 2017. That’s about 1,126 days, or roughly 430 "trips" per day (483,868 trips / 1,126 days). Even so, several trips are missing from the data, either because the phone was left at home, or the data was recorded in another work account, or perhaps I brought a different phone. Instead of "trips", we should probably call these records movement recordings or location records. A lot of them were taken while I was asleep, and that’s not exactly a trip. Looking at the inset graphic at the bottom right of the image above, it is clear that I travel the least on Saturday, which lines up very nicely with my post on how much I email on Saturdays.
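The per-day figure is simple division. As a quick check, the following sketch redoes the arithmetic and also derives the average gap between records. The variable names are illustrative, and the day count is taken from the text rather than recomputed from exact dates.

```python
# Figures quoted in the text.
records = 483_868
days = 1_126

per_day = records / days                    # average location records per day
seconds_between = 24 * 60 * 60 / per_day    # average gap between records, in seconds

print(f"{per_day:.0f} records per day")
print(f"one record roughly every {seconds_between / 60:.1f} minutes")
```

On average that is one record every few minutes, around the clock, which is why "location records" describes the data better than "trips".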

The heat map below shows where I have been going in Ottawa. Clearly a consultant’s life involves lots of bustling around the city, rather than hopping back and forth between work and home. This is amplified by weekend trips related to kids, health, shopping, and so on. The map makes sense when you think about where the industrial and commercial sectors of Ottawa are located. That main east-west worm shaped blob is Ottawa’s 417 highway.

Heat map of my trips within Ottawa 2014–2017. Graph generated here.

Let’s ignore the annotations on the data, including what Google thinks I was doing when it recorded the location:

"activity" : [ {
        "type" : "STILL",
        "confidence" : 100
      } ]

We are going to focus in on my phone’s longitude and latitude at a particular time (timestamp). The following short program processes the 133MB JSON file from Google Takeout into a 20MB SQLite database file.

import json
import sqlite3
from pprint import pprint

if __name__ == '__main__':
    conn = sqlite3.connect("locationData.db")
    c = conn.cursor()
    c.execute("create table if not exists mylocation (timestampMs INTEGER, lat INTEGER, lng INTEGER)")
    with open('LocationHistory.json') as data_file:
        # Each record carries a millisecond timestamp and E7-scaled coordinates.
        for i, location in enumerate(json.load(data_file)["locations"], start=1):
            # use pprint(location) to see the data
            c.execute("insert into mylocation (timestampMs, lat, lng) VALUES (?,?,?)",
                      (location["timestampMs"], location["latitudeE7"], location["longitudeE7"]))
            if i % 1000 == 0:
                conn.commit()  # commit in batches of 1,000 rows
    conn.commit()  # commit the final partial batch, which is otherwise lost
    conn.close()
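
As a sanity check on what the program wrote, a sketch like the following counts the rows and the distinct coordinate pairs in the table. The helper name `summarize` is hypothetical, and the demo uses a tiny in-memory table with made-up rows so that it runs on its own.

```python
import sqlite3

def summarize(conn):
    """Return (total rows, distinct (lat, lng) pairs) for the mylocation table."""
    total = conn.execute("select count(*) from mylocation").fetchone()[0]
    distinct = conn.execute(
        "select count(*) from (select distinct lat, lng from mylocation)"
    ).fetchone()[0]
    return total, distinct

# Demo on made-up rows. For real data, call
# summarize(sqlite3.connect("locationData.db")) instead.
conn = sqlite3.connect(":memory:")
conn.execute("create table mylocation (timestampMs INTEGER, lat INTEGER, lng INTEGER)")
conn.executemany(
    "insert into mylocation (timestampMs, lat, lng) values (?,?,?)",
    [
        (1, 454215000, -756972000),
        (2, 454215000, -756972000),  # same place, later time
        (3, 454300000, -757000000),
    ],
)
total, distinct = summarize(conn)
print(total, distinct)
conn.close()
```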

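In my run, there were exactly 483,000 rows in the database table mylocation, 868 short of the 483,868 records reported earlier. The most likely reason is that the run committed only after every 1,000th insert, so the final partial batch was never committed; a commit after the loop stores every record. I found 303,019 distinct pairs of latitude and longitude. Narrowing down to Ottawa, which is at 45.4215° N, 75.6972° W, we can find all the points in the heat map picture above. We do this with the following simple query: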

select lat, lng from mylocation where lng > -763515690 and lng < -754101650 and lat < 456811330 and lat > 450194280 order by timestampMs asc

Note that the latitude and longitude are stored as integers: degrees multiplied by 10,000,000 (the "E7" in latitudeE7 and longitudeE7), so there is no decimal point. The query above defines a box whose west and east walls are Arnprior (45.436555, -76.351569) and Cumberland (45.518922, -75.410165), while the north and south sides are La Peche (45.681133, -75.931882) and Kemptville / North Grenville (45.019428, -75.639859). The result is 416,558 rows of data that mark a place and time in the Ottawa area.
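
To make the scaling concrete, here is a sketch that converts decimal degrees to the E7 integers used in the table and runs the same box query with bound parameters. The helper names (`to_e7`, `in_box`) and the two sample rows are illustrative assumptions; the box corners are the ones named in the text.

```python
import sqlite3

def to_e7(degrees):
    """Scale decimal degrees to the integer form used by latitudeE7/longitudeE7."""
    return round(degrees * 10**7)

# Box walls from the text: Arnprior (west), Cumberland (east),
# La Peche (north), Kemptville (south).
WEST, EAST = to_e7(-76.351569), to_e7(-75.410165)
SOUTH, NORTH = to_e7(45.019428), to_e7(45.681133)

def in_box(conn):
    """Return (lat, lng) rows inside the box, oldest first."""
    return conn.execute(
        "select lat, lng from mylocation "
        "where lng > ? and lng < ? and lat > ? and lat < ? "
        "order by timestampMs asc",
        (WEST, EAST, SOUTH, NORTH),
    ).fetchall()

# Demo on two made-up rows: downtown Ottawa (inside) and Toronto (outside).
conn = sqlite3.connect(":memory:")
conn.execute("create table mylocation (timestampMs INTEGER, lat INTEGER, lng INTEGER)")
conn.executemany(
    "insert into mylocation values (?,?,?)",
    [(1, to_e7(45.4215), to_e7(-75.6972)), (2, to_e7(43.6532), to_e7(-79.3832))],
)
rows = in_box(conn)
print(rows)
conn.close()
```

Binding the bounds as parameters keeps the query text fixed, so the same statement can be reused for a different city by changing only the four constants.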

This data represents a sequence. Let’s see if we can use an LSTM to predict my movements based on this data, and then map the results.

First, let’s turn the data into tensor/numpy format. The data is pretty simple. Let’s simplify it even further by ignoring the timestamps and turning the data into a stream of location pairs, where each location is a pair of numbers [lat, lng]. The goal now is to train a regression model to approximate the sequence. Put simply, the model should guess the latitude and longitude where I will be next, after seeing a series of places that I have been.
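
One way to picture this framing is a sliding window over the location stream: each training input is a run of consecutive [lat, lng] pairs, and the target is the pair that follows. The following is only a data-preparation sketch. The window length of 3, the helper name `make_windows`, and the sample points are assumptions for illustration, not the setup used in part 2, and plain lists are used so the sketch runs without extra libraries.

```python
def make_windows(points, window):
    """Split a sequence of [lat, lng] pairs into (inputs, targets).

    inputs[i] holds `window` consecutive points; targets[i] is the point
    that immediately follows them.
    """
    inputs, targets = [], []
    for start in range(len(points) - window):
        inputs.append(points[start:start + window])
        targets.append(points[start + window])
    return inputs, targets

# Made-up E7 rows, converted to decimal degrees to keep the values small.
rows = [
    (454215000, -756972000),
    (454220000, -756980000),
    (454230000, -756990000),
    (454240000, -757000000),
    (454250000, -757010000),
]
points = [[lat / 1e7, lng / 1e7] for lat, lng in rows]

X, y = make_windows(points, window=3)
print(len(X), len(X[0]), y[0])
```

Passing `X` and `y` through `numpy.asarray` would give arrays of shape (n_windows, window, 2) and (n_windows, 2), a common layout for sequence regression.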

Let’s do that next time. Stay tuned for part 2.

Happy Coding!

-Daniel, Lemay.ai, 1(855)LEMAY-AI
