How to use mobility data to understand citizens' habits?

Daily trends and city classification using 1 day of cell phone data in Switzerland.

Jeremy Mion
Towards Data Science


Image by author

What can you learn from 1 day of mobility data? That is the question I will answer in this blog post. I’ll first cover data acquisition and processing, then look at the insights that can be drawn from this anonymized data.

The data that I used is from the Swisscom Mobility Insight Platform free trial. Swisscom is the largest telecommunication company in Switzerland. Every day about 20 billion interactions between cell phones and the network are recorded and aggregated. This information allows for a deeper understanding of how people move, travel, and live.

Multiple metrics are provided for all of Switzerland, aggregated to tiles of 100 m x 100 m. They include the male/female ratio, the age distribution, and the population density. The data is k-anonymized so that it does not reveal information that could identify individuals.
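To make the idea of k-anonymization on aggregated counts more concrete, here is a minimal sketch of threshold suppression, one simple way to keep small groups from being exposed. It is an illustration only: the source does not say which method or which value of k Swisscom uses, and the function name, tile names, and threshold below are all hypothetical.

```python
def suppress_small_tiles(tile_counts, k):
    """Return a copy of tile_counts where counts below k are hidden (None).

    Assumption: a tile with fewer than k people is considered too
    identifying to publish. This is NOT Swisscom's documented procedure.
    """
    return {
        tile: (count if count >= k else None)
        for tile, count in tile_counts.items()
    }


# Hypothetical people counts per 100 m x 100 m tile.
counts = {"tile_a": 120, "tile_b": 3, "tile_c": 20}

# Hypothetical threshold k=20: tile_b is suppressed, the others are kept.
safe_counts = suppress_small_tiles(counts, k=20)
print(safe_counts)
```

With a scheme like this, sparsely populated tiles simply have no published value, which is worth keeping in mind when missing tiles show up in the data.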

All of these metrics are available on an hourly and daily basis from the Mobility Insight Platform heatmap API. Since the free trial only offers data for the 27th of January 2020, that is the day I will focus on.

The data for any given town in Switzerland can be fetched by querying the Swisscom API. Since querying the API is quite time-consuming, I decided to focus on a subset of Swiss cities. The Python code used to query the API can be found on GitHub.

After cleaning the data and loading it into pandas DataFrames, here are a few of the insights I looked into.

Male / female ratio

Photo by Alex Iby on Unsplash

In the histograms, the x-axis shows the proportion of males in a tile and the y-axis (count) shows the number of tiles with that proportion.

Proportion (%) of males in histograms — Images by author
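As a rough sketch of how such a histogram is built, the snippet below bins per-tile male proportions and counts the tiles in each bin. The values, the number of bins, and the range are made up for illustration; the real notebooks may use different settings or a plotting library directly.

```python
import numpy as np

# Hypothetical male proportions (%) for 8 tiles of one city.
male_pct = np.array([48.0, 51.5, 49.2, 50.1, 52.3, 47.8, 50.6, 49.9])

# Count tiles per bin: 3 equal-width bins between 47% and 53% (assumed values).
counts, edges = np.histogram(male_pct, bins=3, range=(47.0, 53.0))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}% to {right:.0f}%: {n} tiles")
```

Each count is the height of one bar, so the y-axis always sums to the number of tiles in the city.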

Representing the same data with box plots makes it easier to compare the range of values observed.

Above we see the proportion (%) of males per tile in a cat-plot. — Image by author

Unsurprisingly, the proportions follow a roughly normal distribution. It is always worth checking the things you expect to be true: you never know when something odd will show up and force you to dig deeper into the data.

Age distribution over time

Photo by Rod Long on Unsplash

The data contains the hourly age distribution for each city. It is split into 4 categories:

Image by author

Lausanne

Image by author

Zürich

Image by author

Both of these large cities show the same trend: the number of young people in the city increases during working hours.
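The hourly age shares behind charts like these could be computed along the following lines. This is only a sketch: the column names, the category labels (`cat_1` to `cat_4`), and the numbers are placeholders, not the actual fields returned by the API.

```python
import pandas as pd

# Hypothetical long-format records: one row per (hour, age category).
records = pd.DataFrame({
    "hour": [8, 8, 8, 8, 12, 12, 12, 12],
    "age_category": ["cat_1", "cat_2", "cat_3", "cat_4"] * 2,
    "people": [100, 400, 300, 200, 300, 600, 400, 200],
})

# Total number of people per hour, aligned with each row.
totals = records.groupby("hour")["people"].transform("sum")

# Share (%) of each age category within its hour.
records["share_pct"] = 100 * records["people"] / totals

# One row per hour, one column per age category: ready for a line plot.
shares = records.pivot(index="hour", columns="age_category", values="share_pct")
print(shares.round(1))
```

Working with shares rather than raw counts makes the hours comparable even when the total number of people in the city changes during the day.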

Population density

Photo by Ishan @seefromthesky on Unsplash

How population density evolves over the day seemed like a potentially interesting question, although I did not know what to expect from this data. In the chart, the density is on the y-axis and the hours of the day are on the x-axis.

Population density scores were normalized between [0,1] to show trends in the evolution of the density in the town and not the difference in density between each town. — Image by author
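The caption suggests a min-max scaling of each town's hourly density, x' = (x - min) / (max - min). The sketch below assumes that this is the normalization used, which the source does not confirm; the town names and values are invented.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no trend to show; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hourly densities for a small town and a large one.
hourly_density = {
    "town_a": [10, 20, 40, 30],
    "town_b": [1000, 900, 800, 950],
}

# Normalize each town separately so only the shape of the curve remains.
normalized = {
    town: min_max_normalize(density)
    for town, density in hourly_density.items()
}
print(normalized)
```

Because each town is scaled on its own, a village and a large city end up on the same axis, at the cost of hiding how different their absolute densities are.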

The normalized chart is a bit crowded, but it does show at least 2 very distinct behaviours: some cities see their population density increase during the day, while others see it decrease. To get a clearer picture, I extracted the non-normalized density charts of some cities:

Ski resort towns — Image by author
Suburbs — Image by author
Large towns — Image by author

The first charts are quite different from the rest. The most probable hypothesis for the surprising charts of Laax and Saas-Fee is that we are observing the influx of skiers at the start and end of the ski day. It appears that the large majority of skiers in these resorts spend the night elsewhere and travel to and from the resort to go skiing. The weather record for Zermatt, a town 17 km away from Saas-Fee, shows a sunny day with a few scattered clouds and a high of -6°C: perfect weather to hit the runs and enjoy a day on the slopes.

The rest of the locations that were analyzed are larger towns or suburbs. A potential explanation for the behaviour observed is that citizens commute to these locations during the day before leaving in the evening.

Clustering with DBSCAN

Using the data available for each city as a feature vector, and normalizing it before running DBSCAN, we can see multiple clusters emerge. The feature vector was built from the daily average of male_proportion by age category and the hourly density discussed above. DBSCAN produced the following clustering:

The clusters are plotted after a PCA (Principal Component Analysis) projection of the feature vectors. Clusters 0, 1, and 3 are all ski resorts. Cluster 2 contains towns with very little industry, where many people live and from which they commute to work. Cluster 4 contains larger towns. — Image by author
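A pipeline of this kind (normalize, cluster with DBSCAN, project with PCA for plotting) might look roughly like the sketch below, written with scikit-learn. The feature matrix is synthetic, and the choice of `StandardScaler`, `eps=1.0`, `min_samples=3`, and two PCA components are assumptions made for the example, not the settings used in the original notebooks.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real features: one row per town, 6 columns.
# Two artificial groups of 10 towns each, well separated on purpose.
group_a = rng.normal(loc=0.0, scale=0.1, size=(10, 6))
group_b = rng.normal(loc=5.0, scale=0.1, size=(10, 6))
features = np.vstack([group_a, group_b])

# Normalize so that no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Density-based clustering; a label of -1 would mark a town as noise.
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(scaled)

# Project to 2 dimensions purely for plotting; clustering used all features.
coords = PCA(n_components=2).fit_transform(scaled)

print(labels)
print(coords.shape)
```

DBSCAN does not need the number of clusters in advance, which suits an exploratory setting like this one, but its results depend strongly on `eps` and `min_samples`, so those values need tuning on the real data.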

Looking at the population density for cluster 2

Image by author

The exploration above shows that it is possible to classify cities into broad categories such as suburbs, ski resorts, or large cities simply by using the mobility information of the people who travel to and from these locations.

Conclusion

Mobility data is a wonderful source of information that can be used to understand the functioning of our cities.

From this analysis, we have seen that mobility data can help identify ski resort towns. The demographics also show young people travelling to urban centers during the school day.

This data has become even more valuable during the SARS-CoV-2 pandemic, since it provides a quantitative way to measure changes in the habits of Swiss citizens.

All the code and notebooks used to produce this data story are available on GitHub.


Computer Science Master's student at EPFL (Swiss Federal Institute of Technology in Lausanne), specializing in data analytics.