Thoughts and Theory

Clustering in Geospatial Applications — which model should you use?

A novel comparison of KMeans, DBSCAN, and hierarchical clustering models in machine learning, applied to urban networks

Skanda Vivek
Towards Data Science
10 min read · Jul 2, 2021


Hong Kong Night Traffic | PxHere

Take a look at the clustering page of scikit-learn, the popular machine learning toolbox in Python, and you will see comparisons between 10 different algorithms. The package developers have done an excellent job of comparing and visualizing these algorithms on different toy scenarios. The strength of these visualizations is that you know the ground truth for certain: 3 blobs are supposed to be 3 clusters, for example. However, this does not explicitly tell us how these algorithms will fare with geospatial data, which can be quite complex. Some important applications of geospatial clustering include reducing the size of large location data sets and understanding large-scale mobility patterns through taxi trip clustering, for urban planning and transportation.
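To make the toy scenario concrete, here is a minimal sketch of the "3 blobs, 3 clusters" setup, run through the three models this article compares. The blob centers, `eps`, `min_samples`, and the use of the adjusted Rand index as the agreement score are all illustrative assumptions, not settings taken from scikit-learn's comparison page or from the analysis in this article.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Toy data with a known ground truth: 3 well-separated blobs.
# The centers and spread are arbitrary choices for illustration.
centers = [(-10.0, -10.0), (0.0, 10.0), (10.0, -10.0)]
X, y_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=1.0, random_state=0
)

# KMeans and hierarchical clustering are told the number of clusters;
# DBSCAN instead needs a neighborhood radius (eps) and a density threshold.
models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=1.5, min_samples=5),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
}

# Adjusted Rand index: 1.0 means the labels match the ground truth exactly
# (up to renaming of clusters); values near 0 mean chance-level agreement.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = adjusted_rand_score(y_true, labels)
    print(f"{name}: ARI = {scores[name]:.3f}")
```

A score like this is only possible because `y_true` is known. With real geospatial data there is usually no such reference labeling, which is why toy comparisons say little about how the same models behave on urban networks.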

In many real-world cases, it is hard to know a priori how many clusters are right, and so it can be hard to interpret how a clustering algorithm has grouped your data into clusters. Add 9 other algorithms, and you could end up in each…
