Geospatial Indexing with Uber’s H3

Hexagon power!

João Paulo Figueira
Towards Data Science

--

Photo by Jonas Svidras on Unsplash

In this article, I am presenting Uber’s H3 geospatial indexing system. One of the many open-source initiatives from this company, the H3 system allows you to quickly and efficiently index your data for later querying.

Geospatial indexing is essential for aggregating and querying data at scale. This type of data is generally abundant, difficult to index or search, and can be structurally complex. Polygons that discriminate particular areas can be very complex to handle. Think of a city’s boundaries. How many points do you need to efficiently determine if a given vehicle has entered a town or even a gas station? The more points you need, the more calculations you will require from your CPU. Overburdening your hardware translates to slower response times and higher resource usage.

An efficient geospatial indexing system helps you overcome these hurdles. In the case of H3, the solution takes the form of a hashing scheme.

How Does H3 Work?

The H3 algorithm partitions the Earth’s surface into a network of hexagons. You can select the amount of detail each hexagon contains by choosing among the available sixteen levels. You can think of these as “zoom” levels on a map. Every time you dive in a further step, the hexagons become smaller, and you need more of them to cover the same area.

Each hexagon is unique and is identifiable as such. You address individual hexagons through a unique sixty-four-bit identifier, an ideal key for a database table, or an in-memory dictionary. These identifiers are consistent across “zoom” levels, so you can mix and match them as you please.

Your first approach to H3 can be to think about it as a bright geospatial hashing scheme. With it, you can quickly determine area inclusions using simple value lookups. You can easily cluster geospatial observations, and display them on a map.

But there is more. To give you a better understanding of H3’s capabilities, I will present a simple use-case in Python.

Back to the UK Traffic Accidents

My second article here on Medium was on mapping the UK’s traffic accident hotspots. I decided to have a second look at it from H3’s perspective, as an illustration for this article. Back then, my concern was to find the areas of the highest reported traffic accident activity and map them. Mapping areas of significant event frequency is a problem that Uber needs to solve daily, although on a significantly larger scale (and for a different use case, for sure).

Here I decided to use H3 for indexing, aggregation, and display. The code that illustrates this article is available on the GitHub repository, and I suggest you use it to follow along. You only need to open one Jupyter notebook:

uk-accident-h3.ipynb

In this file, we are mostly seeing the same process occurring as in the first version. We load the data, clean it, and cluster it using the same DBSCAN algorithm. Instead of going through the trouble of finding a shape for the layers, we are going to use H3’s ability to group locations, and representing them as clusters of hexagons graphically.

Using H3 to Display Regions

Once we have the results from DBSCAN clustering, we get an extra column for our dataset, the cluster number. Noise points have a cluster number of -1 and are not relevant. Getting from the clustered locations to the map shapes is a matter of hashing them into H3 hexagon keys. And this is as simple as calling a conversion function:

h3_key = h3.geo_to_h3(latitude, longitude, level)

The resulting value is a key to a dictionary where we store the hexagon shape and the number of locations that share that same key. To get the geospatial polygon, we need to call another function:

h3.h3_to_geo_boundary(h3_address=h3_key)

This function accepts an H3 key and returns a list of latitudes and longitudes of the hexagon’s vertices that we can readily use to display on a map. To view the map, we only need to iterate through the dictionary and create the polygon shapes. The relevant function is:

map = create_map(clusters)

To test the solution’s sensitivity, you can tweak the two DBSCAN parameters and the H3 level parameter:

eps_in_meters = 50.0
num_samples = 10
h3_level = 11

Here is what we get at the end:

H3 clusters generated from DBSCAN

Clustering with H3

Now we can go a bit further and use H3 to cluster the data directly. Instead of going through DBSCAN, we will use all the input data and hash it into H3 keys. Instead of displaying all possible hexagons, we will impose a limit on the number of occurrences per hexagon. This restriction will help unclutter the map and make it more responsive.

The resulting map is slightly different than the first one, but this is reasonable. After all, we are using different assumptions. Remember that DBSCAN clusters locations using a density-based criterion. In a sense, this is a structure-revealing algorithm. On the other hand, H3 uses a structure-imposing approach by defining what areas to use as clusters.

Conclusion

This article presented a relatively short and superficial foray into Uber’s H3. It is a handy tool to perform geospatial indexing and analysis. I hope that this has whetted your appetite to dig deeper into the library and add it to your geospatial analysis toolbelt. And have fun with it. I did!

References

[1] H3: Uber’s Hexagonal Hierarchical Spatial Index

[2] H3 Python GitHub repository

[3] UK accidents GitHub repository

Related Articles

--

--