Uber H3 for Data Analysis with Python

Eshwaran Venkat
Towards Data Science
Jan 11, 2021

Edit 2023-March: Since this article was written (for H3 v3), H3 v4 has been released, and it uses different function names from the ones in this article. You can install v3 to follow the article as is, or install v4 and use its changelog and the related article to translate the function names used here.

If you’ve ever played Civilization VI or Settlers of Catan, you might have noticed that the board map is made up of regular hexagons. This is a well-known game design technique called a hex map, used in board games such as hexagonal chess and video games such as Forge of Empires. Hex maps bring a lot of advantages for gameplay, especially in movement and tiling.

What if this design could prove useful in the real world? If we re-modelled our own maps with a layer of hexagons, perhaps we could solve some actual business problems with it. Enter Uber H3.

Why H3 and Why Hexagons? 🌏

There are currently a number of ways to analyze map-related data. One of the most rudimentary is to plot a list of coordinates and see where densities and clusters arise.

Fire Station Locations in Dubai, United Arab Emirates

Analyzing individual points has drawbacks. There is no clear way to classify the points within a given area, and the analysis we want to carry out is often area-wise or regional.

It would help to have a grid system in which points are contained within well-defined cells, so that we can analyze the behavior of each cell as representing the behavior of all the points inside it. Another issue is that raw points, while easy for humans to read on a map (we can see at a glance where the data is bunching up), are computationally more expensive for a computer to interpret: to gather the same visual insight, a system has to calculate the distances between neighboring points in order to group them together.

However, coordinate points are as granular as geolocation data gets, so don’t dismiss them just yet. We simply need a grid framework through which we, or a computer, can view geospatial data and derive insights from it.

A common way of using a grid system more locally is to define zones in cities by drawing a shape over a neighborhood, city, district or country. Administrative and political boundaries, while useful for sectioning geospatial analysis into realistic areas, can still be problematic in a number of ways. Some zones may be much larger than others within the same city, and two or more zones may overlap. Furthermore, from a mathematical perspective, zones defined this way are highly arbitrary: they may be drawn based on natural features, population, economy, etc., and are subject to change over time.

Dubai, UAE zones as defined in Level 3 Layer of the UAE GeoPackage of GADM Academic Database

A more absolute approach is to cover the Earth with a uniform grid built by repeated tiling. Assuming we want to tile the plane regularly and completely, we need to choose the shape that acts as the building block: a closed regular polygon whose copies repeat with no gaps or overlaps. Only three regular polygons can do this on their own: the triangle (3 sides), the square (4 sides), and the hexagon (6 sides).

https://plus.maths.org/content/trouble-five
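Why only these three? For an edge-to-edge tiling, copies of a regular polygon must fit exactly around each vertex, so 360° divided by the interior angle 180°(n − 2)/n has to be a whole number. The short Python check below (a sketch, using exact fractions to avoid rounding issues) tests this for polygons with 3 to 12 sides.

```python
from fractions import Fraction


def interior_angle(n_sides: int) -> Fraction:
    """Interior angle, in degrees, of a regular polygon with n_sides sides."""
    return Fraction(180 * (n_sides - 2), n_sides)


def tiles_plane(n_sides: int) -> bool:
    """True if copies of the polygon fit exactly around a vertex (360 / angle is whole)."""
    copies_per_vertex = Fraction(360) / interior_angle(n_sides)
    return copies_per_vertex.denominator == 1


tilers = [n for n in range(3, 13) if tiles_plane(n)]
print(tilers)  # [3, 4, 6]
for n in tilers:
    # 6 triangles, 4 squares or 3 hexagons meet at each vertex
    print(n, "sides:", 360 / interior_angle(n), "polygons meet at each vertex")
```

The pentagon fails because its 108° angle does not divide 360°, which is the "trouble with five" the linked article refers to.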

There is no right answer here, so the choice depends on your use case. For Uber, as its engineering blog explains, one of the most in-demand use cases is determining distances for both rides and deliveries.

Using a hexagon as the cell shape is critical for H3. A hexagon’s center is the same distance from the centers of all its neighbors, whereas squares have two such distances and triangles three. This property greatly simplifies performing analysis and smoothing over gradients.
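The distance claim is easy to check numerically. The sketch below places cell centers on a square grid (counting all 8 cells that share an edge or a corner as neighbors) and on a hexagonal grid (pointy-top hexagons in axial coordinates, a convention borrowed from common hex-grid tutorials rather than from H3 itself), then collects the distinct center-to-neighbor distances. Triangles are left out to keep it short.

```python
import math


def square_neighbor_distances(side: float = 1.0) -> list:
    """Distinct distances from a square's center to the centers of the 8 squares touching it."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    # Round so that floating-point noise does not create spurious "different" distances
    return sorted({round(side * math.hypot(dx, dy), 9) for dx, dy in offsets})


def hexagon_neighbor_distances(size: float = 1.0) -> list:
    """Distinct distances from a hexagon's center to its 6 neighbors' centers.

    size is the center-to-vertex distance; neighbors are given in axial (q, r) offsets.
    """
    axial_offsets = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    distances = set()
    for q, r in axial_offsets:
        # Axial -> Cartesian conversion for pointy-top hexagons
        x = size * math.sqrt(3) * (q + r / 2)
        y = size * 1.5 * r
        distances.add(round(math.hypot(x, y), 9))
    return sorted(distances)


print(square_neighbor_distances())   # two values: 1 and sqrt(2)
print(hexagon_neighbor_distances())  # one value: sqrt(3) * size
```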

Besides distances, it is generally a good idea to choose a hexagon as the base shape because Hexagons are the Bestagons! ;)

About H3 ⚫️

H3 is an open-source framework developed by Uber in the C programming language. At its core, H3 is a geospatial analysis tool that provides a hexagonal, hierarchical spatial index for gaining insights from large geospatial datasets. Its building blocks are hexagonal cells of different sizes, spread over a projection of the entire Earth from pole to pole. This means that any location on the planet can be assigned to an H3 hexagon, with cells averaging about 0.0000009 km² (roughly 0.9 m²) at the finest resolution.

Imagine it as a layer over the planet where each unit is a hexagon with a unique ID, on which geospatial calculations can be performed very quickly. Each H3 hexagon can be thought of as its own object, and any object can be accessed in a very short amount of time given its ID.

H3 Resolutions 📷

A core strength of H3 is that it covers the entire world with hexagons of different sizes, so the resolution of the layer can be adjusted to the problem being solved, like scaling the entire grid up or down. H3 has 16 resolutions (0 to 15), as described in the table below. Each resolution covers the entire Earth with a certain number of cells, ranging from 122 at the coarsest resolution (0) to about 570 trillion at the finest (15); every resolution also contains exactly 12 pentagons, which are needed to close the grid over the sphere. Each finer layer consists of more granular hexagons, and every cell at every resolution has its own unique ID.

H3 Resolutions
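The cell counts follow from H3’s structure: 122 base cells at resolution 0, with each finer resolution multiplying the number of cells by roughly 7 (an "aperture 7" grid), which gives 2 + 120 × 7^r cells at resolution r. The sketch below uses that formula and estimates the average cell area by dividing Earth’s surface (treated as a sphere of radius 6371.0072 km, an assumption of this sketch) by the cell count. H3’s own published per-resolution averages are computed differently, so expect them to differ somewhat, especially at coarse resolutions.

```python
import math

# Assumption: Earth modelled as a sphere with its authalic radius (km)
EARTH_RADIUS_KM = 6371.0072
EARTH_AREA_KM2 = 4 * math.pi * EARTH_RADIUS_KM ** 2


def cell_count(res: int) -> int:
    """Total number of H3 cells (hexagons plus the 12 pentagons) at a resolution 0-15."""
    if not 0 <= res <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** res


def average_area_km2(res: int) -> float:
    """Back-of-the-envelope average area: Earth's surface split evenly over all cells."""
    return EARTH_AREA_KM2 / cell_count(res)


for res in (0, 7, 8, 10, 15):
    print(f"res {res:2d}: {cell_count(res):>22,d} cells, ~{average_area_km2(res):.7f} km² each")
```

At resolution 15 this gives about 570 trillion cells and an average of roughly 0.0000009 km², in line with the figures above.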

H3 indices are 64-bit integers, usually written as 15-character hexadecimal strings. The resolution is stored in a dedicated 4-bit field of the index (which is why there are 16 resolutions, 0 to 15), so a cell’s resolution can be determined immediately by looking at its ID. The layered approach of different sized hexagons is what lends H3 its power of “hierarchy”. Every hexagon has seven child hexagons at the next finer resolution (pentagons have six) that approximately cover it, and hexagons that share the same parent are siblings. The layers in essence define a tree of hexagons, all the way down to the finest resolution.
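Because the resolution lives in a fixed bit field, it can be read without calling the library at all. The sketch below decodes a cell ID using the bit layout documented for H3 cell indexes (mode in bits 59 to 62, resolution in bits 52 to 55, base cell in bits 45 to 51, then one 3-bit digit per resolution), and derives a parent by lowering the resolution and marking the dropped digit as unused (7), which is how the hierarchy is encoded. It assumes a valid cell index and does no validation; in practice, h3_is_valid and h3_to_parent (H3 v3) handle this properly.

```python
# Field layout of an H3 cell index (64 bits, most significant first):
# 1 reserved bit | 4-bit mode | 3 reserved bits | 4-bit resolution |
# 7-bit base cell | fifteen 3-bit digits for resolutions 1..15 (unused digits = 7)
MODE_SHIFT = 59
RES_SHIFT = 52
BASE_CELL_SHIFT = 45


def decode(h3_str: str) -> dict:
    """Read mode, resolution, base cell and per-resolution digits out of a cell ID."""
    h = int(h3_str, 16)
    res = (h >> RES_SHIFT) & 0xF
    # Digit for resolution r sits at bit offset (15 - r) * 3
    digits = [(h >> ((15 - r) * 3)) & 0b111 for r in range(1, res + 1)]
    return {
        "mode": (h >> MODE_SHIFT) & 0xF,
        "resolution": res,
        "base_cell": (h >> BASE_CELL_SHIFT) & 0x7F,
        "digits": digits,
    }


def parent(h3_str: str) -> str:
    """Parent one resolution coarser: lower the resolution field, set the last digit to 7."""
    h = int(h3_str, 16)
    res = (h >> RES_SHIFT) & 0xF
    if res == 0:
        raise ValueError("resolution-0 cells have no parent")
    h = (h & ~(0xF << RES_SHIFT)) | ((res - 1) << RES_SHIFT)
    h |= 0b111 << ((15 - res) * 3)
    return format(h, "x")


info = decode("8843a13687fffff")
print(info["resolution"])          # 8
print(parent("8843a13687fffff"))   # 8743a1368ffffff
```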

It’s worth checking out this notebook. The resolution can be adjusted accordingly and the hexagons can be viewed over the entire earth.

H3 Functions 🔩

You can find some examples of using H3 functions directly in C here. We’ll be using H3’s Python bindings, since data analysis is relatively easier in Python.

pip install "h3<4"  # this article uses the H3 v3 Python API

Let’s say you had a coordinate point, or even a list of coordinates. You can fetch the H3 index of each point with the following function:

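import h3

# H3 v3 API: the resolution-7 cell that contains this point
h3.geo_to_h3(lat=25.32, lng=55.46, resolution=7)

# For a list of (lat, lng) pairs, apply the same call to each point
coords = [(25.32, 55.46), (25.25, 55.36)]  # example points
[h3.geo_to_h3(lat, lng, 7) for lat, lng in coords]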

Get the resolution of an H3 index, provided the index string is valid:

import h3
def h3_result(id_str):
    return h3.h3_get_resolution(id_str) if h3.h3_is_valid(id_str) else None  # None for invalid IDs
h3_result('8843a13687fffff')  # returns 8
The H3 index from the snippet above visualized with Kepler.GL

Let’s try a full-fledged function that returns a set of H3 attributes, given an H3 index.

Another useful H3 function quickly returns an index’s k-ring: every cell within k steps of it. For k = 1, this is the hexagon itself plus its 6 immediate neighbors; for k = 2, it also includes the second-degree neighbors (neighbors of neighbors), and so on.

h3_id = "8843a13687fffff"
h3.k_ring(h3_id,1)
h3.k_ring(h3_id,2)
h3.k_ring(h3_id,10)
K-ring for 1 and 2 from Hex ID: 8843a13687fffff
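The sizes of these k-rings follow a simple pattern: away from the 12 pentagons, ring j around a hexagon adds 6j cells, so the cells within k steps number 1 + 3k(k + 1). The pure-Python sketch below reproduces this on a flat hexagonal grid using axial (q, r) coordinates; this is a stand-in for illustration, not how h3.k_ring is implemented. (H3 v3 also offers hex_ring for the hollow ring of cells exactly k steps away.)

```python
def grid_distance(a, b):
    """Number of steps between two hexagons given in axial (q, r) coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def disk(center, k):
    """All hexagons within k steps of center, center included (like a k-ring)."""
    q0, r0 = center
    cells = set()
    for dq in range(-k, k + 1):
        # Keep dq, dr and dq + dr all within [-k, k]
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.add((q0 + dq, r0 + dr))
    return cells


def ring(center, k):
    """Only the hexagons exactly k steps away (a hollow ring)."""
    return {c for c in disk(center, k) if grid_distance(c, center) == k}


for k in (1, 2, 10):
    print(k, len(disk((0, 0), k)), len(ring((0, 0), k)))  # 1 7 6, then 2 19 12, then 10 331 60
```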

Optional Sprinkles: Populating Hexagons with Maps Data 📍

Now that we have some absolute methods to define grids over the Earth, it makes sense to begin assigning real-world details to hexagons. Even given an H3 index and its centroid coordinate, we don’t have much to go on in terms of the physical location that the hexagon represents.

One way to extend the power of H3 is to combine it together with Maps APIs that contain on-the-ground information. Common choices include Google Maps API, Mapbox API and Nominatim. These services allow us to assign more useful geographical information per hexagon.

One common function is reverse geocoding, a technique for converting a given coordinate into a physical text address by means of a lookup. Let’s find the reverse geocode result of the centroid of an H3 index using the Google Maps Client for Python.

Reverse Geocoded Hexagon — Tooltip contains Geo-Information

Polyfill 🌐

If H3 contains different resolution hexagons that span the entire earth, what if you wanted to select a fraction of those hexagons that represented a country, city or neighborhood?

The poly-fill function fills a polygon with H3 hexagons. A geofence is a polygon spread out over a map; zones in cities or country shapes can be modelled as geofences. Geofences are often represented as Geo-JSON files or Shapely polygons. Here is an example of a Shapely polygon’s WKT (well-known text representation):

POLYGON ((55.13977696520102 25.09805053895709, 55.14002932545401 25.09743871100549, 55.1407574981263 25.0972787358399, 55.14123332293791 25.09773058763684, 55.14098097073993 25.09834242241949, 55.14025278567518 25.09850239857415, 55.13977696520102 25.09805053895709))

A WKT is, as the name implies, a textual representation of the polygon’s vertices. Notice that the first and last coordinate pairs of the polygon are the same: this is what makes it a closed polygonal figure.

Each point of a polygon can be written as (lat, lng) or (lng, lat). The latter is the order used by Geo-JSON, since the Geo-JSON specification requires longitude to appear first in each position. The WKT above is also in (lng, lat) order: the 55.x values are Dubai’s longitudes and the 25.x values its latitudes.
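To make the ordering concrete, the sketch below pulls the vertex list out of the WKT string above with a regular expression (a deliberately minimal parser standing in for Shapely, handling only a single-ring POLYGON with no holes), checks that the ring is closed, and flips each pair from (lng, lat) to (lat, lng). Which order a function expects should be checked in its documentation; H3 v3’s polyfill, for instance, has a geo_json_conformant flag for exactly this.

```python
import re

WKT = (
    "POLYGON ((55.13977696520102 25.09805053895709, 55.14002932545401 25.09743871100549, "
    "55.1407574981263 25.0972787358399, 55.14123332293791 25.09773058763684, "
    "55.14098097073993 25.09834242241949, 55.14025278567518 25.09850239857415, "
    "55.13977696520102 25.09805053895709))"
)


def exterior_ring(wkt: str):
    """Parse the outer ring of a simple 'POLYGON ((x y, ...))' string into (x, y) tuples.

    Minimal sketch: no holes, MULTIPOLYGONs or Z/M coordinates are supported.
    """
    match = re.match(r"\s*POLYGON\s*\(\((.*?)\)", wkt)
    if match is None:
        raise ValueError("not a simple POLYGON")
    return [tuple(float(v) for v in pair.split()) for pair in match.group(1).split(",")]


ring_lng_lat = exterior_ring(WKT)                          # WKT stores x y, i.e. (lng, lat)
is_closed = ring_lng_lat[0] == ring_lng_lat[-1]            # first vertex repeated at the end
ring_lat_lng = [(lat, lng) for lng, lat in ring_lng_lat]   # flipped for (lat, lng) APIs

print(len(ring_lng_lat), is_closed)  # 7 True
print(ring_lat_lng[0])
```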

A Shapely polygon object can be built from its WKT, and conversely the WKT of any Shapely polygon is available as its wkt attribute.

The loads function from shapely.wkt parses a WKT string and builds the corresponding Shapely polygon.

I’ve taken an H3 hexagon’s own WKT here, but you can get the WKT of pretty much any polygon. Note also that the shape on its own carries no reference to a map: to see it in context, we need to plot the polygon as a layer on top of a base map.

Let’s take a geofence of an entire city, say Dubai, UAE, and see what its Multi-Polygon object looks like. I’ll be using the GADM database to extract the UAE’s Geo-Package, which lets us view an entire city or district as a geometric shape. We’ll process the Geo-Package and output the Multi-Polygon for sector 3, Dubai, as shown below:

It is a “multi-polygon” because it contains a list of polygons: for example, each of The World islands off the coast of Dubai is its own polygon.

Dubai’s Sector 3 Zone as a Polygon, with no Base map

Below is an example of a Geo-JSON file, which can be another useful form to represent geofences. Note again how the first and last coordinates of the list are the same.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [55.07652282714844, 25.11731056144692],
            [55.15205383300781, 25.055745117015316],
            [55.223464965820305, 25.112958466940725],
            [55.15068054199219, 25.18070920440447],
            [55.07652282714844, 25.11731056144692]
          ]
        ]
      }
    }
  ]
}

A Geo-JSON file is a standard JSON file with special key names that help frameworks understand the nature of the data being passed. The Geo-JSON specification defines a list of Features called a FeatureCollection; in this case, the feature’s geometry is of type Polygon. The properties dictionary can be populated with custom data, which is useful for attaching attributes to different features. For example, if you wanted to store the population of each polygon in a city, properties is where you would save that information. It extends your polygon so that each polygon/feature carries information specific to itself for later analysis.
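Because Geo-JSON is plain JSON, Python’s standard json module is enough to read it and attach per-feature attributes. The sketch below loads the polygon above (compacted onto fewer lines), confirms that its outer ring is closed, and writes two illustrative properties; the names zone_id and vertex_count are hypothetical placeholders for whatever data, such as population, you want each feature to carry.

```python
import json

GEOJSON = """
{"type": "FeatureCollection",
 "features": [{"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon", "coordinates": [[
     [55.07652282714844, 25.11731056144692], [55.15205383300781, 25.055745117015316],
     [55.223464965820305, 25.112958466940725], [55.15068054199219, 25.18070920440447],
     [55.07652282714844, 25.11731056144692]]]}}]}
"""

collection = json.loads(GEOJSON)
for i, feature in enumerate(collection["features"]):
    outer_ring = feature["geometry"]["coordinates"][0]  # first ring = exterior boundary
    if outer_ring[0] != outer_ring[-1]:
        raise ValueError("Geo-JSON polygon rings must be closed")
    # Hypothetical attributes: any JSON-serialisable values can live in properties
    feature["properties"]["zone_id"] = f"zone-{i}"
    feature["properties"]["vertex_count"] = len(outer_ring) - 1  # closing point not counted

print(json.dumps(collection["features"][0]["properties"]))  # {"zone_id": "zone-0", "vertex_count": 4}
```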

You can use geojson.io, or any GIS tool/software to create custom geo-fences by drawing out your own shapes.

Defining zones allows us to construct choropleth maps (maps with zonal coloring). The Plotly graphing library has native choropleth support using Geo-JSON files, and Kepler.GL natively supports H3 visualizations using just the H3 indices from the data.

Now we can deal with the elephant in the room: how can we reconcile geofences with H3? We poly-fill the entire boundary with hexagons of a specific resolution (resolution 10 in our case). H3’s poly-fill works on Geo-JSON-like polygon objects, and a Shapely polygon can be converted into one with shapely.geometry.mapping.

Using the H3 poly-fill function to populate geo-boundaries with hexagons.
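H3’s poly-fill selects the cells whose centers fall inside the polygon. The toy sketch below applies the same centroid rule on a flat hexagonal grid, using a standard ray-casting point-in-polygon test. The grid, cell size, search window and square "geofence" are all made up for illustration; real H3 works on the sphere with its own index, so this shows the idea rather than H3’s implementation.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a ray to the right of (x, y) crosses the boundary."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def hex_center(q, r, size):
    """Center of a pointy-top hexagon in axial coordinates (flat-plane toy grid)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def toy_polyfill(polygon, size, span=50):
    """Axial cells in a search window whose centers lie inside the polygon."""
    return [
        (q, r)
        for q in range(-span, span + 1)
        for r in range(-span, span + 1)
        if point_in_polygon(*hex_center(q, r, size), polygon)
    ]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hypothetical geofence
cells = toy_polyfill(square, size=1.0)
print(len(cells), "cells")
```

Shrinking size (a finer resolution) increases the number of selected cells roughly in inverse proportion to the cell area.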

Polyfill Gallery 💠

Below, the output data frame is visualized with Uber’s Kepler.GL at different resolutions.

Close-Up of Resolution 10 Hexagons
Dubai Poly-filled with Resolution 10 Hexagons (100,000 hexagons spanning the city)
Dubai Poly-filled with Resolution 8 Hexagons (2,500 hexagons spanning the city)

Looks like a retro 16-bit city to me (ba dum tss).

Analysis 📈

Now that we have our hexagons populated across a city, and we have each hexagon’s ID stored in a table/data-frame,

  1. We can perform any H3-related function on each hexagon (find its parent for a more aggregate analysis, store its geo-fence, find its neighboring hexagons, etc.)
  2. We can track coordinate level data grouped by their respective hexagons. In this case, hexagons become buckets which we can use to perform grouped analysis on a set of coordinate points per bucket/hex

Some non-exhaustive ways to fully utilize H3 would be:

  1. Using it as an “on/off” indicator to highlight activity in a particular region. Remember that H3 hexagons can be of different resolutions so you can go as granular as you want or aggregate it upwards to lower resolutions.
  2. View time series data by analyzing the evolution of hexagons over a time frame. Each hexagon is representative of the neighborhood of a number of coordinates that reside within its boundaries.
  3. Clustering areas with specific levels of activity.
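As a minimal sketch of the bucket idea, assume each event (a ride request, say) has already been tagged with the H3 index of the cell it falls in, for example with geo_to_h3. The events, cell IDs and hours below are hypothetical placeholders, not real data. Counting per cell and per hour then takes only the standard library, and a threshold turns the counts into an on/off indicator.

```python
from collections import Counter, defaultdict

# Hypothetical events: (H3 cell ID, hour of day). IDs are placeholders, not real cells.
events = [
    ("cell_a", 8), ("cell_a", 8), ("cell_a", 9),
    ("cell_b", 8), ("cell_c", 9), ("cell_c", 9), ("cell_c", 9),
]

# Hexagons as buckets: number of events per cell
per_cell = Counter(cell for cell, _ in events)

# On/off indicator: a cell counts as "active" when its count reaches a threshold
THRESHOLD = 2
active = {cell: count >= THRESHOLD for cell, count in per_cell.items()}

# Time series per cell: event counts for each hour
per_cell_hour = defaultdict(Counter)
for cell, hour in events:
    per_cell_hour[cell][hour] += 1

print(dict(per_cell))                 # {'cell_a': 3, 'cell_b': 1, 'cell_c': 3}
print(active)                         # {'cell_a': True, 'cell_b': False, 'cell_c': True}
print(dict(per_cell_hour["cell_c"]))  # {9: 3}
```

With real H3 IDs, the same counts could be rolled up to a coarser resolution by re-keying each cell with h3_to_parent (H3 v3) before counting.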

The rest is up to you ;)

Thanks for reading!

A few more links:

Uber Open Source: Engineering Sub-City Geos for a Hyper-Local Marketplace with Uber

Veritasium’s “The Infinite Pattern That Never Repeats”

References:

Uber Engineering Blog: H3: Uber’s Hexagonal Hierarchical Spatial Index

H3 Docs
