Geospatial Indexing with Quadkeys

Squaring the Earth

João Paulo Figueira
Towards Data Science

--

Photo by Steve Johnson on Unsplash

When you browse an interactive map online or on your phone, you see the result of a patchwork of square tiles. Each tile contains a bit of the map information and perfectly matches the eight tiles around it. This arrangement works so well that it is almost impossible to perceive that you are not browsing an infinite bitmap, but a small and finite set of square tiles.

The purpose of this article is not to delve into the intricacies of the so-called “slippy maps” but to discuss an interesting mathematical byproduct that serves to reference the tiles uniquely but is also very useful in geospatial indexing applications. To keep tabs on their interactive mapping solutions, Microsoft developed an indexing concept that is best detailed in its online documentation, the Bing Maps Tile System.

This indexing system also has other less obvious applications, such as the partitioning of geospatial areas and their efficient indexing in databases. It works a bit like Uber’s H3 but, instead of using “hexagons,” it uses a hierarchical structure of “squares.” I am using quotes around the geometric shape names because their projections into a sphere naturally distort them.

We have previously seen the advantage of using the H3 geospatial indexing system, not only for fast referencing but also for quick distance queries. Squares do not work as effectively due to their geometry. Still, they also have significant advantages, not only when generating custom tiles for slippy maps but also when indexing rectangular data. Here I am thinking about queries to online mapping services. These queries usually require a bounding box specification, rectangular (at least in the projection coordinates), not hexagonal.

The Cache Use Case

To motivate using the mathematics behind the Bing Maps Tile System and its “quadkey” concept, let us go back to the code that I have been publishing on exploring the Vehicle Energy Dataset. In a previous set of articles, I approached the VED by converting the data into a database, clustering the trip data into its endpoint clusters, calculating the geofence polygons using H3, and finally, naming them.

This naming procedure involved retrieving data from the OpenStreetMap service through their public and free Overpass API. I also devised a very simple-minded caching mechanism to avoid repeating the same calls to the API while developing the code. Users should not abuse the API and cache the retrieved data whenever possible. My approach to caching the API results was naive, as I used the cluster identifiers as the base for the file names. There is no geographical reference there, and these file names are liable to change if the clustering algorithm parameterization changes.

The alternative solution that I explore in this article involves using file names that are somehow geographically referenced and indifferent to algorithmic changes. Enter the quadkey codes.

Quadkeys

The name quadkey is short for quadtree key. Each of these keys encodes a square region in latitude and longitude space organized by detail levels. At the first level, the whole mappable surface of the Earth splits into four quadkeys. Think of it as a map zoom level that allows you to see the entire world. Each quadkey has a single-digit code, zero to three. By zooming into the next level, each of the four original quadkeys splits into four, and we add another digit to the code. The image below, taken from the official Microsoft documentation, depicts this process.

The image above illustrates how to derive the quadkey codes according to location and detail level. (Image source: Microsoft - Bing Maps Tile System)

By looking at the image above, we can immediately imagine a possible “algebra” to operate on quadkeys. The operations to calculate the neighbors, children, or parents of a specific quadkey immediately spring to mind. It would also be possible to query the geographical properties of quadkeys, such as the corners, center, and sides’ length. Fortunately, this mathematical tooling already exists in the form of the “pyquadkey2” Python package.

If you look at quadkey codes with the eyes of a low-level software developer, you will see that it takes only two bits to encode a detail level. Your typical maximum of 23 levels requires only 46 bits to encode, which allows us to use sixty-four-bit integers to store these codes. Fortunately, a specification for binary quadkeys already exists to which the above package adheres.

Caching Example

Let’s see how we can use the quadkeys for the caching use case. The idea is to collect the minimum set of quadkeys that contains all the cluster’s points, and then query OSM for those bounding coordinates only. The code then stores each reply in a corresponding JSON-encoded text file named according to the used quadkey. This way, we get an OSM data cache that depends on geography alone, not on the clustering scheme. The image below shows the level-18 quadkeys for cluster number 52.

The image above shows cluster number 52 with the superimposed quadkeys used to cache its OSM data. Image created by the author.

As you can see, the choice of detail level is an important decision and allows you to strike a trade-off between the number of quadkeys and the amount of information collected for each of them. The official documentation contains a table that guides you on the level of detail choice. The third column of the table shows the ground resolution measured in meters per pixel at the Equator. The choice of meters per pixel as a unit reveals its ultimate purpose: using a slippy map with square tiles 256 pixels wide. You need to multiply the column value by 256 to get the size in meters of a quadkey side in the Equator. At level 18, we get 152.88 meters. Remember that as you move away from the Equator, the width shrinks down to zero.

Code

To follow the code description, please clone the GitHub repository and run the notebook named:

07-cluster-names-quadkey.ipynb

This file is a revised version of a pre-existing one, with the same prefix number and the old caching implementation. Note that you may have to execute previous notebooks before running the one above to prepare the database for use.

We start with the functions that manage the cache. The first is the low-level call to the OSM API, used when there is no cached information.

The next function queries the data using a quadkey as input. It determines if the data is already cached, in which case it merely reads it from the corresponding file, and if not, queries it from the API and caches it.

Note how we query the quadkey for its geographic coordinates at the southwest and northeast corners. Now that we have quadkey data, we must merge it before querying it. Remember that there is usually more than one quadkey per cluster. To join the data, we must first query its element types, namely “node” and “way,” and we use the next function for that purpose.

The function that does the actual merging strips away all metadata and keeps the “elements” array only.

Now we can use the previous function as a building block when merging a list of OSM packets.

The code retrieves the quadkeys associated with a single cluster using the function below. Each point is converted to a quadkey code and added to a set, for uniqueness, converted back to a list, and returned.

We are now ready to bring all the code together in a function that retrieves the merged OSM data for a single cluster.

From this function onwards, the notebook code follows its original version quite closely. The only significant addition is the rendition of the quadkey “squares” on the map, as depicted in the image of cluster 52, above.

Conclusion

This article showed how you could use the mathematics of quadkeys to generate geospatial indexes. These operate quite similarly to H3 but are square instead of hexagonal. We have seen that hexagons are better for some applications, such as KNN searches. Still, squares are perfect for modeling sippy map tiles and rectangular regions aligned with Earth’s parallels and meridians. With the added convenience of packing a quadkey code in a sixty-four-bit integer and the support of a very efficient Python package, quadkeys are indeed another great tool to keep handy on your geospatial toolkit.

References

Bing Maps Tile System

Resources

GitHub repository

pyquadkey2 repository

CartoDB python-quadkey repository (alternative package to the one used here)

YouTube: Earth tiling showing quadkey

Related Articles

--

--