The world’s leading publication for data science, AI, and ML professionals.

Fast and easy way to dynamically import Open Street Map data into a Google Sheet file

how to use Overpass APIs to generate a dynamic list in Google Sheet, with user-friendly parameters.

Problem overview

You work as a business analyst at the company that manages metro underground trains transport in a large Italian city and you are required to provide the marketing&communication department a list of the bars and the cafés in the area, to distribute them maps that can be affixed to the windows or inside the rooms. In addition, the closest bars and cafés to the stations will be contacted by the sales department to propose to join and ticket sales program, for the transport service that your company provides.

This is a periodic request that occurs on a monthly basis.

Data sources

One of the primary data sources for this kind of geography-related problems is Open Street Map.

There are several ways to access the Open Street Map database, but one of the most popular is using the Overpass APIs. In this article, we will see how to use these APIs to generate a dynamic list in Google Sheet, generated by a parameterized query and with values set by the final business user. In this way, other colleagues can get the data on their own and can copy, edit or save the list for future manipulations, for example assigning each bar or cafe to one key account manager that will contacts the structure.

Queries via Overpass API can be made using two languages: Overpass XML or Overpass QL.

This time we will use Overpass QL, an imperative programming language written with a C style syntax. The main "getting started" guide to learn this language is available on this page.

Data extraction

Let’s summarize what we need to extract:

  • text output, we’ll use the standard Csv then with the header on the first line and with comma as column separator, Google Sheet will appreciate this choice!
  • list of bars and cafés in the indicated area;
  • the premises must be within a radius of X meters from a subway station entrance.

The detail of the attributes to be extracted is very varied and not all the information for each premises is fully compiled, compared to other countries (first and foremost Germany) there could be a good number of detail fields that are still incomplete (for example the telephone number or opening hours).

The timeout will be fixed in 20 seconds. The business user will not be able to easily change this value. It is not appropriate to increase it, as the data we want to extract are not big and the following rule applies:

"If the query runs longer than this time, the server may abort the query with a timeout. The second effect is, this higher value, the server probably rejects the query before executing it. "

Let’s set up the query!

As anticipated, we want to extract the list of bars and cafés in a particular area. These elements are classified within the OSM database with the following tags:

  • amenity = bar
  • amenity = cafe

A good rule to follow, when we have to work with Overpass APIs, is to consult the TagInfo pages related to the tags we want to extract:

Here we find various useful information for our purposes. In the "combination" tab we find the list of features concerning that tag, which is a good starting point to make the list of columns we want to extract and which we will specify a little later in our project.

In the "overview" tab, instead, we see how this place is coded in the database. In this case, we can easily see that bars and cafés can be nodes, way or relations. For our purposes, this information is fundamental, either because in the query we will specify to search in all these entities and because we know that the information can be stored as a point (nodes) or a wider area (way). In this project we obviously want to extract only the information related to that bar/cafés once, so the latitude and longitudes coordinates we want to get are those of the center of the area bounded by the lines.

Let’s take an example, in the case of a square bar there will be four way elements inside the OSM database, but we want that in the sheet all those are represented in a single line, considering that the bar is always the same (same name, same telephone number, …). This purpose will be achieved by indicating to Overpass to extract only the center of the area.

The last element to consider in the query is the area we want to extract. We make the choice to extract the data using the boundaries of each municipality, which will be indicated in the input cells filled by the user of the sheet, bearing in mind that some metro stations are located in different Municipalities.

From this useful table, we can learn that the administrative boundaries of Italian municipalities are indicated in the OSM database with level 8.

At this point we have all the elements to build our query:

We considered the query with fixed distance values (50 meters) between the bars/cafés and the entrances of the stations and also a fixed municipality (ISTAT code number 015146 corresponds to the city of Milan) so we have to modify the query to make it usable with Google sheet.

How to use the query into a spreadsheet

We need to follow these steps:

  • Include the query within the IMPORTDATA formula (read the Google Sheet official guide);
  • Replace the single single-double quotation marks with nested-double quotation marks;
  • Replace the distance value with "&B5&";
  • Replace the value of the ISTAT code of the municipality with "&B4&".

We finally came to the following:

With a data validation linked to a hidden sheet and a VLOOKUP formula to get the ISTAT code, we can make a user-friendly drop-down menu from which the user can choose the name of the municipality and digit the value of the distance to consider.

Each time cell B3 or cell B4 changes, values of list of premises is updated.


Related Articles