How to use Open Data for Business Insights

In this walkthrough, I will use OpenStreetMap data from Bigquery. It’s great that Google published it for free in their public data sets. So you can easily query geo information with SQL [1]. Google uploaded their data set once without any updates yet. If you need newer data, then the OpenStreetMap API in combination with some simple python code might be a possible solution for your problem.
Use Cases for OpenStreetMap
OpenStreetMap.org is an international project founded in 2004 with the goal of creating a free map of the world. For this purpose we collect data about roads, railroads, rivers, forests, houses and much more worldwide [2].
Although the data from OpenStreetMap is free, but you can well draw certain business advantages. Some examples that I have helped to develop or at least accompanied are for example:
- Geo analysis (Where are my customers located?)
- Market analysis (Where do my competitors have stores?)
- Tour optimization (How can I optimize logistics routes?)
- Fraud detection (Are the invoices for transport justified?)
- and many more
What is Nested Data?
Querying data from the BigQuery Dataset will return nested data fields. BigQuery supports loading and querying nested and recurring data from source formats that support object-based schemas for example JSON. Here is a short illustration of nested data:

The address column contains an array of values. The different addresses in the array are the recurring data. The different fields within each address are the nested data.
Real Example
For an example walkthrough with BigQuery I used the open dataset _planetfeatures in the _geoopenstreetmap dataset. Here, the data is stored in a nested format, so let’s take a look:
SELECT * FROM `bigquery-public-data.geo_openstreetmap.planet_features` LIMIT 1000
This is the result in the following output:

The result looks good at first sight. Unlike classical relational databases, I can use arrays and save additional columns. Furthermore, the whole thing works with a super performance. Now, I would like to find a certain supermarket in Germany – should be done with an easy where clause – right? Almost. Here, you need the magic of Unnest,
SELECT *
FROM `bigquery-PUBLIC- data.geo_openstreetmap.planet_features`
WHERE 'Netto' IN
(
SELECT value
FROM unnest(all_tags))
AND (
'addr:country', 'DE') IN
(
SELECT (KEY, value)
FROM unnest(all_tags))
AND (
'addr:city', 'Hamburg') IN
(
SELECT (KEY, value)
FROM unnest (all_tags));
which results in the desired output:

With the function of unnest you will also be able to flatten the data and so the output of the query:
SELECT osm_id,tags
FROM bigquery-public-data.geo_openstreetmap.planet_features,
UNNEST(all_tags) as tags limit 100

This is probably almost the most important you have to know when working with nested data in BigQuery and want to query some data. But what if you want to use the data for further ETL processes, store the data in relational databases or need the key-value-pairs as attributes for your classifier? Here, you want the data not to flatten – because this will result in duplicate rows – here you want the key-value-pairs in the array as new columns:
SELECT
(
SELECT osm_id) osmid,
(
SELECT value
FROM unnest(all_tags)
WHERE KEY = "Address") AS address,
(
SELECT value
FROM unnest(all_tags)
WHERE KEY = "name") AS name,
(
SELECT value
FROM unnest(all_tags)
WHERE KEY = "opening_hours") AS opening_hours,
(
SELECT value
FROM unnest(all_tags)
WHERE KEY = "organic") AS organic,
(
SELECT geometry) AS geometry,
FROM bigquery-public-data.geo_openstreetmap.planet_features
WHERE (
'Edeka' IN
(
SELECT value
FROM unnest(all_tags))
OR 'Rewe' IN
(
SELECT value
FROM unnest(all_tags))
OR 'Netto' IN
(
SELECT value
FROM unnest(all_tags)))
AND (
'addr:country', 'DE') IN
(
SELECT (KEY, value)
FROM unnest(all_tags)) -
AND (
'addr:city', 'Hamburg') IN
(
SELECT (KEY, value)
FROM unnest(all_tags));
Which gives us the output:

In the end, you can visualize the data via business intelligence tools like Google Data Studio – especially when working with BigQuery.

In the related use case, the data was presented also on a map to get an idea how the shops are distributed. In the dashboard an underlying table also gives nearer insights. For deeper analysis the data was also used in upcoming customer segmentation scenarios.
Conclusion
Besides OpenStreetMap API and Googles public data sets there are many more data sets you can find and use for your analytics. Here, I recommend the Google dataset search website: https://www.google.com/publicdata/directory
New systems like BigQuery offer extremely high computing power and fast results through column-based databases. With the above examples, you should be able to query and process most of your nested data use cases very well. With the free data and the computing power, you can implement many profitable use cases. As a first step, I would recommend to simply deal with the data. You can do this relatively easily with the Google Cloud Free tier.
Sources and further Readings
[1] Google, OpenStreetMap Public Dataset
[2] OpenStreetMap, https://www.openstreetmap.de/ (2021)
[3] Google, Solve real business challenges on Google Cloud (2021)