The world’s leading publication for data science, AI, and ML professionals.

Urban Graph Networks

How graph theory can save you hours for your next city trip!

Image created by midjourney.ai - provided by author - Brandenburg Gate Berlin - schematic.
Image created by midjourney.ai – provided by author – Brandenburg Gate Berlin – schematic.

Why not plan a trip to Berlin? Why not visit all the famous places like Brandenburg Gate or Checkpoint Charlie? As we are data people, we search for a data-informed solution. How can we receive data and find an optimized way using data analysis? This is what this article is all about. We explain how to load data from OpenStreetMap, how to visualize data with python’s folium library, and which algorithms you would need to plan an optimized trip.


This article is structured as follows. We begin by introducing how OpenStreetMap works and how we can receive data via python. Then, we discover how we can display data via python’s folium library. Afterwards we deep dive into Graph theory and its applications. For instance, routing via Dijkstra’s algorithm and approaching the Traveling Salesman Problem are the final steps towards a data-informed optimized trip visiting Berlin’s famous spots. You will learn:

  • how OpenStreetMaps data model works,
  • how Coordinate Reference Systems work,
  • basics of Graph Theory and its algorithmic challenges, as well as,
  • how to draw beautiful maps with python’s folium library.

Let’s get started.

OpenStreetMap – a brief introduction

OpenStreetMap is an international project with the aim to create an open and free map of the world, see https://www.openstreetmap.org. It is accessible under Open Database License (ODbL) and one could fetch data with a read-only API called Overpass API (cf. http://overpass-api.de)). In this article we use a python library called overpy to query overpass API to receive data from OpenStreetMap, see https://pypi.org/project/overpy/ for details.

Before we start, let’s find out how OpenStreetMap’s data model works. Basically, its data model consists of nodes, ways and _relations._For instance, nodes do have a specific position determined via latitude and longitude, like a tree, a bench, or a corner of a building. Tags and comments can be added, as well as these nodes are building blocks for more complex spatial structures like ways, buildings, or other areas. A way is an ordered list of nodes resulting in polygons or lines, if the first node is the last or not. Then it might represent a building or a park for polygons, or a river or highway in the latter case.

Moreover, relations describe situations like a bus route consisting of more than one way. Throughout this article, we focus on nodes and ways. According to this fundamental data model, any object might carry more information than its geographical position. There are labels such as "highway" or "amenity" available or opening hours for shops. With these classifications at hand, one could extract suitable data for a plethora of spatial use cases.

A remark on data quality: a large, open community is collecting data. This community cannot be at any place at any time. Therefore, use this data carefully and think about the data critically. In general, infrastructure changes slowly, but it changes over time and space. Also social and commercial facts change and it is quite difficult to keep up with this pace. Keep that in mind whenever you work with OpenStreetMap data.

Getting started with overpy and folium

At the beginning, there is usually a python environment and some libraries one should use to explore, gather, and process data. Throughout this first paragraph, we use overpy for data extraction, and folium to visualize spatial data. A starting point might be choosing a correct bounding box as an object where to look for data. The secret sauce here is to find out the correct coordinates in a reasonable coordinate system. Here, we use latitude and longitude in the WGS84 notation having a rectangular bounding box including Berlin. We will revisit WGS84 later on and explain it in depth.

Please pay attention to the chosen area. It could cause fetching a huge amount of data if you do not choose this bounding box smart. My advice, use https://overpass-turbo.eu to pre-query and find out the right dimensions. This particular overpass query fetches way-objects from the given bounding box. The code snippet below also introduces a class MapConfig configuring folium map parameters.

After creating an API-query, there are functions necessary to execute it. Therefore, before putting all puzzle pieces together, we introduce functions called _fetch_highwaydata and _process_highwaydata. The first function returns an overpass API object combined with the given query in a particular bounding box. As usual in software design, one could argue which statements to pass and what is the most modular object. For different tasks, one could possibly pass also self-defined queries and OSM objects to get a more individual statement.

The second function processes the given raw data from the API and restructures them to dictionary data objects. The idea is to extract and split coordinates, nodes, and ways.

As a next step we introduce a function that creates a folium map called _createmap. In general, a folium map object will be initialized by Map, passing a starting location, a certain zoom-factor and information about the underlying tiles. These have already been initialized in the MapConfig class above.

There are two ways of adding data to a map. Either you are simply adding data to the map-object m, what results in simply showing the data without having any options.

Another way is to add data to different layers. With this technique you are able to toggle layers on and off, or hide these when mounting a map. The corresponding Layer Control can then be found in the upper right corner.

To configure the main flow of the program, we introduce a function called main to orchestrate all other functions. In addition, one can also add the constructing nodes as a layer. But, be aware of how your map will react in terms of response times. In January 2025 the given bounding box contained 55K highway entities and about 135K node entities.

The resulting _mapobject will return a map and the following image is a screenshot.

Openstreetmap data from Berlin city centre displayed via folium library - highways - provided by author
Openstreetmap data from Berlin city centre displayed via folium library – highways – provided by author

Recap: In this first paragraph we introduced OpenStreetMap’s data model and showed how to fetch and display highway data via Python’s folium and overpy libraries. Coming up next: what graph theory is and how could a shortest-path be constructed via Dijkstra’s algorithm.


Graph theory – an introduction

In its simplest mathematical description a graph is a tuple consisting of nodes V and edges E, i.e. (V,E). To be honest, this sounds very abstract. Let’s fill it with life. There are many examples what a node, also called vortex, could represent. For instance, it could be some point on earth described by latitude and longitude. Or, it could simply represent a person in a social Network.

On the other hand, an edge is a connection between two nodes. Regarding the two contexts from above, an edge could be a road connecting two locations on earth, or a relation "knowing each other" between two human beings. This turns out to be a model for real life relations and exceeds reagular tabular relations by far. Moreover, this basic graph structure can be enriched in many different ways.

For example, one could think of edges that are non-symmetic, i.e. we have one-way streets or people who know someone, but the other one is not aware of the other person. These graphs are called directed graphs. Also, one could add weights to each edge, resulting in a tuple (V, E, w). For example, adding the length of a road between two locations helps finding out how far different locations are apart. In a social network one could introduce a realtion on "how close do you know a person?" having values betwenn 0 and 1.

So far, we have solely described two basic situations – navigation and social network – with this theory, not solving any problems yet and not defining any problem statements. One could think of questions like:

  • How do I find my way from location A to location B?
  • Which are the most influential people in my social network?
  • How can I find a roundtrip through the city centre visiting 5 different points of interest?
  • Small world experiment: what is the average path length in social networks?

Each of these might result in a new medium article but aren’t part of this one. Let’s focus on the given OpenStreetMap data and how there is a shortest-path between two locations. We begin with the famous Dijkstra algorithm, a building block of all common navigation systems constructing a shortest path. The basic idea is the following: follow edges with the smallest known distance to a node. In every step distances are being updated to every neighborhood of a node. If there is a smaller distance, update, if not, keep the actual distance. We will use this algorithm to find a shortest-path from location A to B through Berlin.

Remark on graph algorithms: There are two important aspects with this specific problem statements.

  1. Many of these algorithms have large runtimes. That is, runtime-complexity could increase dramatically just by adding a few more data points, because there are so many new possibilities available.
  2. Many times there is no unique solution available, because there are two or more paths for example that minimize certain distances, or the target functional.
  3. There are many graph problems that are of class NP, i.e. called NP-hard. We will briefly discuss what this means in one of the next paragraphs. Sometimes there are no algorithms available to find a one best solution. Therefore algorithms are in place to heuristically find a solution.

Shortest path approach – from Kanzleramt to Alexanderplatz

In this paragraph we are trying to find the shortest path from Bundeskanzleramt to Alexanderplatz—a task where usually Dijkstra’s algorithm comes into play. Let’s walk through the code. Additionally to folium‘s library we do need NetworkX, numpy, and Transformer from pyproj here.

Since I wanted to try out to work with dataclasses in this article, we create a class called BerlinPOI here, holding information about points of interest in Berlin. It comes up as some sort of dictionary, mapping internal names of POIs to OpenStreetMap node IDs. I manually compiled these by walking through overpass turbo and exploring these IDs.

Coordinate reference systems – crsAs a next step, we want to create a weighted graph consisting of edges that are street segments and the distances between nodes. One crucial aspect here is the given coordinate reference system, crs. CRS have always been of strategic importance in history – that is why there are so many of them. The World Geodetic System 1984 – WGS84 – measures in polar coordinates (latitude and longitude) originating from a center point of this planet. This, in contrary, leads to the fact, that you cannot use these coordinates to measure distances directly on the surface – earth’s curvature comes into play. To cover this, geodesics assume that earth is locally flat. A coordinate transformation on a locally flat space is necessary. If this happens, one could use Pythagoras’ theorem to compute metric distances. We use UTM32N here. UTM stands for Universal Transverse Mercator which is a projection on a local grid. The specification 32N tells us which grid to choose. If you want to learn more about it, or you need to find out which grid to choose, there are many resources on the internet on these geo-spatial topics and calculations.

So, basically, the creation of a street graph transforms the given coordinates in WGS84 notation to UTM32N and calculates the distances for every given segment or edge of the graph. Be careful here, there might occur performance issues depending on how large your amount of data is. As a result we find an object G that holds nodes, edges and also weights. You find the corresponding code-snippet below.

Applying Dijkstra’s algorithm To visualize a route we introduce a function that takes graph G, start and end point and some visual control parameters. A very helpful library here is NetworkX which provides Dijkstra’s algorithm. Input parameters are G, start point, and end point. Not only the path will be calculated, also the length of this path. In the graphs nodes we stored the original WGS84 coordinates to draw them with folium on a map. Polyline and Marker objects help to visualize the path as well as starting and end points. We have created an exception here, since you never know which points have been chosen and if they are valid.

To put all the aforementioned together, we initialze and calculate G, create a map object _msingle and find a route, see the code snippet below.

Executing _msingle returns a map that looks like the image given below.

From the starting point - Bundeskanzleramt - given in green, we find a shortest path to Alexanderplatz marked in red - picture by author.
From the starting point – Bundeskanzleramt – given in green, we find a shortest path to Alexanderplatz marked in red – picture by author.

Having a single path is nice. The next task is to find a roundtrip for our next visit to Berlin. This will be part of the next paragraph.

Traveling Sales problem – roundtrip Berlin

In general, a Traveling Salesman Problem (TSP) describes the challenge to determine the order of visits of several places, such that you visit every place just once, except for your starting point, and that the route is short. Apart from roundtrips in Berlin, this task comes up in many other different fields.

Not going too much into detail, this is an NP-hard problem, i.e. assuming that the class P is not equal to NP, there exists no algorithm that finds a shortest trip in polynomial worst-case runtime.

Starting with a number of given points of interest in Berlin. We would like to visit: Brandenburg Gate, Alexanderplatz, Bode Museum, Reichstag, Bundeskanzleramt, and Checkpoint Charlie. The node IDs were already given before, let’s introduce a dictionary holding names for these POIs:

To tackle TSP it takes two steps:

  1. calculate all shortest-paths between all given POIs
  2. solve TSP by route approximation
  3. In the aforementioned paragraph we have introduced Dijkstra’s algorithm to find a shortest path. This indeed helps to create a distance matrix of all shortest paths between all POI pairs. Since this is a symmetric problem, i.e. routing from location A to B is the same as from location B to A, we have to calculate the shortest path only for the upper triangular matrix.
  4. Given this distance matrix, there are many approximations to an optimal route. The documentation of NetworkX shows that different approaches have been implemented, i.e. Christofides, Greedy approach, Simulated Annealing, and Asadpour. These work under different conditions and circumstances, e.g. directed or undirected graphs. We won’t go too much into detail here what are pros and cons of these heuristics. Christofides-algorithm is an adequate choice for undirected, weighted graphs as we find the situation here.

The function _solve_tsptour follows these two steps, confine the next code snippet.

Again, we put the puzzle pieces together by creating a folium map object, pass data to the function that solves TSP, to afterwards visualize the results.

This seems to be a full-day roundtrip walking around approximately 8 kilometers. You will find the single legs and distances in the layer control on the upper right corner of your screen. The final result looks like the picture below.

Picture showing a solution to the TSP and the distances of every leg in the right upper corner - picture by author.
Picture showing a solution to the TSP and the distances of every leg in the right upper corner – picture by author.

Learnings

In this article we have been exploring data from OpenStreetMap, i.e. Berlin city centre streets. Our aim was to find a suitable roundtrip visiting different POIs. Through this exploration we have deep dived into the following

  • OpenStreetMaps data model and how one can use Overpass API with overpy library
  • Geo-Spatial Visualization techniques with folium library
  • Graph theory basics, such as node, edges, weights, as well as algorithms for shortest paths like Dijkstra’s algorithm and NP-hard problems like Traveling Salesman Problem and how to solve these
  • Coordinate Reference Systems

I have provided a code repository on GitHub to easily copy the code and try for yourself. Check it out and give it a star! https://github.com/StephanHausberg/UrbanGraphNetworks

If you like what you have read – follow me on medium.com, follow me on GitHub, or follow me on LinkedIn. https://www.linkedin.com/in/dr-stephan-hausberg-679750118/

Looking forward to your valuable feedback.


Related Articles