Mapping the Jams: Traffic Analysis Using Graph Theory

Learn how to find potential critical points in your city infrastructure using graph theory

Published in

Towards Data Science

10 min readAug 19, 2023

Graph theory has plenty of applications in real-life problems such as social networks, molecular biology, or geospatial data. Today, I’ll present the last one, analyzing city’s road layout to predict critical streets, junctions, and how changes to the infrastructure can affect it. But first, let’s start with the basics.

Graphs and their centrality measures

Graphs are sets of vertices and their edges:

Set E is subset of unordered tuples (x, y) where x and y are vertices of the graph and x is not equal to y. [Image by the author]

Where the edges represent connections between the nodes. If edges do not have directions, we call a graph undirected. A real-life example of an undirected graph can be a chemical molecule, where the vertices are atoms, and bonds are represented as edges.

Serotonin molecule is an example of a simple undirected graph. [source]

However, sometimes we need information about whether the edge goes from u to v, from v to u, or both ways. For example, if Mark likes Alice, it doesn’t necessarily mean it’s mutual ( ☹ ). In those situations, we can define the edge as an ordered tuple instead of unordered one.

Brackets represent unordered tuple in formulas, while parentheses represent ordered one. [Image by the author]

Human interactions can be described using directed graphs. [Image by the author]

Using the graph structure, we can define a centrality measure. It’s a metric used for answering the question:

How important is this vertex/edge in a graph?”

And there are many ways to answer it.

Different ways to evaluate the importance of graph components

Depending on the task, we can start from a different point evaluating centrality. One of the most common metrics are: Degree, Closeness and Betweenness. We will discuss them using Zachary’s Karate Club graph [more info]. It presents ties between different karate club members. You can find code used to generate pictures below here.

Degree centrality

The most basic of centralities. It’s defined only for vertices and it’s equal to the degree of the vertex (which is the number of the neighboring vertices). As an example, we can think back to the graph of human relationships, and in case of the friendships among people this metric would answer the question

“How popular is this person?”

Nodes’ degree centrality for Karate Club graph. Centrality measures are normalized by maximum degree of the graph (which is number of the nodes minus one). [Image by the author]

Paths in graph

For the next two centralities, we need to introduce a few concepts to our knowledge of the graph theory. All of them are very intuitive, starting from the edge’s weights. We can add weights to our edges, to mark the difference between them. For example, this can be road length in case of traffic graph.

In graphs we can define paths, which are lists of vertices we need to traverse to get from A to B. Consecutive vertices in the path are neighbors, first vertex is the A, and the last is B. Path distance is the sum of the edges weights along of it. The shortest path between A and B is the path with the smallest distance.

The shortest path between A and F is [A, C, E, D, F] with distance 20. [source]

Closeness centrality

Having all this new knowledge, we can go back to our metrics. Next one is closeness centrality, which tells us how close a node to the rest of the graph is. It’s defined for a specific vertex as an inverse of a mean of shortest paths to all other vertices in the graph. This way, shorter average path translates to higher closeness centrality.

Betweenness centrality

Betweenness centrality gives us information, which nodes of a graph are crucial for the traffic going through it. Imagine a city with an extensive road network, where every junction is a node. Some of those serve as a key connectors in daily commutes, while others may be a cul-de-sacs with close to none impact on traffic flow. The former one possess high Betweenness centrality scores, calculated as proportion of the shortest paths traversing through the intersection.

City plan as a graph

Now, as we have tools for describing and analyzing graph, we can start extracting city’s plan to a graph form. To do that we can Open Street Maps (OSM), to import it in Python as NX graph using osmnx library. We’ll start with a smaller example to discuss what additional process we need to apply, in order to improve time and efficiency of our work.

Grzegórzki is one of the eighteen districts of Krakow’s city, with two complex roundabouts — Mogilskie and Grzegórzeckie, and many junctions. Thus, we’ll be able to see most of potential pitfalls with data engineering.

Grzegórzki’s administrative borders. [©Google]

Let’s start with importing data from the OSM repository to a Python graph, and plot the results:

Raw OSM data import. White dots are nodes, which should represents roads’ junctions. [Image by the author]

There’s something wrong with this graph — can you spot what it is?

We get multiple edges for single sections of roads, resulting the graph with almost 3 000 “junctions”. This does not provide proper representation (we can’t make a U-turn in the middle of a road, and every node cause calculation to be slower). To fix this situation, we will perform graph topology simplification by removing all nodes on the road between two junctions. In OSMnx, we have a function for that called ox.simplify_graph().

Road layout after topology simplifications. Now every node represents road crossing. [Image by the author]

There’s one more catch — as you may see, we have two edges for the most of roads, one for each way. Due to this, we have multiple nodes for every intersection, which is an unwanted behavior. Imagine that we’re on a junction, we’re turning left, and there’s no separate lane for a left turn (or it’s already full). As long as we won’t be able to do the turn, the other cars are blocked. In our current graph, this isn’t the truth. The left turn is made of 2 separate nodes, one for turning left, and the other for crossing opposite lane. This would indicate that those are two independent operations, while they are not.

That’s why we’re going to consolidate intersections, meaning that we will combine multiple nodes close to each other into one. We’ll choose the consolidation radius big enough to consolidate multiple parts of the intersections into one, but on the other hand keep roundabouts as multiple node structures, as they can be only partially blocked. To do this we will use osmnx function ox.consolidate_intersections().

Road layout after intersection consolidation. [Image by the author]

Comparison of the intersection. Before and after. [Image by the author]

After those operations, we’re almost ready for the analysis. The last caveat is Krakow’s municipality borders — as many people travel from the neighboring towns, and graph analysis includes only data within the graph, we need to include those areas. I’ll present in the next chapter implications of not doing that. And here’s our graph:

Colors indicate maximum speed. The brighter the color the higher the value. We can see the A4 highway colored using yellow. Most of the roads, colored in blue, are 50 km/h. [Image by the author]

You can find the source code used to generate this map, as well as all graphic used in the next chapter in this jupyter notebook.

Betweenness centrality of road layout

For this case study we will focus only on Betweenness centrality measurement for estimating road traffic. In future, this might be extended to other techniques from graph theory, including GNN usage (Graph Neural Networks).

We will start with calculating Betweenness centrality measurement for all nodes and edges in a road layout representation. For that we will use NetworkX library.

Krakow’s Betweenness centrality for each road segment. [Image by the author]

Due to a high number of roads on a graph, it’s hard to see which components have highest probability of being critical for traffic. Let’s take a look at a centrality measurement distribution for the graph.

Distribution of centrality measures for streets and junctions in Krakow road layout. [Image by the author]

We can use those distributions to filter out less important junctions and streets. We will select top 2% of each where the threshold values are:

0.047 for nodes,
0.021 for edges.

Graph centrality measurements after thresholding. [Image by the author]

We can see that the most important road segments by betweenness are:

The A4 highway and the S7 being the beltway of Krakow (note that Krakow does not have northern part of the beltway),
The western part of 2nd ring road and it’s connection to A4,
The northern part of 3rd ring road (substituting missing northern beltway),
The Nowohucka street connecting 2nd ring road with north-eastern part of the city,
The Wielicka road leading from city center to the south-eastern highway part.

Let’s compare this information to a real life traffic map of Krakow from Google Maps:

Typical traffic in Krakow on Monday commute [©2023 Google, source]

We can see that our insights correlate with the results from traffic radar. The mechanism behind that is quite simple — components with high betweenness centrality are those used to commute most of shortest paths in the graph. If car drivers select the best paths for their routes, then the streets and junctions with the highest traffic volumes will be the ones with the highest betweenness centrality.

Let’s head back to the last part of the graph engineering — extending graph borders. We can check what would happen if we only took the city’s borders to our analysis:

Krakow’s road betwenness centrality without taking neighboring towns into the graph. [Image by the author]

The A4 highway, which is one of the most important component due to the beltway nature, has one of the lowest centrality measures in the whole graph! This happens because as the A4 is at the outskirts of the city, and most of its traffic comes from the outside, we cannot include this factor in the betweenness centrality.

How to use betweenness centrality to analyze effects of layout change to traffic

Let’s take a look at a different scenario for graph analysis. Suppose that we want to predict how a road closure (for example due to the accident) affects the traffic. We can use the centrality measurements to compare differences between two graphs, and thus examine changes in the centrality.

In this study, we will simulate car accident on A4–7 highway segment, which is a common occurrence. The accident will cause a complete closure of the segment.

We will start by creating a new road network by removing A4–7 segment from graph, and recalculating centrality measurement.

New layout centrality measurements. Red A4 section represent missing part. [Image by the author]

Let’s take a look at a centrality distribution:

Distribution of centrality measures for streets and junctions in Krakow road layout after removing A4–7 highway segment. [Image by the author]

We can see that it’s still very similar to the original one. To inspect changes in the centrality measurements we will calculate residual graph, where centrality measurements are the difference between original road layout and after the accident. Positive values will indicate higher centrality after the accident. Nodes and junctions missing in one the graphs (such as A4–7) won’t be included in the residual graph. Below is the measurement distribution of the residuals:

Centrality change distribution after removing A4–7 highway segment. [Image by the author]

Again, we will filter out top 2% of streets and nodes affected. The threshold values this time are:

0.018 for nodes,
0.017 for edges.

Streets and junctions with highest increase in betwenness centrality after removing the A4–7 highway segment. [Image by the author]

We can see increases in roads connecting split parts of beltway to the city center, where the 2nd ring road is located. The highest change can be seen in the 2nd ring road which contains one of two left bridges over Vistula river on the western side of the city.

What graph centrality analysis cannot do on a road network

There are a few things that we cannot take account in during graph analysis. The two most important ones, that we could see in this analysis, are:

Graph centrality analysis assumes uniform distribution of traffic among the nodes.

Which is false in most cases, as villages and cities have different population densities. However, there are other effects that can reduce this, for example a higher amount of people living in neighboring villages will choose a car as a commute option in comparison to the people living in a city center.

Graph analysis takes into the account only things that are present within the graph.

This is harder to see in the provided examples, especially for someone outside the Krakow. Let’s take a look at Zakopianka. It’s a major traffic artery between the city centre and most of the municipalities south of Krakow, and it’s also part of DK7 (national road no. 7) which spans across whole country.

DK7 road — green parts represent expressways. [source]

If we compare typical traffic on DK7 in Krakow to our centrality measures, they’re completely different. Average betweenness centrality is around 0.01, which is a two times smaller value than the top 2% threshold. While in reality, it’s one of the most blocked sections.

Comparison between Zakopianka average congestion and betweenneess centrality. [©2023 Google, source]

In conclusion

Graph theory and its analysis have applications in multiple scenarios, such as traffic analysis presented in this study. Using basic operations and metrics on graphs, we can get valuable insights in much shorter time in comparison to building a whole simulation model.

This whole analysis can be performed using several dozen lines of Python code, and it’s not limited to one road layout. We can also very easily transition to other analysis tools from Graph Theory.

As all things, this method has also its drawbacks. The major ones being assumptions about uniform traffic distribution and scope limited to graph structure.

Github repository containing code used in this study can be found here.