I recently came across with an exciting dataset titled Roman Road Network (version 2008) on Harvard’s Dataverse: the historical road networks of the Roman Empire in a perfect GIS format! Additionally, I have been working on a project about public transport networks and how to identify the hotspots and bottlenecks of such a network network science. Then I quickly realized that by putting these all together, I could quickly answer this age-old question and see how central the area of Rome indeed was back in the day.
In this article, all images were created by the author.
1. Read and visualize the data
First, let’s use GeoPandas and Matplotlib to load and explore the Roman road network data quickly.
import geopandas as gpd # version: 0.9.0
import matplotlib.pyplot as plt # version: 3.7.1
gdf = gpd.read_file('dataverse_files-2')
gdf = gdf.to_crs(4326)
print(len(gdf))
gdf.head(3)
The output of this cell:
Now visualize it:
f, ax = plt.subplots(1,1,figsize=(15,10))
gdf.plot(column = 'CERTAINTY', ax=ax)
2. Transform the road network into a Graph object
The previous figure shows the road network as a bunch of line polygons. However, to be able to quantify the importance of, for instance, Rome, I am planning to do some graph computations. This means that I need to transform these line strings into a graph.
The OSMNx package is just right for it – at the intersection of Geospatial data tools and the famous graph analytics library, NetworkX. In particular, I followed this thread to derive a node and an edge table from the original dataset.
# create an edge table
edges = gdf.copy()
edges['u'] = [str(g.coords[0][0]) + '_' + str(g.coords[0][1]) for g in edges.geometry.to_list()]
edges['v'] = [str(g.coords[-1][0]) + '_' + str(g.coords[-1][1]) for g in edges.geometry.to_list()]
edges_copy = edges.copy()
edges['key'] = range(len(edges))
edges = edges.set_index(['u', 'v', 'key'])
edges.head(3)
Result of this cell:
import pandas as pd # version: 1.4.2
from shapely.geometry import Point # version: 1.7.1
# create a node table
nodes = pd.DataFrame(edges_copy['u'].append(edges_copy['v']), columns = ['osmid'])
nodes['geometry'] = [Point(float(n.split('_')[0]), float(n.split('_')[1])) for n in nodes.osmid.to_list()]
nodes['x'] = [float(n.split('_')[0]) for n in nodes.osmid.to_list()]
nodes['y'] = [float(n.split('_')[1]) for n in nodes.osmid.to_list()]
nodes = gpd.GeoDataFrame(nodes)
nodes = nodes.set_index('osmid')
nodes.head(3)
Result of this cell:
Create the graph:
import osmnx as ox # version: 1.0.1
# Now build the graph
graph_attrs = {'crs': 'epsg:4326', 'simplified': True}
G = ox.graph_from_gdfs(nodes, edges[[ 'geometry']], graph_attrs)
print(type(G))
print(G.number_of_nodes()), print(G.number_of_edges())
So here, I managed to transform the GIS data file into a network object with 5122 nodes and 7154 edges. Now, I would like to take a look. One can visualize networks with NetworkX as well. However, I prefer to go for the open-source software Gephi. It offers more flexibility and better visual-fine-tuning options. Let’s transform G into a Gephi-compatible file and export it – in this version, I will work with an unweighted, undirected graph.
# Transform and export the graph
import networkx as nx # version: 2.5
G_clean = nx.Graph()
for u, v, data in G.edges(data=True):
G_clean.add_edge(u, v)
G_clean2 = nx.Graph()
G_clean2.add_edges_from(G_clean.edges(data=True))
nx.write_gexf(G_clean2, 'roman_empire_network.gexf')
Additionally, I create a data datable called coordiantes.csv in which I save the coordinates of each node (crossing) of the road network.
nodes2 = nodes[nodes.index.isin(set(G.nodes))].drop(columns = ['geometry'])
nodes2.index.name = 'Id'
nodes2.to_csv('coordinates.csv')
3. Visualizing the network in Gephi
The exact how-to for visualizing networks in Gephi is worth an entire session in itself, so here, I would instead show the result.
In this visualization, each node corresponds to an intersection, color encodes the so-called network communities (densely interlinked subgraphs), while nodes are sized according to their betweenness centrality. Betweenness is a network centrality measure that quantifies the bridge role of a node. Therefore, the larger a node, the more central it is.
On the visual, its also interesting to see how geography drives the clusters and how surprisingly Italy stands as its own as well, probably because of the much denser internal road network.
4. Network centralities
After enjoying the visuals, let’s get back to the graph itself and quantify it. Here, I will compute the total degree of each node, measuring the number of connections it has, and the unnormalized betweenness centrality of each node, counting the total number of shortest paths crossing each node.
node_degrees = dict(G_clean2.degree)
node_betweenness = dict(nx.betweenness_centrality(G_clean2, normalized = False))
Now, I have the importance scores of each crossing. Additionally, in the nodes table, we also have their location – now it’s time to go for the main question. For this, I quantify the importance of each node falling into the admin boundaries of Rome. For this, I will need the admin boundaries of Rome, which is relatively easy to get from OSMnx (note: Rome today is probably different from Rome back in the day, but approximatively, it should be fine).
admin = ox.geocode_to_gdf('Rome, Italy')
admin.plot()
The output of this cell:
Also, on the visuals, it’s pretty clear that Rome is not there as a single node in the road network; instead, many are nearby. So we ned some sort of binning, spatial indexing, which helps us group all road network nodes and intersections belonging to Rome. Additionally, this aggregation would be desired to be comparable across the Empire. This is why, instead of just mapping nodes into the admin area of Rome, I will go for Uber’s H3 hexagon binning and create hexagon grids. Then, map each node into the enclosing hexagon and compute the aggregated importance of that hexagon based on the centrality scores of the enclosed network nodes. Finally, I will discuss how the most central hexagons overlay with Rome.
First, let’s get the admin area of the Roman Empire in an approximative way:
import alphashape # version: 1.1.0
from descartes import PolygonPatch
# take a random sample of the node points
sample = nodes.sample(1000)
sample.plot()
# create its concave hull
points = [(point.x, point.y) for point in sample.geometry]
alpha = 0.95 * alphashape.optimizealpha(points)
hull = alphashape.alphashape(points, alpha)
hull_pts = hull.exterior.coords.xy
fig, ax = plt.subplots()
ax.scatter(hull_pts[0], hull_pts[1], color='red')
ax.add_patch(PolygonPatch(hull, fill=False, color='green'))
The output of this cell:
Let’s split the Empire’s polygon into a hexagon grid:
import h3 # version: 3.7.3
from shapely.geometry import Polygon # version: 1.7.1
import numpy as np # version: 1.22.4
def split_admin_boundary_to_hexagons(polygon, resolution):
coords = list(polygon.exterior.coords)
admin_geojson = {"type": "Polygon", "coordinates": [coords]}
hexagons = h3.polyfill(admin_geojson, resolution, geo_json_conformant=True)
hexagon_geometries = {hex_id : Polygon(h3.h3_to_geo_boundary(hex_id, geo_json=True)) for hex_id in hexagons}
return gpd.GeoDataFrame(hexagon_geometries.items(), columns = ['hex_id', 'geometry'])
roman_empire = split_admin_boundary_to_hexagons(hull, 3)
roman_empire.plot()
Result:
Now, map the road network nodes into hexagons and attach the centrality scores to each hexagon. Then. I aggregate the importance of each node within each hexagon by summing up their number of connections and the number of shortest paths crossing them:
gdf_merged = gpd.sjoin(roman_empire, nodes[['geometry']])
gdf_merged['degree'] = gdf_merged.index_right.map(node_degrees)
gdf_merged['betweenness'] = gdf_merged.index_right.map(node_betweenness)
gdf_merged = gdf_merged.groupby(by = 'hex_id')[['degree', 'betweenness']].sum()
gdf_merged.head(3)
Finally, combine the aggregated centrality scores with the hexagon map of the Empire:
roman_empire = roman_empire.merge(gdf_merged, left_on = 'hex_id', right_index = True, how = 'outer')
roman_empire = roman_empire.fillna(0)
And visualize it. In this visual, I also add the empty grid as a base map and then color each grid cell based on the total importance of the road network nodes within. This way, the coloring will highlight the most critical cells in green. Additionally, I added the polygon of Rome in white. First, colored by degree:
f, ax = plt.subplots(1,1,figsize=(15,15))
gpd.GeoDataFrame([hull], columns = ['geometry']).plot(ax=ax, color = 'grey', edgecolor = 'k', linewidth = 3, alpha = 0.1)
roman_empire.plot(column = 'degree', cmap = 'RdYlGn', ax = ax)
gdf.plot(ax=ax, color = 'k', linewidth = 0.5, alpha = 0.5)
admin.plot(ax=ax, color = 'w', linewidth = 3, edgecolor = 'w')
ax.axis('off')
plt.savefig('degree.png', dpi = 200)
Result:
Now, colored by betweenness:
f, ax = plt.subplots(1,1,figsize=(15,15))
gpd.GeoDataFrame([hull], columns = ['geometry']).plot(ax=ax, color = 'grey', edgecolor = 'k', linewidth = 3, alpha = 0.1)
roman_empire.plot(column = 'betweenness', cmap = 'RdYlGn', ax = ax)
gdf.plot(ax=ax, color = 'k', linewidth = 0.5, alpha = 0.5)
admin.plot(ax=ax, color = 'w', linewidth = 3, edgecolor = 'w')
ax.axis('off')
plt.savefig('betweenness.png', dpi = 200, bbox_inches = 'tight')
Finally, we arrive at a reassuring conclusion. If color the hexagon cells based on their cumulated degree, the area of Rome wins by far. If we color the hexagons based on betweenness, the image is similar – Rome dominates again. One add-on here is that the highway connecting Rome with the Middle East also emerges as a critical, high-betweenness segment.
tl;dr network science also says that all the roads led to Rome!