The world’s leading publication for data science, AI, and ML professionals.

Animating Spatial Movement in Python

How to turn an origin-destination matrix into a mesmerizing animation

Static map of movements from bike sharing data. Image by author.
Static map of movements from bike sharing data. Image by author.

Spatial data is inherently visual, and the advances in visualizing (geo-)spatial data in Python has made it very easy to quickly plot maps of all shapes and forms. Even creating animations of charts and simple maps is quite easily possible. Choropleth maps in particular, with static polygons and changing colours, there are ready-made functions to do.

But, when it comes to movement data and animating lines, the task is a bit more cumbersome. Here I will try to give an example of how I have tried to solve animating spatial movement data in Python.

Initial Data

To start, we need some (line-)data with timestamps; in this example I will use Bike Sharing data from the bike sharing system in Oslo, Norway. The data is openly available under the Norwegian Licence for Open Government Data (NLOD) 2.0/Open Government License, from the homepage of Oslo Bysykkel.

import geopandas as gpd
import pandas as pd

# Import data from csv
data = pd.read_csv("https://data.urbansharing.com/oslobysykkel.no/trips/v1/2023/10.csv")
data = data[['started_at','ended_at','duration', 'start_station_latitude', 'start_station_longitude','end_station_latitude', 'end_station_longitude']]
# Subset the data to only one day
data['start_day'] = data['started_at'].apply(lambda x: int(x[8:11]))
data = data[data["start_day"]==day]
data
The first five rows of the initial dataset. Image by author.
The first five rows of the initial dataset. Image by author.

As the data consist of the start and end points of the trips, we need to create a line between the points, and for that we can use the implementation of Dijkstra’s algorithm in NetworkX.

Creating movement lines

Before we can create the movement lines, we need a network of streets that we can use to for a shortest path calculation. With osmnx we can retrieve a bike network from OpenStreetMap for the area we are interested in. We will use the extent of the bike trip data as our study area.

import osmnx as ox
# Create a GeoDataFrame from the stations
initial_data = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data['start_station_longitude'],data['start_station_latitude']),crs="EPSG:4326")
# Get the total bounds
total_bounds = initial_data.total_bounds
# Create a list for the polygon
bounding_box = [[total_bounds[0],total_bounds[3]], [total_bounds[0],total_bounds[1]], [total_bounds[2],total_bounds[1]], [total_bounds[2],total_bounds[3]]]
# Create the polygon
bounding_box_polygon = Polygon(bounding_box)
# Buffer the polygon to include roads just outside of the stations
graph_extent = bounding_box_polygon.buffer(0.02)
# Create the graph
area_graph = ox.graph_from_polygon(graph_extent, network_type='bike')
nodes, edges = ox.graph_to_gdfs(area_graph)

To calculate the shortest path with Dijkstra’s Algorithm we can create a function which we will apply to our dataframe:

import networkx as nx
from shapely import ops

def calculate_shortest_path(start_point,end_point, graph):
    # Find the nearest node to the start and end points
    start_node = ox.distance.nearest_nodes(graph, X=start_point.x, Y=start_point.y)
    end_node = ox.distance.nearest_nodes(graph, X=end_point.x, Y=end_point.y)

    # Calculate the shortest path
    path = nx.dijkstra_path(graph, source=start_node, target=end_node, weight='length')

    # Convert the network path to a LineString
    geoms = [edges.loc[(u, v, 0), 'geometry'] for u, v in zip(path[:-1], path[1:])]
    route_lineString = MultiLineString(geoms)
    route_lineString = ops.linemerge(route_lineString)
    return route_lineString

We can now apply the function to the dataframe. To do so, we create Shapely Points for the start and end points, and then use these in function to calculate the path between them:

data['start_point'] = data.apply(lambda row: Point(row['start_station_longitude'],row['start_station_latitude']), axis=1)
data['end_point'] = data.apply(lambda row: Point(row['end_station_longitude'],row['end_station_latitude']), axis=1)
data['shortest_path'] = data.apply(lambda row: calculate_shortest_path(row['start_point'],row['end_point'],area_graph), axis=1)
print(data.head())
Dataset now with shortest path. Image by author.
Dataset now with shortest path. Image by author.

Before we start splitting the lines up into points along the path, we can prepare the time data so that it can be displayed nicely in our final animation:

from datetime import datetime
data['started_at'] = data['started_at'].apply(lambda x: datetime.strptime(x[0:19], "%Y-%m-%d %H:%M:%S"))
data['ended_at'] = data['ended_at'].apply(lambda x: datetime.strptime(x[0:19], "%Y-%m-%d %H:%M:%S"))
One of the rows calculated into the shortest path between the start and end point. Image by author.
One of the rows calculated into the shortest path between the start and end point. Image by author.

Splitting up lines

Now that we have our movement lines, the idea is to create a dot along this line at a set interval so that we can plot the points in order based on the time.

The following line_to_points()-function goes through each of the lines in our dataframe and splits the line up into segments based on the duration of the bike trips. In this example, each line is split up after 15 seconds and a point is created at that location.

def line_to_points(df):
# Create a dataframe for our points
 point_df = pd.DataFrame(columns=['x','y','time','dateminute','size'])
# Iterate over the line data
 for idx, line in df.iterrows():
  start_x = line['shortest_path'].coords.xy[0][0]
  start_y = line['shortest_path'].coords.xy[1][0]
  # Create number of sections based on duration of trip
  delta = line['ended_at']-line['started_at']
  sections = line['duration']/15
  time_gap = delta/sections
  # Create initial point
  point_time = line['started_at']
  size = 10
  # Create timestamp
  dateminute = int(str(line['started_at'].year)+str(line['started_at'].month).zfill(2)+str(line['started_at'].day).zfill(2)+str(line['started_at'].hour).zfill(2)+str(line['started_at'].minute).zfill(2))
  append_list = [start_x,start_y,point_time.strftime("%Y-%m-%d %H:%M:%S"),dateminute,size]
  point_series = pd.Series(append_list,index=point_df.columns)
  point_df = point_df.append(point_series, ignore_index=True)
  # Iterate over rest of the sections, size relative to time
  i = 1
  while i <=sections:
   size = 50
   point_time += time_gap
   this_section = i/sections
   new_point = line['shortest_path'].interpolate(this_section, normalized=True)
   dateminute = int(str(point_time.year)+str(point_time.month).zfill(2)+str(point_time.day).zfill(2)+str(point_time.hour).zfill(2)+str(point_time.minute).zfill(2))
   if i == (sections-1):
    size = 30
   if i == 1:
    size = 30
   if i == (sections):
    size = 10
   point_series = pd.Series([new_point.x,new_point.y,point_time.strftime("%Y-%m-%d %H:%M:%S"),dateminute,size],index=point_df.columns)
   point_df = point_df.append(point_series, ignore_index=True)
   i+=1
 return point_df

With the line_to_points()-function we create a new dataframe with only the points we want to animate with a timestamp that we can iterate over when we do the plotting.

# Create a GeoDataFrame from the paths
paths_gdf = gpd.GeoDataFrame(data, geometry='shortest_path', crs="EPSG:4326")
paths_gdf = paths_gdf[~paths_gdf.is_empty] 
result = line_to_points(paths_gdf)
Lines converted into timestamped points. Image by author.
Lines converted into timestamped points. Image by author.

Animating the movements

To be able to plot the new point data we create a geodataframe which contain the timestamp, size of the point, and the coordinates for each point.

# Create geodataframe
gdf = gpd.GeoDataFrame(result, geometry=gpd.points_from_xy(result['x'], result['y']))
gdf['size'] = gdf['size'].astype(float)
gdf.crs = "EPSG:4326"
gdf = gdf.to_crs(epsg=4326)
# Get list of timestamps
times = list(gdf['dateminute'].unique())
times.sort()

The essential goal here is now to create a plot for each timestamp in the new geodataframe and late combine all these plots into a GIF-animation. To do that, we create a function which will plot a single row in the geodataframe, and at the same time plot all the previous rows with a small dot size. That way we get the largest dot for the current time while still seeing the previous dots, giving an impression of the entire path up to that point.

import matplotlib
import matplotlib.pyplot as plt
import contextily as ctx

def plot_minute(minute):
  # Set up the plot parameter
  matplotlib.rcParams.update({'font.size': 16})
  fig, ax = plt.subplots(ncols = 1, figsize=(32,20))
  # Plot all rows before the current minute
  old_minutes = gdf[gdf['dateminute'] < minute]
  old_minutes.to_crs(epsg=4326).plot(ax=ax, color='#1DA1F2',markersize=5, edgecolor=None, linewidth=0, alpha=0.4) # 2 - Projected plot
  # Select and plot the current minute
  minute_gdf = gdf[gdf['dateminute'] == minute]
  minute_gdf.to_crs(epsg=4326).plot(ax=ax, color='#1DA1F2',markersize=minute_gdf['size'], edgecolor=None, linewidth=0.3, alpha=0.8) # 2 - Projected plot
  # Set common boundaries for the plot
  xlim = ([stations_gdf.total_bounds[0],  stations_gdf.total_bounds[2]])
  ylim = ([stations_gdf.total_bounds[1],  stations_gdf.total_bounds[3]])
  ax.set_xlim(xlim)
  ax.set_ylim(ylim)
  # Set time variables
  mi = str(minute)[-2:]
  h = str(minute)[-4:-2]
  #d = str(minute)[-4:-2]
  m = str(minute)[4:6]
  y = str(minute)[:4]
  # Add a basemap
  ctx.add_basemap(ax,crs=minute_gdf.crs.to_string(), source=ctx.providers.CartoDB.DarkMatter)
  ax.set_axis_off()
  # Create text
  ax.text(.5,.9,f'{h}:{mi} - {day}/{m}/{y}',
        horizontalalignment='center',color='white',
        transform=ax.transAxes,size=18)
  plt.tight_layout()
  # Save plot to file
  plt.savefig(f'animation/{minute}.png',transparent=True, dpi=100)
  plt.close()

We can now use the list of timestamps that we created before, times, and iterate over each timestamp and run the function plot_minute() to make a map for each of the timestamps of data we have.

for timestamp in times:
  plot_minute(timestamp)
One of the plotted maps at a single minute. Image by author.
One of the plotted maps at a single minute. Image by author.

Converting to GIF

The result of running plot_minute() for all our timestamps is that we now have a folder full of .png maps. There are several ways to animate a sequence of pngs, and on a UNIX-based system you can run the convert program in bash. Depending on the parameters set for the plots, the resulting file sizes of the pngs and the animation itself can get very large, so it might be necessary to resize to reduce the file size.

convert -resize 20% -delay 5 -loop 0 animation/*.png animation.gif

Now we can take a look at the final result:

Conclusion

In this article I have shown you one way of turning a points dataset of mobility into an animated map based on the shortest path algorithm, using just a few common Python libraries.

The solution will work for any origin-destination matrix with an accompanying network and bike sharing data is accessible for many cities around the world. Hopefully it can inspire you to explore bike sharing patterns and urban structure in other cities!


Related Articles