The Giga initiative (http://gigaconnect.org), a joint program of UNICEF and ITU, aims to provide internet connectivity to every school in the world. To achieve this goal, our data science team at Giga has been working to collect accurate location information for schools.
On 1 November 2021, we reached the one million mark for schools mapped in our system (https://projectconnect.unicef.org/map). To celebrate the Giga team’s achievement, we prepared a special visualization of the schools mapped over time, from the beginning until we reached the one million mark.
For the map visualization we used Plotly Express, which can make time-lapse animations through the animation_frame argument of its plotting functions.
#import all libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import plotly.express as px
from datetime import datetime, timedelta
import os
import ffmpeg
However, since our dataset of school locations contains over 1 million points, our first attempt to animate it this way failed. Instead, we used Plotly Express to ‘render’ one frame of the time lapse at a time and then used ffmpeg to combine the rendered frames into a single MP4 file.
location_file = 'allschool_for_visualization.csv'
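# Parse the 'date' column as datetimes so it can be compared and combined with timedelta later
df = pd.read_csv(location_file, parse_dates=['date'])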
df.head()

For the background, we used Mapbox’s dark map style, which requires an API key.
px.set_mapbox_access_token('your api key')
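To keep the key out of the script, one option is to read it from an environment variable. The sketch below is only an illustration: the helper get_mapbox_token and the variable name MAPBOX_TOKEN are assumptions, not part of the original code.

```python
import os

def get_mapbox_token(env_var="MAPBOX_TOKEN"):
    """Read the Mapbox key from the environment (hypothetical helper and variable name)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token

# Usage, assuming the variable is set and plotly.express is imported as px:
# px.set_mapbox_access_token(get_mapbox_token())
```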
Next, we prepared an annotation to show the date and the number of schools mapped.
annotation = {
    'xref': 'paper',
    'yref': 'paper',
    'x': 0.1,
    'y': 0.1,
    'text': '',
    'showarrow': False,
    'arrowhead': 0,
    'font': {'size': 50, 'color': 'white'}
}
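The text field is left empty because it is filled in for each frame. As a minimal sketch of that step, a fresh dictionary could be built per frame instead of changing the shared one. The helper make_annotation and the constant BASE_ANNOTATION are hypothetical names introduced here for illustration.

```python
from datetime import date

# Base layout settings, copied from the annotation dictionary above.
BASE_ANNOTATION = {
    'xref': 'paper',
    'yref': 'paper',
    'x': 0.1,
    'y': 0.1,
    'text': '',
    'showarrow': False,
    'arrowhead': 0,
    'font': {'size': 50, 'color': 'white'},
}

def make_annotation(frame_date, n_schools):
    """Return a new annotation dict for one frame (hypothetical helper)."""
    label = dict(BASE_ANNOTATION)  # shallow copy; the base dict stays untouched
    label['text'] = f"{frame_date.isoformat()} schools mapped: {n_schools}"
    return label

print(make_annotation(date(2021, 11, 1), 1000000)['text'])
```

Building a copy per frame avoids any surprise from a label left over from an earlier frame.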
Since rendering a single frame takes a long time, we used multiprocessing on a machine with more resources (16 CPUs and 64 GB of memory).
import multiprocess as mp
import tqdm
The start date, which is the date of the first frame, was taken as the minimum of the ‘date’ field. The end date was taken as the maximum in the same manner.
start_date = min(df['date'])
end_date = max(df['date'])
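Each frame corresponds to one day between these two dates. The sketch below shows that date arithmetic with the standard library only; the helper frame_dates and the sample dates are made up for illustration. The same arithmetic works on pandas timestamps, but a ‘date’ column read from CSV as text has to be parsed first (for example with pd.to_datetime).

```python
from datetime import date, timedelta

def frame_dates(start, end):
    """Yield (offset_in_days, date) pairs from start up to, but not including, end."""
    for days in range((end - start).days):
        yield days, start + timedelta(days)

# Made-up dates for illustration only.
for days, d in frame_dates(date(2021, 10, 30), date(2021, 11, 2)):
    print(days, d.strftime('%Y-%m-%d'))
```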
Then a function was defined to render the frame for a given date.
def write_imgs(days):
    """Render the frame for start_date + days as a PNG; return 1 on success, 0 on failure."""
    try:
        date = start_date + timedelta(days)
        date_str = date.strftime('%Y-%m-%d')
        # Only schools mapped before this date appear in the frame
        subset = df[df['date'] < date]
        annotation['text'] = date_str + ' schools mapped: ' + str(len(subset))
        fig = px.scatter_mapbox(subset, center={'lat': 0, 'lon': 0}, lat="lat", lon="lon",
                                zoom=2, opacity=0.5, size='size', size_max=1.5,
                                mapbox_style='dark', title='schools')
        fig.update_layout(width=1920, height=1080, margin={"r": 0, "t": 0, "l": 0, "b": 0})
        fig.add_annotation(annotation)
        # The frames2 directory must already exist
        fig.write_image(f"frames2/frame_{days:04d}.png")
        return 1
    except Exception:
        return 0
The function was called through a multiprocessing pool, with the range of day offsets as its input.
p = mp.Pool(processes=8)
arr_days=range((end_date - start_date).days)
results = list(tqdm.tqdm(p.imap_unordered(write_imgs, arr_days), total=len(arr_days)))
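Because imap_unordered returns results in completion order, a list of 1s and 0s says how many frames failed but not which ones. One possible refinement is to return the day offset together with the status. The sketch below uses a stand-in function, render_frame, with simulated failures; both are assumptions for illustration, and no images are rendered.

```python
from multiprocessing.pool import ThreadPool

def render_frame(days):
    """Stand-in for write_imgs: returns (days, ok) instead of a bare 1 or 0."""
    try:
        if days % 4 == 3:  # simulate an occasional rendering failure
            raise RuntimeError("render failed")
        return days, True
    except Exception:
        return days, False

# ThreadPool is used only so this sketch runs anywhere; it exposes the same
# imap_unordered method as a process pool.
with ThreadPool(processes=4) as pool:
    results = list(pool.imap_unordered(render_frame, range(10)))

failed = sorted(days for days, ok in results if not ok)
print(failed)  # [3, 7]
```

The failed offsets can then be passed through the pool again to re-render only the missing frames.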
Finally, ffmpeg was called to combine all frames into one MP4 file.
# Read the frames from the directory they were written to (frames2)
os.system("ffmpeg -i frames2/frame_%04d.png -vcodec libx264 -crf 25 -pix_fmt yuv420p -r 30 -s 1920x1080 ./one_million_schools.mp4")
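An alternative to a single shell string is to pass ffmpeg an argument list through subprocess, which avoids shell quoting issues and can raise an error if ffmpeg fails. The sketch below only builds and prints the command; build_ffmpeg_cmd is a hypothetical helper, and the frame directory is a parameter so that it can match the directory the frames were written to (frames2 in the rendering function above).

```python
import subprocess

def build_ffmpeg_cmd(frame_dir, output, fps=30):
    """Build the ffmpeg argument list (hypothetical helper, same flags as above)."""
    return [
        "ffmpeg",
        "-i", f"{frame_dir}/frame_%04d.png",  # zero-padded frame numbers
        "-vcodec", "libx264",
        "-crf", "25",
        "-pix_fmt", "yuv420p",
        "-r", str(fps),
        "-s", "1920x1080",
        output,
    ]

cmd = build_ffmpeg_cmd("frames2", "./one_million_schools.mp4")
print(" ".join(cmd))
# To run it (requires ffmpeg on the PATH):
# subprocess.run(cmd, check=True)
```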
Check out the result at the link below!