
Visualization of One Million Schools Mapped by UNICEF’s Giga Initiative

Making a time-lapse animation using Plotly Express in Python

One million schools mapped by Giga initiative (Image by author)

Giga (http://gigaconnect.org), a joint initiative of UNICEF and ITU, aims to provide an internet connection to every school in the world. To achieve this goal, our data science team at Giga has been working to collect accurate location information for schools.

On 1 November 2021, we reached the one million mark for schools mapped in our system (https://projectconnect.unicef.org/map). To celebrate the Giga team’s achievement, we prepared a special visualization of the schools mapped over time, from the beginning until we reached the one million mark.

For the map visualization, we used Plotly Express, which has built-in support for time-lapse animations.

#import all libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import plotly.express as px
from datetime import datetime, timedelta
import os
import ffmpeg

However, since our dataset contains over one million school locations, our first attempt to animate it directly failed. Instead, we used Plotly Express to render the time-lapse animation one frame at a time, and then used ffmpeg to combine the rendered frames into a single MP4 file.

location_file = 'allschool_for_visualization.csv'
df = pd.read_csv(location_file, parse_dates=['date'])  # parse 'date' so the date arithmetic below works
df.head()
Information for schools from our dataset

For the background, we used Mapbox’s dark map style, which requires a Mapbox access token (API key).

px.set_mapbox_access_token('your api key')

Next, we prepared an annotation to show the date and the number of schools mapped.

annotation = {
    'xref': 'paper',  
    'yref': 'paper',  
    'x': 0.1,  
    'y': 0.1,  
    'text': '',
    'showarrow': False,
    'arrowhead': 0,
    'font': {'size': 50, 'color': 'white'}
}
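As a side note, the annotation text changes for every frame. One possible way to keep the shared settings untouched is to build a fresh dictionary per frame. The sketch below is only an illustration: `BASE_ANNOTATION` and `make_annotation` are hypothetical names that do not appear in our original code.

```python
from datetime import date

# Hypothetical template (not in the original code): the settings shared by all frames.
BASE_ANNOTATION = {
    'xref': 'paper',
    'yref': 'paper',
    'x': 0.1,
    'y': 0.1,
    'showarrow': False,
    'font': {'size': 50, 'color': 'white'},
}

def make_annotation(frame_date, n_schools):
    """Return a new annotation dict for one frame; the template is left unchanged."""
    text = frame_date.strftime('%Y-%m-%d') + '    schools mapped: ' + str(n_schools)
    return {**BASE_ANNOTATION, 'text': text}

# Example with made-up values
example = make_annotation(date(2021, 11, 1), 1000000)
```

The returned dictionary could then be passed to `fig.add_annotation` in place of the shared one.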

Since rendering a single frame takes a long time, we used multiprocessing on a machine with more CPUs and memory (16 CPUs and 64 GB of memory).

import multiprocess as mp
import tqdm

The start date, which is the date of the first frame, was taken as the minimum of the ‘date’ field. The end date was taken as the maximum in the same manner.

start_date = min(df['date'])
end_date = max(df['date'])

Then we defined a function to render the frame for a given number of days after the start date.

def write_imgs(days):
    try:
        # date shown in this frame: 'days' days after the first mapped school
        date = start_date + timedelta(days)
        date_str = date.strftime('%Y-%m-%d')
        # schools mapped strictly before this date
        mapped = df[df['date'] < date]
        annotation['text'] = date_str + '    schools mapped: ' + str(len(mapped))
        fig = px.scatter_mapbox(mapped, center={'lat': 0, 'lon': 0}, lat="lat", lon="lon",
                                zoom=2, opacity=0.5, size='size', size_max=1.5,
                                mapbox_style='dark', title='schools')
        fig.update_layout(width=1920, height=1080, margin={"r": 0, "t": 0, "l": 0, "b": 0})
        fig.add_annotation(annotation)
        fig.write_image(f"frames2/frame_{days:04d}.png")
        return 1
    except Exception:
        # report the failure to the caller instead of stopping the whole pool
        return 0
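The number shown in each frame is the count of schools dated strictly before that frame's date. As a rough illustration of that counting rule on its own, the sketch below uses only the standard library and made-up toy dates. `counts_per_frame` is a hypothetical helper, not part of our original code, and it sorts the dates once rather than filtering the full table for every frame.

```python
from bisect import bisect_left
from datetime import date, timedelta

def counts_per_frame(mapped_dates, start, n_frames):
    """For each frame k, count the dates that fall strictly before start + k days."""
    ordered = sorted(mapped_dates)
    # bisect_left returns how many entries are smaller than the given date
    return [bisect_left(ordered, start + timedelta(days=k)) for k in range(n_frames)]

# Toy data, made up for illustration only
toy_dates = [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 3)]
toy_counts = counts_per_frame(toy_dates, date(2021, 1, 1), 4)
```

With this rule, the first frame always shows zero schools, because no date is earlier than the start date.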

The function was called from a multiprocessing pool, with the range of day offsets as the input.

p = mp.Pool(processes=8)
arr_days=range((end_date - start_date).days)
results = list(tqdm.tqdm(p.imap_unordered(write_imgs, arr_days), total=len(arr_days)))
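Because the results come back unordered as plain 0/1 values, they do not say which frames failed. A gap in the numbered files may also cause ffmpeg to stop reading the sequence early, so it can be worth checking the output directory before encoding. The following is a minimal sketch under the assumption that frames are named `frame_0000.png`, `frame_0001.png`, and so on. `missing_frames` is a hypothetical helper, and the temporary directory only stands in for the real frames folder.

```python
import os
import tempfile

def missing_frames(frames_dir, n_frames):
    """Return the frame indices whose PNG file is not present in frames_dir."""
    return [
        k for k in range(n_frames)
        if not os.path.exists(os.path.join(frames_dir, f"frame_{k:04d}.png"))
    ]

# Toy check in a temporary directory: create frames 0 and 2, leave frame 1 out.
with tempfile.TemporaryDirectory() as tmp:
    for k in (0, 2):
        open(os.path.join(tmp, f"frame_{k:04d}.png"), "wb").close()
    gaps = missing_frames(tmp, 3)
```

Any indices returned this way could be passed to the rendering function again before the frames are combined.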

Finally, ffmpeg was called to combine all frames into one MP4 file.

os.system("ffmpeg -i frames2/frame_%04d.png -vcodec libx264 -crf 25 -pix_fmt yuv420p -r 30 -s 1920x1080 ./one_million_schools.mp4")
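An alternative to `os.system` is to pass the command to `subprocess.run` as a list of arguments, which avoids shell quoting and raises an error if ffmpeg fails. This is only a sketch with the same flags as above. `build_ffmpeg_command` and `combine_frames` are hypothetical names, and the `frames2` directory is assumed because that is where the rendering function writes its PNG files.

```python
import subprocess

def build_ffmpeg_command(frames_dir, output_path):
    """Assemble the ffmpeg call as an argument list."""
    return [
        "ffmpeg",
        "-i", f"{frames_dir}/frame_%04d.png",  # numbered input frames
        "-vcodec", "libx264",
        "-crf", "25",
        "-pix_fmt", "yuv420p",
        "-r", "30",
        "-s", "1920x1080",
        output_path,
    ]

def combine_frames(frames_dir, output_path):
    """Run ffmpeg; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_ffmpeg_command(frames_dir, output_path), check=True)

# Only builds the command here; call combine_frames(...) to actually run ffmpeg.
cmd = build_ffmpeg_command("frames2", "./one_million_schools.mp4")
```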

Check out the result at the link below!

Giga on LinkedIn: Giga maps 1 million schools

