Python Hands-on Tutorial
This work has been done entirely using publicly available data and was co-authored with Kai Kaiser. All errors and omissions are those of the author(s).

Mapping the distribution of people is vital to a host of public policy questions across our planet’s different country settings. The ability to capture the geographic distribution of the population and its key characteristics is integral to measuring exposure to disasters and climate change, differences in access to key services such as health, and environmental and land-use pressures. Whether for planning, budgeting, or regulatory purposes, evidence-based decision-making needs sufficiently granular and timely population data.
A new generation of high-resolution population count layers stands to make an increasingly powerful contribution to public sector decision-making, particularly in developing countries. These mapping layers rely on non-traditional methods of data collection, including the use of satellite imagery. As a result, they can provide population estimates for any grid cell on Earth at resolutions down to 30 meters. Their latest updates can be accessed online through Application Programming Interfaces (APIs), making them a potentially very valuable asset for data-driven decision-makers.
These high-resolution population maps address some critical limitations of traditional administrative or statistical population census data. Census data are typically updated infrequently, with most countries undertaking a census only roughly every ten years. They are generally presented in tabular form by administrative classification, which limits analytics and visualization options compared to more granular grid-based layers. Household-level census data is rarely collected on a geo-referenced basis, or disclosed at that level. The administrative registers of births and deaths maintained by national and subnational governments are also not always reliable or up to date, especially in low- and middle-income countries.
Datasets such as the Facebook Research High Resolution Settlement Layer (HRSL) and WorldPop, which employ new-generation techniques for high-resolution population estimates, can be readily deployed for a range of descriptive and prescriptive analytics.
The WorldPop Project was initiated in 2013 with the goal of providing open access to population and demographic datasets to support development, disaster, and health applications. It integrates neighborhood-scale micro-census surveys undertaken in small areas with national-level satellite imagery and digital mapping. In short, WorldPop uses machine learning (random forest) models to extrapolate high-resolution national population estimates from relatively sparse micro-census data, including predicting populations in unsurveyed locations. The estimates are available yearly for 2000-2020 (as of November 2020). The gridded population data, or raster images, are available at a spatial resolution as fine as 3 arc seconds (approximately 100m at the equator). This yearly availability of high-resolution population counts makes it easier to track population growth and dynamics at national and regional levels.
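As a rough check on the "3 arc seconds is approximately 100m" figure, the sketch below converts an angular cell size into metres. It assumes a spherical Earth with a mean radius of 6,371 km, which is our simplification and not a WorldPop parameter; the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def cell_size_m(arc_seconds, latitude_deg=0.0):
    """Return (north-south, east-west) cell size in metres for an angular cell size."""
    angle_rad = math.radians(arc_seconds / 3600)  # arc seconds -> degrees -> radians
    north_south = EARTH_RADIUS_M * angle_rad
    # Meridians converge toward the poles, so east-west width shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

ns, ew = cell_size_m(3)            # at the equator
ns_hn, ew_hn = cell_size_m(3, 21)  # roughly the latitude of Hanoi
print(f"equator:  {ns:.1f} m x {ew:.1f} m")
print(f"21 deg N: {ns_hn:.1f} m x {ew_hn:.1f} m")
```

Under this approximation a 3 arc-second cell is a little over 90 m on a side at the equator and narrower east-west further north, so "100m" is best read as a nominal label rather than an exact cell size.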
Another such effort is Facebook's collaboration with the Center for International Earth Science Information Network (CIESIN), which uses artificial intelligence to identify buildings from satellite imagery and estimate the population at a 30-meter resolution. The census-based population figures are also adjusted at the national level to match the UN country population estimates for the years 2015 and 2020.
In practice, policymakers may not yet be familiar enough with how to access, analyze, apply, and ultimately adopt these data for their decision-making purposes. Greater familiarity will also help them understand the possible benefits and applications, as well as the limitations, of these new data resources.
To support data-informed and data-driven decision-making, online Jupyter Notebook Python Environments (JPNEs) allow for accessible and replicable ways of realising data analytics and visualisation.
JPNEs integrate programming code, intuitive description, and numeric and visual outputs (cite). When implemented online, they do not require users to install or download any local software. JPNEs are powerful not just for delivery work but, above all, for facilitating closer collaboration between domain and public sector experts and data scientists.
In this blog, we explore the WorldPop Population Counts (raster format at 100m resolution, downloaded as a tif file) and the High Resolution Population Density Maps from Facebook (vector format at 30m resolution, downloaded as a csv file) with Python in a JPNE, and visualise the population counts at different administrative units for Vietnam.
To extract the estimated population count for different administrative units, we also need data representing the digital boundaries of Vietnam as shapefiles (a simple non-topological format for storing geometric location and attribute information of features represented as a polygon or area).
Thus, this analysis requires three datasets: population data from WorldPop and from Facebook, and administrative boundaries data from GADM. The analysis includes four steps:
- Load and Explore data on Administrative Boundaries from GADM
- Load, Explore, and Visualize Population data from WorldPop
- Load, Explore, and Visualize Population data from Facebook
- Compare and summarise results
Load and Explore data on Administrative Boundaries from GADM
GADM, the Database of Global Administrative Areas, is a high-resolution database of country administrative areas, the latest version of which delimits 386,735 administrative areas. We downloaded the country-level data for Vietnam, which resulted in a folder with the following structure.

The index in each file name (0, 1, 2, 3) denotes the administrative level at which the boundaries are available.
Vietnam is divided into fifty-eight provinces and five municipalities under the command of the central government, making a total of 63 polygons under ADM Level 1. The provinces of Vietnam are then subdivided into second-level administrative units, namely districts, provincial cities, and district-level towns. The municipalities are subdivided into rural districts, district-level towns, and urban districts, which are further subdivided into wards. The GADM data for Vietnam thus includes 686 administrative units at level 2 and 7658 at level 3.
The GADM shapefiles are read with geopandas.
import geopandas
vietnam_administrative_boundaries = geopandas.read_file('Data/gadm36_VNM_shp/gadm36_VNM_3.shp')
vietnam_administrative_boundaries['NAME_0'].unique()
> Vietnam
vietnam_administrative_boundaries['NAME_1'].nunique()
> 63
vietnam_administrative_boundaries['NAME_2'].nunique()
> 686
vietnam_administrative_boundaries['NAME_3'].nunique()
> 7658

Load and Explore Population Data from WorldPop
We downloaded the people per pixel (PPP) data for Vietnam from WorldPop in raster format at 100m resolution, adjusted to match UN national estimates. We use rasterio, a GDAL- and NumPy-based Python library, to read the raster data downloaded as a tif file.
import rasterio
vietnam_worldpop_raster = rasterio.open('vnm_ppp_2020_UNadj.tif')
Raster data is any pixelated (or gridded) data where each pixel is associated with a specific geographical location. The value of a pixel can be continuous (e.g. elevation) or categorical (e.g. land use). A geospatial raster differs from a digital photo only in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster’s extent and cell size, the number of rows and columns, and its coordinate reference system (CRS). A raster dataset contains one or more layers called bands. For example, a color image has three bands (red, green, and blue), a digital elevation model (DEM) has one band (holding elevation values), and a multispectral image may have many bands.
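To make the link between pixels and locations concrete, the sketch below maps a (row, column) index to the coordinates of the cell centre using only the raster's top-left corner and cell size, for a north-up raster. The corner coordinates and function names are made-up illustrative values, not those of the WorldPop file; with rasterio, the equivalent mapping is available through the opened dataset's `transform` attribute and its `xy(row, col)` method.

```python
def pixel_center(row, col, west, north, cell_size):
    """Coordinates (x, y) of the centre of cell (row, col) in a north-up raster."""
    x = west + (col + 0.5) * cell_size
    y = north - (row + 0.5) * cell_size  # rows count downward from the northern edge
    return x, y

def pixel_index(x, y, west, north, cell_size):
    """Inverse mapping: the (row, col) of the cell that contains point (x, y)."""
    col = int((x - west) // cell_size)
    row = int((north - y) // cell_size)
    return row, col

# Hypothetical raster: top-left corner at 102.0 E, 24.0 N, cells of 3 arc seconds
WEST, NORTH, CELL = 102.0, 24.0, 3 / 3600

x, y = pixel_center(10, 20, WEST, NORTH, CELL)
print(x, y)
print(pixel_index(x, y, WEST, NORTH, CELL))
```

The extent, cell size, and row/column counts mentioned above are exactly the pieces of information this mapping needs; the CRS then says what the resulting x and y values mean (here, degrees of longitude and latitude).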
print('No. of bands:',(vietnam_worldpop_raster.count))
> No. of bands: 1

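# Calculating total population of Vietnam
# Read band 1 as a NumPy array (assumed step; the variable was not defined above)
vietnam_worldpop_raster_tot = vietnam_worldpop_raster.read(1)
# Keep only positive cells, which also drops negative nodata values
worldpop_raster_nonzero = vietnam_worldpop_raster_tot[vietnam_worldpop_raster_tot > 0]
population_worldpop = worldpop_raster_nonzero.sum()
print(round(population_worldpop / 1000000, 2), 'million')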
> 97.34 million
The raster layer gives a total population of 97.34 million in Vietnam. We then mask this raster layer with the polygons extracted from the GADM file to identify population counts within each of the 63 provinces and municipalities (level 1 administrative units) of Vietnam. The following function returns the population count of a raster_layer within a vector_polygon.
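As a dependency-free illustration of what such a masking step computes, the sketch below sums the cells whose centres fall inside a polygon, using a ray-casting point-in-polygon test. This is not the notebook's implementation; in practice `rasterio.mask.mask` would normally do the clipping. The toy grid, polygon, function names, and the `nodata` placeholder value are all assumptions made for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def population_in_polygon(grid, west, north, cell_size, polygon, nodata=-99999):
    """Sum the values of grid cells whose centre falls inside the polygon."""
    total = 0.0
    for row, values in enumerate(grid):
        y = north - (row + 0.5) * cell_size
        for col, value in enumerate(values):
            if value == nodata:
                continue  # skip cells flagged as having no data
            x = west + (col + 0.5) * cell_size
            if point_in_polygon(x, y, polygon):
                total += value
    return total

# Toy 3 x 3 raster of people per cell (made-up numbers), 1-unit cells, top-left at (0, 3)
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, -99999],
]
# Rectangle covering the left two columns of the grid
polygon = [(0, 0), (2, 0), (2, 3), (0, 3)]
result = population_in_polygon(grid, west=0, north=3, cell_size=1, polygon=polygon)
print(result)  # 27.0: the six cells in the left two columns
```

Applying a function of this kind once per GADM polygon yields one population count per administrative unit, which is the quantity mapped in the choropleth.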
The code creates the following result by adding a column called population_count_wp which has the population estimate of the ADM Level 1 based on the WorldPop raster data. We then use the Plotly Choropleth map to visualize the population count using the code snippet below.

Load and Explore Population Data from Facebook
The Facebook population map, which estimates the number of people living within 30-meter grid tiles for Vietnam, is available for download at HDX as either a tif file or a CSV file. Since we already demonstrated the preprocessing of tif data with WorldPop, we work here with the CSV file, which is downloaded in the following format.

The CSV file consists of the latitude, longitude, and population estimate at each point for 2015 and 2020. The Facebook data estimates a total population of 98.16 million in Vietnam.
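For a file in this shape, the national total is simply the sum of one population column. The sketch below shows this with the standard library on a three-row stand-in. The column names (`latitude`, `longitude`, `population_2015`, `population_2020`) and the values are assumptions for illustration, so they should be checked against the header of the downloaded file.

```python
import csv
import io

# Stand-in for the downloaded CSV (made-up values); the real file is far larger
sample_csv = io.StringIO(
    "latitude,longitude,population_2015,population_2020\n"
    "21.0285,105.8542,12.5,13.1\n"
    "10.8231,106.6297,20.0,22.4\n"
    "16.0544,108.2022,8.25,8.5\n"
)

def total_population(csv_file, column):
    """Sum one population column, reading the file row by row."""
    return sum(float(row[column]) for row in csv.DictReader(csv_file))

total_2020 = total_population(sample_csv, "population_2020")
print(round(total_2020, 2))
```

Reading row by row keeps memory use flat, which can matter for a 30-meter point file; in a notebook, loading the file into a pandas dataframe and summing the column would be the more usual route.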
To use the geospatial tools and techniques demonstrated with the WorldPop data, we need to convert this dataframe into a geodataframe, which includes a geometry field.
We then obtain the population count within each administrative boundary by masking the vector layer with the boundary polygons.
We then plot the choropleth map with Plotly using the code below.

Conclusions
With the two mapping layers now previewed in a JPNE, we turn to comparing results through the administrative-unit lens familiar to most policymakers. To do this, we can visualize how WorldPop and Facebook results compare with the scatterplots presented below. Points lying on the 45-degree line indicate that the two results are identical for a given locality.
At the provincial/municipality level, the population counts from WorldPop and Facebook show a high correlation.
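One way to quantify this kind of agreement is with per-unit ratios and a Pearson correlation coefficient. The sketch below uses made-up population figures for four hypothetical units, not the results of this analysis; the unit names, numbers, and function name are purely illustrative.

```python
import math

# Hypothetical population counts (millions) for four administrative units
worldpop = {"Unit A": 8.1, "Unit B": 2.4, "Unit C": 1.2, "Unit D": 3.6}
facebook = {"Unit A": 7.9, "Unit B": 2.5, "Unit C": 1.1, "Unit D": 3.7}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

units = sorted(worldpop)
ratios = {u: facebook[u] / worldpop[u] for u in units}  # 1.0 means identical counts
r = pearson([worldpop[u] for u in units], [facebook[u] for u in units])

for u in units:
    print(f"{u}: Facebook/WorldPop = {ratios[u]:.2f}")
print(f"Pearson r = {r:.3f}")
```

A ratio of 1.0 corresponds to a point on the 45-degree line; a high correlation alone does not guarantee this, since two sources can be strongly correlated while one is systematically higher than the other.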

At the second administrative level, especially in some of the municipalities like Binh Duong and Ho Chi Minh, Facebook gives a relatively smaller population count compared to WorldPop. Whether this is an issue ultimately depends on the question being asked. JPNEs allow for quick reviews of the extent to which using one data source over another makes a substantive difference for the issue at hand.
Developments in digital technologies, in terms of both platforms (e.g., JPNEs) and data (Facebook Research HRSL and WorldPop), provide a powerful combination for addressing a range of policy questions. But these require practical collaboration between domain specialists (e.g., government officials in planning, finance, or health) and data scientists, engineers, and programmers.
This practitioner’s blog was generated as part of the Disruptive Technologies for Public Asset Governance (DT4PAG) program for Vietnam, initiated by the World Bank in Vietnam with the support of the Swiss State Secretariat for Economic Affairs (SECO).

DT4PAG promotes the use of cloud-based and open-source platforms and data, along with practitioners’ learning-by-doing skills building, to better inform green, inclusive, and resilient development. The views expressed in this note are those of the authors, and all remaining errors and omissions are our own.
The full code for this tutorial can be found in the GitHub repo. Even if you are not a Python programmer, we hope this contribution gives you an intuitive sense of the possibilities and processes for leveraging this type of data for a new generation of decision support.