
This article was largely inspired by the subreddit r/peopleliveincities. Humans have spread themselves far and wide across Planet Earth, touching every continent and casting their impact across vast swathes of the world. However when you actually look at population density maps you tend to find that the majority of people cluster themselves together, in spite of the vast, empty spaces around the world. If the population of the world was spread evenly then there would be roughly 50 people per square kilometre. In reality there are areas that are completely unpopulated and entire countries with population densities of thousands of people per square kilometre. This is a genuinely fascinating concept and with the right population density map you can pick out individual cities, towns and even villages as well as identify the worlds truly remote areas. This is what you are seeing in the cover image, dark hotspots denoting densely populated cities and towns mixed amongst lighter areas with very few people. So in this article I am going to outline how you can use open source population density data to build your own population density maps using Python.
Data Preparation
There are a lot of population density datasets out there however for the purpose of this exercise we will use the GHSL — Global Human Settlement Layer. The data is freely available and terms of service can be found here.
The first thing to do is to download the data to wherever you like to do your data visualisations. The data is available here and the specific file needed is the GHS population, 30 arcsec dataset from 2018. It is important to note that this tutorial will require the WGS84 projection. The data comes in the form of a tif file and can be read with rasterio.
This dataset is large and working on a standard laptop will require long execution times so it is important to do some basic exploration to get a sense of what you are dealing with.
The units of the data are 30 arc seconds and each tile contains the number of people recorded for that tile. So the units are number of people per square kilometre (30 arc seconds = 1km).
The max value is 459434.69 people per square km. It is important to note that while this value is larger than the values quoted following a google search for cities or countries with the largest population density, this corresponds to a single km and values for cities and countries are average values. For example, the population density in central park in NY is very different to that of some of the residential districts.
The min values is -200 which corresponds to null values. No data and no people are going to be treated as the same thing in this map so below any values below 0 will be set to zero. The first thing that I thought I would try is a log plot. I try to avoid log plots if possible because the resulting plot can be hard to interpret and the data is often meaningless. However, log plots are good when there is a small population of values orders of magnitude greater than the majority (cities) and also in plots where you are trying to show multiplicative factors (Himalayan mountains vs Shanghai). Our data fits into both of these categories so it is appropriate in this case.
Plotting
We are going to use a hot
colour map. I want the bottom of the colourmap to be a neutral background colour so below I am taking the hot colourmap object, extracting 460 colours from it as a list, adding a background colour to the start of that list and then building a new colourmap from that list. Because the background colour has been added as the first colour in the colourmap, the smallest values (0 in this case) will be mapped to that colour.
The data can then be plotted below using a log scale and the colourmap that has just been created. The data looks largely as expected however it is very messy and not particularly appealing. The reason for this is that humans tend to live together so there is a large population of high density values in the data as well as a large population of low density values. So the the high density values will be mapped to one end of the colourmap while the low density values are mapped to the other end. The result is a similar colour for all areas we might expect to show up as "populated".

Population density maps tend to have custom scales. For example, one of the first results on google images uses the scale 0, 1, 5, 10, 20, 50, 100, 200, 1000 people per square kilometre so in this case they are just treating all areas with a high population as the same. These maps are qualitative, not quantitative because at the end of the day, we want to produce an image that is thought provoking and eye catching. Not an image that will be used to plan pandemic evacuation plans. So with a bit of experimentation I have settled on a slightly different set of values to map our population values to. Objects that use colormaps in matplotlib
by default linearly map the colours in the colormap from the minimum value in the dataset to the maximum value in the dataset at discrete intervals determined by the number of colours in the colourmap. The BoundaryNorm
class allows you to map colours in a colourmap to a set of custom values. In the code below we are creating a colourmap with 10 colours and a BoundaryNorm
object which will map the values in our data to the colourmap according to the pre-defined levels.

This image is starting to look much more like it and zooming in shows off exactly what I think we would expect to find. However that is difficult to do through this particular communication medium and you may decide that you want to generate population maps for specific countries. So below we will make population maps for specific regions of the world. To do this we need to isolate parts of the population data tif specific to a country. Luckily, rasterio
has a useful method to clip rasters based upon a georeferenced shape, e.g. a polygon. In this case I have used the polygons which are part of the NaturalEarth dataset to clip the raster using the rasterio.mask.mask
function.
I want to start with western Europe, so I need to build a GeoDataFrame
of polygons corresponding to those countries. I loaded the NaturalEarth shapefile and extracted the polygons for the UK, France, Austria, Germany, Czechia, Italy, Denmark, Luxembourg, Belgium, Switzerland, Spain, Portugal, the Netherlands and Ireland using geopandas
. Unfortunately France, Spain, Portugal and the Netherlands all have overseas territories which are part of their NaturalEarth geometries. So the code below extracts the part of the mainland Europe parts of those geometries and drops the rest.
The mask
function can then be used to match the raster data to the western Europe polygon. It takes the tif_file
object created at the beginning and a list of mapped geometries and yields a new raster, excluding values outside of the geometries.

For fun, below are some maps produced for the USA, China, India and Egypt. Countries with fairly interesting and unique population densities.




Conclusions
There we have it, a beautiful map showing the how to generate eye catching data visualisations showing where people live on Planet Earth. This is the first of many articles planned to show how to make geospatial data look amazing with Python, please subscribe so you don’t miss them. I also love feedback so please let me know how you would do it differently or suggest changes to make it look even more awesome. I post data visualisations every week on my twitter account, have a look if geospatial data vis is your thing https://twitter.com/PythonMaps
The extra pennies that medium provides really helps to keep this project going so if you are a fan and fancy that cancelling your account and signing up again using my referral, that would be great. https://pythonmaps.medium.com/membership
References
GHS Dataset – Schiavina, Marcello; Freire, Sergio; MacManus, Kytt (2019): GHS population grid multitemporal (1975–1990-2000–2015), R2019A. European Commission, Joint Research Centre (JRC)
Concept & Methodology:— Freire, Sergio; MacManus, Kytt; Pesaresi, Martino; Doxsey-Whitfield, Erin; Mills, Jane (2016): Development of new open and free multi-temporal global population grids at 250 m resolution. Geospatial Data in a Changing World; Association of Geographic Information Laboratories in Europe (AGILE). AGILE 2016