The world’s leading publication for data science, AI, and ML professionals.

The Battle of Choropleths – Part 1

Using Geopandas to Create Stunning Choropleths

PYTHON. DATA SCIENCE. GEOVISUALIZATION

Image by Author: Choropleth with Diverging Colormap

So far in this article series, we have discussed seven (7) options for creating interactive geoplots, covering a wide range of tools that differ in ease of use and customizability.

There are fewer options for choropleths than for geoscatterplots, so in this series we will include non-interactive tools as well.

For this part, let us start with the most basic option we have: Geopandas.

CHOROPLETHS

To begin, a choropleth (sometimes called a color theme) is a thematic map that uses a color scale (color intensity) to represent the value of each region. We use the term region because it often corresponds to an administrative unit (a city, province, or country).

As such, creating a choropleth map requires shapefiles that define the boundaries of those regions.

The more intense the color (toward the right end of the color scale), the greater the value for that region. For example, in a sequential colormap like "Greens", darker shades of green represent greater values.

From Matplotlib.Org
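This minimal sketch makes the "darker means larger" rule concrete. It asks Matplotlib which shade of "Greens" a value receives after linear normalization. The helper names `shade_for` and `luminance` are hypothetical and the values are made up. The luminance formula (Rec. 709 weights applied directly to RGB) is only a rough proxy for perceived lightness.

```python
import matplotlib
from matplotlib.colors import Normalize


def shade_for(value, vmin, vmax, cmap_name="Greens"):
    """Return the RGBA color a sequential colormap assigns to `value` (hypothetical helper)."""
    norm = Normalize(vmin=vmin, vmax=vmax)   # maps vmin -> 0.0 and vmax -> 1.0 linearly
    cmap = matplotlib.colormaps[cmap_name]   # colormap registry lookup (Matplotlib >= 3.5)
    return cmap(norm(value))


def luminance(rgba):
    """Rough brightness proxy using Rec. 709 weights on the RGB channels."""
    r, g, b, _ = rgba
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Made-up values on an arbitrary 0-100 scale
values = [0, 25, 50, 75, 100]
shades = [shade_for(v, 0, 100) for v in values]
for v, s in zip(values, shades):
    print(v, round(luminance(s), 3))
```

Each printed brightness is lower than the one before it. That ordering is what lets a reader rank regions by shade alone.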

So, when do you prefer to use choropleths over a geoscatterplot?

Choropleths are used when you want to compare aggregate values across regions, such as the mean, median, minimum, or maximum of some statistic. A geoscatterplot, on the other hand, shows the dispersion of individual observations rather than aggregates.

It can still be used to compare individual observations if a legend is employed (as in our coffee shop example).
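A choropleth shows one number per region, so individual observations have to be aggregated first. Here is a minimal pandas sketch of that step. The data and the column names `region` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical individual observations (e.g., one row per shop), each tagged with its region
obs = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B", "C"],
    "value": [10.0, 20.0, 30.0, 5.0, 15.0, 40.0],
})

# Collapse to ONE number per region, which is what a choropleth colors.
# Swap "mean" for "median", "min", or "max" depending on the statistic you want to map.
per_region = obs.groupby("region", as_index=False)["value"].agg("mean")
print(per_region)
```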

Let us try a choropleth of GDP per capita by country. The benefit of a choropleth here is that we can immediately see whether countries with higher GDP per capita are located near one another.

Let us now proceed to our coding.

PRELIMINARIES

LOADING AND PREPROCESSING OF DATA

For our dataset, we will use the GDP per capita dataset (constant 2015 US$) from the World Bank databank.

PACKAGES

The most important package to install and use is geopandas. In my experience, it is easy to install on a MacBook but a little more difficult on Windows. For that case, I prepared an article that may help.

For our purpose, we likewise need to install pyproj and mapclassify.
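Before running the notebook, a quick check like the sketch below can confirm that these packages are importable. The helper `missing_packages` is hypothetical. Anything it reports can be installed with `pip install geopandas pyproj mapclassify` (or the conda equivalent).

```python
import importlib.util

REQUIRED = ["geopandas", "pyproj", "mapclassify"]


def missing_packages(names):
    """Return the subset of `names` that the import system cannot find (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing:", missing or "none")
```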

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import geopandas as gpd
# Jupyter magic: render figures inline (omit outside a notebook)
%matplotlib inline
```

LOADING SHAPEFILES

For essentially all choropleths in this series, we will need shapefiles for the regions we want to map. Shapefiles provide the administrative boundaries of a location and can be as detailed as a province, a city, or a custom region.

Since the information we need to map out is for countries, we will be using the world shapefile with country-level geometries.

Shapefiles can be downloaded from GADM.
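A shapefile is not a single file. The .shp file has to sit in the same folder as its sidecar files, and the format requires .shx and .dbf alongside it. The sketch below reports which sidecar files are missing. It is built around a hypothetical helper `check_shapefile` and assumes the folder layout used later in this article.

```python
from pathlib import Path

# In the ESRI Shapefile format, .shp, .shx and .dbf are mandatory; .prj (the CRS) is optional
MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def check_shapefile(shp_path):
    """Return (missing mandatory, missing optional) sidecar extensions (hypothetical helper)."""
    shp = Path(shp_path)
    missing_required = [ext for ext in MANDATORY if not shp.with_suffix(ext).exists()]
    missing_optional = [ext for ext in OPTIONAL if not shp.with_suffix(ext).exists()]
    return missing_required, missing_optional


req, opt = check_shapefile(
    "shapefiles/world-administrative-boundaries/world-administrative-boundaries.shp")
print("missing mandatory:", req, "| missing optional:", opt)
```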

CODING

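```python
### LOAD GDP PER CAPITA DATA
# skiprows=4 skips the metadata lines above the header row of the World Bank download
df = pd.read_csv('data/gdp_per_capita.csv', skiprows=4)
df = df.loc[:, ['Country Name', 'Country Code', '2020']]  # keep only the 2020 values
df.head()
```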
Image by the Author: First Five Series of the Dataset
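Countries without a 2020 figure appear as empty cells in this file. The sketch below assumes the file layout implied by `skiprows=4` (four metadata lines above the header). It uses a small in-memory stand-in with made-up numbers instead of the real file. It adds one optional cleaning step: coercing `2020` to numbers and listing the countries that have no 2020 value before dropping them, so it is explicit which countries will be left blank on the map.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/gdp_per_capita.csv: 4 metadata lines, then the header.
# The layout is assumed and the numbers are made up for illustration.
raw = io.StringIO(
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2022-01-01",\n'
    '\n'
    '"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020",\n'
    '"Country X","XXX","GDP per capita (constant 2015 US$)","NY.GDP.PCAP.KD","1000.5","950.25",\n'
    '"Country Y","YYY","GDP per capita (constant 2015 US$)","NY.GDP.PCAP.KD","2000","",\n'
)

df = pd.read_csv(raw, skiprows=4)                              # skip the 4 metadata lines
df = df.loc[:, ["Country Name", "Country Code", "2020"]].copy()
df["2020"] = pd.to_numeric(df["2020"], errors="coerce")        # stray text or blanks -> NaN

no_data = df.loc[df["2020"].isna(), "Country Name"]            # countries with no 2020 value
print("No 2020 value:", no_data.tolist())
df = df.dropna(subset=["2020"])
print(df)
```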
```python
### LOAD THE SHAPEFILES
gdf = gpd.read_file('shapefiles/world-administrative-boundaries/world-administrative-boundaries.shp')
gdf.head()
```
Image by the Author: Shapefile of the World. Bonus: You get to see the geometry of a location by calling its geometry data
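```python
### MERGE DATA
# Inner merge (the default): countries whose names differ between the two sources are dropped
merged = gdf.merge(df, left_on='name', right_on='Country Name')
merged.head()
```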
Image by the Author: Merged DataFrame for Shapefiles and GDP per Capita

Important: not every country name in the World Bank data matches the name used in the shapefile. The names need to be synchronized before merging; otherwise the unmatched countries are simply dropped from the map.
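One way to find the mismatches is an outer merge with pandas' `indicator=True`, which labels every row as `both`, `left_only`, or `right_only`. The sketch below shows this. The helper `unmatched` and the small frames are hypothetical stand-ins for `gdf` and `df`, and the `name_fixes` mapping is only an illustration. If your shapefile also carries an ISO 3166-1 alpha-3 column, merging on the World Bank's `Country Code` may avoid most name issues. Check `gdf.columns` for the actual column name first.

```python
import pandas as pd

# Hypothetical stand-ins: names as they might appear in the shapefile vs. the World Bank file
shapes = pd.DataFrame({"name": ["Bahamas", "France", "Egypt"]})
gdp = pd.DataFrame({"Country Name": ["Bahamas, The", "France", "Egypt, Arab Rep."],
                    "2020": [1.0, 2.0, 3.0]})


def unmatched(left, right, left_on, right_on):
    """Outer-merge with an indicator column and return rows that found no partner."""
    m = left.merge(right, left_on=left_on, right_on=right_on, how="outer", indicator=True)
    return m.loc[m["_merge"] != "both", [left_on, right_on, "_merge"]]


print(unmatched(shapes, gdp, "name", "Country Name"))

# Fix the mismatches with an explicit (illustrative) mapping, then check again
name_fixes = {"Bahamas, The": "Bahamas", "Egypt, Arab Rep.": "Egypt"}
gdp["Country Name"] = gdp["Country Name"].replace(name_fixes)
print(unmatched(shapes, gdp, "name", "Country Name"))  # empty -> every row matched
```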

```python
### PLOT
cmap = 'Greens'
# Create a figure and axes so we can attach other objects (e.g., a colorbar) later
fig, ax = plt.subplots(figsize=(30, 25))
merged.plot(column='2020',
            ax=ax,
            cmap=cmap,
            scheme='quantiles',   # requires mapclassify
            legend=True)
ax.axis('off')
```

A few things regarding our code:

  • cmap – the colormap.
  • fig, ax – creating the figure and axes explicitly is important if we want to add a colorbar (the legend we use for colormaps) later.
  • scheme – the scheme parameter requires that we install mapclassify. With 'quantiles', countries are grouped into classes (five by default) that each contain roughly the same number of countries, so we do not have to decide the number of colors manually. This is usually helpful, as too many shades are hard to tell apart.
  • ax.axis('off') – removes the x- and y-axes, which show the map coordinates.
Image by the Author
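To see roughly what `scheme='quantiles'` does, the sketch below reimplements quantile classification with NumPy. It is a simplified stand-in, not mapclassify's code. `quantile_classes` is a hypothetical helper, and mapclassify's exact bin edges and tie handling may differ. The values are made up but, like GDP per capita, have a long right tail.

```python
import numpy as np


def quantile_classes(values, k=5):
    """Assign each value to one of k classes holding roughly equal counts (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Upper class edges at the 1/k, 2/k, ..., 1 quantiles
    edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:])
    # Class i holds values in (edges[i-1], edges[i]]; the smallest values land in class 0
    return np.searchsorted(edges, values, side="left"), edges


# Made-up, long-tailed values
vals = [500, 800, 1200, 2000, 3500, 6000, 9000, 15000, 40000, 110000]
classes, edges = quantile_classes(vals, k=5)
print(classes)  # -> [0 0 1 1 2 2 3 3 4 4]
print(edges)
```

With five equal-width linear bins over this range, eight of the ten values would fall into the lightest bin, because the single largest value stretches the scale. Quantile classes instead give every color band the same number of observations.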

CUSTOMIZATION

To further improve our map, we can:

  • Add a colorbar
  • Choose a different color scheme
  • Add text labels for countries with GDP per capita above USD 100,000

COLORBAR

The colorbar is the legend that shows which range of values corresponds to each color or shade.

```python
# Create the colorbar: map the 2020 values linearly onto the colormap
norm = colors.Normalize(vmin=merged['2020'].min(), vmax=merged['2020'].max())
cbar = plt.cm.ScalarMappable(norm=norm, cmap='Greens')
```

Note that the colorbar above is scaled from the minimum to the maximum of the 2020 column, the same column we used to color the map.

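```python
# [left, bottom, width, height] in figure coordinates; left=1 places the colorbar just
# past the right edge of the figure, which still shows in Jupyter because the inline
# backend crops the output to a tight bounding box
cax = fig.add_axes([1, 0.1, 0.03, 0.8])
cbr = fig.colorbar(cbar, cax=cax)
# Remove the legend created by legend=True
ax.get_legend().remove()
# Enlarge the colorbar tick labels
cax.tick_params(labelsize=18)
fig
```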
Image by the Author: Added the colorbar
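One caveat applies here. The colorbar above uses a linear Normalize between the minimum and the maximum, while the map itself is colored by quantile classes. As a result, the shades on the colorbar do not line up with the classes on the map. The sketch below approximates the quantile classes with Matplotlib's BoundaryNorm instead. The helper name is hypothetical and the values are made up. The bin edges come from NumPy rather than mapclassify, so they may differ slightly from the map's classes.

```python
import numpy as np
import matplotlib
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm


def quantile_colorbar_mappable(values, cmap_name="Greens", k=5):
    """Build a ScalarMappable whose color steps sit on quantile class edges (hypothetical helper).

    Assumes the quantile edges are distinct, which holds for continuous data.
    """
    values = np.asarray(values, dtype=float)
    bounds = np.quantile(values, np.linspace(0, 1, k + 1))  # min, 4 inner edges, max
    cmap = matplotlib.colormaps[cmap_name]
    norm = BoundaryNorm(bounds, ncolors=cmap.N)              # spread the k bins over the colormap
    return ScalarMappable(norm=norm, cmap=cmap), bounds


vals = [500, 800, 1200, 2000, 3500, 6000, 9000, 15000, 40000, 110000]
sm, bounds = quantile_colorbar_mappable(vals)
print(bounds)
```

In the notebook, one could try `sm, bounds = quantile_colorbar_mappable(merged['2020'].dropna())` followed by `fig.colorbar(sm, cax=cax, ticks=bounds)` in place of the Normalize-based colorbar. The colorbar then shows five discrete bands instead of a smooth gradient.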

DIVERGING COLORMAPS

Diverging colormaps are often preferred when we want to emphasize the contrast between low and high values. In this case, let us try RdYlGn:

```python
# Try a diverging colormap
cmap = 'RdYlGn'
# Create a figure to attach other objects to later
fig, ax = plt.subplots(figsize=(30, 25))
merged.plot(column='2020',
            ax=ax,
            cmap=cmap,
            scheme='quantiles',
            legend=True)
ax.axis('off')
# Create the colorbar
norm = colors.Normalize(vmin=merged['2020'].min(), vmax=merged['2020'].max())
cbar = plt.cm.ScalarMappable(norm=norm, cmap=cmap)
cax = fig.add_axes([1, 0.1, 0.03, 0.8])
cbr = fig.colorbar(cbar, cax=cax)
# Remove the legend
ax.get_legend().remove()
# Change the tick label size for the colorbar
cax.tick_params(labelsize=18)
fig
```
Image by the Author: Diverging Colormap
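Diverging colormaps read best around a meaningful midpoint. With the linear Normalize used for the colorbar above, the yellow middle of RdYlGn sits halfway between the minimum and the maximum. For long-tailed data like GDP per capita, that point lies far above most countries. The minimal sketch below uses Matplotlib's TwoSlopeNorm to pin the center to a chosen value. Centering on the median is an assumption here, and the values are made up.

```python
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Made-up, long-tailed values
vals = np.array([500, 800, 1200, 2000, 3500, 6000, 9000, 15000, 40000, 110000], dtype=float)

# Pin the colormap's midpoint (0.5) to the median; any reference value between min and max works
norm = TwoSlopeNorm(vmin=vals.min(), vcenter=float(np.median(vals)), vmax=vals.max())

scaled = norm(vals)  # below-center values -> [0, 0.5), above-center values -> (0.5, 1]
print(np.round(scaled, 3))
```

The resulting `norm` could be passed to `plt.cm.ScalarMappable` in place of `colors.Normalize` when building the colorbar.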

ADDING TEXT TO COUNTRIES ABOVE USD 100K PER CAPITA

This customization is harder to make look polished than the others, but for readers who want it, here it is anyway.

To add the text, we place it at the center (centroid) of each geometry. The problem is that some geometries are much smaller than others, and there is no easy way to resize the text to fit each one.

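```python
# Label countries whose 2020 GDP per capita exceeds USD 100,000
for _, row in merged[merged['2020'] > 100000].iterrows():
    centroid = row['geometry'].centroid
    # Country name, shifted up by 4 map units (degrees for a longitude/latitude CRS)
    ax.text(centroid.x, centroid.y + 4, row['name'],
            fontsize=12, color='blue', weight='bold')
    # GDP per capita value at the centroid itself
    ax.text(centroid.x, centroid.y, row['2020'],
            fontsize=12, color='blue', weight='bold')
fig
```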
Image by Author: Map with Blue Text of Countries with greater than USD 100K for GDP per capita
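Besides text size, the centroid has a second pitfall. For concave shapes, or countries made of several islands, the centroid can fall outside the country itself. Shapely's `representative_point()` returns a point that is guaranteed to lie inside the geometry. The minimal sketch below demonstrates this with a hypothetical U-shaped polygon.

```python
from shapely.geometry import Polygon

# Hypothetical U-shaped (concave) polygon: its centroid falls in the empty notch
u_shape = Polygon([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)])

centroid = u_shape.centroid
inside_pt = u_shape.representative_point()  # guaranteed to lie within the polygon

print(centroid, u_shape.contains(centroid))
print(inside_pt, u_shape.contains(inside_pt))
```

In the labeling loop, `row['geometry'].representative_point()` could be swapped in for `row['geometry'].centroid`. This is a suggested alternative, not what the figure above uses.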

FINAL REMARKS

As we can see, the choropleth answers our initial question: GDP per capita does appear to cluster geographically. Richer countries tend to be close to each other, and the same holds for poorer countries.

The good news is that the packages we test in the succeeding parts will reuse the base code we have here, as most of them require shapefiles (or a geopandas GeoDataFrame).

The full code is on my GitHub page.

OTHER RELATED ARTICLES

The Battle of Interactive Geographic Visualization Part 1 – Interactive Geoplot Using One Line of…

The Battle of Interactive Geographic Visualization Part 2- Interactive Geoplot Using One Line of…

The Battle of Interactive Geographic Visualization Part 3- Plotly Graph Objects (Go)

The Battle of Interactive Geographic Visualization Part 4 – Altair

The Battle of Interactive Geographic Visualization Part 5 – Folium

The Battle of Interactive Geographic Visualization Part 6 – Greppo

The Battle of Interactive Geographic Visualization Part 7 – Bokeh

