PYTHON. DATA SCIENCE. GEOVISUALIZATION

So far in our article series, we have discussed seven options for creating interactive geoplots, covering a wide range of tools that one can choose from depending on the preferred balance of ease of use and customizability.
There are fewer options for choropleths than for geoscatterplots, so for this part of the series we will include non-interactive ones as well.
For this part, let us start with the most basic option we have: Geopandas.
CHOROPLETHS
To begin, a choropleth (sometimes called a color theme) is a thematic map that uses a color scale (color intensity) to represent the values of a region. We use the term region because it often corresponds to an administrative unit (a city, a province, or a country).
As such, creating a choropleth map requires shapefiles that define the boundaries of these regions.
The more intense the color (toward the right end of the color scale), the greater the value for that region. For example, with sequential colormaps like "Greens", darker shades of green represent greater values.
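To make the "darker means greater" idea concrete, here is a small sketch of how matplotlib turns numbers into shades: a Normalize object rescales the values to the 0 to 1 range, and the "Greens" colormap returns one RGBA color per rescaled value. The five values are hypothetical, and the brightness measure is a simple luma approximation chosen only for illustration.

```python
import numpy as np
import matplotlib
import matplotlib.colors as colors

# Hypothetical values for five regions (illustration only)
values = np.array([2_000, 8_000, 20_000, 45_000, 90_000])

# Linearly rescale the values to [0, 1], then look up a shade in "Greens"
norm = colors.Normalize(vmin=values.min(), vmax=values.max())
cmap = matplotlib.colormaps["Greens"]
rgba = cmap(norm(values))  # one RGBA row per value

# Approximate perceived brightness of each shade: lower means darker
luma = rgba[:, :3] @ np.array([0.299, 0.587, 0.114])
for v, b in zip(values, luma):
    print(f"{v:>7,} -> brightness {b:.2f}")
```

Each larger value comes back with a lower brightness, i.e., a darker green.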

So, when do you prefer to use choropleths over a geoscatterplot?
Choropleths are used when you aim to compare aggregate values among regions. This may be a mean, a median, a minimum, or a maximum of certain values or statistics. A geoscatterplot, on the other hand, represents the dispersion of individual observations rather than aggregates.
It can be used to compare individual observations if a legend is employed (like in our coffee shop case).
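Because a choropleth colors each region by a single aggregate, individual observations have to be summarized first. A minimal pandas sketch, assuming a hypothetical table of individual observations with a region column (the column names and numbers are made up):

```python
import pandas as pd

# Hypothetical individual observations: one row per shop, with the region it is in
obs = pd.DataFrame({
    "region": ["A", "A", "B", "B", "B", "C"],
    "price": [3.0, 5.0, 4.0, 4.5, 6.5, 2.5],
})

# Collapse to one row per region; any of these columns could drive a choropleth
by_region = obs.groupby("region")["price"].agg(["mean", "median", "min", "max"])
print(by_region)
```

A geoscatterplot would plot the six rows of obs; a choropleth would color regions A, B, and C using one column of by_region.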
Let us build a choropleth of GDP per capita by country. The benefit of a choropleth here is that we can immediately see whether countries with higher GDP per capita are located close to one another.
Let us now proceed to our coding.
PRELIMINARIES
LOADING AND PREPROCESSING OF DATA
For our dataset, we will use the GDP per capita dataset (constant 2015 US$) from the World Bank databank.
PACKAGES
The most important package to install and use is geopandas. In my experience, it is easy to install on a MacBook but a little more difficult on Windows. For that case, I prepared an article that may help you with this.
For our purposes, we also need to install pyproj and mapclassify.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import geopandas as gpd
%matplotlib inline
LOADING SHAPEFILES
For essentially all choropleths in this series, we will need the shapefiles of the regions we want to map. Shapefiles provide the administrative boundaries of a location and can be as detailed as a province, a city, or some customized region.
Since the information we need to map out is for countries, we will be using the world shapefile with country-level geometries.
Shapefiles can be downloaded from GADM.
CODING
### LOAD GDP PER CAPITA DATA
df = pd.read_csv('data/gdp_per_capita.csv',
                 skiprows=4)  # skip the metadata lines at the top of the World Bank CSV
df = df.loc[:, ['Country Name', 'Country Code', '2020']]  # keep only the 2020 values
df.head()

### LOAD THE SHAPEFILES
gdf = gpd.read_file('shapefiles/world-administrative-boundaries/world-administrative-boundaries.shp')
gdf.head()

### MERGE DATA
merged = gdf.merge(df, left_on='name', right_on='Country Name')  # inner join: unmatched countries are dropped
merged.head()

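Important: Not all country names in the World Bank data match the names in the shapefile, so the names need to be synchronized before merging. Otherwise, the (inner) merge above silently drops every country whose name does not match.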
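One way to find the names that fail to match is an outer merge with indicator=True, which labels every row as found in both tables, only in the left one, or only in the right one. The sketch below uses tiny hypothetical stand-ins for gdf and df; the spellings and the name_fixes dictionary are illustrative assumptions, so check the actual names in your own files.

```python
import pandas as pd

# Tiny hypothetical stand-ins for `gdf` (shapefile) and `df` (World Bank data)
shapes = pd.DataFrame({"name": ["France", "United States of America", "Japan"]})
wb = pd.DataFrame({"Country Name": ["France", "United States", "Japan", "World"],
                   "2020": [1.0, 2.0, 3.0, 4.0]})

# indicator=True adds a `_merge` column: "both", "left_only" or "right_only"
check = shapes.merge(wb, left_on="name", right_on="Country Name",
                     how="outer", indicator=True)
print(check.loc[check["_merge"] != "both", ["name", "Country Name", "_merge"]])

# Hypothetical mapping from World Bank spellings to shapefile spellings
name_fixes = {"United States": "United States of America"}
wb["Country Name"] = wb["Country Name"].replace(name_fixes)

# After the fix, the inner merge keeps every country present in both tables
merged = shapes.merge(wb, left_on="name", right_on="Country Name")
print(merged)
```

Rows marked right_only that are regional aggregates (such as "World") have no polygon and can simply be left out.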
### PLOT
cmap = 'Greens'
# Create a figure and axes so we can attach other objects (e.g., a colorbar) later
fig, ax = plt.subplots(figsize=(30, 25))
merged.plot(column="2020",
            ax=ax,
            cmap=cmap,
            k=5,  # number of classes (the geopandas default)
            scheme='quantiles',
            legend=True)
ax.axis('off')
A few things regarding our code:
- cmap – the colormap.
- fig, ax – This part is important if we wish to create a colorbar (which is the legend we use for colormaps).
- scheme – The scheme parameter requires that we install mapclassify. It classifies the values into a fixed number of bins (by default, five quantiles, so each color covers roughly the same number of countries), which is better than manually deciding how many colors the map will use. In most cases, this is helpful, as too many colors can be difficult to process.
- ax.axis('off') – This removes the x- and y-axes, which only show the map's coordinates.
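To see what the quantiles scheme does, here is a rough numpy approximation of quantile classification on hypothetical, right-skewed values: the class boundaries are placed at percentiles so that each class (and therefore each color) gets about the same number of observations. mapclassify's own Quantiles classifier may compute its boundaries slightly differently; this is only a sketch of the idea.

```python
import numpy as np

# Hypothetical, right-skewed values standing in for GDP per capita
rng = np.random.default_rng(0)
values = rng.lognormal(mean=9, sigma=1.2, size=200)

k = 5  # number of classes, i.e. number of colors on the map
# Interior class boundaries at the 20th, 40th, 60th and 80th percentiles
edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
# Class index 0..k-1 for every value; each class is then drawn in one color
classes = np.digitize(values, edges, right=True)
counts = np.bincount(classes, minlength=k)
print(counts)  # observations per class
```

With skewed data like GDP per capita, equal-count classes keep most countries from collapsing into the lightest shade, which is what a purely linear scale would tend to do.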

CUSTOMIZATION
To further improve our map, we can:
- Add a colorbar
- Choose a different color scheme
- Add text labels for countries with GDP per capita above USD 100,000
COLORBAR
The colorbar is the legend that shows the range of values associated with each color or shade of a color.
# create the colorbar
norm = colors.Normalize(vmin=merged['2020'].min(), vmax=merged['2020'].max())
cbar = plt.cm.ScalarMappable(norm=norm, cmap='Greens')
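Note that the code above builds the colorbar from the 2020 column, the same column we used to color our map. Because the map is classified into quantiles while this colorbar is a continuous linear scale, the colorbar shows the overall range of values rather than the exact class boundaries.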
cax = fig.add_axes([1, 0.1, 0.03, 0.8])
cbr = fig.colorbar(cbar, cax=cax,)
#Remove the legend
ax.get_legend().remove()
#Change the tick for the colorbar
cax.tick_params(labelsize=18)
fig
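Since the map is colored by quantile class while the Normalize colorbar is linear, one way to make the two agree is sketched below: a matplotlib BoundaryNorm whose boundaries are the quantile breaks, paired with k colors sampled evenly from the colormap. We assume the map samples its class colors evenly across the colormap, which is our understanding of geopandas' behavior but worth checking against your own output; the helper name quantile_colorbar_mappable and the demo values are hypothetical.

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors

def quantile_colorbar_mappable(values, k=5, cmap_name="Greens"):
    """Hypothetical helper: a ScalarMappable whose color steps sit at quantile breaks."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing values
    # k + 1 boundaries running from the minimum to the maximum
    bounds = np.quantile(values, np.linspace(0, 1, k + 1))
    # One color per class, sampled evenly from the colormap (assumption, see above)
    class_colors = matplotlib.colormaps[cmap_name](np.linspace(0, 1, k))
    cmap = colors.ListedColormap(class_colors)
    norm = colors.BoundaryNorm(bounds, ncolors=k)
    sm = plt.cm.ScalarMappable(norm=norm, cmap=cmap)
    sm.set_array([])  # needed by some older matplotlib versions
    return sm, bounds

# Demo on hypothetical values
demo = np.array([500, 1_200, 3_000, 7_500, 15_000, 40_000, 90_000])
sm, bounds = quantile_colorbar_mappable(demo, k=5)
fig, cax = plt.subplots(figsize=(1, 6))
fig.colorbar(sm, cax=cax, ticks=bounds)
```

With the article's data, the idea would be to pass merged["2020"] instead of the demo values and hand the result to fig.colorbar(sm, cax=cax, ticks=bounds), so that each tick marks a class boundary.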

DIVERGING COLORMAPS
To show stronger contrast, diverging colormaps are preferred, since they run from one hue through a neutral middle to a contrasting hue. In this case, let us try RdYlGn:
#Try a diverging colormap
cmap = 'RdYlGn'
#Create A Figure to Attach Other Objects Later
fig, ax = plt.subplots(figsize = (30,25))
merged.plot(column="2020",
            ax=ax,
            cmap=cmap,
            k=5,  # number of classes (the geopandas default)
            scheme='quantiles',
            legend=True)
ax.axis('off')
# create the colorbar
norm = colors.Normalize(vmin=merged['2020'].min(), vmax=merged['2020'].max())
cbar = plt.cm.ScalarMappable(norm=norm, cmap=cmap)
cax = fig.add_axes([1, 0.1, 0.03, 0.8])
cbr = fig.colorbar(cbar, cax=cax,)
#Remove the legend
ax.get_legend().remove()
#Change the tick for the colorbar
cax.tick_params(labelsize=18)
fig
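A diverging colormap reads best when its neutral middle color corresponds to a meaningful value. One matplotlib tool for that is colors.TwoSlopeNorm, which maps a chosen center to 0.5 and scales each side separately. The sketch below centers RdYlGn on the median of some hypothetical values; choosing the median as the center is our assumption, and this only demonstrates the normalization, not a change to the quantile-classified map above.

```python
import numpy as np
import matplotlib
import matplotlib.colors as colors

# Hypothetical GDP per capita values (illustration only)
gdp = np.array([800, 2_500, 6_000, 11_000, 30_000, 65_000, 110_000])

# Put the neutral (yellow) middle of RdYlGn at the median: an assumed reference value
center = np.median(gdp)
norm = colors.TwoSlopeNorm(vcenter=center, vmin=gdp.min(), vmax=gdp.max())
cmap = matplotlib.colormaps["RdYlGn"]

scaled = norm(gdp)     # 0 at the minimum, 0.5 at the median, 1 at the maximum
shades = cmap(scaled)  # one RGBA color per value: red below, green above the median
print(np.round(scaled, 2))
```

The same norm object could be passed to plt.cm.ScalarMappable(norm=norm, cmap=cmap) to build a colorbar in the same way as the Normalize-based one.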

ADDING TEXT TO COUNTRIES ABOVE USD 100K PER CAPITA
This customization is harder to prettify than the others, but for those who want it, let us go ahead and include it anyway.
To add the text, we need to place it at the center of each geometry (its centroid). The problem is that some geometries are much smaller than others, and it is not easy to resize the text to fit each one.
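for ind, row in merged.iterrows():
    if row['2020'] > 100000:
        # Use the centroid of each country's geometry as the label anchor
        centroid = row["geometry"].centroid
        # Country name, shifted 4 map units above the centroid
        ax.text(centroid.x, centroid.y + 4, row["name"],
                fontsize=12, color='blue', weight='bold')
        # GDP per capita value at the centroid, with thousands separators
        ax.text(centroid.x, centroid.y, f"{row['2020']:,.0f}",
                fontsize=12, color='blue', weight='bold')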
fig
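One partial remedy for the text-size problem is to scale each label's font size with the size of its polygon. The helper below is hypothetical: it uses the square root of the area (so the text grows with the shape's linear size) and clips the result to a readable range; the limits 6 and 14 are arbitrary choices.

```python
import numpy as np

def scaled_fontsizes(areas, min_size=6, max_size=14):
    """Hypothetical helper: map polygon areas to font sizes.

    Font size scales with sqrt(area / largest area), so labels grow with the
    linear size of the shape, and is clipped to [min_size, max_size].
    """
    areas = np.asarray(areas, dtype=float)
    scale = np.sqrt(areas / areas.max())
    return np.clip(max_size * scale, min_size, max_size)

# Hypothetical areas (squared map units): one large, one medium, one tiny shape
print(scaled_fontsizes([400.0, 100.0, 0.01]))
```

With the article's data, the areas could come from merged.geometry.area; in a longitude/latitude CRS geopandas warns that these areas are not accurate, but they can still serve as rough relative sizes. For anchoring, GeoSeries.representative_point() returns a point guaranteed to fall inside each shape, which the centroid is not (for example, for countries made of several islands).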

FINAL REMARKS
As we can see, the choropleth does answer our initial question: GDP per capita appears to cluster geographically, with richer countries located close to each other, and the same holds for poorer countries.
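This conclusion rests on visual inspection. One common way to put a number on "similar values sit together" is global Moran's I, which is positive when neighboring regions have similar values and negative when they alternate. Libraries such as PySAL (libpysal for spatial weights, esda for the statistic) implement it properly, including significance tests; the sketch below is only a minimal numpy version on a hypothetical four-region toy map, not a result for the GDP data.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and a spatial weight matrix w.

    I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z = x - mean(x) and S0 is the sum of all weights.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    n, s0 = len(x), w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical toy map: four regions in a row; w[i, j] = 1 if they share a border
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, 1, 10, 10], w))  # similar values sit together: positive
print(morans_i([1, 10, 1, 10], w))  # high and low alternate: negative
```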
The good news is that the succeeding packages we will test out can reuse the base code we have here, as most of them require shapefiles (or a geopandas GeoDataFrame).
Full code on my Github Page.