A Beginner’s Guide to Creating a Choropleth Map in Python Using GeoPandas and Matplotlib

M. Rake Linggar A.
Towards Data Science
5 min read · Sep 16, 2019

Data visualization is an important skill for finding and presenting insights. There are many kinds of visuals you can use to present your data, and one of the most interesting is the choropleth map.

What’s a choropleth map?

A choropleth map (from Greek χῶρος “area/region” and πλῆθος “multitude”) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.
Source: https://en.wikipedia.org/wiki/Choropleth_map

Google Image search — choropleth
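To make the "shaded in proportion to the measurement" part concrete, here is a minimal sketch of how a value can be turned into a shade with Matplotlib. The region names, the numbers, and the helper shade_for are made up for illustration; only Normalize, to_hex, and the Blues colormap come from Matplotlib.

```python
from matplotlib import cm
from matplotlib.colors import Normalize, to_hex


def shade_for(value, vmin, vmax):
    """Map a value in [vmin, vmax] to a hex colour from the 'Blues' colormap."""
    norm = Normalize(vmin=vmin, vmax=vmax)  # rescales the value to the 0..1 range
    return to_hex(cm.Blues(norm(value)))    # colormap lookup, then RGBA -> '#rrggbb'


# Hypothetical population densities, for illustration only
densities = {"Region A": 10, "Region B": 120, "Region C": 480}
shades = {name: shade_for(value, 0, 500) for name, value in densities.items()}

for name, colour in shades.items():
    print(name, colour)
```

The higher the value, the darker the blue. A choropleth map does the same thing, but paints each region's polygon with its shade.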

Why is this such an interesting visual? a) It is pretty, b) it shows the data we are interested in at exactly the location it is associated with, and c) it is pretty!

With the introductions done, let’s get down to the code (and the preparations for it).

Step 1: Install required Python libraries

Let’s install several packages that we’ll need for this exercise. GeoPandas is an amazing package that takes pandas's DataFrame to the next level by allowing it to parse geospatial data. It will use Descartes to generate a Matplotlib plot.

pip install descartes
pip install geopandas
pip install matplotlib
pip install numpy
pip install pandas
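If you want to confirm that everything installed correctly, a small check like the one below can help. It is only a sketch: the helper name installed_versions is my own, and it relies on importlib.metadata, which needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["descartes", "geopandas", "matplotlib", "numpy", "pandas"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```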

Step 2: Get the data

There are two kinds of data that we will need for this exercise:

  1. The data that will be mapped to locations/regions/areas/etc. There is plenty of open data that we can use freely; one source is Wikipedia (data accuracy and validity not guaranteed, but for this learning exercise, no problem!). For this exercise, we will use the number of cities and regencies in each province in Indonesia. You can check it here if you’d like, but I would advise you to just download it from my Github repo, as it needs some manual work (changing the column names, data format, etc.) that we will not bother to discuss here.
  2. The second is a shapefile of the map that we want to make. It is basically a list of geometric shapes (points, lines, or polygons). Since we want to map Indonesia’s provinces, we will download Indonesia’s administrative areas here, or again, from my Github repo.

Step 3: Begin to code

Now, let’s jump right into the code.

1. Load the necessary libraries

import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt

2. Load and view the shapefile data

fp = "IDN_adm/IDN_adm1.shp"
map_df = gpd.read_file(fp)
# check the GeoDataframe
map_df.head()

OK, so as you can see, we have several data fields in the downloaded shapefile. The ones we are interested in are the columns NAME_1 (province name) and geometry (the shape of the province). As you can also see, the shapefile stores the location information as polygons. Let’s plot it, shall we?

map_df.plot()

So we have the map of Indonesia, but it looks too small. Let’s resize it.

plt.rcParams['figure.figsize'] = [50, 70]  # width, height (in inches)
map_df.plot()

Much better.
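One thing worth knowing here: Matplotlib reads figsize as (width, height) in inches, and changing rcParams affects every figure created afterwards. If you would rather size a single figure, a sketch like the following works too (the 12 x 6 size is an arbitrary choice for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# figsize is (width, height) in inches and applies to this figure only
fig, ax = plt.subplots(figsize=(12, 6))
width, height = fig.get_size_inches()
print(width, height)

# With the shapefile loaded, map_df.plot(ax=ax) would draw the map onto these axes
plt.close(fig)
```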

3. Load the province data

province = pd.read_csv("data_province.csv", sep=";")
province.head()

As you can see, we have the provinces, 2015 population, number of cities, and several other interesting numbers. All we have to do now is merge the data with the shapefile, and we can begin visualizing these numbers.

4. Merge and show the map

# join the geodataframe with the csv dataframe
merged = map_df.merge(province, how='left', left_on="NAME_1", right_on="province")
merged = merged[['province', 'geometry', 'population_2015', 'area_km2', 'population_density_per_km2',
                 'cities_regencies', 'cities', 'regencies']]
merged.head()
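Because this is a left join on the province name, any spelling difference between the shapefile and the CSV silently leaves that province without data (it would show up as a gap on the map). A quick way to spot this is the indicator option of pandas' merge, sketched here on toy tables; the rows below are illustrative stand-ins, not the real files.

```python
import pandas as pd

# Toy stand-ins for the shapefile attributes and the CSV (illustrative values)
shapes = pd.DataFrame({"NAME_1": ["Aceh", "Bali", "Yogyakarta"]})
stats = pd.DataFrame({
    "province": ["Aceh", "Bali", "DI Yogyakarta"],
    "cities_regencies": [23, 9, 5],
})

# indicator=True adds a _merge column saying where each row found a match
check = shapes.merge(stats, how="left", left_on="NAME_1",
                     right_on="province", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "NAME_1"].tolist()
print(unmatched)  # ['Yogyakarta']
```

If the list is not empty, fix the names in the CSV (or the shapefile column) before plotting.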

Cool, we have the data in a clean format. Let’s make the plot.

# set the value column that will be visualised
variable = 'cities_regencies'
# set the range for the choropleth values
vmin, vmax = 0, 50
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(30, 10))
# remove the axis
ax.axis('off')
# add a title and annotation
ax.set_title('# of Cities per each Region', fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.annotate('Source: Wikipedia - https://en.wikipedia.org/wiki/Provinces_of_Indonesia', xy=(0.6, .05), xycoords='figure fraction', fontsize=12, color='#555555')
# Create colorbar legend
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm.set_array([])  # older Matplotlib versions need an array set on the mappable before a colorbar can be drawn
# add the colorbar to the figure (newer Matplotlib versions need to be told which axes to attach it to)
fig.colorbar(sm, ax=ax)
# create map, using the same value range as the colorbar so the colors match the legend
merged.plot(column=variable, cmap='Blues', vmin=vmin, vmax=vmax, linewidth=0.8, ax=ax, edgecolor='0.8')

Pretty good, right? But it can be better. For instance, we can see which areas have a high or low number of cities and regencies, but not which provinces they are. To make the plot clearer, let’s add the province labels. Add the following code below the code above.

# Add Labels
merged['coords'] = merged['geometry'].apply(lambda x: x.representative_point().coords[:])
merged['coords'] = [coords[0] for coords in merged['coords']]
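for idx, row in merged.iterrows():
    # pass the label as the first argument; the s= keyword was renamed to text= in newer Matplotlib versions
    plt.annotate(row['province'], xy=row['coords'], horizontalalignment='center')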

Ok, that is better. If we take a closer look, we can see that, according to Wikipedia, Jawa Tengah province has a very high number of cities per region compared to other provinces.

Another tweak we can do is to change the orientation of the color map legend to be horizontal, in case you want the upper space to focus on the map.

Just replace the fig.colorbar call in the plotting code with this:

fig.colorbar(sm, ax=ax, orientation="horizontal", fraction=0.036, pad=0.1, aspect=30)

Final tweak
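As an alternative to building the colorbar by hand, GeoPandas can draw it for you when you pass legend=True, and legend_kwds is forwarded to Matplotlib's colorbar. Here is a minimal sketch, assuming a toy GeoDataFrame of three made-up square "provinces" in place of the real shapefile; the label text is also my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Three made-up square "provinces" standing in for the real shapefile
toy = gpd.GeoDataFrame(
    {"province": ["A", "B", "C"], "cities_regencies": [5, 20, 35]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(1, figsize=(8, 4))
ax.axis("off")
toy.plot(
    column="cities_regencies", cmap="Blues", vmin=0, vmax=50,
    linewidth=0.8, edgecolor="0.8", ax=ax,
    legend=True,  # let GeoPandas create the colorbar
    legend_kwds={"orientation": "horizontal", "label": "Cities and regencies"},
)
n_axes = len(fig.axes)  # the map axes plus the colorbar axes
plt.close(fig)
```

The manual ScalarMappable route gives you more control over placement; this one is shorter and keeps the colorbar range tied to the plot automatically.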

5. Save it!

Now that we have made the choropleth map, the last thing we need to do is save it in a popular, widely supported format, like .png.

fig.savefig('map.png', dpi=300)
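Since we turned the axes off, the saved image can end up with a lot of empty margin. One option, sketched below on a throwaway figure, is bbox_inches="tight", which trims the whitespace around the drawing. The temporary output folder and the dummy line plot are only there to keep the example self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Throwaway figure standing in for the map
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1], [0, 1])
ax.axis("off")

out_path = os.path.join(tempfile.mkdtemp(), "map.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")  # trim the empty margins
plt.close(fig)

# Read back the first bytes to confirm that a PNG file was written
with open(out_path, "rb") as f:
    signature = f.read(8)
print(signature == b"\x89PNG\r\n\x1a\n")
```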

That’s all for now. I hope this post is useful.

Try it yourself

You can download the Notebook file for this exercise along with the cleaned province data and Indonesian shapefiles in my Github repo here.

Credits

This post is inspired by Benjamin Cooley’s post.

I was interested in learning and doing something further and loved the results, hence I am sharing it in my own post here. :)

Love anything data-related (especially programming), travel, movies, and gaming.