Creating Choropleth Maps with Python’s Folium Library
How to make choropleths with different data structures in Python
Choropleth maps are used to show the variations in data over geographic regions (Population Education). I’ve used choropleths to show the number of available rental apartments across ZIP Codes in New York City and to show the number of mortgage transactions per ZIP Code over a given period. Python’s Folium library enables users to build multiple kinds of custom maps, including choropleths, which you can share as .html files with external users who do not know how to code.
Loading and Reviewing Geographic Data
U.S. government websites often have the geographic data files necessary to create maps. NYC’s OpenData site and the U.S. Census Bureau’s site have geographic boundary files available in multiple datatypes. Python allows you to load multiple filetypes, including GeoJSON (.geojson) files and shapefiles (.shp). These files contain the spatial boundaries of a given location.
Folium’s documentation for the folium.Choropleth() method states that the geo_data parameter accepts a “URL, file path, or data (json, dict, geopandas, etc) to your GeoJSON geometries” (Folium documentation). However we load the file, the geometry data must end up in a form that this method can read. The key_on parameter of this method binds the geometry of each specific location (the GeoJSON data) to the data for that location (e.g. population).
GeoJSON
GeoJSON files store geometric shapes, in this case the boundaries of a location, and its associated attributes. For instance, the code to load the GeoJSON file with the boundaries of NYC ZIP Codes (referenced above) is as follows:
import json
# Code to open a .geojson file and store its contents in a variable
with open('nyczipcodetabulationareas.geojson', 'r') as jsonFile:
    nycmapdata = json.load(jsonFile)
The variable nycmapdata contains a dictionary with at least two keys. One of these keys is called features, and it holds a list of dictionaries where each dictionary represents a location. An excerpt of the main GeoJSON structure with the first location is below:
{'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'OBJECTID': 1,
'postalCode': '11372',
'PO_NAME': 'Jackson Heights',
'STATE': 'NY',
'borough': 'Queens',
'ST_FIPS': '36',
'CTY_FIPS': '081',
'BLDGpostal': 0,
'@id': 'http://nyc.pediacities.com/Resource/PostalCode/11372',
'longitude': -73.883573184,
'latitude': 40.751662187},
'geometry': {'type': 'Polygon',
'coordinates': [[[-73.86942457284177, 40.74915687096788],
[-73.89143129977276, 40.74684466041932],
[-73.89507143240859, 40.746465470812154],
[-73.8961873786782, 40.74850942518088],
[-73.8958395418514, 40.74854687570604],
[-73.89525242774397, 40.748306609450246],
[-73.89654041085562, 40.75054199814359],
[-73.89579868613829, 40.75061972133262],
[-73.89652230661434, 40.75438879610903],
[-73.88164812188481, 40.75595161704187],
[-73.87221855882478, 40.75694324806748],
[-73.87167992356792, 40.75398717439604],
[-73.8720704651389, 40.753862007052064],
[-73.86942457284177, 40.74915687096788]]]}}, ... ]}
The key_on parameter of the folium.Choropleth() method requires users to reference the unique index key in the location dictionaries within the GeoJSON file as a string:
key_on (string, default None) — Variable in the geo_data GeoJSON file to bind the data to. Must start with ‘feature’ and be in JavaScript objection notation. Ex: ‘feature.id’ or ‘feature.properties.statename’.
In the above case the index key is the ZIP Code, so the data associated with each location must also have a ZIP Code index key or column. The key_on parameter for the above example would be the following string:
'feature.properties.postalCode'
Note: The first portion of the string must always be the singular word feature. It is not plural like the features key of the parent dictionary, which holds the list of individual location dictionaries.
The key_on parameter is accessing the properties key of each specific location. The properties key itself holds a dictionary with eleven keys; in this case the value of the postalCode key is the index value that will link the geometric shape to whatever value we wish to plot.
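To make the lookup concrete, here is a minimal sketch of how a key_on string can be read as a path into a single feature dictionary. The helper resolve_key_on is hypothetical and written only for illustration (it is not Folium's own code), and the feature is a trimmed copy of the Jackson Heights entry above with the geometry omitted.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted key_on path such as 'feature.properties.postalCode'
    through one GeoJSON feature dictionary (illustrative helper only)."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with the singular word 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]  # step one level deeper for each path segment
    return value


# Trimmed copy of the first feature from the excerpt above
jackson_heights = {
    "type": "Feature",
    "properties": {"postalCode": "11372", "PO_NAME": "Jackson Heights"},
    "geometry": None,  # omitted for brevity
}

zip_code = resolve_key_on(jackson_heights, "feature.properties.postalCode")
print(zip_code)  # 11372
```

The returned value is a string here, which is why the ZIP Code column in the data to plot would need to be a string as well.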
GeoPandas
Another way to load geographic data is to use Python’s GeoPandas library (link). This library is useful when loading shapefiles, which are provided on the U.S. Census’ website (Cartographic Boundary Files — Shapefile). GeoPandas works similarly to Pandas, only it can store and perform functions on geometric data. For instance, the code to load the shapefile with the boundaries of all U.S. states is as follows:
# Using GeoPandas
import geopandas as gpd
usmap_gdf = gpd.read_file('cb_2018_us_state_500k/cb_2018_us_state_500k.shp')
If you were to call the first row’s (Mississippi) geometry column in Jupyter Notebook, the notebook would render the outline of the state:
usmap_gdf["geometry"].iloc[0]
Unlike the contents of the GeoJSON dictionary, there is no features key with inner dictionaries to access and there is no properties column. The key_on parameter of the folium.Choropleth() method still requires the first portion of the string to be feature; however, instead of referencing a GeoJSON’s location dictionaries, it will be referencing columns in a GeoPandas dataframe. In this case the key_on parameter will equal "feature.properties.GEOID", where GEOID is the column that contains the unique state codes that will bind our data to the geographic boundary. The GEOID column has leading zeros; for example, the California GEOID is 06. You may also use the STATEFP column as an index. Whichever column you choose, make sure the columns used, their formats, and their data types are consistent.
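Why the feature.properties prefix still applies can be seen from the GeoJSON-like view that a GeoPandas dataframe exposes through its __geo_interface__ property, where every non-geometry column ends up under properties. The sketch below uses two made-up rectangles in place of real state boundaries, so the geometry and the name toy_gdf are assumptions for illustration only.

```python
import geopandas as gpd
from shapely.geometry import box

# Two placeholder rectangles stand in for real state boundaries
toy_gdf = gpd.GeoDataFrame(
    {"GEOID": ["06", "28"], "NAME": ["California", "Mississippi"]},
    geometry=[box(-124, 32, -114, 42), box(-92, 30, -88, 35)],
    crs="EPSG:4326",
)

# GeoJSON-like dictionary view of the dataframe
first_feature = toy_gdf.__geo_interface__["features"][0]

# Non-geometry columns appear as keys of the "properties" dictionary
print(first_feature["properties"]["GEOID"])  # 06
```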
Reviewing Population Data For A Choropleth
Geographic data and the associated data to plot can be stored as two separate variables or all together. It is important to keep track of the data types of the columns and to make sure the index (key_on) column is the same for the geographic data and the associated data for the location.
I accessed the U.S. Census API’s American Community Survey (link) and Population Estimates and Projections (link) tables to obtain population and demographic data from 2019 to 2021. The head of the dataframe is as follows:
I saved the data as a .csv file. In some cases this will change the datatypes of the columns when the file is loaded again; for instance, strings could become numerical values. The datatypes when .info() is called are as follows:
Another important thing to note is that the leading zeros in the state column do not appear after loading the data frame. This will have to be corrected; the id must match and be the same data type (i.e. it cannot be an integer in one data frame and a string in another).
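A small sketch of the problem and two possible fixes is below. The CSV text is a made-up stand-in for the saved file (the population values are placeholders, not Census figures), and the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for the saved .csv file; population values are placeholders
csv_text = "state,NAME,Total_Pop_2021\n06,California,100\n28,Mississippi,200\n"

# Default parsing turns the state codes into integers and drops the zero
as_loaded = pd.read_csv(io.StringIO(csv_text))
print(as_loaded["state"].tolist())  # [6, 28]

# Option 1: ask pandas to keep the column as text while reading
kept_as_text = pd.read_csv(io.StringIO(csv_text), dtype={"state": str})

# Option 2: repair an already-loaded column by padding to two characters
repaired = as_loaded["state"].astype(str).str.zfill(2)
```

Either option yields the strings "06" and "28", which match the format of the GEOID and STATEFP columns in the shapefile.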
Basic Choropleth Maps Five Different Ways
As discussed above, Folium allows you to create maps using geographic datatypes, including GeoJSON and GeoPandas. These datatypes need to be formatted for use with the Folium library and it isn’t always intuitive (to me, at least) why certain errors occur. The following examples describe how to prepare both the geographic data (in this case U.S. state boundaries) and associated plotting data (the population of the states) for use with the folium.Choropleth() method.
Method 1: With Pandas and GeoJSON, without Specifying an ID Column
This method most closely resembles the documentation’s example for choropleth maps. The method uses a GeoJSON file which contains the state boundaries data and a Pandas dataframe to create the map.
As I started with a GeoPandas dataframe I will need to convert it to a GeoJSON string using GeoPandas’ to_json() method. As a reminder the usmap_gdf GeoPandas dataframe looks like:
I then apply the .to_json() method and specify that we are dropping the id from the dataframe, if it exists:
usmap_json_no_id = usmap_gdf.to_json(drop_id=True)
Note: usmap_json_no_id is the variable holding the JSON string in this scenario.
This method returns a string. I formatted it so it would be easier to read, and show it up to the first set of coordinates below:
'{"type": "FeatureCollection",
"features": [{"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"GEOID": 28,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235]'
Note: The “properties” dictionary has no key called “id”
Now we are ready to connect the newly created JSON variable with the US Census dataframe obtained in a previous section, the head of which is below:
Using folium’s Choropleth() method, we create the map object:
The geo_data parameter is set to the newly created usmap_json_no_id variable and the data parameter is set to the all_states_census_df dataframe. As no id was specified when creating the GeoJSON variable, the key_on parameter must reference a specific key from the geodata, which works like a dictionary (GEOID is a key in the dictionary held by the properties key). In this case the GEOID key holds the state code which connects the state geometric boundary data to the corresponding US Census data in the all_states_census_df dataframe. The choropleth is below:
Method 2: With Pandas and GeoJSON, and Specifying an ID Column
This process is almost exactly the same as above except an index will be set prior to calling the .to_json() method.
The usmap_gdf dataframe only had its default integer index in the above example. To change this I will set the index to the GEOID column and then immediately call the .to_json() method:
usmap_json_with_id = usmap_gdf.set_index(keys="GEOID").to_json()
The resulting string, up until the first pair of coordinates for the first state’s data, is below:
'{"type": "FeatureCollection",
"features": [{"id": "28",
"type": "Feature",
"properties": {"AFFGEOID": "0400000US28",
"ALAND": 121533519481,
"AWATER": 3926919758,
"LSAD": "00",
"NAME": "Mississippi",
"STATEFP": "28",
"STATENS": "01779790",
"STUSPS": "MS"},
"geometry": {"type": "MultiPolygon",
"coordinates": [[[[-88.502966, 30.215235],'
The “properties” dictionary no longer has the GEOID key because it is now stored as a new key called id in the outer dictionary. You should also note that the id value is now a string instead of an integer. As mentioned previously, you will have to make sure that the data types of the connecting data are consistent. This can become tedious if leading and trailing zeroes are involved. To fix this issue I create a new column called state_str from the state column in the all_states_census_df dataframe:
all_states_census_df["state_str"] = all_states_census_df["state"].astype("str")
Now we can create the choropleth:
The difference between this code and the code used previously is that the key_on parameter references id and not properties.GEOID. The resulting map is exactly the same as in method 1:
Method 3: With Pandas and GeoPandas’ Python Feature Collection
This method creates a GeoJSON-like object (a Python feature collection) from the original GeoPandas dataframe with the __geo_interface__ property.
I set the index of the usmap_gdf dataframe (US geographic data) to the STATEFP column, which stores the state ids, with leading zeroes, as a string:
usmap_gdf.set_index("STATEFP", inplace = True)
I then created a matching column in the all_states_census_df dataframe (US Census data) by padding the single-digit state codes with a leading zero:
all_states_census_df["state_str"] = all_states_census_df["state"].astype("str").apply(lambda x: x.zfill(2))
Finally, I used the __geo_interface__ property on the geometry column of the usmap_gdf GeoPandas dataframe to get a Python feature collection of geometric state boundaries, stored as a dictionary, similar to the ones from the first two methods:
us_geo_json = gpd.GeoSeries(data = usmap_gdf["geometry"]).__geo_interface__
An excerpt of the us_geo_json variable is below:
{'type': 'FeatureCollection',
'features': [{'id': '28',
'type': 'Feature',
'properties': {},
'geometry': {'type': 'MultiPolygon',
'coordinates': [(((-88.502966, 30.215235), ...))]
Finally, we create the choropleth:
The map looks the same as the ones from above, so I excluded it.
Method 4: With GeoPandas’ Geometry Type Column
Here we stick to GeoPandas. I created a GeoPandas dataframe called us_data_gdf which combines the geometric data and the census data in one variable:
us_data_gdf = pd.merge(left = usmap_gdf,
                       right = all_states_census_df,
                       how = "left",
                       left_on = ["GEOID", "NAME"],
                       right_on = ["state", "NAME"]
                       )
Note: all_states_census_df is a pandas dataframe of US Census data and usmap_gdf is a GeoPandas dataframe storing state geometric boundary data.
The code to create a choropleth with a GeoPandas dataframe is below:
In the above example the geo_data parameter and the data parameter both reference the same GeoPandas dataframe, as the information is stored in one place. As I did not set an index, the key_on parameter equals "feature.properties.GEOID". Even with GeoPandas, folium requires the key_on parameter to act as if it is referencing a dictionary-like object.
As before, the map looks the same as the ones from above, so I excluded it.
Method 5: With GeoPandas Geometry Type and Branca
Here we create a more stylish map using the Branca library and folium’s examples with it. The first step with Branca, aside from installing it, is to create a ColorMap object:
import branca.colormap
colormap = branca.colormap.LinearColormap(
    vmin=us_data_gdf["Total_Pop_2021"].quantile(0.0),
    vmax=us_data_gdf["Total_Pop_2021"].quantile(1),
    colors=["red", "orange", "lightblue", "green", "darkgreen"],
    caption="Total Population By State",
)
In the above code we access the branca.colormap.LinearColormap class. Here we can set the colors we use and what values to use for the color scale. For this choropleth I want the colors to scale proportionally to the lowest and highest population values in the US Census data. To set these values I use the vmin and vmax parameters as above. If I neglect to do this, the areas with no values will be considered in the color scale. The results without these parameters set are below:
Once the ColorMap object is created we can create a choropleth (the full code is below):
I adapted the examples on folium’s site to use the us_data_gdf GeoPandas dataframe. The example allows us to exclude (render as transparent) the portions of the geographic data which do not have associated census data. If the population for a state were null and the state was not excluded, it would appear black on the choropleth. The resulting choropleth is below:
Branca is customizable but the explanations of how to use it are few and far between. The ReadMe for its repository states:
There’s no documentation, but you can browse the examples gallery.
You have to practice using it to make the kind of map you want.
Summary
Folium can be used to make informative maps, like choropleths, that can be shared with people with and without coding knowledge. Government websites often have the geographic data necessary to create location boundaries, and the data to plot within those boundaries can often be obtained from government sites as well. It is important to understand your datatypes and filetypes, as mismatches between them can lead to unnecessary frustration. These maps are highly customizable; for instance, you can add tooltips to annotate your map. It takes practice to make use of this library’s full potential.
My repository for this article can be found here. Happy coding.