Choropleth Maps with Pandas and Flourish

A tutorial that makes a map of US polio vaccine rates using Python, pandas and Flourish

Chuck Connell
Towards Data Science


Image by Author, using Flourish.studio

Background

In a previous article, I covered briefly how to make a choropleth map of U.S. county data related to wastewater tracking. The methods used there can be applied to other data, of course, and there are many mapping options that were not shown in those instructions.

This tutorial contains more detail about making maps with Flourish — using example data for US polio vaccination rates by state and creating the map shown above.

First, a couple of definitions:

  • A choropleth map is a geographic map of regions (such as counties or states) that shows data values by color-coding each region.
  • Flourish is shorthand for Flourish.studio, a London-based software company that provides no-programming data visualizations. A geographic map is one kind of data visualization.

The standard US polio vaccination schedule for children is four total doses, three of which are administered before a child’s third birthday. For this tutorial, we make the simplifying assumption that a child who has had all three jabs by their third birthday is “vaccinated” and a child who has not is “unvaccinated”.

Prepare the Data

Data about US early-childhood vaccination rates for various diseases and age groups is contained in a single dataset from the CDC. Simple Python/pandas data prep translates the raw dataset into a map input file.

import pandas as pd
from sodapy import Socrata

POLIO_MAP_DATA = "polio_coverage_map.tsv"
VAX_DATASET = "fhky-rtsk"  # CDC code for this dataset

# Get the data, which covers many vaccines and age groups.
cdc_client = Socrata("data.cdc.gov", None)
VaxDF = pd.DataFrame.from_records(cdc_client.get(VAX_DATASET, limit=1000000))

# Select rows about 3 doses of polio vax.
PolioDF = VaxDF.query("vaccine == 'Polio'")
PolioDF = PolioDF.query("dose == '≥3 Doses'")

# Select rows for the 50 US states.
PolioDF = PolioDF.query("geography_type == 'States/Local Areas'")
PolioDF = PolioDF.query("geography != 'Puerto Rico'")

# Use the latest year in the dataset.
PolioDF = PolioDF.query("year_season == '2018'")

# Get rows for almost-3-year olds.
PolioDF = PolioDF.query("dimension_type == 'Age'")
PolioDF = PolioDF.query("dimension == '35 Months'")

# Select the columns we need for mapping.
PolioDF = PolioDF[["geography", "coverage_estimate", "population_sample_size"]]

# Make the data file that becomes the input to the map.
print("\nWriting map data to " + POLIO_MAP_DATA)
PolioDF.to_csv(POLIO_MAP_DATA, encoding='utf-8', sep='\t', index=False)

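The resulting map file is tab-separated, with one row per geography and three columns: geography, coverage_estimate and population_sample_size.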
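Before uploading, it can help to sanity-check the file. The sketch below is one possible check, not part of the original workflow: the function name check_map_rows is hypothetical, the sample rows are made up (they are not real CDC values), and the 0 to 100 range assumes coverage_estimate is a percentage.

```python
import csv
import io

EXPECTED_COLUMNS = ["geography", "coverage_estimate", "population_sample_size"]


def check_map_rows(tsv_text):
    """Return a list of problems found in tab-separated map data."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    seen = set()
    for row in reader:
        name = row["geography"]
        # Each geography should appear once, or the merge becomes ambiguous.
        if name in seen:
            problems.append(f"duplicate row for {name}")
        seen.add(name)
        try:
            coverage = float(row["coverage_estimate"])
        except (TypeError, ValueError):
            problems.append(f"non-numeric coverage for {name}")
            continue
        # Assumption: coverage is expressed as a percentage.
        if not 0.0 <= coverage <= 100.0:
            problems.append(f"coverage out of range for {name}")
    return problems


# Made-up sample rows, for illustration only (not real CDC values).
SAMPLE = (
    "geography\tcoverage_estimate\tpopulation_sample_size\n"
    "State A\t91.5\t300\n"
    "State B\tn/a\t250\n"
    "State A\t88.0\t310\n"
)

print(check_map_rows(SAMPLE))
```

To check the real file, pass in its contents, for example check_map_rows(open("polio_coverage_map.tsv", encoding="utf-8").read()). An empty list means none of these particular problems were found.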

Start Flourish

Flourish is free for individual use and has paid options for business and enterprise users. All my work has been with the free version.

Create a Flourish account and click New Visualization. Everything you make with Flourish is a “visualization” including bar charts, line graphs, scatter plots, hierarchical diagrams, and 2D and 3D maps.

Choose and Clean a Map Template

The built-in templates are the best way to start any project, but you need to clean out the example data and settings before moving forward with your own data and design.

Scroll down to Projection Maps / US States and choose it as your starting point. Give your map a name in the upper left.

This example will use regions only, not map points, so remove points data:

  • Go to the Preview pane and set Points Layer = Disabled.
  • Make sure none of the points data is used. Go to Data pane / Points, and remove the A,B,C,D that you see on the far right.
  • Remove all the points data. Click the header for Column A, then SHIFT-click the header for Column D so all columns are selected. Pull down the widget in the Column D header and choose Remove Columns.

Clear regions data that you will replace:

  • Go to Data / Regions and remove any column listed on the far right other than Geometry = A, and Name = B.
  • Remove Columns C, D and E completely.

Go to the Preview pane. You should see an empty map of the United States. As you hover over a state, its outline should highlight and its name should pop up. Your map is now ready to receive the data about polio vaccination.

Import and Merge Data

Go to Data / Regions. Change the Upload option on the right to Upload Data And Merge. Press that button.

It is important that the current data (the map geography in the Flourish template) line up correctly with the incoming data (the polio vaccine file from Python). Flourish is good about guessing which columns should join each other. In this case, the Name column in the template should join the geography column in the incoming data. You can usually leave the other options in their default setting. So the upload/merge operation looks like this:

Image by Author

Go ahead and do the merge, then select Next.
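Because the merge joins on names, any spelling difference between the two sides leaves a region without data. The sketch below shows one way to compare the names before uploading. It assumes the template's Name column holds the full names of the 50 states, which I have not verified against the Flourish template, and unmatched_names is a hypothetical helper.

```python
# Assumption: the template's Name column uses these full state names.
US_STATES = frozenset({
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
})


def unmatched_names(data_names, template_names=US_STATES):
    """Report names that would fail to join in either direction."""
    data = {name.strip() for name in data_names}
    return {
        "missing_from_data": sorted(template_names - data),
        "not_in_template": sorted(data - template_names),
    }


# Small made-up example with a three-name template, to keep the output short.
report = unmatched_names(
    ["Alabama", " Alaska ", "Example Local Area"],
    template_names=frozenset({"Alabama", "Alaska", "Arizona"}),
)
print(report)
```

With the real data, you would pass the geography column, for example unmatched_names(PolioDF["geography"]). Names listed under not_in_template are rows that Flourish would have nowhere to put.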

Set Choropleth Values and Colors

The goal of choropleth maps is to show data values (numbers or keywords) as colors. You have many options for how to color a map, so you should experiment with various schemes to see which best expresses the data story you want to tell. There are built-in color schemes and you can define your own. Colors can be binned (a fixed set of colors, each for a value range) or continuous (varying shades based on a numeric field). Any color scheme can easily be reversed, so low/high colors are swapped.
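To make the binned/continuous distinction concrete, here is a minimal sketch of both mappings. It is an illustration only, not how Flourish implements its scales: it uses a simple two-color ramp rather than a real palette such as viridis, and all function names and colors are my own.

```python
def lerp_color(low, high, t):
    """Blend two RGB tuples; t=0 gives low, t=1 gives high."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def continuous_color(value, vmin, vmax, low, high):
    """Continuous scale: every distinct value gets its own shade."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return lerp_color(low, high, t)


def binned_color(value, vmin, vmax, colors):
    """Binned scale: the range is cut into equal bins, one color per bin."""
    t = (value - vmin) / (vmax - vmin)
    index = min(len(colors) - 1, max(0, int(t * len(colors))))
    return colors[index]


DARK = (0, 0, 0)
BRIGHT = (200, 100, 50)
BINS = ["light", "mid", "dark"]

print(continuous_color(90, 80, 100, DARK, BRIGHT))  # halfway between the two
print(continuous_color(90, 80, 100, BRIGHT, DARK))  # reversed: swap the ends
print(binned_color(80, 80, 100, BINS))
print(binned_color(90, 80, 100, BINS))
print(binned_color(100, 80, 100, BINS))
print(binned_color(100, 80, 100, BINS[::-1]))       # reversed bins
```

Reversing a scheme is just swapping the ends of the ramp (or reversing the list of bin colors), which is why it is a one-click option.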

To produce the same polio vaccination map shown at the top of this article:

  • Go to Data / Regions.
  • Set Value = C. (The value determines which variable controls the coloring.)
  • Go to Preview / Regions / Fill.
  • Set Scale Type = Numeric, Sequential, Linear.
  • Set Palette = viridis, not reversed.

You should see this:

Image by Author, using Flourish.studio

Here is the exact same color scheme with the same data, but with the colors reversed:

Image by Author, using Flourish.studio

Here is the same data using the plasma palette:

Image by Author, using Flourish.studio

How about Scale Type = Numeric, Diverging, Binned; Palette = Red/Blue? Sequential and diverging refer to the color axis. A sequential scale is generally shades of the same color, whereas a diverging scale runs between two different colors.

Image by Author, using Flourish.studio
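A diverging scale can be sketched as two ramps that meet at a midpoint. The code below is an illustration under my own assumptions, not Flourish's implementation: the neutral middle color, the midpoint value and the RGB values are all made up for the example.

```python
def diverging_color(value, vmin, midpoint, vmax, low, neutral, high):
    """Blend from neutral toward low below the midpoint, toward high above it."""
    if value <= midpoint:
        t = (midpoint - value) / (midpoint - vmin)
        target = low
    else:
        t = (value - midpoint) / (vmax - midpoint)
        target = high
    t = min(1.0, t)  # clamp values beyond either end of the range
    return tuple(round(n + (c - n) * t) for n, c in zip(neutral, target))


# Hypothetical colors: a red end, a pale neutral middle, and a blue end.
RED = (200, 0, 0)
PALE = (240, 240, 240)
BLUE = (0, 0, 200)

for value in (80, 85, 90, 100):
    print(value, diverging_color(value, 80, 90, 100, RED, PALE, BLUE))
```

The choice of midpoint matters: it decides which states read as "low" and which as "high", so it is part of the story the map tells.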

Which map is better? Which more clearly tells the story you want to communicate? These are subjective judgments that you have to make for every map.

There also are options for how to handle regions with missing values, to add shadow for a quasi-3D effect, and many other possibilities. Beware that you can drive yourself a bit crazy trying to find the best overall result.

Set Region Popups

Color-coding on a choropleth map is helpful to highlight geographic regions that you want the viewer to focus on. But it is also helpful to see more detailed information about a region of interest. For this, you want popups:

  • Go to Data / Regions / Metadata for Popups (at the bottom of the right-hand control).
  • Set the popup to C,D.
  • Go to Preview. As you hover over each region, you should see the vaccination percent and sample size of the data for that region.

Set Header, Footer and Credits

Since most maps need a title and data credits:

  • Go to Preview / Header / Title (and Subtitle). Set as you want.
  • Go to Preview / Footer / Source Name (and URL). Set as needed. I recommend keeping the names short, since a few credits quickly fill up the map footer. You can add multiple sources.
  • Anything you enter in Note appears after the source credits.

Publish the Map

To make your map visible to others:

  • Click Export And Publish / Publish to Share and Embed / Publish.
  • You will be shown the public URL and the code to embed the map.

If you later make changes to the map, you can re-publish at the same URL so readers will immediately see the updates.

For More Information

en.wikipedia.org/wiki/Choropleth_map (general information about choropleth maps)

help.flourish.studio (Flourish help topics)

www.cdc.gov/vaccines/ (more CDC vaccine data)
