Have you ever been reading a news article where, halfway down the page, you come across a beautiful graphical representation of whatever the article is discussing? It’s perfect: it’s a geographical map, and the darkness of shading in certain regions corresponds to the presence or strength of a particular statistical trend in that area. You think to yourself, "Wow, that’s neat. I truly understand why they say a picture is worth a thousand words. Because this article just spent five minutes of my time telling me about something I could’ve gleaned from this graphic alone. Powerful. I bet you have to be some sort of math whiz or computer genius to figure out how to make one of those."
I’m happy to tell you a couple of things about this moment. First, I totally agree with you; these graphs are neat. They’re called choropleth graphs (or choropleth maps), and they can pack such an informational punch. You probably see these more often than you realize (we’re in the thick of election season here in the United States, so I feel like I’m bombarded with them multiple times per week). And second, you do not need to be a master of computers to make one; they’re in fact pretty easy to construct with some basic Python skills. For the rest of this post, I’ll be showing you how to make one such graph so you can impress all your friends. This guidance will assume you have a basic understanding of Python and the Pandas package (although if you don’t, really all you need to know how to do is open a Jupyter notebook from your computer and navigate to file folders with your computer terminal; you could theoretically just copy and paste the rest).
Step 1: Install Geopandas
So the first thing you’ll need to do is install a new Python package called Geopandas. This will basically just allow us to import shape files and graph them using Python. To install Geopandas, just run the following line in your terminal:
conda install geopandas
You’ll see the package download, so great news, you’re already on your way!
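If you’d like to double-check that the install actually took before moving on, here’s a minimal sketch you could run in Python. It only asks Python whether it can locate each package; the helper name "is_installed" is something I made up for illustration and isn’t part of Geopandas.

```python
from importlib.util import find_spec


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return find_spec(package_name) is not None


# Check the three packages this walkthrough relies on.
for name in ("geopandas", "pandas", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If anything shows up as "missing", rerun the install command for that package in the same environment you launch your notebook from.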
Step 2: Get Some Data
Next, we’re going to want to download some data to use. What do you want to make a map of?! Since I live in Brooklyn, New York, and since an omnipresent topic on everyone’s minds here is COVID-19, I’m going to use some recent testing data I found compiled on the following GitHub page: https://github.com/nychealth/coronavirus-data
Specifically, I’m going to use the most recent positive COVID test rates per zip code and plot them on a map of New York City, which happens to be divided by zip code.
You can really use any data you want, but for your own tinkering, you’ll at least need a .csv file and a .shp file in your data. The first is the spreadsheet-like list of actual data you want to use, and the second is the shape file that will actually plot out our map. You’ll want to make sure that your .csv file and your .shp file use the same unique keys to categorize regional zones (in my case, these unique regional zones are zip codes). When you have these files downloaded to your computer, open up your terminal, navigate to the corresponding folder with those items, and run the following command:
jupyter notebook
You should see something similar to the picture below. And just as a sanity check, your .csv and .shp files should be listed on the left side of your screen:

From this page, create a new Python notebook by clicking the "New" button in the top right, and we’re finally ready to start constructing our graph.
Step 3: Import Relevant Packages and Files in Python
In our notebook, we’ll want to first import relevant Python packages and our downloaded data. Use the following code for this:
# import relevant packages
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
# import relevant data
nyc_map = gpd.read_file("Geography-resources/MODZCTA_2010.shp")
stats = pd.read_csv("recent/recent-4-week-by-modzcta.csv")
If you’re following along with your own data, just make sure to swap out the text in quotes in the last two lines with your actual file paths/names. For reference, I assign the data contained in my .csv file to a variable named "stats" and the data in my .shp file to a variable named "nyc_map". If you were to view either one of the files we just imported, they’d look something like this:


There are two things I want us to notice here. First, give the "geometry" column a glance in my .shp file. You’ll see it sort of just looks like a list of coordinates; that’s almost exactly what it is. Stored in this column is all of the geometric data Geopandas needs in order to draw out a shape (or in our case, multiple shapes). Second, check out the "MODZCTA" column, and notice that this column is present in both of my files. These are the unique regional keys I mentioned earlier, which are important because they allow us to marry the statistical data in my .csv file to the geometric data in my .shp file.
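Since everything hinges on those keys lining up, it can be worth checking them before you go any further. Here’s a rough sketch of one way to do that. The two tiny tables below are made-up stand-ins for the real files (plain Pandas dataframes, invented zip codes and rates), and I’m assuming the key might be stored as text in one file and as numbers in the other, which is a common snag with zip codes.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are made up for illustration).
shapes = pd.DataFrame({"MODZCTA": ["10001", "10002", "11201"]})
rates = pd.DataFrame(
    {"MODZCTA": [10001, 11201, 11215], "Latest Rates": [41.2, 77.5, 63.0]}
)

# Keys stored as text in one table and as integers in the other won't line up,
# so convert both columns to strings before comparing them.
shapes["MODZCTA"] = shapes["MODZCTA"].astype(str)
rates["MODZCTA"] = rates["MODZCTA"].astype(str)

shape_keys = set(shapes["MODZCTA"])
rate_keys = set(rates["MODZCTA"])

matched = sorted(shape_keys & rate_keys)
only_in_shapes = sorted(shape_keys - rate_keys)
only_in_rates = sorted(rate_keys - shape_keys)

print("matched:", matched)
print("only in shape table:", only_in_shapes)
print("only in stats table:", only_in_rates)
```

Any zone that only appears in one table won’t get shaded on your map, so this is a cheap way to catch surprises early.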
Step 4: Prep Data for Plotting
So how do we perform this data marriage? We’re going to create a new dataframe with both our geographical and statistical data in it by merging our two tables together. We perform this with the following code:
map_and_stats = nyc_map.merge(stats, on="MODZCTA")
Here, I’m taking my "nyc_map" data (the .shp file) and, using the zip codes in the "MODZCTA" column, attaching all the data from my "stats" (the .csv file). I name this new table "map_and_stats" (imaginative, I know). We can take a look at the new data frame to verify:

Check that out: we’ve got all our info in one place now. We’re ready to plot.
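One thing worth knowing about that merge: by default it keeps only the rows whose key appears in both tables, so a zone with no stats quietly drops out. Here’s a small sketch of how you might spot that, again using made-up toy tables (plain Pandas dataframes standing in for the real map and stats). The "how" and "indicator" arguments are standard Pandas merge options, and I’m assuming the Geopandas version of merge passes them through the same way.

```python
import pandas as pd

# Toy stand-ins (values are made up); zone 10002 has no stats row.
shapes = pd.DataFrame({"MODZCTA": ["10001", "10002", "11201"]})
rates = pd.DataFrame({"MODZCTA": ["10001", "11201"], "Latest Rates": [41.2, 77.5]})

# The default merge is an inner join: zones without a match are dropped.
inner = shapes.merge(rates, on="MODZCTA")
print(len(inner), "rows after the default merge")

# how="left" keeps every zone from the left table, and indicator=True adds
# a "_merge" column saying where each row came from.
checked = shapes.merge(rates, on="MODZCTA", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "MODZCTA"].tolist()
print("zones with no stats:", unmatched)
```

If "unmatched" comes back empty for your data, the simple merge did everything you needed.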
Step 5: Plotting the Data
Alright, the moment we’ve been waiting for, the actual plotting of our data. First, we’re going to establish our plot and size it (I use 8"x8" sizing). I’m also going to rotate our x-axis labels 90 degrees, just to make things a little more readable. We do that with the following code:
fig, ax = plt.subplots(1, figsize=(8, 8))
plt.xticks(rotation=90)
Next, we want to specifically graph data from the joined, new dataframe we made in the last step. Specifically, I want to look at a column in my data called "Latest Rates". There are other attributes you can adjust here (line thickness, shading color, etc.), but I’ll leave that to you guys to play around with. The code for this graphing is below:
map_and_stats.plot(column="Latest Rates", cmap="Reds", linewidth=0.4, ax=ax, edgecolor=".4")
When we execute this, we get the following graph:

Pretty cool, yeah? We can do a couple cosmetic things as well. Titles, axis labels, etc. can be added as you would any other subplot, but let’s add a color scale bar in here, just so we can tell people what all these shades of red actually mean. We can achieve that with the following code:
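# build a color scale that maps values from 0 to 120 onto the "Reds" colormap
bar_info = plt.cm.ScalarMappable(cmap="Reds", norm=plt.Normalize(vmin=0, vmax=120))
bar_info.set_array([])
# attach the color bar to our map's axes (newer Matplotlib versions need ax=ax)
cbar = fig.colorbar(bar_info, ax=ax)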
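You’ll want to note here that the bounds of your color scale are set via the "vmin" and "vmax" arguments seen above. The greatest positive testing rate in my data was around 100, so I decided to make my max 120. One caveat: unless you tell it otherwise, the map itself scales its shading to the smallest and largest values in your data, so if you want the bar and the map to line up exactly, pass the same vmin and vmax to the .plot() call too. In any case, if you plot the graph now, you can see the following: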
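If you’re curious what those bounds actually do, here’s a small sketch using only Matplotlib. It shows how a rate gets squeezed onto the 0-to-1 scale that the colormap understands, and what happens to a value above the max. The sample rates are made up for illustration.

```python
import matplotlib.pyplot as plt

# Same bounds and colormap as the color bar.
norm = plt.Normalize(vmin=0, vmax=120)
cmap = plt.get_cmap("Reds")

for rate in (0, 60, 120, 150):
    position = float(norm(rate))   # where the rate falls between vmin and vmax
    red, green, blue, alpha = cmap(position)
    print(f"rate {rate}: position {position:.2f}, "
          f"color ({red:.2f}, {green:.2f}, {blue:.2f})")

# Anything above vmax gets the same darkest shade as vmax itself.
same_as_max = cmap(float(norm(150))) == cmap(1.0)
print("150 shaded like 120:", same_as_max)
```

That last line is why picking a sensible max matters: set it too low and your worst-hit areas all blur into the same shade.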

Wow, look how professional we look. So we have this graph, but let’s say I’m interested in looking at a particular portion of it. Again, I live in Brooklyn, so maybe I want to zoom in on that area. We can achieve that by setting x and y limits on our graphs. You can see on the axes, we have coordinate-like numbers that Geopandas used to draw out our map. We use these same coordinates to effectively zoom into areas of interest. The area I want to look at is between 970000 and 1010000 on the x-axis and 140000 and 200000 on the y-axis, so I’ll limit our image to those ranges (and then turn our axes off for cosmetic reasons) with the following code:
ax.set_xlim(970000, 1010000)
ax.set_ylim(140000, 200000)
ax.axis("off")
When we execute, we get the following graph:

Dang, pretty neat, right? Now all you have to do is save your image, print it out, and brag to all your friends and family about how professional you are.
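For the saving part, here’s a minimal sketch using Matplotlib’s savefig. The line drawn below is just a placeholder so the example runs on its own (in your notebook you’d use the "fig" that already holds your map), and the file name and temp-folder location are my own picks for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, figsize=(8, 8))
ax.plot([0, 1], [0, 1])  # placeholder drawing; your map would be here instead
ax.axis("off")

# Hypothetical output location; swap in any path you like.
output_path = os.path.join(tempfile.gettempdir(), "choropleth_example.png")

# dpi controls the resolution; bbox_inches="tight" trims extra white margins.
fig.savefig(output_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print("saved to", output_path)
```

A higher dpi gives you a sharper image for printing, at the cost of a bigger file.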