Quick Success Data Science

One thing that characterizes professional graphics is an overarching theme that ties everything together. This includes fonts, symbology, and most of all, the color palette.
Below is an example by the US National Park Service. Notice how the harmonious earth tones collaborate to evoke a sense of the great outdoors. This graphic doesn’t just convey information; it conveys it with style!

Python’s Matplotlib plotting library uses colormaps to define the color scheme for a visualization. A colormap maps data values, scaled to the range 0 to 1, onto actual colors.
While Matplotlib comes with many built-in colors and color schemes, they won’t cover every possible scenario. There will be times when you’ll want to personally tailor your colors to a particular theme or concept.
In this Quick Success Data Science project, we’ll look at how to select custom colors and turn them into colormaps that you can use with Matplotlib, seaborn, pandas, geopandas, and other Python-compatible plotting libraries. We’ll then use these colormaps to plot the location of oak trees in New York City.
Acorns, Acorns, Acorns!
Despite an exceptional drought in Texas this year, we’re up to our armpits in acorns. Besides being plentiful, they’re also especially colorful, with those latte tans and purply browns only nature knows how to make. While admiring one this week, I couldn’t help but wonder what an excellent colormap it would make. Being a Pythonista, I immediately put that thought into action.

If you’re from a part of the world that doesn’t have acorns, they’re the nuts produced by oak trees.
Capturing an Acorn’s Colors
To capture an object’s colors, all you need is a digital photograph. While there’s a whole science around photographically capturing true colors (you can find a few tips here), I just took a phone picture of an acorn on a cloudy day.
I then used Image Color Picker to extract colors from the picture. This free application lets you upload an image file and use your cursor to select and sample pixel colors. For this project, we’ll use RGB (Red-Green-Blue) values, which range from 0 to 255.

To construct a colormap of the acorn, I took two sets of measurements. The first set consisted of four measurements equally spaced from the dark brown tip to the egg-white base. The second set consisted of five measurements taken the same way. I copied the values for each measurement straight out of the app using the "copy" icon in the RGB output box.

I took two sets of measurements to see if it made any difference. As sampled, it didn’t.
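Image Color Picker is a point-and-click tool. If you’d rather take equally spaced samples in Python, here is a minimal sketch, assuming the photo is already loaded as a NumPy array of 0 to 255 RGB values. The helper name `sample_along_line` is hypothetical, and a synthetic brown-to-tan gradient stands in for the acorn photo so the example runs on its own.

```python
import numpy as np

def sample_along_line(image, start, end, n):
    """Return n (R, G, B) tuples sampled at equally spaced pixels from start to end.

    image: array of shape (height, width, 3 or 4) holding 0-255 values.
    start, end: (row, col) pixel coordinates, e.g. the acorn's tip and base.
    """
    rows = np.linspace(start[0], end[0], n).round().astype(int)
    cols = np.linspace(start[1], end[1], n).round().astype(int)
    return [tuple(int(v) for v in image[r, c, :3]) for r, c in zip(rows, cols)]

# Stand-in for a photo: a 100x10 vertical ramp from dark brown (top) to pale tan (bottom).
# With a real photo, you could load it with Pillow instead, e.g.:
#   from PIL import Image
#   image = np.asarray(Image.open("acorn.jpg").convert("RGB"))  # hypothetical file name
tip, base = np.array([42, 34, 31]), np.array([177, 166, 150])
t = np.linspace(0, 1, 100)[:, None, None]                  # shape (100, 1, 1)
image = ((1 - t) * tip + t * base).repeat(10, axis=1).astype(np.uint8)

samples = sample_along_line(image, start=(0, 5), end=(99, 5), n=4)
print(samples)
```

Changing `n` from 4 to 5 reproduces the second, five-point set of measurements.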
The Colormap Code
The following code, written in JupyterLab, uses Matplotlib to create both discrete (categorical) and continuous colormaps. It then tests the colormaps using a heat map display.
Creating Colormaps
Based on the interpolation method, there are two types of colormaps in Matplotlib:
A listed colormap is a list of colors. It’s a discrete colormap with a predefined set of colors, and it doesn’t interpolate between colors.
A linear segmented colormap uses interpolation between color anchor points stored in a dictionary. This creates a continuous colormap.
Discrete colormaps are suitable for categorical data represented by a name or symbol. Continuous colormaps smoothly transition from one color to another. They’re typically used to represent a range of values, such as when plotting temperature or precipitation data.
Creating Listed Colormaps
We’ll use the set of four acorn measurements to produce a listed colormap. Here’s the code:
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Normalize the RGB colors (RGB color tuples in comments):
colors = [(42/255, 34/255, 31/255),     # (42, 34, 31)
          (82/255, 59/255, 53/255),     # (82, 59, 53)
          (112/255, 69/255, 37/255),    # (112, 69, 37)
          (187/255, 164/255, 132/255)   # (187, 164, 132)
          ]

# Create a ListedColormap (discrete colors):
custom_cmap_discrete = ListedColormap(colors)

# Display a colorbar with the custom colormap:
fig, ax = plt.subplots(figsize=(6, 1))
plt.imshow([[i for i in range(len(colors))]],
           cmap=custom_cmap_discrete,
           aspect='auto')
plt.xticks([]), plt.yticks([]);  # Turn off tickmarks
# plt.show()
```
We need only the Matplotlib library for this purpose, but we added NumPy to generate some dummy test data for later.
Matplotlib’s [`ListedColormap()`](https://matplotlib.org/stable/users/explain/colors/colormap-manipulation.html#listedcolormap) class maps values between 0 and 1 to colors. These colors are stored in its `.colors` attribute.
Matplotlib expects RGB tuples as floats between 0 and 1, so we must divide the acorn’s RGB values by 255 to normalize them (each 8-bit color channel has 256 levels, numbered 0 through 255). After that, we just pass our `colors` list to the `ListedColormap()` class to create the custom colormap.
To view the colormap as a color bar, we can plot it as an image using `plt.imshow()`. Here’s the result:

Note that you can also create a colormap by providing a list of official Matplotlib color names. For example:
```python
cmap = ListedColormap(["darkorange", "gold", "lawngreen", "lightseagreen"])
```
For more on this, see the official docs.
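Because the color picker reports 0 to 255 values, it can be handy to paste them in raw and normalize them with a list comprehension instead of typing `/255` twelve times. The sketch below does that and also converts the colors to hex strings with `matplotlib.colors.to_hex`, since `ListedColormap()` accepts hex strings as well as RGB tuples and named colors. The variable names are illustrative.

```python
from matplotlib.colors import ListedColormap, to_hex, to_rgb

# The four acorn samples as copied from the color picker (0-255 per channel):
acorn_rgb255 = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (187, 164, 132)]

# Matplotlib wants floats in [0, 1], so divide each channel by 255:
colors = [tuple(channel / 255 for channel in rgb) for rgb in acorn_rgb255]

# The same colors as hex strings, which ListedColormap also accepts:
hex_codes = [to_hex(c) for c in colors]
print(hex_codes)  # ['#2a221f', '#523b35', '#704525', '#bba484']

# Both colormaps hold the same four colors:
cmap_from_tuples = ListedColormap(colors)
cmap_from_hex = ListedColormap(hex_codes)
```

Hex codes are also compact enough to share in a style guide or paste into other plotting tools.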
Testing the Listed Colormap
To test the listed colormap in a Matplotlib figure, we’ll use a heat map built from a randomized dataset. Here’s the code:
```python
# Create randomized data with NumPy:
data = np.random.rand(10, 10)

# Plot using the custom colormap:
plt.imshow(data, cmap=custom_cmap_discrete)
plt.colorbar();
# plt.show()
```
And here’s the result:

I was right; acorns do make great colormaps! This plot looks like a beautiful "butcher’s block" cutting board.
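Two details of this test are worth knowing. The random data changes on every run, and `plt.imshow()` stretches the color range to the data’s own minimum and maximum unless told otherwise. The sketch below is a variation on the test, not the original code: it seeds NumPy’s generator (the seed 42 is arbitrary), pins `vmin`/`vmax` to 0 and 1, and counts how many cells land in each of the four color bins. With four colors, a `ListedColormap` splits [0, 1] into four equal-width bins.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

colors = [(42/255, 34/255, 31/255), (82/255, 59/255, 53/255),
          (112/255, 69/255, 37/255), (187/255, 164/255, 132/255)]
custom_cmap_discrete = ListedColormap(colors)

# A seeded generator gives the same "random" heat map on every run:
rng = np.random.default_rng(seed=42)
data = rng.random((10, 10))

# Fix the color scale to [0, 1]; by default imshow uses data.min()..data.max():
fig, ax = plt.subplots()
im = ax.imshow(data, cmap=custom_cmap_discrete, vmin=0, vmax=1)
fig.colorbar(im, ax=ax)

# Value v gets color index int(v * 4); clamp so a value of exactly 1.0 maps to the last color:
bin_index = np.minimum((data * len(colors)).astype(int), len(colors) - 1)
print(np.bincount(bin_index.ravel(), minlength=len(colors)))  # cells per acorn color
```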
Creating a Linear Segmented Colormap
We’ll now use the set of five acorn measurements to produce the linear segmented colormap. Here’s the code:
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Normalize RGB colors (RGB color tuples in comments):
colors = [(42/255, 34/255, 31/255),     # (42, 34, 31)
          (82/255, 59/255, 53/255),     # (82, 59, 53)
          (112/255, 69/255, 37/255),    # (112, 69, 37)
          (167/255, 143/255, 105/255),  # (167, 143, 105)
          (177/255, 166/255, 150/255)   # (177, 166, 150)
          ]

# Create a list of positions for each color in the colormap:
positions = [0.0, 0.25, 0.5, 0.75, 1.0]

# Create a LinearSegmentedColormap (continuous colors):
custom_cmap = LinearSegmentedColormap.from_list('custom_colormap',
                                                list(zip(positions, colors)))

# Display a colorbar with the custom colormap:
fig, ax = plt.subplots(figsize=(6, 1))
plt.imshow([[i for i in range(256)]],
           cmap=custom_cmap,
           aspect='auto',
           vmin=0,
           vmax=255)
plt.xticks([]), plt.yticks([]);  # Turn off tickmarks
# plt.show()
```
In this case, we imported Matplotlib’s [`LinearSegmentedColormap()`](https://matplotlib.org/stable/api/_as_gen/matplotlib.colors.LinearSegmentedColormap.html#matplotlib.colors.LinearSegmentedColormap) class rather than the [`ListedColormap()`](https://matplotlib.org/stable/users/explain/colors/colormap-manipulation.html#listedcolormap) class. This class specifies colormaps using anchor points between which RGB(A) values are interpolated. That is, it generates colormap objects based on lookup tables using linear segments. It creates the lookup table using linear interpolation for each primary color, with the 0–1 domain divided into any number of segments.
Here’s the result:

A key part of this code is the `positions` variable. Note that I used evenly spaced positions (0.0, 0.25, 0.5, 0.75, 1.0), but there’s no reason you couldn’t "stretch" or "compress" an interval. For example, to make the colormap in the title image for this article, I used asymmetrical segments defined by `[0.0, 0.25, 0.65, 0.75, 1.0]`.
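To see what "linear interpolation for each primary color" means in practice, and what moving the positions does, the sketch below builds the five-color acorn colormap twice, once with evenly spaced positions and once with the asymmetric `[0.0, 0.25, 0.65, 0.75, 1.0]` positions, and compares each against a hand-rolled, per-channel `np.interp`. The helper names `make_cmap` and `interpolate_by_hand` are hypothetical. Expect small rounding differences, because Matplotlib looks colors up in a 256-entry table rather than interpolating at the exact input value.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

rgb255 = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (167, 143, 105), (177, 166, 150)]
colors = [tuple(c / 255 for c in rgb) for rgb in rgb255]

def make_cmap(name, positions):
    """Build a continuous colormap from (position, color) anchor pairs."""
    return LinearSegmentedColormap.from_list(name, list(zip(positions, colors)))

def interpolate_by_hand(x, positions):
    """Linearly interpolate R, G, and B separately between the anchor colors."""
    channels = np.array(colors).T            # shape (3, 5): one row per channel
    return np.array([np.interp(x, positions, ch) for ch in channels])

even_positions = [0.0, 0.25, 0.5, 0.75, 1.0]
stretched_positions = [0.0, 0.25, 0.65, 0.75, 1.0]   # the title-image variant
even = make_cmap('acorn_even', even_positions)
stretched = make_cmap('acorn_stretched', stretched_positions)

# At 0.5, the even map sits exactly on the third anchor; the stretched map is
# still partway between the second and third anchors.
for x in (0.1, 0.5, 0.9):
    print(x, np.round(np.array(even(x)[:3]) * 255), np.round(np.array(stretched(x)[:3]) * 255))
```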
Testing the Linear Segmented Colormap
To test the linear segmented colormap in a Matplotlib figure, we’ll again use a heat map built from a randomized dataset. Here’s the code:
```python
# Create randomized data:
data = np.random.rand(10, 10)

# Plot using the custom colormap:
plt.imshow(data, cmap=custom_cmap)
plt.colorbar();
# plt.show()
```
And here’s the result:

If you compare this heat map to the one generated with the listed colormap, you’ll see more color variability. That’s because the linear segmented colormap is continuous: it interpolates between its five anchor colors to produce 256 colors by default, rather than the four fixed colors of the listed colormap.
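The difference in color variability can be checked directly. Every Matplotlib colormap has a fixed lookup-table size, exposed as its `N` attribute, and `LinearSegmentedColormap.from_list()` accepts an `N` argument (256 by default). This sketch counts the distinct colors each acorn colormap actually returns across [0, 1]; the helper `distinct_colors` and the 1,000-sample grid are illustrative choices.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

four = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (187, 164, 132)]
five = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (167, 143, 105), (177, 166, 150)]

def to_unit(rgbs):
    """Scale 0-255 RGB tuples to Matplotlib's 0-1 floats."""
    return [tuple(c / 255 for c in rgb) for rgb in rgbs]

listed = ListedColormap(to_unit(four))
continuous = LinearSegmentedColormap.from_list('acorn', to_unit(five))           # N=256
continuous_4 = LinearSegmentedColormap.from_list('acorn_4', to_unit(five), N=4)  # coarse

def distinct_colors(cmap, samples=1000):
    """Count the distinct RGBA colors a colormap returns across [0, 1]."""
    rgba = cmap(np.linspace(0, 1, samples))
    return len(np.unique(rgba, axis=0))

for cmap in (listed, continuous, continuous_4):
    print(cmap.name, cmap.N, distinct_colors(cmap))
```

Note that `N=4` samples the interpolated ramp at 0, 1/3, 2/3, and 1, so those four colors are not the same as the four hand-picked colors in the listed map.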
Well, that’s most of what you need to know to build custom colormaps with Matplotlib. For a few more details, check out the docs.
Plotting the Oak Trees of New York City
Now, let’s use the continuous colormap with an actual map. To honor the acorn theme, we’ll plot the location of oak trees in New York City.
While there are at least thirteen species of oak identified in the city, we’re going to use a subset of four types: English, Shumard’s, pin, and white. We need to limit the types because our colormap, though attractive, isn’t very practical for resolving a large number of categories.
The Dataset
The tree locations are from the NYC OpenData portal. This portal provides free public data published by New York City agencies and other partners. I’ve filtered the data to the names and latitude-longitude locations for the four oak types and stored it as a CSV file in this Gist.
We’ll load this file with pandas and then use geopandas to project the locations on a map. Geopandas produces a GeoDataFrame, which is like a pandas DataFrame with a special "geometry" column that bundles the geometry type (such as "POINT") with plottable coordinates.
```python
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

# Load the CSV file into a pandas DataFrame:
df = pd.read_csv('https://bit.ly/3t3Vbx7')

# Create a GeoDataFrame with Point geometries:
gdf = gpd.GeoDataFrame(df,
                       geometry=gpd.points_from_xy(df['longitude'],
                                                   df['latitude']),
                       crs='EPSG:4326')
gdf.head(3)
```

Plotting the Location Map
Next, we’ll plot the points using geopandas’ built-in plotting functionality, which is based on Matplotlib. Geopandas also comes with handy built-in datasets, such as "nybb" for "New York borough boundaries." (These sample datasets were removed in geopandas 1.0; with newer versions, you can get the same file from the separate geodatasets package using `geodatasets.get_path("nybb")`.) We’ll plot these municipal boundaries in `olivedrab`, to match the overall color theme.
We’ll also reproject the tree locations GeoDataFrame (`gdf`) to the coordinate reference system (CRS) used by the borough boundaries GeoDataFrame (`gdf_nyc`). This ensures that the two datasets are projected consistently.
```python
# Plot tree locations along with NYC borough boundaries:
path_to_data = gpd.datasets.get_path("nybb")
gdf_nyc = gpd.read_file(path_to_data)

# Extract the boundaries GeoSeries:
borough_boundaries = gdf_nyc.boundary

# Plot the boundaries with no fill:
ax = borough_boundaries.plot(figsize=(9, 9),
                             linewidth=1,
                             edgecolor='olivedrab')

# Convert the tree gdf crs to the boroughs crs:
gdf = gdf.to_crs(gdf_nyc.crs)

# Plot the tree locations in the same figure:
gdf.plot(column='common',
         ax=ax,
         legend=True,
         markersize=1,
         cmap=custom_cmap)

# Customize the plot:
plt.title('NYC Selected Oak Tree Species Distribution')
plt.xticks([]), plt.yticks([]);

# Show the plot
# plt.show()
```
Here’s the result:

One thing to note here is that, even though we plotted discrete data (the names of the trees), we were able to use a continuous colormap. If you use the discrete colormap, you’ll get slightly different results, as the anchor points for the two colormaps aren’t the same.
Regardless of which colormap you use, you’ll see that pin oaks are the dominant oak type in New York City, at least among the species mapped.
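Why do the two colormaps color the species differently? My understanding of geopandas’ categorical plotting (an assumption worth checking against your installed version) is that it gives each of the n categories an evenly spaced value from 0 to 1, that is, i / (n - 1), and looks that value up in the colormap. The sketch below mimics that lookup for four categories without needing geopandas or the tree data.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

four = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (187, 164, 132)]
five = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (167, 143, 105), (177, 166, 150)]

def to_unit(rgbs):
    """Scale 0-255 RGB tuples to Matplotlib's 0-1 floats."""
    return [tuple(c / 255 for c in rgb) for rgb in rgbs]

custom_cmap_discrete = ListedColormap(to_unit(four))
custom_cmap = LinearSegmentedColormap.from_list('custom_colormap', to_unit(five))

n_categories = 4                                  # English, Shumard's, pin, white
sample_points = np.linspace(0, 1, n_categories)   # assumed: category i -> i / (n - 1)

for label, cmap in (('listed', custom_cmap_discrete), ('continuous', custom_cmap)):
    rgb = np.round(cmap(sample_points)[:, :3] * 255).astype(int)
    print(label, rgb.tolist())
```

Under that assumption, the listed colormap hands out exactly its four acorn colors, while the continuous one shares only the end colors and blends new shades for the two middle species.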
Plotting a KDE Map
As you may have noticed in the previous map, the narrow range of our colormap makes it difficult to visually parse more than a few discrete categories. It’s better suited for a continuous distribution, such as the one produced by a KDE map.
A KDE (Kernel Density Estimate) map is a way to visualize the distribution of points, like oak trees, across a geographical area. It’s based on a statistical technique for estimating the underlying continuous probability distribution of a set of samples.
Because a KDE map provides a smoothed representation of the density of occurrences, it’s perfect for highlighting regions of higher or lower concentration. Typically, higher concentrations are represented by darker or warmer colors.
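For intuition about what a KDE map computes, here is a minimal NumPy sketch of a 2D Gaussian KDE: every point contributes a small Gaussian "bump," and the bumps are summed and averaged on a grid. As far as I know, this is not what Geoplot runs internally (it delegates to seaborn, whose `kdeplot` chooses a bandwidth automatically using Scott’s rule by default). The function name, the fixed bandwidth of 0.3, and the synthetic "tree" points are all illustrative assumptions.

```python
import numpy as np

def gaussian_kde_2d(points, xs, ys, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on the grid xs x ys.

    points: array of shape (n, 2) holding (x, y) locations, e.g. tree coordinates.
    bandwidth: standard deviation of each Gaussian bump (same units as points).
    """
    gx, gy = np.meshgrid(xs, ys)                # each of shape (len(ys), len(xs))
    dx = gx[..., None] - points[:, 0]           # shape (len(ys), len(xs), n)
    dy = gy[..., None] - points[:, 1]
    bumps = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    # Dividing by 2*pi*h^2 makes each bump integrate to 1; averaging keeps total mass at 1.
    return bumps.sum(axis=-1) / (2 * np.pi * bandwidth**2 * len(points))

# Synthetic "trees": a dense cluster near (0, 0) plus a sparse scatter.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, size=(200, 2)),
                    rng.uniform(-3, 3, size=(20, 2))])

xs = ys = np.linspace(-4, 4, 81)
density = gaussian_kde_2d(points, xs, ys, bandwidth=0.3)

cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
print("total probability:", round(density.sum() * cell_area, 3))
row, col = np.unravel_index(density.argmax(), density.shape)
print("densest grid cell at", (xs[col], ys[row]))
```

Passing `density` to `plt.imshow()` with a colormap whose darkest color sits at the top of the range would give a miniature version of the KDE map.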
To make a KDE map, we’ll need to reverse our colormap so that darker colors represent larger values. We’ll also use Geoplot to draw the map. Geoplot is a high-level, open-source geospatial plotting library that claims to be the "seaborn of geospatial." This means that it builds on underlying libraries, like geopandas, to make mapping easy.
You can install Geoplot with these commands for conda or pip:
```bash
conda install -c conda-forge geoplot
pip install geoplot
```
Here’s the code. Note that we’re building off previous work and not reloading the dataset.
```python
from matplotlib.colors import LinearSegmentedColormap
import geoplot as gplt
import geoplot.crs as gcrs

# Reverse colormap so darkest = most dense for KDE plot:
colors = [(177/255, 166/255, 150/255),  # (177, 166, 150)
          (167/255, 143/255, 105/255),  # (167, 143, 105)
          (112/255, 69/255, 37/255),    # (112, 69, 37)
          (82/255, 59/255, 53/255),     # (82, 59, 53)
          (42/255, 34/255, 31/255)      # (42, 34, 31)
          ]

# Create a list of positions for each color in the colormap:
positions = [0.0, 0.25, 0.50, 0.75, 1.0]

# Create a LinearSegmentedColormap:
custom_cmap_r = LinearSegmentedColormap.from_list('custom_colormap',
                                                  list(zip(positions, colors)))

# Get the borough boundaries:
boroughs = gpd.read_file(gplt.datasets.get_path('nyc_boroughs'))
boroughs = boroughs.to_crs('EPSG:4326')

# Reset the gdf's crs:
gdf = gdf.to_crs('EPSG:4326')

# Plot the KDE map:
ax = gplt.kdeplot(gdf, cmap=custom_cmap_r, fill=True, clip=boroughs)
gplt.polyplot(boroughs, zorder=1, ax=ax);
```
Here’s the result:

Now there’s a map that will make you want a cup of coffee!
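Typing the colors in reverse works, but every Matplotlib colormap also has a `reversed()` method that returns the flipped map, named with an "_r" suffix like the built-in "viridis_r". The sketch below compares the two approaches. It also shows a subtlety of reversing by hand: with uneven positions, such as the title image’s `[0.0, 0.25, 0.65, 0.75, 1.0]`, the positions must be mirrored to `1 - p`, not reused as-is. The variable names `flipped` and `hand_reversed` are illustrative.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

rgb255 = [(42, 34, 31), (82, 59, 53), (112, 69, 37), (167, 143, 105), (177, 166, 150)]
colors = [tuple(c / 255 for c in rgb) for rgb in rgb255]
positions = [0.0, 0.25, 0.5, 0.75, 1.0]

custom_cmap = LinearSegmentedColormap.from_list('custom_colormap',
                                                list(zip(positions, colors)))

# Option 1: let Matplotlib flip the colormap (the new name gets an "_r" suffix).
flipped = custom_cmap.reversed()

# Option 2: reverse by hand, mirroring each position p to 1 - p.
hand_reversed = LinearSegmentedColormap.from_list(
    'custom_colormap_r',
    list(zip([1 - p for p in reversed(positions)], reversed(colors))))

x = np.linspace(0, 1, 11)
# Allow a one-step tolerance because both maps are read from 256-entry lookup tables.
print(np.allclose(flipped(x), custom_cmap(1 - x), atol=0.01),
      np.allclose(flipped(x), hand_reversed(x)))
```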
Custom Colormaps for the Color Blind
Be mindful that custom colormaps may not be appropriate for the 5 to 10 percent of the population with some form of color blindness. Unfortunately, our acorn colormap falls into the inappropriate category, especially when mapping oak tree locations as individual points.
Some strategies for accommodating color blindness include:
- the use of highly contrasting colors,
- the inclusion of patterns, textures, or symbols,
- the use of monochromatic gradients (like our KDE map),
- the use of color-selection tools like Vischeck and Colorbrewer.
To view an example of "colorblind-safe" colors in a variety of plot types, check out the seaborn-colorblind style sheet in the Matplotlib gallery (in Matplotlib 3.6 and later, this style is named `seaborn-v0_8-colorblind`).
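One quick, rough check for the gradient strategy is whether a colormap’s brightness rises steadily from one end to the other, since a steady lightness ramp still reads correctly in grayscale. The sketch below scores each acorn anchor color with the Rec. 601 luma weights (0.299, 0.587, 0.114). This is a simple approximation, not a perceptual color model, and not a substitute for a color-blindness simulator like the tools listed above; the helper name `luma` is illustrative.

```python
import numpy as np

acorn_rgb255 = np.array([(42, 34, 31), (82, 59, 53), (112, 69, 37),
                         (167, 143, 105), (177, 166, 150)], dtype=float)

def luma(rgb255):
    """Approximate perceived brightness with Rec. 601 luma weights (0-255 scale)."""
    return rgb255 @ np.array([0.299, 0.587, 0.114])

brightness = luma(acorn_rgb255)
print(np.round(brightness, 1))
print("monotonic:", bool(np.all(np.diff(brightness) > 0)))
```

The acorn anchors pass this test, though the relatively small brightness gap between the second and third colors suggests those two would be the hardest to tell apart without hue.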
Summary
Matching your visualization’s colormap to the theme of your data can make it more engaging for your readers. If the colormaps provided with Matplotlib aren’t sufficient, you can always generate your own.
Applications like Image Color Picker help you extract color codes from images. Matplotlib provides two classes, `ListedColormap()` and `LinearSegmentedColormap()`, that let you easily turn these color codes into colormaps usable in both statistical and geospatial plots.
Thanks!
Thanks for reading, and please follow me for more Quick Success Data Science projects in the future.