Hands-On Tutorials

Making publication-quality figures in Python (Part III): box plot, bar plot, scatter plot, histogram, heatmap, color map

Walking you through how to understand the mechanisms behind these widely-used figure types

Guangyuan(Frank) Li
Towards Data Science
9 min read · Jan 8, 2021


Photo by Myriam Jessier on Unsplash

This is my third tutorial. Here is a list of my previous posts and the ones I am going to post very soon:

  1. Tutorial I: Fig and Ax object
  2. Tutorial II: Line plot, legend, color
  3. Tutorial III: box plot, bar plot, scatter plot, histogram, heatmap, colormap
  4. Tutorial IV: violin plot, dendrogram
  5. Tutorial V: Plots in Seaborn (cluster heatmap, pair plot, dist plot, etc)

Why did I make this tutorial? Why should you spend your precious time reading this article? I want to share the most critical thing about learning matplotlib, which is understanding the building blocks of each type of figure and how you can gain full control of them. With that understood, you can easily build whatever figure you like, no matter how complicated it is. I will use very simple dummy examples that exaggerate each element in a plot, so that you can see where each element is and how it is laid out.

My second reason for making this tutorial is that I want to show you how to read the matplotlib documentation. It is not possible to learn matplotlib just by reading my tutorial or any single one (don’t trust any article titled “learning XXX in 5 minutes”), but I want to share where you should go to look things up when you have problems, and how to quickly find the solution you are looking for.

If that aligns with your needs, then this is the right article for you.

All the code is available at https://github.com/frankligy/python_visualization_tutorial

Boxplot

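When you make a boxplot, you basically input a sequence of one-dimensional arrays. The distribution of each array is represented by a box that displays the median, the 25% quantile (q1), the 75% quantile (q3), and the upper (q3 + 1.5*IQR) and lower (q1 - 1.5*IQR) bounds of your data, where IQR = q3 - q1. The whiskers extend to the most extreme data points that still fall within these bounds, and anything beyond them is drawn as an outlier.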

# load packages
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# prepare some data
np.random.seed(42)
data1 = np.random.randn(100)
data2 = np.random.randn(100)
data3 = np.random.randn(100)

fig,ax = plt.subplots()
bp = ax.boxplot(x=[data1,data2,data3], # sequence of arrays
                positions=[1,5,7], # where to put each box on the x-axis
                patch_artist=True) # draw the boxes as patches so they can be filled with colors
Basic box plot
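To make the numbers behind a box concrete, here is a minimal sketch that computes them by hand with NumPy for one array. It assumes the default whisker setting (1.5 times the IQR); the variable names are my own and are not part of matplotlib.

```python
import numpy as np

np.random.seed(42)
data = np.random.randn(100)

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Bounds that decide which points count as outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the bounds
whisker_low = data[data >= lower_bound].min()
whisker_high = data[data <= upper_bound].max()

# Everything outside the bounds would be drawn as a flier
fliers = data[(data < lower_bound) | (data > upper_bound)]

print(f"q1={q1:.2f}, median={median:.2f}, q3={q3:.2f}, IQR={iqr:.2f}")
print(f"whiskers from {whisker_low:.2f} to {whisker_high:.2f}, {len(fliers)} flier(s)")
```

Comparing these values with the picture is a quick way to convince yourself which line in the plot corresponds to which statistic.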

A box is made up of a box body (the rectangle in the middle), whiskers (the vertical lines), caps (the short horizontal lines at the ends of the whiskers), a median line (the orange line), and fliers (the outlier markers). How do we control all of them?

The ax.boxplot() function returns a Python dictionary, which looks like this:

{'whiskers': [<matplotlib.lines.Line2D object at 0x16b113748>, <matplotlib.lines.Line2D object at 0x16b1136a0>, <matplotlib.lines.Line2D object at 0x16b22b978>, <matplotlib.lines.Line2D object at 0x16b22b2b0>, <matplotlib.lines.Line2D object at 0x16145f390>, <matplotlib.lines.Line2D object at 0x16145f7f0>], 'caps': [<matplotlib.lines.Line2D object at 0x16b1bee10>, <matplotlib.lines.Line2D object at 0x16b1c7358>, <matplotlib.lines.Line2D object at 0x1961afd30>, <matplotlib.lines.Line2D object at 0x1961afb00>, <matplotlib.lines.Line2D object at 0x16b4672b0>, <matplotlib.lines.Line2D object at 0x153ea0eb8>],

'boxes': [<matplotlib.patches.PathPatch object at 0x1614793c8>, <matplotlib.patches.PathPatch object at 0x16b3acc18>, <matplotlib.patches.PathPatch object at 0x16b399b00>],
'medians': [<matplotlib.lines.Line2D object at 0x1546fb5f8>, <matplotlib.lines.Line2D object at 0x1960db9b0>, <matplotlib.lines.Line2D object at 0x153ea0518>], 'fliers': [<matplotlib.lines.Line2D object at 0x16b1e3ba8>, <matplotlib.lines.Line2D object at 0x1960f9fd0>, <matplotlib.lines.Line2D object at 0x161476898>],
'means': []}

It stores exactly the Artist elements you would like to modify. Here’s the critical part: for instance, there are three PathPatch objects under the key boxes, and they correspond to the three box bodies in the plot above. Each box body is a PathPatch object. (If you don’t specify patch_artist=True, they will be Line2D objects.) This is the point I want to make: the right way to learn and understand matplotlib is to understand what object each element is. Each aesthetic element shown in the plot has an underlying Python object behind it, and each of those objects has its own methods and attributes. That’s the knowledge we need to master to be able to modify plots at will.
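If the raw dictionary printout is hard to read, a short loop can summarize it. The sketch below rebuilds a similar boxplot and prints, for each key, how many objects it holds and of which type. The Agg backend line is only there so that it runs without a display; that is my assumption about your environment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
arrays = [np.random.randn(100) for _ in range(3)]

fig, ax = plt.subplots()
bp = ax.boxplot(x=arrays, positions=[1, 5, 7], patch_artist=True)

# Summarize the returned dictionary: key, number of artists, artist type(s)
for key, artists in bp.items():
    kinds = sorted({type(artist).__name__ for artist in artists})
    print(f"{key:>8}: {len(artists)} object(s) {kinds}")
```

With three input arrays you should see three boxes and medians but six whiskers and caps, because every box has one of each on both ends.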

I didn’t know what a PathPatch object was, nor a Line2D. So I looked them up with a Google search and found the documentation:

https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.PathPatch.html

The documentation tells me that this object has a valid property called facecolor, so I click on it and find out how to set it to a different color.

https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch.set_facecolor

Now I know I just need to use the set_facecolor function to adjust the face color of all my box bodies. Then you basically do the same thing for all the elements in the box plot, like below:

for flier in bp['fliers']: # outliers
    flier.set_markersize(9)
    flier.set_marker('v')
for box in bp['boxes']: # box bodies
    box.set_facecolor('green')
    box.set_edgecolor('black')
    box.set_linewidth(2)
for whisker in bp['whiskers']: # whisker lines
    whisker.set_linewidth(5)
for cap in bp['caps']: # cap lines
    cap.set_color('red')
    cap.set_linewidth(10)
for median in bp['medians']: # median lines
    median.set_linewidth(15)
Modified boxplot
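Besides the online documentation, you can also ask an object directly what it supports. The following is a small sketch, not the only way to do it: it uses the Artist.properties() method to list the color-related properties of one box body, then sets and reads back its face color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np

np.random.seed(42)
fig, ax = plt.subplots()
bp = ax.boxplot(x=[np.random.randn(100)], patch_artist=True)
box = bp["boxes"][0]  # a PathPatch, because patch_artist=True

# properties() returns a dict with one entry per get_* method of the object
props = box.properties()
print(sorted(name for name in props if "color" in name))

# Whatever has a getter listed here usually has a matching set_* method
box.set_facecolor("green")
print(to_hex(box.get_facecolor()))
```

I find this handy when I roughly remember the name of a property and want to confirm it before searching the documentation.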

I purposely made the figure this exaggerated so that each separate element stands out. These are the building blocks of a box plot. With that understood, can’t you make whatever boxplot you want? You can!

And make sure to check the documentation regularly. This is the only way to master matplotlib.

Barplot

The same philosophy applies to all the other plots I am going to cover here. For a bar plot, you basically tell matplotlib: I want to draw a bar with a certain height, and I will put the bar at a certain position. Everything else is just aesthetic adjustment.

fig,ax = plt.subplots()
ax.bar(x=[1,4,9], # positions to put the bars at
       height=(data1.max(),data2.max(),data3.max()), # height of each bar
       width=0.5, # width of the bars
       edgecolor='black', # edge color of the bars
       color=['green','red','orange'], # fill color of each bar
       yerr=np.array([[0.1,0.1,0.1],[0.15,0.15,0.15]]), # error bars: first row is the lower error, second row the upper error
       ecolor='red', # color of the error bars
       capsize=5) # length of the error bar caps, in points
Bar plot
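The “know your objects” idea works for bars too. As a hedged sketch with made-up heights: ax.bar() returns a container that you can loop over, and each item is a Rectangle that you can modify after the fact, here by giving every bar a different hatch pattern.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Dummy heights, chosen only for illustration
bars = ax.bar(x=[1, 4, 9], height=[2.0, 3.0, 1.5], width=0.5, edgecolor="black")

hatches = ["//", "..", "xx"]
for rect, hatch in zip(bars, hatches):
    # Each bar is a Rectangle patch with its own getters and setters
    rect.set_hatch(hatch)
    print(type(rect).__name__, rect.get_x(), rect.get_height())
```

Note that get_x() reports the left edge of the bar, not the position you passed in, because bars are centered on their x position by default.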

Again, the building blocks of a bar plot are just that simple. Make sure you check out the documentation for the bar plot:

https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.axes.Axes.bar.html

Histogram

A histogram shows the distribution of an array: it displays how many elements of the array fall into each bin. So you just give it an array (or, as below, a list of arrays to draw side by side) and it will draw a histogram for you. That’s it.

fig,ax = plt.subplots()
ax.hist(x=[data1,data2],bins=20,edgecolor='black')
Histogram

A little experience to share: I have always found that the default number of bins in ax.hist() (10) is not that pretty. Setting bins to a larger number may yield a better visual effect, and adding a black edge helps as well.
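To see what the bins argument actually controls, it helps to look at what ax.hist() gives back. Below is a minimal sketch for a single array: the function returns the counts per bin, the bin edges, and the bar objects it drew.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.randn(100)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(x=data, bins=20, edgecolor="black")

# 20 bins need 21 edges, and the counts add up to the number of data points
print(len(counts), len(edges), len(patches))
print(counts.sum())
```

The patches are again Rectangle objects, so you can restyle individual bins the same way as the bars of a bar plot.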

Reminder, please refer to their official documentation for more info:

https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.axes.Axes.hist.html

Scatter plot

A scatter plot is widely used; it shows the distribution of points in a 2D plane or even in 3D space. Here we only focus on the 2D plot. The idea is that, for a series of points, you prepare four vectors, each with the same length as the number of points:

x: the x coordinates of all points

y: the y coordinates of all points

s: the sizes of all points

c: the colors of all points

Then you can start to plot.

fig,ax = plt.subplots()
ax.scatter(x=[1,2,3],y=[1,2,3],s=[100,200,300],c=['r','g','b'])
A simple but complete Scatter plot
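As with the other plots, the call returns an object you can inspect and modify. In this small sketch, ax.scatter() hands back a single PathCollection that holds all the points, and I read the coordinates and sizes back from it before changing the edge color of every point at once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
points = ax.scatter(x=[1, 2, 3], y=[1, 2, 3], s=[100, 200, 300], c=["r", "g", "b"])

print(type(points).__name__)
print(np.asarray(points.get_offsets()))  # one (x, y) row per point
print(points.get_sizes())                # the s values that were passed in

# One setter call affects every point in the collection
points.set_edgecolor("black")
```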

Now I need to cover the color map, so please allow me to digress for a while. Until now we have only dealt with discrete colors, like the red, green, and blue here. Sometimes we want the colors to form a continuum, in the sense that adjacent colors are very similar, and we hope to use this similarity of colors to convey some important information, e.g. the similarity between two points.

We need continuous colors, a.k.a. a color map. Matplotlib ships with a lot of color maps; please check them out here:

https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

All you really need to do is choose a color map. The default colormap is called viridis, but you can specify another one, for example autumn.

Let’s see how it works in another scatter plot example:

fig,ax = plt.subplots()
ax.scatter(x=np.arange(10),
           y=np.arange(10)+5,
           s=np.arange(10)*10,
           c=np.random.rand(10), # c must hold numeric values here
           cmap='spring')
Scatter plot using color map

Make sure that c is a series of numeric values here. The function maps these float values, for example [0.1, 0.2, 0.3, …], onto the range of the color map, and in this way assigns a corresponding color to each point.

Check the scatter plot documentation, don’t forget!
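The mapping from numbers to colors can be reproduced by hand. This is a sketch of the idea rather than the exact internals: I assume the values are rescaled linearly so that the smallest becomes 0 and the largest becomes 1 (which is what happens when you don’t pass vmin or vmax), and then the color map is looked up at those positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex
import numpy as np

np.random.seed(42)
values = np.random.rand(10)

# Step 1: rescale the values linearly to the range [0, 1]
norm = Normalize(vmin=values.min(), vmax=values.max())

# Step 2: look up each rescaled value in the color map
cmap = plt.get_cmap("spring")
rgba = cmap(norm(values))  # one RGBA row per value

for value, color in zip(values, rgba):
    print(f"{value:.2f} -> {to_hex(color)}")
```

The smallest value ends up at one end of the color map and the largest at the other, so similar numbers get similar colors.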

https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.axes.Axes.scatter.html

Color map

In this section, we will delve a bit deeper into the color map. There are two frequently-encountered scenarios:

  1. I want to use a continuous color map to assign colors to my continuous variables, for example, heatmap, scatter plot, etc.
  2. I want to extract a certain number of colors to represent my discrete variable. For example, I have 18 categories of objects, and I need to assign colors to them in such a way that their differences can be easily discerned by human eyes.

The first scenario is relatively easy, most of the time what you need to do is just specify a “cmap” string like “viridis” or “autumn”. The second scenario can be a bit involved.

If the number of colors you want is at most 10, you can use the default 10 colors in rcParams['axes.prop_cycle']:

cycler('color', ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'])

Or you can extract from any qualitative color map:

If you want up to 12 colors:

import matplotlib.pyplot as plt
from matplotlib.colors import to_hex,to_rgb,to_rgba
# take the first 12 colors of the qualitative color map and convert them to hex strings
a = [to_hex(i) for i in plt.get_cmap('Set3').colors[:12]]

This is what we get:

['#8dd3c7',
'#ffffb3',
'#bebada',
'#fb8072',
'#80b1d3',
'#fdb462',
'#b3de69',
'#fccde5',
'#d9d9d9',
'#bc80bd',
'#ccebc5',
'#ffed6f']

If you want up to 20 colors, just change “Set3” to “tab20”, “tab20b” or “tab20c” (and adjust the slice), each of which provides 20 colors to choose from.
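For the 18-category scenario mentioned earlier, a minimal sketch could look like this. The category names are hypothetical placeholders; the only thing taken from matplotlib is the list of colors stored in the tab20 color map.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Hypothetical category labels, just for illustration
categories = [f"category_{i}" for i in range(18)]

# Take as many colors from tab20 as there are categories
palette = [to_hex(color) for color in plt.get_cmap("tab20").colors[:len(categories)]]

# Map every category to its own color
color_of = dict(zip(categories, palette))
print(color_of["category_0"])
```

You can then pass [color_of[label] for label in labels] as the c argument of a scatter plot or the color argument of a bar plot.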

Heatmap

What is a heatmap? A heatmap basically maps a 2D numeric matrix onto a color map (which we just covered). So the input is a 2D numeric array, that’s it.

fig,ax = plt.subplots()
ax.imshow(np.random.randn(5,5),cmap='Set1')
Basic heatmap
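Since a heatmap encodes numbers as colors, readers usually need a legend for that encoding. As a small optional sketch (using the continuous viridis color map instead of Set1, which is my choice here), you can keep the image object returned by ax.imshow() and hand it to fig.colorbar().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
matrix = np.random.randn(5, 5)

fig, ax = plt.subplots()
im = ax.imshow(matrix, cmap="viridis")  # imshow returns the image object

# The colorbar reads the color map and value range from the image object
cbar = fig.colorbar(im, ax=ax)
cbar.set_label("value")
```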

Most often, I need to add grid lines and text to the basic heatmap. I already showed how to add grid lines in Tutorial I: Fig and Ax object.

fig,ax = plt.subplots()
ax.imshow(np.random.randn(5,5),cmap='Set1')
ax.set_xticks(np.linspace(0.5,3.5,4))
ax.set_yticks(np.linspace(0.5,3.5,4))
ax.tick_params(axis='both',length=0,labelsize=0)
ax.grid(True,which='major',axis='both',color='black') # pass the on/off flag positionally; the b= keyword was renamed to visible= in Matplotlib 3.5
Heatmap with grid lines

Finally, in matplotlib you can flexibly add text to figures via ax.text(). You just need to specify the coordinates where you’d like to write, then tell it what you want to write. Simple, right?

fig,ax = plt.subplots()
ax.imshow(np.random.randn(5,5),cmap='Set1')
ax.set_xticks(np.linspace(0.5,3.5,4))
ax.set_yticks(np.linspace(0.5,3.5,4))
ax.tick_params(axis='both',length=0,labelsize=0)
ax.grid(True,which='major',axis='both',color='black') # pass the on/off flag positionally; the b= keyword was renamed to visible= in Matplotlib 3.5
ax.text(-0.2,3,'hey')
Adding text to the heatmap
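A common next step is to write the value of every cell into the heatmap. Here is a minimal sketch of that, assuming the default imshow layout where the cell in row i and column j is centered at x=j, y=i.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
matrix = np.random.randn(5, 5)

fig, ax = plt.subplots()
ax.imshow(matrix, cmap="Set1")

n_rows, n_cols = matrix.shape
for row in range(n_rows):
    for col in range(n_cols):
        # x is the column index and y is the row index
        ax.text(col, row, f"{matrix[row, col]:.1f}", ha="center", va="center")
```

The ha and va arguments center each label on its cell; without them the text would start at the cell center and run to the right.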

Conclusion

As I said, it is not realistic for one tutorial to cover all aspects of matplotlib, and I am still learning it as well. But I want to “teach you to fish” instead of “giving you a fish”. I also believe that we should learn everything from the simplest case, which is why I try to use very silly examples to cover the basic concepts. You should be able to use the skills covered here to draw your own figures.

If you like these tutorials, follow me on Medium and I will teach you how to make a violin plot and a dendrogram in matplotlib. Thank you so much for your support. Connect with me on Twitter or LinkedIn, and please ask me which kinds of figures you’d like to learn to draw in a succinct fashion. I will respond!

All the code is available at: https://github.com/frankligy/python_visualization_tutorial

Continue Reading

Tutorial IV: Violin plot and dendrogram

Tutorial V: Seaborn
