The Sankey plot is a neat way to visualize how values flow from source categories to target categories. In the left panel shown above, I used a Sankey plot to demonstrate how each hematopoietic population (stem cells, erythroid progenitors) can be further divided into more granular sub-populations informed by their mutation profiles. Aside from this example, the Sankey plot has also been widely used to display energy flows, campaign statistics, and so on.
Although there are a few established packages that draw a Sankey plot in one line (e.g. pySankey, Plotly, networkD3), I found it an intriguing exercise to figure out what the Sankey plot is actually made of, so that we can build our own from scratch with whatever modifications we want.
If we dissect a complicated Sankey plot into pieces, we find that the essential parts are just two vertical bars (rectangles) representing the quantity of each category on the left and right sides, along with the curvy strips that connect these two rectangles. Today, I would like to share two ways to draw these parts with Matplotlib.
The code in this tutorial can be found here.
Setting up the grid and the sidebars
First, we need to draw the rectangles on both sides. This task can be viewed in different ways. One way is to treat each rectangle as a region between two lines, so that we can use the ax.fill_between function to fill it with color. Another way is to treat them as separate rectangles and use the built-in Rectangle patch in the matplotlib package. Here let's demo the second approach, but you can try the first one as well.
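import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Create the canvas (fig) and a single axes (ax) covering the unit square
fig, ax = plt.subplots()
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.grid()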
After setting up the canvas (fig) and the axes (ax) that we are going to plot on, we can start to draw the two rectangles. If you are not familiar with fig and ax, my previous tutorial series may help a bit.
rec_left = Rectangle(xy=(0.1,0.6),width=0.2,height=0.2,facecolor='orange',edgecolor='k')
ax.add_patch(rec_left)
rec_right = Rectangle(xy=(0.7,0.2),width=0.2,height=0.2,facecolor='green',edgecolor='k')
ax.add_patch(rec_right)
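The first approach mentioned earlier, treating each bar as the region between two horizontal lines, can be sketched roughly as follows. This is only a minimal illustration: the bars list and the bar_artists name are my own, and the coordinates simply mirror rec_left and rec_right.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.grid()

# Each bar is the region between a lower and an upper horizontal line
# over a short x-interval: (x_start, x_end, y_bottom, y_top, color).
# The values are illustrative and match the two Rectangle patches.
bars = [
    (0.1, 0.3, 0.6, 0.8, 'orange'),  # left bar
    (0.7, 0.9, 0.2, 0.4, 'green'),   # right bar
]

bar_artists = []
for x0, x1, y0, y1, color in bars:
    # Scalars for the two y-arguments are broadcast across the x values
    artist = ax.fill_between([x0, x1], y0, y1, facecolor=color, edgecolor='k')
    bar_artists.append(artist)
```

Add plt.show() or fig.savefig(...) at the end to view the result. The Rectangle patch is arguably more direct for plain bars, but fill_between becomes handy later when the upper and lower boundaries are curves.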
Method 1: Using Matplotlib Path Object
Next, the task is to draw the curvy strips. I would like to first introduce the powerful Path module in Matplotlib. This module is sometimes considered a low-level API, but it actually underlies most of the commonly used visualizations in Matplotlib, since everything boils down to drawing lines, right?
The Path module allows you to draw a line of any shape you like. This is controlled by two arguments, named verts and codes by convention. verts is a 2D NumPy array or a nested list in which each element (or each row of the 2D array) holds the (x,y) coordinates of one anchor point, so if you'd like to draw a line with 5 anchor points, your verts should be of length 5. Meanwhile, the codes parameter defines how each anchor point is interpreted while drawing the line. Let's look at a simple example:
from matplotlib.path import Path
from matplotlib.patches import PathPatch
verts = [(0.3,0.8),(0.5,0.8),(0.7,0.4),(0.3,0.6),(0.3,0.8)]
codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY]
p = Path(verts,codes)
ax.add_patch(PathPatch(p,fc='none'))
The plot will look like this:
I hope this example gives you an idea of how verts and codes work together to complete the task. However, as you can see, so far we are still dealing with straight lines only. What about the curvy strip that we are hoping to draw?
There are two additional codes, named CURVE3 and CURVE4, that you can include in your codes list. CURVE3 represents what we call a quadratic Bezier curve, whereas CURVE4 represents a cubic Bezier curve, as illustrated in the following figure:
To draw a quadratic Bezier curve, you need three points: the start point, one control point in the middle that sets the curvature, and the end point. In the codes list, the control point and the end point are both tagged CURVE3. Similarly, a cubic Bezier curve needs four points in total: the start point, two control points, and the end point, with the last three tagged CURVE4. We are going to use CURVE4 for our strip visualization.
verts = [(0.3,0.8), (0.5,0.8), (0.5,0.4), (0.7,0.4)]
codes = [Path.MOVETO, Path.CURVE4, Path.CURVE4, Path.CURVE4]
p = Path(verts,codes)
ax.add_patch(PathPatch(p,fc='none',alpha=0.6))
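For comparison, here is a minimal sketch of the quadratic case using CURVE3. The coordinates and the names verts_q, codes_q and p_q are purely illustrative. Note that only the control point and the end point carry the CURVE3 code, while the start point is set by MOVETO.

```python
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.patches import PathPatch

fig, ax = plt.subplots()
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])

# Quadratic Bezier: start at (0.3, 0.8), bend toward the single control
# point (0.7, 0.8), and end at (0.7, 0.4).
verts_q = [(0.3, 0.8), (0.7, 0.8), (0.7, 0.4)]
codes_q = [Path.MOVETO, Path.CURVE3, Path.CURVE3]
p_q = Path(verts_q, codes_q)
ax.add_patch(PathPatch(p_q, fc='none', ec='purple'))
```

With a single control point the curve can only bend in one direction, which is one reason the S-shaped edge of a Sankey strip is more naturally expressed with the two control points of CURVE4.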
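Now it becomes clearer how we are going to achieve the final goal: we trace the upper curve from left to right, drop straight down the right edge, trace the lower curve back from right to left, and close the shape. The code looks like this: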
verts = [(0.3,0.8), (0.5,0.8), (0.5,0.4), (0.7,0.4), (0.7,0.2), (0.5,0.2), (0.5,0.6), (0.3,0.6), (0.3,0.8)]
codes = [Path.MOVETO, Path.CURVE4, Path.CURVE4, Path.CURVE4, Path.LINETO, Path.CURVE4, Path.CURVE4, Path.CURVE4, Path.CLOSEPOLY]
p = Path(verts,codes)
ax.add_patch(PathPatch(p,fc='red',alpha=0.6))
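If you need more than one strip, the vertex pattern above can be wrapped in a small function. The following is a sketch under my own assumptions: strip_path and its parameter names are hypothetical, and placing the control points halfway between the two bars is just one reasonable choice that reproduces the coordinates used above.

```python
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.patches import PathPatch

def strip_path(x_left, x_right, left_bottom, left_top, right_bottom, right_top):
    """Build a closed Path for one strip (hypothetical helper)."""
    x_mid = (x_left + x_right) / 2  # control points sit halfway between the bars
    verts = [
        (x_left, left_top),       # start at the top of the left edge
        (x_mid, left_top),        # control point 1 of the upper curve
        (x_mid, right_top),       # control point 2 of the upper curve
        (x_right, right_top),     # end of the upper curve
        (x_right, right_bottom),  # straight line down the right edge
        (x_mid, right_bottom),    # control point 1 of the lower curve
        (x_mid, left_bottom),     # control point 2 of the lower curve
        (x_left, left_bottom),    # end of the lower curve
        (x_left, left_top),       # close the shape
    ]
    codes = ([Path.MOVETO] + [Path.CURVE4] * 3 + [Path.LINETO]
             + [Path.CURVE4] * 3 + [Path.CLOSEPOLY])
    return Path(verts, codes)

fig, ax = plt.subplots()
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])

# Same strip as in the tutorial: from y in [0.6, 0.8] to y in [0.2, 0.4]
strip = strip_path(0.3, 0.7, 0.6, 0.8, 0.2, 0.4)
ax.add_patch(PathPatch(strip, fc='red', alpha=0.6))
```

Moving the control points closer to the bars (instead of the midpoint) would make the bend sharper, so x_mid is a natural knob to expose if you want to tune the look.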
Method 2: Using NumPy Convolve
Another way to think about the strip is to reframe it as a signal-processing problem; this idea comes from the implementation of the pySankey package. Given two signals f(x) and g(x), the linear convolution operator summarizes how the signal f(x) is affected by the signal g(x), producing the convolved signal f(x)*g(x). This operation is implemented as numpy.convolve, and how it is computed is beautifully illustrated in this article. Please note that here we use mode='valid', so that only positions where the two signals completely overlap are computed.
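To make the effect of mode='valid' concrete, here is a tiny, self-contained sketch. The signal and kernel values are made up for illustration: a sharp step is convolved with a 3-point moving average, which turns the jump into a gradual ramp and shortens the output to len(signal) - len(kernel) + 1 points.

```python
import numpy as np

signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # a sharp step from 0 to 1
kernel = np.ones(3) / 3                             # 3-point moving average

# Only windows where the kernel fully overlaps the signal are kept
smoothed = np.convolve(signal, kernel, mode='valid')

print(len(signal), len(kernel), len(smoothed))  # 6 3 4
print(smoothed)  # ramps from 0 up to 1 in steps of one third
```

Because the kernel sums to 1, flat regions of the signal keep their original value and only the neighborhood of the jump is smoothed.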
Following this logic, we can modify our code to:
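import numpy as np

# Upper edge: a step from 0.8 down to 0.4, smoothed by two moving-average passes
yu = np.array(50*[0.8] + 50*[0.4])
yu_c = np.convolve(yu, 0.05*np.ones(20), mode='valid')
yu_cc = np.convolve(yu_c, 0.05*np.ones(20), mode='valid')
# Lower edge: the same step shifted down by the strip height (0.2)
yd = np.array(50*[0.6] + 50*[0.2])
yd_c = np.convolve(yd, 0.05*np.ones(20), mode='valid')
yd_cc = np.convolve(yd_c, 0.05*np.ones(20), mode='valid')
# Each 'valid' pass drops 19 points: 100 -> 81 -> 62
ax.fill_between(np.linspace(0.3,0.7,62), yd_cc, yu_cc, color='blue', alpha=0.6)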
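We apply the same smoothing to two step functions: one for the upper edge of the strip and one for the lower edge. Each pass with mode='valid' and a kernel of length 20 shortens the signal by 19 points, so the 100 original points become 81 and then 62, which is why the x-coordinates are generated with np.linspace(0.3,0.7,62). With both edges in hand, we can use the convenient ax.fill_between function to fill the area between them with the desired color.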
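The repeated steps can also be folded into a small function so that the output length never has to be hard-coded. This is only a sketch: smooth_step and its parameters (n_half, kernel_size, passes) are hypothetical names of my own, with defaults chosen to reproduce the numbers used above.

```python
import numpy as np
import matplotlib.pyplot as plt

def smooth_step(y_left, y_right, n_half=50, kernel_size=20, passes=2):
    """Return a smoothed step from y_left to y_right (hypothetical helper)."""
    y = np.array(n_half * [y_left] + n_half * [y_right], dtype=float)
    kernel = np.ones(kernel_size) / kernel_size  # moving average that sums to 1
    for _ in range(passes):
        # Each 'valid' pass shortens y by kernel_size - 1 points
        y = np.convolve(y, kernel, mode='valid')
    return y

fig, ax = plt.subplots()
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])

upper = smooth_step(0.8, 0.4)  # upper edge of the strip
lower = smooth_step(0.6, 0.2)  # lower edge of the strip
x = np.linspace(0.3, 0.7, len(upper))  # length derived from the data
ax.fill_between(x, lower, upper, color='blue', alpha=0.6)
```

A larger kernel_size or more passes gives a gentler transition, at the cost of a longer flat-to-curved region being consumed by the smoothing.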
Conclusion
Well, that's about it. Thanks a lot for reading, and I hope you enjoyed this short tutorial. If you like this article, follow me on Medium; thank you so much for your support. Connect with me on Twitter or LinkedIn, and please let me know if you have any questions or what kind of tutorials you would like to see in the future!