Data Visualization in Python

Create beautiful and interactive Chord Diagrams using Python

A simple guide to creating awesome looking Chord Diagrams, using a single function call.

Sashank Kakaraparty
Towards Data Science
6 min read · Aug 7, 2020

R versus Python is a constant tussle among data scientists over which language is best. Each language has its strengths, but R, in my opinion, has one trick that is hard to beat: fantastic tools for communicating results through visualization.

This point stood out to me this week, when I was trying to find an appealing way to visualize the correlation between features in my data. I stumbled upon Chord Diagrams! (We will get to them in a minute.) I had seen a few R examples that generate Chord Diagrams using Circlize, where you just pass properly shaped data to the chordDiagram() function and ta-da!

You should have seen the look on my face when I found the Python Plotly implementation of the Chord Diagram. Even a basic figure took a lot of effort, and the end result simply did not seem worth it. I had almost dropped the idea of using a Chord Diagram when I stumbled upon chord on PyPI.

Okay, what is a Chord Diagram?

A Chord Diagram represents the flows between a set of distinct items. These items, known as nodes, are displayed around a circle, and the flows are shown as arcs connecting the nodes.

If that did not explain it clearly, let’s take a look at an example:

Image by the Author

The Chord Diagram above visualizes the number of times two entities (cities, in this case) occur together in a traveler’s itinerary, which lets us study the flow between them.
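
To make the idea concrete, here is a minimal sketch of how co-occurrence counts like these could be assembled into the square matrix a Chord Diagram is drawn from. The itineraries and city names are made up for illustration; they are not the data behind the image above.

```python
from itertools import combinations

# Hypothetical itineraries; each inner list is one traveler's set of cities.
itineraries = [
    ["Paris", "Rome", "Berlin"],
    ["Paris", "Rome"],
    ["Rome", "Berlin"],
]

# Fix an order for the nodes so rows and columns line up with the names.
names = sorted({city for trip in itineraries for city in trip})
index = {name: i for i, name in enumerate(names)}

# Square matrix of zeros, one row and one column per city.
matrix = [[0] * len(names) for _ in names]

for trip in itineraries:
    # Count each unordered pair of distinct cities once per itinerary.
    for a, b in combinations(sorted(set(trip)), 2):
        matrix[index[a]][index[b]] += 1
        matrix[index[b]][index[a]] += 1  # keep the matrix symmetric

print(names)
print(matrix)
```

Each cell holds how often the row city and the column city appear in the same itinerary, which is the kind of "flow" the arcs in the diagram encode.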

How to create a beautiful Chord Diagram with minimum effort?

Let me take you through the process of data preparation and then the creation of the Chord Diagram.

Installation:

Assuming Pandas is already installed, you need to install the chord package from PyPI:

pip install chord

Data Preparation:

I am using the Boston House Prices Dataset, which can be downloaded from here.

# importing the Pandas library
import pandas as pd
# reading data from csv
df = pd.read_csv("housing.csv")

My goal here is to visualize the correlation between the features in the dataset. So, for the sake of brevity, I will drop a few of the columns, which leaves only 6 features. (You can skip this step if you wish.)

# List of columns to delete and then dropping them.
delete = ['ZN', 'INDUS', 'CHAS', 'DIS','RAD','PTRATIO','B','LSTAT']
df.drop(delete, axis=1, inplace=True)

Now let’s create the correlation matrix using the Pandas corr() function.

# Computing the correlations; matrix is now a 6x6 DataFrame of the values.
matrix = df.corr()
# Replacing negative values with 0’s, since features can be negatively correlated.
matrix[matrix < 0] = 0
# Multiplying all values by 100 for clarity, since the remaining values lie between 0 and 1.
matrix = matrix.multiply(100).astype(int)
# Converting the DataFrame to a 2D list, as it is the required input format.
matrix = matrix.values.tolist()

This data is now perfect for our plotting!
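
If you want to see what these preparation steps do without downloading the dataset, here is a small sketch on a made-up DataFrame. The column names and values are assumptions chosen so the result is easy to check by eye. It uses clip() and round() instead of the boolean mask and plain truncation; that is a choice of this sketch (so a value like 0.9999 does not become 99), not the code used above.

```python
import pandas as pd

# Made-up data: "b" rises with "a", while "c" falls as "a" rises.
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})

corr = toy.corr()                            # Pearson correlation, values in [-1, 1]
corr = corr.clip(lower=0)                    # negative correlations become 0
scaled = corr.mul(100).round().astype(int)   # whole-number scores from 0 to 100
toy_matrix = scaled.to_numpy().tolist()      # plain 2D list of ints

print(toy_matrix)
```

The positively correlated pair ("a", "b") keeps a high score, while every pair involving "c" drops to 0 because those correlations are negative.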

Plotting the Chord Diagram:

The only step left before plotting is storing the names of the entities as a list. In my case, these are the names of the features.

# Names of the features.
names = ["Crime Rate","N-Oxide","Number of rooms","Older buildings","Property Tax","Median Price"]
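
Since the matrix and the list of names have to line up, a quick sanity check before plotting can save some head-scratching. Below is a minimal sketch; check_chord_input is a hypothetical helper written for this article, not something provided by the chord package, and the stand-in values are only there so the snippet runs on its own.

```python
def check_chord_input(matrix, names):
    """Raise ValueError if the matrix is not square or does not match names.

    Hypothetical helper for illustration; it is not part of the chord package.
    """
    size = len(names)
    if len(matrix) != size:
        raise ValueError(f"expected {size} rows, got {len(matrix)}")
    for i, row in enumerate(matrix):
        if len(row) != size:
            raise ValueError(f"row {i} has {len(row)} values, expected {size}")


# Small stand-in values so the sketch runs on its own.
demo_names = ["A", "B", "C"]
demo_matrix = [[0, 5, 2], [5, 0, 1], [2, 1, 0]]
check_chord_input(demo_matrix, demo_names)  # passes silently
```

With the real data, you would call it with the 6x6 matrix and the six feature names.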

Now, all we have to do is import the package —

from chord import Chord

Then pass the matrix and the names to the Chord() function.

# Note: The show() function works only with JupyterLab (not Jupyter Notebook).
Chord(matrix, names).show()

This will be your output:

Output in Jupyter Lab. Image by Author.

Before we go further and explore the other style and output settings available in the Chord library, let’s take a look at what the output represents.

When you hover over Crime Rate, you can see that it is connected to Property Tax, Older Buildings and the level of N-Oxide, but has no connections with Median Price or Number of Rooms. You can also hover over a connection to see the correlation value between those two features.

You might notice that the Median Price is 100% correlated with itself, which is the case with all the features. That happens because we get a perfect correlation value when we compare a feature against itself. We can fix this with a single line of code, if you wish.

# Operate on the data before converting it into a 2D list.
# We are just converting all perfect-correlation 100's (basically the 1’s) to 0 as well.
matrix[matrix == 100] = 0
matrix = matrix.values.tolist()
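
One thing to keep in mind: comparing against 100 clears every cell equal to 100, not only the diagonal, so two different features that happen to score 100 would lose their connection too. If that matters for your data, here is a hedged alternative that clears only the diagonal. zero_diagonal is a hypothetical helper, and it works on the 2D list, so it would run after the conversion rather than before it.

```python
def zero_diagonal(matrix):
    """Return a copy of a square 2D list with its diagonal set to 0."""
    return [
        [0 if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


# Stand-in matrix: the two 100s off the diagonal are real signal.
example = [[100, 100, 0], [100, 100, 40], [0, 40, 100]]
cleaned = zero_diagonal(example)
print(cleaned)
```

The self-correlations disappear, while the off-diagonal 100s are left alone.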

Here is your output, a much cleaner Chord Diagram:

Image by Author

Export the Chord Diagram as HTML:

Since the package uses d3-chord at its core, it also gives us the option to output the Chord Diagram as a completely editable HTML file! How cool is that?

Again, a single method call will do it for you —

# This will create a file 'out.html' in your current directory.
Chord(matrix, names).to_html()

You can open the HTML in a browser to find the same interactive Chord Diagram or you can open the .html in a code editor and customize the rest of your page!

Here’s my output:

Output in the form of HTML. Image by Author.

What I have done is extremely basic. The point is that HTML output opens up a myriad of possibilities for using the Chord Diagram.

Styling and Customization:

Colors:

You can change the colors of the Chord Diagram by passing the name of any d3 categorical palette, or your own list of colors. You can find sample outputs in the Official Guide, but here are a few examples:

# Just add the colors parameter and pass the value.
Chord(matrix, names, colors="d3.schemeDark2").show()

Image by Author

# Just add the colors parameter and pass the value.
Chord(matrix, names, colors="d3.schemeAccent").show()

Image by Author

# Add all the colors to a list.
coloursList = ["#f50057", "#2196f3", "#00e676", "#ff5722", "#000000", "#ff9100"]
# Pass the list to the colors parameter.
Chord(matrix, names, colors=coloursList).show()

Image by Author
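
When you type a custom palette by hand, a dropped digit is easy to miss. Here is a small sketch that flags entries that are not valid 3- or 6-digit hex colors before you pass the list along. invalid_hex_colors and the sample palette are hypothetical and written only for illustration; this check is my own addition, not something the chord package does for you as far as I know.

```python
import re

# Matches "#rgb" or "#rrggbb" hex colors.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def invalid_hex_colors(colors):
    """Return the entries that are not valid 3- or 6-digit hex colors."""
    return [c for c in colors if not HEX_COLOR.fullmatch(c)]


# Hypothetical palette with one five-digit typo.
palette = ["#f50057", "#2196f3", "#00e676", "#ff5722", "#00000", "#ff9100"]
bad = invalid_hex_colors(palette)
print(bad)
```

An empty result means every entry in the list has a valid hex format.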

Other customizations:

You can customize the labels and the opacity as well; check out the Official Guide for that.

Conclusions:

Creating visualizations is almost always a part of a Data Scientist’s work. Part is the keyword here, because it means you cannot spend a lot of time to get them in shape and that is why we look for options that provide a simple yet functional implementation. That’s what I try to explore in this article, by creating an effective Chord Diagram with minimal effort.

This is my first work of technical writing, and I have attempted to embed all the best practices that I have come across in my years of reading excellent content from this community. I’d appreciate feedback about any aspect of my work.

Additional resources:

[1] Official Guide: Shahin Rostami’s blog (author of the library)

[2] chord on PyPI: you can download the package here.
