
Probably the Best Data Visualisation for Showing Many-to-Many Proportion In Python

How to draw a fancy chord chart with links using PyCirclize

In my previous article, I introduced a Python library called PyCirclize. It helps us generate very nice Circos Charts (or Chord Charts, if you like) with very little effort. If you want to know how it can make your Data Visualisation well-"rounded", please don’t miss it.

Make Your Python Data Visualisation Charts Well-"Rounded"

However, don’t worry if you are only interested in the Chord Charts with Links. This article will make sure you understand how to draw this type of chart.

In this article, I’ll introduce another type of Chord Chart that PyCirclize can draw: a Chord Chart with links. It visualises proportional relationships between many-to-many entities very well and, in my view, does so better than most of the typical diagram types.

Before we start, just make sure to use pip for installing the library as follows. Then, we are all good to go. Let’s explore this fancy chart together!

pip install pycirclize

1. Quick Start

Image by BRRT from Pixabay

As usual, let’s start with something abstract but easy to follow. The purpose is to show you what the chart looks like and the basic way of plotting it. Let me put the full code and the diagram at the beginning.

from pycirclize import Circos

sectors = {"A": 100, "B": 200, "C": 150}
sector_colors = {"A": "red", "B": "blue", "C": "green"}
circos = Circos(sectors, space=5)

for sector in circos.sectors:
    track = sector.add_track((95, 100))
    track.axis(fc=sector_colors[sector.name])
    track.text("Sector " + sector.name, color="white", size=12)
    track.xticks_by_interval(10)

circos.link(("A", 0, 20), ("B", 50, 70))
circos.link(("A", 20, 40), ("C", 30, 50))
circos.link(("B", 80, 100), ("A", 40, 60))
circos.link(("C", 100, 120), ("B", 150, 170))

fig = circos.plotfig()

Now, let’s look at the code.

Of course, whenever we use a third-party library, we have to import it. For PyCirclize, we need to import its Circos class. We are going to use this class to define the components of the diagram.

from pycirclize import Circos

Then, as we want to show the links and correlations between some entities, we need to define these entities as "sectors". They are called sectors because each one occupies a part of the full circle. Here, I also define another dictionary because we want to show the sectors in different colours.

sectors = {"A": 100, "B": 200, "C": 150}
sector_colors = {"A": "red", "B": "blue", "C": "green"}

Then, we will create the Circos Diagram object from the Circos class as follows.

circos = Circos(sectors, space=5)

The constructor simply takes the dictionary we created for the sectors, as well as the space argument, which sets the gap between the sectors in degrees. You may notice that the sum of the sector sizes is not 360. That is fine, because the library does the necessary normalisation for us. We don’t need to worry about it and can simply pass in the numbers from our original dataset.
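To make the normalisation idea concrete, here is a minimal sketch in plain Python. It is not the library’s own code: the helper name sector_degrees is hypothetical, and it assumes that one gap of `space` degrees follows every sector on a full 360-degree circle.

```python
def sector_degrees(sectors, space):
    """Split a full circle among sectors in proportion to their sizes.

    Assumption (not taken from the library source): one gap of `space`
    degrees follows every sector, and the rest of the circle is shared
    out in proportion to the sector sizes.
    """
    available = 360 - space * len(sectors)  # degrees left after the gaps
    total = sum(sectors.values())           # sum of the raw sector sizes
    return {name: available * size / total for name, size in sectors.items()}


sectors = {"A": 100, "B": 200, "C": 150}
degrees = sector_degrees(sectors, space=5)

for name, deg in degrees.items():
    print(f"Sector {name}: {deg:.1f} degrees")
```

Whatever the raw sizes add up to, the sectors plus the gaps always fill the whole circle, which is why the original numbers can be passed in unchanged.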

Then, let’s loop over the sectors and add a track to each one so that it is rendered on the diagram.

for sector in circos.sectors:
    track = sector.add_track((95, 100))
    track.axis(fc=sector_colors[sector.name])
    track.text("Sector " + sector.name, color="white", size=12)
    track.xticks_by_interval(10)

The add_track() method adds a track to the sector, with the normalised length. The tuple (95, 100) gives the inner and outer radius of the track. The radius runs from 0 at the centre of the circle to 100 at its outer edge, so this track starts at radius 95 and ends at radius 100, which makes it 5 units thick.

Then, the axis() function tells the track to render the sector axis. The colour will be from our pre-defined dictionary. So, this makes sure our 3 sectors have different colours. That is important to make sure we can distinguish the sectors.

Next, we add the text inside the axis. The text is generated from the name of each sector.

Then we want to add the ticks for every 10 units.

After the axes are defined, we also need to define the links. In this case, we define the links manually as follows.

circos.link(("A", 0, 20), ("B", 50, 70))
circos.link(("A", 20, 40), ("C", 30, 50))
circos.link(("B", 80, 100), ("A", 40, 60))
circos.link(("C", 100, 120), ("B", 150, 170))

The first tuple defines the origin sector and the start and end positions on it. So, for the first line, the link starts from 0–20 on the axis of sector A. The second tuple defines the destination sector and its positions. In the first line, this is 50–70 on the axis of sector B. The other links work the same way.
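When links are written by hand like this, it is easy to type a position that falls outside a sector. The following is a small sketch of a sanity check in plain Python. The helper check_link_region is hypothetical (it is not part of PyCirclize), and it assumes each sector axis runs from 0 to the sector size, as in this example.

```python
def check_link_region(sectors, region):
    """Return True if (name, start, end) lies on the axis of the named sector.

    Assumption: every sector axis runs from 0 to its size.
    """
    name, start, end = region
    if name not in sectors:
        return False
    low, high = min(start, end), max(start, end)
    return 0 <= low and high <= sectors[name]


sectors = {"A": 100, "B": 200, "C": 150}
links = [
    (("A", 0, 20), ("B", 50, 70)),
    (("A", 20, 40), ("C", 30, 50)),
    (("B", 80, 100), ("A", 40, 60)),
    (("C", 100, 120), ("B", 150, 170)),
]

# Check both ends of every link before handing them to the plotting code
results = [
    check_link_region(sectors, source) and check_link_region(sectors, target)
    for source, target in links
]
for (source, target), ok in zip(links, results):
    print(source, "->", target, "OK" if ok else "out of range")
```

All four links from the quick start pass this check, whereas a region such as ("A", 90, 120) would be flagged because sector A is only 100 units long.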

Finally, just as matplotlib has the plt.show() function to show a diagram, PyCirclize has the following line to make sure the diagram is rendered and displayed.

fig = circos.plotfig()

Above is the method to manually generate a Chord Diagram with links. In practice, we may have our data in a Pandas DataFrame. In the next section, I’ll simulate a real-world example to demonstrate a more practical use case.

2. Real-World Example – Total Insured Values Diagram

Image by PublicDomainPictures from Pixabay

Now, let’s have a look at a real-world example. Suppose we are data analysts working for an insurance company. We want to generate a fancy chart to show the Total Insured Value for the customers in different cities and the insured proportion of different insurance types.

The table above shows the value, in millions of dollars, that the customers in each city insured for each type of asset. Now, let’s see how to convert it into a fancy diagram.

Since we want to simulate the scenario that the data is in a Pandas Dataframe, let’s import the Pandas library, too.

from pycirclize import Circos
import pandas as pd

Now, let’s initialise the data and put it into a DataFrame. The data is exactly the same as in the previous table.

# Initialise the data
row_names = ["Sydney", "Melbourne", "Brisbane"]
col_names = ["Property", "Life", "Automobile"]
data = [
    [100, 150, 200],
    [80, 120, 160],
    [60, 90, 130],
]

# Create a pandas dataframe
df = pd.DataFrame(data, index=row_names, columns=col_names)

Now, let’s use the initialize_from_matrix() method from the library to generate the visualisation on the fly!

# Define the Circos Diagram with links
circos = Circos.initialize_from_matrix(
    df,
    space=5,    # Space between sectors
    ticks_interval=50,  # Ticks every 50
    r_lim=(93, 100),  # Radius limits for sectors
    cmap="tab10",   # Use a built-in color map to get better looking colour code
    label_kws=dict(r=94, size=12, color="white"),   # Font of the sector labels
    link_kws=dict(ec="black", lw=0.5),  # Style of the links
)

In the above code, we pass df as the data, set the space between sectors to 5 degrees, and add ticks every 50 units. The r_lim argument does the same thing as the tuple passed to add_track() in the quick start example: it decides the radial position and thickness of the sector tracks. Then, we use a colour map to make sure the sectors and links can be easily distinguished. Finally, label_kws defines the font and style of the sector labels, and link_kws defines the style of the links.
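To sanity-check the chart against the data, it helps to work out the totals and proportions that the diagram is meant to convey. The sketch below uses plain Python lists rather than the DataFrame so that it runs without any extra installs. It assumes that each city sector is sized by its row total and each insurance-type sector by its column total, which is an assumption about the layout rather than something taken from the library’s source.

```python
row_names = ["Sydney", "Melbourne", "Brisbane"]
col_names = ["Property", "Life", "Automobile"]
data = [
    [100, 150, 200],
    [80, 120, 160],
    [60, 90, 130],
]

# Total insured value per city (row sums) and per insurance type (column sums)
row_totals = {name: sum(row) for name, row in zip(row_names, data)}
col_totals = {name: sum(col) for name, col in zip(col_names, zip(*data))}
grand_total = sum(row_totals.values())

# Share of each insurance type within a city: the proportions the links convey
for city, row in zip(row_names, data):
    shares = ", ".join(
        f"{col}: {value / row_totals[city]:.0%}"
        for col, value in zip(col_names, row)
    )
    print(f"{city} (total {row_totals[city]}): {shares}")
```

If a sector or link in the rendered chart looks out of proportion to these numbers, that is a hint to re-check the data passed to initialize_from_matrix().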

After all the configurations, don’t forget to plot it with the following line of code.

fig = circos.plotfig()

Here is the full code snippet for your convenience.

from pycirclize import Circos
import pandas as pd

# Initialise the data
row_names = ["Sydney", "Melbourne", "Brisbane"]
col_names = ["Property", "Life", "Automobile"]
data = [
    [100, 150, 200],
    [80, 120, 160],
    [60, 90, 130],
]

# Create a pandas dataframe
df = pd.DataFrame(data, index=row_names, columns=col_names)

# Define the Circos Diagram with links
circos = Circos.initialize_from_matrix(
    df,
    space=5,    # Space between sectors
    ticks_interval=50,  # Ticks every 50
    r_lim=(93, 100),  # Radius limits for sectors
    cmap="tab10",   # Use a built-in color map to get better looking colour code
    label_kws=dict(r=94, size=12, color="white"),   # Font of the sector labels
    link_kws=dict(ec="black", lw=0.5),  # Style of the links
)

fig = circos.plotfig()

That’s great! However, we can do even better. At the moment, the links are not directional. We have all the insurance types on the left and all the cities on the right. If we define the links with directions, the diagram will be much more readable.

That’s actually very easy in PyCirclize. We just need to add direction=1 to the link style configuration (link_kws).

The full code snippet is as follows.

from pycirclize import Circos
import pandas as pd

# Initialise the data
row_names = ["Sydney", "Melbourne", "Brisbane"]
col_names = ["Property", "Life", "Automobile"]
data = [
    [100, 150, 200],
    [80, 120, 160],
    [60, 90, 130],
]

# Create a pandas dataframe
df = pd.DataFrame(data, index=row_names, columns=col_names)

# Define the Circos Diagram with links
circos = Circos.initialize_from_matrix(
    df,
    space=5,    # Space between sectors
    ticks_interval=50,  # Ticks every 50
    r_lim=(93, 100),  # Radius limits for sectors
    cmap="tab10",   # Use a built-in color map to get better looking colour code
    label_kws=dict(r=94, size=12, color="white"),   # Font of the sector labels
    link_kws=dict(direction=1, ec="black", lw=0.5),  # Style of the links
)

fig = circos.plotfig()

Summary

Image by Steven Liao from Pixabay

In this article, I have introduced a Data Visualisation Diagram type – Chord Chart with Links. It is one of the best chart types to demonstrate the relationships of proportion and correlation between multiple entities, especially for the "many-to-many" relationships.

In the next article, I’ll keep digging out more amazing out-of-the-box features from the PyCirclize library to make our Python-generated Data Visualisations more well-"rounded".

Unless otherwise noted, all images are by the author.

