
The tale of Ultra Modern Visualizations – Sankey chart


Let’s dive into exploring the use case of Sankey Charts in this series of Advanced Visualisation Techniques for Data Science.

Data Science has been gaining momentum over the past couple of years. It’s undoubtedly one of the hottest fields today. In this article, I am going to discuss an essential part of Data Science – Data Visualization.

Photo by Markus Winkler on Unsplash

Data Visualization is an integral part of the Data Science process. It helps you gain insights and understand your data better, so that you can make the best use of that data while building machine learning models.

In this series, I am going to take you through some immensely useful yet unexplored charts. So let’s start with the:

Sankey Chart

Sankey charts are like flow charts in which the width of each arrow represents the amount of flow. The nodes are the values from which and to which the data flows, and the width of a link represents the amount of data flowing through it. The source and the target of each link are specified as indices into the list of nodes, and each link also needs a value that gives the amount of data flowing.
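To make the index-based specification concrete, here is a minimal sketch, not my original source code. It assumes the flows start out as (source label, target label, value) triples; the helper names `flows_to_indices` and `build_sankey_figure` and the example flows are hypothetical and made up for illustration. The first helper is plain Python, and the second passes the result to Plotly's `go.Sankey` trace.

```python
def flows_to_indices(flows):
    """Convert (source_label, target_label, value) triples into the
    label list and index-based source/target/value lists a Sankey needs."""
    labels = []      # node labels, in order of first appearance
    index_of = {}    # label -> position in `labels`
    sources, targets, values = [], [], []
    for src, dst, amount in flows:
        for name in (src, dst):
            if name not in index_of:
                index_of[name] = len(labels)
                labels.append(name)
        sources.append(index_of[src])
        targets.append(index_of[dst])
        values.append(amount)
    return labels, sources, targets, values


def build_sankey_figure(flows, title="Sankey sketch"):
    """Build a Plotly figure from the flows. Plotly is imported here so
    that flows_to_indices can be used without it installed."""
    import plotly.graph_objects as go

    labels, sources, targets, values = flows_to_indices(flows)
    fig = go.Figure(go.Sankey(
        node=dict(label=labels, pad=15, thickness=20),
        link=dict(source=sources, target=targets, value=values),
    ))
    fig.update_layout(title_text=title)
    return fig


# Hypothetical flows, made up for illustration only.
example_flows = [("A", "C", 8), ("B", "C", 4), ("C", "D", 12)]
labels, sources, targets, values = flows_to_indices(example_flows)
print(labels)   # node labels
print(sources)  # index of each link's source node
print(targets)  # index of each link's target node
```

Calling `build_sankey_figure(example_flows).show()` should render the chart in a notebook or browser, with each link's width proportional to its value.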

Here, I have included a very basic Sankey Chart which shows how customers move between the Base, Silver and Gold tiers from the first year to the second. I have made it using Plotly so that it’s interactive. So feel free to hover over it to get more insights 🙂

Here, the nodes are Year 1 Base, Year 2 Base and so on, that is, the values from which and to which the data flows. Hovering over the chart displays the source, the target and the value of each link.

You will find my entire source code here. On hovering, you’ll realize that it shows us how many customers have migrated from one tier to another tier.
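As a rough illustration of where the link values in such a chart come from, the sketch below counts tier migrations from per-customer records. The records are hypothetical and are not the data behind my chart; the node naming follows the "Year 1 Base" / "Year 2 Base" pattern described above.

```python
from collections import Counter

TIERS = ["Base", "Silver", "Gold"]

# Hypothetical per-customer records: (tier in year 1, tier in year 2).
records = [
    ("Base", "Base"), ("Base", "Silver"), ("Base", "Silver"),
    ("Silver", "Silver"), ("Silver", "Gold"), ("Silver", "Base"),
    ("Gold", "Gold"), ("Gold", "Gold"),
]

# One node per (year, tier): indices 0-2 are Year 1, indices 3-5 are Year 2.
labels = [f"Year 1 {t}" for t in TIERS] + [f"Year 2 {t}" for t in TIERS]

# Count how many customers made each (year 1 tier, year 2 tier) move.
counts = Counter(records)

sources, targets, values = [], [], []
for i, tier_y1 in enumerate(TIERS):
    for j, tier_y2 in enumerate(TIERS):
        n = counts[(tier_y1, tier_y2)]
        if n:  # skip migrations that never happened
            sources.append(i)               # Year 1 node index
            targets.append(len(TIERS) + j)  # Year 2 node index
            values.append(n)

# This is the information the hover text exposes for each link.
for s, t, v in zip(sources, targets, values):
    print(f"{labels[s]} -> {labels[t]}: {v} customers")
```

The `labels`, `sources`, `targets` and `values` lists can then be passed to Plotly as `go.Sankey(node=dict(label=labels), link=dict(source=sources, target=targets, value=values))`. Because every customer appears in exactly one link, the values add up to the total number of customers.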

This chart is thus very useful when we need to visualize huge datasets. Using the Plotly reference, I have recreated a chart that shows how useful a Sankey Chart is when we’re handling large data with a lot of interconnectivity. The chart below shows the Energy Forecast for 2050. Make sure you hover over the chart to get better insights. The nodes can be repositioned as per the user’s convenience.

P.S. You can play around with the chart and try experimenting with your own values here by clicking on the Edit Chart option 🙂

One additional feature here is that the colors of the nodes and the links can be customized according to preference. This chart from Google Charts is a perfect example of that.
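The same kind of customization is possible in Plotly, where both the node and the link settings of a Sankey trace accept a `color` list. The sketch below uses Plotly's trace format rather than Google Charts; the helper `hex_to_rgba`, the labels, the values and the color choices are all hypothetical. It tints each link with a semi-transparent version of its source node's color.

```python
def hex_to_rgba(hex_color, alpha):
    """Turn '#rrggbb' into an 'rgba(r, g, b, a)' string."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


# Hypothetical nodes and links, made up for illustration only.
labels = ["Source 1", "Source 2", "Target"]
node_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]
sources = [0, 1]
targets = [2, 2]
values = [5, 3]

# Each link takes a semi-transparent version of its source node's color.
link_colors = [hex_to_rgba(node_colors[s], 0.4) for s in sources]

# A Sankey trace written as a plain dict in Plotly's trace format.
sankey_trace = dict(
    type="sankey",
    node=dict(label=labels, color=node_colors),
    link=dict(source=sources, target=targets, value=values, color=link_colors),
)
print(link_colors)
```

With Plotly installed, `plotly.graph_objects.Figure(data=[sankey_trace]).show()` should draw the chart with these colors.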

Sankey Chart. Source: Google Charts

Conclusion:

The Sankey Chart is easily one of the best charts for representing the flow of large amounts of data from a Source node to a Target node.

  1. It is especially useful when we are handling big data and need to visualize the flow of data among a lot of parameters.
  2. The directed arrows in Sankey Charts help one understand the flow of data in the production environment, which is what makes them so useful.
  3. Sankey Charts are mainly used to visualise material or energy flows.
  4. Their attractive and informative design makes it easy to uncover inconsistencies in the data.

I strongly recommend that everyone try it once!

References:

Google Charts

Plotly

Source Code for the 1st visualization

