
4 use-cases for Sankey Charts

From understanding flow to a quick trick to replace machine learning

Photo by Solen Feyissa on Unsplash + Image by Author

Sankey charts have recently become an important visualisation technique in advanced analytics. They have both characteristics of a great visualisation: they can look visually stunning, and they give genuinely useful insights.

However, a visualisation makes sense only when it is used in the right context and for the right purpose. For example, a bar chart is not as effective at showing a sales trend as a trend (line) chart. Similarly, a scatter plot does not make sense if the data does not have enough variance.

So here I would like to describe the use cases where Sankey charts make sense.

Analysing flow

When the Sankey diagram originated in 1898, its main purpose was to show a flow. The chart is named after the Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 in the classic figure below, which shows the energy efficiency of a steam engine.

Source - Wikipedia

"Flow" could be anything, such as a flow of funds or a supply chain flow. For example, the Sankey chart below shows the flow of food from primary availability to consumption. The main idea is to show the loss at each stage of the flow.
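To make "loss at each stage" concrete, here is a minimal Python sketch, assuming a simple chain of stages in which each stage keeps part of what it received. The stage names and amounts are hypothetical and are not read from the Wikipedia figure. For every step it emits one link for what carries on and one link to a separate loss node for the difference, so the widths leaving a stage always add up to what entered it. The optional `to_plotly` helper assumes Plotly is installed and uses its `go.Sankey` trace, which refers to nodes by integer index.

```python
# Hypothetical stages and amounts (e.g. units of food), invented for illustration.
STAGES = [
    ("Harvest", 100),
    ("After storage", 80),
    ("After processing", 70),
    ("Consumed", 55),
]


def flow_with_losses(stages):
    """Turn a chain of (stage, amount) pairs into (source, target, value) links.

    For each consecutive pair, one link carries what survives to the next
    stage and another carries the difference to a dedicated loss node.
    """
    links = []
    for (src, amount_in), (dst, amount_out) in zip(stages, stages[1:]):
        if amount_out > amount_in:
            raise ValueError(f"{dst} ({amount_out}) exceeds {src} ({amount_in})")
        links.append((src, dst, amount_out))
        loss = amount_in - amount_out
        if loss > 0:
            links.append((src, f"Loss ({src} to {dst})", loss))
    return links


def to_plotly(links):
    """Optional rendering step; assumes Plotly is installed (pip install plotly)."""
    import plotly.graph_objects as go

    # Map node names to integer indices, keeping first-seen order.
    labels = list(dict.fromkeys(name for s, t, _ in links for name in (s, t)))
    index = {name: i for i, name in enumerate(labels)}
    return go.Figure(data=[go.Sankey(
        node=dict(label=labels),
        link=dict(
            source=[index[s] for s, _, _ in links],
            target=[index[t] for _, t, _ in links],
            value=[v for _, _, v in links],
        ),
    )])


if __name__ == "__main__":
    for link in flow_with_losses(STAGES):
        print(link)
```

Keeping losses as explicit nodes, rather than just shrinking the bands, is what makes the waste at each stage visible as its own branch.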

Source - Wikipedia

Analysing time-based patterns

The next use of Sankey charts is to understand time-based patterns, which is useful in many situations. One such example is understanding a customer's journey over time. Let me illustrate this with an example: a dataset of customer transactions for car rental bookings. The data contains invoice details as well as vehicle information such as vehicle category and vehicle model.

Image by Author

Now you can think of a customer journey as a chronological sequence of events; in this case, the different car models a customer rents. A Sankey chart represents these chronological events as a flow. Shown below are the paths taken by customers. For example, customers start with a Sedan and then move towards Automatic cars.
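Turning raw transactions into journey paths is mostly a matter of sorting each customer's rentals by date and pairing consecutive events. Below is a minimal sketch, assuming each record is a hypothetical `(customer_id, invoice_date, vehicle_category)` tuple; these field names and rows are illustrative, not the author's dataset. Prefixing every node with its step number means a customer who rents a Sedan twice does not loop back into the same node, so the chart reads left to right as time passes. The `max_steps` cap is an assumption added to keep the number of columns readable.

```python
from collections import Counter
from datetime import date
from itertools import groupby

# Hypothetical rental records: (customer_id, invoice_date, vehicle_category).
# Deliberately not in date order, to show why sorting is needed.
RENTALS = [
    ("C1", date(2021, 6, 9), "SUV"),
    ("C1", date(2021, 1, 5), "Sedan"),
    ("C1", date(2021, 3, 2), "Automatic"),
    ("C2", date(2021, 2, 1), "Sedan"),
    ("C2", date(2021, 4, 7), "Automatic"),
    ("C3", date(2021, 1, 20), "Hatchback"),
    ("C3", date(2021, 5, 3), "Sedan"),
]


def journey_links(rentals, max_steps=4):
    """Count transitions between consecutive rentals of the same customer.

    Node names carry the step number ("1. Sedan", "2. Automatic", ...) so the
    same category at different points in time becomes a separate column.
    """
    counts = Counter()
    # Sort by customer, then by date, so groupby sees each journey in order.
    ordered = sorted(rentals, key=lambda r: (r[0], r[1]))
    for _, events in groupby(ordered, key=lambda r: r[0]):
        path = [category for _, _, category in events][:max_steps]
        for step, (a, b) in enumerate(zip(path, path[1:]), start=1):
            counts[(f"{step}. {a}", f"{step + 1}. {b}")] += 1
    return [(s, t, n) for (s, t), n in counts.items()]


if __name__ == "__main__":
    for link in journey_links(RENTALS):
        print(link)
```

The resulting `(source, target, count)` triples are the shape Sankey libraries such as Plotly's `go.Sankey` expect once node names are mapped to integer indices.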

Image by Author

As you can see, the Sankey chart helps a lot in analysing the customer journey. This kind of analytics can be useful in different ways, for example anticipating which path a customer might take and then making product recommendations.

Analysing hierarchy-type data

You can also think of a hierarchy as a flow, so if you have hierarchy-type data, Sankey charts are very useful. Let me illustrate this with a dataset on mobile sales in Africa. A sample of the data is shown here. It contains country, city, region, segment, sales and profit.

Image by Author

As you can see, the African mobile sales dataset contains hierarchy-type data such as Country – City – Region – Segment. We can use a Sankey chart to visualise such data. Sankey charts also require a numeric value to give the "width of the flow" between the levels of the hierarchy.
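Preparing hierarchy-type data for a Sankey chart amounts to summing the numeric measure over every parent-to-child edge. Here is a minimal sketch, assuming rows shaped like `(country, city, segment, sales)`; the rows and segment names below are invented and are not taken from the African mobile sales dataset. Restricting the analysis to a single region would simply be a filter on the rows before this step.

```python
from collections import defaultdict

# Invented sample rows: (country, city, segment, sales).
ROWS = [
    ("Nigeria", "Lagos", "Consumer", 120.0),
    ("Nigeria", "Lagos", "Corporate", 80.0),
    ("Nigeria", "Abuja", "Consumer", 60.0),
    ("Mauritania", "Nouakchott", "Consumer", 30.0),
    ("Mauritania", "Nouakchott", "Corporate", 20.0),
]


def hierarchy_links(rows, levels=("Country", "City", "Segment")):
    """Sum the trailing numeric value over every parent -> child edge.

    Each row is (level_1, ..., level_k, value). Node names are prefixed with
    their level so identical names at different levels stay separate nodes.
    """
    totals = defaultdict(float)
    for *path, value in rows:
        nodes = [f"{level}: {name}" for level, name in zip(levels, path)]
        for parent, child in zip(nodes, nodes[1:]):
            totals[(parent, child)] += value  # flow width = summed sales
    return [(parent, child, total) for (parent, child), total in totals.items()]


if __name__ == "__main__":
    for link in hierarchy_links(ROWS):
        print(link)
```

Because every row adds its value to each edge along its path, the flow entering a node equals the flow leaving it, which is what keeps the bands of a hierarchy Sankey consistent.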

The Sankey chart on the African sales data is shown below. Here we analyse the Western region by Country, City and Segment. The width of each flow is based on Sales.

You can make some interesting observations. For example, Nigeria has the largest number of cities contributing to sales. Also, Mauritania has only one city, but that city contributes to sales in all segments.

Image by Author

Quick trick to replace machine learning

Now this might be unexpected, so let me explain. As you might already know, machine learning is used to learn patterns in your data. Generally, it is used to understand the pattern between multiple input columns and one output column. Let us take as an example a dataset on bank direct marketing. This dataset contains columns on customer attributes and the result of a direct marketing campaign. Columns such as age, job and education represent attributes of the customers to whom the campaign was sent. Then there is the column response, which indicates whether the customer responded or not.

Image by Author

Now, if you wanted to understand which customers are likely to respond to direct marketing, you would use machine learning. Columns such as age and job are called features, and response is called the target variable. The machine learning algorithm automatically learns the relationship between the features and the target variable.

Now what if I told you that you can achieve the same result with a Sankey chart? Surprised? Well, here is the proof. You can create a Sankey chart with all the different columns, keeping the response column as the last column. The size of each flow is based on the column count_1 (which is always 1).
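The construction described above, one column per stage with response last and every row contributing a flow of width 1, can be sketched in a few lines of Python. This assumes each customer is a dict; the rows are invented and only borrow category spellings in the style of the bank marketing data, and the column order is an assumption. Changing the order changes how the bands are drawn, but not the totals arriving at each response value.

```python
from collections import Counter

# Invented example rows; 'response' is the target and goes last.
COLUMNS = ["marital", "education", "job", "response"]
CUSTOMERS = [
    {"marital": "single", "education": "basic.4y", "job": "blue-collar", "response": "yes"},
    {"marital": "single", "education": "basic.4y", "job": "blue-collar", "response": "yes"},
    {"marital": "married", "education": "university.degree", "job": "admin.", "response": "no"},
    {"marital": "married", "education": "basic.4y", "job": "blue-collar", "response": "no"},
    {"marital": "single", "education": "high.school", "job": "services", "response": "yes"},
]


def column_sankey_links(records, columns):
    """Chain the columns left to right and count rows along each edge.

    Every record contributes a flow of width 1 (the role count_1 plays in the
    article), so each link's value is the number of rows sharing that pair.
    """
    counts = Counter()
    for record in records:
        nodes = [f"{col}={record[col]}" for col in columns]
        for left, right in zip(nodes, nodes[1:]):
            counts[(left, right)] += 1
    return [(left, right, n) for (left, right), n in counts.items()]


if __name__ == "__main__":
    for link in column_sankey_links(CUSTOMERS, COLUMNS):
        print(link)
```

Prefixing each value with its column name (`job=blue-collar`) keeps values that happen to repeat across columns, such as "unknown", from merging into one node.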

Image by Author

Using this visualisation, you can get very valuable insights. For example, you can find out the characteristics of customers who respond to marketing campaigns by tracing the path back from response = yes to the previous levels. You will see that customers who respond are mostly single, with basic.4y education and blue-collar jobs. As you can see, Sankey visualisation is a powerful way to get useful insights.
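Tracing the response = yes band backwards amounts to looking at the attribute values of the customers who responded. Here is a minimal sketch on invented rows (a hypothetical dict per customer, with the target in a `response` key). The second helper computes a response rate, because a value can make up most of the responders simply by being common among all customers; comparing both numbers guards against reading too much into the width of a band.

```python
from collections import Counter

# Invented example rows; 'response' is the target column.
CUSTOMERS = [
    {"marital": "single", "education": "basic.4y", "job": "blue-collar", "response": "yes"},
    {"marital": "single", "education": "basic.4y", "job": "blue-collar", "response": "yes"},
    {"marital": "married", "education": "university.degree", "job": "admin.", "response": "no"},
    {"marital": "married", "education": "basic.4y", "job": "blue-collar", "response": "no"},
    {"marital": "single", "education": "high.school", "job": "services", "response": "yes"},
]


def trace_back(records, target="response", outcome="yes"):
    """Count each attribute's values among records whose target equals outcome.

    This is the tabular counterpart of following the response=yes band
    backwards through every column of the Sankey chart.
    """
    hits = [r for r in records if r[target] == outcome]
    return {
        col: Counter(r[col] for r in hits)
        for col in records[0]
        if col != target
    }


def response_rate(records, col, value, target="response", outcome="yes"):
    """Fraction of records with record[col] == value whose target equals outcome."""
    group = [r for r in records if r[col] == value]
    if not group:
        return 0.0
    return sum(r[target] == outcome for r in group) / len(group)


if __name__ == "__main__":
    profile = trace_back(CUSTOMERS)
    print(profile["job"].most_common())
    print(response_rate(CUSTOMERS, "job", "blue-collar"))
```

In this toy data, blue-collar customers account for two of the three responders, yet one in three blue-collar customers in the sample did not respond.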

Now, before you get too excited, please remember that I called this a "quick trick". You cannot use it to replace machine learning completely, as machine learning offers more than just the relationship between inputs and output. That said, you can use a Sankey chart to verify the results of a machine learning model.

So that was a quick tour of different scenarios in which Sankey charts can be useful. Perhaps Captain Matthew Henry Phineas Riall Sankey did not realise what a beast he had created when he introduced Sankey charts to the world.

Additional resources

Website

You can visit my website to do analytics with zero coding: https://experiencedatascience.com

Please subscribe to stay informed whenever I release a new story.

Get an email whenever Pranay Dave publishes.

You can also join Medium with my referral link.

Join Medium with my referral link – Pranay Dave

YouTube channel

Here is the link to my YouTube channel: https://www.youtube.com/c/DataScienceDemonstrated

