
How a simple textual explanation can add value to your data science results

Enhance the power of your data exploration using textual explanations

The popular saying "A picture is worth a thousand words" may be wrong when it comes to data science. Take the example of Uber's Estimated Time of Arrival (ETA) algorithm, which tells the user when the ride is expected to arrive.

Behind the ETA, there are complex predictive algorithms and cutting-edge visualisation, with the map updated in real time. But all of this is of no use without the single line of text that says "The closest driver is approximately 1 min away."

Uber Estimated Time of Arrival (ETA) algorithm in action

A data scientist or data analyst produces many data visualisations during the data exploration phase. These visualisations look great, but you can greatly enhance their value with short textual explanations. In many cases, visualisations alone are not sufficient.

Visualisations without explanations are a source of misinterpretation

Take the simple example of a histogram. Shown below is a histogram of a stock's closing price.

Just by looking at this visualisation, one can make many interpretations such as:

Interpretation 1 – The most frequently occurring value is between 13 and (something…).

Interpretation 2 – The lowest value seems to be between 5 something and 10 something.

Stock trading is an area where values have to be very precise. So if the interpretation is not precise, the visualisation alone does not help.
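As a minimal sketch of how such precise values could be stated in words next to the histogram, the function below reports the exact range together with the most and least populated bins. The function name `explain_histogram`, its parameters, and the sample prices are assumptions made for illustration; they are not taken from the chart above.

```python
import math
from collections import Counter


def explain_histogram(values, bin_width=1.0, label="close price"):
    """Describe a numeric column in words: exact range plus busiest and quietest bins."""
    if not values:
        raise ValueError("values must not be empty")

    # Map each value to the index of the bin it falls into.
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)

    def bin_text(index):
        low = index * bin_width
        return f"{low:.2f} to {low + bin_width:.2f}"

    # Only bins that contain at least one value are considered here.
    busiest, busiest_count = max(counts.items(), key=lambda item: item[1])
    quietest, quietest_count = min(counts.items(), key=lambda item: item[1])

    return (
        f"The {label} ranges from {min(values):.2f} to {max(values):.2f}. "
        f"It falls most often in the range {bin_text(busiest)} "
        f"({busiest_count} of {total} observations) and least often in the range "
        f"{bin_text(quietest)} ({quietest_count} of {total} observations)."
    )


# Illustrative, made-up prices; replace with your own column.
prices = [11.2, 11.5, 11.8, 12.1, 13.3, 13.4, 13.6, 13.9, 14.2, 14.7]
print(explain_histogram(prices))
```

The generated sentence can be displayed as a caption under the histogram, so the reader does not have to guess the bin edges.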

Data storytelling is a necessity because visualisations alone do not do the job

For many years now, data storytelling has been a must-have skill for a data scientist. But in truth it is a necessity, because visualisations alone cannot convey the story.

A very simple visualisation can have a great story behind it. But unless it is told, it never surfaces. Take the histogram visualisation which was shown above.

The real story behind the histogram is that the stock price swings between 11 and 15 and stays at 12 only for a very short time. So the buying opportunity at 12 is very short. This kind of story is impossible to capture in a visualisation and needs to be told explicitly. Even if advanced visualisations such as animations are used, someone still has to tell the story.

This is where the power of a text explanation comes into play. Adding a short textual explanation enhances the value of a visualisation. You go from merely showing a visualisation to conveying something meaningful.

Let us now see some examples where explanations enhance the interpretation of a visualisation:

Explaining a correlation matrix to avoid the stress of a "color-maze"

A correlation matrix looks visually stunning. However, because it contains so many different shades of color, one has to look hard to interpret it. Adding just a few lines of textual explanation makes the correlation matrix much easier to interpret. The text can explain which variables are the most correlated, as well as what the different shades of color mean.

Shown below is a correlation matrix based on car data. As you can see, adding a small explanation clearly enhances the value of the nice-looking correlation matrix. It saves your users from "eye-balling" the matrix to find the most correlated variables.

Example of text explanation of correlation matrix
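A text explanation like this could be generated automatically. The sketch below, which assumes plain Python lists per column and uses made-up car values, computes Pearson correlations and describes the strongest pairs in words. The helper names `pearson` and `explain_correlations` are hypothetical and not part of any library.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def explain_correlations(columns, top_n=2):
    """Return one sentence per pair for the top_n most strongly correlated pairs."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Strongest relationships first, regardless of sign.
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)

    lines = []
    for a, b, r in pairs[:top_n]:
        direction = "positively" if r > 0 else "negatively"
        lines.append(f"{a} and {b} are {direction} correlated (r = {r:.2f}).")
    return "\n".join(lines)


# Illustrative, made-up car data; replace with your own columns.
cars = {
    "horsepower": [95, 110, 150, 180, 220],
    "weight": [2100, 2400, 3000, 3400, 3900],
    "mpg": [34, 30, 24, 20, 16],
}
print(explain_correlations(cars))
```

Sorting by absolute value means strong negative relationships are reported alongside strong positive ones, which is usually what a reader of the matrix wants to know first.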

Explaining a cumulative distribution to avoid "eye-balling" the x and y axes

Cumulative distributions are very useful for showing how a numeric value is distributed. They are also a creative way of focusing on important threshold levels of a numeric column.

However, showing the cumulative distribution without any explanation makes for a painful eye-balling exercise. A short explanatory text about the different threshold levels immediately takes the cumulative distribution to the next level, and it starts making sense.

Shown below is the cumulative distribution of a stock price. Text explanations of thresholds (for example, 80% of close prices are less than 79.31) clearly enhance the value of a cumulative distribution visualisation.

Example of text explanation of the cumulative distribution
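Threshold sentences of this kind can be produced with a few lines of code. The sketch below assumes the nearest-rank definition of a percentile and uses made-up prices; the function name `explain_cumulative` and its parameters are hypothetical.

```python
import math


def explain_cumulative(values, thresholds=(0.5, 0.8, 0.95), label="close prices"):
    """Return one sentence per threshold describing the cumulative distribution."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)

    lines = []
    for q in thresholds:
        # Nearest-rank percentile: the smallest value with at least a share q
        # of the data at or below it. round() guards against float noise.
        rank = max(1, math.ceil(round(q * n, 9)))
        value = ordered[rank - 1]
        lines.append(f"{q:.0%} of {label} are at or below {value:.2f}.")
    return "\n".join(lines)


# Illustrative, made-up prices; replace with your own column.
prices = [71.5, 72.0, 73.8, 74.1, 75.6, 76.3, 77.9, 78.4, 80.2, 83.0]
print(explain_cumulative(prices))
```

Other percentile definitions interpolate between neighbouring values, so the reported thresholds can differ slightly from those of a plotting library.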

Explaining the result of clustering to avoid any guesswork

Clustering is a very powerful tool for any data exploration activity. However, its result can be one of the most misinterpreted visualisations if it is not clearly explained. The result of clustering is generally shown as a scatter plot with the clusters in different colors. The catch is that a 2D scatter plot shows only two columns of your data, whereas the clustering itself resulted from many more columns.

So in order to explain the clustering results correctly, you need a textual explanation that includes the feature importance of the clustering results.

Example of text explanation of clustering
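The article does not say how the feature importance is computed, so the sketch below uses one simple proxy as an assumption: for each feature, the share of its variance that is explained by cluster membership (between-cluster sum of squares divided by total sum of squares). The function names and the sample rows are hypothetical, and the cluster labels are assumed to come from whichever clustering algorithm you ran beforehand.

```python
from collections import defaultdict


def cluster_feature_importance(rows, labels, feature_names):
    """Score each feature by the share of its variance explained by the clusters."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        overall_mean = sum(column) / len(column)
        total_ss = sum((v - overall_mean) ** 2 for v in column)

        # Group this feature's values by cluster label.
        groups = defaultdict(list)
        for value, label in zip(column, labels):
            groups[label].append(value)

        between_ss = sum(
            len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
        )
        scores[name] = between_ss / total_ss if total_ss else 0.0
    return scores


def explain_clusters(rows, labels, feature_names):
    """Turn the scores into a short, ranked text explanation."""
    scores = cluster_feature_importance(rows, labels, feature_names)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = [f"The data was grouped into {len(set(labels))} clusters."]
    for name, score in ranked:
        lines.append(f"{name}: cluster membership explains {score:.0%} of its variance.")
    return "\n".join(lines)


# Illustrative, made-up rows (price, volume, volatility) and cluster labels.
rows = [[12.0, 300, 0.2], [12.5, 320, 0.9], [30.0, 310, 0.3], [31.0, 330, 0.8]]
labels = [0, 0, 1, 1]
print(explain_clusters(rows, labels, ["price", "volume", "volatility"]))
```

A feature with a high score separates the clusters well, even if it is not one of the two columns drawn on the scatter plot.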

Including text generation functions in your developments

As data scientists, we write code for every activity, from data preparation and feature engineering to hyperparameter tuning, modeling, and visualisation. But most of us do not think about automatically generating textual explanations of the results. So it is a good idea to make a habit of including functions that generate textual explanations in your code.

As more and more algorithms are packaged into products meant for end users, the need for textual explanations of results is becoming evident. Including them will make your data science work more appealing to a wider audience.

Additional resources

Website

You can visit my website to do analytics with zero coding: https://experiencedatascience.com

Please subscribe to stay informed whenever I release a new story.

Get an email whenever Pranay Dave publishes.


YouTube channel

Here is the link to my YouTube channel: https://www.youtube.com/c/DataScienceDemonstrated

