
How to Write a Data Science Post Everyone Can Understand

Six useful guidelines to explain data-related things even for a non-technical person

Photo by Christopher Burns on Unsplash

I often write blog posts on data science and machine learning, and my intention is to write content that everyone can understand. From topic research to final editing, I make every effort to achieve that. I carefully choose topics that are useful to my readers but rarely found elsewhere on the internet, so my readers can learn something new from my content.

When writing on a particular topic, content organization matters a great deal. I try to organize my content so that its efficiency is close to 100%. This can be defined as:

(Image by author)

This is not a rigorous mathematical formula, but it gives some intuition: content efficiency increases when readers can take away the maximum amount of knowledge from what they read. To achieve that, I keep the following six guidelines in mind when writing a data science post.

1. Explain the intuition behind algorithms

At the beginning of a data science post, many writers describe too much of the mathematics behind a particular algorithm or process. Readers may not understand it and skip the rest of the content. My suggestion is to begin by explaining the intuition behind the algorithm and then move on to the mathematical part. For example, here is part of the explanation of the Elliptic Envelope technique, taken directly from my "Two outlier detection techniques you should know in 2021" post.

The intuition behind the Elliptic Envelope is very simple. We draw an ellipse around the data points based on some criteria and classify any data point inside the ellipse as an inlier (green ones) and any observation outside the ellipse as an outlier (red ones) – By author

The intuition behind the Elliptic Envelope (Image by author)

Even a beginner can understand this. Here, I do not try to explain mathematically how to draw the ellipse around the data points. Instead, I explain the intuition behind the technique. Now my readers have an idea of how to use the Elliptic Envelope technique to detect possible outliers.
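The same intuition carries over directly to code. Below is a minimal, hypothetical sketch using scikit-learn's `EllipticEnvelope`: the synthetic data and the `contamination` value are my own illustrative choices, not from the original post.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(42)
# 100 well-behaved points around the origin, plus 3 obvious outliers
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
extremes = np.array([[6.0, 6.0], [-7.0, 5.0], [8.0, -6.0]])
X = np.vstack([inliers, extremes])

# contamination is the assumed fraction of outliers in the data
detector = EllipticEnvelope(contamination=0.05, random_state=0)
labels = detector.fit_predict(X)  # +1 = inside the ellipse, -1 = outside

print("points flagged as outliers:", (labels == -1).sum())
```

The model fits an ellipse to the bulk of the data and labels anything falling outside it as an outlier, which is exactly the picture described above.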

Key takeaway: A combination of intuition and a related diagram is a fine recipe for explaining an algorithm or process.

2. Don’t open (explain) black-box models

In the context of data science and machine learning, a black-box model is one whose internal structure we cannot see or understand: we can only observe the outputs it produces for given inputs. Neural network models are a great example. If we try to open (explain) such a model, we waste our time, because the complex mechanisms under the hood are hard to understand.

The opposite is the white-box model in which we can see or understand the internal mechanisms of algorithms. White-box models are transparent. An example of such a model is a decision tree.

A part of a decision tree: A white-box model (Image by author)

In a decision tree, we can easily explain how the internal nodes are split to produce the final outputs. The criteria are very clear, so you can explain exactly how the tree arrived at a particular prediction.
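This transparency is easy to demonstrate. As an illustrative sketch (the Iris dataset and tree depth are my own choices, not from the original post), scikit-learn's `export_text` prints the exact split rules of a fitted tree:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text shows the split criterion at every internal node,
# so any prediction can be traced back to explicit if/else rules
rules = export_text(tree, feature_names=load_iris().feature_names)
print(rules)
```

Every branch in the printed output is a human-readable rule, which is precisely what makes the decision tree a white-box model.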

Key takeaway: It is important to know when and how to use an algorithm rather than understanding its internal mechanisms. So, don’t try to open (explain) black-box models. Also, note that opening a white-box model is optional.

3. Pay attention to both interpretability and explainability of models

It is useful to distinguish between interpretability and explainability in the context of data science and machine learning. The two terms are often used interchangeably, but there is a clear difference.

When we try to interpret an ML model, we try to explain its meaning, constraints (limitations), assumptions and validity of predictions. The explainability mostly deals with something called transparency. Here, we try to explain the inner workings of the model (not possible with black-box models!) by giving more attention to the model training process and model performance improvement – By author

To illustrate this, I use the following simple linear regression model.

y = 𝛽0 + 𝛽1x
y = Height(cm)
x = Age(years)

Imagine that the above model represents the relationship between Age and Height of a person. The interpretability of the above model has the following elements.

  • One can draw a scatterplot to see the nature of the relationship (linear/non-linear, positive/negative) between Age and Height.
  • If the relationship is linear, one can calculate and interpret the Pearson correlation coefficient to measure the strength of the relationship.
  • One can calculate and interpret the R-squared (R²) value to see how much variability in Height is captured by the model.
  • One can interpret the 𝛽0 and 𝛽1 coefficients. 𝛽0 is the Height of a newborn baby (when x = 0). 𝛽1 is the increase in Height in cm for a unit increase in Age.
  • One can mention the limitations of the model. For example, one can say that, after a certain Age, a person’s Height no longer increases.
  • One can interpret and verify the assumptions of the model.
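Several of the interpretation steps above can be computed in a few lines. The following is a minimal sketch using made-up Age/Height numbers (the data values are hypothetical, invented purely for illustration):

```python
import numpy as np
from scipy import stats

# hypothetical ages (years) and heights (cm) of young children
age = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
height = np.array([50, 57, 63, 70, 76, 83, 89, 96, 102, 109], dtype=float)

result = stats.linregress(age, height)
print(f"beta0 (Height at Age 0): {result.intercept:.1f} cm")
print(f"beta1 (growth per year):  {result.slope:.2f} cm/year")
print(f"Pearson r:                {result.rvalue:.3f}")
print(f"R-squared:                {result.rvalue ** 2:.3f}")
```

Here 𝛽0 is read off as the intercept, 𝛽1 as the slope, and the Pearson correlation and R² quantify the strength of the linear relationship, matching the bullet points above.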

This is the interpretation part. Now, we move into the explainability part. The explainability of the above model has the following elements.

  • One can explain how to find the optimal values for 𝛽0 and 𝛽1 coefficients. One can explain the cost function and gradient descent process that finds the optimal values. One can explain how to choose the learning rate, α.
  • One can explain the cross-validation procedure that is used for model evaluation. One can explain the selection of size and number of folds.
  • One can explain the hyperparameter tuning process that is used to find the best possible model out of several models.
  • One can apply and explain some kind of regularization (L1, L2 or both L1 and L2) if the model overfits the data.
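The first explainability bullet, finding the optimal 𝛽0 and 𝛽1 with gradient descent on a mean-squared-error cost, can be sketched as follows. The data, learning rate, and iteration count are all illustrative assumptions on my part:

```python
import numpy as np

# hypothetical ages (years) and heights (cm), same shape of data as before
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([50, 57, 63, 70, 76, 83, 89, 96, 102, 109], dtype=float)

b0, b1 = 0.0, 0.0   # start from arbitrary coefficient values
alpha = 0.02        # learning rate, chosen small enough to converge
n = len(x)

for _ in range(20000):
    error = (b0 + b1 * x) - y
    # gradients of the mean-squared-error cost with respect to b0 and b1
    b0 -= alpha * (2 / n) * error.sum()
    b1 -= alpha * (2 / n) * (error * x).sum()

print(f"beta0 = {b0:.2f}, beta1 = {b1:.2f}")
```

Explaining this loop, why the gradients have this form, how α was chosen, when to stop iterating, is exactly the kind of transparency that explainability demands and that is only possible with a white-box model.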

Key takeaway: Interpretability is required for all models, but explainability may be limited to white-box models and is optional.

4. Give a hands-on approach

This is another key thing that I consider when writing a data science post. Most of my posts give you a hands-on experience: You learn by doing. For that, I include all code samples and datasets. You can use them to practice what you learned.

5. Mention prerequisites

Mentioning prerequisites is helpful for your readers. I suggest including your own content as the prerequisite material. Many of my posts are related to other posts: I start with the fundamentals and gradually go deeper. When I write advanced content, I often include links to my own fundamental prerequisite posts. Here is an example:

(Image by author)

6. Use some images and diagrams

A picture is worth a thousand words – Henrik Ibsen

This is also true for data science and machine learning. Here, a picture need not be a statistical plot or anything similar; even a simple diagram can be powerful enough to explain complex theories and processes to readers. Here is an example:

(Image by author)

I created this diagram to explain the hyperparameter tuning process with grid search in this post.
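A diagram like that pairs well with a short runnable sketch. Below is a minimal, hypothetical example of hyperparameter tuning with grid search using scikit-learn's `GridSearchCV`; the dataset, model, and parameter grid are my own illustrative choices, not the ones from the post being referenced:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# the grid of hyperparameter values to try exhaustively
param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5, 10]}

# every combination in the grid is evaluated with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Grid search simply tries every combination in the grid and keeps the one with the best cross-validated score, which is what the diagram illustrates.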


Conclusion

It is your responsibility to design valuable and understandable content for your audience. You may consider the guidelines discussed today, and you can also find other approaches by listening to your readers’ feedback. Either way, your goal is to maximize content efficiency as defined earlier.


This is the end of today’s post. My readers can sign up for a membership through the following link to get full access to every story I write, and I will receive a portion of your membership fee.

Join Medium with my referral link – Rukshan Pramoditha

Thank you so much for your continuous support! See you in the next story. Happy learning to everyone! Don’t forget to have a look at some of my popular lists too:

(Screenshot by author)

Special credit goes to Christopher Burns on Unsplash, who provided the nice cover image for this post.

Rukshan Pramoditha 2021–10–06

