
Explain like I’m five: Linear regression

"How can we analyze our results from the Medium thumbnail experiment?"

Making the basic idea simple

My little sister was born when I was 10. We instantly became buddies, and I would spend many of my afternoons playing with her. When we grew a bit older we would experiment with cooking together. Later we would have various building and gaming projects. All in all, we’ve been a team for 18 years. And I’ve had the opportunity to share my knowledge with her.

A couple of days ago I suggested to her that we should test how an animated preview picture would affect the number of clicks on a Medium post. She was on board – she would make an animation, and I would test if it has a positive effect on the number of clicks. But, then she asked me: how can we test that?

I felt a sharp pain. I had neglected my duties as a big brother. Surely I should have taught my sister how to run an experiment, a skill that gives such an advantage in everything we do. To redeem myself, I wrote her a basic explanation of how to analyze experiment results using Linear Regression.

This is my explanation for my little sister. With our animation experiment as an example.

1. What are we measuring

A little while back, I wrote my very first articles and published them on Medium. Getting my texts in front of the Medium readers turned out to be a bit harder than I had expected. Now I have more time on my hands, and I want to use part of that time to understand how to make Medium articles work.

My sister and I are focusing on the preview picture of a post – also called a thumbnail. A good thumbnail should be relevant, true to the post, and encourage people to click on the post. An improved thumbnail should then increase the number of people clicking on the post. This seems quite logical so far.

Below is the original preview of an article I wrote. The picture of the girl makes for a rather generic, non-animated thumbnail.

Enjoy life to the fullest; remote efficiently

The writer can see the number of clicks on a post, but under a different name: Medium displays them as views, meaning how many people viewed the post. This is what the writer sees about the views on a post:

These are some of the statistics a writer sees for their post. As you can see, we’re just getting started. Image by the author.

For our little test, it makes sense to only count the readers who could actually see the animated thumbnail. In our case, these are the internal views: readers who came from within Medium, who can definitely see animated thumbnails. For readers coming from other websites, like Facebook, it is not clear whether they also see the animation. Because we’re uncertain about them, it’s a good idea to leave them out. Luckily, Medium already counts them separately as external views, so they are easy to leave out.

The internal views can see the animated thumbnail. Whereas, the external views might not see the animated thumbnail. Image by the author.

2. How fast do we get views

So a better thumbnail will get views faster. How do we know which one is better?

The difference between the original and animated thumbnail could be easily tested. We can try both thumbnails for an equal amount of time, and compare which got more views in the end. But the original thumbnail has been there for 99 days. After another 99 days, we will likely have forgotten the whole experiment. So let’s try something else.

Instead, we can look at how fast the different thumbnails get views.

To know that, we need to know how many total views the article had collected by each day. Luckily, Medium tells us how many new views we get each day.

0 Views on April 11. Image by the author.
3 Views on April 12. Image by the author.
4 Views on April 13. Image by the author.

The total views on a given date are then the sum of the new views from that day and all the days before it. We can draw these running totals against the days since publishing in a plot, like this:

Sum of the internal views per day, using the original thumbnail. Image by the author.
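
If you would rather let a computer do the adding, here is a small Python sketch of the running total. It uses only the three daily counts from the screenshots above, and it assumes that April 11 counts as day 0, which is an assumption made for this example rather than something Medium reports.

```python
from itertools import accumulate

# New views per day, taken from the April 11, 12 and 13 screenshots.
daily_views = [0, 3, 4]

# Running total: each entry is that day's views plus all earlier days.
total_views = list(accumulate(daily_views))

# Assumption for this sketch: April 11 is day 0.
for day, total in enumerate(total_views):
    print(f"day {day}: {total} total views")
```

Each point in the plot is one of these (day, total views) pairs.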

This is where we currently are. We know the number of views over time, but that’s about it. Not much interesting information there.

3. Comparing the thumbnails

Our original question is: does an animated thumbnail increase the number of clicks on the post?

Currently, we do not know. It makes sense that it would, but we do not know until we give it a try. And after we try, we will have the number of views for the animated thumbnails. For now, I made up some numbers for the animated thumbnail – so that I can show the rest of the test. Here the numbers are drawn side by side.

Some made-up values for the animated thumbnail. Image by the author.

From the above picture, we can see that the animated thumbnail has generated almost as many views as the original thumbnail. But, the animated thumbnail has been there only for 21 days.

So are we done? The animated thumbnail seems better, right?

We could stop here. But we can still figure out how much better the animated thumbnail is! That would be a motivating thing to know. And it’s also useful for deciding whether we should spend time on this.

To get a feel for the difference, we change the plots a little bit. We add a straight line that tries to go neatly between the points. The steepness of this line is the speed at which the post gets views. The steeper the line, the better.

For both of the results, I added a line that goes neatly between the points. Image by the author.
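
One common way to draw such a line is a least-squares fit, which is the calculation at the heart of linear regression. The sketch below is a minimal version in plain Python and not necessarily the exact method used for the plots above. The function name `fit_line` and the five data points are made up for illustration.

```python
def fit_line(days, total_views):
    """Least-squares fit of: total_views = slope * days + intercept."""
    n = len(days)
    mean_day = sum(days) / n
    mean_views = sum(total_views) / n
    # How much days and views move together, and how much days vary.
    sxy = sum((x - mean_day) * (y - mean_views)
              for x, y in zip(days, total_views))
    sxx = sum((x - mean_day) ** 2 for x in days)
    slope = sxy / sxx
    intercept = mean_views - slope * mean_day
    return slope, intercept


# Made-up points for illustration only (not the article's real data).
days = [0, 1, 2, 3, 4]
total_views = [1, 3, 4, 4, 7]

slope, intercept = fit_line(days, total_views)
print(f"about {slope:.2f} views per day, starting near {intercept:.2f} views")
```

The slope is the steepness of the line, and the intercept is where the line sits on day 0.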

We can even get a value for the steepness with a simple calculation. The steepness is equal to the number of views added divided by the number of days it took to add them. I added the values to the next picture. Also check out where the line starts: to make it fit well, it does not start from 0 views on day 0.

The plot on the right has a calculation for the steepness. The red text tells us how steep the line is – which equals "about how many views we get each day". Image by the author.
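
As a small sketch, the steepness is the views added divided by the days it took to add them. The two points below are hypothetical points on a line, chosen so that the result matches the 0.08 used for the original thumbnail; they are not read from the plot. The function name `steepness` is also just for this example.

```python
def steepness(day_start, views_start, day_end, views_end):
    """Views added per day between two points on the line."""
    return (views_end - views_start) / (day_end - day_start)


# Hypothetical points on the line: it starts at 2 views on day 0
# (not at 0 views) and reaches 6 views on day 50.
views_per_day = steepness(0, 2.0, 50, 6.0)
print(f"about {views_per_day:.2f} views per day")
```

Notice that the starting height of the line does not change the steepness. Only the views added and the days passed matter.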

To get a value for the improvement, we divide the steepness of the animated thumbnail line by the steepness of the original thumbnail line. With this fictional data, the animated thumbnail would collect views 0.32/0.08 = 4 times as fast. So having an animated thumbnail would be useful : )
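
The same division as a tiny Python sketch, using the two fictional steepness values from the text. The function name `improvement` is made up for this example.

```python
def improvement(new_steepness, old_steepness):
    """How many times faster the new line collects views than the old one."""
    if old_steepness <= 0:
        raise ValueError("the old steepness must be positive to compare")
    return new_steepness / old_steepness


animated = 0.32  # fictional views per day with the animated thumbnail
original = 0.08  # views per day with the original thumbnail

ratio = improvement(animated, original)
print(f"the animated thumbnail is about {ratio:.1f} times as fast")
```

A ratio above 1 means the new thumbnail collects views faster, and a ratio below 1 means it is slower.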

These are the steps we did in the end. Add the data, add the line, and calculate the steepness of the line. Image by the author.

This, my little sister (and other readers), is how we can analyze experiment results.

It’s not the only way, and it’s not always the best way. But it’s easy, and the steps are easy to connect to real life. Many things are happening behind the scenes, but understanding them is not necessary when you’re getting started. And I’m always happy to help.

