Data Science

Central Limit Theorem In Action

And examples from its practical applications

Ceren Iyim
Towards Data Science
Mar 11, 2020 · 5 min read


Image Credits: Casey Dunn & Creature Cast on Vimeo

Statistics is must-have knowledge for a data scientist, and the Central Limit Theorem is one of its cornerstones.

I learn better when I see a theoretical concept in action, and I believe there are more people like me out there. So today I will explain the Central Limit Theorem with a concrete, catchy example, hoping to make it stick in your mind.

Do you remember what the average life expectancy in the world is today? (Hint: check out here if you don’t)

It is 73.

This number summarizes life expectancy data from all over the world, 186 countries to be precise. It will tell us more if we put it into its full context.

The context is built from a few statistical terms, so let’s first understand them:

Population: The set that contains all elements, individuals, or measurements in your space of interest. Our space of interest is the world, and the population is the life expectancy data from all countries; the population size is denoted by N.

Sample: A randomly selected subset of the population; the sample size is denoted by n.

Distribution: Describes the range of the data (population or sample) and how the data is spread across that range. The distribution of life expectancy in the world as of 2018 is as follows:

Mean: The average value of all the data in your population or sample. For this example, the population mean is 73; it is denoted by µ (mu) for populations and x̄ (x bar) for samples.

Standard Deviation: Not every country’s average life expectancy is 73; they are spread over a range, as we see above. The standard deviation is a measure of how spread out your population is; it is denoted by σ (sigma).

Normal Distribution (Bell-Shaped Curve, Gaussian Distribution): When your population is spread perfectly symmetrically around the mean with standard deviation σ, you get the following bell-shaped curve:

Normal Distribution on Wikipedia

We have only scratched the surface of the statistical terms needed to explain the Central Limit Theorem; now let’s see the theorem in action!

What kind of distribution does average life expectancy in the world follow as of 2018? Is it normally distributed, or is it something else?

Well, it is not a perfectly normal distribution: it is pulled toward the left side, which is known as a left-skewed distribution.

Now let’s take some samples from the population. In other words, let’s draw 1,000 subsets of size 150 from the population and plot the distribution of each sample’s mean (x̄). The resulting distribution of sample means is called the sampling distribution:

This looks like a perfect normal distribution! In fact, the sampling distribution of the mean from any population tends toward a normal distribution, because the Central Limit Theorem states that:

Regardless of the initial shape of the population distribution, the sampling distribution of the mean will approximate a normal distribution. As the sample size increases, the sampling distribution gets narrower and closer to normal.
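If you would like to try this yourself, here is a minimal sketch of the simulation. It uses a made-up left-skewed stand-in for the life expectancy data (the article’s real numbers come from Gapminder), so it is an illustration rather than the author’s Deepnote notebook:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Stand-in population: a left-skewed synthetic "life expectancy" distribution
# of 186 values, centered near 73. (The real data is Gapminder's Open Numbers.)
population = 85 - rng.gamma(shape=2.0, scale=6.0, size=186)

# Draw 1,000 samples of size 150 (with replacement) and record each sample mean.
sample_means = [rng.choice(population, size=150, replace=True).mean()
                for _ in range(1000)]

print(f"Population mean:      {population.mean():.2f}")
print(f"Mean of sample means: {np.mean(sample_means):.2f}")

# The histogram of sample means should look approximately normal,
# even though the population itself is left-skewed.
plt.hist(sample_means, bins=30, edgecolor="black")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.title("Sampling distribution of the mean (1,000 samples of size 150)")
plt.show()
```

Running this, the histogram of sample means should come out roughly bell-shaped even though the stand-in population is skewed, which is exactly what the theorem promises.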

This video explains the theorem in an unexpectedly entertaining way, using rabbits and dragon wings 🙂

Video Credits: Central Limit Theorem explained by Creature Cast & Casey Dunn on Vimeo

Coming back to the Central Limit Theorem, it also allows us to calculate the standard deviation of the sampling distribution, known as the standard error, with the following formula:

Standard error formula: SE = σ / √n, where σ is the population standard deviation and n is the sample size.

Let’s check whether this calculation also holds for our population and the sampling distribution we created:

As expected, it holds! The intuitive reading of this formula is that as the sample size increases, the sample means cluster more tightly around the population mean, so the sampling distribution represents the population parameters (mean and standard deviation) more accurately.
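As a rough numerical check, continuing the sketch above (it reuses the population and sample_means variables defined there), you can compare σ/√n with the spread actually observed in the simulated sample means:

```python
import numpy as np

# Theoretical standard error from the CLT: sigma / sqrt(n)
theoretical_se = population.std() / np.sqrt(150)

# Observed spread of the simulated sampling distribution
observed_se = np.std(sample_means)

print(f"Theoretical standard error:   {theoretical_se:.3f}")
print(f"Observed std of sample means: {observed_se:.3f}")
```

The two numbers should land close to each other, and both shrink as the sample size grows.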

Bonus theorem: Did you notice where the center of the sampling distribution lies? Do you think it is a coincidence that both the mean of the sampling distribution and the population mean are 73?

Of course not! This is explained by another fundamental theorem in statistics, the Law of Large Numbers: as more samples are drawn from a population, and as the sample size increases, the mean of the sampling distribution gets closer to the population mean.

Practical Applications of the Central Limit Theorem

In any machine learning problem, the given dataset is a sample from the whole population. Using this sample, we try to capture the main patterns in the data, and then we generalize those patterns to the population when making predictions. The Central Limit Theorem helps us make inferences about sample and population parameters and build better machine learning models with them.

Moreover, the theorem can tell us whether a sample plausibly belongs to a population by looking at where its mean falls in the sampling distribution. Much of the theorem’s power lies here, too.
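To illustrate that idea, here is a hedged sketch of a simple one-sample z-test built on the standard error above. The new_sample values are invented for the example and population comes from the earlier sketch, so this is one way the theorem gets applied rather than a recipe from the article:

```python
import numpy as np
from scipy import stats

# Hypothetical new sample (made-up values) whose origin we want to question.
new_sample = np.array([62.1, 58.4, 65.0, 60.2, 59.7, 63.3, 61.8, 57.9, 64.4, 60.5])

# Standard error of the mean for a sample of this size, using the population sigma.
se = population.std() / np.sqrt(len(new_sample))

# How many standard errors away from the population mean does this sample fall?
z = (new_sample.mean() - population.mean()) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")
# A tiny p-value suggests the sample mean is very unlikely if the sample
# really came from this population.
```

This is exactly the kind of reasoning that statistical significance and hypothesis testing formalize, which is where the next article picks up.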

When I typed the first words of this blog post, I intended to write about statistical significance and hypothesis testing, but I realized that the Central Limit Theorem is at the heart of explaining them. So I will leave them for the next article. Follow my Medium account to learn about them 😉

Thanks for reading! If you want a closer look at the theorem in action, you can check out this Deepnote notebook. The source of my catchy example is Gapminder’s Open Numbers, which can be found here.

If you enjoyed this one, you can view some of my other articles here:

For comments or constructive feedback, you can reach me in the responses, on Twitter, or on LinkedIn!
