Central Limit & Large Numbers

Matthias Burbach
Towards Data Science
6 min read · Nov 5, 2019

Curious how the Central Limit Theorem and the Law of Large Numbers work and relate to each other? Come and take a look together with me!

If you study some probability theory and statistics, you come across two theorems that stand out:

  • CLT, which is short for Central Limit Theorem
  • LLN, which is short for Law of Large Numbers

Because they are of such great importance in the field, I wanted to make sure I understood the substance of both of them. Reading various articles about them got me there, I believe, but it was also a bit confusing along the way, as there are many subtle details in the concepts their statements are based on. Additionally, they are not just two distinct theorems: each one comes in several versions.

In an attempt to wrap my head around them and ultimately commit their essence to my long-term memory, I have applied three techniques:

  1. Reduce their varieties to the easiest and practically most relevant case
  2. By exploiting the confinement to the easiest case, find representations of both with a maximum of commonality and a clear difference
  3. Write this article about it ;-)

Bell curves arise from frequencies of sums of random numbers

Let me start by giving you loose and informal statements of the theorems in simple terms.

Roughly, the easiest version of the CLT tells us this:

Carry out identical but independent experiments each yielding a random number and add up all those numbers. If you repeat this process of coming up with a sum of random numbers, the frequencies of resulting sums will approximately follow a normal distribution (i.e. a Gaussian bell curve). The more numbers you sum per experiment (and the more experiments), the better the approximation.

Likewise, the easiest version of the LLN states:

Carry out identical but independent experiments each yielding a random number and average all those numbers. The more experiments you perform, the more likely it is that the average will be close to the expected value (of a single experiment).

Simple as it is, let's look at the charted outcome of an example simulated in Python: we throw an ordinary die 100 times and add up all the numbers. Then we repeat this process 10,000 times. For the CLT we record the relative frequencies of the different resulting sums in a distribution chart and compare its shape with the graph of the Normal Distribution's density function. For the LLN we compute averages from the growing total sum and finally compare those averages with the expected value 3.5.

Left Plot: Distribution of Sums (blue) approximating the theoretical distribution of the limiting case (red) — Right Plot: Averages (blue) approximating the Expected Value 3.5 (red) — Find the code at https://gist.github.com/BigNerd/04aef94af57f72d4d94f13be3f7fde70
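
If you want to reproduce this yourself, here is a minimal sketch of the same idea (the article's actual plotting code is in the gist linked above; the sketch only assumes numpy and a fair six-sided die):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

NUM_EXPERIMENTS = 10_000   # how often we repeat the whole experiment
DICE_PER_EXPERIMENT = 100  # how many throws we sum up per experiment

# CLT part: one sum of 100 die throws per experiment, repeated 10,000 times;
# the relative frequencies of these sums approximate a normal distribution
# with mean n * mu = 100 * 3.5 and variance n * sigma^2 = 100 * 35/12
sums = rng.integers(1, 7, size=(NUM_EXPERIMENTS, DICE_PER_EXPERIMENT)).sum(axis=1)

# LLN part: running averages over one growing sequence of die throws;
# the final average should be close to the expected value 3.5
throws = rng.integers(1, 7, size=NUM_EXPERIMENTS)
running_averages = np.cumsum(throws) / np.arange(1, NUM_EXPERIMENTS + 1)

print(sums.mean(), running_averages[-1])
```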

In another simulation we want to see how the variation of the number of repeated experiments and the number of random numbers summed up per experiment influences the result stated by the CLT. As visualization we plot the resulting distributions in a standardized way (mean 0, standard deviation 1) for easier comparison and use green instead of blue for every other bar to better see their widths:

Summing more numbers yields finer resolution along the x-axis, repeating the experiment more often gives better accordance with the red Gaussian curve along the y-axis — Find the code at https://gist.github.com/BigNerd/63ad5e3e85e0b676b0db61efddedf839
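
Again, the article's actual code is in the gist above; a minimal sketch of the standardization step (assuming numpy and a fair die) could look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

MU, SIGMA = 3.5, np.sqrt(35 / 12)  # mean and standard deviation of one die throw

def standardized_sums(num_experiments: int, dice_per_experiment: int) -> np.ndarray:
    """Sums of die throws, shifted and scaled to mean 0 and standard deviation 1."""
    sums = rng.integers(1, 7, size=(num_experiments, dice_per_experiment)).sum(axis=1)
    return (sums - dice_per_experiment * MU) / (SIGMA * np.sqrt(dice_per_experiment))

# vary both parameters and compare the resulting histograms
# with the standard normal density (e.g. with matplotlib)
for num_experiments in (1_000, 10_000):
    for dice in (5, 50):
        z = standardized_sums(num_experiments, dice)
        print(num_experiments, dice, round(z.mean(), 3), round(z.std(), 3))
```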

If you’re into math equations, let us now turn to formal representations of the theorems in order to understand their claims and the relationship between the two a bit more precisely.

Central Limit Theorem

Let

$$X_1, X_2, \ldots, X_n$$

be independent and identically distributed random variables with expected value μ and finite variance σ². Then

$$Z_n = \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma\sqrt{n}}$$

converges towards the Standard Normal Distribution in distribution. Convergence in distribution means the cumulative distribution functions (CDFs) of the random variables converge towards a limiting one; in this case they converge towards the CDF Φ of the Standard Normal Distribution as n grows large:

$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) \quad \text{for all } z$$

This is known as the Lindeberg/Lévy version of the CLT.
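
We can watch this convergence happen numerically. The following sketch (my own, not from the article's gists; it assumes numpy and scipy) compares the empirical CDF of the standardized die sums at z = 1 with Φ(1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=42)

MU, SIGMA = 3.5, np.sqrt(35 / 12)  # mean and standard deviation of one die throw

def clt_term(n: int, num_samples: int = 10_000) -> np.ndarray:
    """Samples of (sum of n die throws - n * mu) / (sigma * sqrt(n))."""
    sums = rng.integers(1, 7, size=(num_samples, n)).sum(axis=1)
    return (sums - n * MU) / (SIGMA * np.sqrt(n))

# the empirical CDF at z = 1 approaches Phi(1) ~ 0.8413 as n grows
for n in (1, 10, 100, 1000):
    empirical_cdf_at_1 = (clt_term(n) <= 1.0).mean()
    print(n, round(empirical_cdf_at_1, 4), round(norm.cdf(1.0), 4))
```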

Law of Large Numbers

Again, let

$$X_1, X_2, \ldots, X_n$$

be independent and identically distributed random variables with expected value μ and finite variance σ². Then

$$\lim_{n \to \infty} P\left( \left| \frac{X_1 + X_2 + \ldots + X_n - n\mu}{n} \right| < \epsilon \right) = 1 \quad \text{for every } \epsilon > 0$$

or written differently with

$$\bar{X}_n = \frac{X_1 + X_2 + \ldots + X_n}{n}$$

it becomes

$$\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \epsilon \right) = 1 \quad \text{for every } \epsilon > 0$$

This is known as Tschebyscheff's version of the Weak Law of Large Numbers (as said, there are other versions, too). The first limit equation is more suitable for the comparison with the CLT; the latter more appropriately captures the intuition of approximating the expected value with the average.
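
To see the weak law in action numerically, here is a small sketch (again my own, assuming numpy and a fair die) that estimates the probability of the average landing within ε = 0.1 of μ = 3.5:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def prob_average_near_mu(n: int, epsilon: float = 0.1, num_samples: int = 10_000) -> float:
    """Estimate P(|average of n die throws - 3.5| < epsilon) by simulation."""
    averages = rng.integers(1, 7, size=(num_samples, n)).mean(axis=1)
    return (np.abs(averages - 3.5) < epsilon).mean()

# the probability approaches 1 as n grows, just as the weak LLN states
for n in (10, 100, 1000):
    print(n, prob_average_near_mu(n))
```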

Similarities

As you can see by pattern matching the limit term of the CLT

$$\lim_{n \to \infty} P\left( -\epsilon < \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} < \epsilon \right) = 1 - 2\,\Phi(-\epsilon)$$

and the first variant of the limit terms of the LLN from above,

$$\lim_{n \to \infty} P\left( -\epsilon < \frac{X_1 + X_2 + \ldots + X_n - n\mu}{n} < \epsilon \right) = 1$$

they are very similar indeed; the only differences are the denominators and the right-hand sides.

Both make a statement about the probability of getting a value of their expression within an arbitrary interval around 0, i.e. ]−ϵ, ϵ[. Further, both compute the same sum of the same random variables, both need to be centered by subtracting the growing expected value nμ from the sum, and both need to be shrunk by a growing factor to make the convergence technically work out.

Differences

The difference between the two limit terms is obviously in the order of the denominators (σ√n vs. n) and in the resulting limits on the right-hand sides: 1−2Φ(−ϵ) vs. 1. So basically, both deal with the same process of producing aggregate numbers that become more and more closely normally distributed around the mean of zero as n gets larger. But since we shrink the values in the LLN more aggressively, their variance approaches zero rather than remaining constant, and that asymptotically condenses all the probability mass over the expected value.

By applying the following basic rules for variances (for a constant b, a constant factor a, and independent random variables X and Y)

$$\mathrm{Var}(X + b) = \mathrm{Var}(X), \qquad \mathrm{Var}(aX) = a^2\,\mathrm{Var}(X), \qquad \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

we can check that this is so:

$$\mathrm{Var}\!\left( \frac{X_1 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} \right) = \frac{n\,\sigma^2}{\sigma^2\,n} = 1 \qquad \text{vs.} \qquad \mathrm{Var}\!\left( \frac{X_1 + \ldots + X_n - n\mu}{n} \right) = \frac{n\,\sigma^2}{n^2} = \frac{\sigma^2}{n} \;\xrightarrow{\,n \to \infty\,}\; 0$$

Lastly, from another simulation we get a visualization of the variance reduction as n gets larger in the averaging process described by the LLN:

Averaging more numbers per experiment reduces the variance of the averages around the true mean — Find the code at https://gist.github.com/BigNerd/ad880941bbe6987e6621cf99e3b2af78
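
To close the loop numerically, here is a final sketch (assuming numpy and a fair die; the article's plotting code is in the gist above) confirming that the variance of the CLT term stays at 1 while the variance of the LLN term shrinks like σ²/n:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

MU, SIGMA = 3.5, np.sqrt(35 / 12)  # mean and standard deviation of one die throw

for n in (10, 100, 1000):
    sums = rng.integers(1, 7, size=(10_000, n)).sum(axis=1)
    clt_var = np.var((sums - n * MU) / (SIGMA * np.sqrt(n)))  # stays close to 1
    lln_var = np.var((sums - n * MU) / n)                     # shrinks like sigma^2 / n
    print(n, round(clt_var, 3), round(lln_var, 5), round(SIGMA**2 / n, 5))
```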

Bottom Line

When confining the two theorems to a special yet important case, namely that of independent and identically distributed random variables with finite expectation and variance, and further restricting the comparison to the Weak LLN, we can spot similarities and differences that help deepen our understanding of the two in conjunction.
