Central Limit & Large Numbers
Curious how the Central Limit Theorem and the Law of Large Numbers work and relate to each other? Come and take a look together with me!
If you study probability theory and statistics, you come across two theorems that stand out:
- CLT, which is short for Central Limit Theorem
- LLN, which is short for Law of Large Numbers
Because they are seemingly of such great importance in the field, I wanted to make sure I understood the substance of both. Reading various articles about them got me there, I believe, but it was also a bit confusing along the way, as there are many subtle details in the concepts their statements are based on. On top of that, they are not just two distinct theorems: each one comes in several versions.
In an attempt to wrap my head around them and ultimately commit their essence to my long-term memory, I have applied three techniques:
- Reduce their varieties to the easiest and practically most relevant case
- By exploiting the confinement to the easiest case, find representations of both with a maximum of commonality and a clear difference
- Write this article about it ;-)
Let me start by giving you loose and informal statements of the theorems in simple terms.
Roughly, the easiest version of the CLT tells us this:
Carry out identical but independent experiments, each yielding a random number, and add up all those numbers. If you repeat this process of coming up with a sum of random numbers, the frequencies of the resulting sums will approximately follow a normal distribution (i.e. a Gaussian bell curve). The more numbers you sum per experiment (and the more experiments you repeat), the better the approximation.
Likewise, the easiest version of the LLN states:
Carry out identical but independent experiments, each yielding a random number, and average all those numbers. The more experiments you perform, the more likely the average will be close to the expected value (of the experiment).
Albeit simple, let’s look at the charted outcome of an example simulated in Python: We throw an ordinary die 100 times and add up all the numbers. Then we repeat this process 10,000 times. For the CLT we record the relative frequencies of the different resulting sums in a distribution chart and compare its shape with the graph of the Normal Distribution’s density function. For the LLN we compute averages from the growing total sum and finally compare those averages with the expected value 3.5.
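That simulation can be sketched in plain Python using only the standard library (the variable names are my own, and I skip the charting and just print summary numbers):

```python
import random
import statistics

random.seed(42)

THROWS_PER_EXPERIMENT = 100  # dice summed per experiment
NUM_EXPERIMENTS = 10_000     # how often the experiment is repeated

# CLT part: the sum of 100 die throws, recorded 10,000 times
sums = [
    sum(random.randint(1, 6) for _ in range(THROWS_PER_EXPERIMENT))
    for _ in range(NUM_EXPERIMENTS)
]

# For a fair die, mu = 3.5 and sigma^2 = 35/12, so the sums should
# cluster around n*mu = 350 with a std dev of sigma*sqrt(n), roughly 17.1
print(statistics.mean(sums), statistics.stdev(sums))

# LLN part: the average of a single growing sequence of throws
throws = [random.randint(1, 6) for _ in range(NUM_EXPERIMENTS)]
average = statistics.mean(throws)
print(average)  # should land close to the expected value 3.5
```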
In another simulation we want to see how varying the number of repeated experiments and the number of random numbers summed per experiment influences the result stated by the CLT. As a visualization, we plot the resulting distributions in a standardized way (mean 0, standard deviation 1) for easier comparison and use green instead of blue for every other bar to make their widths easier to see:
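Leaving out the plotting, the standardization step used in that simulation can be sketched as follows (the helper name and parameter values are my own choices):

```python
import random
import statistics

random.seed(0)

MU = 3.5                  # expected value of one die throw
SIGMA = (35 / 12) ** 0.5  # standard deviation of one die throw

def standardized_sums(n_throws, n_experiments):
    """Sums of die throws, shifted and scaled to mean 0 and std dev 1."""
    result = []
    for _ in range(n_experiments):
        s = sum(random.randint(1, 6) for _ in range(n_throws))
        result.append((s - n_throws * MU) / (SIGMA * n_throws ** 0.5))
    return result

# Regardless of n, the standardized sums hover around mean 0, std dev 1;
# what improves with larger n is how normal their distribution looks.
for n in (5, 50, 500):
    z = standardized_sums(n, 2_000)
    print(n, round(statistics.mean(z), 3), round(statistics.stdev(z), 3))
```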
If you’re into math equations, let us now turn to formal representations of the theorems in order to understand their claims and the relationship between the two a bit more precisely.
Central Limit Theorem
Let

$$X_1, X_2, X_3, \ldots$$

be independent and identically distributed random variables with expected value μ and finite variance σ². Then

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$$

converges towards the Standard Normal Distribution in distribution. Convergence in distribution means the cumulative distribution functions (CDFs) of the random variables converge towards a limiting one. In this case they converge towards the CDF Φ of the Standard Normal Distribution as n grows large:

$$\lim_{n \to \infty} P(Z_n \leq x) = \Phi(x) \quad \text{for all } x \in \mathbb{R}$$
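This convergence of CDFs can be checked numerically for the fair die. Here is a small sketch where Φ is expressed via the standard library’s `math.erf`; the helper names and sample points are my own:

```python
import math
import random

random.seed(1)

def phi(x):
    """CDF of the Standard Normal Distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

MU = 3.5
SIGMA = math.sqrt(35 / 12)
n, reps = 200, 5_000

# Standardized sums of n die throws
zs = []
for _ in range(reps):
    s = sum(random.randint(1, 6) for _ in range(n))
    zs.append((s - n * MU) / (SIGMA * math.sqrt(n)))

# Empirical CDF vs. Phi at a few sample points; the pairs should agree closely
for x in (-1.0, 0.0, 1.0):
    empirical = sum(z <= x for z in zs) / reps
    print(x, round(empirical, 3), round(phi(x), 3))
```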
This is known as the Lindeberg/Lévy version of the CLT.
Law of Large Numbers
Again, let

$$X_1, X_2, X_3, \ldots$$

be independent and identically distributed random variables with expected value μ and finite variance σ². Then for every ϵ > 0

$$\lim_{n \to \infty} P\left( \left| \frac{\sum_{i=1}^{n} X_i - n\mu}{n} \right| < \epsilon \right) = 1$$

or written differently with

$$\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

it becomes

$$\lim_{n \to \infty} P\left( \left| \overline{X}_n - \mu \right| < \epsilon \right) = 1$$
This is known as Tschebyscheff’s version of the Weak Law of Large Numbers (as said, there are other versions, too). The first limit equation is more suitable for the comparison with the CLT; the latter more appropriately captures the intuition of approximating the expected value with the average.
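The second limit equation can be watched at work in a quick simulation. The epsilon and the sample counts below are arbitrary choices of mine:

```python
import random

random.seed(2)

MU, EPS = 3.5, 0.1

def hit_rate(n, reps=2_000):
    """Fraction of experiments whose average lands within EPS of MU."""
    hits = 0
    for _ in range(reps):
        avg = sum(random.randint(1, 6) for _ in range(n)) / n
        hits += abs(avg - MU) < EPS
    return hits / reps

# The probability of being epsilon-close to 3.5 grows towards 1 with n
rates = {n: hit_rate(n) for n in (10, 100, 1_000)}
print(rates)
```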
Similarities
As you can see by pattern matching the limit term of the CLT,

$$\lim_{n \to \infty} P\left( -\epsilon < \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} < \epsilon \right) = 1 - 2\Phi(-\epsilon),$$

and the first variant of the limit terms of the LLN from above,

$$\lim_{n \to \infty} P\left( -\epsilon < \frac{\sum_{i=1}^{n} X_i - n\mu}{n} < \epsilon \right) = 1,$$

they are very similar indeed.
Both make a statement about the probability of their expression taking a value within an arbitrary interval around 0, i.e. ]−ϵ, ϵ[. Further, both compute the same sum of the same random variables, both need to be centered by subtracting the growing expected value nμ of the sum, and both need to be shrunk by a growing factor to make the convergence technically work out.
Differences
The difference of the two limit terms compared is obviously in the order of the denominators (√n vs. n) and the resulting limits on the right-hand sides: 1−2Φ(−ϵ) vs. 1. So basically, both deal with the same process of producing aggregate numbers that become more and more closely normally distributed around the mean of zero as n gets larger. But since we shrink the values in the LLN more aggressively, their variance approaches zero rather than remaining constant and that condenses all the probability over the expected value asymptotically.
By applying the following basic rules for variances (with constants a and c, and independent random variables X and Y),

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X), \quad \mathrm{Var}(X + c) = \mathrm{Var}(X), \quad \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y),$$

we can check that this is so:

$$\mathrm{Var}\left( \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \right) = \frac{n\sigma^2}{\sigma^2 n} = 1 \quad \text{whereas} \quad \mathrm{Var}\left( \frac{\sum_{i=1}^{n} X_i - n\mu}{n} \right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \to 0$$
Lastly, from another simulation we get a visualization of the variance reduction as n gets larger in the averaging process described by the LLN:
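A sketch of such a simulation, comparing the empirical variance of the averages with the theoretical σ²/n (again with sample counts and names of my own choosing):

```python
import random
import statistics

random.seed(3)

VAR_DIE = 35 / 12  # variance sigma^2 of a single die throw

# For growing n, the variance of the averages should track sigma^2 / n
empirical = {}
for n in (10, 100, 1_000):
    averages = [
        sum(random.randint(1, 6) for _ in range(n)) / n
        for _ in range(2_000)
    ]
    empirical[n] = statistics.variance(averages)
    print(n, round(empirical[n], 5), round(VAR_DIE / n, 5))
```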
Bottom Line
When confining the two theorems to a special yet important case, namely that of independent identically distributed random variables with finite expectation and variance, and further restricting the comparison to the Weak LLN, we can spot similarities and differences that help deepen our understanding of the two in conjunction.