Why is the Gaussian the King of all distributions?

Significance of Gaussian distribution

Vidhi Chugh
Towards Data Science


The bean machine (Galton board) is often called the first generator of normal random variables

Gaussian Distribution and its key characteristics:

  • The Gaussian distribution is a continuous probability distribution that is symmetric about its center.
  • Its mean, median, and mode are equal.
  • Its shape is the familiar bell curve: most of the data points cluster around the mean, with tails that decay asymptotically.

Interpretation:

  • ~68% of values drawn from a normal distribution lie within 1𝜎 of the mean
  • ~95% of values drawn from a normal distribution lie within 2𝜎 of the mean
  • ~99.7% of values drawn from a normal distribution lie within 3𝜎 of the mean
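
These coverage figures can be checked numerically with the standard normal CDF. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import norm

# Probability mass of a normal distribution within k standard deviations of the mean
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {coverage:.4f}")
# Prints ~0.6827, ~0.9545, ~0.9973, i.e. the ~68%, ~95%, ~99.7% figures above
```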

Where do we find the Gaussian distribution?

ML practitioners or not, almost all of us have come across this most popular distribution at some point. Many of the processes we see around us follow an approximately Gaussian form, e.g. age, height, IQ, memory, etc.

On a lighter note, there is one well-known example of the Gaussian lurking around all of us: the ‘bell curve’ at appraisal time 😊

Yes, the Gaussian distribution is often identified with the bell curve, and its probability density function is given by the following formula:

f(x) = (1 / (𝜎√(2π))) · exp(−(x − 𝜇)² / (2𝜎²))

Notation:

A random variable X with mean 𝜇 and variance 𝜎² is denoted as:

X ~ N(𝜇, 𝜎²)

What is so special about the Gaussian distribution? Why do we find it almost everywhere?

Whenever we need to model real-valued random variables whose distribution is unknown, we typically assume a Gaussian form.

This behavior is largely owed to the Central Limit Theorem (CLT), which concerns the sum of many random variables.

As per the CLT, the normalized sum of a number of independent random variables, regardless of their original distributions, converges to a Gaussian distribution as the number of terms in the sum increases.

A commonly cited rule of thumb is that the CLT can be relied on from a sample size of about 30 observations, i.e. the sampling distribution can usually be assumed to be approximately Gaussian once we have at least 30 observations.

Therefore, any physical quantity that is the sum of many independent processes is assumed to follow a Gaussian. For example, in a typical machine learning framework there are multiple possible sources of error: data entry errors, measurement errors, classification errors, etc. The cumulative effect of all such errors is likely to follow a normal distribution.

Let’s check this using Python:

Steps:

  • Draw n samples from an exponential distribution
  • Normalize the sum of the n samples
  • Repeat the above steps N times
  • Store each normalized sum in sum_list
  • Finally, plot the histogram of sum_list
  • The output closely follows a Gaussian distribution, as shown below:
Normalized sum of 30 samples drawn from an exponential distribution follows a Gaussian distribution
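
A minimal sketch of these steps, assuming NumPy and Matplotlib are available (the exponential rate parameter is an arbitrary illustrative choice):

```python
import numpy as np
import matplotlib.pyplot as plt

n = 30         # samples per draw
N = 10_000     # number of repetitions
rate = 1.0     # rate of the exponential distribution (mean = 1/rate)

sum_list = []
for _ in range(N):
    samples = np.random.exponential(scale=1.0 / rate, size=n)
    # Normalize: subtract the expected sum (n/rate) and divide by its std (sqrt(n)/rate)
    normalized_sum = (samples.sum() - n / rate) / (np.sqrt(n) / rate)
    sum_list.append(normalized_sum)

plt.hist(sum_list, bins=50, density=True)
plt.title("Normalized sum of 30 exponential samples")
plt.show()
```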

Similarly, several other distributions, such as the Student’s t distribution, the chi-squared distribution, and the F distribution, depend strongly on the Gaussian. For example, the t-distribution arises as an infinite mixture of Gaussians, which gives it heavier tails than a single Gaussian.

Properties of Gaussian Distribution:

1) Affine transformation:

It is the simple transformation of multiplying the random variable by a scalar ‘a’ and adding another scalar ‘b’ to it.

The resulting distribution is again Gaussian, with its mean and variance transformed accordingly:

If X ~ N(𝜇, 𝜎²), then for any a, b ∈ ℝ,

aX + b ~ N(a𝜇 + b, a²𝜎²)

Note that not all transformations result in a Gaussian; for example, the square of a Gaussian random variable does not follow a Gaussian distribution.
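
A quick simulation of the affine property, assuming NumPy (all parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 3.0, 0.5, 1.0   # illustrative values

x = rng.normal(mu, sigma, size=100_000)  # X ~ N(2, 9)
y = a * x + b

# Empirical mean and variance should be close to a*mu + b = 2.0 and a^2 * sigma^2 = 2.25
print(y.mean(), y.var())
```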

2) Standardization:

If we have two sets of observations, each drawn from a normal distribution with a different mean and sigma, how do we compare observations across the two and calculate probabilities with respect to their populations?

To do this, we convert the observations into Z-scores. This process, called standardization, adjusts a raw observation by the mean and sigma of the population it was generated from and brings it onto a common scale:

Z = (X − 𝜇) / 𝜎
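
A tiny worked example of this idea, with made-up numbers:

```python
# Two observations from different normal populations, compared via Z-scores (illustrative numbers)
x1, mu1, sigma1 = 75, 60, 10     # observation from population 1
x2, mu2, sigma2 = 620, 500, 80   # observation from population 2

z1 = (x1 - mu1) / sigma1   # 1.5
z2 = (x2 - mu2) / sigma2   # 1.5
# Equal Z-scores: both observations lie the same number of sigmas above their respective means
print(z1, z2)
```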

3) Conditional distribution: An important property of the multivariate Gaussian is that if two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian

4) The marginal distribution of either set is also Gaussian
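
A small sketch of both facts for a 2-dimensional Gaussian, using the standard conditioning formulas (the parameter values are assumed for illustration):

```python
import numpy as np

# Jointly Gaussian (X1, X2) with illustrative (assumed) parameters
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

x2 = 2.0  # observed value of X2

# Conditional distribution of X1 given X2 = x2 is Gaussian:
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])   # 0.8
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]        # 1.36

# Marginal distribution of X1 is simply N(mu[0], Sigma[0, 0])
print(cond_mean, cond_var)
```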

5) Gaussian distributions are self-conjugate, i.e. given a Gaussian likelihood function, choosing a Gaussian prior results in a Gaussian posterior.
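
A minimal sketch of this conjugacy for the simplest case, inferring a Gaussian mean with known variance (all numbers are assumptions for illustration):

```python
import numpy as np

# Conjugate update for the mean of a Gaussian with KNOWN variance (illustrative values)
sigma2 = 4.0              # known likelihood variance
mu0, tau0_sq = 0.0, 1.0   # Gaussian prior on the mean: N(mu0, tau0_sq)

x = np.array([2.1, 1.8, 2.4, 2.0])   # observed data (made up)
n = len(x)

# The posterior is again Gaussian: precisions add, and the means combine precision-weighted
post_var = 1.0 / (1.0 / tau0_sq + n / sigma2)
post_mean = post_var * (mu0 / tau0_sq + x.sum() / sigma2)
print(post_mean, post_var)
```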

6) The sum and the difference of two independent Gaussian random variables are each Gaussian
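
A quick simulation of this property, assuming NumPy (parameters are arbitrary; note that the variances add in both cases):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(1.0, 2.0, size=200_000)   # X ~ N(1, 4)
y = rng.normal(3.0, 1.0, size=200_000)   # Y ~ N(3, 1), independent of X

# X + Y ~ N(1 + 3, 4 + 1) and X - Y ~ N(1 - 3, 4 + 1)
print((x + y).mean(), (x + y).var())   # ~4.0, ~5.0
print((x - y).mean(), (x - y).var())   # ~-2.0, ~5.0
```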

Limitations of Gaussian Distributions:

  1. A single Gaussian distribution fails to capture structures like the one below:
Mixture of Gaussians from Pattern Recognition and Machine Learning by Christopher Bishop

Such structure is better characterized by a linear combination of two Gaussians (also known as a mixture of Gaussians). However, estimating the parameters of such a mixture is more complex.
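
As a sketch, scikit-learn's GaussianMixture (if available) can estimate such a two-component mixture via expectation-maximization; the data below are synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Bimodal data generated from two Gaussians, which no single Gaussian can fit well
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(3.0, 1.0, 500)]).reshape(-1, 1)

# Fit a two-component mixture of Gaussians with the EM algorithm
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())
```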

2) The Gaussian distribution is unimodal, i.e. it fails to provide a good approximation to multimodal distributions, which restricts the range of distributions it can represent adequately.

3) The number of free parameters grows quadratically with the number of dimensions: a D-dimensional Gaussian has D(D + 1)/2 covariance parameters, and inverting such a large covariance matrix is computationally expensive.

Hope the post gives you a sneak peek into the world of Gaussian distributions.

Happy Reading!!!
