
Fooled by Standard Deviation

Intuitive Value and an Alternative Measure of Uncertainty

Photo by Robert Ruggiero on Unsplash

Last time, I looked at the statistical average with respect to the UK income distribution to illustrate the potential weaknesses of the mean. For a recap, you can see it here. This time, we'll be taking a look at standard deviation and an alternative that can be used instead, depending on the situation: the mean absolute deviation.

A Common Misconception

When asked what standard deviation describes, many would say that it gives the average distance of samples from the mean. Well, let's consider a simple example. Suppose we take 5 people and record their annual incomes: {$15k, $27k, $34k, $50k, $90k}. Now, intuitively, how do we measure the average deviation of income from the mean? First, we need to calculate the sample mean:

Image by Author
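Written out, the arithmetic for the five incomes (in $k) is:

```latex
\hat{x} = \frac{15 + 27 + 34 + 50 + 90}{5} = \frac{216}{5} = 43.2
```

So the sample mean income is $43.2k.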

Next, we average the distance of each sample point from that sample mean:

Image by Author: Note terms in brackets are arranged to always be positive!
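With each bracket ordered so that the larger value comes first, the calculation (in $k) comes out as:

```latex
\frac{(43.2 - 15) + (43.2 - 27) + (43.2 - 34) + (50 - 43.2) + (90 - 43.2)}{5}
  = \frac{28.2 + 16.2 + 9.2 + 6.8 + 46.8}{5}
  = \frac{107.2}{5} = 21.44
```

That is, on average an income sits about $21.44k away from the mean.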

Looks good! In fact, what we have just calculated is the mean absolute deviation, which directly matches the definition above: the average distance of samples from the mean. The sample statistic formula is given below:

Image by Author: Sample mean absolute deviation with N samples x_i and sample mean x̂
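In symbols, the standard definition reads as follows, using N samples x_i with sample mean x̂:

```latex
\mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \hat{x} \right|
```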

Now let’s take a look at the sample standard deviation formula:

Image by Author: Sample standard deviation with N samples x_i and sample mean x̂
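For reference, the usual sample standard deviation divides by N - 1 (Bessel's correction), while the biased version divides by N. Both are written below in the same notation. The labels s and σ̂ are introduced here only to tell the two apart, and which of the two the original figure shows is an assumption:

```latex
s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \hat{x} \right)^2}
\qquad
\hat{\sigma} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x} \right)^2}
```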

Right off the bat, we can see this looks nothing like the mean absolute deviation calculation. Note also that, for simplicity, we are using the biased estimator, dividing by N rather than N-1:

Image by Author
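A small Python sketch reproduces both numbers for the income example. The helper names (`mean_absolute_deviation`, `biased_std`) are illustrative, not from the article. `biased_std` divides by N to match the biased estimator above. The standard library's `statistics.pstdev` computes the same divide-by-N standard deviation, so it serves as a cross-check.

```python
import math
import statistics

# Annual incomes from the example, in thousands of dollars
incomes = [15, 27, 34, 50, 90]

def mean_absolute_deviation(xs):
    """Average absolute distance of each value from the sample mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def biased_std(xs):
    """Standard deviation dividing by N (not N - 1), as in the text."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

mad = mean_absolute_deviation(incomes)
sd = biased_std(incomes)

# Cross-check against the standard library's population standard deviation
assert math.isclose(sd, statistics.pstdev(incomes))

print(f"MAD = {mad:.2f}")            # MAD = 21.44
print(f"SD  = {sd:.2f}")             # SD  = 26.00
print(f"SD / MAD = {sd / mad:.2f}")  # SD / MAD = 1.21
```

The squared deviations sum to 3378.8, so the standard deviation is √675.76 ≈ $26.0k, against a mean absolute deviation of $21.44k.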

The result is a very different number: we're over 20% higher! So what happened? The difference comes from how the two calculations make deviations positive. For the standard deviation, we square each difference to make it positive, average the squares, and then take the square root, instead of simply taking the absolute value of each difference. However, squaring gives outliers disproportionate weight, since large deviations become much larger once squared. As a result, the standard deviation of a fat-tailed distribution can be far from the average deviation from the mean. For thin-tailed distributions the gap is more modest, so the standard deviation is a better approximation of the average distance of samples from the mean, but it still doesn't match exactly: for a Gaussian, for example, the standard deviation is about 1.25 times the mean absolute deviation.
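To see how the gap depends on the tails, here is a quick simulation sketch that uses only Python's standard library. The three distributions, the seed and the sample size are illustrative choices, not taken from the article. For these distributions the population ratio of standard deviation to mean absolute deviation is known: 2/√3 ≈ 1.155 for a uniform, √(π/2) ≈ 1.253 for a Gaussian and √2 ≈ 1.414 for a Laplace distribution. The ratio therefore grows as the tails get heavier. It can never drop below 1, because the root mean square of the deviations is always at least their mean.

```python
import math
import random

def sd_to_mad_ratio(xs):
    """Biased (divide-by-N) standard deviation divided by the mean absolute deviation."""
    n = len(xs)
    m = sum(xs) / n
    mad = sum(abs(x - m) for x in xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sd / mad

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000

samples = {
    "uniform (thin tails)": [rng.uniform(-1.0, 1.0) for _ in range(n)],
    "Gaussian": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # The difference of two independent Exp(1) draws is Laplace(0, 1)
    "Laplace (fatter tails)": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
}

ratios = {name: sd_to_mad_ratio(xs) for name, xs in samples.items()}
for name, r in ratios.items():
    print(f"{name:24s} SD/MAD = {r:.3f}")
```

With 100,000 draws each, the simulated ratios should land close to the theoretical values above.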

Redefining Standard Deviation

So far, we have seen that if you want a metric that tells you the average distance of samples from the mean, you should use the mean absolute deviation. So, what does the standard deviation tell us? Well, it gives an intuitive value for the spread of the distribution as a whole. You might be thinking this sounds the same as the average distance from the mean, but there's a subtle difference: the standard deviation also reflects how the samples are arranged within a set and how disparate they are from one another.

Let's consider another example. Suppose there are two sets of numbers, {1, 1, 7} and {0, 2, 7}. Both of these sets have a mean of 3 and a mean absolute deviation of 2.67. However, again dividing by N, the first set has a standard deviation of 2.83 whilst the second set has a standard deviation of 2.94. Intuitively, the second set has an overall wider spread of values. This is captured by the standard deviation, but lost with the mean absolute deviation.
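A standard identity helps explain why. The biased (divide-by-N) variance equals half the average squared difference over all ordered pairs of samples, Var = 1/(2N²) · Σ_i Σ_j (x_i - x_j)². The standard deviation therefore depends directly on how far apart the samples are from each other, not only on how far each one sits from the mean. The Python sketch below checks the two example sets both ways; the helper names are illustrative.

```python
import math

def mean_absolute_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def biased_std(xs):
    # Divide by N, matching the numbers quoted in the text
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pairwise_std(xs):
    # Same quantity as biased_std, computed from all ordered pairs:
    # variance = (1 / (2 N^2)) * sum_i sum_j (x_i - x_j)^2
    n = len(xs)
    total = sum((a - b) ** 2 for a in xs for b in xs)
    return math.sqrt(total / (2 * n * n))

for xs in ([1, 1, 7], [0, 2, 7]):
    print(xs, f"MAD = {mean_absolute_deviation(xs):.2f}",
          f"SD = {biased_std(xs):.2f}", f"pairwise SD = {pairwise_std(xs):.2f}")
# [1, 1, 7] MAD = 2.67 SD = 2.83 pairwise SD = 2.83
# [0, 2, 7] MAD = 2.67 SD = 2.94 pairwise SD = 2.94
```

The pairwise squared differences sum to 144 for {1, 1, 7} but 156 for {0, 2, 7}. That extra spread between individual points is exactly what the mean absolute deviation ignores.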

The squaring operation also gives the standard deviation some massively useful mathematical properties, which form the basis of much of statistics and probability theory. A great example is how we can simplify the variance of a random variable:

Image by Author: Simplification of the variance, note that standard deviation is simply the square root of variance!
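A common form of this simplification expands the square and applies linearity of expectation, writing 𝔼 for expectation. The figure may present it differently:

```latex
\begin{aligned}
\operatorname{Var}(X) &= \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] \\
&= \mathbb{E}\big[X^2 - 2X\,\mathbb{E}[X] + \mathbb{E}[X]^2\big] \\
&= \mathbb{E}[X^2] - 2\,\mathbb{E}[X]^2 + \mathbb{E}[X]^2 \\
&= \mathbb{E}[X^2] - \mathbb{E}[X]^2
\end{aligned}
```

Applied to the income example with the biased estimator, the mean of the squared incomes is 12710/5 = 2542 and x̂² = 43.2² = 1866.24. That gives a variance of 675.76 and a standard deviation of √675.76 ≈ 26.0, the same value as the direct calculation. The absolute value |X - 𝔼[X]| cannot be expanded in this way, which is part of why the mean absolute deviation is harder to work with analytically.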

Finally, it has been shown that, for perfectly Gaussian data, the standard deviation is actually a more efficient estimator of spread than the mean absolute deviation: relative to its size, its value fluctuates less from sample to sample. However, as soon as outliers and fat tails begin to creep in, this advantage rapidly breaks down, and it could be argued that in realistic scenarios the mean absolute deviation is actually the more efficient of the two.
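A rough way to check this claim is to simulate it. The sketch below uses only the standard library. It draws many samples, computes both statistics on each, and compares their coefficient of variation, i.e. the estimator's standard deviation divided by its mean. Because the two statistics estimate different quantities, this relative spread is a fairer yardstick than raw variance, and a lower value means a more efficient estimator. The contaminated model, in which each point has a 5% chance of coming from a Gaussian three times wider, is an illustrative assumption, not something taken from the article. The sample size, repetition count and seed are arbitrary as well.

```python
import math
import random
import statistics

def biased_std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def mean_absolute_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def coefficient_of_variation(values):
    # Relative sampling spread of an estimator: lower means more efficient
    return statistics.pstdev(values) / statistics.fmean(values)

def contaminated_normal(rng, eps=0.05, wide_scale=3.0):
    # Hypothetical outlier model: with probability eps, draw from a wider Gaussian
    scale = wide_scale if rng.random() < eps else 1.0
    return rng.gauss(0.0, scale)

rng = random.Random(0)
n, reps = 100, 4000
results = {}
for name, draw in [
    ("Gaussian", lambda: rng.gauss(0.0, 1.0)),
    ("5% contaminated", lambda: contaminated_normal(rng)),
]:
    sds, mads = [], []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        sds.append(biased_std(xs))
        mads.append(mean_absolute_deviation(xs))
    results[name] = (coefficient_of_variation(sds), coefficient_of_variation(mads))
    print(f"{name:16s} CV(SD) = {results[name][0]:.4f}  CV(MAD) = {results[name][1]:.4f}")
```

On pure Gaussian data the standard deviation should show the smaller relative spread. Once the contamination is switched on, the ordering should flip in favour of the mean absolute deviation.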


Once again, it’s important to understand how metrics and statistics are derived before using them to draw conclusions. Be sure to understand the distribution of your data and precisely what information you are trying to convey before quoting a deviation statistic.

I hope this article has been an interesting read, and do let me know your thoughts!

