
Need for measures of Spread – Range, Variance, Standard Deviation

When the central tendency isn't enough to distinguish the datasets!

Photo by Fidel Fernando on Unsplash

If someone gave you only the 1-number summary (central tendency) of the five datasets shown below, you might assume they are all the same, since their means are equal. But when you plot every data point of each set and compare them visually, you will realize that there should also be a measure that detects this distinguishing pattern.

(Image by author)

Say "hi" to the measures of spread – Range, Variance and Standard Deviation


Let's use the same data as in the previous blog:

Mean, Median & Mode – Which central tendency measure to use & when?

Range

The quickest measure of the spread of a data set is the range: take the maximum and minimum values of the data set and subtract the minimum from the maximum.

(Image by author)

(*The data are treated as population data throughout this blog.)
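As a quick illustration, here is a minimal Python sketch of the range calculation. It assumes the heights of the parent data set listed in the Variance section (150 to 180 cm); the helper name `data_range` is hypothetical.

```python
# Parent data set (heights in cm), treated as population data as in this blog.
heights_cm = [150, 155, 160, 160, 170, 175, 180]


def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


print(data_range(heights_cm))  # 30 (cm): 180 - 150
```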

Variance

For this measure, let's recall an idea from the previous blog: when plotted on the number line, these numbers are nothing but distances from the origin.

(Image by author)

Let's take one more data set with 7 data points that all have the same value, i.e. the mean of our parent data set (rounded to 164.3 cm).

(Image by author)

Parent Data Set = 150, 155, 160, 160, 170, 175, 180 (all in cm)

New Data Set = 164.3, 164.3, 164.3, 164.3, 164.3, 164.3, 164.3 (all in cm)

Now we will calculate the average of squared distances from the origin for both data sets:

Difference = 27092.86 − 26994.49 ≈ 98.37 cm²
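The two averages above can be reproduced with a short Python sketch. The helper name `mean_sq_from_origin` is hypothetical; the new data set uses the rounded mean of 164.3 cm, as in the text.

```python
# Parent data set and the constant "new" data set from the text (cm).
parent = [150, 155, 160, 160, 170, 175, 180]
new = [164.3] * 7  # every point equals the rounded parent mean


def mean_sq_from_origin(values):
    """Average of squared distances from the origin (0) on the number line."""
    return sum(x ** 2 for x in values) / len(values)


a = mean_sq_from_origin(parent)  # roughly 27092.86 cm²
b = mean_sq_from_origin(new)     # roughly 26994.49 cm²
print(round(a, 2), round(b, 2), round(a - b, 2))
```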

There is an evident difference between the two data sets on this measure. But what if we change the reference point from the origin to the mean? Let's find out:
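A minimal Python sketch of the same idea, now measuring squared distances from each data set's own mean. It divides by N because the data are treated as a population; `population_variance` is a hypothetical helper name, not a library function.

```python
# Same two data sets as before (cm).
parent = [150, 155, 160, 160, 170, 175, 180]
new = [164.3] * 7


def population_variance(values):
    """Average of squared distances from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n  # the reference point is now the mean, not the origin
    return sum((x - mu) ** 2 for x in values) / n


print(round(population_variance(parent), 2))  # about 103.06 cm²
print(population_variance(new))  # essentially 0: every point sits on the mean
```

This sketch uses the exact parent mean (1150/7 ≈ 164.29 cm). With that exact mean, the difference between the two origin-based averages computed earlier would also come out to about 103.06 cm², because the population variance equals the mean of the squares minus the square of the mean; the smaller difference above comes from rounding the mean to 164.3 cm.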

You will probably agree now that this is better, both in terms of the reference point and the output value. Visualizing the data again makes this even clearer:

(Image by author)

Standard Deviation

You might have noticed that variance is expressed in cm². If we want a measure of spread that still uses the mean as its reference but is in the original unit (cm), all we need to do is take the square root of the variance. That is the standard deviation.
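Continuing in Python, the standard deviation is just the square root of the population variance. The helper names below are hypothetical; the standard library's `statistics.pstdev` (population standard deviation) is used only as a cross-check.

```python
import math
import statistics

parent = [150, 155, 160, 160, 170, 175, 180]  # cm


def population_variance(values):
    """Average of squared distances from the mean (divide by N), in cm²."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Square root of the population variance, back in the original unit (cm)."""
    return math.sqrt(population_variance(values))


sd = population_std(parent)
print(round(sd, 2))  # about 10.15 cm
print(math.isclose(sd, statistics.pstdev(parent)))  # True
```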

  • Note – All the calculations above treat the data set as a population. The much-debated question of the denominator of such measures, i.e. why it is N-1 for a sample but N for a population, will be explained in upcoming blogs.
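For readers who want to see the two denominators side by side ahead of that discussion, Python's standard `statistics` module offers both: `pvariance` divides by N (population) and `variance` divides by N-1 (sample). This minimal sketch on the parent data set only shows the arithmetic, not the reasoning behind N-1.

```python
import statistics

parent = [150, 155, 160, 160, 170, 175, 180]  # cm
n = len(parent)

pop_var = statistics.pvariance(parent)    # sum of squared deviations / N
sample_var = statistics.variance(parent)  # sum of squared deviations / (N - 1)

print(round(pop_var, 2), round(sample_var, 2))  # about 103.06 and 120.24 cm²
# Same sum of squared deviations, different denominators:
print(abs(pop_var * n - sample_var * (n - 1)) < 1e-9)  # True
```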

With a 2-number summary (central tendency and spread), we can better distinguish among datasets. Remember that every statistical measure serves a purpose: each one exists to capture details that the measures already at hand miss. In upcoming blogs, you will see the need to capture further details with more statistical measures. With this, I end this chewable bite; I look forward to sharing more blogs in the future.

Thanks!!!

