
Mean, Median & Mode – Which central tendency measure to use & when?


To represent a dataset as a one-number summary, we use a measure of central tendency. There are three such measures: the mean, the median and the mode. Why do we need all three when one (the mean) could have done the job? That is what this blog is about: by the end of it, you will be able to answer the notorious question of which one to choose and when. Since each has its own pros and cons, these will be elaborated to establish conceptual clarity.

Let’s begin with the visual representations to better interpret the concepts:

*Dataset used: heights of seven bodybuilders (assumed to be a discrete series)

(Image by author)

Now we will calculate the central tendency of this data using Mean, Median, and Mode.

After the calculations, we will see how each of these measures behaves when a new data point is added. This will help us understand the importance of each measure and the conditions each one is suited to.

Let’s start with the calculation of the Mean of this data:

150, 160, 160, 170, 155, 180, 175 – what do these numbers reflect?

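If we place them on a number line, each point is simply its distance from a reference point (here, 0). The mean is the sum of these distances divided by the number of points: (150 + 160 + 160 + 170 + 155 + 180 + 175) / 7 = 1150 / 7 ≈ 164.3 cm.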
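As a minimal sketch of this calculation in Python: each height is treated as its distance from 0, the distances are summed, and the total is divided by the count. The function name `mean` is my own illustrative choice, not taken from any library.

```python
# Heights (cm) of the seven bodybuilders A to G from the example
heights = [150, 160, 160, 170, 155, 180, 175]

def mean(values):
    """Arithmetic mean: total distance from the reference point 0, divided by the count."""
    return sum(values) / len(values)

print(round(mean(heights), 1))  # 164.3
```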

(Image by author)

Next, let's calculate the median of this data:

Steps:

  1. Arrange the data points in ascending order.
  2. Find the split point that has half of the data points on the upper side and the other half on the lower side; this split point is the median. Think of it as partitioning the data points into two halves using a separator.
  3. If the number of data points is odd, one central value lies on the separator, and that value is the median. Otherwise, the median is the average of the two points lying on either side of the separator.
(Image by author)

For the given dataset, N is odd (7 data points), and as the visual above shows, three observations lie below and three lie above the separator data point C (it could equally be B, since both have the same value). So the median of this data is 160 cm.
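The three steps for the median translate almost line for line into code. This is a minimal sketch; the function name `median` is my own, and in practice Python's standard-library `statistics.median` does the same job.

```python
heights = [150, 160, 160, 170, 155, 180, 175]

def median(values):
    """Median via the steps above: sort, then split the data into two halves."""
    ordered = sorted(values)        # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2                    # position of the separator
    if n % 2 == 1:                  # odd count: one value sits on the separator
        return ordered[mid]
    # even count: average the two values on either side of the separator
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(heights))  # 160
```

Sorting inside the function means the caller does not have to remember to order the data first.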

Finally, let's calculate the mode of this data:

This is the easiest one to calculate: determine how often each data point occurs, and the value with the highest frequency is the mode. The mode can also be used when the data is non-numerical (categorical).

(Image by author)

Since two bodybuilders are 160 cm tall and every other height occurs only once, the mode of this dataset is 160 cm.
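The frequency count for the mode can be sketched with `collections.Counter` from Python's standard library. The helper name `modes` and the colour labels below are hypothetical, used only to show that the same logic works on non-numerical data.

```python
from collections import Counter

heights = [150, 160, 160, 170, 155, 180, 175]

def modes(values):
    """Return every value that occurs with the highest frequency (ties are possible)."""
    counts = Counter(values)                 # value -> number of occurrences
    top = max(counts.values())               # highest frequency observed
    return [value for value, count in counts.items() if count == top]

print(modes(heights))                          # [160]

# The mode also works on categorical data (hypothetical labels)
print(modes(["red", "blue", "red", "green"]))  # ['red']
```

The sketch returns a list rather than a single value because a dataset can have more than one mode.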

(Image by author)

Now to the most important question: why do we need three measures of central tendency instead of just one?

The goal of a one-number summary (central tendency) is for that single value to reflect the whole dataset without bias. However, the exercise below shows that the mean alone can be pulled away from the bulk of the data, so that it misrepresents the data.

Continuing with the same bodybuilder data, I will add one external data point and note the changes in the central tendency measures:

I have added one bodybuilder (H) to the dataset (Height = 200 cm)

(Image by author)

Mean Before = 164.3 cm (7 observations)

Mean After = 168.75 cm (8 observations)


Median Before = 160 cm (7 observations)

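Median After = 165 cm (8 observations)

*Don't forget to arrange the data in ascending order before calculating the median: 150, 155, 160, 160, 170, 175, 180, 200. With eight observations, the median is the average of the two middle values, (160 + 170) / 2 = 165 cm.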


Mode Before = 160 cm (7 observations)

Mode After = 160 cm (8 observations)
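To double-check the before/after figures, here is a short sketch using Python's standard-library `statistics` module (my choice of tool; the article does not prescribe one). `statistics.mean` returns the unrounded mean, so it is formatted to two decimals here.

```python
import statistics

before = [150, 160, 160, 170, 155, 180, 175]
after = before + [200]  # bodybuilder H, 200 cm

for label, data in (("Before", before), ("After", after)):
    print(
        f"{label}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data)}, "
        f"mode={statistics.mode(data)}"
    )
# Before: mean=164.29, median=160, mode=160
# After: mean=168.75, median=165.0, mode=160
```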

Things to notice:

The mean is sensitive to the size of an extreme addition, whereas the median and the mode are not. The median moved only because the number of observations became even, so it is now the average of the two middle values; it would be the same whether H were 200 cm or 2,000 cm, while the mean would keep rising with H. The mode did not change at all. By now you may sense why these measures exist: each serves as a feasible alternative when another's inherent bias makes it a poor summary, as the example above demonstrates for the mean.
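A quick way to see this sensitivity is to vary the height of the added bodybuilder H and watch both measures. The values 500 cm and 2,000 cm below are deliberately unrealistic, hypothetical inputs chosen only to exaggerate the effect of an outlier.

```python
import statistics

base = [150, 160, 160, 170, 155, 180, 175]

# Hypothetical heights for the added bodybuilder H, from plausible to absurd
for h in (200, 500, 2000):
    data = base + [h]
    print(f"H={h}: mean={statistics.mean(data):.2f}, median={statistics.median(data)}")
# H=200: mean=168.75, median=165.0
# H=500: mean=206.25, median=165.0
# H=2000: mean=393.75, median=165.0
```

The median depends only on the ordering of the two middle values, so any H of 170 cm or more leaves it at 165 cm, while the mean grows with every extra centimetre of H.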

Let's summarize the pros and cons of all three:

(Image by author)

I hope you now have a clear understanding of which central tendency measure to use and when. That's all for this piece; watch this space for more upcoming blogs.

Thanks!!!