To summarise a dataset with a single number, we use a measure of central tendency. There are three such measures: Mean, Median, and Mode. Why do we need three when one (the Mean) might seem to do the job? That is what this blog is all about. By the end, you will be able to answer the notorious question: which one to choose, and when? Each of them has its own pros and cons, and we will work through these to build conceptual clarity.
Let’s begin with the visual representations to better interpret the concepts:
*Dataset used – Heights of seven bodybuilders (assumed discrete series)

Now we will calculate the central tendency of this data using the Mean, Median, and Mode.
After the calculations, we will see how each measure behaves when a new data point is added. This will help us understand the importance of each measure and which conditions it suits.
Let’s start with the calculation of the Mean of this data:
150, 160, 160, 170, 155, 180, 175 – What do these numbers reflect?
If we put them on a number line, each point is simply a distance from a reference point (in this case, 0). The Mean is the sum of these distances divided by the number of points: 1150 / 7 ≈ 164.3 cm.
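As a quick check of the arithmetic, here is a minimal Python sketch of the Mean calculation. The variable names are my own and the heights are taken in the order listed above.

```python
# Heights in cm of the seven bodybuilders, in the order listed above.
heights = [150, 160, 160, 170, 155, 180, 175]

# The Mean is the sum of all distances from 0 divided by the number of points.
total = sum(heights)
mean_height = total / len(heights)

print(total)                  # 1150
print(round(mean_height, 1))  # 164.3
```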


Next, let's calculate the Median of this data:
Steps:
- Arrange the data points in ascending order
- Place a separator so that half the data points lie above it and the other half lie below it. The value at this split is the median
- If the number of data points is odd, one central value lies on the separator and that value is the median. If it is even, the median is the average of the two points on either side of the separator


For the given dataset, N is odd (7 data points) and, as evident from the visual above, there are three observations on each side of the separator data point C (it could equally be B, since both have the same value). So the median of this data is 160 cm.
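The steps above can be sketched in a few lines of Python. This is an illustrative helper of my own (the function name `median` is not from any library), covering both the odd and the even case.

```python
def median(values):
    """Return the median following the sort-and-split steps."""
    ordered = sorted(values)   # step 1: ascending order
    n = len(ordered)
    mid = n // 2               # position of the separator
    if n % 2 == 1:
        # Odd count: one value sits on the separator.
        return ordered[mid]
    # Even count: average the two values on either side of the separator.
    return (ordered[mid - 1] + ordered[mid]) / 2

heights = [150, 160, 160, 170, 155, 180, 175]
print(median(heights))  # 160
```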
Finally, let's calculate the Mode of this data:
This is the easiest one to calculate. Count how often each value occurs in the data; the value with the highest frequency is the mode. This measure can also be used when the data is non-numerical.

Two bodybuilders are 160 cm tall and every other height occurs only once, so the mode of this dataset is 160 cm.
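Here is a small sketch of the frequency count using Python's `collections.Counter`. The `shirt_sizes` list is made-up data, included only to show that the same approach works for non-numerical values.

```python
from collections import Counter

heights = [150, 160, 160, 170, 155, 180, 175]

# Count how often each height occurs and pick the most frequent one.
counts = Counter(heights)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 160 2

# Hypothetical non-numerical data: the mode is still well defined.
shirt_sizes = ["M", "L", "M", "S", "M", "L"]
size_mode = Counter(shirt_sizes).most_common(1)[0][0]
print(size_mode)  # M
```

Note that if two values tie for the highest frequency, `most_common(1)` returns only one of them, so a dataset with several modes needs a closer look at `counts`.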

Now for the most important question: why do we need three measures of central tendency instead of just one?
A 1-number summary (central tendency) is meant to give an unbiased reflection of the whole dataset. However, the exercise below shows that the Mean alone sometimes fails to stay unbiased and gives a misleading picture of the data.
Continuing with the same bodybuilder data, I will add one new data point and note the changes in the central tendency measures:
I have added one bodybuilder (H) to the dataset (Height = 200 cm).

Mean Before = 164.3 cm (7 observations)
Mean After = 168.75 cm (8 observations)
Median Before = 160 cm (7 observations)
Median After = 165 cm (8 observations, the average of 160 and 170)
*Don’t forget to arrange the data in ascending order before median calculation
Mode Before = 160 cm (7 observations)
Mode After = 160 cm (8 observations)
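To recompute these before and after figures yourself, here is a short sketch using Python's standard `statistics` module. The variable names are my own; `statistics.median` sorts the data internally, so the ordering step is taken care of.

```python
import statistics

before = [150, 160, 160, 170, 155, 180, 175]
after = before + [200]  # bodybuilder H added

measures = {
    "Mean": statistics.mean,
    "Median": statistics.median,
    "Mode": statistics.mode,
}

# Store (before, after) for each measure and print them side by side.
summary = {}
for name, func in measures.items():
    summary[name] = (func(before), func(after))
    print(f"{name}: before = {summary[name][0]:.2f} cm, after = {summary[name][1]:.2f} cm")
```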
Things to notice:
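The Mean is pulled directly by the size of the new value: every extra centimetre on H raises it, so a single unusually large observation can drag it away from where most of the data sits. The Median depends only on the order of the values, not on how far out the largest one lies, so it would come out the same whether H were 200 cm or 300 cm, while the Mean would jump to 181.25 cm in the latter case. The Mode does not change at all. This hints at why all three measures exist: each serves as an alternative when another is biased by the data, and we have just demonstrated one such case.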
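The sketch below makes this sensitivity concrete by trying a few heights for the added bodybuilder H. Only the 200 cm value comes from the example above; 250 and 300 are hypothetical values chosen for illustration.

```python
import statistics

base = [150, 160, 160, 170, 155, 180, 175]

# Try increasingly extreme heights for H (250 and 300 are hypothetical).
results = {}
for h in (200, 250, 300):
    data = base + [h]
    results[h] = (
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
    )
    print(h, results[h])
```

As H grows, only the first number in each printed tuple (the Mean) keeps rising; the Median and Mode stay where they are.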
Let’s conclude the pros and cons of all three of them:

I hope you now have a clear understanding of which central tendency measure to use and when. I will end this piece here. Watch this space for more upcoming blogs.
Thanks!!!