On Average, You’re Using the Wrong Average: Geometric & Harmonic Means in Data Analysis

When the Mean Doesn’t Mean What You Think it Means

Daniel McNichol
Towards Data Science


PREFACE

(skip this if you already grok “central tendency”)

Comparison of the arithmetic, geometric and harmonic means of a pair of numbers (via Wikipedia)

It’s probably the most common data analytic task:

You have a bunch of numbers. You want to summarize them with fewer numbers, preferably a single number. So you add up all the numbers then divide the sum by the total number of numbers. Boom: behold the “average”, right?

Maybe.

Contrary to popular belief, average isn’t actually a thing, mathematically speaking. Meaning: there is no mathematical operation properly called the “average”. What we usually mean by average is “arithmetic mean”, the well-known operation described above. We call this “average” because we expect it to conform to the colloquial definition of “average”: a typical, ‘normal’ or middle value. Often we’re correct, but less often than we think.

Summary Statistics

The arithmetic mean is just one among many ways of arriving at an “average” value. More technically, these are known as “summary statistics”, “measures of central tendency” or “measures of location”.

Probably the 2nd most famous summary statistic is the median, the literal middle value of a dataset (which, as such, is often more “average” than the mean). I won’t discuss this here, but suffice to say that the arithmetic mean is overused in many cases when the median is more appropriate. Further reading here, here & here (last one overlaps a bit with the rest of this article, and is very good).

This article will focus on two lesser known measures: the geometric & harmonic means.

Part I develops a conceptual, intuitive & practical understanding of how they work & when to use them.

Part II is a separate post & gets a bit deeper & more technical, demonstrating their respective dynamics with R code, real & simulated data & plots.

I. Pythagorean Means

A geometric construction of the Quadratic and Pythagorean means (of two numbers a and b). via Wikipedia

The arithmetic mean is just 1 of 3 ‘Pythagorean Means’ (named after Pythagoras & his ilk, who studied their proportions). As foretold, the geometric & harmonic means round out the trio.

To understand the basics of how they function, let’s work forward from the familiar arithmetic mean.

Arithmetic Mean

The arithmetic mean is appropriately named: we find it by adding all of the numbers in the dataset, then dividing by however many numbers are in the dataset (in order to bring the sum back down to the scale of the original numbers).

3 + 8 + 10 = 21
21 ÷ 3 = 7
Arithmetic mean = 7

Notice, what we are essentially saying here is: if every number in our dataset was the same number, what number would it have to be in order to have the same sum as our actual dataset?

But there’s nothing particularly special about addition. It’s just one rather simple mathematical operation. The arithmetic mean works well to produce an “average” number of a dataset when there is an additive relationship between the numbers. Such a relationship is often called “linear”, because when graphed in ascending or descending order the numbers tend to fall on or around a straight line. A simple idealized example would be a dataset where each number is produced by adding 3 to the previous number:

1, 4, 7, 10, 13, 16, 19…

The arithmetic mean thus gives us a perfectly reasonable middle value:

(1 + 4 + 7 + 10 + 13 + 16 + 19) ÷ 7 = 10
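In code this is a one-liner. Here’s a quick Python sketch of the calculation above (Part II works similar examples in R):

```python
# The arithmetic mean: sum the values, then divide by how many there are.
additive = [1, 4, 7, 10, 13, 16, 19]  # each term adds 3 to the last
arithmetic_mean = sum(additive) / len(additive)
print(arithmetic_mean)  # 10.0
```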

But not all datasets are best described by this relationship. Some have a multiplicative or exponential relationship, for instance if we multiplied each consecutive number by 3 rather than adding by 3 as we did above:

1, 3, 9, 27, 81, 243, 729…

This produces what is known as a geometric series (hint hint). When plotted in order, these numbers resemble more of a curve than a straight line.

In this situation, the arithmetic mean is ill-suited to produce an “average” number to summarize this data.

(1 + 3 + 9 + 27 + 81 + 243 + 729) ÷ 7 = 156.1

156 isn’t particularly close to most of the numbers in our dataset. In fact it’s more than 5x the median (middle number), which is 27.

This skew is more apparent when the data is plotted on a flat number line:

So what to do?

Introducing…

The Geometric Mean

Since the relationship is multiplicative, to find the geometric mean we multiply rather than add all the numbers. Then to rescale the product back down to the range of the dataset, we have to take the root, rather than simply dividing. You remember the square root: the number that needs to be squared to arrive at our number of interest.

Square root of 25 = 5, because 5 * 5 = 25

This is the same idea, but rather than raising to the second power (aka ‘squaring’), we need to find the number that would be raised to the 7th power to produce our product, because there are 7 numbers in our dataset, which we multiplied together. This is generally known as the nth root, where n is the size of the dataset. Thus, we need to find the 7th root.

Notice, what we are saying here is: if every number in our dataset was the same number, what number would it have to be in order to have the same multiplicative product as our actual dataset?

So, the geometric mean of our dataset is:

1 * 3 * 9 * 27 * 81 * 243 * 729 = 10,460,353,203
7th root of 10,460,353,203 = 27
geometric mean = 27
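The same calculation as a quick Python sketch, taking the nth root via a fractional exponent:

```python
import math

# The geometric mean: multiply the values, then take the nth root.
geometric_series = [1, 3, 9, 27, 81, 243, 729]  # each term multiplies the last by 3
product = math.prod(geometric_series)           # 10,460,353,203
geometric_mean = product ** (1 / len(geometric_series))  # 7th root
print(round(geometric_mean))  # 27
```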

And on the number line:

In this case, our geometric mean very much resembles the middle value of our dataset. In fact, it is equivalent to the median.

Note: the geometric mean will not always equal the median, only in cases where there is an exact consistent multiplicative relationship between all numbers (e.g. multiplying each previous number by 3, as we did). Real world datasets rarely contain such exact relationships, but for those that approximate this sort of multiplicative relationship, the geometric mean will give a closer ‘middle number’ than the arithmetic mean.

Real World Applications of the Geometric Mean

It turns out that there are many practical uses for the geometric mean, as multiplicative-ish relationships abound in the real world.

One canonical example is:

Compound Interest

Assume we have $100,000 that accrues a varying rate of interest each year for 5 years:

annual interest rates: 1%, 9%, 6%, 2%, 15%

We’d like to take a shortcut to find our average annual interest rate, & thus our total amount of money after 5 years, so we try to “average” these rates:

(.01 + .09 + .06 + .02 + .15) ÷ 5 = .066 = 6.6%

Then we insert this average % into a compound interest formula:

Total interest earned = $100,000 * (1.066⁵ - 1) = $37,653.11
Interest + principal = $37,653.11 + 100,000 = $137,653.11
Final total = $137,653.11

Just to be sure we’re not fooling ourselves, let’s do this the long way & compare results:

Year 1: 100,000 + (100,000 * .01) = 100,000 * 1.01 = $101,000
Year 2: 101,000 * 1.09 = $110,090
Year 3: 110,090 * 1.06 = $116,695.40
Year 4: 116,695.40 * 1.02 = $119,029.31
Year 5: 119,029.31 * 1.15 = $136,883.70
Actual final total = $136,883.70

What happened? Our shortcut overestimated our actual earnings by nearly $770.

We made a common error: We applied an additive operation to a multiplicative process, & got an inaccurate result.

Now let’s try again with the geometric mean:

1.01 * 1.09 * 1.06 * 1.02 * 1.15 = 1.368837042
5th root of 1.368837042 = 1.064805657
Geometric mean = 1.064805657

(Technical Note: we have to use 1 + interest rate as inputs in the geometric mean calculation because those are the actual factors that are multiplied with the principal values to produce the amount of interest accrued at each period, and we need to find the average of these factors. This has the added benefit of avoiding negative numbers even when there is a negative rate, which the geometric mean equation can’t handle [it also can’t handle 0s]. The arithmetic mean doesn’t have this issue. It’s the same whether we use the interest rates themselves or 1 + interest rate as input [then subtract 1 from the result], because it is additive rather than multiplicative. But the geometric mean will be different, and wrong, if we don’t add 1.)

Plugging the geometric mean of the interest rates into our compound interest formula:

Total interest earned = $100,000 * (1.0648⁵ - 1) = $36,883.70
Interest + principal = $36,883.70 + 100,000 = $136,883.70
Final total = $136,883.70, exactly the same as the long method above

That’s more like it.

We used the right mean for the right job & got the right result.
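The whole comparison is easy to replay in a few lines of Python (a sketch of the example above, not production finance code):

```python
import math

principal = 100_000
rates = [0.01, 0.09, 0.06, 0.02, 0.15]

# The long way: compound the balance year by year.
balance = principal
for r in rates:
    balance *= 1 + r

# The right shortcut: geometric mean of the (1 + rate) growth factors.
mean_factor = math.prod(1 + r for r in rates) ** (1 / len(rates))
shortcut = principal * mean_factor ** len(rates)

# The broken shortcut: arithmetic mean of the rates.
naive = principal * (1 + sum(rates) / len(rates)) ** len(rates)

print(round(balance, 2), round(shortcut, 2), round(naive, 2))
# 136883.7 136883.7 137653.11
```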

What else is the geometric mean good for?

Different Scales or Units

A fancy feature of the geometric mean is that you can actually average across numbers on completely different scales.

For instance, we want to compare online ratings for two coffeeshops using two different sources. The problem is that source 1 uses a 5-star scale & source 2 uses a 100-point scale:

Coffeeshop A
source 1 rating: 4.5
source 2 rating: 68

Coffeeshop B
source 1 rating: 3
source 2 rating: 75

If we naively take the arithmetic mean of raw ratings for each coffeeshop:

Coffeeshop A = (4.5 + 68) ÷ 2 = 36.25
Coffeeshop B = (3 + 75) ÷ 2 = 39

We’d conclude that Coffeeshop B was the winner.

If we were a bit more number-savvy, we’d know that we have to normalize our values onto the same scale before averaging them with the arithmetic mean, to get an accurate result. So we multiply the source 1 ratings by 20 to bring them from a 5-star scale to the 100-point scale of source 2:

Coffeeshop A
4.5 * 20 = 90
(90 + 68) ÷ 2 = 79

Coffeeshop B
3 * 20 = 60
(60 + 75) ÷ 2 = 67.5

So we find that Coffeeshop A is the true winner, contrary to the naive application of arithmetic mean above.

The geometric mean, however, allows us to reach the same conclusion without having to fuss over the scale or units of measure:

Coffeeshop A = square root of (4.5 * 68) = 17.5
Coffeeshop B = square root of (3 * 75) = 15

Et voilà!

The arithmetic mean is dominated by numbers on the larger scale, which makes us think Coffeeshop B is the higher rated shop. This is because the arithmetic mean expects an additive relationship between numbers & doesn’t account for scales & proportions. Hence the need to bring numbers onto the same scale before applying the arithmetic mean.

The geometric mean, on the other hand, can handle varying proportions with ease, due to its multiplicative nature. This is a tremendously useful property, but notice what we lose: We no longer have any interpretable scale at all. The geometric mean is effectively unitless in such situations.

I.e. the geometric means above are not 17.5 'out of' 100 points nor 15 'out of' 5 stars. They are just unitless numbers, in relative proportion to each other. (Technically, their scale is the geometric mean of the original scales, 5 & 100, which is 22.361). This can be a problem if we actually want to interpret the results relative to some scale that is meaningful to us, such as the original 5 or 100-point systems. But if we just want to know the relationship between ratings of the two coffeeshops, we’re good to go.
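Here are all three approaches side by side, as a quick Python sketch. Note that the geometric mean agreeing with the normalized arithmetic mean on the ranking, as it does here, is not guaranteed in general:

```python
import math

ratings = {"A": (4.5, 68), "B": (3, 75)}  # (5-star score, 100-point score)

for shop, (stars, points) in ratings.items():
    naive = (stars + points) / 2            # mixes scales; misleading
    normalized = (stars * 20 + points) / 2  # both on the 100-point scale
    geometric = math.sqrt(stars * points)   # scale-free, unitless
    print(shop, naive, normalized, round(geometric, 1))

# A 36.25 79.0 17.5
# B 39.0 67.5 15.0
```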

UPDATE 7/14/18: As pointed out by Mladen Fernežir, there is no guarantee that the geometric mean will always preserve the ordering of the arithmetic mean on scaled or normalized values, much less be proportionate to it, as I originally indicated. Rather, it is simply a different way to summarize the relationship between different sets of numbers (albeit one that will often produce more ‘credible’ summaries of values on different scales). So again, care & critical thought are necessary to its application.

Geometric Mean Recap
To tl;dr:

  • The geometric mean multiplies rather than sums values, then takes the nth root rather than dividing by n
  • It essentially says: if every number in our dataset was the same number, what would that number have to be in order to have the same multiplicative product as our actual dataset?
  • This makes it well-suited for describing multiplicative relationships, such as rates & ratios, even if those ratios are on different scales (i.e. do not have the same denominator). (For this reason, it is often used to compute financial & other indexes / indices. )
  • There are downsides: meaningful scales & units can be lost when applying the geometric mean, and its insensitivity to outliers can obscure large values that may be consequential. Further, it can produce results at odds with the arithmetic mean on values transformed to a single scale.

As with most things in life, there are few ironclad rules for applying the geometric mean (outside of compound interest & such things). There are some heuristics & rules of thumb, but ultimately judgement & scientific skepticism are required, as ever, for sound empiricism.

More on this in the conclusion below, but for now let’s introduce our final Pythagorean mean

The Harmonic Mean

The 3rd & final Pythagorean mean.

This section will be shorter than the last as the harmonic mean is yet more esoteric than the geometric mean, but still worth understanding.

Whereas the arithmetic mean requires addition & the geometric mean employs multiplication, the harmonic mean utilizes reciprocals.

As you may remember, the reciprocal of a number n is simply 1 / n. (e.g. the reciprocal of 5 is 1/5). For numbers that are already fractions, this means that you can simply “flip” the numerator & denominator: reciprocal of 4/5 = 5/4. This is true because 1 divided by a fraction yields that fraction’s reciprocal, e.g. 1 ÷ (4/5) = 5/4.

Another way to think about reciprocals is: two numbers that equal 1 when multiplied together. So when finding the reciprocal of a number n, we are simply asking: what number must we multiply with n in order to get 1. (This is why the reciprocal is also sometimes called the multiplicative inverse.)

So then, the harmonic mean can be described in words as: the reciprocal of the arithmetic mean of the reciprocals of the dataset.

That’s a lot of reciprocal flips, but it’s actually just a few simple steps:

1. Take the reciprocal of all numbers in the dataset
2. Find the arithmetic mean of those reciprocals
3. Take the reciprocal of that number

In math notation, this looks like:

harmonic mean = n ÷ (1/x₁ + 1/x₂ + … + 1/xₙ)

A simple example from Wikipedia: the harmonic mean of 1, 4, and 4 is 3 ÷ (1/1 + 1/4 + 1/4) = 3 ÷ 1.5 = 2

note: the notation “n⁻¹” is one way of symbolizing “the reciprocal of n”

Notice, what we are saying here is: if the reciprocal of every number in our dataset was the same number, what number would it have to be in order to have the same reciprocal sum as our actual dataset?

(Note: due to the fact that 0 has no reciprocal (nothing can be multiplied by 0 to = 1), the harmonic mean also cannot handle datasets containing 0’s, similar to the geometric mean.)
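The three steps translate directly into code (a minimal Python sketch; Python’s standard library also ships statistics.harmonic_mean, which does the same thing):

```python
def harmonic_mean(xs):
    # 1. take the reciprocal of every number
    reciprocals = [1 / x for x in xs]
    # 2. take the arithmetic mean of those reciprocals
    mean_of_reciprocals = sum(reciprocals) / len(reciprocals)
    # 3. take the reciprocal of that mean
    return 1 / mean_of_reciprocals

print(harmonic_mean([1, 4, 4]))  # 2.0
```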

So that’s how the plumbing works. But what is it good for?

Real World Applications of the Harmonic Mean

To answer this, we have to answer: what are reciprocals good for?

Since reciprocals, like all division, are just multiplication in disguise (which is just addition in disguise), we realize: reciprocals help us more easily divide by fractions.

For instance, what is 5 ÷ 3/7? If you remember elementary school maths, you’ll probably just multiply 5 by 7/3 (the reciprocal of 3/7) to solve this:

5 ÷ 3/7 = 5/1 * 7/3 = 35/3 = 11 2/3 = 11.66667

But an equivalent method would be to scale the numbers 5 & 3/7 to a common denominator, then divide in the normal way:

5/1 ÷ 3/7 = 35/7 ÷ 3/7 = 35 ÷ 3 = 11 2/3 = 11.66667
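Python’s fractions module makes the equivalence easy to check (just an illustration of the arithmetic, nothing more):

```python
from fractions import Fraction

# Dividing by a fraction is the same as multiplying by its reciprocal.
via_reciprocal = Fraction(5) * Fraction(7, 3)  # 5 * 7/3
via_division = Fraction(5) / Fraction(3, 7)    # 5 ÷ 3/7
print(via_reciprocal, via_division)  # 35/3 35/3
```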

So again, similar to using the geometric mean as a counterpart to the arithmetic mean for multiplicative or nonlinear relationships (see above), the harmonic mean helps us find multiplicative / divisory relationships between fractions without worrying over common denominators.

As such, the harmonic mean naturally accommodates another layer of multiplication / division over the geometric mean. Thus it is helpful when dealing with datasets of rates or ratios (i.e. fractions) over different lengths or periods.

SIDEBAR: (You might be thinking: “wait, I thought the geometric mean was used for averaging interest rates & ratios on different scales!” And you would be correct. You also wouldn’t be the first to be confused by this. I myself set out to write this piece to clarify my own thinking & understanding here. So bear with me, I hope to make this clearer with the following example & recap all these differences in the conclusion of this article below.)

Average Rate of Travel

The canonical example of using harmonic means in the real world involves traveling over physical space at different rates, i.e. speeds:

Consider a trip to the grocery store & back:

  • On the way there you drove 30 mph the entire way
  • On the way back traffic was crawling, & you drove 10 mph the entire way
  • You took the same route, & covered the same amount of ground (5 miles) each way.

What was your average speed across this entire trip’s duration?

Again, we might naively apply the arithmetic mean to 30 mph & 10 mph, and proudly declare “20 mph!”

But consider again: because you travelled faster in one direction, you covered those 5 miles quicker & spent less time overall traveling at that speed. So your average rate of travel across your entire trip’s duration is not the midpoint between 30 mph & 10 mph; it should be closer to 10 mph, because you spent longer traveling at that speed.

In order to apply the arithmetic mean correctly here, we’d have to determine the amount of time spent traveling at each rate, then weight our arithmetic mean calculation appropriately:

Trip There: (at 30 mph)
30 miles per 60 mins = 1 mile every 2 minutes = 1/2 mile every minute
5 miles at 1/2 mile per minute = 5 ÷ 1/2 = 10 minutes
"Trip There" time = 10 minutes

Trip Back: (at 10 mph)
10 miles per 60 mins = 1 mile every 6 minutes = 1/6 mile every minute
5 miles at 1/6 mile per minute = 5 ÷ 1/6 = 30 minutes
"Trip Back" time = 30 minutes

Total trip time = 10 + 30 = 40 minutes

“Trip There” % of total trip = 10 / 40 minutes = .25 = 25%
“Trip Back” % of total trip = 30 / 40 minutes = .75 = 75%

Weighted Arithmetic Mean = (30mph * .25)+(10mph * .75) = 7.5 + 7.5 = 15
Average rate of travel = 15 mph

So we see that our true average rate of travel was 15 mph, which is 5 mph (or 25%) lower than our naive declaration of 20 mph using an unweighted arithmetic mean.

You can probably guess where this is headed…

Let’s try it again using the harmonic mean.

Harmonic mean of 30 and 10 = ...
Arithmetic mean of reciprocals = (1/30 + 1/10) ÷ 2 = (4/30) ÷ 2 = 4/60 = 1/15
Reciprocal of arithmetic mean = 1 ÷ 1/15 = 15/1 = 15

Et voilà!²

Our true average rate of travel, automagically adjusted for time spent traveling in each direction = 15 mph!
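Both routes to 15 mph, in a short Python sketch:

```python
speeds = [30, 10]  # mph
distance = 5       # miles each way (equal legs, so the harmonic mean needs no weights)

# Weighted arithmetic mean: weight each speed by the time spent at it.
times = [distance / s for s in speeds]  # hours per leg: 1/6 and 1/2
weighted = sum(s * t for s, t in zip(speeds, times)) / sum(times)

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic = len(speeds) / sum(1 / s for s in speeds)

print(round(weighted, 1), round(harmonic, 1))  # 15.0 15.0
```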

Note a few things:

  • This only works because the total distance travelled was the same each way. If it was different, we’d have to use a weighted harmonic mean, or another weighted arithmetic mean.
  • ^ For the arithmetic mean we’d again weight by time spent traveling at each speed, while for the harmonic mean we’d weight by distance travelled (because it already accounts for the proportions of time implicit in the rates, by taking their reciprocals).
  • ^ Much of the trickiness & spookiness of Pythagorean Means comes down to the nature of ratios & which side of the ratio we’re more interested in. For instance, the arithmetic mean is always expressed in terms of the denominator. In the case of rate of travel, the ratio is miles-per-hour, so the arithmetic mean gives us a result in terms of its (somewhat hidden) denominator, hours: (30m / 1hr)+(10m / 1hr) ÷ 2 = 20m/1hr = 20 mph. This would be accurate if we spent an equal amount of time traveling in each direction, which we know is false. The harmonic mean instead flips these ratios by taking their reciprocals, putting our actual numbers of interest in the denominator, then takes the arithmetic mean, flips it again, & gives us the answer we’re looking for in terms of average speed, proportionate to the time spent at that speed. (For deeper discussion using financial P/E ratios, see this paper.)
  • The reason the geometric mean worked for our compound interest example above is that the rates were accruing over equivalent periods: one year each. If the periods varied, i.e. different lengths of time spent accruing interest at each rate, we’d have to again use weights of some sort.
  • ^ While the geometric mean handles multiplicative relationships such as rates applied to a principal investment & ratios on different scoring scales, the harmonic mean takes this one step further, easily accommodating another layer of multiplicative / divisory relationships such as varying periods or lengths, via the magic of reciprocals.

Like the case of compound interest and the geometric mean, this is an example of a precise, objectively correct application of the harmonic mean. But again, things aren’t always so clear. There are other precise, mathematically justifiable applications in physics, finance, hydrology & even (by convention) in baseball statistics. More germane to data science: it is often applied to precision & recall in the evaluation of machine learning models.

But more often than this, it’s a judgement call, dependent on a nimble understanding of your data & the task at hand.

I’ll try to clarify & summarize the finer points below.

PART I CONCLUSION

Back to where we started: A geometric construction of the Pythagorean means (of two numbers a and b)

To recap & make explicit what we’ve already demonstrated:

1. The three Pythagorean Means are intimately related, & can each be expressed as a special case of each other.

For instance, we saw that:

  • The geometric mean of scores on different scales can sometimes preserve the ordering of the arithmetic mean when those values are normalized to a common scale
  • The harmonic mean is equivalent to the weighted arithmetic mean of rate of travel (where values are weighted by relative time spent traveling)

In Part II (a separate post to follow), we’ll see what should be clear to those already familiar with multiplicative transformations: the geometric mean of a dataset is equivalent to the arithmetic mean of the logarithms of each number in that dataset. So just as the harmonic mean is simply the arithmetic mean with a few reciprocal transformations, the geometric mean is just the arithmetic mean with a log transformation.
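That equivalence is easy to verify numerically; here is a quick Python sketch (Part II demonstrates it in R):

```python
import math

data = [1, 3, 9, 27, 81, 243, 729]

# Geometric mean, computed directly: nth root of the product.
direct = math.prod(data) ** (1 / len(data))

# Geometric mean via logs: arithmetic mean of the logs, exponentiated.
via_logs = math.exp(sum(math.log(x) for x in data) / len(data))

print(round(direct, 6), round(via_logs, 6))  # 27.0 27.0
```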

If each mean is just a transformation or reformulation of the other, how do these transformations interact & affect your results?

2. The Pythagorean Means conform to a strict ordinal relationship.

Due to their respective equations: the harmonic mean is always smaller than the geometric mean, which is always smaller than the arithmetic mean.

The three means are closer together or farther apart depending on the spread of the underlying data. The only exception to this rule occurs in the extreme case when all numbers in the dataset are the same exact number, in which case all 3 means are also equivalent. Thus, the following inequality holds:

harmonic mean ≤ geometric mean ≤ arithmetic mean

These proportions can be observed in the geometric depiction of the Pythagorean (+ quadratic) Means at the beginning of this section.

Recognizing this relationship helps immensely in understanding when to apply each mean, & what the impact to your results will be.

To make this more concrete, lets revisit our original additive & multiplicative datasets, with all three means depicted in each:

Additive dataset {1, 4, 7, 10, 13, 16, 19…}

Harmonic mean = 4.2
Geometric mean = 7.3
Arithmetic mean = 10

Clearly, the geometric & harmonic means seem to substantially understate the ‘middle’ of this linear, additive dataset. This is because those means are more sensitive to smaller numbers than larger numbers (making them also relatively insensitive to large outliers).

Multiplicative dataset {1, 3, 9, 27, 81, 243, 729…}

Harmonic mean = 4.7
Geometric mean = 27
Arithmetic mean = 156.1

Here, the geometric mean sits precisely in the ordinal middle of the dataset, while the harmonic mean still skews to the low side & the arithmetic mean skews hard to the high side, pulled by large outliers.
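A Python sketch that computes all three means for both datasets and checks the ordering:

```python
import math

def amean(xs): return sum(xs) / len(xs)
def gmean(xs): return math.prod(xs) ** (1 / len(xs))
def hmean(xs): return len(xs) / sum(1 / x for x in xs)

additive = [1, 4, 7, 10, 13, 16, 19]
multiplicative = [1, 3, 9, 27, 81, 243, 729]

for data in (additive, multiplicative):
    h, g, a = hmean(data), gmean(data), amean(data)
    assert h <= g <= a  # the Pythagorean inequality
    print(round(h, 1), round(g, 1), round(a, 1))

# 4.2 7.3 10.0
# 4.7 27.0 156.1
```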

It’s not trivial to depict a dataset where the central tendency is well-described by the harmonic mean, so I’m just going to move on…

3. There are some hard rules, some heuristics & a lot of room for judgement

  • To average ratios on different scales: use the geometric mean (or arithmetic mean over normalized scores)
  • To average compound rate changes over consistent periods: use the geometric mean
  • To average rates over different periods or lengths: use the harmonic mean (or weighted arithmetic mean)
  • Know which side of your ratio you are more interested in, & which mean to apply. The arithmetic mean is expressed in terms of the denominator, whether or not it is visible. The harmonic mean allows you to invert the ratio to get an answer in terms of the original numerator.
  • If your data evinces an additive structure: the arithmetic mean is usually safe
  • If your data evinces a multiplicative structure and /or has large outliers: the geometric or harmonic mean might be more appropriate (as might the median)
  • There are pitfalls & tradeoffs to any decision:
    - loss of meaningful scale or units when using the geometric mean
    - datasets with 0’s cannot be used with the geometric or harmonic means, & datasets with negative numbers also rule out the geometric mean
    - lack of audience familiarity with the particular “average” in question when using geometric or harmonic mean
  • It can often be more practical & interpretable to:
    - just use the median in presence of large outliers
    - remove or cap outliers
    - use a weighted arithmetic mean or statistical transformations rather than esoteric pythagorean means
  • Although the R statistical computing language has built-in methods for matrix inversion & cubic spline interpolation, it has no native functions to compute a simple geometric or harmonic mean, which might give some indication of their rarity. (Google Sheets & Excel, however, do have them)

If there is one TL;DR for this entire piece, it would be:

Understand the nature of your data & think carefully about the summary statistics you use to describe it — or risk being wrong ‘on average’.

Please comment below with your own use cases & experience with the lesser Pythagorean means (as well as any errata you might catch in this piece!).

Check out Part II of this post, which dispenses with the conceptual narratives in favor of a more concise, economical & technical treatment of the subject with real & simulated data, distributions, plots & accompanying R code.


Follow on twitter: @dnlmc
LinkedIn: linkedin.com/in/dnlmc
Github: https://github.com/dnlmc

CORRECTION 7/14/18: A previous version of this piece stated “The geometric mean of scores on different scales is proportionate to the arithmetic mean when those values are normalized to a common scale”. Mladen Fernežir pointed out that this is not true. Thanks for the correction!


Founder & Chief Scientist @ Coεmeta (coemeta.xyz) | formerly Associate Director of Analytics & Decision Science @ the Philadelphia Inquirer