
Problems With Graphing Percentages

Sometimes you need to know when to break the rules

Image by author

When creating graphs with a numeric y-axis, we are often told to "start with zero". But as with all rules, we need to know when to keep it and when to break it. With that in mind, I’d like to talk about plotting percentages.

I recently took a class on data visualization and often heard classmates cite this very rule (maybe not in those exact words) as if it were gospel. My usual response was something like, "but percentages are weird."

My background is in chemistry, so one of the first things I think of when discussing percentages is purity. When talking about chemical purity, we routinely report the percent of the thing that we’re most interested in. So a chemical synthesis may have a 90% purity if we’re lucky, or a 30% purity if we’re not so lucky. The difference between those two numbers is obvious. What’s not so obvious is how different 99.0% and 99.9% purity are. It might seem like those numbers are very close to each other, and that the difference is negligible. But is that really true?


Let’s say that we run a reaction six times, with slightly different reaction conditions, and get purities ranging from 99.0–99.9%. If we were to plot that "starting with zero", we’d get a plot like this:

Image by author

From this graph, it looks like there’s almost no difference between the reactions. But if we look at the other side of things (the contaminants), going from 99.0% to 99.9% purity means the contaminants have dropped from 1.0% to 0.1%: a tenfold reduction, or an order of magnitude. If the contaminant is water, that may not be such a big deal, but if the contaminant is arsenic, it could be huge. Now, if we know that there is only one contaminant, we might create a graph of contaminant concentrations, like this:
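To make the arithmetic concrete, here is a minimal Python sketch. It assumes that everything other than the target compound counts as contaminant, so contaminant % = 100 − purity %. The helper name `contaminant_percent` is hypothetical; the two purities are the endpoints from the example above.

```python
def contaminant_percent(purity_percent: float) -> float:
    """Assumes everything that isn't the target compound is contaminant."""
    return 100.0 - purity_percent

low_purity, high_purity = 99.0, 99.9

low_contam = contaminant_percent(low_purity)    # about 1.0 %
high_contam = contaminant_percent(high_purity)  # about 0.1 % (floating point gives 0.0999...)

# Relative change in purity: under 1% of its starting value...
purity_change = (high_purity - low_purity) / low_purity
# ...while the contaminants fall by a factor of about 10.
fold_reduction = low_contam / high_contam

print(f"purity change:  {purity_change:.2%}")
print(f"contaminants:   {low_contam:.1f}% -> {high_contam:.1f}%")
print(f"fold reduction: {fold_reduction:.1f}x")
```

Running it reports a purity change of about 0.91% next to a tenfold drop in contaminants, which is exactly the contrast a zero-based purity axis hides.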

Image by author

But what if the contamination comes from a mixture of things? You could plot total contaminants, as in the graph above, but if you want to focus on purity, you’ll probably want a purity plot like the first one. And if that plot is going to show the differences between reactions, we need to adjust the scale in some way. In this case, we throw out the "start with zero" rule and start with something more reasonable. The problem at this point is deciding on the best scale, and this is the tough part: you want to show the differences, but you don’t want to get too extreme about it. Here are three options.
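One way to compare candidate scales before drawing them is to ask how much of the plot’s height the 99.0% to 99.9% spread will fill. The Python sketch below is a minimal illustration under stated assumptions: the helper `spread_fraction` and the four axis lower bounds are hypothetical choices for demonstration, not the ones used in the figures. If you plot with matplotlib, the chosen range would be applied with `Axes.set_ylim(bottom, top)`.

```python
def spread_fraction(data_min: float, data_max: float,
                    axis_min: float, axis_max: float) -> float:
    """Fraction of the y-axis height covered by the data's range."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (data_max - data_min) / (axis_max - axis_min)

# Purity range across the six reactions, from the example above.
purity_min, purity_max = 99.0, 99.9

# Hypothetical y-axis lower bounds; the top of the axis is fixed at 100%.
for axis_min in (0.0, 90.0, 98.0, 99.0):
    frac = spread_fraction(purity_min, purity_max, axis_min, 100.0)
    print(f"axis {axis_min:5.1f}-100%: spread fills {frac:.1%} of the height")
```

Starting at 0 squeezes the spread into under 1% of the plot height, which is why the reactions look identical; starting at 99 stretches it across 90% of the height, which risks overselling small differences. The bounds in between are the judgment call.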

Images by author

Which would you choose? What story are you trying to tell? Just be careful not to oversell your idea.

As I said at the beginning, percentages are weird – and tricky.


Take a look at my previous articles.

Use Rattle to Help You Learn R

Creating a Custom R Package

