Taking the Confusion Out of Confusion Matrices

Allison Ragan
Towards Data Science
6 min read · Oct 10, 2018

Source: Understanding Confusion Matrix by Sarang Narkhede on Towards Data Science

When I first learned about the concept of a confusion matrix I was left with one overwhelming feeling: confusion. They are called confusion matrices, after all. In fact, I’m convinced that they are named as such because they are often so confusing (don’t quote me). But, when you dig deeper into the construction of a confusion matrix, it isn’t actually that confusing.

The most important thing to know is that a confusion matrix is built entirely by comparing the predicted outcome with the actual outcome.

True, False, Positive, Negative

How you should look at a confusion matrix

The Positive/Negative label refers to the predicted outcome of an experiment, while True/False refers to whether that prediction matched the actual outcome. So if I predicted that someone was pregnant but they weren’t, that would be a False Positive: the prediction was positive, but it turned out to be false.
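
If it helps to see this in code, here is a minimal Python sketch (the function name and the 1 = pregnant / 0 = not pregnant encoding are my own illustrative assumptions) that maps a single prediction to its cell in the matrix:

    # Illustrative encoding: 1 = pregnant (positive), 0 = not pregnant (negative)
    def label_outcome(predicted, actual):
        """Return the confusion-matrix cell for a single prediction."""
        if predicted == 1 and actual == 1:
            return "True Positive"
        if predicted == 1 and actual == 0:
            return "False Positive"   # predicted positive, but wrong
        if predicted == 0 and actual == 1:
            return "False Negative"   # predicted negative, but wrong
        return "True Negative"        # predicted negative, and right

    print(label_outcome(predicted=1, actual=0))  # False Positive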

Types of Errors

Confusion matrices have two types of errors: Type I and Type II.

I was taught two ways to keep Type I and Type II straight. If you know of any others that have helped you over the years, please leave them in the comments — I love a good mnemonic!

The first way is to re-write False Negative and False Positive. False Positive is a Type I error because False Positive = False True, and that has only one F. False Negative is a Type II error because False Negative = False False, which has two F’s, making it a Type II. (Kudos to Riley Dallas for this method!)

The second way is to consider the meanings of these words. False Positive contains one negative word (False) so it’s a Type I error. False Negative has two negative words (False + Negative) so it’s a Type II error.

Confusion Metrics

From our confusion matrix, we can calculate five different metrics that measure the performance of our model (sketched in code after the list below).

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  3. Precision (true positives / predicted positives) = TP / (TP + FP)
  4. Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)
  5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)
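
If you prefer code to formulas, here is a short Python sketch (confusion_metrics is just a placeholder name of my own) that computes all five metrics from the four counts:

    def confusion_metrics(tp, tn, fp, fn):
        """Compute the five metrics from the four confusion-matrix counts."""
        total = tp + tn + fp + fn
        return {
            "accuracy": (tp + tn) / total,
            "misclassification": (fp + fn) / total,
            "precision": tp / (tp + fp),
            "sensitivity": tp / (tp + fn),  # aka recall
            "specificity": tn / (tn + fp),
        }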

Example

I’ve thrown a lot of words and formulas at you at this point, so let’s apply what we’ve learned to an example. I learn best by doing examples, and when it comes to confusion matrices, there is no better example than pregnancy.

Suppose I work for Target and, based on shopping patterns, I want to detect pregnant teenagers so I can inform their fathers before they can break the news themselves. I take a random sample of 500 female, teenage customers. Of these teenagers, 50 are actually pregnant. I predicted 100 total pregnant teenagers, 45 of which are actually pregnant.

Our task is two-fold: A) identify the TP, TN, FP, and FN and construct a confusion matrix, and B) calculate the accuracy, misclassification, precision, sensitivity, and specificity.

First, let’s break down our problem statement to answer part A.

I take a random sample of 500 female, teenage customers. Of these teenagers, 50 actually are pregnant. I predicted 100 total pregnant teenagers, 45 of which are actually pregnant.

I predicted 100 pregnancies, so our “predicted pregnant” row should add up to 100. We know that 45 of the 100 were indeed pregnant, so we can put 45 in the predicted-pregnant, actually-pregnant cell, aka a True Positive.

45 are indeed pregnant

Additionally, 50 people in my sample are actually pregnant, so my “actually pregnant” column should add up to 50. Since we already have 45 in this column, we put 5 in the predicted-not-pregnant, actually-pregnant cell, aka a False Negative.

50 - 45 = 5

I predicted 100 pregnancies, but only 45 of those were actually pregnant. So of all those that I predicted, how many did I falsely predict? The answer is 55, which is my False Positive because I falsely predicted the positive outcome.

100 - 45 = 55

Finally, I can determine the number of True Negatives by adding 45, 55, and 5 together and then subtracting that sum from my total sample of 500. This leaves us with 395 True Negatives. Once our numbers are filled in, we can double-check ourselves by adding up all of our cells and making sure they total 500.
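
Here is the same bookkeeping as a small Python sketch using the numbers from the example (the variable names are mine):

    # Totals from the problem statement
    sample_size = 500         # the whole sample
    actual_pregnant = 50      # the "actually pregnant" column total
    predicted_pregnant = 100  # the "predicted pregnant" row total

    tp = 45                            # predicted pregnant, actually pregnant
    fn = actual_pregnant - tp          # 50 - 45 = 5
    fp = predicted_pregnant - tp       # 100 - 45 = 55
    tn = sample_size - (tp + fn + fp)  # 500 - 105 = 395

    # Rows = predicted, columns = actual, matching the matrix above
    matrix = [[tp, fp],   # predicted pregnant
              [fn, tn]]   # predicted not pregnant

    assert tp + fn + fp + tn == sample_size  # double-check: cells sum to 500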

We also know that the 55 False Positives are our Type I errors, because I falsely predicted 55 pregnancies out of my 100 total predicted pregnancies. The 5 False Negatives are our Type II errors, because I falsely predicted that 5 girls were not pregnant out of the 50 total actual pregnancies.

Next, we can use our labelled confusion matrix to calculate our metrics.

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)

(45 + 395) / 500 = 440 / 500 = 0.88 or 88% Accuracy

2. Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)

(55 + 5) / 500 = 60 / 500 = 0.12 or 12% Misclassification

You can also just do 1 - Accuracy, so:

1 - 0.88 = 0.12 or 12% Misclassification

3. Precision (true positives / predicted positives) = TP / (TP + FP)

45 / (45 + 55) = 45 / 100 = 0.45 or 45% Precision

4. Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)

45 / (45 + 5) = 45 / 50 = 0.90 or 90% Sensitivity

5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)

395 / (395 + 55) = 395 / 450 ≈ 0.88 or 88% Specificity
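
If you would rather not do the arithmetic by hand, a library such as scikit-learn can reproduce these numbers from label arrays. Here is a sketch that rebuilds labels matching the example’s counts (note that scikit-learn’s confusion_matrix puts actual values on the rows, which is the transpose of the layout used above):

    import numpy as np
    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_score, recall_score)

    # Label arrays matching the example's counts (1 = pregnant, 0 = not pregnant)
    y_true = np.array([1] * 45 + [1] * 5 + [0] * 55 + [0] * 395)
    y_pred = np.array([1] * 45 + [0] * 5 + [1] * 55 + [0] * 395)

    print(confusion_matrix(y_true, y_pred))  # [[395  55]
                                             #  [  5  45]]
    print(accuracy_score(y_true, y_pred))    # 0.88
    print(precision_score(y_true, y_pred))   # 0.45
    print(recall_score(y_true, y_pred))      # 0.9 (sensitivity)
    # Specificity is just recall of the negative class
    print(recall_score(y_true, y_pred, pos_label=0))  # ~0.878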

So What?

Our final result

This is all fine and dandy, but why do we care?

We care because a confusion matrix can help us evaluate the performance of our models, using the metrics we calculated above. We don’t need to compute all five every single time; we just need to pick the one that best reflects our worst-case scenario.

In this instance, with our pregnancy example, a high incidence of false positives is the worst possible outcome. That makes precision the metric I would like to focus on, and at just 45% it is pretty terrible. It means that 55% of our pregnancy predictions are wrong, so we would be informing parents of non-existent pregnancies more often than not. Our model, simply put, sucks.

I hope this article left you feeling a little less confused about confusion matrices. If you have any questions or methods for simplifying, leave them in the comments below.
