Understanding Confusion Matrix

Once the data is cleaned, pre-processed and wrangled, the first thing we do is feed it to a model and, of course, get output as probabilities. But hold on! How on earth do we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that’s exactly what we want. This is where the confusion matrix comes into the limelight. A confusion matrix is a performance measurement for machine learning classification.

This blog aims to answer the following questions:

  1. What is the confusion matrix and why do you need it?
  2. How do you calculate a confusion matrix for a 2-class classification problem?

Today, let’s understand the confusion matrix once and for all.

What is a Confusion Matrix and why do you need it?

Well, it is a performance measurement for machine learning classification problems where the output can be two or more classes. For a 2-class problem, it is a table with the 4 different combinations of predicted and actual values.

It is extremely useful for measuring Recall, Precision, Specificity and Accuracy and, most importantly, for building the AUC-ROC curve.

Let’s understand TP, FP, FN and TN through a pregnancy analogy.

True Positive:

Interpretation: You predicted positive and it’s true.

You predicted that a woman is pregnant and she actually is.

True Negative:

Interpretation: You predicted negative and it’s true.

You predicted that a man is not pregnant and he actually is not.

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he actually is not.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.

Just remember: Positive and Negative describe the predicted value, while True and False describe whether that prediction matches the actual value.
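
The naming of the four outcomes can be sketched in a few lines of Python. The function name `outcome` and its boolean arguments are made up for illustration; the second word of the result is what was predicted, and the first word says whether that prediction was right.

```python
def outcome(actual: bool, predicted: bool) -> str:
    """Name the confusion-matrix cell for a single prediction."""
    truth = "True" if actual == predicted else "False"  # was the prediction right?
    sign = "Positive" if predicted else "Negative"      # what did we predict?
    return f"{truth} {sign}"

# The four cases from the pregnancy analogy (hypothetical helper, illustrative only)
print(outcome(actual=True, predicted=True))    # True Positive
print(outcome(actual=False, predicted=False))  # True Negative
print(outcome(actual=False, predicted=True))   # False Positive (Type 1 Error)
print(outcome(actual=True, predicted=False))   # False Negative (Type 2 Error)
```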

How to Calculate a Confusion Matrix for a 2-class classification problem?

Let’s understand the confusion matrix through math.
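
As a minimal sketch, the four cells can be tallied by pairing each actual label with its predicted label. The labels below are hypothetical (they are not the values from this post), 1 is assumed to mean positive and 0 negative, and `confusion_counts` is an illustrative name.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair for a 2-class problem."""
    pairs = Counter(zip(actual, predicted))
    return {
        "TP": pairs[(1, 1)],  # actual positive, predicted positive
        "FN": pairs[(1, 0)],  # actual positive, predicted negative
        "FP": pairs[(0, 1)],  # actual negative, predicted positive
        "TN": pairs[(0, 0)],  # actual negative, predicted negative
    }

# Hypothetical labels, made up for illustration only
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print("            predicted 1   predicted 0")
print(f"actual 1    TP = {counts['TP']}        FN = {counts['FN']}")
print(f"actual 0    FP = {counts['FP']}        TN = {counts['TN']}")
```

If the model outputs probabilities, they first have to be turned into 0/1 labels with a threshold before they can be counted this way.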

Recall

Out of all the actual positives, how many did we predict correctly? Recall = TP / (TP + FN). It should be as high as possible.
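
A rough sketch of recall computed from the counts, as TP / (TP + FN). The function name and the example counts are assumptions for illustration, and returning 0.0 when there are no actual positives is a convention chosen here, not something this post prescribes.

```python
def recall(tp: int, fn: int) -> float:
    """Share of the actual positives that the model found."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0  # assumption: report 0.0 when there are no actual positives
    return tp / actual_positives

# Hypothetical counts: 2 pregnant women found, 1 missed
print(recall(tp=2, fn=1))
```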

Precision

Out of all the cases we predicted as positive, how many are actually positive? Precision = TP / (TP + FP).
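
Precision can be sketched the same way, as TP / (TP + FP). Again, the function name and counts are hypothetical, and the 0.0 fallback for "nothing was predicted positive" is an assumption made for this example.

```python
def precision(tp: int, fp: int) -> float:
    """Share of the positive predictions that were actually positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0  # assumption: report 0.0 when nothing was predicted positive
    return tp / predicted_positives

# Hypothetical counts: 3 correct "pregnant" predictions, 1 wrong one
print(precision(tp=3, fp=1))  # 0.75
```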

Accuracy

Out of all the predictions, how many did we get right? Accuracy = (TP + TN) / (TP + TN + FP + FN), which in this case is 4/7. It should be as high as possible.
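
For illustration, here is a small sketch of accuracy computed directly from label lists. The seven labels are hypothetical, chosen only so that 4 of the 7 predictions are correct; they are not the table from this post. `Fraction` is used so the result prints as a ratio.

```python
from fractions import Fraction

def accuracy(actual, predicted):
    """Share of all predictions that match the actual label (assumes non-empty lists)."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return Fraction(correct, len(actual))

# Hypothetical labels: 4 of the 7 predictions match
actual = [1, 0, 1, 1, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0]

print(accuracy(actual, predicted))  # 4/7
```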

F-measure

It is difficult to compare two models when one has low precision and high recall and the other the reverse. To make them comparable, we use the F-score, which measures Recall and Precision at the same time: F-score = 2 * Recall * Precision / (Recall + Precision). It uses the harmonic mean in place of the arithmetic mean, which punishes extreme values more.
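
To see how the harmonic mean punishes extreme values, here is a small sketch comparing it with the arithmetic mean. The precision and recall values are made up for illustration, `f_score` is an illustrative name, and the 0.0 fallback when both inputs are zero is an assumption.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: report 0.0 when both inputs are zero
    return 2 * precision * recall / (precision + recall)

# Hypothetical model with very high precision but very low recall
p, r = 0.9, 0.1

print(f"arithmetic mean: {(p + r) / 2:.2f}")  # 0.50
print(f"F-score:         {f_score(p, r):.2f}")  # 0.18
```

The arithmetic mean makes this lopsided model look average, while the F-score stays close to the weaker of the two numbers.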

I hope I’ve given you a basic understanding of what exactly a confusion matrix is. If you like this post, a few claps 👏 would be a nice bit of extra motivation. I am always open to your questions and suggestions. You can share this on Facebook, Twitter or LinkedIn, so someone in need might stumble upon it.

You can reach me at:

LinkedIn : https://www.linkedin.com/in/narkhedesarang/

Twitter : https://twitter.com/narkhede_sarang

Github : https://github.com/TheSarang

Thanks for Reading!

Towards Data Science

A Medium publication sharing concepts, ideas, and codes.

Written by Sarang Narkhede

Community Organizer @GDGRochester. Live and breathe ML. All views are my own. Graduate CS student at @RIT.
