In this new article of the "Ace your Machine Learning Interview" series, I address a different kind of topic. Instead of describing a new Machine Learning algorithm, I will show you how to evaluate whether an algorithm is working well. Knowing how to evaluate an ML system is as important as developing the system itself.
I hope that you will find it useful as you tackle your next interview! 😁
Introduction
Knowing how to evaluate a Machine Learning algorithm is of tremendous importance. How can we improve our algorithm if we cannot tell whether its output is good or not? There are many metrics, and sometimes it can be difficult to navigate through them. You may have heard acronyms like ROC or AUC, or of metrics such as Precision and Recall. But what are they, and how should they be used appropriately?
Many of the metrics used to evaluate an ML algorithm can be derived from the Confusion Matrix, which is a kind of "summary" of how well the algorithm predicted the labels in our dataset.
In one of the last interviews I went through, in the Computer Vision area in particular, I was asked to explain what a confusion matrix is and how to implement it in a practical exercise using Python. So I decided to write this article to share that experience with you.
I would also suggest revisiting the basics of Machine Learning well before facing an interview. Often the companies that are hiring are much more interested in knowing that you master the core concepts of Machine Learning than in knowing the latest and most esoteric neural network architectures.
So I want to talk about how to read a confusion matrix and how you can implement it with a few lines of code using Python.
Confusion Matrix
Let’s start with the simplest case. Suppose you are training a classifier: it can be based on Logistic Regression, Naive Bayes, or any other algorithm you like.
Given an input, the classifier simply has to say whether that input is positive or negative (dog or cat would work just the same). First of all, you want to figure out how many times the classifier got the output wrong and how many times it guessed right.
We can then start counting all the times it was wrong: all the times it should have predicted positive but predicted negative, and all the times it should have predicted negative but predicted positive. We are also interested in how many times the algorithm gave the correct output, that is, how many times it predicted positive and negative correctly.
To avoid confusion, we simply summarize all these values in a table called the Confusion Matrix. The rows represent the actual labels and the columns represent the predicted labels.
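In the binary case, the table looks like this:

                     Predicted Positive     Predicted Negative
Actual Positive      True Positive (TP)     False Negative (FN)
Actual Negative      False Positive (FP)    True Negative (TN)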
True Positive (TP): The classifier predicted positive correctly. Example: the classifier predicted that the patient has a tumour, and it is true!
True Negative (TN): The classifier predicted negative correctly. Example: the classifier predicted that the patient does not have a tumour, and in fact the patient is healthy.
False Positive (FP): Also called Type 1 Error. The classifier predicted positive, but the label to predict was negative. Example: the classifier predicted that the patient has a tumour, but the patient is actually healthy.
False Negative (FN): Also called Type 2 Error. The classifier predicted negative, but the label to predict was positive. Example: the classifier predicted that the patient does not have a tumour when he does!
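All the metrics mentioned in the introduction can be derived from these four counts. Here is a minimal sketch for a binary problem (the labels below are toy data I made up purely for illustration):

from sklearn.metrics import confusion_matrix

# Toy binary labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of all predictions that are correct
precision = tp / (tp + fp)                  # fraction of predicted positives that are truly positive
recall = tp / (tp + fn)                     # fraction of actual positives that were found

print(accuracy, precision, recall)          # 0.75 0.75 0.75 on this toy data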
Let’s look at an example of how to calculate a confusion matrix using sklearn in Python.
The Iris dataset used in the following example is provided by sklearn under an open license.
First, we are going to download the data and split it, as usual, into train and test sets using sklearn functions.
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

# Load the Iris dataset: 150 samples, 4 features, 3 balanced classes
iris = datasets.load_iris()
x = iris.data
y = iris.target

# Hold out 30% of the samples for testing; stratify keeps the class proportions equal in both splits
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, stratify=y)
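As a quick sanity check (my addition, not strictly necessary), you can verify that the stratified split preserved the class proportions of the three Iris classes:

# Count how many samples of each class ended up in each split
print(np.bincount(y_train))  # [35 35 35] with a 70/30 stratified split
print(np.bincount(y_test))   # [15 15 15]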
Now take the classifier you prefer and train it on the training data. I will use a simple Logistic Regression in this case.
from sklearn.linear_model import LogisticRegression

# Fit a Logistic Regression classifier on the training set
lr = LogisticRegression()
lr.fit(x_train, y_train)
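A small caveat: with scikit-learn's default solver (lbfgs), you may occasionally see a ConvergenceWarning on raw, unscaled data like this. If that happens, increasing max_iter is the quickest fix:

# Give the solver more iterations if you hit a ConvergenceWarning
lr = LogisticRegression(max_iter=1000)
lr.fit(x_train, y_train)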
Well, now that we have trained our algorithm, all that remains is to make predictions on the test set.
predictions = lr.predict(x_test)
Now we have everything we need to compute the confusion matrix and thus get a clearer idea of the errors made by the algorithm. Computing it takes only a few lines of code.
from sklearn.metrics import confusion_matrix

# Rows correspond to the actual labels, columns to the predicted labels
cf = confusion_matrix(y_test, predictions)
print(cf)
That’s it: you now have your confusion matrix at your disposal. Note that since Iris has three classes, the matrix is 3×3 rather than 2×2: each row is an actual class, each column a predicted class, and the diagonal holds the correct predictions. If I can give you one more piece of advice: in my opinion, the confusion matrix displayed this way is still not very readable. I often prefer to display it as a heatmap, so that high counts get bright colours and a single glance gives me an impression of how well my algorithm is doing. To do this I use the Seaborn library.
import seaborn as sns

# Plot the confusion matrix as a heatmap: the higher the count, the lighter the cell
sns.heatmap(cf)
Much better, I would say! Since most of the predictions are correct, we get the lightest colours on the diagonal.
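One optional refinement (my personal habit, not required): Seaborn's annot parameter writes the raw counts inside each cell, which makes the heatmap even easier to read at a glance:

import matplotlib.pyplot as plt

# annot=True writes each count in its cell; fmt='d' formats the counts as integers
sns.heatmap(cf, annot=True, fmt='d')
plt.xlabel('Predicted label')
plt.ylabel('Actual label')
plt.show()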
Final Thoughts
Knowing how to evaluate your Machine Learning algorithm is of primary importance, and knowing how to compute the confusion matrix and what it represents is definitely the first step. I have often had to answer questions about the confusion matrix and its derived metrics in the Machine Learning interviews I have been through, so I hope you find this article useful to learn about this topic or to refresh your memory. 😁
In case you are interested in the previous articles in this series, I leave the links here:
- Ace your Machine Learning Interview – Part 1: Dive into Linear, Lasso and Ridge Regression and their assumptions
- Ace your Machine Learning Interview – Part 2: Dive into Logistic Regression for classification problems using Python
- Ace your Machine Learning Interview – Part 3: Dive into Naive Bayes Classifier using Python
- Ace your Machine Learning Interview – Part 4: Dive into Support Vector Machines using Python
- Ace your Machine Learning Interview – Part 5: Dive into Kernel Support Vector Machines using Python
- Ace your Machine Learning Interview – Part 6: Dive into Decision Trees using Python
- Ace your Machine Learning Interview – Part 7: Dive into Ensemble Learning with Hard Voting Classifiers using Python
- Ace your Machine Learning Interview – Part 8: Dive into Ensemble Learning with AdaBoost from scratch using Python
The End
Marcello Politi