Model performance & cost functions for classification models

Sowmya Vivek
Towards Data Science
5 min read · Aug 8, 2018


A classification model is a machine learning model which predicts a categorical Y variable:

  1. Will the employee leave the organisation or stay?
  2. Does the patient have cancer or not?
  3. Does this customer fall into high risk, medium risk or low risk?
  4. Will the customer pay back or default on a loan?

A classification model in which the Y variable can take only 2 values is called a binary classifier.

Which performance measure is most relevant for a classification model is often debatable, especially when the dataset is imbalanced. The usual measures for evaluating a classification model are accuracy, sensitivity (recall), specificity, precision, the KS statistic and the area under the curve (AUC).

Let us understand some model performance measures based on an example of predicting loan default. A loan default dataset is a typical example of an imbalanced dataset, where the two classes are Loan default Y and Loan default N. The number of loan defaulters is usually a very small fraction of the total dataset — not more than 7–8%. This provides a classical imbalanced dataset for understanding why cost functions are critical in deciding which model to use.

Before we delve deep into how to formulate a cost function, let us look at the fundamental concepts of a confusion matrix, false positives, false negatives and the definitions of various model performance measures.

What is a confusion matrix?

The confusion matrix is a table that contains the output of a binary classifier. Let us look at the confusion matrix of a binary classifier that predicts loan default — 0 indicates that the customer will pay the loan and 1 indicates that the customer will default. The positive class for our further discussions is 1 (a customer who will default).

The rows of the matrix indicate the observed or actual class and the columns indicate the predicted class.

|                       | Predicted: 0 (pays) | Predicted: 1 (defaults) |
|-----------------------|---------------------|-------------------------|
| Actual: 0 (pays)      | True negative (TN)  | False positive (FP)     |
| Actual: 1 (defaults)  | False negative (FN) | True positive (TP)      |

  1. True positive (TP): a defaulter correctly predicted as 1
  2. False positive (FP): a paying customer wrongly predicted as 1
  3. True negative (TN): a paying customer correctly predicted as 0
  4. False negative (FN): a defaulter wrongly predicted as 0
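Counting these four cells from actual and predicted labels can be sketched in a few lines of Python (the labels below are illustrative, not from the article's dataset):

```python
# Count TP, FP, TN, FN from actual and predicted labels (1 = default).
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

actual    = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
print(confusion_counts(actual, predicted))  # (2, 1, 4, 1)
```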

Let us now try to understand what each model performance measure translates to in terms of the components of the confusion matrix.

Accuracy

Accuracy is the number of correct predictions made by the model divided by the total number of records. The best accuracy is 100%, indicating that all the predictions are correct.

For an imbalanced dataset, accuracy is not a valid measure of model performance. For a dataset where the default rate is 5%, even if all the records are predicted as 0, the model will still have an accuracy of 95%. But this model will ignore all the defaults and can be very detrimental to the business. So accuracy is not a right measure for model performance in this scenario.
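The accuracy paradox described above is easy to demonstrate with a toy dataset (the 5% default rate is illustrative):

```python
# Accuracy of a "predict everyone pays" model on a 5% default-rate dataset.
actual = [1] * 5 + [0] * 95       # 5 defaulters out of 100 customers
predicted = [0] * 100             # naive model: predicts nobody defaults

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.95 — looks impressive, yet every defaulter is missed
```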

Sensitivity or recall

Sensitivity, also called recall (REC) or true positive rate (TPR), is calculated as the number of correct positive predictions divided by the total number of actual positives.

Sensitivity = TP / (TP + FN)

Specificity

Specificity (true negative rate) is calculated as the number of correct negative predictions divided by the total number of negatives.

Specificity = TN / (TN + FP)

Precision

Precision (Positive predictive value) is calculated as the number of correct positive predictions divided by the total number of positive predictions.

Precision = TP / (TP + FP)
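All three measures follow directly from the confusion-matrix counts; the counts below are hypothetical, chosen to mimic an imbalanced dataset:

```python
# The three ratio measures as one-liners over confusion-matrix counts.
def sensitivity(tp, fn): return tp / (tp + fn)   # recall / TPR
def specificity(tn, fp): return tn / (tn + fp)   # TNR
def precision(tp, fp):   return tp / (tp + fp)   # positive predictive value

# Hypothetical counts for an imbalanced dataset (not the article's figures):
tp, fp, tn, fn = 30, 20, 900, 50
print(sensitivity(tp, fn))  # 0.375
print(specificity(tn, fp))  # ~0.978
print(precision(tp, fp))    # 0.6
```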

KS statistic

The KS statistic is a measure of the degree of separation between the positive and negative score distributions. A KS value of 100% indicates that the scores partition the records perfectly: one group contains all the positives and the other contains all the negatives. In practical situations, a KS value higher than 50% is desirable.
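A minimal sketch of the KS computation, taken as the maximum gap between the empirical CDFs of the positive and negative score distributions (the scores below are illustrative):

```python
def ks_statistic(scores_pos, scores_neg):
    # Maximum vertical gap between the empirical CDFs of the positive
    # and negative scores, evaluated at every observed score value.
    thresholds = sorted(set(scores_pos) | set(scores_neg))

    def cdf(scores, t):
        return sum(s <= t for s in scores) / len(scores)

    return max(abs(cdf(scores_pos, t) - cdf(scores_neg, t))
               for t in thresholds)

pos = [0.9, 0.8, 0.7, 0.6]   # model scores of actual defaulters
neg = [0.4, 0.3, 0.2, 0.1]   # model scores of actual non-defaulters
print(ks_statistic(pos, neg))  # 1.0 — perfect separation
```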

ROC chart & Area under the curve (AUC)

The ROC chart plots 1 − specificity (false positive rate) on the X axis against sensitivity (true positive rate) on the Y axis. The area under the ROC curve is a measure of model performance: the AUC of a random classifier is 50% and that of a perfect classifier is 100%. For practical situations, an AUC of over 70% is desirable.
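AUC can also be computed without plotting the curve, via the rank-based (Mann–Whitney) equivalence: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. A small sketch with illustrative scores:

```python
def auc(scores_pos, scores_neg):
    # Probability that a random positive outscores a random negative
    # (ties count as half) — this equals the area under the ROC curve.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(auc(pos, neg))  # 8/9 ≈ 0.889
```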

Precision vs recall

Recall (sensitivity) gives us information about a model’s performance on false negatives (defaulters incorrectly predicted as paying), while precision gives us information about its performance on false positives (paying customers incorrectly predicted as defaulting). Depending on the business problem, precision or recall may be the more critical measure for a model.

The cost function comes into play in deciding which of the incorrect predictions is more detrimental — the false positive or the false negative (in other words, which performance measure matters more — precision or recall).

Net revenue function

The net revenue or cost function is derived by apportioning a cost to every false positive and false negative and arriving at the overall revenue based on the correct and incorrect predictions. Let us assume the following costs and revenue for this loan default dataset:

Cost components of a loan default prediction model
Allocation of FP, FN & TP cost
Net revenue

Since the false negative cost is the highest, the optimal model will be the one with the fewest false negatives. In other words, a model with higher sensitivity will fetch a higher net revenue than other models.

Now that we have the method to calculate the net revenue, let us compare two models based on their confusion matrices:

Confusion Matrix A

Confusion matrix B

Model B, with its lower false negatives, turns out to be the better model and hence can be chosen for predicting default.
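The whole comparison can be sketched in code. Note that the cost figures and confusion-matrix counts below are hypothetical stand-ins (the article's actual tables are not reproduced here); only the structure of the calculation matters:

```python
# Hypothetical per-record amounts: a correctly approved loan earns
# interest, a false negative loses the principal, and a false positive
# loses the interest a good customer would have paid.
REVENUE_TN = 100     # interest earned from a paying customer
COST_FN = -1000      # defaulted loan that the model approved
COST_FP = -100       # good customer wrongly rejected (lost interest)

def net_revenue(tp, fp, tn, fn):
    return tn * REVENUE_TN + fp * COST_FP + fn * COST_FN

# Two hypothetical models over the same 1000 customers (50 defaulters):
model_a = dict(tp=20, fp=30, tn=920, fn=30)   # misses more defaulters
model_b = dict(tp=35, fp=60, tn=890, fn=15)   # catches more, rejects more

print(net_revenue(**model_a))  # 59000
print(net_revenue(**model_b))  # 68000 — fewer FNs wins despite more FPs
```

Because each false negative is assumed to cost ten times a false positive, Model B's extra rejections are more than paid for by the defaults it avoids.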

To sum up, in a business scenario where a fairly approximate estimate of the various cost components is available, a model performance measure based on the cost function will give better insight into model selection than conventional measures like sensitivity and specificity. Explaining the model in terms of cost and revenue also makes more sense to the customer than statistical terminology.
