
Confusion Matrix
A Confusion Matrix (also called an Error Matrix) is used to analyze how well Classification Models (like Logistic Regression, Decision Tree Classifier, etc.) perform. Why do we analyze the performance of the models? It helps us to find and eliminate bias and variance problems if they exist, and it also helps us fine-tune the model so that it produces more accurate results. The Confusion Matrix is usually applied to Binary classification problems but can be extended to Multi-class classification problems as well.
Terminology of the Confusion Matrix

Concepts are easier to grasp when illustrated with examples, so let us consider one: assume that a family went to get tested for COVID-19.
True Positive (TP): True Positives are the cases that have been predicted as positive and they indeed have that disease.
False Positive (FP): False Positives are the cases that have been predicted as positive but they do not have that disease.
True Negative (TN): True Negatives are the cases that have been predicted as negative and they indeed do not have that disease.
False Negative (FN): False Negatives are the cases that have been predicted as negative but they have that disease.
Sensitivity: Sensitivity is also called Recall or True Positive Rate. Sensitivity is the proportion of actual positives that are correctly predicted as positives. In other words, Sensitivity is the ratio of True Positives to the sum of True Positives and False Negatives.

Sensitivity = TP / (TP + FN)

Specificity: Specificity is also called True Negative Rate. Specificity is the proportion of actual negatives that are correctly predicted as negatives. In other words, Specificity is the ratio of True Negatives to the sum of True Negatives and False Positives.

Specificity = TN / (TN + FP)

Precision: Precision is the proportion of predicted positives that are actually positive. In other words, Precision is the ratio of True Positives to the sum of True Positives and False Positives.

Precision = TP / (TP + FP)

F1 Score: F1 Score is defined as the harmonic mean of Precision and Recall. F1 Score ranges from 0 to 1, with 0 as the worst score and 1 as the best. F1 Score is useful when the data suffers from class imbalance, since it considers both False Positives and False Negatives.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

Accuracy: Accuracy of a model is defined as the ratio of the sum of True Positives and True Negatives to the total number of predictions. Accuracy ranges from 0 to 1 (often reported as 0% to 100%). Accuracy can be used when correctly predicting both True Positives and True Negatives is imperative.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

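To make these definitions concrete, here is a minimal Python sketch (the labels are made up for illustration, not from the original article) that derives all of the above metrics from scikit-learn's confusion_matrix:

```python
# A minimal sketch (illustrative labels): deriving the confusion-matrix
# terms and the metrics above with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = has the disease, 0 = does not
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # the model's predictions

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                  # Recall / True Positive Rate
specificity = tn / (tn + fp)                  # True Negative Rate
precision   = tp / (tp + fp)
f1_score    = 2 * precision * sensitivity / (precision + sensitivity)
accuracy    = (tp + tn) / (tp + tn + fp + fn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"Sensitivity={sensitivity:.2f} Specificity={specificity:.2f}")
print(f"Precision={precision:.2f} F1={f1_score:.2f} Accuracy={accuracy:.2f}")
```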
ROC-AUC Curve
The ROC-AUC curve (Receiver Operating Characteristic - Area Under the Curve) helps to analyze the performance of a classifier at various threshold settings. A high True Positive Rate (TPR/Sensitivity) for a class indicates that the model classifies that particular class well. ROC curves can be compared across models, and the model with the higher AUC (Area Under the Curve) is considered to have performed better; in other words, it sustains a high TPR across various threshold settings.
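As an illustrative sketch (the scores below are made up, not taken from the original article), scikit-learn's roc_curve and roc_auc_score compute the TPR/FPR pairs at each threshold and the area under the curve:

```python
# A minimal sketch (illustrative values): TPR and FPR at various
# thresholds, and the AUC, with scikit-learn.
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))  # closer to 1.0 is better
```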

Cost functions for Classification
Cost functions measure how well a model performs by comparing the actual values with the predicted values.
Cross-Entropy loss
Cross-Entropy loss is also called Log loss. It can be applied to Binary classification problems, where the targets are binary, as well as to Multi-class classification problems. Let C be the number of classes in the target variable.
If C = 2 (binary classification), the log loss or binary cross-entropy loss for a prediction 𝑦̂ of the actual value y is

L = -[y * log(𝑦̂) + (1 - y) * log(1 - 𝑦̂)]

- When the actual value y = 0, only the term -(1 - y) * log(1 - 𝑦̂) = -log(1 - 𝑦̂) applies.
- When the actual value y = 1, only the term -y * log(𝑦̂) = -log(𝑦̂) applies.

**Graph of -y log(𝑦̂) when y = 1 (y is the actual value)**

**Graph of -(1 - y) log(1 - 𝑦̂) when y = 0**
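The same loss can be sketched in a few lines of NumPy (a minimal illustration, not the article's code); the clipping guards against log(0):

```python
# A minimal NumPy sketch of binary cross-entropy / log loss,
# averaged over a batch of predictions.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -[y*log(y_hat) + (1-y)*log(1-y_hat)] over all instances."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8]))  # good predictions -> small loss
print(binary_cross_entropy([1, 0, 1], [0.2, 0.9, 0.3]))  # poor predictions -> large loss
```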

If C > 2 (multi-class classification), the log loss or multi-class cross-entropy loss for a single instance is calculated as follows,

L = -Σ (c = 1 to C) y_c * log(𝑦̂_c)

where y_c is 1 for the true class and 0 for every other class, and 𝑦̂_c is the predicted probability of class c.
Multi-Class Cross-Entropy Loss is defined for a single instance of data, while Multi-Class Cross-Entropy Error is defined for the entire set of instances (typically the average of the per-instance losses).
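A minimal NumPy sketch of the per-instance loss (illustrative only, not from the original article):

```python
# A minimal NumPy sketch of multi-class cross-entropy loss for one
# instance with a one-hot encoded true label.
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """-sum_c y_c * log(y_hat_c) for a single instance."""
    y_pred_probs = np.clip(np.asarray(y_pred_probs, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(y_true_onehot) * np.log(y_pred_probs))

# True class is class 1 of three classes (one-hot encoded)
print(categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))  # = -log(0.8) ≈ 0.223
```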
Sparse Multi-class Cross-Entropy Loss
Sparse Multi-class Cross-Entropy Loss is very similar to Multi-class Cross-Entropy Loss; the only difference is the representation of the true labels.

In Multi-class Cross-Entropy loss, the true labels are one-hot encoded, whereas in Sparse Multi-class Cross-Entropy loss, the true labels are kept as integer class indices, thereby reducing memory use and computation time.
Representation of true labels y in Multi-class Cross-Entropy loss (one-hot encoded): for a three-class problem, a true label of class 1 is represented as y = [0, 1, 0].

Representation of true labels y in Sparse Multi-class Cross-Entropy loss (integer class index): the same true label is represented simply as y = 1.

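A matching sketch (again illustrative, not from the original article) shows that the sparse form simply indexes the predicted probability of the true class, producing the same value as the one-hot computation above:

```python
# A minimal NumPy sketch of sparse multi-class cross-entropy: the
# integer label indexes the predicted probability of the true class
# instead of multiplying a one-hot vector through.
import numpy as np

def sparse_categorical_cross_entropy(true_class, y_pred_probs, eps=1e-12):
    """-log(y_hat[true_class]) for a single instance."""
    y_pred_probs = np.clip(np.asarray(y_pred_probs, dtype=float), eps, 1.0)
    return -np.log(y_pred_probs[true_class])

# Same example as above: the true class 1 is kept as the integer 1
print(sparse_categorical_cross_entropy(1, [0.1, 0.8, 0.1]))  # ≈ 0.223, same value
```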
Summary
- F1 Score can be used when the dataset is imbalanced. A dataset is said to be imbalanced when the number of samples of one class is significantly higher than the number of samples of another class.
- The ROC curve is plotted using True Positive Rates and False Positive Rates at different threshold settings; it also helps to find the optimal threshold for classification.
- Cross-Entropy loss can be applied to both binary and multi-class classification problems.
- Sparse Multi-class Cross-Entropy Loss is computationally faster than Multi-class Cross-Entropy Loss.
Happy Machine Learning!