
After training a machine learning classifier, the next step is to evaluate its performance using relevant metrics. The confusion matrix is one such evaluation metric.
A confusion matrix is a table that summarizes the performance of a classifier against a set of ground-truth instances, as is typical in supervised learning.
But computing a confusion matrix for object detection and instance segmentation tasks is less intuitive, and it first requires understanding a supporting metric that plays a key role in these tasks: Intersection over Union (IoU).
Intersection over Union (IoU)
IoU, also called the Jaccard index, is a metric that evaluates the overlap between the ground-truth mask (gt) and the predicted mask (pd). In object detection, we can use IoU to determine whether a given detection is valid.
IoU is calculated as the area of overlap/intersection between gt and pd divided by the area of their union, that is,

IoU = area(gt ∩ pd) / area(gt ∪ pd)
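As a quick, concrete illustration, here is a minimal sketch of computing IoU for two rectangular masks with the Shapely library (the box coordinates are invented for the example):

```python
# Minimal IoU sketch using Shapely; the two boxes are arbitrary example masks.
from shapely.geometry import box

gt = box(0, 0, 10, 10)   # ground-truth mask: a 10 x 10 square
pd = box(5, 5, 15, 15)   # predicted mask, partially overlapping the gt

iou = gt.intersection(pd).area / gt.union(pd).area
print(round(iou, 3))  # 25 / 175 ≈ 0.143
```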
Diagrammatically, IoU is defined as shown below:

Note: The IoU metric ranges from 0 to 1, with 0 signifying no overlap and 1 implying a perfect overlap between gt and pd.
A confusion matrix is made up of 4 components, namely, True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). To define all the components, we need to define some threshold (say α) based on IoU.

- True Positive (TP) – This is an instance in which the classifier predicted positive when the truth is indeed positive, that is, a detection for which IoU ≥ α.
- False Positive (FP) – This is a wrong positive detection, that is, a detection for which IoU < α.
- False Negative (FN) – This is an actual instance that the classifier fails to detect, that is, a ground truth with no detection for which IoU ≥ α.
- True Negative (TN) – This would be a negative detection given that the actual instance is also negative. In object detection this component does not apply, because there are innumerable possible detections that should not be made in an image; TN would have to include every wrong detection that was correctly not made.
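Before the diagrams, a small sketch makes the threshold rule concrete (the IoU values below are invented, and each detection is assumed to match at most one ground truth):

```python
# Toy sketch: label detections as TP or FP from their best-match IoU values.
alpha = 0.5
best_ious = [0.86, 0.14, 0.62, 0.0]  # best IoU of each detection vs. any gt
n_ground_truths = 3

tp = sum(iou >= alpha for iou in best_ious)  # valid detections
fp = len(best_ious) - tp                     # detections below the threshold
fn = n_ground_truths - tp                    # ground truths never matched
print(tp, fp, fn)  # 2 2 1
```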
These concepts can be understood intuitively with some diagrammatic examples (let's consider an IoU threshold of α = 0.5).




Remark: By the definition of the IoU threshold, Fig 3 (right) becomes an FP if we choose a threshold above 0.86, and Fig 4 (right) becomes a TP if we choose an IoU threshold below 0.14.
Other metrics that can be derived from the confusion matrix include:
- Precision is the ability of a classifier to identify only relevant objects. It is the proportion of correct positive predictions and is given by
  Precision = TP / (TP + FP)
- Recall measures the ability of a classifier to find all the relevant cases (that is, all the ground truths). It is the proportion of true positives detected among all ground truths and is defined as
  Recall = TP / (TP + FN)
- The F₁ score is the harmonic mean of precision and recall:
  F₁ = 2 × (Precision × Recall) / (Precision + Recall)
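As a quick numeric check, a few lines of Python compute all three metrics from the counts that the worked example below ends up with (TP = 9, FP = 5, FN = 0):

```python
# Precision, recall and F1 from the example counts below.
tp, fp, fn = 9, 5, 0

precision = tp / (tp + fp)                          # 9 / 14 ≈ 0.643
recall = tp / (tp + fn)                             # 9 / 9 = 1.0
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.783
print(round(precision, 3), round(recall, 3), round(f1, 3))
```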
Example
Consider the following image with the ground truths (dark blue) and classifier detections (red). By observation, can you tell the number of TPs, FPs and FNs?

Python Implementation
In Python, a confusion matrix can be calculated with the help of the Shapely library. The following function, evaluation(ground, pred, iou_value), returns a 6-value tuple (TP, FP, FN, precision, recall, F₁ score) and can be used to determine the confusion matrix for the image above (Fig 5).
Parameters:
- ground – an n × m × 2 array, where n is the number of ground-truth instances in the given image and m is the number of (x, y) points sampled on the boundary of each mask.
- pred – a p × q × 2 array, where p is the number of detections and q is the number of (x, y) points sampled on the boundary of each predicted mask.
- iou_value – the IoU threshold α.
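Since the original code listing is not reproduced here, below is a minimal sketch of what such a function can look like, assuming shapely.geometry.Polygon for the masks and a simple greedy one-to-one matching between predictions and ground truths (the original implementation may match instances differently):

```python
# A sketch of evaluation(); masks are arrays of boundary points, and Shapely
# computes the intersection and union areas behind the IoU test.
import numpy as np
from shapely.geometry import Polygon

def evaluation(ground, pred, iou_value):
    """ground: n x m x 2 array of ground-truth boundary points.
    pred: p x q x 2 array of predicted boundary points.
    iou_value: IoU threshold separating TP from FP."""
    gt_polys = [Polygon(g) for g in ground]
    pd_polys = [Polygon(p) for p in pred]

    tp, matched = 0, set()
    for pd_poly in pd_polys:
        # Find the best unmatched ground truth for this detection.
        best_iou, best_idx = 0.0, None
        for i, gt_poly in enumerate(gt_polys):
            if i in matched:
                continue
            union = pd_poly.union(gt_poly).area
            iou = pd_poly.intersection(gt_poly).area / union if union else 0.0
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx is not None and best_iou >= iou_value:
            tp += 1
            matched.add(best_idx)

    fp = len(pd_polys) - tp  # detections with no valid match
    fn = len(gt_polys) - tp  # ground truths never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return tp, fp, fn, precision, recall, f1
```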
For Fig 5 and an IoU threshold of α = 0.5, evaluation(ground, pred, 0.5) outputs:

TP: 9 FP: 5 FN: 0 GT: 10
Precision: 0.643 Recall: 1.0 F1 score: 0.783
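As a usage sanity check, the sketch above can be exercised with two invented square masks, one correct detection and one spurious one:

```python
# Tiny demo with invented masks: one good detection, one spurious detection.
import numpy as np

ground = np.array([[[0, 0], [10, 0], [10, 10], [0, 10]]])    # 1 x 4 x 2
pred = np.array([[[1, 1], [11, 1], [11, 11], [1, 11]],       # IoU ≈ 0.68: TP
                 [[50, 50], [60, 50], [60, 60], [50, 60]]])  # IoU = 0: FP

print(evaluation(ground, pred, 0.5))
# -> TP=1, FP=1, FN=0, precision 0.5, recall 1.0, F1 ≈ 0.667
```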
Thank you for reading 🙂
References
Gene-Mola, J., Sanz-Cortiella, R., Rosell-Polo, J. R., Morros, J.-R., Ruiz-Hidalgo, J., Vilaplana, V., and Gregorio, E. Fuji-SfM dataset [Data set]. Zenodo, 2020. http://doi.org/10.5281/zenodo.3712808
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, 2015.
Fawcett, T. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.