What is Average Precision in Object Detection & Localization Algorithms and how to calculate it?

A step-by-step visual guide to understanding the mean average precision for object detection and localization algorithms

Aqeel Anwar
Towards Data Science


What is Object Detection and Localization?

Object detection and localization is among the fastest evolving areas of machine learning. Such an algorithm is an extension of a standard classification algorithm. For a given input image, a classification algorithm outputs a probability distribution over the classes of interest. An object detection and localization algorithm, on the other hand, aims to detect not only the presence of the classes of interest in an image but also to localize them using bounding boxes. Moreover, it can handle the presence of multiple classes in the same image.

Consider the figure below that compares a classification algorithm to object detection and localization.

Difference between classification and detection&localization algorithm — Image by Author modified from Photo by Steve Tsang on Unsplash, Photo by Priscilla Du Preez on Unsplash

So, it not only predicts the class label but also tells us where in the picture the predicted class is. Hence, to evaluate the performance of the object detection and localization algorithm, we need to evaluate if the predicted class is the actual class and how close the predicted bounding box is to the ground truth.

Evaluation Metrics

The performance of an object detection and localization algorithm is evaluated by a metric called Average Precision (AP) (and mean average precision). Before we get into the details of what AP is, let's make one thing clear about what it is NOT.

AP is NOT the average of precision across the different classes.

AP is calculated with the help of several other metrics such as IoU, the confusion matrix (TP, FP, FN), precision, and recall, as shown in the figure below.

To understand AP, we first need to understand these metrics.

1. Intersection over Union (IoU):

IoU quantifies the closeness of two bounding boxes (the ground truth and the prediction). It is a value between 0 and 1. If the two bounding boxes overlap completely, the prediction is perfect and hence the IoU is 1. On the other hand, if the two bounding boxes don't overlap, the IoU is 0. The IoU is calculated as the ratio between the area of intersection and the area of union of the two bounding boxes, as shown below.

Intersection over Union — Image by Author
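
As a quick illustration, here is a minimal Python sketch of the IoU computation, assuming each box is given as (x1, y1, x2, y2) corner coordinates (the function name and box format are conventions chosen for this example, not something fixed by the article):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two partially overlapping boxes: intersection 25, union 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.143
```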

2. True Positive, False Positive, False Negative:

A prediction is said to be correct if the class label of the predicted bounding box matches that of the ground truth bounding box and the IoU between them is greater than a threshold value.

Based on the IoU, the IoU threshold, and the class labels of the ground truth and the predicted bounding boxes, we calculate the following three metrics:

  • True Positive: The model predicted that a bounding box exists at a certain position (positive) and it was correct (true)
  • False Positive: The model predicted that a bounding box exists at a particular position (positive) but it was wrong (false)
  • False Negative: The model did not predict a bounding box at a certain position (negative) and it was wrong (false) i.e. a ground truth bounding box existed at that position.
  • True Negative: The model did not predict a bounding box (negative) and it was correct (true). This corresponds to the background, the area without bounding boxes, and is not used to calculate the final metrics.

The following example will help clarify TP, FP, and FN.

IoU, TP, FP, FN — Image by Author
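
To make the bookkeeping concrete, here is one possible sketch of how the detections in a single image could be assigned to TP, FP, and FN. It reuses the iou() helper from the earlier sketch and assumes a simple greedy matching of predictions (highest confidence first) to ground-truth boxes of the same class; real evaluation toolkits (e.g. PASCAL VOC or COCO) are more involved, so treat this as illustrative only.

```python
def count_tp_fp_fn(predictions, ground_truths, iou_threshold=0.5):
    """Greedily match predicted boxes to ground-truth boxes of the same class.

    predictions: list of (box, class_label, confidence) tuples
    ground_truths: list of (box, class_label) tuples
    """
    matched = set()   # indices of ground-truth boxes already claimed by a prediction
    tp, fp = 0, 0

    # Give the most confident predictions the first chance to match
    for pred_box, pred_label, _ in sorted(predictions, key=lambda p: p[2], reverse=True):
        best_iou, best_gt = 0.0, None
        for i, (gt_box, gt_label) in enumerate(ground_truths):
            if gt_label != pred_label or i in matched:
                continue
            overlap = iou(pred_box, gt_box)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i

        if best_gt is not None and best_iou >= iou_threshold:
            tp += 1                 # correct class and sufficient overlap
            matched.add(best_gt)
        else:
            fp += 1                 # no matching ground truth -> false positive

    fn = len(ground_truths) - len(matched)   # ground truths the model missed
    return tp, fp, fn
```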

3. Precision, Recall

Based on the TP, FP, and FN, for each labeled class, we calculate two parameters: precision and recall.

  • Precision: tells us how precise the model is, i.e. out of all the detections labeled as, say, cat, how many were actually cats. Hence, it is the ratio between the true positives and the total number of cat predictions made by the model (equivalently, the sum of true positives and false positives), as shown below.
  • Recall: tells us how good the model is at recalling classes from images, i.e. out of all the cats present in the input image, how many the model was able to detect. Hence, it is the ratio between the true positives and the total number of ground-truth cats (equivalently, the sum of true positives and false negatives), as shown below.
Precision and Recall in Machine Learning — Image by Author

From the figure above it can be seen that the classifier is precise in what it predicts. When it says it is a cat (dog), it is correct 80% of the time. However, if there is a cat (dog) in an image the classifier can only detect it 50% (80%) of the time. Hence the model has a hard time recalling cats.
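
In code, these two ratios are straightforward once TP, FP, and FN are counted. The numbers in the example call below (8 true positives, 2 false positives, 8 false negatives for the cat class) are made up to be consistent with the 80% precision and 50% recall quoted above, not taken from the figure itself:

```python
def precision_recall(tp, fp, fn):
    # Of everything the model detected, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Of everything that was actually there, how much did the model find?
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


print(precision_recall(tp=8, fp=2, fn=8))  # (0.8, 0.5)
```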

4. Precision-Recall Curve

Ideally, we want both the precision and the recall to be high, i.e. whatever is detected is correct and the model detects all the occurrences of a class. The values of precision and recall depend on how many true positives were detected by the model. Classifying a bounding box as TP, FP, or FN depends on the following two things:

  • The predicted label compared to the ground truth label
  • The IoU between the two boxes

For a multiclass classification problem, the model outputs the conditional probability that the bounding box belongs to a certain class. The greater the probability for a class, the more likely the bounding box contains that class. This probability distribution, along with a user-defined confidence threshold (between 0 and 1), is used to classify a bounding box.

The smaller this confidence threshold, the higher the number of detections made by the model, the lower the chance that ground-truth objects are missed, and hence the higher the recall (generally, but not always). On the other hand, the higher the confidence threshold, the more confident the model is in what it predicts, and hence the higher the precision (generally, but not always). Since we want both precision and recall to be as high as possible, there exists a tradeoff between them based on the value of the confidence threshold.

A precision-recall curve plots the value of precision against recall for different confidence threshold values.

With the precision-recall curve, we can visually see which confidence threshold is best for our given application. An overly simplified example of the PR curve is shown below.

Precision-Recall Curve — Image by Author
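
One way to generate such a curve is to sweep over the predicted confidence scores, keep only the detections above each threshold, and recompute precision and recall every time. The sketch below reuses the hypothetical count_tp_fp_fn() and precision_recall() helpers from the earlier snippets:

```python
def precision_recall_curve(predictions, ground_truths, iou_threshold=0.5):
    """Return a list of (precision, recall) points, one per confidence threshold."""
    # Use every distinct predicted confidence score as a candidate threshold
    thresholds = sorted({conf for _, _, conf in predictions}, reverse=True)

    curve = []
    for t in thresholds:
        kept = [p for p in predictions if p[2] >= t]   # detections above the threshold
        tp, fp, fn = count_tp_fp_fn(kept, ground_truths, iou_threshold)
        curve.append(precision_recall(tp, fp, fn))
    return curve
```

Because recall can only grow as the threshold is lowered, the returned points are already ordered by increasing recall, which the AP sketches further below rely on.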

5. Average Precision

Selecting a single confidence threshold for your application can be hard and subjective. Average precision is a key performance indicator that removes the dependence on selecting one confidence threshold value and is defined by

Average precision is the area under the PR curve.

AP summarizes the PR curve into one scalar value. Average precision is high when both precision and recall are high across the range of confidence threshold values, and low when either of them is low. The range of AP is between 0 and 1.

Average Precision is the area under the Precision-Recall curve — Image by Author

The following two approaches are usually used to find the area under the PR curve.

Approach 1 — Approximate the PR curve with rectangles:

  • The area under the PR curve can be approximated by a set of rectangles, one for each pair of consecutive precision-recall points (k = 1, …, n-1).
  • The width of each rectangle is the difference between two consecutive recall values, and its height is the maximum of the two corresponding precision values, i.e. w = r(k) - r(k-1), h = max(p(k), p(k-1))
  • AP can be calculated by the sum of the areas of these rectangles as shown below
Calculating Average Precision from PR curve — Image By Author
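
A minimal sketch of this rectangle approximation, assuming the curve is given as (precision, recall) pairs ordered by increasing recall (as produced by the earlier sketch) and anchored at recall = 0 with the first precision value; that anchoring is an assumption of this sketch, not something prescribed above:

```python
def average_precision(curve):
    """Approximate the area under the PR curve with rectangles.

    curve: list of (precision, recall) pairs ordered by increasing recall.
    Width  = difference between consecutive recall values.
    Height = maximum of the two neighbouring precision values.
    """
    # Anchor the curve at recall = 0 so the first rectangle has a width
    points = [(curve[0][0], 0.0)] + list(curve)

    ap = 0.0
    for (p_prev, r_prev), (p_curr, r_curr) in zip(points[:-1], points[1:]):
        ap += (r_curr - r_prev) * max(p_prev, p_curr)
    return ap
```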

Approach 2 — Interpolation and 11-point average

  • The interpolated precision values (the maximum precision observed at any recall greater than or equal to the given recall) are calculated for the 11 recall values from 0.0 to 1.0 in increments of 0.1
  • These 11 points can be seen as orange samples in the figure on the right
  • AP can be calculated by taking the mean of these 11 precision values as shown below
Calculating Average Precision from PR curve — Image By Author
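
The 11-point average, used for example in the original PASCAL VOC evaluation, can be sketched as follows; for each of the 11 recall levels it takes the maximum precision observed at any recall greater than or equal to that level and then averages the 11 values:

```python
def average_precision_11pt(curve):
    """11-point interpolated AP over a list of (precision, recall) pairs."""
    interpolated = []
    for r in [i / 10 for i in range(11)]:            # recall levels 0.0, 0.1, ..., 1.0
        candidates = [p for p, rec in curve if rec >= r]
        interpolated.append(max(candidates) if candidates else 0.0)
    return sum(interpolated) / 11
```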

6. Mean Average Precision:

An AP value can be calculated for each class. The mean average precision (mAP) is calculated by taking the average of AP across all the classes under consideration, i.e.

Mean Average Precision — The mean of Average Precision (AP) across all the k classes — Image by Author
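
Given the per-class AP values, mAP is simply their arithmetic mean. A trivial sketch (the class names and AP values in the example are made up for illustration):

```python
def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class AP values, e.g. {"cat": 0.71, "dog": 0.64}."""
    return sum(ap_per_class.values()) / len(ap_per_class)


print(mean_average_precision({"cat": 0.71, "dog": 0.64}))  # 0.675
```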

Summary

Mean average precision (mAP) quantifies the performance of an object detection and localization algorithm. In order to understand mAP, we need to understand what IoU, True Positive, False Positive, False Negative, Precision, Recall, and the precision-recall curve are. In this article, we went through each of these concepts and saw how they help us calculate the mAP.

If this article was helpful to you or you want to learn more about Machine Learning and Data Science, follow Aqeel Anwar, or connect with me on LinkedIn or Twitter. You can also subscribe to my mailing list.
