
Intro
Have you ever used an object detection algorithm? If yes, then chances are high that you have already used the Non-maximum suppression algorithm. Maybe it was part of a deep learning model you used and you haven’t even noticed. Because even very sophisticated algorithms face the problem they recognize the same object multiple times.
Today, I want to show you how the Non-maximum suppression algorithm works and provide a python implementation.
I will start out by showing you, that bounding boxes are rectangle that surround a detected object in an image. Then I will introduce the code for Non-maximum suppression. This algorithm removes the redundant bounding boxes one by one. It does so by removing boxes with an overlap greater than a threshold which we set manually.
Bounding boxes
We use bounding boxes to mark the part of an image where an object of interest has been recognized.

In this example, the object to recognize was the big diamond in the ace of diamonds.
Bounding Boxes are always upright rectangles. We therefore only need to store the top left and the bottom right corner of all bounding boxes.

When using object detection methods it happens often, that the same object get’s detected multiple times in slightly different areas.

Most of the time, we want to detect an object only once. To achieve this, we remove the redundant bounding boxes by applying non-maximum suppression.
#
I present you now the fully functional code to perform non-maximum suppression, so that you have an overview. But don’t worry, I will walk you through the code.
The Non-maximum suppression (NMS) function takes in an array of boxes and overlap treshold with a default value of 0.4.

The array of boxes must be organized so that every row contains a different bounding box.

The overlap treshold determines the overlap in area two bounding boxes are allowed to have. If they overlap more, then one of the two will be discarded. An overlap treshold of 0.4 would mean, that two boxes are allowed to share 40% of their area.
The area of a rectangle is calculated by multiplying it’s width by it’s height. We add one to 𝑥2−𝑥1 and 𝑦2−𝑦1, because the bounding box has a pixel on the start as well as on the end coordinate.

We then create indices for all the boxes. Later, we will drop out one index after another until we have only indices corresponding to non-overlapping boxes.

In the loop, we iterate over all boxes. For each box, we check, if it’s overlap with any other box is greater than the treshold. If so, we drop out the index of that box from our indices list.

We have to create indices, that contain contain the indices of the boxes, but without the index of the box[i].

To calculate the overlap we first calculate the coordinates of the intersection boxes. This code is vectorized to make it faster and therefore we calculate the intersection of the box[i] with every other box.

It might be a little bit confusing but the zero point is in the top left corner. Therefore we get the coordinates of the intersection box by selecting the minimum of 𝑥1 and 𝑦1 of two boxes and the maximum of 𝑥2 and 𝑦2 of the same boxes.

Then we calculate the width and the height of the intersection boxes. We take the maximum of 0 and our calculated widths and heights, because negative widths and heights would mess up the calculation of the overlap.

The overlap is then simply the area of the intersection boxes divided by the are of the bounding box. In our case all bounding boxes have the same size, but the algorithm also works with difference in sizes.

We then exclude the index i from our remaining indices, if the overlap of box[i] with any other box is greater than the treshold.

We then return the boxes with the indices that have not been dropped out. Pixel coordinates have to be integers, so we convert them just to be safe.

Object detection using template matching
You might ask yourself how I got those bounding boxes in the first place. I used a simple technique called template matching. You only need 1 image where you want to detect an object and 1 template, which is the object you want to search for.
Our image will be the ace of diamonds.

Our template will be the diamond in the middle of the image.

Note that the template must have roughly the same orientation and size (in pixels) as the object we want to detect in the image.
If you want to use my images, you can download them in the sources section.
We are going to need the opencv. If you don’t have it yet, you can install in the terminal.
pip install opencv-python
And we import it under the name cv2.
import cv2
To perform template matching and generate bounding boxes from it, we can use the following function.
The cv2.matchTemplate function return us the correlation of different parts of the image with the template.
We then select the parts of the image, where the correlation is above the treshold.

We also need a function to draw the bounding boxes onto the image.
Full code
Conclusion
We can use Non-maximum suppression to remove redundant bounding boxes. They are redundant in the sense that they mark the same object multiple times. The NMS algorithm calculates the overlap between triangles by making use of the area of the intersection triangle. If the overlap of a bounding box with any other bounding box is above the threshold, it will get removed.
Want to connect ?
Linkedin https://www.linkedin.com/in/vincent-m%C3%BCller-6b3542214/ Facebook https://www.facebook.com/profile.php?id=100072095823739 Twitter https://twitter.com/Vincent02770108 Medium https://medium.com/@Vincent.Mueller Become medium member and support me https://medium.com/@Vincent.Mueller/membership
Related articles by author
Other articles by author
Sources
pyimagesearch "(Faster) Non-Maximum Suppression in Python" Non-Maximum Suppression in Python)
LearnOpenCV "Non Maximum Suppression: Theory and Implementation in PyTorch"
Images


All images are provided by author. You are free to reuse them for any purpose, even commercially.