Object Detection using YoloV3 and OpenCV

An Introduction to Object Detection with YoloV3 for beginners

Nandini Bansal
Towards Data Science



Computer Vision has always been a topic of fascination for me. In layman's terms, computer vision is all about replicating the complexity of human vision and our understanding of our surroundings. It is emerging as one of the most powerful fields of application of AI, thanks to the amount of data being generated every day.

Object Detection

When we look at images or videos, we can easily locate and identify the objects of our interest within moments. Passing this intelligence on to computers is what object detection is about: locating an object and identifying it. Object detection has found applications in a wide variety of domains, such as video surveillance, image retrieval systems and autonomous vehicles, among many others. Various algorithms can be used for object detection, but we will focus on the YoloV3 algorithm.

YoloV3 Algorithm

You Only Look Once, more popularly known as YOLO, is one of the fastest real-time object detection algorithms (45 frames per second), compared to the R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN, etc.).

The R-CNN family of algorithms uses region proposals to localise objects: the model is applied to multiple regions of the image, and high-scoring regions are treated as detections. YOLO follows a completely different approach. Instead of selecting regions, it applies a single neural network to the entire image to predict bounding boxes and their class probabilities.

We have two options to get started with object detection:

  1. Using a pre-trained model
  2. Training a custom object detector from scratch

In this article, we will create an object detector using a pre-trained model, for images, videos and a real-time webcam feed. In case you wish to train a custom YOLO object detector, I would suggest you head to Object Detection with YOLO: Hands-on Tutorial. The author covers every step, from annotating data for a custom object detector to processing it and, finally, training the model.

Let us dive into the code.

Let us start by importing the modules needed for this program.

Modules
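Only cv2 and numpy are strictly required for the functions below; a minimal sketch of the imports:

    import cv2          # OpenCV, including the cv2.dnn module
    import numpy as np  # array handling for the network outputs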

You will also need to download a couple of large files: the pre-trained YoloV3 weights, the configuration file and the names file.

The weights and cfg (configuration) files can be downloaded from https://pjreddie.com/darknet/yolo. You will see a few different options, as the model has been trained for different input sizes: 320 x 320 (high speed, lower accuracy), 416 x 416 (moderate speed, moderate accuracy) and 608 x 608 (lower speed, high accuracy). We will download the weights and cfg files for YOLOv3-320 for now.

The names file can be downloaded from https://github.com/pjreddie/darknet/blob/master/data/coco.names.

Now that we have all these files downloaded and ready, we can start writing the Python script. As I mentioned before, our input can take three forms:

  1. Image File
  2. Webcam Feed
  3. Video File

To start with, we will create a function called load_yolo().

Loading weights
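A minimal sketch of load_yolo(), assuming yolov3.weights, yolov3.cfg and coco.names sit in the working directory (the exact file names depend on what you downloaded):

    def load_yolo():
        # Load the pre-trained network from the weights and config files
        net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
        # coco.names holds one class name per line
        with open("coco.names", "r") as f:
            classes = [line.strip() for line in f.readlines()]
        layer_names = net.getLayerNames()
        # getUnconnectedOutLayers() returns 1-based indices of the output layers
        output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
        # One random colour per class, used when drawing boxes later
        colors = np.random.uniform(0, 255, size=(len(classes), 3))
        return net, classes, colors, output_layers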

In the above function, as you can see, I load the YoloV3 weights and configuration file with the help of OpenCV's dnn module. The coco.names file contains the names of the different objects our model has been trained to identify, and we store them in a list called classes. To run a forward pass with the cv2.dnn module, we need to pass in the names of the layers for which the output is to be computed; net.getUnconnectedOutLayers() returns the indices of the output layers of the network.

To accept image files, we need another function, load_image(), which accepts an image path as a parameter, reads the image, resizes it and returns it.

Load Images
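A sketch of load_image(); the 0.4 scale factor is an assumption of mine, chosen only to speed up processing:

    def load_image(img_path):
        # Read the image from disk and shrink it for faster processing
        img = cv2.imread(img_path)
        img = cv2.resize(img, None, fx=0.4, fy=0.4)
        height, width, channels = img.shape
        return img, height, width, channels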

To predict objects correctly with deep neural networks, we need to preprocess our data, and the cv2.dnn module provides two functions for this purpose: blobFromImage and blobFromImages. These functions perform scaling, mean subtraction and an optional channel swap. We will use blobFromImage in a function called detect_objects() that accepts an image (or a frame from a video or webcam stream), the model and the output layers as parameters.
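A sketch of what detect_objects() might look like; the 320 x 320 blob size matches the YOLOv3-320 weights we downloaded:

    def detect_objects(img, net, output_layers):
        # Scale pixels to [0, 1], resize to 320 x 320, skip mean subtraction,
        # and swap BGR to RGB (OpenCV reads images as BGR)
        blob = cv2.dnn.blobFromImage(img, scalefactor=0.00392, size=(320, 320),
                                     mean=(0, 0, 0), swapRB=True, crop=False)
        net.setInput(blob)
        # Forward pass restricted to the output layers we collected earlier
        outputs = net.forward(output_layers)
        return blob, outputs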

As you can see in the code snippet above, we use a scalefactor of 0.00392, which is 1/255; hence, we are scaling the image pixels to the range 0 to 1. There is no need for mean subtraction, which is why we set it to [0, 0, 0].

The forward() function of the cv2.dnn module returns a nested list containing information about all the detected objects: the x and y coordinates of the centre of each detected object, the height and width of its bounding box, a confidence value, and scores for all the object classes listed in coco.names. The class with the highest score is taken as the predicted class.

In the get_box_dimensions() function, a list called scores stores the class scores for each detected object. We then find the index of the class with the highest score using np.argmax() and look up the corresponding class name in the classes list we created in load_yolo().

I have kept all predicted bounding boxes with a confidence above 30%. You may play around with this value.
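Putting those pieces together, get_box_dimensions() might look like this; each detection row holds the box geometry in its first four values, an objectness score, and then the 80 class scores:

    def get_box_dimensions(outputs, height, width):
        boxes, confs, class_ids = [], [], []
        for output in outputs:
            for detect in output:
                scores = detect[5:]           # per-class scores start at index 5
                class_id = np.argmax(scores)  # index of the highest-scoring class
                conf = scores[class_id]
                if conf > 0.3:                # keep detections above 30% confidence
                    # detect[0:4] are centre x, centre y, width, height,
                    # all relative to the image dimensions
                    center_x = int(detect[0] * width)
                    center_y = int(detect[1] * height)
                    w = int(detect[2] * width)
                    h = int(detect[3] * height)
                    x = int(center_x - w / 2)  # convert to top-left corner
                    y = int(center_y - h / 2)
                    boxes.append([x, y, w, h])
                    confs.append(float(conf))
                    class_ids.append(class_id)
        return boxes, confs, class_ids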

Now that we have the vertices of the predicted bounding box and class_id (index of predicted object class), we need to draw the bounding box and add an object label to it. We will do that with the help of the draw_labels() function.
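A sketch of draw_labels(); the 0.5 confidence threshold and 0.4 NMS threshold passed to cv2.dnn.NMSBoxes() are the values discussed below:

    def draw_labels(boxes, confs, colors, class_ids, classes, img):
        # Non-Maximum Suppression keeps a single box per detected object
        indexes = cv2.dnn.NMSBoxes(boxes, confs, 0.5, 0.4)
        font = cv2.FONT_HERSHEY_PLAIN
        for i in range(len(boxes)):
            if i in indexes:
                x, y, w, h = boxes[i]
                label = str(classes[class_ids[i]])
                color = colors[class_ids[i]]
                cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                cv2.putText(img, label, (x, y - 5), font, 1, color, 1)
        cv2.imshow("Image", img)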

Now, you must be wondering: what is cv2.dnn.NMSBoxes() for? We were just supposed to draw the bounding box and add a label to it, right?

Although we removed the low-confidence bounding boxes, there is still a possibility of duplicate detections around an object. For example, look at the image shown below.

Object Detection with Multiple Bounding Boxes

You may observe that some objects have been detected multiple times, leaving more than one bounding box around them. To fix this, we apply Non-Maximum Suppression (NMS), also called Non-Maxima Suppression, passing a confidence threshold and an NMS threshold as parameters to select a single bounding box. For the NMS threshold, which ranges from 0 to 1, an intermediate value like 0.4 or 0.5 makes sure we still detect overlapping objects without ending up with multiple bounding boxes for the same object.

So the final output looks like this:

Final Output

All the functions we looked at can be pipelined together in another function, image_detect(), for detecting objects in an image file.
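One way to wire everything together might be:

    def image_detect(img_path):
        # Run the full pipeline on a single image file
        model, classes, colors, output_layers = load_yolo()
        image, height, width, channels = load_image(img_path)
        blob, outputs = detect_objects(image, model, output_layers)
        boxes, confs, class_ids = get_box_dimensions(outputs, height, width)
        draw_labels(boxes, confs, colors, class_ids, classes, image)
        cv2.waitKey(0)  # keep the window open until a key is pressed
        cv2.destroyAllWindows()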

Similarly, for video files and webcam input, we can create two functions, start_video() and webcam_detect(), respectively.
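A sketch of the video loop; since passing 0 to cv2.VideoCapture opens the default webcam, webcam_detect() can simply reuse start_video() (a simplification of mine; the original repo may define it separately):

    def start_video(video_path):
        model, classes, colors, output_layers = load_yolo()
        cap = cv2.VideoCapture(video_path)
        while True:
            ret, frame = cap.read()
            if not ret:  # end of file or failed read
                break
            height, width, channels = frame.shape
            blob, outputs = detect_objects(frame, model, output_layers)
            boxes, confs, class_ids = get_box_dimensions(outputs, height, width)
            draw_labels(boxes, confs, colors, class_ids, classes, frame)
            if cv2.waitKey(1) == 27:  # press Esc to quit
                break
        cap.release()
        cv2.destroyAllWindows()

    def webcam_detect():
        # Device index 0 is the default webcam
        start_video(0)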

The entire working script can be found at https://github.com/nandinib1999/object-detection-yolo-opencv.

Update 2021/11/05

Depending on your OpenCV version, the code for loading the YoloV3 weights might break, and you may get an error like the one shown below.

Error Screenshot

If that happens, you can use this snippet of code for load_yolo() instead:

Error Fix
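The usual culprit is net.getUnconnectedOutLayers(): newer OpenCV releases return a flat array of indices, where older releases returned nested single-element arrays, so the i[0] indexing in the original load_yolo() raises an IndexError. One way to handle both shapes (a sketch of the fix, not necessarily the exact snippet from the original post):

    def load_yolo():
        net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
        with open("coco.names", "r") as f:
            classes = [line.strip() for line in f.readlines()]
        layer_names = net.getLayerNames()
        # flatten() copes with both the old nested and the new flat return shape
        output_layers = [layer_names[i - 1]
                         for i in net.getUnconnectedOutLayers().flatten()]
        colors = np.random.uniform(0, 255, size=(len(classes), 3))
        return net, classes, colors, output_layers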

Upon fixing this, I ran into another issue:

This is more of a warning than an error: it does not break the code, but the message keeps appearing on the command line while the code runs. I tried searching for a solution and found that it happens due to a dependency conflict between the OpenCV and PyQt libraries. You might have to upgrade or downgrade one of them to make them compatible again.

A useful issue thread for understanding the warning and fixing it is THIS. Also, if any of you finds a permanent workaround for this warning, please do share it with the rest of us.

That is all for this article. I hope you found it useful; if so, please give it a clap. For any questions and suggestions, feel free to connect with me on LinkedIn.

Thanks for reading!

~ Nandini
