How to Detect Objects in Real-Time Using OpenCV and Python

A hands-on approach to understanding the basics of object detection

Vipul Kumar
Towards Data Science


Image by Author

For the uninitiated, real-time object detection might sound like quite a mouthful. However, with a few awesome libraries at hand, the job becomes much easier than it sounds. In this article, we will use one such Python library, OpenCV, to create a generalized program that can be adapted to detect different objects in a video feed, given a suitable pre-trained classifier.

What is OpenCV?

OpenCV is an open-source library dedicated to solving computer vision problems. Assuming you have Python 3 installed on your machine, the easiest way to install OpenCV for Python is via pip. Type the following command in your command prompt:

pip3 install opencv-python

How does Object Detection work?

The object detection in this article works on the Viola-Jones algorithm, proposed by Paul Viola and Michael Jones. The algorithm is based on machine learning. The first step involves training a cascade function on a large number of labeled positive images (containing the object) and negative images (without it). During this training, identifying features, known as “Haar features,” are extracted from the training images and the most useful ones are selected. Haar features are essentially rectangular features made up of regions of bright and dark pixels.

Example of Haar Feature Detection From Image (Image by Author)
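To make the bright-versus-dark idea concrete, here is a minimal pure-Python sketch of a single two-rectangle Haar-like feature. The helper names (rect_sum, two_rect_feature) and the toy 4x4 patches are illustrative assumptions, not OpenCV functions; the feature value is taken as the sum of the bright (left) half minus the sum of the dark (right) half.

```python
def rect_sum(img, x, y, w, h):
    """Sum of pixel intensities in the w x h rectangle with top-left corner (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img, x, y, w, h):
    """Horizontal two-rectangle feature over a w x h window (w must be even).

    Left half = 'bright' region, right half = 'dark' region.
    Value = sum(bright) - sum(dark).
    """
    half = w // 2
    bright = rect_sum(img, x, y, half, h)
    dark = rect_sum(img, x + half, y, half, h)
    return bright - dark


# Toy 4x4 grayscale patch with a bright left half and a dark right half.
edge_patch = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# A uniform patch with no edge at all.
flat_patch = [[100] * 4 for _ in range(4)]

print(two_rect_feature(edge_patch, 0, 0, 4, 4))  # 8*200 - 8*10 = 1520
print(two_rect_feature(flat_patch, 0, 0, 4, 4))  # 0
```

A strong vertical edge produces a large value while a flat patch produces zero, which is what lets a single feature respond to structure rather than to overall brightness.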

Each feature's value is calculated as the difference between the sum of pixel intensities under the bright region and the sum of pixel intensities under the dark region. Every feature is evaluated at all possible sizes and positions within the detection window. An image may contain many irrelevant features and only a few relevant ones that can be used to identify the object. During training on the pre-labeled dataset, the classifier selects the useful features and assigns each an appropriate weight so as to minimize the classification error. An individual feature (together with its threshold) is called a weak classifier, and the final classifier is the weighted sum of these weak classifiers.

Most of an image is background; only a certain region contains the object to be detected. To increase detection speed, the classifiers are arranged in a cascade of stages. If a region of the image fails any stage, it is discarded immediately and the algorithm moves on to the next region. Only regions that pass every stage are outlined as the detected object in the image.
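The cascade logic can be sketched in a few lines of Python. This is a toy model under stated assumptions: the two feature functions, the thresholds, and the weights are hand-picked for illustration rather than learned, and the class names are hypothetical. It does show the two ideas described above: each stage is a weighted vote of weak classifiers, and a window is rejected at the first stage it fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Window = Sequence[Sequence[int]]


@dataclass
class WeakClassifier:
    feature: Callable[[Window], float]  # e.g. a Haar-like feature value
    threshold: float
    polarity: int   # +1: vote "object" if value < threshold; -1: if value > threshold
    weight: float   # vote weight that training would normally learn

    def vote(self, window: Window) -> int:
        return 1 if self.polarity * self.feature(window) < self.polarity * self.threshold else 0


@dataclass
class Stage:
    weak: List[WeakClassifier]
    threshold: float  # minimum weighted vote needed to pass this stage

    def passes(self, window: Window) -> bool:
        score = sum(wc.weight * wc.vote(window) for wc in self.weak)
        return score >= self.threshold


def cascade_detect(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated)."""
    for i, stage in enumerate(stages, start=1):
        if not stage.passes(window):
            return False, i  # rejected early: later stages are never evaluated
    return True, len(stages)


# Hypothetical feature functions for the toy example.
def mean_intensity(win: Window) -> float:
    values = [v for row in win for v in row]
    return sum(values) / len(values)


def left_minus_right(win: Window) -> float:
    half = len(win[0]) // 2
    return sum(sum(row[:half]) - sum(row[half:]) for row in win)


stages = [
    # Stage 1 (cheap): reject windows that are almost completely dark.
    Stage([WeakClassifier(mean_intensity, threshold=20, polarity=-1, weight=1.0)], threshold=1.0),
    # Stage 2: require a strong bright-to-dark vertical edge.
    Stage([WeakClassifier(left_minus_right, threshold=500, polarity=-1, weight=1.0)], threshold=1.0),
]

edge_patch = [[200, 200, 10, 10]] * 4
dark_patch = [[5, 5, 5, 5]] * 4
flat_patch = [[100, 100, 100, 100]] * 4

for name, patch in [("edge", edge_patch), ("dark", dark_patch), ("flat", flat_patch)]:
    print(name, cascade_detect(patch, stages))
# edge (True, 2)
# dark (False, 1)
# flat (False, 2)
```

The dark patch is thrown away after a single cheap stage, which is exactly where the speed-up comes from: most background windows never reach the more expensive later stages.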

The above explanation is an oversimplified version. Even a simple digital image contains hundreds of thousands of pixels, so applying the algorithm naively would require huge computational power. Much more mathematical trickery goes into simplifying the calculation to make it computationally feasible. We will discuss this in more detail in an upcoming article focused primarily on the Viola-Jones algorithm.
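One of the best-known tricks from the original Viola-Jones work is the integral image (summed-area table): after a single pass over the image, the sum of any rectangle can be read off with four lookups, so a Haar feature costs the same to evaluate regardless of its size. Below is a minimal pure-Python sketch; OpenCV provides the same padded table through cv2.integral.

```python
def integral_image(img):
    """Summed-area table padded with a leading row and column of zeros:
    ii[r][c] holds the sum of img[0..r-1][0..c-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]                       # running sum of row r up to column c
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum  # plus everything above this row
    return ii


def rect_sum_ii(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
print(rect_sum_ii(ii, 1, 1, 2, 2))  # 6 + 7 + 10 + 11 = 34
print(rect_sum_ii(ii, 0, 0, 4, 3))  # whole image: 78
```

The table costs one pass to build, after which every rectangle sum (and therefore every Haar feature) is constant-time.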

In this article, we will focus on the programming side, using the readily available library. OpenCV ships with a number of pre-trained classifiers that can be used to identify objects such as faces, eyes, smiles, full bodies, and number plates. We can use any of these classifiers to detect the object we need.

Detecting the Object

After installing the OpenCV package, open the Python IDE of your choice and import OpenCV.

import cv2

Since we want to detect objects in real time, we will use the webcam feed. Use the code below to initialize the webcam.

# Enable webcam
# '0' is the default ID for the built-in webcam;
# for an external webcam the ID can be 1 (or -1 on some systems)
imcap = cv2.VideoCapture(0)
imcap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # set width to 640
imcap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # set height to 480

As mentioned earlier, OpenCV has various pre-trained Haar classifiers stored as XML files. In this example, I am using haarcascade_frontalface_default, a classifier for face detection. You can find the other pre-trained classifiers in the opencv/data/haarcascades/ folder.

# load the pre-trained frontal-face cascade
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

The next step is to run the classifier on the webcam feed to detect faces. The basic steps are: first, capture a frame from the video feed; next, convert the captured frame to grayscale; finally, pass the grayscale image through the classifier to detect the required object.

while True:
    success, img = imcap.read()  # capture a frame from the video feed
    if not success:  # stop if no frame could be read (e.g. camera disconnected)
        break
    # convert the frame from color (BGR) to grayscale
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detect faces: returns one (x, y, w, h) rectangle per face
    # 1.3 = scaleFactor (how much the image is shrunk between detection passes)
    # 5 = minNeighbors (overlapping detections needed to keep a candidate)
    faces = faceCascade.detectMultiScale(imgGray, 1.3, 5)

    # draw a green bounding box around each face
    for (x, y, w, h) in faces:
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
    # display the frame with its bounding boxes
    cv2.imshow('face_detect', img)
    # break out of the loop when 'q' is pressed on the keyboard
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break
imcap.release()
cv2.destroyWindow('face_detect')

Running the above program should give something like this (pardon the expression).
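To get a feel for the scale factor of 1.3 passed to detectMultiScale: the image is shrunk by that factor on each pass, which is equivalent to searching with a window that grows by 1.3 each time. The sketch below approximates the window sizes searched, assuming the 24x24 base window of haarcascade_frontalface_default.xml and a 480-pixel frame height; OpenCV's own rounding and its minSize/maxSize limits can make the real list differ slightly. The function window_sizes is a hypothetical helper.

```python
def window_sizes(base=24, frame_side=480, scale_factor=1.3):
    """Approximate square window sizes (in pixels) searched at each scale."""
    sizes = []
    k = 0
    while base * scale_factor ** k <= frame_side:
        sizes.append(round(base * scale_factor ** k))
        k += 1
    return sizes


coarse = window_sizes(scale_factor=1.3)
fine = window_sizes(scale_factor=1.1)
print(len(coarse), coarse)
# 12 [24, 31, 41, 53, 69, 89, 116, 151, 196, 255, 331, 430]
print(len(fine))  # 32
```

A smaller scale factor searches more sizes (32 instead of 12 here), which can catch faces whose size falls between two coarse steps, at the cost of speed.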

It should be clear that the Viola-Jones algorithm is not restricted to face detection. It can detect multiple types and numbers of objects in a single frame: all you need to do is add more cascade classifiers to the program, as your requirements dictate. Below is the complete code. I have included an additional example, as a comment, that applies a second classifier to the same image. You can experiment with different classifiers using the same basic template.

I hope you enjoyed this quick tutorial. This is just a basic example of object detection using OpenCV and Python, and the range of applications is immense. More advanced techniques, such as convolutional neural networks (CNNs) and deep learning, can be used to solve more complex computer vision problems. More on that in later articles.
