
Object detection with a TensorFlow model and OpenCV

Using a trained model to identify objects in static images and live video

source

In this article, I’m going to demonstrate how to use a trained model to detect objects in images and videos using two of the best libraries for this kind of problem. For the detection, we need a model capable of predicting multiple classes in an image and returning the location of those objects so that we can place boxes on the image.

The Model

We are going to use a model from the TensorFlow Hub library, which has multiple ready-to-deploy models trained on all kinds of datasets and for all kinds of problems. For our use case, I filtered for models trained for object detection tasks and available in the TFLite format. This format is commonly used in IoT applications because of its small size and faster inference compared with bigger models. I chose this format because I intend to use this model on a Raspberry Pi in future projects.

The chosen model was the EfficientDet-Lite2 object detection model. It was trained on the COCO 2017 dataset with 91 different labels and optimized for TFLite applications. This model returns:

  1. The box boundaries of the detection;
  2. The detection scores (the probability for each detected class);
  3. The detection classes;
  4. The number of detections.

Detecting Objects

I’m going to divide this section into two parts: detection in static images and detection in live webcam video.

Static Images

We will start by detecting objects in this image from Unsplash:

source

So the first thing we have to do is load this image and process it into the format the TensorFlow model expects.
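A minimal sketch of this step could look like the following; the file name image.jpg and the 512x512 resize are assumptions, so adapt them to your image and to the input size listed on the model page:

```python
import cv2
import tensorflow as tf

# Path to the downloaded photo -- this file name is an assumption
img_path = 'image.jpg'

# OpenCV loads images in BGR order, so convert to RGB for the model
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Resize to a fixed size (512x512 is an assumption; check the model page)
img = cv2.resize(img, (512, 512))

# Add a batch dimension and cast to uint8, the dtype the detector expects
rgb_tensor = tf.convert_to_tensor(img, dtype=tf.uint8)
rgb_tensor = tf.expand_dims(rgb_tensor, 0)
```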

Basically, we use OpenCV to load the raw image and apply a couple of transformations to turn it into an RGB tensor in the format the model expects.

Now we can load the model and the labels:
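A sketch of the loading step might look like this; the exact Hub handle, the labels file name, its separator, and its column names are assumptions to check against the model page and the project repo:

```python
import tensorflow_hub as hub
import pandas as pd

# Load the EfficientDet-Lite2 detection model straight from TensorFlow Hub
detector = hub.load('https://tfhub.dev/tensorflow/efficientdet/lite2/detection/1')

# Load the class names; the file name, separator and column names below are
# assumptions -- adjust them to match the labels CSV in the project repo
labels = pd.read_csv('labels.csv', sep=';', index_col='ID')
labels = labels['OBJECT (2017 REL.)']
```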

The model is loaded directly from TensorFlow Hub; however, you can download it to your computer for faster loading. The CSV with the text labels is available in the project repo.

Now we can run the predictions and draw the detected boxes and labels on the image:
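A sketch of this step, assuming the detector returns the four outputs listed earlier and that the boxes come back in pixel coordinates ordered as [ymin, xmin, ymax, xmax], could look like this:

```python
# Run the detector; the four outputs match the list from the model section
boxes, scores, classes, num_detections = detector(rgb_tensor)

# Convert the tensors to numpy arrays we can loop over
pred_boxes = boxes.numpy()[0]
pred_scores = scores.numpy()[0]
pred_classes = classes.numpy()[0].astype('int')

imgboxes = img.copy()
for score, box, cls in zip(pred_scores, pred_boxes, pred_classes):
    if score < 0.5:  # confidence threshold -- 0.5 is an arbitrary choice
        continue
    ymin, xmin, ymax, xmax = [int(v) for v in box]
    label = labels.loc[cls] if cls in labels.index else str(cls)
    # Draw the bounding box and write the label with its score above it
    cv2.rectangle(imgboxes, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
    cv2.putText(imgboxes, f'{label}: {score:.2f}', (xmin, max(ymin - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
```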

Now if we run plt.imshow(imgboxes), we get the following output:

source with modifications

Live Webcam Video

Now we can move on to detecting objects live using the webcam on your PC.

This part is not as hard as it seems; we just have to wrap the code we used for a single image in a loop:
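A sketch of that loop, reusing the preprocessing and prediction from the static-image snippets (the window name and the q key used to quit are arbitrary choices), might look like this:

```python
import cv2
import tensorflow as tf

cap = cv2.VideoCapture(0)  # 0 is usually the default webcam

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Same preprocessing as for the static image
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (512, 512))
    rgb_tensor = tf.expand_dims(tf.convert_to_tensor(rgb, dtype=tf.uint8), 0)

    # Predict, then draw the boxes exactly as in the static-image snippet
    boxes, scores, classes, num_detections = detector(rgb_tensor)
    # ... drawing code from the previous snippet goes here ...

    # Convert back to BGR so OpenCV displays the colors correctly
    cv2.imshow('Object detection', cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to stop
        break

cap.release()
cv2.destroyAllWindows()
```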

Then we get:

GIF by Author

We used OpenCV’s VideoCapture to grab the video from the computer’s webcam. Then we applied the same processing we used on the static image and predicted the labels and positions. The main difference is that the image input is continuous, so the code runs inside a while loop.

All the code and notebooks used are in this repository:

gabrielcassimiro17/raspberry-pi-tensorflow

In the near future, I will load this onto a Raspberry Pi to create some interactions using the object detection model, and I will post the results here.


If you like the content and want to support me, you can buy me a coffee:

Gabriel Cassimiro is a Data Scientist sharing free content with the community.

