YOLOv3 PyTorch on Google Colab
Doing object detection video processing on your browser
For computer vision enthusiasts, YOLO (You Only Look Once) is an extremely popular real-time object detection concept, since it is very fast and performs well.
In this article, I will share code for processing a video inside Google Colab to get the bounding boxes of every object in each frame.
We will not discuss the YOLO concept or architecture, since a lot of good articles on Medium already cover that. Here we only discuss working code.
Let's get started
You can try it yourself on this Google Colab.
We start from a well-written GitHub repo from Ultralytics, one of my favorites. Although the repo already shows how to process a video with YOLOv3 by just running python detect.py --source file.mp4,
I would like to break down and simplify the code by removing several lines that are unnecessary for this case, and add how to show the processed video on Google Colab / Jupyter Notebook.
Prepare YOLOv3 and Load the Model
First, clone the Ultralytics YOLOv3 repository, then import the common packages and the repo's functions.
Set up the argument parser, initialize the device (CPU / CUDA), initialize the YOLO model, then load the weights.
We are using the YOLOv3-spp-ultralytics weights, which the repo states are far better than other YOLOv3 variants in mean Average Precision.
The function torch_utils.select_device() will automatically find an available GPU unless the input is 'cpu'.
The object Darknet initializes the YOLOv3 architecture in PyTorch, and the pre-trained weights then need to be loaded into it (we don't want to train the model at this time).
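As a rough sketch of the device-selection step, the function below is a simplified stand-in for the repo's torch_utils.select_device(), not the repo's actual code:

```python
import torch

def select_device(device=''):
    # Simplified stand-in for torch_utils.select_device() from the repo:
    # use the GPU when CUDA is available, unless 'cpu' is explicitly requested.
    if device.lower() == 'cpu' or not torch.cuda.is_available():
        return torch.device('cpu')
    return torch.device('cuda:0')

device = select_device('')   # pass 'cpu' to force CPU inference
```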
Predict Object Detection on Video
Next, we will read the video file and rewrite the video with bounding boxes drawn around each object. The following three GitHub Gists are parts of a function predict_one_video that will be used at the end.
We write the new video in MP4 format, as explicitly stated in vid_writer, while fps, width, and height are taken from the original video.
Start looping over each frame in the video to get predictions.
The input image size for this model is 416. A function named letterbox resizes the image and pads it, so that one of the width or height becomes 416 and the other is less than or equal to 416 but still divisible by 32.
The second part is converting the image to RGB format and putting the channel dimension first (C,H,W). Then we put the image data on the device (GPU or CPU) and scale the pixel values from 0-255 to 0-1. Before we feed the image into the model, we use the function img.unsqueeze(0), because we have to reformat the image into 4 dimensions (N,C,H,W), where N is the number of images, which is 1 in this case.
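The preprocessing described above can be sketched like this (assuming the frame has already been letterboxed, so this covers only the RGB conversion, channel reordering, scaling, and batch dimension):

```python
import numpy as np
import torch

def preprocess(frame_bgr):
    # frame_bgr: H x W x 3 uint8 image from OpenCV, already letterboxed
    img = frame_bgr[:, :, ::-1]                   # BGR -> RGB
    img = img.transpose(2, 0, 1)                  # (H,W,C) -> (C,H,W)
    img = np.ascontiguousarray(img)
    img = torch.from_numpy(img).float() / 255.0   # scale 0-255 -> 0-1
    return img.unsqueeze(0)                       # add batch dim -> (N,C,H,W)

x = preprocess(np.zeros((416, 320, 3), dtype=np.uint8))
print(x.shape)   # torch.Size([1, 3, 416, 320])
```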
After preprocessing the image, we feed it into the model to get prediction boxes. The raw predictions contain many overlapping boxes, so we need non-maximum suppression (NMS) to filter and merge them.
Draw Bounding Box and Label then Write Video
We loop over all the predictions (pred) after NMS to draw the boxes. Since the image was resized to 416 pixels, we need to scale the predicted coordinates back to the original size using the function scale_coords, then draw each box using the function plot_one_box.
Show Video on Colab
The video is written in MP4 format by the function predict_one_video. After it is saved as MP4, we compress it to H.264,
so the video can be played on Google Colab / Jupyter directly.
Show Raw Video
We show the video using IPython.display.HTML with a width of 400 pixels. The video file is read as binary data.
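One common way to do this (a sketch, assuming the video file already exists) is to embed the binary data as a base64 data URI and pass the resulting string to IPython.display.HTML:

```python
import base64

def video_html(path, width=400):
    # Read the video as binary and embed it as a base64 data URI;
    # pass the returned string to IPython.display.HTML to render it.
    with open(path, 'rb') as f:
        data = base64.b64encode(f.read()).decode()
    return (f'<video width="{width}" controls '
            f'src="data:video/mp4;base64,{data}"></video>')
```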
Compress and Show Processed Video
The output of the OpenCV video writer is an MP4 video with a size 3 times larger than the original video, and it cannot be displayed on Google Colab the same way. One solution is to compress it (source).
We compress the MP4 video to H.264 using ffmpeg -i {save_path} -vcodec libx264 {compressed_path}
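Called from Python, that command can be sketched as follows (the -y flag, which overwrites an existing output file, is my addition; the paths are hypothetical):

```python
import subprocess

def h264_cmd(save_path, compressed_path):
    # ffmpeg -i {save_path} -vcodec libx264 {compressed_path}
    return ['ffmpeg', '-y', '-i', save_path, '-vcodec', 'libx264', compressed_path]

def compress(save_path, compressed_path):
    # Requires ffmpeg to be installed (it comes pre-installed on Google Colab)
    subprocess.run(h264_cmd(save_path, compressed_path), check=True)
```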
Result
Try It on Your Own Video
Go to the Google Colab file on GitHub HERE
- Upload your video inside the input_video folder
- Run the last cell (predict & show video)
Source
Thank you for reading, I hope it is helpful.
Next stories:
- YOLOv3 using a webcam on Google Colab
- YOLOv3 training for hand detection
- YOLOv3 training for safety helmet detection