Webcam Object Detection with Mask R-CNN on Google Colab
How to use Mask R-CNN for Object Detection with live camera stream on Google Colaboratory
There are plenty of approaches to object detection. YOLO (You Only Look Once) is the algorithm of choice for many, because it passes the image through a fully convolutional neural network only once. This makes inference fast: about 30 frames per second on a GPU.
Another popular approach uses a Region Proposal Network (RPN). RPN-based algorithms have two components. The first component proposes Regions of Interest (RoI), i.e. where in the image objects might be. The second component classifies these proposed regions. This approach is slower. Mask R-CNN is a framework by Facebook AI that uses an RPN for object detection, and it runs at about 5 frames per second on a GPU. We will use Mask R-CNN.
Why use a slow algorithm when there are faster alternatives? Glad you asked!
In addition to class labels and bounding boxes, Mask R-CNN also outputs a pixel-level mask for each detected object.
The following sections explain the code and concepts that will help in understanding object detection, and in working with camera input with Mask R-CNN, on Colab. It is not a step-by-step tutorial, but hopefully it will be just as effective. At the end of this article, you will find the link to the Colab notebook to try it yourself.
Matterport has a great implementation of Mask R-CNN using Keras and TensorFlow. They provide notebooks to play with Mask R-CNN, to train Mask R-CNN on your own dataset, and to inspect the model and weights.
Why Google Colab
If you don’t have a GPU machine or don’t want to go through the tedious task of setting up the development environment, Colab is the best temporary option.
In my case, I had lost my favorite laptop recently, so I am on my backup machine: a Windows tablet with a keyboard. Colab lets you work in a Jupyter Notebook in your browser, connected to a virtual machine in Google Cloud with a powerful GPU or a TPU (Tensor Processing Unit). The VM comes pre-installed with Python, TensorFlow, Keras, PyTorch, fastai and a lot of other important machine learning tools. All for free. Beware that your session, and any progress in it, is lost after a few minutes of inactivity.
Getting started with Google Colab
The Welcome to Colaboratory guide gets you started easily. The Advanced Colab guide comes in handy when taking input from a camera, communicating between different cells of the notebook, and communicating between Python and JavaScript code. If you don’t have time to look at them, just remember the following.
A cell in a Colab notebook usually contains Python code. By default, the code runs inside the /content directory of the connected virtual machine. Colab VMs run Ubuntu, and you can execute a system command by starting the line with !.
The following command will clone the repository.
!git clone https://github.com/matterport/Mask_RCNN
If you have multiple system commands in the same cell, put %%shell as the first line of the cell, followed by the system commands. The whole cell then runs as one shell script, so a cd affects the commands that come after it. Thus, the following set of commands will clone the repository, change the directory to Mask_RCNN and set up the project.
%%shell
# clone Mask_RCNN repo and install packages
git clone https://github.com/matterport/Mask_RCNN
cd Mask_RCNN
python setup.py install
Import Mask R-CNN
The following code comes from the demo notebook provided by Matterport. We only need to change ROOT_DIR to ./Mask_RCNN, the project we just cloned.
The Python statement sys.path.append(ROOT_DIR) adds the Mask_RCNN directory to Python's module search path, so that the subsequent imports can find the Mask R-CNN implementation that lives there. The code imports the necessary libraries and classes, and downloads the pre-trained Mask R-CNN model. Go through it. The comments make the code easier to understand.
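Here is a minimal sketch of that setup, modeled on the Matterport demo notebook. It assumes the repository was cloned into ./Mask_RCNN and that the installed TensorFlow and Keras versions are ones the repository supports. The wrapper function load_mask_rcnn is a hypothetical name of ours; the demo notebook does the same imports at the top level of a cell.

```python
import os
import sys

# Root directory of the project we just cloned
ROOT_DIR = os.path.abspath("./Mask_RCNN")

# Make the mrcnn package and the COCO sample config importable
COCO_DIR = os.path.join(ROOT_DIR, "samples", "coco")
for path in (ROOT_DIR, COCO_DIR):
    if path not in sys.path:
        sys.path.append(path)

# Where logs, pre-trained weights and sample images live
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
IMAGE_DIR = os.path.join(ROOT_DIR, "images")


def load_mask_rcnn():
    """Import the Mask R-CNN modules and fetch the COCO weights if missing.

    Hypothetical helper: the imports are done lazily so that a missing
    clone or install step fails here, with a clear ImportError.
    """
    from mrcnn import utils
    import mrcnn.model as modellib
    from mrcnn import visualize
    import coco  # found via COCO_DIR on sys.path

    if not os.path.exists(COCO_MODEL_PATH):
        utils.download_trained_weights(COCO_MODEL_PATH)
    return utils, modellib, visualize, coco
```

In a notebook you would then call utils, modellib, visualize, coco = load_mask_rcnn() once, after the clone and install cell has finished.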
Create Model from Trained Weights
The following code creates a model object in inference mode, so that we can run predictions. It then loads the weights of the pre-trained model we downloaded earlier into the model object.
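A sketch of that step, again following the pattern in the Matterport demo notebook. The function build_inference_model and its parameters are hypothetical names of ours; modellib and coco are assumed to be the modules imported from the cloned repository.

```python
def build_inference_model(modellib, coco, model_dir, weights_path):
    """Create a Mask R-CNN model in inference mode and load trained weights.

    modellib is assumed to be mrcnn.model and coco the sample COCO module
    from the Mask_RCNN repository.
    """

    class InferenceConfig(coco.CocoConfig):
        # We run detection on one image at a time,
        # so the batch size is GPU_COUNT * IMAGES_PER_GPU = 1
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1

    config = InferenceConfig()

    # Create the model object in inference mode
    model = modellib.MaskRCNN(mode="inference", model_dir=model_dir, config=config)

    # Load weights trained on MS-COCO, matching layers by name
    model.load_weights(weights_path, by_name=True)
    return model, config
```

With the paths from the import step, the call would look like model, config = build_inference_model(modellib, coco, MODEL_DIR, COCO_MODEL_PATH).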
Run Object Detection
Now we test the model on some images. The Mask_RCNN repository has a directory named images that contains... you guessed it... some images. The following code takes an image from that directory, passes it through the model and displays the result in the notebook along with bounding box information.
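A rough sketch of the detection step. The helper detect_objects is a hypothetical name of ours, not part of the repository. It relies on model.detect returning one dictionary per image with the keys rois, class_ids, scores and masks, where each box is (y1, x1, y2, x2), and on class_names being the list of COCO labels with the background class "BG" at index 0, as in the demo notebook.

```python
def detect_objects(model, image, class_names):
    """Run detection on a single image and summarize each detected object."""
    # detect() takes a list of images; we pass one and take the first result
    result = model.detect([image], verbose=0)[0]

    detections = []
    for box, class_id, score in zip(
        result["rois"], result["class_ids"], result["scores"]
    ):
        y1, x1, y2, x2 = (int(v) for v in box)
        detections.append(
            {
                "label": class_names[int(class_id)],
                "score": float(score),
                "box": (y1, x1, y2, x2),
            }
        )
    return result, detections


# Usage in the notebook (assumes the earlier setup cells have run):
#   import random, skimage.io
#   name = random.choice(os.listdir(IMAGE_DIR))
#   image = skimage.io.imread(os.path.join(IMAGE_DIR, name))
#   r, detections = detect_objects(model, image, class_names)
#   visualize.display_instances(image, r["rois"], r["masks"],
#                               r["class_ids"], class_names, r["scores"])
```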
The result of the prediction
Working with Camera Images
In the advanced usage guide of Colab, they have provided code that can capture an image from a webcam in the notebook and then forward it to the Python code.
Colab notebooks come with a pre-installed Python package called google.colab, which contains handy helper methods. There's a method called output.eval_js which evaluates JavaScript code and returns the output to Python. And in JavaScript, we know that there is a method called getUserMedia() which lets us capture the audio and/or video stream from the user's webcam and microphone.
Have a look at the following JavaScript code. Using the getUserMedia() method of the WebRTC API of JavaScript, it captures the video stream of the webcam and draws the individual frames on an HTML canvas. Like the google.colab Python package, we have a google.colab library available to us in JavaScript. This library helps us invoke a Python method from our JavaScript code using the kernel.invokeFunction function.
The image captured from webcam is converted to Base64 format. This Base64 image is passed to a Python callback method, which we will define later.
We already discussed that having %%shell as the first line of a Colab notebook cell makes it run as terminal commands. Similarly, you can write JavaScript in the whole cell by starting the cell with %%javascript. But we will simply embed the JavaScript code described above in a string inside our Python code and run it from there.
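The sketch below shows one way to embed the JavaScript in Python, following the pattern of the camera snippet in the Colab advanced guide. It is an illustration under assumptions, not the exact code from the notebook: the JavaScript function name streamFrames, the frame count and the delay between frames are our own choices. The call to invokeFunction is deliberately not awaited, since the kernel may be busy with this cell until the JavaScript returns.

```python
# JavaScript kept in a Python string; it runs in the browser, not in the VM
STREAM_JS = """
async function streamFrames(quality, maxFrames, intervalMs) {
  // Ask the browser for the webcam and show the stream in a <video> element
  const video = document.createElement('video');
  document.body.appendChild(video);
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  video.srcObject = stream;
  await video.play();

  // Frames are drawn on a canvas so they can be exported as images
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  for (let i = 0; i < maxFrames; i++) {
    canvas.getContext('2d').drawImage(video, 0, 0);
    // Base64-encoded JPEG as a data URL
    const frame = canvas.toDataURL('image/jpeg', quality);
    // Hand the frame to the Python callback registered as notebook.run_algo
    google.colab.kernel.invokeFunction('notebook.run_algo', [frame], {});
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  // Release the camera
  stream.getVideoTracks()[0].stop();
  video.remove();
  return maxFrames;
}
"""


def take_photo(quality=0.8, max_frames=5, interval_ms=250):
    """Inject the JavaScript into the cell output and start capturing frames."""
    # Imported lazily: these modules only exist inside a Colab/IPython session
    from IPython.display import Javascript, display
    from google.colab.output import eval_js

    display(Javascript(STREAM_JS))
    return eval_js(
        "streamFrames({}, {}, {})".format(quality, max_frames, interval_ms)
    )
```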
Python — JavaScript Communication
The JavaScript code we wrote above invokes the notebook.run_algo method of our Python code. The following code defines a Python method run_algo which accepts a Base64 image, converts it to a numpy array and then passes it through the Mask R-CNN model we created above. Then it shows the output image and processing stats.
Important! Don’t forget to wrap the body of your Python callback in a try / except block and log the exception. The callback is invoked from JavaScript, so otherwise there will be no sign of what error occurred while calling it.
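A minimal sketch of such a callback. The helper names decode_data_url, bytes_to_rgb_array and make_run_algo, as well as the on_result parameter, are hypothetical; only the name run_algo comes from the article. It assumes the image arrives as a data URL of the form data:image/jpeg;base64,... and that Pillow and NumPy are installed, as they are on Colab.

```python
import base64
import io
import time
import traceback


def decode_data_url(data_url):
    """Drop the 'data:image/jpeg;base64,' prefix and return raw image bytes."""
    _, encoded = data_url.split(",", 1)
    return base64.b64decode(encoded)


def bytes_to_rgb_array(image_bytes):
    """Decode JPEG/PNG bytes into an (H, W, 3) uint8 numpy array."""
    # Imported lazily so the pure-Python helper above has no dependencies
    import numpy as np
    from PIL import Image

    return np.array(Image.open(io.BytesIO(image_bytes)).convert("RGB"))


def _print_stats(image, result, seconds):
    print("{} objects in {:.2f}s".format(len(result["class_ids"]), seconds))


def make_run_algo(model, on_result=_print_stats):
    """Build the callback that JavaScript will invoke with each frame."""

    def run_algo(data_url):
        try:
            image = bytes_to_rgb_array(decode_data_url(data_url))
            start = time.time()
            result = model.detect([image], verbose=0)[0]
            on_result(image, result, time.time() - start)
        except Exception:
            # Without this, an error in a callback invoked from JavaScript
            # leaves no trace in the notebook
            traceback.print_exc()

    return run_algo
```

To display the annotated frame instead of just printing stats, on_result could be a function that calls visualize.display_instances with the image and the fields of the result.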
Let’s register run_algo as notebook.run_algo. Now it will be invokable by the JavaScript code. We also call the take_photo() Python method we defined above, to start the video stream and object detection.
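The registration itself is short. In this sketch, start_detection and its parameters are hypothetical names of ours; the one real API call is register_callback from the google.colab.output module. Passing the module in as an argument is only there so the function can be exercised outside Colab.

```python
def start_detection(callback, start_stream, colab_output=None):
    """Expose the callback to JavaScript, then start the camera stream."""
    if colab_output is None:
        # Only importable inside a Colab session
        from google.colab import output as colab_output

    # JavaScript can now reach the callback via
    # google.colab.kernel.invokeFunction('notebook.run_algo', ...)
    colab_output.register_callback("notebook.run_algo", callback)
    start_stream()
```

In the notebook this would be called as start_detection(run_algo, take_photo).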
Try it yourself
You are now ready to try Mask R-CNN on camera in Google Colab. The notebook will walk you step by step through the process.
(Optional) For Curious Ones
The process we used above converts the camera stream to images in the browser (JavaScript) and sends individual images to our Python code for object detection. This is obviously not real-time. So, I spent hours trying to send the WebRTC stream from the JavaScript (peer A) to the Python server (peer B), without success. Perhaps my unfamiliarity with combining async / await with Python threads was the main hindrance. I was trying to use aiohttp as the Python server, with aiortc handling the WebRTC connection. The Python library aiortc makes it easy to set up a Python program as a WebRTC peer. Here is the link to the Colab notebook with an incomplete effort at creating a WebRTC server.
Originally published at https://emadehsan.com on January 29, 2020.