
Instance Segmentation with Mask R-CNN

A brief guide to using Mask R-CNN, trained on the MS COCO dataset, to detect objects along with their masks.

Object Detection and Instance Segmentation – Input image source: sharedspace

Object Detection models such as YOLO and R-CNN help us draw bounding boxes around the objects, while Instance Segmentation provides pixel-wise masks for each object in the image. One question may arise: why do we need pixel-by-pixel locations?

If we use only object detection in self-driving cars, the bounding boxes of multiple cars may overlap, and the self-driving car could get confused in such a situation. Instance segmentation avoids this flaw. Damage detection and medical diagnosis are some other applications that come to mind, since knowing the extent of the damage or the size of a brain tumor can be more important than just detecting its presence.

Intersecting bounding boxes and non-overlapping masks

In the above image, we can see that the bounding boxes of the cars intersect, while the masks labeled ‘car’ do not overlap.

So, we will go through how to do instance segmentation using Mask R-CNN (Mask Region-based CNN), which gives us both the pixel-by-pixel locations and the bounding-box coordinates of each object in the image.

Mask R-CNN

Mask R-CNN combines Faster R-CNN and an FCN (Fully Convolutional Network) to produce a mask output in addition to the class and box outputs. That is, Mask R-CNN adopts the same two-stage procedure as Faster R-CNN, with an identical first stage (the RPN: Region Proposal Network). The second stage extracts features using RoIPool from each candidate box and performs classification and bounding-box regression. Read this paper to get a more detailed idea of Mask R-CNN.

Mask R-CNN model – Source

I have used the Mask R-CNN implementation by matterport, built on an FPN and a ResNet101 backbone, for instance segmentation. The model is pre-trained on MS COCO, a large-scale object detection, segmentation, and captioning dataset with 80 object classes.

Before going through the code, make sure to install all the required packages and Mask R-CNN.

Install Keras and other dependencies:

$ pip install numpy scipy keras h5py tensorflow
$ pip install pillow scikit-image matplotlib imutils
$ pip install "IPython[all]"

Clone the GitHub repository and install the matterport implementation of Mask R-CNN:

$ git clone https://github.com/matterport/Mask_RCNN.git
$ cd Mask_RCNN
$ python setup.py install
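
To check that the installation succeeded, you can try importing the package from a Python shell (a minimal sanity check, nothing more):

# quick sanity check that the matterport package is importable
import mrcnn
from mrcnn import model as modellib
print("Mask R-CNN installed at:", mrcnn.__file__)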

Note: If you have installed or are using TensorFlow 2.x, you may face some Traceback errors while executing the script, since the Mask R-CNN code was written for TensorFlow 1.x (v1.3.0).

To avoid this, either downgrade your TensorFlow version or edit the file Mask_RCNN/mrcnn/model.py by replacing the following functions before installing Mask R-CNN (a small patch sketch follows the list):

  • tf.log() -> tf.math.log()
  • tf.sets.set_intersection() -> tf.sets.intersection()
  • tf.sparse_tensor_to_dense() -> tf.sparse.to_dense()
  • tf.to_float() -> tf.cast([value], tf.float32)
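
The first three of these are plain renames, so if you prefer not to edit the file by hand, they can be applied with a short script. Here is a rough sketch, assuming the file sits at Mask_RCNN/mrcnn/model.py after cloning (back up the file first); the tf.to_float() change is not a simple rename, since its argument moves inside tf.cast(), so handle that one manually:

# rough sketch: apply the simple renames in mrcnn/model.py for TensorFlow 2.x
replacements = {
    "tf.log(": "tf.math.log(",
    "tf.sets.set_intersection(": "tf.sets.intersection(",
    "tf.sparse_tensor_to_dense(": "tf.sparse.to_dense(",
}

path = "Mask_RCNN/mrcnn/model.py"
with open(path, "r") as f:
    source = f.read()

for old, new in replacements.items():
    source = source.replace(old, new)

with open(path, "w") as f:
    f.write(source)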

Now, we are set to execute the script:

Step I: Import required packages

from mrcnn.config import Config
from mrcnn import model as modellib
from mrcnn import visualize
import cv2
import colorsys
import argparse
import imutils
import random
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

Step II: Generate random colors for each class label.
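
The full code for this step is not reproduced here; a minimal sketch is shown below. It relies on the colorsys and random imports from Step I and assumes a Python list called CLASS_NAMES holding the background label followed by the 80 COCO object classes, in the order the model was trained on (the list below is truncated for brevity):

# CLASS_NAMES: background first, then the 80 COCO object classes (truncated here)
CLASS_NAMES = ["BG", "person", "bicycle", "car", "motorcycle", "airplane",
               # ... remaining COCO labels ...
               "toothbrush"]

# generate one visually distinct color per class by spacing hues in HSV space
hsv = [(i / len(CLASS_NAMES), 1, 1.0) for i in range(len(CLASS_NAMES))]
COLORS = [colorsys.hsv_to_rgb(*c) for c in hsv]
random.shuffle(COLORS)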

Step III: Create a configuration that defines some properties of the model to be loaded in the next step.

Feel free to increase the value of the variable IMAGES_PER_GPU if your GPU can handle it; otherwise (in the case of a CPU), keep it at 1.

class SimpleConfig(Config):
    # give the configuration a recognizable name
    NAME = "coco_inference"
    # set the number of GPUs to use along with the number of images
    # per GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # number of classes (COCO's 80 object classes + 1 background class)
    NUM_CLASSES = 81

Step IV: Create a configuration class object and load the model with weights.

You can download the pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page of the Mask_RCNN repository.
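
Alternatively, the cloned repository's mrcnn.utils module provides a download helper; a small sketch, assuming you want the weights file in the current working directory:

import os
from mrcnn import utils

COCO_WEIGHTS_PATH = os.path.join(os.getcwd(), "mask_rcnn_coco.h5")
# fetch the pre-trained COCO weights if they are not already present
if not os.path.exists(COCO_WEIGHTS_PATH):
    utils.download_trained_weights(COCO_WEIGHTS_PATH)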

config = SimpleConfig()
config.display()
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=os.getcwd())
model.load_weights("mask_rcnn_coco.h5", by_name=True)

Step V: Perform a forward pass on any image to get segmented output.

In this step, we pass an image through the loaded model in order to get the output variable with class labels, bounding box coordinates, and masks.

image = cv2.imread("<image_path_and_name>")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = imutils.resize(image, width=512)
# perform a forward pass of the network to obtain the results
print("[INFO] making predictions with Mask R-CNN...")
result = model.detect([image], verbose=1)

Step VI: Visualize the output

r1 = result[0]
visualize.display_instances(image, r1['rois'], r1['masks'], r1['class_ids'], CLASS_NAMES, r1['scores'])
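
If you would rather work with the raw outputs than the built-in visualizer, each entry of result is a dictionary of boxes, masks, class ids, and scores. Below is a small sketch that tints every detected instance green using NumPy and Matplotlib; the 0.6/0.4 blend weights are arbitrary choices:

# r1['rois']      : (N, 4) boxes as (y1, x1, y2, x2)
# r1['masks']     : (H, W, N) boolean masks, one channel per instance
# r1['class_ids'] : (N,) integer labels indexing into CLASS_NAMES
# r1['scores']    : (N,) confidence scores
overlay = image.copy()
for i in range(r1['rois'].shape[0]):
    mask = r1['masks'][:, :, i]
    overlay[mask] = (0.6 * overlay[mask] + 0.4 * np.array([0, 255, 0])).astype(np.uint8)
    print("instance {}: {} ({:.2f})".format(i, CLASS_NAMES[r1['class_ids'][i]], r1['scores'][i]))

plt.imshow(overlay)
plt.axis("off")
plt.show()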

Sample Output:

The Mask R-CNN model trained on COCO created a pixel-wise map of my classmates.
Crowded street in India in the view of Mask R-CNN

Summing up this post, I would say instance segmentation is one step beyond object detection because it yields pixel-by-pixel masks for each object in the image. Faster R-CNN is already computationally expensive, and Mask R-CNN adds instance segmentation on top of it, which makes it even more expensive and therefore difficult to run in real time on a CPU.

References

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, "Mask R-CNN", ICCV 2017, arXiv:1703.06870.

