INSTANCE SEGMENTATION | DEEP LEARNING

Mask RCNN implementation on a custom dataset!

All incorporated in a single python notebook!

Dhruvil Shah
Towards Data Science
10 min read · Dec 25, 2020

Photo by Ethan Hu on Unsplash

What is Instance Segmentation?

Instance segmentation is the task of recognizing object outlines at the pixel level. It is one of the hardest computer vision tasks compared to its related tasks. Refer to the following terminology:

  • Classification: There is a horse/man in this image.
  • Semantic Segmentation: These are all the horse/man pixels.
  • Object Detection: There are a horse and a man in this image at these locations. We’re starting to account for objects that overlap.
  • Instance Segmentation: There is a horse and a man at these locations and these are the pixels that belong to each one.

You can learn the basics and how Mask RCNN actually works from here.

We will implement Mask RCNN for a custom dataset in just one notebook. All you need to do is run all the cells in the notebook. We will perform simple Horse vs Man instance segmentation in this notebook. You can change this to your own dataset.

I have shared the links at the end of the article.

Let’s begin.

1. Importing and cloning repositories.

First, we will clone a repository that contains some of the building code blocks for implementing Mask RCNN. The function copytree() will get the necessary files for you. After cloning the repository we will import the dataset. I am importing the dataset from my drive.

!git clone "https://github.com/SriRamGovardhanam/wastedata-Mask_RCNN-multiple-classes.git"

import shutil, os

def copytree(src='/content/wastedata-Mask_RCNN-multiple-classes/main/Mask_RCNN', dst='/content/', symlinks=False, ignore=None):
    try:
        shutil.rmtree('/content/.ipynb_checkpoints')
    except:
        pass
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.isdir(s):
            shutil.copytree(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)

copytree()
shutil.copytree('/content/drive/MyDrive/MaskRCNN/newdataset', '/content/dataset')

Let’s talk about the dataset now. The dataset that I imported from my drive is structured as:

dataset
--train
----21414.JPG
----2r41rfw.JPG
----...
----via_project.json
--val
----w2141fq2cr2qrc234.JPG
----2awfr4awawfa1rfw.jpg
----...
----via_project.json

The dataset that I have used was created with the VGG Image Annotator (VIA). The annotation file is in .json format and contains the coordinates of all the polygons that I have drawn on my images. The .json file will look something like this:

{"00b1f292-23dd-44d4-aad3-c1ffb6a6ad5a___RS_LB 4479.JPG21419":{"filename":"00b1f292-23dd-44d4-aad3-c1ffb6a6ad5a___RS_LB 4479.JPG","size":21419,"regions":[{"shape_attributes":{"name":"circle","cx":83,"cy":177,"r":43},"region_attributes":{"name":"Horse","image_quality":{"good":true,"frontal":true,"good_illumination":true}}},{"shape_attributes":{"name":"polygon","all_points_x":[1,2,4,5],"all_points_y":[0.2,2,5,7]},"region_attributes":{"name":"Man","image_quality":{"good":true,"frontal":true,"good_illumination":true}}},{"shape_attributes":{"name":"ellipse","cx":156,"cy":189,"rx":19.3,"ry":10,"theta":-0.289},"region_attributes":{"name":"Horse","image_quality":{"good":true,"frontal":true,"good_illumination":true}}}],"file_attributes":{"caption":"","public_domain":"no","image_url":""}},..., ...}

Note: Although I have shown more than one shape in the above snippet, you should only use one shape while annotating the images. For instance, if you choose polygons as I have for the Horse vs Man classifier then you should annotate all the regions with polygons only. We will discuss how you can use multiple shapes for annotations later in this tutorial.
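
If you want to sanity-check your annotation file before training, a short snippet like the one below will list every shape and class name it contains (the path is an assumption based on the dataset structure shown above):

import json

# Hypothetical path, following the dataset layout shown earlier
with open('/content/dataset/train/via_project.json') as f:
    annotations = json.load(f)

for a in annotations.values():
    for region in a.get('regions', []):
        shape = region['shape_attributes']['name']
        label = region['region_attributes']['name']
        print(a['filename'], shape, label)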

2. Selecting the right library versions

Note: This is the most important step for implementing the Mask RCNN architecture without getting any errors.

!pip install keras==2.2.5
%tensorflow_version 1.x

You will face many errors with TensorFlow 2.x, and Keras version 2.2.5 will save us from many more. I will not go into detail about which kinds of errors this resolves.
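
If the runtime restarts, you can quickly confirm that the expected versions are active with a check like this:

import tensorflow as tf
import keras

# Should print a 1.x TensorFlow version and Keras 2.2.5
print(tf.__version__)
print(keras.__version__)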

3. Configuration according to our dataset

First, we will import a few libraries. Then we will give the path to the trained weights file. This could be the COCO weights file or your last saved weights file (checkpoint). The log directory is where all our data will be stored when training begins. The model weights at every epoch are saved in .h5 format in this directory, so if the training is interrupted for any reason, you can always resume from where you left off by specifying the path to the last saved model weights. For instance, if I am training my model for 10 epochs and the training is interrupted at epoch 3, I will have 3 .h5 files stored in my logs directory, and I do not have to start my training from the beginning. I can simply change my weights path to the last weights file, e.g. 'mask_rcnn_object_0003.h5'.

import os
import sys
import json
import datetime
import numpy as np
import skimage.draw
import cv2
from mrcnn.visualize import display_instances
import matplotlib.pyplot as plt
# Root directory of the project
ROOT_DIR = os.path.abspath("/content/wastedata-Mask_RCNN-multiple-classes/main/Mask_RCNN/")
# Import Mask RCNN
sys.path.append(ROOT_DIR) # To find local version of the library
from mrcnn.config import Config
from mrcnn import model as modellib, utils
# Path to trained weights file
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Directory to save logs and model checkpoints
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")
class CustomConfig(Config):
    """Configuration for training on the dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "object"

    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # Background + (Horse and Man)

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9

The CustomConfig class contains our custom configuration for the model. We simply override values of the original Config class from the config.py file imported earlier. The number of classes must be the total number of classes + 1 (for the background). Steps per epoch is set to 100, but you can increase it if you have access to more computing resources.

The detection threshold is 90%, which means all proposals with less than 0.9 confidence will be ignored. This threshold is different from the one used at test time; we set a separate value in the inference configuration later.
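
To check the resulting configuration, you can instantiate it and print all of its values:

config = CustomConfig()
config.display()  # prints every value, including defaults inherited from the base Config class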

4. Setting up the CustomDataset class

The class below contains 3 crucial methods for our custom dataset. This class inherits from utils.Dataset, which we imported in the 1st step. The load_custom() method loads the annotations along with the images; here we extract the polygons and the respective classes.

polygons = [r['shape_attributes'] for r in a['regions']]

objects = [s['region_attributes']['name'] for s in a['regions']]

The polygons variable contains the coordinates of the masks, and the objects variable contains the names of the respective classes.

The load_mask() method loads the masks as per the coordinates of the polygons. The mask of an image is nothing but an array of binary values. skimage.draw.polygon() does the task for us and returns the indices of the pixels that lie inside the mask.
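
As a toy example of what that call does (not part of the notebook), filling a small mask from a triangle annotation looks like this:

import numpy as np
import skimage.draw

mask = np.zeros((5, 5), dtype=np.uint8)
# skimage.draw.polygon takes row (y) coordinates first, then column (x) coordinates
rr, cc = skimage.draw.polygon([0, 0, 4], [0, 4, 2])
mask[rr, cc] = 1
print(mask)  # 1s inside the triangle, 0s elsewhere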

Earlier we discussed that you should not use more than one shape while annotating, as it complicates loading the masks.

If you do want to use multiple shapes, i.e. circle, ellipse, and polygon, then you will need to change the load_mask function as shown below.

def load_mask(self, image_id):
    ...
    # Change the for loop only
    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the shape and set them to 1.
        # skimage.draw expects row (y) coordinates first, then column (x) coordinates.
        if p['name'] == 'polygon':
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        elif p['name'] == 'circle':
            rr, cc = skimage.draw.circle(p['cy'], p['cx'], p['r'])
        else:
            rr, cc = skimage.draw.ellipse(p['cy'], p['cx'], p['ry'], p['rx'],
                                          rotation=np.deg2rad(p['theta']))
        mask[rr, cc, i] = 1
    ...

This is one way you can load masks with multiple shapes, but keep in mind that it is somewhat speculative: the model will not detect a circle or an ellipse unless the annotation captures a perfect circle or ellipse.

5. Creating the train() function

First, we will create an instance of the CustomDataset class for the training dataset, and another instance for the validation dataset. Then we call the load_custom() method, passing in the directory where our data is stored. The 'layers' parameter is set to 'heads' here, as I am not planning to train all the layers in the model; this only trains some of the top layers of the architecture. If you want, you can set 'layers' to 'all' to train every layer in the model.

I am running the model for only 10 epochs, as this tutorial is only meant to guide you.
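
For reference, the notebook's train() function looks roughly like this (a minimal sketch; the dataset path and epoch count are assumptions based on the description above):

def train(model):
    """Train the model on the custom dataset."""
    # Training dataset
    dataset_train = CustomDataset()
    dataset_train.load_custom("/content/dataset", "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = CustomDataset()
    dataset_val.load_custom("/content/dataset", "val")
    dataset_val.prepare()

    # Train only the head layers; set layers='all' to fine-tune the whole network
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=10,
                layers='heads')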

6. Setup before the training

This step sets up the model for training and downloads and loads the pre-trained weights. You can load the COCO weights or your last saved model.

The modellib.MaskRCNN() call is where you can get a lot of errors if you have not chosen the right versions in the 2nd section. Its 'mode' parameter decides whether we want to train the model or test it; if you want to test, set 'mode' to 'inference'. 'model_dir' is where data is saved during training, as a backup. Then, we download the pre-trained COCO weights in the next step.

Note: If you want to resume training from a saved point then you need to change ‘weights_path’ to the path where your .h5 file is stored.
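
The setup described above looks roughly like this (a sketch assuming the paths and classes defined in step 3; the excluded layers are the usual choice when your number of classes differs from COCO's):

config = CustomConfig()
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=DEFAULT_LOGS_DIR)

# Download the COCO weights if they are not already on disk
weights_path = COCO_WEIGHTS_PATH
if not os.path.exists(weights_path):
    utils.download_trained_weights(weights_path)

# Exclude the output layers because our number of classes differs from COCO's
model.load_weights(weights_path, by_name=True, exclude=[
    "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])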

7. Start training

This step should not throw any error if you have followed the steps above and training should start smoothly. Remember we need to have Keras version 2.2.5 for this step to run error-free.

train(model)

Note: If you get an error restart runtime and run all the cells again. It may be because of the version of TensorFlow or Keras loaded. Follow step 2 for choosing the right versions.

Note: Ignore any warnings you get while training!

8. Testing

You can find the instructions in the notebook on how to test our model once training is finished.

import os
import sys
import random
import math
import re
import time
import numpy as np
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.image as mpimg
# Root directory of the project
#ROOT_DIR = os.path.abspath("/")

# Import Mask RCNN
sys.path.append(ROOT_DIR) # To find local version of the library
from mrcnn import utils
from mrcnn import visualize
from mrcnn.visualize import display_images
import mrcnn.model as modellib
from mrcnn.model import log
%matplotlib inline

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Path to the trained weights file
# (pre-trained weights can also be downloaded from the Releases page)
# https://github.com/matterport/Mask_RCNN/releases
WEIGHTS_PATH = "/content/wastedata-Mask_RCNN-multiple-classes/main/Mask_RCNN/logs/object20201209T0658/mask_rcnn_object_0010.h5"  # TODO: update this path

Here, we define the path to our last saved weights file, which we will run inference with. Then, we define a simple inference configuration. The confidence threshold is specified again for testing; this is different from the one used in training.

config = CustomConfig()
CUSTOM_DIR = os.path.join(ROOT_DIR, "/content/dataset/")

class InferenceConfig(config.__class__):
    # Run detection on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    DETECTION_MIN_CONFIDENCE = 0.7

config = InferenceConfig()
config.display()

# Device to load the neural network on.
# Useful if you're training a model on the same machine,
# in which case use CPU and leave the GPU for training.
DEVICE = "/gpu:0"  # /cpu:0 or /gpu:0

# Inspect the model in training or inference modes
# values: 'inference' or 'training'
TEST_MODE = "inference"

def get_ax(rows=1, cols=1, size=16):
    """Return a Matplotlib Axes array to be used in all visualizations
    in the notebook. Provide a central point to control graph sizes.
    Adjust the size attribute to control how big to render images."""
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax
# Load validation dataset
CUSTOM_DIR = "/content/dataset"
dataset = CustomDataset()
dataset.load_custom(CUSTOM_DIR, "val")
# Must call before using the dataset
dataset.prepare()
print("Images: {}\nClasses: {}".format(len(dataset.image_ids), dataset.class_names))

Now we will load the model to run inference.

# LOAD MODEL
# Create model in inference mode
with tf.device(DEVICE):
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

Now we load the weights into the model.

# Load the COCO weights or the last model you trained
weights_path = WEIGHTS_PATH
# Load weights
print("Loading weights ", weights_path)
model.load_weights(weights_path, by_name=True)

Now, we are ready for testing our model on any image.

# RUN DETECTION
image_id = random.choice(dataset.image_ids)
print(image_id)
image, image_meta, gt_class_id, gt_bbox, gt_mask = \
    modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
info = dataset.image_info[image_id]
print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id,
                                       dataset.image_reference(image_id)))
# Run object detection
results = model.detect([image], verbose=1)
# Display results
ax = get_ax(1)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            dataset.class_names, r['scores'], ax=ax,
                            title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

# This is for predicting images which are not present in the dataset
path_to_new_image = '/content/dataset/test/unnamed.jpg'
image1 = mpimg.imread(path_to_new_image)
# Run object detection
print(len([image1]))
results1 = model.detect([image1], verbose=1)
# Display results
ax = get_ax(1)
r1 = results1[0]
visualize.display_instances(image1, r1['rois'], r1['masks'], r1['class_ids'],
                            dataset.class_names, r1['scores'], ax=ax,
                            title="Predictions1")
Testing the model: Image 1 from the dataset (Photo by Ethan Hu on Unsplash) | Image 2 from outside the dataset (Photo by Timothy Eberly on Unsplash)

9. Color Splash

For fun, you can try the code below, which is present in the original implementation of Mask RCNN. It converts everything to grayscale except for the object mask areas.
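
The color_splash() helper in the original repository looks roughly like this (treat it as a sketch; check the repo for the exact version):

import numpy as np
import skimage.color

def color_splash(image, mask):
    """Keep color on the masked regions and turn everything else grayscale."""
    # Grayscale copy of the image, still with 3 channels
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    if mask.shape[-1] > 0:
        # Collapse the per-instance masks into a single mask
        mask = (np.sum(mask, -1, keepdims=True) >= 1)
        splash = np.where(mask, image, gray).astype(np.uint8)
    else:
        splash = gray.astype(np.uint8)
    return splash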

You can call this function as specified below. For video detection, call the second function.

splash = color_splash(image, r['masks'])
display_images([splash], cols=1)
Photo by Ethan Hu on Unsplash

10. Further…

You can learn how region proposals work from the latter cells of the notebook. Below are some of the interesting images that might catch your eye if you are curious about how the Region Proposal Network works.

Photo by Ethan Hu on Unsplash

You may want to see how Proposal classification works and how we get our final regions for segmentation. This portion is also covered in the latter part of the notebook.

Photo by Ethan Hu on Unsplash
