
Object (Drones) Detection: Step-by-Step Guide on Mask R-CNN

I will walk you through my process of training a custom dataset using Mask R-CNN, and I hope it helps some of you simplify the process.

Photo by Miguel Ángel Hernández on Unsplash

Object detection is a class of computer vision tasks that identifies and localises objects within an image. Numerous detection algorithms exist, and here is a good summary of them.

Mask R-CNN is an extension of object detection: it generates bounding boxes and segmentation masks for each object detected in an image. I recently had to train a Mask R-CNN model and faced some roadblocks while trying to train on my custom dataset. Even with a sample notebook available from Matterport, the implementation isn't straightforward due to compatibility and data issues. Hence, I decided to write this guide on training a custom dataset using Mask R-CNN, and I hope it helps some of you simplify the process.


https://github.com/matterport/Mask_RCNN

For this guide, I chose to use a Drones dataset which you can download here.

First up – Libraries & Packages

The main package for the algorithm is mrcnn. Start by installing the package and importing the classes we need into your environment.

!pip install mrcnn
from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.model import log

I will explain each imported class as we get there. For now, just know that these are the import statements we need.

As for TensorFlow, mrcnn is not yet compatible with TensorFlow 2.0 and later, so make sure you revert to TensorFlow 1.x. Since I'm developing on Colab, I will use the magic function to revert to TensorFlow 1.x.

%tensorflow_version 1.x
import tensorflow as tf

If I'm not wrong, tf.random_shuffle was renamed to tf.random.shuffle in TensorFlow 2.0, which causes the incompatibility. You might be able to work with TensorFlow 2.0 by changing the shuffle calls in the mrcnn code.
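If you do want to try staying on TensorFlow 2.x, one possible workaround (my own sketch, not something the sample notebook does) is to alias the renamed op back to its old name before building the model:

import tensorflow as tf
# restore the TF 1.x name that mrcnn expects (tf.random.shuffle is the TF 2.x location of the op)
if not hasattr(tf, "random_shuffle"):
    tf.random_shuffle = tf.random.shuffle

Other TF 1.x-only calls may still remain in the package, so reverting to TensorFlow 1.x is the safer route.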

I also had to revert my Keras to a previous version, though I can't remember the exact reason. I'm just putting it out there in case you encounter errors due to Keras.

!pip install keras==2.2.5

Preprocessing

The mrcnn package is rather flexible in terms of the format of data it accepts. As such, I will process the data into NumPy arrays for simplicity.

Before that, I realised that video17_295 and video19_1900 can’t be read properly by cv2. Hence, I filtered out these images and created a list of file names.

dir = "Database1/"
# filter out image that cant be read
prob_list = ['video17_295','video19_1900'] # cant read format
txt_list = [f for f in os.listdir(dir) if f.endswith(".txt") and f[:-4] not in prob_list]
file_list = set([re.match("w+(?=.)",f)[0] for f in txt_list])
# create data list as tuple of (jpeg,txt)
data_list = []
for f in file_list:
    data_list.append((f+".JPEG",f+".txt"))

A few things to do next:

  1. Check if the label exists (some images don't contain drones)
  2. Read and process the image
  3. Read and process the coordinates of the bounding box
  4. Draw the bounding box for visualization purposes

X, y = [], []
img_box = []
DIMENSION = 128 # set low resolution to decrease training time
for i in range(len(data_list)):
    # get bounding box and check if label exists
    with open(dir + data_list[i][1], "rb") as f:
        box = f.read().split()
    if len(box) != 5:
        continue # skip data if it does not contain a label
    box = [float(s) for s in box[1:]]
    # read image
    img = cv2.imread(dir + data_list[i][0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # resize img to 128 x 128
    img = cv2.resize(img, (DIMENSION, DIMENSION), interpolation=cv2.INTER_LINEAR)
    # draw bounding box (for visualization purposes)
    resize1, resize2 = img.shape[0]/DIMENSION, img.shape[1]/DIMENSION
    p1, p2, p3, p4 = int(box[0]*img.shape[1]*resize2), int(box[1]*img.shape[0]*resize1), int(box[2]*img.shape[1]*resize2), int(box[3]*img.shape[0]*resize1)
    ymin, ymax, xmin, xmax = p2-p4//2, p2+p4//2, p1-p3//2, p1+p3//2
    draw = cv2.rectangle(img.copy(), (xmax, ymax), (xmin, ymin), color=(255, 255, 0), thickness=1)
    # store data if range of y is at least 20 pixels (remove data with small drones)
    if ymax - ymin >= 20:
        X.append(img)
        y.append([ymin, ymax, xmin, xmax])
        img_box.append(draw)
# convert to numpy arrays
X = np.array(X).astype(np.uint8)
y = np.array(y)
img_box = np.array(img_box)

Before converting to NumPy arrays, I grab a subsample of the dataset to keep the training time down. If you have the computing power, feel free to omit that step.
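If you want to do the same, here is a minimal subsampling sketch; SAMPLE_SIZE is a placeholder of mine, not the value used in this project, and it should run just before the np.array conversion above:

SAMPLE_SIZE = 500 # hypothetical sample size, adjust to your compute budget
keep = np.random.choice(len(X), size=min(SAMPLE_SIZE, len(X)), replace=False)
X = [X[i] for i in keep]
y = [y[i] for i in keep]
img_box = [img_box[i] for i in keep]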

Here are some sample images.
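To preview your own processed images with their drawn boxes, a quick matplotlib sketch (my own addition) is:

import matplotlib.pyplot as plt
# show the first few resized images with their drawn bounding boxes
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, im in zip(axes, img_box[:4]):
    ax.imshow(im)
    ax.axis("off")
plt.show()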

MRCNN – Processing

Now, to look at mrcnn proper, we need to define an mrcnn Dataset class before training. This Dataset class provides each image's information, such as the class it belongs to and the positions of the objects within it. The mrcnn.utils module we imported earlier contains this Dataset class.

Here is where things get a little tricky and require some reading into the source code.

These are the functions you need to modify:

  1. add_class, which determines the number of classes for the model
  2. add_image, where you define the image_id and the path to the image if applicable
  3. load_image, where image data is loaded
  4. load_mask, which grabs information about the mask/bounding box of the image

# define drones dataset using mrcnn utils class
class DronesDataset(utils.Dataset):
    def __init__(self, X, y): # init with numpy X, y
        self.X = X
        self.y = y
        super().__init__()

    def load_dataset(self):
        self.add_class("dataset", 1, "drones") # only 1 class, drones
        for i in range(len(self.X)):
            self.add_image("dataset", i, path=None)

    def load_image(self, image_id):
        image = self.X[image_id] # where image_id is the index into X
        return image

    def load_mask(self, image_id):
        # get details of image
        info = self.image_info[image_id]
        # each image contains a single drone, so create one mask channel
        masks = np.zeros([128, 128, 1], dtype='uint8')
        box = self.y[info["id"]]
        row_s, row_e = box[0], box[1]
        col_s, col_e = box[2], box[3]
        masks[row_s:row_e, col_s:col_e, 0] = 1 # mask with the same boundaries as the bounding box
        class_ids = [1]
        return masks, np.array(class_ids).astype(np.uint8)

Since we took the effort to format our images into NumPy arrays, we can simply initialize the Dataset class with our arrays and load the images and bounding boxes by indexing into them.

Next, we do a train-test split the traditional way:

# train test split 80:20
np.random.seed(42) # for reproducibility
p = np.random.permutation(len(X))
X = X[p].copy()
y = y[p].copy()
split = int(0.8 * len(X))
X_train = X[:split]
y_train = y[:split]
X_val = X[split:]
y_val = y[split:]

Now to load your data into the Dataset class.

# load dataset into mrcnn dataset class
train_dataset = DronesDataset(X_train,y_train)
train_dataset.load_dataset()
train_dataset.prepare()
val_dataset = DronesDataset(X_val,y_val)
val_dataset.load_dataset()
val_dataset.prepare()

The prepare() function uses the image_ids and class_ids information to prepare your data for the mrcnn model.
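As a quick sanity check (my addition, not part of the original workflow), you can inspect what prepare() produced:

# inspect the prepared datasets
print(train_dataset.class_names)   # expect ['BG', 'drones']
print(train_dataset.num_images, val_dataset.num_images)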

Following on is the modification of the Config class we imported from mrcnn. The Config class determines the variables used in training and should be tweaked according to your dataset. The variables below are not exhaustive; you can refer to the documentation for the full list.

class DronesConfig(Config):
    # Give the configuration a recognizable name
    NAME = "drones"

    # Train on 1 GPU and 2 images per GPU.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + drones

    # Use small images for faster training.
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Reduce training ROIs per image because the images are small and have few objects.
    TRAIN_ROIS_PER_IMAGE = 20

    # Use smaller anchors because our images and objects are small
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

    # set appropriate steps per epoch and validation steps
    STEPS_PER_EPOCH = len(X_train)//(GPU_COUNT*IMAGES_PER_GPU)
    VALIDATION_STEPS = len(X_val)//(GPU_COUNT*IMAGES_PER_GPU)

    # Skip detections with < 70% confidence
    DETECTION_MIN_CONFIDENCE = 0.7

config = DronesConfig()
config.display()

Depending on your computing power, you might have to adjust these variables accordingly; otherwise, training can get stuck at 'Epoch 1' with no error message. There is even a GitHub issue raised for this problem, with many proposed solutions. Do check it out if this happens to you and test a few of the suggestions.

MRCNN – Training

mrcnn has been trained on the COCO and ImageNet datasets. To make use of these pre-trained weights for transfer learning, we need to download them into our environment (remember to define your ROOT_DIR first).

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

Create the model and initialize it with the pre-trained weights.

# Create model in training mode using gpu
with tf.device("/gpu:0"):
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "imagenet"  # imagenet, coco
if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

Finally, we can proceed on to the actual training.

model.train(train_dataset, val_dataset, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads') # train only the head layers, keeping the backbone frozen

For this exercise, I will only train the head layers to detect drones in our dataset. If time allows, you should also fine-tune your model by training all the preceding layers.

model.train(train_dataset, val_dataset, 
            learning_rate=config.LEARNING_RATE / 10,
            epochs=2, 
            layers="all")

And you’re done with training your mrcnn model. You can save the model’s weights with these 2 lines of code.

# save weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.keras_model.save_weights(model_path)

MRCNN – Inference

To make inference on other images, you will need to create a new inference model with a custom Config.

# make inference
class InferenceConfig(DronesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
inference_config = InferenceConfig()
# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference",config=inference_config, model_dir=MODEL_DIR)
# Load trained weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.load_weights(model_path, by_name=True)

The visualize module from mrcnn comes in handy here.

import random
import matplotlib.pyplot as plt

def get_ax(rows=1, cols=1, size=8):
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

# Test on a random image
image_id = random.choice(val_dataset.image_ids)
original_image, image_meta, gt_class_id, gt_bbox, gt_mask = modellib.load_image_gt(val_dataset, inference_config, image_id, use_mini_mask=False)
results = model.detect([original_image], verbose=1)
r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], val_dataset.class_names, r['scores'], ax=get_ax())
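If you prefer to inspect the raw detections instead of the plot, model.detect returns one dictionary per image; for example:

# rois are [y1, x1, y2, x2] pixel boxes; class id 1 corresponds to drones
print(r['rois'])
print(r['class_ids'])
print(r['scores'])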

Congrats, you've trained an mrcnn model on a custom dataset. Pat yourself on the back, as this is no easy feat. As you can see, the masking isn't perfect for our dataset because we do not have true mask annotations, only bounding boxes. Test the model on a dataset that does and you should get better results.

Happy learning.

