Review: InstanceFCN — Instance-Sensitive Score Maps (Instance Segmentation)

Fully Convolutional Network (FCN), With Instance-Sensitive Score Maps, Better than DeepMask, Competitive with MNC

Published in

Towards Data Science

5 min readJan 7, 2019

In this story, InstanceFCN (Instance-sensitive Fully Convolutional Networks), by Microsoft Research, Tsinghua University, and University of Science and Technology of China, is shortly reviewed.

By using Fully Convolutional Network (FCN), Instance-Sensitive Score Maps are introduced and all Fully Connected (FC) layers are removed. Competitive results of instance segment proposal on both PASCAL VOC and MS COCO are obtained. It is published in 2016 ECCV with more than 100 citations. (Sik-Ho Tsang @ Medium)

What Are Covered

Network Structure
Instance-Sensitive Score Maps
Ablation Study
Results

1. Network Structure

VGG-16 pretrained on ImageNet is used as feature extractor. Max pooling layer pool4 is modified from stride 2 to stride 1. Accordingly conv5_1 to conv5_3 are adjusted by “hole algorithm”, which was used by DeepLab & DilatedNet before, in order to decrease the output stride, i.e. increase the output feature map size.
On top of the feature map, there are two fully convolutional branches, one for estimating segment instances and the other for scoring the instances.

Instance-sensitive score maps branch

For the first branch (top path), we adopt a 1×1 512-d convolutional layer to transform the features, and then use a 3×3 convolutional layer to generate a set of k² instance-sensitive score maps, which is k² output channels. (k=5 finally.)
An assembling module is used to generate object instances in a sliding window of a resolution m×m. (m=21 here.)
The idea is very similar to that of positive-sensitive score maps in R-FCN. But R-FCN uses positive-sensitive score maps for object detection while InstanceFCN uses instance-sensitive score maps for generating proposals.

Objectness score map branch

For the second branch of scoring instances (bottom path), we use a 3×3 512-d convolutional layer followed by a 1×1 convolutional layer. This 1×1 layer is a per-pixel logistic regression for classifying instance/not-instance of the sliding window centered at this pixel. Thus, it is a objectness score map.

Loss function

Here i is the index of a sampled window, pi is the predicted objectness score of the instance in this window, and pi is 1 if this window is a positive sample and 0 if a negative sample. Si is the assembled segment instance in this window, Si is the ground truth segment instance, and j is the pixel index in the window. L is the logistic regression loss.
256 sampled windows have a positive/negative sampling ratio of 1:1.

2. Instance-Sensitive Score Maps

2.1. Compared with FCN

In FCN (Top), when two persons are too close, the score map generated is difficult to make them separated.
However, using InstanceFCN (Bottom), each score map is responsible for capturing relative position of object instance. For example: the top-left score map is responsible for capturing top-left part of object instance. After assembling, a separated person mask can be generated.
Some examples of instance masks with k=3 as shown below:

**Some examples of instance masks with k=3**

2.2. Compared with DeepMask

In DeepMask, FC layers are used, which makes model large.
In InstanceFCN, there are no FC layers which makes model more compact.

3. Ablation Study

Average Recall (AR) is measured under 10, 100, 1000 proposals.
k=5 and k=7 are comparable. And k=5 in the following experiments.

~DeepMask: DeepMask implemented by authors. Using 2 FC layers requires 53M parameters. (512 × 14 × 14 × 512 + 512 × 56² = 53M)
It is found that using full-size images for training has the much higher AR. And the last k²-d convolutional layer has only 0.1M parameters. (512 × 3 × 3 × 25 = 0.1M)

4. Results

4.1. PASCAL VOC 2012

**Segment Proposals on PASCAL VOC 2012 Validation Set**

InstanceFCN is much better SS (Selective Search) and DeepMask.
InstanceFCN has higher AR than MNC at AR@10, and also comparable with MNC at AR@100 and AR@1000.