Review: InstanceFCN — Instance-Sensitive Score Maps (Instance Segmentation)
Fully Convolutional Network (FCN), With Instance-Sensitive Score Maps, Better than DeepMask, Competitive with MNC
In this story, InstanceFCN (Instance-sensitive Fully Convolutional Networks), by Microsoft Research, Tsinghua University, and University of Science and Technology of China, is briefly reviewed.
By using a Fully Convolutional Network (FCN), Instance-Sensitive Score Maps are introduced and all Fully Connected (FC) layers are removed. Competitive results on instance segment proposal are obtained on both PASCAL VOC and MS COCO. It is published in 2016 ECCV with more than 100 citations. (Sik-Ho Tsang @ Medium)
What Is Covered
- Network Structure
- Instance-Sensitive Score Maps
- Ablation Study
- Results
1. Network Structure
- VGG-16 pretrained on ImageNet is used as the feature extractor. The max pooling layer pool4 is modified from stride 2 to stride 1. Accordingly, conv5_1 to conv5_3 are adjusted by the “hole algorithm”, which was used by DeepLab & DilatedNet before, in order to decrease the output stride, i.e. increase the output feature map size.
- On top of the feature map, there are two fully convolutional branches, one for estimating segment instances and the other for scoring the instances.
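The effect of the hole (à trous) algorithm can be illustrated with a minimal 1-D sketch in plain Python (a hypothetical illustration, not the actual VGG-16 modification): spacing the filter taps by a dilation factor enlarges the receptive field without adding parameters or further downsampling.

```python
def dilated_conv1d(x, w, dilation):
    """1-D convolution 'with holes': the taps of filter w are spaced
    `dilation` samples apart, enlarging the receptive field from
    len(w) to (len(w) - 1) * dilation + 1 without adding parameters.
    Only 'valid' output positions are produced."""
    span = (len(w) - 1) * dilation  # receptive field minus one
    return [sum(w[j] * x[i + j * dilation] for j in range(len(w)))
            for i in range(len(x) - span)]

# Same 3-tap filter, growing receptive field:
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], 1))  # -> [6, 9, 12]
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], 2))  # -> [9]
```

With dilation 2, the single output sums x[0] + x[2] + x[4], i.e. the same three weights now cover a span of five input samples.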
Instance-sensitive score maps branch
- For the first branch (top path), a 1×1 512-d convolutional layer transforms the features, and then a 3×3 convolutional layer generates a set of k² instance-sensitive score maps, i.e. k² output channels. (k=5 is used finally.)
- An assembling module is used to generate object instances in a sliding window of resolution m×m. (m=21 here.)
- The idea is very similar to that of position-sensitive score maps in R-FCN. But R-FCN uses position-sensitive score maps for object detection, while InstanceFCN uses instance-sensitive score maps for generating segment proposals.
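The assembling module can be sketched in plain Python (a hypothetical illustration with made-up function names): each pixel of the m×m sliding window is copied from the score map dedicated to its relative position, i.e. the k×k cell of the window it falls into.

```python
def assemble_instance(score_maps, k, m, y0, x0):
    """Assemble an m x m instance mask from k*k instance-sensitive
    score maps inside the sliding window whose top-left corner is
    (y0, x0).  score_maps is a list of k*k 2-D lists (H x W), one per
    relative-position channel, in row-major order (top-left first).

    Each pixel is copied from the score map responsible for its
    relative position: e.g. pixels in the top-left cell of the window
    come from the top-left score map."""
    cell = m // k  # sub-window size; remainder pixels fall in the last cell
    out = [[0.0] * m for _ in range(m)]
    for dy in range(m):
        for dx in range(m):
            # which of the k x k relative-position cells this pixel is in
            ci = min(dy // cell, k - 1)
            cj = min(dx // cell, k - 1)
            channel = ci * k + cj
            out[dy][dx] = score_maps[channel][y0 + dy][x0 + dx]
    return out

# Toy check with k=3, m=6: channel c holds the constant value c,
# so the assembled mask reveals which channel served each cell.
H = W = 10
maps = [[[float(c)] * W for _ in range(H)] for c in range(9)]
mask = assemble_instance(maps, 3, 6, 0, 0)
print(mask[0][0], mask[0][5], mask[5][5])  # -> 0.0 2.0 8.0
```

Note this is pure copying: the assembling module itself has no parameters, which is what keeps the head fully convolutional.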
Objectness score map branch
- For the second branch of scoring instances (bottom path), a 3×3 512-d convolutional layer is used, followed by a 1×1 convolutional layer. This 1×1 layer is a per-pixel logistic regression for classifying instance/not-instance of the sliding window centered at this pixel. Thus, it is an objectness score map.
Loss function
- Here, i is the index of a sampled window, p_i is the predicted objectness score of the instance in this window, and p*_i is 1 if this window is a positive sample and 0 if it is a negative sample. S_i is the assembled segment instance in this window, S*_i is the ground-truth segment instance, and j is the pixel index in the window. L is the logistic regression loss.
- 256 windows are sampled per image, with a positive/negative sampling ratio of 1:1.
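A minimal sketch of this loss in plain Python (hypothetical helper names; one assumption labeled in the code: the segmentation term is gated to positive windows, so the ground-truth mask is always defined):

```python
import math

def logistic_loss(p, t):
    """Logistic regression loss for predicted probability p
    and binary target t, clamped for numerical safety."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def instancefcn_loss(windows):
    """windows: list of (p, p_star, S, S_star) per sampled window, where
    p is the predicted objectness probability, p_star in {0, 1} is the
    objectness label, S is the assembled per-pixel instance probabilities
    (flat list) and S_star the binary ground-truth mask (flat list).

    Objectness term for every sampled window; segmentation term only
    for positive windows (assumption: negatives carry no mask)."""
    total = 0.0
    for p, p_star, S, S_star in windows:
        total += logistic_loss(p, p_star)
        if p_star == 1:
            total += sum(logistic_loss(s, t) for s, t in zip(S, S_star))
    return total
```

For a single positive window with p = 0.5 and a one-pixel mask predicted at 0.5, both terms equal ln 2, so the loss is 2 ln 2 ≈ 1.386.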
2. Instance-Sensitive Score Maps
2.1. Compared with FCN
- In FCN (Top), when two persons are too close, the generated score map has difficulty separating them.
- However, in InstanceFCN (Bottom), each score map is responsible for capturing the relative position of the object instance. For example, the top-left score map is responsible for capturing the top-left part of the object instance. After assembling, separate person masks can be generated.
- Some examples of instance masks with k=3 are shown below:
2.2. Compared with DeepMask
- In DeepMask, FC layers are used, which makes the model large.
- In InstanceFCN, there are no FC layers, which makes the model more compact.
3. Ablation Study
- Average Recall (AR) is measured under 10, 100, 1000 proposals.
- k=5 and k=7 are comparable, and k=5 is used in the following experiments.
- ~DeepMask: DeepMask as implemented by the authors. Its 2 FC layers require 53M parameters. (512 × 14 × 14 × 512 + 512 × 56² ≈ 53M)
- It is found that using full-size images for training gives much higher AR. And the last k²-d convolutional layer has only 0.1M parameters. (512 × 3 × 3 × 25 ≈ 0.1M)
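The parameter counts quoted above can be checked directly (biases ignored):

```python
# DeepMask head: FC from a 512x14x14 feature map to 512 units,
# then FC from 512 units to a 56x56 mask.
deepmask_fc_params = 512 * 14 * 14 * 512 + 512 * 56 ** 2
# InstanceFCN last layer: 3x3 conv, 512 input channels, k^2 = 25 outputs.
instancefcn_conv_params = 512 * 3 * 3 * 25

print(deepmask_fc_params)       # -> 52985856, i.e. ~53M
print(instancefcn_conv_params)  # -> 115200, i.e. ~0.1M
```

The roughly 500× gap in head parameters is the concrete payoff of replacing FC layers with the fully convolutional instance-sensitive head.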
4. Results
4.1. PASCAL VOC 2012
- InstanceFCN has higher ARs than DeepMask and DeepMaskZoom.
References
[2016 ECCV] [InstanceFCN]
Instance-sensitive Fully Convolutional Networks