Review: SegNet (Semantic Segmentation)

Encoder Decoder Architecture, Using Max Pooling Indices to Upsample, Outperforms FCN, DeepLabv1, DeconvNet

Sik-Ho Tsang
Towards Data Science

--

SegNet by Authors (https://www.youtube.com/watch?v=CxanE_W46ts)

In this story, SegNet, by the University of Cambridge, is briefly reviewed. It was originally submitted to 2015 CVPR but in the end was not published there (though the 2015 arXiv tech report version still received over 100 citations). Instead, it was published in 2017 TPAMI, with more than 1800 citations. The first author has since become the Director of Deep Learning and AI at Magic Leap Inc. (Sik-Ho Tsang @ Medium)


There is also an interesting online demo where we can choose a random image, or even upload our own image, to try SegNet. I tried it as below:

The segmentation result for a road scene image that I found on the internet

Outline

  1. Encoder Decoder Architecture
  2. Differences from DeconvNet and U-Net
  3. Results

1. Encoder Decoder Architecture

SegNet: Encoder Decoder Architecture
  • SegNet has an encoder network and a corresponding decoder network, followed by a final pixelwise classification layer.

1.1. Encoder

  • At the encoder, convolutions and max pooling are performed.
  • There are 13 convolutional layers from VGG-16. (The original fully connected layers are discarded.)
  • While doing 2×2 max pooling, the corresponding max pooling indices (locations) are stored.
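The index-storing step can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; in PyTorch the equivalent is `nn.MaxPool2d(2, return_indices=True)`):

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2 max pooling (stride 2) over a single feature map that also
    records the flat index of each maximum, as SegNet's encoder does."""
    h, w = x.shape
    pooled = np.empty((h // 2, w // 2), dtype=x.dtype)
    indices = np.empty((h // 2, w // 2), dtype=np.int64)
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2*i:2*i+2, 2*j:2*j+2]
            k = int(np.argmax(window))           # 0..3, position inside the window
            r, c = 2*i + k // 2, 2*j + k % 2     # position in the full feature map
            pooled[i, j] = window.flat[k]
            indices[i, j] = r * w + c            # flat index, stored for the decoder
    return pooled, indices

x = np.array([[1., 2., 0., 3.],
              [4., 0., 1., 0.],
              [0., 0., 2., 1.],
              [5., 6., 0., 0.]])
pooled, idx = max_pool_2x2_with_indices(x)
# pooled = [[4., 3.], [6., 2.]], idx = [[4, 3], [13, 10]]
```

Only these small integer indices, not the feature values themselves, need to be kept for the decoder.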

1.2. Decoder

Upsampling Using Max-Pooling Indices
  • At the decoder, upsampling and convolutions are performed.
  • During upsampling, the max pooling indices stored at the corresponding encoder layer are recalled to place the values, as shown above.
  • Finally, a K-class softmax classifier is used to predict the class for each pixel.
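The unpooling step itself can be sketched as follows (a minimal NumPy sketch, not the authors' code; the flat indices are assumed to have been recorded during encoder max pooling, and the sparse result is then densified by the decoder's convolutions):

```python
import numpy as np

def max_unpool_2x2(pooled, indices, out_shape):
    """SegNet-style upsampling: place each pooled value back at the flat
    index where the encoder found its maximum; every other position is
    zero, and subsequent convolutions produce dense feature maps."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

pooled = np.array([[4., 3.],
                   [6., 2.]])
indices = np.array([[4, 3],
                    [13, 10]])   # flat indices recorded during 2x2 max pooling
up = max_unpool_2x2(pooled, indices, (4, 4))
# up = [[0., 0., 0., 3.],
#       [4., 0., 0., 0.],
#       [0., 0., 2., 0.],
#       [0., 6., 0., 0.]]
```

In PyTorch, `nn.MaxUnpool2d` provides the same operation as a built-in layer.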

2. Differences from DeconvNet and U-Net

DeconvNet and U-Net have structures similar to that of SegNet.

2.1. Differences from DeconvNet

  • DeconvNet uses a similar upsampling approach, called unpooling.
  • However, DeconvNet has fully-connected layers, which make the model much larger.

2.2. Differences from U-Net

  • U-Net is used for biomedical image segmentation.
  • Instead of reusing pooling indices, the entire encoder feature maps are transferred to the decoder and concatenated before convolution.
  • This makes the model larger and requires more memory.
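Back-of-envelope arithmetic makes the memory difference concrete. The feature-map size below is an illustrative assumption; the 2-bits-per-2×2-window figure for storing an index is quoted from the SegNet paper:

```python
# Assumed encoder feature map after one pooling stage (illustrative only):
h, w, c = 180, 240, 64

# U-Net-style skip connection: the full float32 feature map is transferred.
feature_bytes = h * w * c * 4

# SegNet: only the argmax position of each 2x2 window is kept,
# which needs just 2 bits per window per channel.
index_bits = (h // 2) * (w // 2) * c * 2
index_bytes = index_bits // 8

ratio = feature_bytes // index_bytes   # 64x less memory per skip connection
```

Under these assumptions, storing indices costs about 64× less memory than transferring the full feature maps, which is why SegNet is far more memory-efficient at inference.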

3. Results

  • Two datasets are tried: the CamVid dataset for Road Scene Segmentation, and the SUN RGB-D dataset for Indoor Scene Segmentation.

3.1. CamVid dataset for Road Scene Segmentation

Compared With Conventional Approaches on CamVid dataset for Road Scene Segmentation
  • As shown above, SegNet obtains very good results for many classes. It also achieves the highest class average and global average accuracies.
Compared With Deep Learning Approaches on CamVid dataset for Road Scene Segmentation
  • SegNet obtains the highest global average accuracy (G), class average accuracy (C), mIoU, and boundary F1-measure (BF). It outperforms FCN, DeepLabv1 and DeconvNet.
Qualitative Results

3.2. SUN RGB-D Dataset for Indoor Scene Segmentation

  • Only RGB is used; the depth (D) information is not used.
Compared With Deep Learning Approaches on SUN RGB-D Dataset for Indoor Scene Segmentation
Class Average Accuracy for Different Classes
  • Higher accuracy for large-size classes.
  • Lower accuracy for small-size classes.
Qualitative Results

3.3. Memory and Inference Time

Memory and Inference Time
  • SegNet is slower than FCN and DeepLabv1 because it contains a decoder network, but faster than DeconvNet because it does not have fully-connected layers.
  • SegNet has a low memory requirement during both training and testing, and its model size is much smaller than those of FCN and DeconvNet.
