Review: NIN — Network In Network (Image Classification)

Using Convolution Layers With 1×1 Convolution Kernels

Sik-Ho Tsang
Towards Data Science

A few example images from the CIFAR10 dataset.

In this story, Network In Network (NIN), by the Graduate School for Integrative Sciences and Engineering, National University of Singapore, is briefly reviewed. NIN builds micro neural networks with more complex structures to abstract the data within each local receptive field. This is a 2014 ICLR paper with more than 2300 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Linear Convolutional Layer VS mlpconv Layer
  2. Fully Connected Layer VS Global Average Pooling Layer
  3. Overall Structure of Network In Network (NIN)
  4. Results

1. Linear Convolutional Layer VS mlpconv Layer

1.1. Linear Convolutional Layer

Linear Convolutional Layer
  • Here (i, j) is the pixel index in the feature map, xij stands for the input patch centered at location (i, j), and k is used to index the channels of the feature map.
  • However, representations that achieve good abstraction are generally highly nonlinear functions of the input data.
  • Authors argue that it would be beneficial to do better abstraction on each local patch before combining them into higher-level concepts (the linear convolution itself is written out after this list).
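
For reference, the linear convolution discussed in the bullets above (with ReLU as the activation) can be written, in the paper's notation, roughly as:

    f_{i,j,k} = \max\left( w_k^{\top} x_{i,j},\; 0 \right)

where w_k is the k-th filter and the bias term is omitted for simplicity. The output f_{i,j,k} is a linear function of the patch x_{i,j}, which is what motivates replacing it with something more expressive.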

1.2. mlpconv Layer

mlpconv Layer
  • n is the number of layers in the multilayer perceptron. Rectified linear unit is used as the activation function in the multilayer perceptron.
  • The above structure allows complex and learnable interactions of cross channel information.
  • It is equivalent to a convolution layer with 1×1 convolution kernels (a minimal sketch follows this list).
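
Below is a minimal sketch of an mlpconv block, assuming PyTorch; the function name, channel counts, and kernel size are illustrative choices rather than the paper's exact configuration. A normal convolution is followed by two 1×1 convolutions, each with ReLU, so the 1×1 convolutions act as a small per-pixel MLP across channels.

    import torch
    import torch.nn as nn

    def mlpconv(in_ch, out_ch, kernel_size, stride=1, padding=0):
        """An mlpconv block: a normal convolution followed by two 1x1
        convolutions, each with ReLU. The 1x1 convolutions act as a
        multilayer perceptron applied across channels at every pixel."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),  # 1x1 conv = MLP layer
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),  # 1x1 conv = MLP layer
            nn.ReLU(inplace=True),
        )

    # Example: one mlpconv block on a CIFAR-10-sized input.
    x = torch.randn(1, 3, 32, 32)
    block = mlpconv(3, 192, kernel_size=5, padding=2)
    print(block(x).shape)  # torch.Size([1, 192, 32, 32])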

2. Fully Connected Layer VS Global Average Pooling Layer

An Example of Fully Connected Layer VS Global Average Pooling Layer

2.1. Fully Connected Layer

  • Usually, fully connected layers are used at the end of the network.
  • However, they are prone to overfitting.

2.2. Global Average Pooling Layer

  • Here, global average pooling is introduced.
  • The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer.
  • One advantage is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories.
  • Another advantage is that there is no parameter to optimize in global average pooling, so overfitting is avoided at this layer.
  • Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input (a small sketch of this pooling step follows this list).
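
As a concrete illustration, here is a minimal sketch of global average pooling feeding softmax, assuming PyTorch and a 10-class task (the tensor shapes are illustrative):

    import torch
    import torch.nn.functional as F

    # Suppose the last mlpconv layer outputs one feature map per class:
    # shape (batch, num_classes, H, W).
    feature_maps = torch.randn(4, 10, 8, 8)

    # Global average pooling: average each feature map down to a single
    # value, giving one confidence value per category. No parameters here.
    logits = feature_maps.mean(dim=(2, 3))  # shape (4, 10)

    # The resulting vector is fed directly into the softmax layer.
    probs = F.softmax(logits, dim=1)
    print(probs.shape)  # torch.Size([4, 10])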

3. Overall Structure of Network In Network (NIN)

Overall Structure of Network In Network (NIN)
  • Thus, the above is the overall structure of NIN: a stack of mlpconv layers with global average pooling at the end (a rough sketch follows).
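
A rough sketch of the overall NIN structure for CIFAR-10, assuming PyTorch; the channel counts, kernel sizes, and pooling/dropout placement are approximations for illustration, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    def mlpconv(in_ch, out_ch, k, **kw):
        # Same idea as the mlpconv sketch above: conv + two 1x1 convs, ReLU each.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, **kw), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),
        )

    class NIN(nn.Module):
        """Three mlpconv blocks, with global average pooling at the end."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                mlpconv(3, 192, 5, padding=2),
                nn.MaxPool2d(3, stride=2, padding=1),
                nn.Dropout(0.5),
                mlpconv(192, 192, 5, padding=2),
                nn.MaxPool2d(3, stride=2, padding=1),
                nn.Dropout(0.5),
                mlpconv(192, num_classes, 3, padding=1),  # one map per class
            )
            self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling

        def forward(self, x):
            x = self.features(x)           # (B, num_classes, H, W)
            return self.gap(x).flatten(1)  # (B, num_classes), fed to softmax

    model = NIN(num_classes=10)
    print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])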

4. Results

4.1. CIFAR-10

Error Rates on CIFAR-10 Test Set
  • NIN + Dropout got only a 10.41% error rate, which is better than Maxout + Dropout.
  • With data augmentation (Translation & Horizontal Flipping), NIN even got an 8.81% error rate.
  • (If interested, there is a very short introduction to Maxout in NoC.)
  • As shown above, introducing dropout layers in between the mlpconv layers reduced the test error by more than 20%.

4.2. CIFAR-100

Error Rates on CIFAR-100 Test Set
  • Similarly, NIN + Dropout got only a 35.68% error rate, which is better than Maxout + Dropout.

4.3. Street View House Numbers (SVHN)

Error Rates on SVHN Test Set
  • However, NIN + Dropout got a 2.35% error rate, which is worse than that of DropConnect.

4.4. MNIST

Error Rates on MNIST Test Set
  • On MNIST, NIN + Dropout got a 0.47% error rate, which is slightly worse than Maxout + Dropout.

4.5. Global Average Pooling as a Regularizer

Error Rates on CIFAR-10 Test Set
  • With Global Average Pooling, NIN got a 10.41% error rate, which is better than the 10.88% obtained with fully connected + dropout.

In NIN, the 1×1 convolutions introduce more non-linearity, which lowers the error rate.
