Transfer Learning using Mobilenet and Keras

Ferhat Culfaz
Towards Data Science
5 min read · Nov 6, 2018


In this notebook I shall show you an example of using MobileNet to classify images of dogs, then an example where it subtly misclassifies an image of a blue tit. I will then retrain MobileNet with transfer learning so that it correctly classifies the same input image. Only two classes are used here, but this can be extended to as many as you want, limited only by the hardware and time you have available.

The source paper for MobileNet is located here: https://arxiv.org/pdf/1704.04861.pdf

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, Howard et al, 2017.

We shall be using MobileNet as its architecture is lightweight. It uses depthwise separable convolutions, which essentially means it performs a single convolution on each colour channel rather than combining all three and flattening them. This has the effect of filtering the input channels. Or as the authors of the paper explain clearly: “For MobileNets the depthwise convolution applies a single filter to each input channel. The pointwise convolution then applies a 1×1 convolution to combine the outputs of the depthwise convolution. A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers, a separate layer for filtering and a separate layer for combining. This factorization has the effect of drastically reducing computation and model size.”
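The split can be sketched in a few lines of Keras-style code (I am assuming the tf.keras API here, and the layer sizes are purely illustrative): a depthwise convolution filters each channel separately, then a 1×1 pointwise convolution combines the filtered channels.

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(224, 224, 3))

# Depthwise step: one 3x3 filter per input channel (filtering only)
x = layers.DepthwiseConv2D(kernel_size=3, padding='same')(inputs)

# Pointwise step: a 1x1 convolution combines the filtered channels (combining only)
x = layers.Conv2D(filters=64, kernel_size=1)(x)
separable = Model(inputs, x)

# For comparison: a standard convolution filters and combines in one step
y = layers.Conv2D(filters=64, kernel_size=3, padding='same')(inputs)
standard = Model(inputs, y)

# The factorised version needs far fewer parameters than the standard one
print(separable.count_params(), standard.count_params())
```

Counting the weights shows the saving the authors describe: the two factorised layers together are several times smaller than the single standard convolution.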

Difference between pointwise and depthwise convolutions

So the overall architecture of MobileNet is as follows: 30 layers with

  1. convolutional layer with stride 2
  2. depthwise layer
  3. pointwise layer that doubles the number of channels
  4. depthwise layer with stride 2
  5. pointwise layer that doubles the number of channels

etc.

MobileNet full architecture

It is also very low-maintenance, and thus performs quite well at high speed. There are also many flavours of pre-trained models, with the size of the network in memory and on disk being proportional to the number of parameters used. The speed and power consumption of the network are proportional to the number of MACs (multiply-accumulates), a measure of the number of fused multiply-and-add operations.

Now let's get on to the code!

My code is located here: https://github.com/ferhat00/Deep-Learning/tree/master/Transfer%20Learning%20CNN

Let's load the necessary packages and libraries.
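Roughly, the imports needed for the steps below look like this (a sketch assuming the tf.keras API; the original notebook used standalone Keras, where the same names live under `keras.` instead of `tensorflow.keras.`):

```python
import numpy as np

# Pre-trained MobileNet plus its matching preprocessing and label-decoding helpers
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input, decode_predictions

# Utilities for loading image files into arrays
from tensorflow.keras.preprocessing import image
```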

Let's import the pre-trained model from Keras.
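This step is a one-liner (again assuming tf.keras): Keras downloads the ImageNet weights automatically the first time the model is instantiated.

```python
from tensorflow.keras.applications import MobileNet

# Full MobileNet with its 1000-class ImageNet classification head;
# the weights (~17 Mb) are downloaded on first use and cached locally
mobile = MobileNet(weights='imagenet')
```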

Let's try some tests on images of different breeds of dogs.
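The prediction step can be sketched as a small helper (tf.keras assumed; the image filenames used below are hypothetical stand-ins for the photos shown): each image is resized to MobileNet's 224×224 input, preprocessed to the range the network expects, and the top five ImageNet labels are decoded.

```python
import numpy as np
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

mobile = MobileNet(weights='imagenet')

def classify(img_path, top=5):
    """Load an image file, preprocess it for MobileNet, return the top predictions."""
    img = image.load_img(img_path, target_size=(224, 224))     # resize to the network's input
    arr = np.expand_dims(image.img_to_array(img), axis=0)      # add a batch dimension
    preds = mobile.predict(preprocess_input(arr))              # scale pixels as MobileNet expects
    return decode_predictions(preds, top=top)                  # map class indices to labels
```

Calling something like `classify('German_Shepherd.jpg')` on each photo produces the outputs below.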

Photo by Jana Ohajdova on Unsplash

Output:

[[('n02106662', 'German_shepherd', 0.9796372),
('n02105162', 'malinois', 0.020184083),
('n02091467', 'Norwegian_elkhound', 0.00015799515),
('n02116738', 'African_hunting_dog', 5.2901587e-06),
('n02105251', 'briard', 3.9127376e-06)]]
Photo by Vincent van Zalinge on Unsplash

Output:

[[('n02099712', 'Labrador_retriever', 0.73073703),
('n02087394', 'Rhodesian_ridgeback', 0.03984367),
('n02092339', 'Weimaraner', 0.03359009),
('n02109047', 'Great_Dane', 0.028944707),
('n02110341', 'dalmatian', 0.022403581)]]
Photo by Hans Ole Benonisen on Unsplash

Output:

[[('n02113799', 'standard_poodle', 0.5650911),
('n02113712', 'miniature_poodle', 0.37279922),
('n02102973', 'Irish_water_spaniel', 0.053150617),
('n02113624', 'toy_poodle', 0.0072146286),
('n02093859', 'Kerry_blue_terrier', 0.0013652634)]]

So far so good. It classifies each breed of dog pretty well. But let's try it on a type of bird, the blue tit.

Photo by Bob Brewer on Unsplash

Output:

[[('n01592084', 'chickadee', 0.95554715),
('n01530575', 'brambling', 0.012973112),
('n01828970', 'bee_eater', 0.012916375),
('n01532829', 'house_finch', 0.010978725),
('n01580077', 'jay', 0.0020677084)]]

You can see it could not recognise the blue tit. It mistakenly classified the image as a chickadee, a bird native to North America that is subtly different:

Photo by Patrice Bouchard on Unsplash

Let's now manipulate the MobileNet architecture, retraining the top few layers to employ transfer learning. To do this, we need to train it on some images. Here I will train it on blue tits and crows. But rather than manually downloading images of them, let's use Google Image Search to pull the images. To do this, there is a nice package we can import.

Check out https://github.com/hardikvasa/google-images-download

Let's now re-use MobileNet, as it is quite lightweight (17 Mb): freeze the base layers, then add and train the top few layers. Note I shall only train two classes, the blue tit and the crow.

Let's check the model architecture.

We will use pre-trained weights, as the model has already been trained on the ImageNet dataset. We ensure all the base weights are non-trainable; we will only train the last few dense layers.
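Putting those three steps together, a sketch looks like this (tf.keras assumed; the dense-layer widths are illustrative choices, not prescribed by MobileNet): take the convolutional base without its ImageNet head, add a small dense head ending in a two-way softmax, and freeze everything in the base.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Convolutional base only: include_top=False drops the 1000-class ImageNet head
base_model = MobileNet(weights='imagenet', include_top=False,
                       input_shape=(224, 224, 3))

# New trainable head on top of the frozen base
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)   # widths here are illustrative
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
preds = Dense(2, activation='softmax')(x)  # two classes: blue tit and crow

model = Model(inputs=base_model.input, outputs=preds)

# Freeze the pre-trained base so only the new dense head is trained
for layer in base_model.layers:
    layer.trainable = False

model.summary()  # inspect the architecture and the trainable-parameter count
```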

Now let's load the training data into the ImageDataGenerator. Specify the path, and it automatically sends the data for training in batches, simplifying the code.
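A minimal sketch of this step (tf.keras assumed; the directory layout is whatever your downloads produced, one subfolder per class):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet import preprocess_input

def make_generator(data_dir):
    """Stream shuffled, preprocessed batches from a directory with one subfolder per class."""
    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    return datagen.flow_from_directory(
        data_dir,                   # e.g. a folder containing blue_tit/ and crow/ subfolders
        target_size=(224, 224),    # resize everything to MobileNet's input size
        color_mode='rgb',
        batch_size=32,
        class_mode='categorical',  # one-hot labels to match the softmax head
        shuffle=True)
```

Something like `train_generator = make_generator('./train')` then yields batches indefinitely, with labels inferred from the subfolder names.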

Compile the model, then train it. It should take less than two minutes on a GTX 1070 GPU.
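Compile-and-train can be sketched as a helper that takes the model and generator built above (tf.keras assumed; the Adam optimiser and ten epochs mirror the training log below, but are choices, not requirements):

```python
def train_top(model, train_generator, epochs=10):
    """Compile the transfer model and train only its unfrozen dense head."""
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',  # matches the one-hot labels
                  metrics=['accuracy'])

    # One pass over the data per epoch: total samples divided by batch size
    steps = max(1, train_generator.n // train_generator.batch_size)
    return model.fit(train_generator, steps_per_epoch=steps, epochs=epochs)
```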

Epoch 1/10
5/5 [==============================] - 5s 952ms/step - loss: 0.9098 - acc: 0.6562
Epoch 2/10
5/5 [==============================] - 3s 563ms/step - loss: 0.0503 - acc: 0.9686
Epoch 3/10
5/5 [==============================] - 3s 687ms/step - loss: 0.0236 - acc: 0.9930
Epoch 4/10
5/5 [==============================] - 4s 716ms/step - loss: 7.5358e-04 - acc: 1.0000
Epoch 5/10
5/5 [==============================] - 3s 522ms/step - loss: 0.0021 - acc: 1.0000
Epoch 6/10
5/5 [==============================] - 4s 780ms/step - loss: 0.0353 - acc: 0.9937
Epoch 7/10
5/5 [==============================] - 3s 654ms/step - loss: 0.0905 - acc: 0.9938
Epoch 8/10
5/5 [==============================] - 4s 890ms/step - loss: 0.0047 - acc: 1.0000
Epoch 9/10
5/5 [==============================] - 3s 649ms/step - loss: 0.0377 - acc: 0.9867
Epoch 10/10
5/5 [==============================] - 5s 929ms/step - loss: 0.0125 - acc: 1.0000

The model is now trained. Now let's test some independent input images to check the predictions.
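The test step mirrors the earlier dog classifier, except the output is now the two-way softmax rather than ImageNet labels (a sketch, tf.keras assumed; `predict_bird` and its argument names are my own):

```python
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet import preprocess_input

def predict_bird(model, img_path):
    """Preprocess one image and return class probabilities, ordered as the generator's classes."""
    img = image.load_img(img_path, target_size=(224, 224))
    arr = np.expand_dims(image.img_to_array(img), axis=0)
    return model.predict(preprocess_input(arr))
```

With classes in alphabetical order (blue tit first), an unseen crow photo gives the array below.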

Output:

array([[4.5191143e-15, 1.0000000e+00]], dtype=float32)

As you can see, it correctly predicts the crow, as the blue tit image is commented out.

Photo by Jaime Dantas on Unsplash

This can be extended to more images and more classes so that it generalises better, but it remains the lightest and quickest way to implement transfer learning for CNNs. How far you take it depends on how fast and how accurate you need your model to be, what hardware you are targeting, and how much time you have available.

