While I study Deep Learning, I like to practice the theory by entering Kaggle competitions. Previously, I created a Convolutional Neural Network with TensorFlow to join the Cats vs. Dogs competition on Kaggle. This Convolutional Neural Network got me to place 833 on the Kaggle public leaderboard.
The accuracy of the CNN was around 90%. I want to try to improve the accuracy of my model and rise on the Kaggle leaderboard. I will use Transfer Learning to enhance my model. Transfer Learning is a technique where you reuse a model that someone else trained as your model’s starting point. TensorFlow Keras contains several built-in pre-trained models that you can use.
The source code is available on GitHub. I used Python 3.8 and TensorFlow 2.3.
Transfer Learning
Transfer learning is a popular technique to improve the accuracy of your model. You take a model that was trained by someone else for a different task and reuse it.
The first time I heard about transfer learning, I was confused. How can you reuse a model that someone else trained for a different purpose? I found out that the answer lies in the way a convolutional network works.
Remember that we had three convolutional layers in the TensorFlow model of the previous article? During training, TensorFlow trains each convolutional layer to detect specific features in the images.
In 2013, Matt Zeiler and Rob Fergus published "Visualizing and Understanding Convolutional Networks." This paper showed how to visualize the weights that the neural network learns in each layer. See below the visualization of the first and last layer of a CNN. The block on the top left is a visualization of the weights, while the rest show the layer's activation during training.

If you look at the other layers of the CNN visualization, you see that each successive layer detects increasingly specific features of the images. With our CNN, the last layer's visualization contains specific parts of a dog or cat. See below the visualization of Layer 5 from the Zeiler and Fergus paper.

With Transfer Learning, we reuse the layers of a CNN model that are generic for all images. For example, this could mean that we keep the first 70 layers of an existing model and add and train new layers on top of them.
Now that you know what Transfer Learning is, we will implement it using TensorFlow Keras.
Transfer Learning with TensorFlow Keras
TensorFlow Keras contains several built-in models that you can use for Transfer Learning. I will use the Inception V3 deep learning model to improve my model.
The Inception V3 model contains 159 layers and 23 million parameters. This is much larger than our own CNN model, which has nine layers and 9 million parameters. Hopefully, this will increase the accuracy of our model.
We will create the model differently than before. Instead of using the Sequential class, we create and use an instance of the InceptionV3 model. The input_shape is still the same (150x150x3). To prevent retraining the existing layers, we lock all of them by setting each layer's trainable property to False.
The last row prints a summary of the model. You may have to scroll a lot because, as we said before, the model consists of 159 layers.
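A minimal sketch of this setup, assuming ImageNet weights and the standard Keras InceptionV3 with its top classification layers removed; the variable names are my own:

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3

# Create the InceptionV3 model without its top classification layers,
# using the same 150x150x3 input shape as before.
base_model = InceptionV3(input_shape=(150, 150, 3),
                         include_top=False,
                         weights='imagenet')

# Lock every pre-trained layer so it is not retrained.
for layer in base_model.layers:
    layer.trainable = False

# Print a summary of all the layers in the model.
base_model.summary()
```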
Adding trainable layers
We have to add additional layers to train the model on our cats and dogs images. We add our trainable layers after the 130th layer of the InceptionV3 model. This layer is called "mixed7"; see the printed model summary.
We retrieve this layer using its name and then add our trainable layers.
We add a Flatten layer, a hidden Dense layer, and a Dense output layer. We then create the model from the pre-trained input and the newly constructed layers_out output. From this point on, compiling and training the model is the same as before.
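A sketch of how this could look, reusing the base_model created above; the 1,024 units in the hidden Dense layer are an assumption:

```python
from tensorflow.keras import layers, Model

# Cut the pre-trained model at the "mixed7" layer.
last_layer = base_model.get_layer('mixed7')
last_output = last_layer.output

# Add our own trainable layers: Flatten, a hidden Dense layer,
# and a single-unit Dense output layer for the binary cat/dog prediction.
x = layers.Flatten()(last_output)
x = layers.Dense(1024, activation='relu')(x)   # hidden layer size is an assumption
layers_out = layers.Dense(1, activation='sigmoid')(x)

# Build the final model from the pre-trained input and our new output.
model = Model(base_model.input, layers_out)
```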
As you can see, I am still using ImageDataGenerators and Image Augmentation to feed the training images to the pipeline. See my previous article for more information.
We train the model for 100 epochs.
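A sketch of the training setup; the augmentation settings, optimizer, learning rate, batch size, and directory names below are my own assumptions and may differ from the original code:

```python
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Image Augmentation on the training images; validation images are only rescaled.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)

# The directory layout is an assumption; adjust it to your own data folders.
train_generator = train_datagen.flow_from_directory('data/train',
                                                    target_size=(150, 150),
                                                    batch_size=32,
                                                    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory('data/validation',
                                                              target_size=(150, 150),
                                                              batch_size=32,
                                                              class_mode='binary')

model.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=100)
```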
Resulting Training Accuracy and Validation Accuracy
When we compare these results against our convolutional network results, we see that both the training and validation accuracy start higher: around 0.9 vs. 0.6.

We also see that the validation accuracy starts higher than the training accuracy, but after 60 epochs, it starts to decrease. This pattern again looks like overfitting.


Creating and submitting a prediction
If we create a prediction and submit it to Kaggle, our score is 0.18834. The best score using our CNN with Image Augmentation was 0.26211. Since a lower score is better in this competition, that is quite an improvement. It gets us to 762nd place on the public leaderboard, a jump of 71 places!
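A hedged sketch of how such a submission file could be created; the test directory layout, the filename parsing, and the column names are assumptions based on the competition's submission format:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The test images are assumed to live in a single subfolder, e.g. data/test/unknown/1.jpg.
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory('data/test',
                                                  target_size=(150, 150),
                                                  batch_size=32,
                                                  class_mode=None,
                                                  shuffle=False)

# Predict the probability per image (assuming dogs were mapped to class 1 during training).
predictions = model.predict(test_generator)

# Derive the numeric image id from each filename, e.g. 'unknown/1.jpg' -> 1.
ids = [int(name.split('/')[-1].split('.')[0]) for name in test_generator.filenames]

submission = pd.DataFrame({'id': ids, 'label': predictions.ravel()})
submission.sort_values('id').to_csv('submission.csv', index=False)
```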

Optimizing the model
Can we optimize the model even further? One thing we can use is Dropout layers. The basic idea is to randomly set some activations to zero during training. This can help to prevent overfitting.
The technique was first introduced by Nitish Srivastava et al. in "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". The figure below shows visually what happens with a Dropout layer.

We add the Dropout layer just before the final Dense layer and set the amount of Dropout to 20% (0.2).
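A sketch of the adjusted layers, reusing base_model and last_output from the earlier InceptionV3 setup; the exact position of the Dropout layer is my reading of the description above:

```python
from tensorflow.keras import layers, Model

x = layers.Flatten()(last_output)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.2)(x)   # randomly zero 20% of the activations during training
layers_out = layers.Dense(1, activation='sigmoid')(x)

model = Model(base_model.input, layers_out)
```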
The rest of the code stays as is. We again train the model for 100 epochs and visualize the accuracy and loss.
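A minimal plotting sketch using the Keras history object returned by model.fit:

```python
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

# Accuracy per epoch.
plt.plot(epochs, acc, label='Training accuracy')
plt.plot(epochs, val_acc, label='Validation accuracy')
plt.legend()
plt.figure()

# Loss per epoch.
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.legend()
plt.show()
```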


Both the training and validation accuracy follow the same trend, which indicates that we have a good model that is not overfitting.
Creating and submitting a new prediction
If we create a new prediction and submit it to Kaggle, our score is now 0.14293, which is quite an improvement over the model without the Dropout layer. This moves us up another 63 places to 699th on the Kaggle public leaderboard.

There are many more pre-trained models available within the Keras framework.
Further optimization, choosing a different model
Until now, we used InceptionV3 as our base model. As you may have seen, there are many more Keras models that you can use for Transfer Learning. Another model that I want to try to improve our accuracy is DenseNet201, a pre-trained Densely Connected Convolutional Network.
Research has shown that convolutional networks can be more accurate if they contain shorter connections between the layers close to the input and those close to the output. DenseNet201 is such a densely connected convolutional network.
As these pre-trained models get bigger, the time to train an epoch also gets longer. Until now, I have been training the models for a fixed 100 epochs. Keras contains callback functions that can automatically stop training when it detects that the accuracy is no longer improving. The callback function that I will be using is EarlyStopping.
We pass the EarlyStopping callback as an argument to the model.fit method. When creating EarlyStopping, you give it an argument called patience that sets the number of epochs with no improvement after which TensorFlow stops the training. Sometimes the accuracy bounces around slightly, and patience prevents training from stopping just because the accuracy dropped for a single epoch. See below an example.
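A small example of what this could look like; monitoring the validation loss and a patience of 5 epochs are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training when the validation loss has not improved for 5 consecutive epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

# Reuses the generators from the earlier training setup.
model.fit(train_generator,
          validation_data=validation_generator,
          epochs=100,
          callbacks=[early_stopping])
```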
Another optimization technique we are adding is adjusting the learning rate. The initial learning rate is given as an argument to the optimizer in the compile method. Models can often be improved by reducing the learning rate once the accuracy stops improving.
The ReduceLROnPlateau callback can automatically reduce the learning rate once it sees no improvement for several epochs. Like EarlyStopping, ReduceLROnPlateau is a callback function given as an argument to the model.fit method.
You can see the complete creation and initialization of the DenseNet201 model below.
We add the DenseNet201 model directly to Sequential, and then we add a Flatten layer and a hidden Dense layer. Finally, we add a Dense layer with a single unit for the binary output.
We create the EarlyStopping and ReduceLROnPlateau instances and pass them to the callbacks parameter of model.fit.
Resulting Training Accuracy and Validation Accuracy
On my machine, this model trains for ten epochs before the EarlyStopping callback stops the training.
The training and validation accuracy are close to 0.99, while the loss is near 0.025.


Creating and submitting the last prediction
All that is left is to create the prediction and submit it to Kaggle. When we submit the predictions, we see that our score is 0.11625. This is another improvement over the InceptionV3 model, although the comparison is not entirely fair because we did not use the learning rate adjustment with InceptionV3.

The score of 0.11625 gets us to place 624 on the public leaderboard, a jump of another 75 places.
Conclusion
This article described how to use Transfer Learning with TensorFlow and use it on the Kaggle Dogs vs. Cats competition. With Transfer Learning, you can reuse an existing model to improve the accuracy of your model. You lock most of the layers to prevent retraining and add your custom layers at the end.
We started with a Convolutional Neural Network, and then we added Image Augmentation to increase the number of training images. Next, we used Transfer Learning with the InceptionV3 model. After that, we used a Dropout layer with the InceptionV3 Model.
Finally, we reached our highest score using the DenseNet201 model combined with an automatic Learning rate adjustment.
Maybe if we add a Dropout layer to the last attempt, we can increase the score a bit more. That is an exercise I will leave to you; my laptop needs to cool down.
You can find the source code for this article on GitHub. The repository is large, as it includes all the training and test images.
Thank you for reading!