Eye in the Sky — Image Classification using Transfer Learning and Data Augmentation

Fast.AI Deep Learnings Part I

Murali M K
Towards Data Science


“Side shot of a fighter jet speeding through the air.” by Dan on Unsplash

Becoming a fighter pilot was my childhood dream, and watching planes until they vanish from sight still excites me. That was my motivation when I was looking for a dataset on which to apply what I had learned about computer vision from fast.ai.

This is the first part of my series Fast.AI Deep Learnings. For those who don’t know, fast.ai is an online course on Deep Learning taught by Jeremy Howard, former President of Kaggle and #1-ranked competitor. I found his top-down approach to be the perfect complement after finishing Andrew Ng’s Deep Learning Specialization on Coursera.

I am going to walk through an end-to-end implementation of planes vs. helicopters classification, and we’ll see how the CNN architecture differentiates wings from blades. The complete code can be found here on GitHub.

1. Data Preparation

Rather than use an existing dataset, I downloaded the images from Google myself. I used Firefox’s Google Image Downloader, a convenient extension for downloading images, and I also found this Python package useful for automating the process across multiple tags. With a total of 216 planes and 350 helicopters, let’s look at some sample pictures from the dataset.
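The post doesn’t show how the downloaded images were organized, but the fastai loader used in the next section expects a train/valid folder structure with one subfolder per class. Here is a minimal sketch of such a split using only the Python standard library; the folder names and the 80/20 ratio are my assumptions, not from the post:

```python
import random
import shutil
from pathlib import Path

random.seed(42)

SRC = Path("downloads")    # one folder of downloaded images per search tag (assumed)
DST = Path("data/planes")  # dataset root expected by the training code (assumed)

for cls in ["airplane", "helicopter"]:
    files = sorted((SRC / cls).glob("*.jpg"))
    random.shuffle(files)
    n_valid = int(0.2 * len(files))  # hold out 20% of each class for validation
    for split, subset in [("valid", files[:n_valid]), ("train", files[n_valid:])]:
        out = DST / split / cls
        out.mkdir(parents=True, exist_ok=True)
        for f in subset:
            shutil.copy(f, out / f.name)
```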

Our dataset looks diverse enough (at least for the airplanes), with pictures ranging from regular passenger jets, propeller planes, fighters, and toy planes to combat helicopters and gigantic house movers. Let’s see how our CNN tackles them.

2. Quick Model with ResNet34

We will be using a pre-trained ResNet34 model; the ResNet architecture won the 2015 ImageNet competition. Since ImageNet contains a diverse collection of everyday photographs, this architecture is well suited to our problem. You can go through this excellent article to decode the ResNet architecture; its underlying philosophy is to train a deeper network without suffering a degradation in convergence.

“You should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting” — CS231n

With 5 epochs, a batch size of 8, and a learning rate of 0.01, we achieve an accuracy of around 90%.

Accuracy after training for five epochs
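For reference, this step looks roughly like the following with the fastai library (v0.7, the version taught in the course); `PATH` and `sz` are my placeholders:

```python
from fastai.conv_learner import *  # fastai v0.7, as used in the course

PATH = "data/planes/"  # root containing train/ and valid/ folders (assumed)
sz = 224               # input size ResNet34 was pre-trained with
arch = resnet34

# Pre-trained ResNet34 with a fresh classifier head; precompute=True caches the
# frozen layers' activations so that only the final layers are trained.
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=8)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 5)  # learning rate 0.01 for 5 epochs
```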

2a. Analyzing Results

Most correctly classified planes
Most correctly classified helicopters
Most incorrectly classified planes
Most incorrectly classified helicopters
Most uncertain predictions

Although some planes are genuinely difficult to identify (the one on fire and the ‘Heathrow’ one), the incorrectly classified helicopters don’t look that different from the correctly classified ones.

3. Learning Rate Finder

The learning rate determines how quickly or slowly you update the weights. For a small dataset like ours (~500 images) it may not matter much, but in general the learning rate significantly affects model performance and convergence.

We are going to use the technique developed in the paper Cyclical Learning Rates for Training Neural Networks, where we simply keep increasing the learning rate from a minimal value until the loss stops decreasing. Based on the plots below, we will use a learning rate of 1e-2.

Left: increasing the learning rate every iteration. Right: identifying the highest learning rate at which the loss is still decreasing
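In fastai v0.7, the range test and both plots come from the learner built above, roughly:

```python
# LR range test: the learning rate is increased every mini-batch until the
# loss diverges; the model's weights are restored afterwards.
learn.lr_find()
learn.sched.plot_lr()  # learning rate vs. iteration (left plot)
learn.sched.plot()     # loss vs. learning rate (right plot)
```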

4. Data Augmentation

If we train the model for more epochs, it will start overfitting: the training loss keeps decreasing while the validation loss does not. The model starts learning features specific to these data points rather than generalizing. To overcome this, we create more data using Data Augmentation. For this problem, we will use horizontal flipping and zooming (up to 1.1x); see the sketch after the sample images below. Make sure your augmentations are relevant to the domain: flipping letters or digits makes no sense, whereas satellite images can be flipped and rotated as we want.

Augmenting through horizontal flipping and zooming
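A sketch of this setup in fastai v0.7 (note that `transforms_side_on` also adds slight rotation and lighting jitter beyond the flips and zoom mentioned above):

```python
# Rebuild the data object with augmentations: random horizontal flips via
# transforms_side_on, plus random zooming of up to 1.1x.
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=8)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.precompute = False  # cached activations would bypass the augmentations
learn.fit(1e-2, 5)
```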

We can see that the accuracy has improved to 92.3% after training on this augmented data. Note that we are still using the pre-trained weights and only training the final fully connected (FC) layers.

Accuracy after 5 epochs with augmentation

5. Fine Tuning along with Differential Learning Rate Annealing

Learning rate annealing: Though we found 1e-2 to be the ideal learning rate, we actually start with that value and gradually decrease it as training progresses, because as we get closer to the optimal weights, we should take smaller steps.

Stochastic Gradient Descent with Restarts: After annealing over one epoch-long cycle, we restart the learning rate at 1e-2 and repeat the process. The idea is to encourage the model to jump out of its current minimum and find, if one exists, a more stable and accurate one. Combining these two, the learning rate progresses like this —

SGDR: Each cycle corresponds to one epoch
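In fastai v0.7, SGDR is enabled by the `cycle_len` argument; a sketch:

```python
# cycle_len=1: cosine-anneal the learning rate from 1e-2 towards zero over one
# epoch, then restart it at 1e-2 (SGDR). Three one-epoch cycles in total.
learn.fit(1e-2, 3, cycle_len=1)
```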

Unfreezing the pre-trained layers: Now that the final layers have been trained enough, we unfreeze the pre-trained ResNet layers and fine-tune them. Since these layers have already been trained on ImageNet photos, and the earlier layers hold the more general-purpose features, we use differential learning rates to tune them carefully: the initial layers use 1e-4, the middle ones 1e-3, and our FC layers 1e-2 as before.
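Roughly, in fastai v0.7, where the array assigns one rate per layer group and `cycle_mult=2` produces the 1-, 2-, and 4-epoch cycles shown below:

```python
import numpy as np

learn.unfreeze()                   # make the pre-trained ResNet layers trainable
lr = np.array([1e-4, 1e-3, 1e-2])  # early / middle / final layer groups
# Each successive SGDR cycle is doubled in length: 1, 2, then 4 epochs.
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
```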

Final accuracy after 3 cycles of 1, 2, and 4 epochs respectively

We see that the accuracy didn’t change much, improving to 93%. The validation loss (0.21) is still above the training loss (0.12), and I feel there’s still room to improve the model, but we will stop here for now. I suspect there is more variety among the planes than there are examples to learn those features from (which is why, I think, we see so many planes misclassified below).

Test Time Augmentation: This is an excellent way to use the power of augmentation while validating/testing. TTA makes predictions not only on the images in your validation set but also on several randomly augmented versions of each, and then averages those predictions. We achieve an accuracy of 94.4%, a pretty good jump for such a simple yet effective technique.
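In fastai v0.7 this is a one-liner on the learner; a sketch continuing from the code above:

```python
# Predict on each validation image plus 4 randomly augmented copies of it,
# then average the probabilities across the copies before scoring.
log_preds, y = learn.TTA()
probs = np.mean(np.exp(log_preds), 0)  # log-probs -> probs, averaged over copies
accuracy_np(probs, y)
```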

6. Analyzing Final Results

Looking at the confusion matrix, 7 planes are predicted as helicopters (these appear to be the same ones the initial model got wrong), while we did a good job predicting helicopters. With more images of airplanes we might reach an even higher accuracy.
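A sketch of how such a confusion matrix can be produced from the TTA predictions above, using scikit-learn and fastai’s v0.7 plotting helper:

```python
from sklearn.metrics import confusion_matrix

preds = np.argmax(probs, axis=1)         # predicted class per validation image
cm = confusion_matrix(y, preds)
plot_confusion_matrix(cm, data.classes)  # helper from fastai.plots
```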

Final most incorrectly classified airplanes
Final incorrectly classified helicopter

Thank you so much for reading until the end. We used transfer learning, data augmentation, and differential learning rate annealing to classify planes vs. helicopters with a very decent accuracy of 94.4% on a small dataset (~500 images). Please clap and share if you liked it, and let me know in the comments if anything is unclear or could be improved.

Code: https://github.com/murali-munna/deep-learning-fastai

For practical tips for beginners in Data Science, check out this post.

Connect with me on LinkedIn
