Deep CARs — Transfer Learning With PyTorch

Ngong Ivoline-Clarisse Kieleh
Towards Data Science
10 min read · Jul 26, 2019


A step-by-step guide to completing Hackathon Auto-matic

How can you teach a computer to recognize different car brands? Would you like to take a picture of any car and have your phone automatically tell you its make?

If this excites you then you are in the right place. We are going to write a model that can recognize 196 different types of cars.

Side Note

Hackathon Auto-matic is the second project of an initiative to organize weekend hackathons, started by four ladies and myself (from different parts of the world).

As part of the 5,000 students selected for the Secure and Private AI Scholarship Challenge on Udacity, sponsored by Facebook, we decided to organize weekend hackathons: 48 hours of solving a problem with PyTorch, having fun and competing against each other. To our surprise, 41 teams participated in our first hackathon 🙌. Hackathon Blossom was the name given to the first hackathon. Hackathon Auto-matic, like Hackathon Blossom, is also based on image classification.

I can go on and on about how incredible this opportunity has been and the amazing community we have there. I better stop here and go back to our goal for today :)

Getting Started

We will be using a neural network to accomplish our goal. To be more precise, we will be using a very deep neural network, hence the name Deep CARs.

This tutorial is divided into 2 parts:

Part 1: Building a car classifier

Part 2: Deploying the classifier (in progress…)

In this article, we will be going through Part 1.

PART 1 : Building A Car Classifier

Prerequisite:

In order to follow along, some knowledge of Python, neural networks and PyTorch is required.

We will be using a method called Transfer Learning to train our classifier.

What is Transfer Learning?

Transfer learning is a method in deep learning where a model developed for one task is reused as the starting point for another task. Say, for example, you want to build a network to identify birds. Rather than writing a model from scratch, which can be a very complex task to say the least, you can use an existing model that was developed for the same or a similar task (in our case of recognizing birds, we could use a network that recognizes other animals). The advantages of using transfer learning: the learning process is faster and more accurate, and it requires less training data. The existing model is called a pre-trained model.

Most pre-trained models used in transfer learning are based on large convolutional neural networks. Some popular pre-trained models are VGGNet, ResNet, DenseNet, Google's Inception, etc. Most of these networks are trained on ImageNet, a massive dataset with over 1 million labeled images in 1,000 categories.

In PyTorch, it is easy to load pre-trained networks based on ImageNet, which are available from torchvision. We will use some of these pre-trained models to train our network.
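For example, loading an ImageNet pre-trained DenseNet is a one-liner (a minimal sketch using torchvision's models module; pretrained=True was the API at the time of writing):

from torchvision import models

# Downloads the ImageNet weights on first use
densenet = models.densenet161(pretrained=True)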

Our model will be built on Google Colab using the following steps (Notebook can be found here):

  1. Load data and Perform Transformations
  2. Build the Model
  3. Train the Model
  4. Test the Model on Unseen Data

Import Libraries

Here we just load the libraries and make sure the GPU is turned on. Since we will be using pre-trained models, which are very deep networks, training on a CPU is not really an option, as it would take a very long time. The GPU performs the linear algebra computations in parallel, which speeds up training dramatically.

If your GPU is off and you are using Colab, on your notebook go to Edit => Notebook Settings. Make sure the Runtime is set to Python 3 and under Hardware Accelerator choose GPU.

You will notice that we check whether CUDA is available. Most deep learning frameworks use CUDA to compute the forward and backward passes on the GPU.
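As a minimal sketch (the exact imports in the notebook may differ), the setup looks like this:

import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

# Train on the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)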

1. Perform Transformations and Load Dataset

1.1 Downloading The Dataset

Now that our libraries are imported, we load our dataset from Kaggle. This dataset contains 196 classes of cars.

Here, we download the dataset and load it using PyTorch DataLoaders. We download the data directly into Google Drive, so we have to get authorized access.

# Mount Google Drive in order to access the data
from google.colab import drive
drive.mount('/content/drive')

After running this, click on the link that appears, log in to your account, click Allow, then copy and paste the generated code into your notebook. Check out this article, which shows you how to get the API key and download the dataset easily. We add the line !unzip \*.zip to unzip the downloaded files. Your code should be something like this:
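A rough sketch (the competition slug is a placeholder, and the kaggle.json path is an assumption about where you keep your API key):

!pip install -q kaggle
!mkdir -p ~/.kaggle
# Copy your kaggle.json API key from Drive into place (path is an assumption)
!cp "/content/drive/My Drive/kaggle.json" ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
# Download and unzip the competition data (replace the placeholder slug)
!kaggle competitions download -c <competition-slug>
!unzip \*.zip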

Notice we have two directories: train and test. We will use our model to predict the labels of the test set later. We still have to split the train data into training and validation sets. Before splitting, let us understand what transformations are and write ours.

1.2 Data Transformations

Now that the dataset is downloaded, we perform transformations on the data. A transformation converts the data from one form to another. We are going to apply two main transformations to our images:

  • Data Augmentation

This is a strategy to increase the diversity and size of the training set without actually collecting new data. Techniques like resizing, cropping, horizontal flipping, padding (and even GANs) are applied to the images in the dataset, and "new" images are created. It has two main advantages: it generates more data from limited data, and it helps prevent overfitting.

However, do not expect to see these generated images in the dataset folder. They are only created during batch generation, so the number of distinct images seen during training effectively increases even though the number of images on disk does not.

In our model, we apply three augmentation strategies: rotating (RandomRotation), resized cropping (RandomResizedCrop) and flipping horizontally (RandomHorizontalFlip).

Note that for the test data, we do not apply the RandomResizedCrop, RandomRotation and RandomHorizontalFlip transformations. Instead, we just resize the test images to 256×256 and crop out the center 224×224 so that we can use them with the pre-trained models.

  • Data Normalization

After augmentation, the image is converted into a tensor and normalized using the mean and standard deviation of all images in ImageNet. Usually, for very large datasets, the mean and standard deviation of the dataset itself are used. Given that our dataset is not too large, we use those of ImageNet: mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].
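A sketch of the transforms described above (the rotation angle is an assumption; the crop sizes and normalization statistics follow the text):

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomRotation(30),        # assumed angle
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])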

After defining these transformations, we load our data using ImageFolder from PyTorch. But first we need validation data, so we split the training set: only 1% of our data is held out for validation and the rest is used for training.
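A sketch of this step (the data path is a placeholder; note that this simple split also applies the training transforms to the validation subset):

import torch
from torchvision import datasets

data_dir = '/content/car_data'   # placeholder path
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)

# Hold out 1% of the training images for validation
valid_size = int(0.01 * len(train_data))
train_set, valid_set = torch.utils.data.random_split(
    train_data, [len(train_data) - valid_size, valid_size])

trainloader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
validloader = torch.utils.data.DataLoader(valid_set, batch_size=32)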

  • Visualizing Labels

We visualize our labels to see the structure of the file.

Output from printing names.csv

We see that one car name appears above index 0, meaning the first name is being read as the header. Hence we have to supply a header name while reading our CSV file, after which we get the right output. It is very important to note that our labels run from 0 to 195; this will matter when we submit predictions later.
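A sketch of the fix (names.csv follows the text; the column name is illustrative):

import pandas as pd

# header=None stops pandas from treating the first car name as a header
names = pd.read_csv('names.csv', header=None, names=['car_name'])
print(names.head())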

1.3 Visualizing Images

We can now load and visualize our data. A method imshow() (from the challenge course) is created to display our images.
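The helper below is a sketch of such a function: it undoes the ImageNet normalization and reorders the tensor axes so matplotlib can display the image.

import numpy as np
import matplotlib.pyplot as plt

def imshow(image, ax=None, title=None):
    if ax is None:
        fig, ax = plt.subplots()
    # PyTorch tensors are C x H x W; matplotlib expects H x W x C
    image = image.numpy().transpose((1, 2, 0))
    # Undo the ImageNet normalization, then clip to the valid range
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = np.clip(std * image + mean, 0, 1)
    ax.imshow(image)
    if title:
        ax.set_title(title)
    return ax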

The images in the training set look like this. We notice that some of them have been flipped or rotated.

Images from train set after transformations

2. Building And Training The Model

As mentioned earlier, we are going to use pre-trained models based on ImageNet.

The steps we are going to use for building and training are:

  1. Load the pre-trained model
  2. Freeze parameters in convolutional layers
  3. Create a custom classifier and define hyperparameters
  4. Train the custom classifier

2.1 Loading the Pre-trained Model

We are going to try out different architectures: densenet161, inception_v3, resnet and vgg. Here, we load the different models and record the number of input features of each model's final fully connected layer, because we will need this when building the custom classifier.
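For instance (a sketch; the attribute holding the final layer varies by architecture):

from torchvision import models

model = models.densenet161(pretrained=True)
num_in_features = model.classifier.in_features   # 2208 for densenet161

# ResNet and Inception name their final layer `fc` instead of `classifier`:
# model = models.resnet152(pretrained=True)
# num_in_features = model.fc.in_features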

2.2 Freeze Parameters and Create the Custom Classifier

Since most of the parameters in our pre-trained model are already trained for us, we don't backprop through them. This allows us to keep the pre-trained weights of the early convolutional layers (whose purpose here is feature extraction). We do this by setting each parameter's requires_grad attribute to False.

After doing that, we replace the fully connected head with one that has the same number of inputs as the pre-trained model's final layer, a custom hidden layer and our output layer. Our build_classifier method is flexible: it works when we want no hidden layers in our network and when we want one or more hidden layers. The activation function (in this case ReLU) and dropout are also defined there.
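A sketch of both steps; the notebook's build_classifier may differ in its exact signature:

from torch import nn

# Freeze the pre-trained feature-extraction layers
for param in model.parameters():
    param.requires_grad = False

def build_classifier(num_in_features, hidden_layers, num_out_features, drop_p=0.5):
    classifier = nn.Sequential()
    if not hidden_layers:
        # No hidden layers: a single linear mapping to the outputs
        classifier.add_module('fc0', nn.Linear(num_in_features, num_out_features))
    else:
        sizes = [num_in_features] + hidden_layers
        for i, (h1, h2) in enumerate(zip(sizes, hidden_layers)):
            classifier.add_module('fc' + str(i), nn.Linear(h1, h2))
            classifier.add_module('relu' + str(i), nn.ReLU())
            classifier.add_module('drop' + str(i), nn.Dropout(drop_p))
        classifier.add_module('output', nn.Linear(hidden_layers[-1], num_out_features))
    return classifier

# 196 output classes; the hidden-layer size of 512 is illustrative
model.classifier = build_classifier(num_in_features, [512], 196)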

Now we specify our hyperparameters and hidden layers.

We specify the criterion and try different optimizers, such as Adam, Adadelta and SGD, which take hyperparameters like the learning rate and momentum. We play with these hyperparameters for the different pre-trained networks and choose the ones that give us the best results. We use two different types of schedulers for resnet and vgg. This is what they do:

torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs. torch.optim.lr_scheduler.ReduceLROnPlateau allows dynamic learning rate reducing based on some validation measurements. Read more here
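A sketch with illustrative values (the learning rate, momentum and patience are assumptions):

from torch import nn, optim

criterion = nn.CrossEntropyLoss()
# Only the unfrozen classifier parameters are optimized
optimizer = optim.SGD(model.classifier.parameters(), lr=0.01, momentum=0.9)
# Cut the learning rate when the validation loss stops improving
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=3)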

2.3 Training and Validation

In order to train our model with PyTorch we generally perform the following steps while iterating through each epoch:

  • Make a forward pass through the network with model(images)
  • Use the network output in the criterion function to calculate the loss
  • Perform a backward pass through the network with loss.backward() to calculate the gradients
  • Take a step with the optimizer, optimizer.step(), to update the weights

optimizer.zero_grad() is used to clear the accumulated gradients before each pass.
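Put together, the inner loop of an epoch looks roughly like this (a sketch of the steps above):

model.to(device)
model.train()
for images, labels in trainloader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()             # clear accumulated gradients
    output = model(images)            # forward pass
    loss = criterion(output, labels)  # compute the loss
    loss.backward()                   # backward pass: compute gradients
    optimizer.step()                  # update the classifier weights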

A technique called early stopping is used to prevent overfitting: training stops when performance on the validation set begins to fall. We also save the model (a checkpoint) whenever we reach the best accuracy so far. This way checkpoints can be restored and training continued later if power is lost or training is disrupted for some reason.
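A sketch of the checkpointing logic (valid_acc and best_acc are assumed running values from the validation loop; the Drive path matches the one we load from later):

# After validating each epoch, keep the best weights seen so far
if valid_acc > best_acc:
    best_acc = valid_acc
    torch.save(model.state_dict(), '/content/drive/MyDrive/ResnetCars.pt')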

The training loop is adapted from the PyTorch website.

Now we train our model.

Epoch 1/60 
----------
train Loss: 0.5672 Acc: 0.8441
valid Loss: 0.6750 Acc: 0.8329
Epoch 2/60
----------
train Loss: 0.6184 Acc: 0.8357
valid Loss: 0.5980 Acc: 0.8415
Epoch 3/60
----------
train Loss: 0.5695 Acc: 0.8487
valid Loss: 0.5503 Acc: 0.8575
...

This looks promising. The model appears to be learning with each epoch. Additionally, it does not appear that our model is overfitting (at least not too much), since the training and validation metrics are not diverging much. These particular epoch results were obtained with the ResNet architecture, on the second training run. The accuracies started really low but improved with time. The hyperparameters that most affected our accuracies were the optimizer, the scheduler, the number of epochs and the architecture. Tweaking these values either gave very low accuracies (close to 0) or started with an accuracy like 0.013 that increased as the number of epochs increased (patience is key here).

4. Test the Model on Unseen Data

Once we are comfortable with our validation accuracy, we load our saved model and make predictions on the test data. The in-class competition required that we submit results in a CSV file in the format Id, Predicted, where Id is the name of an image file without the .jpg extension and Predicted is the class our model predicted for that image (between 1 and 196). Remember that our labels run from 0 to 195, so we have to add 1 to the predicted classes to get the right values.

We load our saved model:

# Restore the best checkpoint saved during training
model.load_state_dict(torch.load('/content/drive/MyDrive/ResnetCars.pt'))
model.to(device)

Now we load the test dataset and pass it through our model. Since we are only making predictions, we do not need to compute gradients: we wrap the loop in torch.no_grad() and set the model to evaluation mode with model.eval(). Then we calculate the predictions.

After getting our results, we print the data frame and write the results into a .csv file, which we then submit on the competition website.
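A sketch of the prediction and submission step (testloader is assumed to be an un-shuffled DataLoader over an ImageFolder of the test images):

import os
import pandas as pd

model.eval()
preds = []
with torch.no_grad():
    for images, _ in testloader:
        images = images.to(device)
        output = model(images)
        _, predicted = torch.max(output, 1)
        preds.extend((predicted + 1).cpu().numpy())   # shift labels 0-195 -> 1-196

# File names without the .jpg extension, in loader order
# (ImageFolder stores (path, class) pairs in dataset.samples)
ids = [os.path.splitext(os.path.basename(path))[0]
       for path, _ in testloader.dataset.samples]

submission = pd.DataFrame({'Id': ids, 'Predicted': preds})
submission.to_csv('submission.csv', index=False)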

CSV file to be submitted

Check out the amazing kernel by Khush Patel, who emerged as the winner of the hackathon with 99.18% accuracy. He used the InceptionV3 architecture with CrossEntropyLoss and an SGD optimizer. Can your model beat this? :)

You can join the in-class competition on Kaggle.

And we are done.

Congratulations 👏, it was a long post but you made it to the end. Now you can build your own models with transfer learning. The code is reusable, and you can use it for other datasets as well.

Thanks for reading! Feel free to reach out any time on Twitter and LinkedIn.

References

[1] F. Zaidi, Transfer Learning in PyTorch, Part 1: How to Use DataLoaders and Build a Fully Connected Class (2019)

[2] G. Adjei, Transfer Learning with PyTorch (2019), Heartbeat
