Transfer Learning using PyTorch — Part 2

Vishnu Subramanian
Towards Data Science
6 min read · Apr 19, 2017


In the previous blog, we discussed how neural networks use transfer learning for various computer vision tasks. In this blog we will look at the following:

  1. VGG Architecture
  2. Fine-tuning VGG using pre-convoluted features
  3. Accuracy
  4. Performance comparison between PyTorch and Keras on TensorFlow

VGG Architecture:

One of the most studied deep learning models for transfer learning is VGG. We will go through a high-level overview of VGG to understand how it can be used optimally in transfer learning.

The VGG model can be split into two kinds of logical blocks:

  1. Convolution blocks:

The pre-trained VGG model is trained on the ImageNet dataset, which covers 1,000 categories. The convolutional block contains multiple convolution layers. The initial layers capture low-level features like lines and curves, while the last convolutional layers in this block capture more complex features such as hands, legs, eyes, and many more. The image below shows what kind of features are captured in the different layers.

As you can see from the images above, the features captured by the convolution layers of a pre-trained model can be reused across most kinds of image problems. They may not work for problems like cartoon animation or medical imaging, since those need very different features.

The convolution layers exhibit two important properties:

  1. The number of parameters required is far smaller than for a fully connected layer. For example, a convolution layer with 64 filters of size 3 × 3 (on a single-channel input) needs only 3 × 3 × 64 = 576 weights. A short parameter-count sketch follows these two blocks.
  2. Convolution layers are computationally expensive and take longer to compute their output.

2. Fully Connected Block:

This block contains Dense (in Keras) / Linear (in PyTorch) layers with dropouts. The number of parameters to learn in the FC layers is huge, but they take far less time to compute.
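To make the parameter-count contrast concrete, here is a minimal sketch that counts the weights of a 3 × 3 convolution layer with 64 filters (a single-channel input and no biases are assumed, which gives the 576 figure above) against VGG16's first Linear layer:

import torch.nn as nn

# A conv layer with 64 filters of size 3 x 3 on a single-channel input:
# 3 * 3 * 64 = 576 weights (biases excluded for simplicity).
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, bias=False)
print(sum(p.data.numel() for p in conv.parameters()))  # 576

# The first fully connected layer of VGG16 maps 512 * 7 * 7 = 25088
# activations to 4096 units: roughly 103 million weights.
fc = nn.Linear(25088, 4096, bias=False)
print(sum(p.data.numel() for p in fc.parameters()))  # 102760448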

So we generally take the pre-convoluted features from the convolution block of the VGG model as-is and train only the last few layers, which generally come from the fully connected block.
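In torchvision this split is directly visible on the model object: features holds the convolutional block and classifier holds the fully connected block, and you can inspect both with a couple of print statements:

from torchvision import models

vgg16 = models.vgg16(pretrained=True)

# Convolutional block: Conv2d / ReLU / MaxPool2d layers
print(vgg16.features)

# Fully connected block: Linear / ReLU / Dropout layers
print(vgg16.classifier)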

Fine-tune VGG using pre-convoluted features:

Since convolution layers are expensive to compute, it makes sense to compute their output once and reuse it to train the fully connected layers. This speeds up training new models with transfer learning considerably. For example, if one full training iteration takes 3 minutes and pre-computing the convolutional layer output takes about 2 minutes, then that cost is paid only once, and every subsequent iteration over the FC block takes just a few seconds.

Pre-convoluted features:

I am using the dogs vs. cats dataset from Kaggle, which contains 25,000 images. I have kept 2,000 images for validation and the remaining 23,000 for training.

Hardware benchmarked on:

I am using an Intel i7 processor, 64 GB of RAM, and a Titan X GPU for the experiments mentioned here.

My Monster playing with NN weights :)

To compute the convolutional features, we pass all the images through the convolutional layers. Luckily, PyTorch implements VGG as two logical blocks: features (the convolutional block) and classifier (the FC block). In the code below we use the features block to compute the output of the convolutional layers and store it in a bcolz array for further processing. Bcolz arrays provide compressed, fast on-disk storage for NumPy arrays.

import numpy as np
from torch.autograd import Variable
from torchvision import models

# Load the pre-trained VGG16 model and freeze all of its parameters
model_vgg = models.vgg16(pretrained=True)
for param in model_vgg.parameters():
    param.requires_grad = False
model_vgg = model_vgg.cuda()

def preconvfeat(dataset):
    # Run every batch through the convolutional block (model_vgg.features)
    # and collect the outputs and labels as NumPy arrays.
    conv_features = []
    labels_list = []
    for data in dataset:
        inputs, labels = data
        inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
        x = model_vgg.features(inputs)
        conv_features.extend(x.data.cpu().numpy())
        labels_list.extend(labels.data.cpu().numpy())
    conv_features = np.concatenate([[feat] for feat in conv_features])
    return (conv_features, labels_list)
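The snippet above stops at NumPy arrays; a minimal sketch for persisting them with bcolz could look like the following (the directory paths are just placeholders, not taken from the post):

import bcolz

def save_array(rootdir, arr):
    # Write the array to a compressed, chunked bcolz directory on disk
    c = bcolz.carray(arr, rootdir=rootdir, mode='w')
    c.flush()

def load_array(rootdir):
    # Read the whole bcolz array back into memory as a NumPy array
    return bcolz.open(rootdir)[:]

# Illustrative usage:
# conv_feat_train, labels_train = preconvfeat(train_loader)
# save_array('data/conv_feat_train.bc', conv_feat_train)
# conv_feat_train = load_array('data/conv_feat_train.bc')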

It took me 1 minute 8 seconds to calculate the features for the training dataset of 23,000 images. The stored features take up approximately 600 MB.

Fine Tuning:

We can now use the pre-computed features to train the fully connected layers. Ten iterations took 25 seconds. Here is the code snippet for training the VGG classifier block:

for param in model_vgg.classifier[6].parameters():
    param.requires_grad = True

train_model(model=model_vgg.classifier, size=dset_sizes['train'],
            conv_feat=conv_feat_train, labels=labels_train,
            epochs=10, optimizer=optimizer, train=True, shuffle=True)
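train_model here is a helper defined in the accompanying notebook. A minimal sketch of what such a loop could look like when it runs the classifier block over pre-computed features is given below; the batch size, loss function, and 0.1.x-style Variable API are assumptions, not the notebook's exact implementation.

import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def train_model(model, size, conv_feat, labels, epochs, optimizer,
                train=True, shuffle=True, batch_size=64):
    # Train (or evaluate) only the classifier block on pre-computed
    # convolutional features, iterating in mini-batches.
    labels = np.array(labels)
    for epoch in range(epochs):
        order = np.random.permutation(size) if shuffle else np.arange(size)
        running_loss, running_correct = 0.0, 0
        for i in range(0, size, batch_size):
            idx = order[i:i + batch_size]
            # Flatten each 512 x 7 x 7 feature map for the Linear layers
            inputs = torch.from_numpy(conv_feat[idx]).view(len(idx), -1)
            target = torch.from_numpy(labels[idx])
            inputs, target = Variable(inputs.cuda()), Variable(target.cuda())
            outputs = model(inputs)
            loss = F.cross_entropy(outputs, target)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            preds = outputs.data.max(1)[1].view(-1)
            running_loss += loss.data[0] * len(idx)
            running_correct += (preds == target.data).sum()
        print('epoch %d: loss %.4f, accuracy %.4f' %
              (epoch, running_loss / size, running_correct / size))

Since only model_vgg.classifier[6].parameters() are unfrozen, the optimizer passed in would typically be constructed over just those parameters.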

Accuracy:

After running the model for approximately 30 epochs, that is, in just a few minutes, the validation accuracy reached 97%. It can be improved further by adding batch normalization and decreasing the dropout values. In this notebook I have included the other experiments that I tried.

Performance comparison between PyTorch VGG and Keras-on-TensorFlow VGG:

I have been using Keras on TensorFlow for quite some time, so I was eager to see how the two compare in terms of training time. I am using PyTorch version 0.1.11 and TensorFlow version 1.0.1 for the experiment.

Experiment conducted: I ran the pre-trained VGG model with the convolution layer weights frozen, without pre-computing any convolutional features, on the 23,000 images for 10 epochs. Below are my results.

PyTorch — 15 min 19s

Keras on TensorFlow — 31 min 29s

PyTorch's data loaders can use multiple worker processes to load data in parallel. With 6 workers, the time for the VGG run improves to 11 minutes.
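Enabling this is a single argument on DataLoader; the dataset path and transforms below are illustrative rather than taken from the experiment:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# ImageFolder dataset with the standard 224 x 224 VGG input size
train_dset = datasets.ImageFolder('data/train', transforms.Compose([
    transforms.Scale(256),        # renamed Resize in later torchvision versions
    transforms.CenterCrop(224),
    transforms.ToTensor(),
]))

# num_workers=6 loads and preprocesses batches in 6 parallel worker processes
train_loader = DataLoader(train_dset, batch_size=64, shuffle=True, num_workers=6)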

Update: Based on the tweet below, I tried using Keras with 6 workers for pre-processing, and the time per epoch improved from 3 min 21 s to 1 min 40 s. PyTorch took 1 min 11 s per epoch with 6 workers.

Conclusion: A few observations I made while using Keras and PyTorch. Keras is quite abstract: good for getting started and for building standard models quickly. Its performance is expected to be better than PyTorch's, but that does not appear to be the case here, though there are a lot of improvements lined up for TensorFlow. PyTorch, on the other hand, provides an API similar to NumPy along with the ability to run on the GPU. Its learning curve is much gentler than TensorFlow's, and it is more flexible than Keras. So if you are passionate about deep learning, you should definitely take a look at PyTorch.

Apart from frameworks, we discussed how pre-convoluted features can be used to train models much faster. In fact, Jeremy Howard, in part 2 of his course (which will soon be available as a MOOC), discusses an interesting approach to how Facebook uses these pre-convoluted features: when different teams are working on the same dataset, it makes a lot of sense to compute the convolutional features once and make them available to all the teams. This approach saves a lot of time while building models.

You can find the code associated with the experiments here.

You can find me on LinkedIn
