CIFAR 100: Transfer Learning using EfficientNet

Transfer learning using state-of-the-art EfficientNet-B0

Chetna Khanna
Towards Data Science


Photo by Marina Vitale on Unsplash

Convolutional Neural Networks (CNNs) are a class of deep neural networks commonly used to analyze images. In this article, we will together build a CNN model that can correctly recognize and classify colored images of objects into one of the 100 available classes of the CIFAR-100 dataset. In particular, we will reuse a state-of-the-art model as the starting point for our own model. This technique is called transfer learning. ➡️

Let us first understand what transfer learning is. I will not go into a lot of detail but will try to share some knowledge. 📝

Transfer Learning

As stated in the Handbook of Research on Machine Learning Applications, transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

In simple terms, transfer learning is a machine learning technique where a model trained on one task is re-purposed for a second, related task. Deep learning networks are resource-hungry and computationally expensive, with millions of parameters, and they have to be trained on massive amounts of data to avoid overfitting. Creating a state-of-the-art model therefore costs researchers a great deal of training time. Since so many resources go into training such a model, researchers reasoned that the benefits of that investment should be reaped many times over, and thus arose the concept of transfer learning.

The best part of transfer learning is that we can re-use either the entire model or just a part of it. Umm, intelligent! 😎 This way we don’t have to train the whole model from scratch, which saves time and often gives better performance. For instance, a pre-trained model that can recognize cars can be re-purposed to recognize trucks.
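To make the idea concrete, here is a minimal sketch of the “reuse part of a model” flavour in Keras: a backbone pre-trained on ImageNet is frozen and only a new classification head is trained. (MobileNetV2 is used here purely as a small illustrative backbone; the rest of this article uses EfficientNet-B0 and fine-tunes all of its layers.)

#minimal transfer learning sketch (illustrative only, not the model used later in this article)
from keras.applications import MobileNetV2
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

#load a backbone pre-trained on ImageNet, without its classification head
backbone = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False   #re-use the learned features as they are

#add a new head for our own task (here, 100 classes)
model = Sequential([backbone,
                    GlobalAveragePooling2D(),
                    Dense(100, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])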

Let us now know a bit about the state-of-the-art model we will be using here.

EfficientNet-B0: State-of-the-art model

EfficientNet is a family of CNNs built by Google. ✌️ These CNNs not only provide better accuracy but also improve the efficiency of the models by reducing the number of parameters compared to other state-of-the-art models. The EfficientNet-B0 model is a simple mobile-size baseline architecture trained on the ImageNet dataset.

While building a neural network, our basic approach to improving model performance is to increase the number of units or the number of layers. However, this strategy does not always work, or rather, it stops helping after a point. For instance, I built a 9-layer convolutional neural network for the CIFAR-100 dataset and managed to achieve an accuracy of just 59%. That is far better than the 1% random-chance baseline for 100 classes, but still not satisfying. 😏 My attempts at increasing the number of layers or units did not improve the accuracy any further. ☹️ (Link to code)

EfficientNet is built on the idea that an effective compound scaling method (scaling all dimensions of depth, width, and resolution together) for increasing the model size can help the model achieve maximum accuracy gains. The figure below, from the original paper, gives a nice visualization of the different scaling approaches.

Source: https://arxiv.org/pdf/1905.11946.pdf
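For reference, the paper formalizes compound scaling with a single coefficient φ: network depth is scaled by α^φ, width by β^φ, and input resolution by γ^φ, with the constants chosen so that α · β² · γ² ≈ 2, meaning each increment of φ roughly doubles the FLOPS. For the B0 baseline, the paper reports α = 1.2, β = 1.1, and γ = 1.15, found via a small grid search.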

Note: EfficientNet comes in many variants (B0 through B7). I used EfficientNet-B0 because it is the smallest model. If you want, you can try out the other variants of EfficientNet.

So, let’s build an image recognition model using EfficientNet-B0. Please note that I will only be covering model training in this blog post. If you want to know about the pre-processing part, please refer to this blog post.

Note: I will try to make most of the concepts clear, but this article still assumes a basic understanding of Convolutional Neural Networks (CNNs). 📖

The code for this task is available on my GitHub. Please feel free to use it to build an even more intelligent image recognition system.

Model training using transfer learning

To train a machine learning model, we need a training set. A good practice is to keep a validation set to choose the hyperparameters and a test set to test the model on unseen data.

Let us first import the libraries.

from sklearn.model_selection import StratifiedShuffleSplit
import cv2
import albumentations as albu
from skimage.transform import resize
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from pylab import rcParams
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from keras.callbacks import Callback, EarlyStopping, ReduceLROnPlateau
import tensorflow as tf
import keras
from keras.models import Sequential, load_model
from keras.layers import Dropout, Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
import efficientnet.keras as efn

I have used a stratified shuffle split to divide my training set into training and validation sets because it preserves the percentage of samples in each of the 100 classes. Here is the code to perform the split.

sss = StratifiedShuffleSplit(n_splits=2, test_size=0.2, random_state=123)

for train_index, val_index in sss.split(X_train, y_train):
    X_train_data, X_val_data = X_train[train_index], X_train[val_index]
    y_train_data, y_val_data = y_train[train_index], y_train[val_index]

print("Number of training samples: ", X_train_data.shape[0])
print("Number of validation samples: ", X_val_data.shape[0])

The output gives the number of samples in each set.

Number of training samples:  40000 
Number of validation samples: 10000

As per the EfficientNet approach, we need to scale not only the width and depth of the model (which the pre-trained model already takes care of) but the resolution of the images as well. The EfficientNet-B0 architecture expects input images of size (224, 224). So, let us resize our (32, 32) images to this new size.

height = 224
width = 224
channels = 3
input_shape = (height, width, channels)

The function resize_img below takes an image and a target shape as input and resizes the image. I have used bicubic interpolation to upscale the images; it considers the closest 4 × 4 neighborhood of known pixels, 16 pixels in total. This method produces noticeably sharper images and is considered a good trade-off between processing time and output quality.

def resize_img(img, shape):
    return cv2.resize(img, (shape[1], shape[0]), interpolation=cv2.INTER_CUBIC)
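As a quick sanity check (assuming X_train_data still holds the original 32 × 32 images), resizing one image should give the target shape:

#sanity check: upscale a single CIFAR-100 image to the EfficientNet-B0 input size
sample = resize_img(X_train_data[0], (height, width))
print(sample.shape)   #(224, 224, 3)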

We all know that the performance of a deep learning model usually improves with more data, so I planned for image augmentation. Memory, however, is always a big constraint for deep learning models because they have so many trainable parameters, so I opted for the Python albumentations library, which performs data augmentation in real time. (If you are not aware of this library, I strongly recommend you look at its website and GitHub page.)
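If you have not used albumentations before, the basic API is tiny: you compose a pipeline of transforms and call it on a NumPy image. A minimal sketch (the exact transforms used for this model appear inside the generator class below):

#minimal albumentations sketch: img is any NumPy image array of shape (H, W, C)
aug = albu.Compose([albu.HorizontalFlip(p=0.5),
                    albu.GridDistortion(p=0.2)])
augmented_img = aug(image=img)['image']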

I created my own custom data generator class based on the Keras Sequence class. Horizontal flip, vertical flip, grid distortion, and elastic transformation were the augmentations used to extend the dataset (you can try out other transforms too).

As the distribution of feature values can differ a lot from image to image, the images are normalized by dividing every pixel value by 255, since each individual color channel lies in the range [0, 255]. The rescaled images then have all features in the new range [0, 1].

I did all these transformations batch-wise. Also, I applied augmentation only to the training dataset and left the validation and test datasets as they are.

Before writing the custom data generator class, let us first set our constants.

n_classes = 100
epochs = 15
batch_size = 8

Here is the code for the custom data generator class.

class DataGenerator(keras.utils.Sequence):
    def __init__(self, images, labels=None, mode='fit', batch_size=batch_size,
                 dim=(height, width), channels=channels, n_classes=n_classes,
                 shuffle=True, augment=False):
        #initializing the configuration of the generator
        self.images = images
        self.labels = labels
        self.mode = mode
        self.batch_size = batch_size
        self.dim = dim
        self.channels = channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.augment = augment
        self.on_epoch_end()

    #method to be called after every epoch
    def on_epoch_end(self):
        self.indexes = np.arange(self.images.shape[0])
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    #return the number of steps in an epoch using the samples and batch size
    def __len__(self):
        return int(np.floor(len(self.images) / self.batch_size))

    #this method is called with the batch number as an argument to obtain a given batch of data
    def __getitem__(self, index):
        #generate one batch of data
        #generate indexes of the batch
        batch_indexes = self.indexes[index * self.batch_size:(index+1) * self.batch_size]

        #generate mini-batch of X
        X = np.empty((self.batch_size, *self.dim, self.channels))
        for i, ID in enumerate(batch_indexes):
            #generate pre-processed image
            img = self.images[ID]
            #image rescaling
            img = img.astype(np.float32)/255.
            #resizing as per new dimensions
            img = resize_img(img, self.dim)
            X[i] = img

        #generate mini-batch of y
        if self.mode == 'fit':
            y = self.labels[batch_indexes]

            #augmentation on the training dataset
            if self.augment == True:
                X = self.__augment_batch(X)
            return X, y

        elif self.mode == 'predict':
            return X

        else:
            raise AttributeError("The mode should be set to either 'fit' or 'predict'.")

    #augmentation for one image
    def __random_transform(self, img):
        composition = albu.Compose([albu.HorizontalFlip(p=0.5),
                                    albu.VerticalFlip(p=0.5),
                                    albu.GridDistortion(p=0.2),
                                    albu.ElasticTransform(p=0.2)])
        return composition(image=img)['image']

    #augmentation for a batch of images
    def __augment_batch(self, img_batch):
        for i in range(img_batch.shape[0]):
            img_batch[i] = self.__random_transform(img_batch[i])
        return img_batch

Let us apply the data generator class to our training and validation sets.

train_data_generator = DataGenerator(X_train_data, y_train_data, augment=True) 
valid_data_generator = DataGenerator(X_val_data, y_val_data, augment=False)
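A quick way to verify that the generator behaves as expected is to pull a single batch and inspect its shape (assuming, as in the pre-processing post, that the labels are one-hot encoded):

#fetch one batch from the generator as a sanity check
X_batch, y_batch = train_data_generator[0]
print(X_batch.shape)   #(8, 224, 224, 3)
print(y_batch.shape)   #(8, 100)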

The EfficientNet classes are available in Keras and make transfer learning easy. I used the EfficientNet-B0 class with ImageNet weights. Since I used this model only for feature extraction, I did not include the fully-connected layer at the top of the network; instead, I specified the input shape and added my own pooling and dense layers.

Here is the code to use the pre-trained EfficientNet-B0 model.

efnb0 = efn.EfficientNetB0(weights='imagenet', include_top=False, input_shape=input_shape, classes=n_classes)

model = Sequential()
model.add(efnb0)
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))

model.summary()

Here is the output.

Image by author

The model has 4,135,648 trainable parameters. 😳

optimizer = Adam(lr=0.0001)

#early stopping to monitor the validation loss and avoid overfitting
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10, restore_best_weights=True)

#reducing learning rate on plateau
rlrop = ReduceLROnPlateau(monitor='val_loss', mode='min', patience= 5, factor= 0.5, min_lr= 1e-6, verbose=1)
#model compiling
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

After compiling our model, let us fit it on our training dataset and validate it on the validation dataset.

model_history = model.fit_generator(train_data_generator, validation_data=valid_data_generator, callbacks=[early_stop, rlrop], verbose=1, epochs=epochs)

#saving the trained model weights as data file in .h5 format
model.save_weights("cifar_efficientnetb0_weights.h5")

Here are the snippets of training.

Images by author

We can see that the model reduced the learning rate at the 14th epoch, and we get a final accuracy of 84.82% on the training set, which is pretty good. But wait, we need to look at the test accuracy too.

Visualization helps to see things better. So, let’s plot the accuracy and loss plots.

#plot to visualize the loss and accuracy against number of epochs
plt.figure(figsize=(18,8))

plt.suptitle('Loss and Accuracy Plots', fontsize=18)

plt.subplot(1,2,1)
plt.plot(model_history.history['loss'], label='Training Loss')
plt.plot(model_history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.xlabel('Number of epochs', fontsize=15)
plt.ylabel('Loss', fontsize=15)

plt.subplot(1,2,2)
plt.plot(model_history.history['accuracy'], label='Train Accuracy')
plt.plot(model_history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.xlabel('Number of epochs', fontsize=14)
plt.ylabel('Accuracy', fontsize=14)
plt.show()

Image by author

Let us evaluate our model now.

valid_loss, valid_accuracy = model.evaluate_generator(generator = valid_data_generator, verbose = 1)

print('Validation Accuracy: ', round((valid_accuracy * 100), 2), "%")

Output:

1250/1250 [==============================] - 85s 68ms/step
Validation Accuracy:  82.3 %

Now, it's time to look at the test dataset accuracy.

y_pred = model.predict_generator(DataGenerator(X_test, mode='predict', augment=False, shuffle=False), verbose=1)
y_pred = np.argmax(y_pred, axis=1)
test_accuracy = accuracy_score(np.argmax(y_test, axis=1), y_pred)

print('Test Accuracy: ', round((test_accuracy * 100), 2), "%")

Output:

1250/1250 [==============================] - 78s 63ms/step
Test Accuracy: 81.79 %

The results of the training are pretty good. We got an accuracy of 81.79% on the test dataset. 💃

Confusion matrix and classification reports can be generated for the model using the following code.

cm = confusion_matrix(np.argmax(y_test, axis=1), y_pred)
print(cm)
target = ["Category {}".format(i) for i in range(n_classes)]
print(classification_report(np.argmax(y_test, axis=1), y_pred, target_names=target))

Here is a snippet of the classification report for the first 11 classes.

From the classification report, we can see that a few categories have been predicted well whereas a few have been incorrectly predicted.
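One way to quantify which categories are predicted well is to compute the per-class recall directly from the confusion matrix and look at the extremes. A small sketch:

#per-class recall from the confusion matrix (each CIFAR-100 class has 100 test images)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
best_classes = np.argsort(per_class_recall)[-5:]    #5 best recognized class indices
worst_classes = np.argsort(per_class_recall)[:5]    #5 worst recognized class indices
print("Best recognized classes: ", best_classes)
print("Worst recognized classes:", worst_classes)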

If you want to visualize the predictions, here is the code.

prediction = pd.DataFrame(y_pred)
rcParams['figure.figsize'] = 12, 15

num_row = 4
num_col = 4

imageId = np.random.randint(0, len(X_test), num_row * num_col)

fig, axes = plt.subplots(num_row, num_col)

for i in range(0, num_row):
    for j in range(0, num_col):
        k = (i*num_col)+j
        axes[i,j].imshow(X_test[imageId[k]])
        axes[i,j].set_title("True: " + str(subCategory.iloc[testData['fine_labels'][imageId[k]]][0]).capitalize() + "\nPredicted: " + str(subCategory.iloc[prediction.iloc[imageId[k]]]).split()[2].capitalize(), fontsize=14)
        axes[i,j].axis('off')
fig.suptitle("Images with True and Predicted Labels", fontsize=18)

plt.show()

Here is the snippet of the output.

Image by author

You can see that our model got confused between a motorcycle and a bicycle. 🙄 But, we can see that most of the predictions were correct. ✅

Deep learning is all about experimentation. It is quite possible that the performance of this model can be improved further using the other state-of-the-art versions of EfficientNet. Hyperparameter tuning is also an important aspect of deep learning and can help in increasing accuracy.

I hope this blog helped you understand how to perform transfer learning. Please feel free to experiment more to get better performance. Check out my GitHub for the complete code and my previous article for the initial steps. Also, I highly recommend you read the original paper; it is an interesting read!


References:

  1. Original paper: https://arxiv.org/pdf/1905.11946.pdf
  2. This Kaggle notebook gave me direction on how to perform transfer learning.

Thank you, everyone, for reading this. Do share your valuable feedback or suggestions. Happy reading! 📗 🖌 Also, I would love to know if you get better performance on CIFAR-100 using transfer learning.

