How to train your neural net [Image [0]]

How to train your neural net

PyTorch [Vision] — Binary Image Classification

This notebook takes you through the implementation of binary image classification with CNNs using the hot-dog/not-dog dataset on PyTorch.

Akshaj Verma
12 min read · Apr 24, 2020


Import Libraries

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, utils, datasets
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
from sklearn.metrics import classification_report, confusion_matrix

Set the random seed.

np.random.seed(0)
torch.manual_seed(0)

Set Seaborn style.

%matplotlib inline
sns.set_style('darkgrid')

Define Paths and Set GPU

Let’s define the path for our data.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using =>", device)
root_dir = "../../../data/computer_vision/image_classification/hot-dog-not-hot-dog/"
print("The data lies here =>", root_dir)

###################### OUTPUT ######################
We're using => cuda
The data lies here => ../../../data/computer_vision/image_classification/hot-dog-not-hot-dog/

Define transforms

Let’s define a dictionary to hold the image transformations for the train/test sets. We will resize all images to (224, 224) and convert them to tensors.

The ToTensor operation in PyTorch scales all pixel values to lie between 0 and 1.

ToTensor converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
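
As a quick sanity check (a minimal sketch, not part of the original notebook; the file name below is hypothetical, substitute any image from the train folder), you can verify the shape and range that ToTensor produces:

from PIL import Image

# Hypothetical file name; substitute any RGB image from the dataset.
img = Image.open(root_dir + "train/hot_dog/1000288.jpg")
img_tensor = transforms.ToTensor()(img)
print(img_tensor.shape)                    # torch.Size([3, H, W]), channels first
print(img_tensor.min(), img_tensor.max())  # both values lie within [0.0, 1.0]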

image_transforms = {
    "train": transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor()
    ]),
    "test": transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor()
    ])
}

Initialize Datasets

Train + Validation Dataset

We have 2 dataset folders with us: Train and Test.

We will further divide our Train set as Train + Val.

hotdog_dataset = datasets.ImageFolder(root=root_dir + "train",
                                      transform=image_transforms["train"])
hotdog_dataset
###################### OUTPUT ######################
Dataset ImageFolder
Number of datapoints: 498
Root location: ../../../data/computer_vision/image_classification/hot-dog-not-hot-dog/train
StandardTransform
Transform: Compose(
Resize(size=(224, 224), interpolation=PIL.Image.BILINEAR)
ToTensor()
)

Class <=> ID Mapping of Output

The class_to_idx attribute is built into PyTorch's ImageFolder. It returns a mapping from class name to class ID for the dataset.

hotdog_dataset.class_to_idx
###################### OUTPUT ######################
{'hot_dog': 0, 'not_hot_dog': 1}

We will now construct the reverse of this dictionary: a mapping of ID to class.

idx2class = {v: k for k, v in hotdog_dataset.class_to_idx.items()}

Let’s also write a function that takes in a dataset object and returns a dictionary that contains the count of class samples. We will use this dictionary to construct plots and observe the class distribution in our data.

get_class_distribution() takes in an argument called dataset_obj.

  • We first initialize a count_dict dictionary where counts of all classes are initialized to 0.
  • Then, let’s iterate through the dataset and increment the counter by 1 for every class label encountered in the loop.

plot_from_dict() takes in 3 arguments: a dictionary called dict_obj, plot_title, and **kwargs. We pass in **kwargs because later on, we will construct subplots which require passing the ax argument in Seaborn.

  • First convert the dictionary to a data-frame.
  • Melt the data frame and plot.

def get_class_distribution(dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    for _, label_id in dataset_obj:
        label = idx2class[label_id]
        count_dict[label] += 1
    return count_dict

def plot_from_dict(dict_obj, plot_title, **kwargs):
    return sns.barplot(data=pd.DataFrame.from_dict([dict_obj]).melt(),
                       x="variable", y="value", hue="variable", **kwargs).set_title(plot_title)

plt.figure(figsize=(15,8))
plot_from_dict(get_class_distribution(hotdog_dataset), plot_title="Entire Dataset (before train/val/test split)")
Class distribution on entire dataset [Image [1]]

Get Train and Validation Samples

We use SubsetRandomSampler to make our train and validation loaders. SubsetRandomSampler is used so that each batch receives a random distribution of classes.

We could’ve also split our dataset into 2 parts, train and val, i.e. make 2 Subsets (a sketch of that route appears right after the sampler code below). But samplers are simpler because our data loader will pretty much handle everything now.

SubsetRandomSampler(indices) takes as input the indices of data.

We first create our samplers and then we’ll pass it to our data-loaders.

  • Create a list of indices.
  • Shuffle the indices.
  • Split the indices based on train-val percentage.
  • Create SubsetRandomSampler.

Create a list of indices from 0 to length of dataset.

hotdog_dataset_size = len(hotdog_dataset)
hotdog_dataset_indices = list(range(hotdog_dataset_size))

Shuffle the list of indices using np.random.shuffle.

np.random.shuffle(hotdog_dataset_indices)

Create the split index. We choose the split index to be 20% (0.2) of the dataset size.

val_split_index = int(np.floor(0.2 * hotdog_dataset_size))

Slice the list to obtain 2 lists of indices, one for train and the other for val.

0-----------val_split_index------------------------------n

Train => val_split_index to n

Val => 0 to val_split_index

train_idx, val_idx = hotdog_dataset_indices[val_split_index:], hotdog_dataset_indices[:val_split_index]

Finally, create samplers.

train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)
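
For reference, here’s a minimal sketch of the Subset alternative mentioned earlier, using torch.utils.data.random_split. This is not what we use below, just the other route:

from torch.utils.data import random_split

# Hypothetical 80/20 split into two Subset objects instead of samplers.
val_size = int(0.2 * len(hotdog_dataset))
train_subset, val_subset = random_split(hotdog_dataset, [len(hotdog_dataset) - val_size, val_size])
# These Subsets would be passed directly as dataset= to the DataLoaders,
# and shuffle=True would then be allowed since no sampler is involved.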

Test

Now that we’re done with train and val data, let’s load our test dataset.

hotdog_dataset_test = datasets.ImageFolder(root=root_dir + "test",
                                           transform=image_transforms["test"])
hotdog_dataset_test
###################### OUTPUT ######################
Dataset ImageFolder
Number of datapoints: 500
Root location: ../../../data/computer_vision/image_classification/hot-dog-not-hot-dog/test
StandardTransform
Transform: Compose(
Resize(size=(224, 224), interpolation=PIL.Image.BILINEAR)
ToTensor()
)

Train, Validation, and Test Dataloader

Now, we will pass the samplers to our dataloader. Note that shuffle=True cannot be used when you're using the SubsetRandomSampler.

train_loader = DataLoader(dataset=hotdog_dataset, shuffle=False, batch_size=8, sampler=train_sampler)
val_loader = DataLoader(dataset=hotdog_dataset, shuffle=False, batch_size=1, sampler=val_sampler)
test_loader = DataLoader(dataset=hotdog_dataset_test, shuffle=False, batch_size=1)

Explore The Data

To explore our train and val data-loaders, let’s create a new function that takes in a data-loader and returns a dictionary with class counts.

  • Initialize a dictionary count_dict to all 0s.
  • If the batch_size of the dataloader_obj is 1, then loop through the dataloader_obj and update the counter.
  • Else, if the batch_size of the dataloader_obj is not 1, then loop through the dataloader_obj to obtain batches. Loop through the batches to obtain individual tensors. Then, update the counter accordingly.

def get_class_distribution_loaders(dataloader_obj, dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    if dataloader_obj.batch_size == 1:
        for _, label_id in dataloader_obj:
            y_idx = label_id.item()
            y_lbl = idx2class[y_idx]
            count_dict[str(y_lbl)] += 1
    else:
        for _, label_id in dataloader_obj:
            for idx in label_id:
                y_idx = idx.item()
                y_lbl = idx2class[y_idx]
                count_dict[str(y_lbl)] += 1
    return count_dict

To plot the class distributions, we will use the plot_from_dict() function defined earlier with the ax argument.

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18,7))
plot_from_dict(get_class_distribution_loaders(train_loader, hotdog_dataset), plot_title="Train Set", ax=axes[0])
plot_from_dict(get_class_distribution_loaders(val_loader, hotdog_dataset), plot_title="Val Set", ax=axes[1])
Class distribution for train and val set [Image [2]]

Now that we’ve looked at the class distributions, let’s look at a single image.

single_batch = next(iter(train_loader))

single_batch is a list of 2 elements. The first element (0th index) contains the image tensors while the second element (1st index) contains the output labels.

Here’s the first element of the list which is a tensor. This tensor is of the shape (batch, channels, height, width).

single_batch[0].shape
###################### OUTPUT ######################
torch.Size([8, 3, 224, 224])

Here are the output labels for the batch.

print("Output label tensors: ", single_batch[1])
print("\nOutput label tensor shape: ", single_batch[1].shape)

###################### OUTPUT ######################
Output label tensors: tensor([1, 1, 1, 1, 1, 1, 1, 1])

Output label tensor shape: torch.Size([8])

To plot the image, we’ll use plt.imshow from matplotlib. It expects the image dimension to be (height, width, channels). We'll .permute() our single image tensor to plot it.

# Selecting the first image tensor from the batch. 
single_image = single_batch[0][0]
single_image.shape
###################### OUTPUT ######################
torch.Size([3, 224, 224])

Let’s view the image.

plt.imshow(single_image.permute(1, 2, 0))
A Single sample from the dataset [Image [3]]

PyTorch has made it easier for us to plot the images in a grid straight from the batch.

We first extract out the image tensor from the list (returned by our dataloader) and set nrow. Then we use the plt.imshow() function to plot our grid. Remember to .permute() the tensor dimensions!

# We do single_batch[0] because each batch is a list 
# where the 0th index is the image tensor and 1st index is the output label.
single_batch_grid = utils.make_grid(single_batch[0], nrow=4)
plt.figure(figsize = (10,10))
plt.imshow(single_batch_grid.permute(1, 2, 0))
Multiple samples from the dataset [Image [4]]

Define a CNN Architecture

Our architecture is simple. We use 4 blocks of Conv layers. Each block consists of Convolution + BatchNorm + ReLU + Dropout layers.

We will not use an FC layer at the end. We'll stick with a Conv layer whose kernel spans the entire 56×56 feature map, which acts like a fully connected layer over that map.

class HotDogClassifier(nn.Module):
    def __init__(self):
        super(HotDogClassifier, self).__init__()
        self.block1 = self.conv_block(c_in=3, c_out=256, dropout=0.1, kernel_size=5, stride=1, padding=2)
        self.block2 = self.conv_block(c_in=256, c_out=128, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.block3 = self.conv_block(c_in=128, c_out=64, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.lastcnn = nn.Conv2d(in_channels=64, out_channels=2, kernel_size=56, stride=1, padding=0)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.block1(x)
        x = self.maxpool(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.maxpool(x)
        x = self.lastcnn(x)
        return x

    def conv_block(self, c_in, c_out, dropout, **kwargs):
        seq_block = nn.Sequential(
            nn.Conv2d(in_channels=c_in, out_channels=c_out, **kwargs),
            nn.BatchNorm2d(num_features=c_out),
            nn.ReLU(),
            nn.Dropout2d(p=dropout)
        )
        return seq_block
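
Before wiring up the loss, here’s a quick shape sanity check (a sketch, not part of the original notebook) that traces a dummy batch through the network. The two maxpools take 224 → 112 → 56, and the 56×56 kernel of the last conv collapses the feature map to 1×1:

# Trace a dummy batch through the untrained network to confirm shapes.
sanity_model = HotDogClassifier()
sanity_model.eval()  # keep BatchNorm running stats untouched
with torch.no_grad():
    out = sanity_model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2, 1, 1]); .squeeze() later yields (batch, 2) logits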

Now we’ll initialize the model, optimizer, and loss function.

Then we’ll transfer the model to GPU.

We’re using nn.CrossEntropyLoss even though it's a binary classification problem. This means that, instead of returning a single output of 1/0, the model returns 2 values, one raw score (logit) per class; these can be converted into the probabilities of the output being either 0 or 1.

We don’t have to manually apply a log_softmax layer after our final layer because nn.CrossEntropyLoss does that for us.

However, we need to apply log_softmax for our validation and testing.
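
To see why no explicit log_softmax is needed at train time, here’s a small illustrative check (toy logits, not from our model) showing that nn.CrossEntropyLoss equals log_softmax followed by nn.NLLLoss:

# Toy logits for a batch of 2 samples and 2 classes (illustrative values only).
logits = torch.tensor([[1.5, -0.3], [0.2, 0.8]])
targets = torch.tensor([0, 1])

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True: CrossEntropyLoss fuses log_softmax + NLLLoss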

model = HotDogClassifier()
model.to(device)
print(model)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.008)

###################### OUTPUT ######################
HotDogClassifier(
(block1): Sequential(
(0): Conv2d(3, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Dropout2d(p=0.1, inplace=False)
)
(block2): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Dropout2d(p=0.1, inplace=False)
)
(block3): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Dropout2d(p=0.1, inplace=False)
)
(lastcnn): Conv2d(64, 2, kernel_size=(56, 56), stride=(1, 1))
(maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Before we start our training, let’s define a function to calculate accuracy per epoch.

This function takes y_pred and y_test as input arguments. We then apply log_softmax to y_pred and extract the class which has the higher probability.

After that, we compare the predicted classes and the actual classes to calculate the accuracy.

def binary_acc(y_pred, y_test):
    y_pred_tag = torch.log_softmax(y_pred, dim=1)
    _, y_pred_tags = torch.max(y_pred_tag, dim=1)
    correct_results_sum = (y_pred_tags == y_test).sum().float()
    acc = correct_results_sum / y_test.shape[0]
    acc = torch.round(acc * 100)
    return acc
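
A toy call (made-up logits, just to illustrate the mechanics): the first sample favors class 0 and the second favors class 1, so with true labels [0, 0] the accuracy comes out to 50%:

# Made-up logits: sample 1 favors class 0, sample 2 favors class 1.
y_pred_toy = torch.tensor([[2.0, 0.5], [0.1, 1.2]])
y_test_toy = torch.tensor([0, 0])
print(binary_acc(y_pred_toy, y_test_toy))  # tensor(50.): one of two predictions is correct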

We’ll also define 2 dictionaries which will store the accuracy/epoch and loss/epoch for both train and validation sets.

accuracy_stats = {
'train': [],
"val": []
}
loss_stats = {
'train': [],
"val": []
}

Let’s TRAIN our model!

You can see we’ve put a model.train() before the loop. model.train() tells PyTorch that you're in training mode. Why do we need to do that? If you're using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is not applied during evaluation), you need to tell PyTorch to act accordingly. The default mode in PyTorch is train, so you don't explicitly have to write it. But it's good practice.
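
Here’s a tiny standalone illustration (not from the notebook) of what flipping the mode actually changes for a Dropout layer:

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the entries zeroed, the rest scaled up to 2.0

drop.eval()
print(drop(x))  # identity: all ones, dropout is switched off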

Similarly, we’ll call model.eval() when we test our model. We'll see that below. Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we'll print out the loss/accuracy and reset it back to 0.

Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader.

We do optimizer.zero_grad() before we make any predictions. Since the .backward() function accumulates gradients, we need to set them to zero manually per mini-batch. From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, and perform backpropagation using loss.backward() and optimizer.step().
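
A minimal demonstration (toy tensor, unrelated to our model) of the accumulation behavior that makes optimizer.zero_grad() necessary:

w = torch.tensor([1.0], requires_grad=True)

(2 * w).sum().backward()
print(w.grad)  # tensor([2.])

(2 * w).sum().backward()
print(w.grad)  # tensor([4.]): the second backward added onto the first

w.grad.zero_()  # optimizer.zero_grad() does this for every parameter it manages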

Finally, we add up all the mini-batch losses (and accuracies) and divide by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss (and accuracy) for that epoch.

The procedure we follow for training is the exact same for validation except for the fact that we wrap it up in torch.no_grad and not perform any backpropagation. torch.no_grad() tells PyTorch that we do not want to perform back-propagation, which reduces memory usage and speeds up computation.
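
One more toy check (again, illustrative only): tensors produced inside torch.no_grad() carry no autograd history, which is where the memory and speed savings come from:

x = torch.randn(2, 3, requires_grad=True)

y = x * 2
print(y.requires_grad)  # True: this op was recorded in the autograd graph

with torch.no_grad():
    z = x * 2
print(z.requires_grad)  # False: no graph was built, so nothing to backprop through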

print("Begin training.")for e in tqdm(range(1, 21)):    # TRAINING    train_epoch_loss = 0
train_epoch_acc = 0
model.train()
for X_train_batch, y_train_batch in train_loader:
X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
optimizer.zero_grad()
y_train_pred = model(X_train_batch).squeeze() train_loss = criterion(y_train_pred, y_train_batch)
train_acc = binary_acc(y_train_pred, y_train_batch)
train_loss.backward()
optimizer.step()
train_epoch_loss += train_loss.item()
train_epoch_acc += train_acc.item()
# VALIDATION
with torch.no_grad():
model.eval()
val_epoch_loss = 0
val_epoch_acc = 0
for X_val_batch, y_val_batch in val_loader:
X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
y_val_pred = model(X_val_batch).squeeze()
y_val_pred = torch.unsqueeze(y_val_pred, 0)
val_loss = criterion(y_val_pred, y_val_batch)
val_acc = binary_acc(y_val_pred, y_val_batch)
val_epoch_loss += val_loss.item()
val_epoch_acc += val_acc.item()
loss_stats['train'].append(train_epoch_loss/len(train_loader))
loss_stats['val'].append(val_epoch_loss/len(val_loader))
accuracy_stats['train'].append(train_epoch_acc/len(train_loader))
accuracy_stats['val'].append(val_epoch_acc/len(val_loader))
print(f'Epoch {e+0:02}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f}| Val Acc: {val_epoch_acc/len(val_loader):.3f}')###################### OUTPUT ######################Begin training.Epoch 01: | Train Loss: 113.08463 | Val Loss: 92.26063 | Train Acc: 51.120| Val Acc: 29.000
Epoch 02: | Train Loss: 55.47888 | Val Loss: 50.39846 | Train Acc: 63.620| Val Acc: 57.000
Epoch 03: | Train Loss: 33.44443 | Val Loss: 20.69457 | Train Acc: 70.500| Val Acc: 71.000
Epoch 04: | Train Loss: 18.75201 | Val Loss: 1.50821 | Train Acc: 77.240| Val Acc: 71.000
Epoch 05: | Train Loss: 12.88685 | Val Loss: 26.62685 | Train Acc: 75.480| Val Acc: 71.000
Epoch 06: | Train Loss: 9.70507 | Val Loss: 3.25360 | Train Acc: 81.080| Val Acc: 86.000
Epoch 07: | Train Loss: 11.04334 | Val Loss: 0.00000 | Train Acc: 79.320| Val Acc: 100.000
Epoch 08: | Train Loss: 7.16636 | Val Loss: 10.48954 | Train Acc: 83.300| Val Acc: 71.000
Epoch 09: | Train Loss: 4.32204 | Val Loss: 0.00001 | Train Acc: 86.400| Val Acc: 100.000
Epoch 10: | Train Loss: 2.03338 | Val Loss: 0.00000 | Train Acc: 91.720| Val Acc: 100.000
Epoch 11: | Train Loss: 1.68124 | Val Loss: 3.65754 | Train Acc: 92.320| Val Acc: 71.000
Epoch 12: | Train Loss: 1.27145 | Val Loss: 5.52111 | Train Acc: 93.320| Val Acc: 86.000
Epoch 13: | Train Loss: 0.42285 | Val Loss: 0.00000 | Train Acc: 97.600| Val Acc: 100.000
Epoch 14: | Train Loss: 1.03441 | Val Loss: 0.00000 | Train Acc: 94.840| Val Acc: 100.000
Epoch 15: | Train Loss: 0.76563 | Val Loss: 0.00000 | Train Acc: 96.340| Val Acc: 100.000
Epoch 16: | Train Loss: 0.16889 | Val Loss: 0.00000 | Train Acc: 98.040| Val Acc: 100.000
Epoch 17: | Train Loss: 0.42046 | Val Loss: 4.02560 | Train Acc: 96.560| Val Acc: 86.000
Epoch 18: | Train Loss: 0.57535 | Val Loss: 0.00000 | Train Acc: 95.640| Val Acc: 100.000
Epoch 19: | Train Loss: 0.40181 | Val Loss: 0.00000 | Train Acc: 96.620| Val Acc: 100.000
Epoch 20: | Train Loss: 0.92207 | Val Loss: 0.00000 | Train Acc: 95.360| Val Acc: 100.000

Visualize Loss and Accuracy

To plot the loss and accuracy line plots, we again create a dataframe from the accuracy_stats and loss_stats dictionaries.

train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(30,10))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')
Acc/loss curves for train and val set [Image [5]]

Test

After training is done, we need to test how our model fared. Note that we’ve used model.eval() before we run our testing code. To tell PyTorch that we do not want to perform back-propagation during inference, we use torch.no_grad(), just as we did for the validation loop above.

We start by defining lists that will hold our predictions and true labels. Then we loop through our batches using the test_loader. For each batch:

  • We move our input mini-batch to the GPU.
  • We make the predictions using our trained model.
  • Pick the index of the highest output score (the argmax is the same whether or not log_softmax is applied first).
  • Move the predictions back to the CPU from the GPU.
  • Convert the tensor to a numpy object and append it to our list.

y_pred_list = []
y_true_list = []
with torch.no_grad():
    for x_batch, y_batch in tqdm(test_loader):
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        y_test_pred = model(x_batch)
        _, y_pred_tag = torch.max(y_test_pred, dim=1)
        y_pred_list.append(y_pred_tag.cpu().numpy())
        y_true_list.append(y_batch.cpu().numpy())

We’ll flatten out the lists so that we can use them as inputs to confusion_matrix and classification_report. Each stored prediction still carries the 1×1 spatial dimensions from the final conv layer, which is why we index three levels deep to get a scalar.

y_pred_list = [i[0][0][0] for i in y_pred_list]
y_true_list = [i[0] for i in y_true_list]

Classification Report

Finally, we print out the classification report which contains the precision, recall, and the F1 score.

print(classification_report(y_true_list, y_pred_list))
###################### OUTPUT ######################
              precision    recall  f1-score   support

           0       0.90      0.91      0.91       249
           1       0.91      0.90      0.91       249

    accuracy                           0.91       498
   macro avg       0.91      0.91      0.91       498
weighted avg       0.91      0.91      0.91       498

Confusion Matrix

Let’s use the confusion_matrix() function to make a confusion matrix.

print(confusion_matrix(y_true_list, y_pred_list))
###################### OUTPUT ######################
[[226 23]
[ 24 225]]

We create a dataframe from the confusion matrix and plot it as a heatmap using the seaborn library.

confusion_matrix_df = pd.DataFrame(confusion_matrix(y_true_list, y_pred_list)).rename(columns=idx2class, index=idx2class)

fig, ax = plt.subplots(figsize=(7,5))
sns.heatmap(confusion_matrix_df, annot=True, ax=ax)
Confusion matrix heatmap [Image [6]]

Thank you for reading. Suggestions and constructive criticism are welcome. :)

This blog post is a part of the column “How to train your neural net”. You can find the series here.

You can find me on LinkedIn and Twitter. If you liked this, check out my other blogposts.
