Binary Image Classification in PyTorch

Train a convolutional neural network adopting a transfer learning approach

Published in Towards Data Science · 4 min read · May 30, 2022

I first approached deep learning through TensorFlow, which I immediately found very easy and intuitive. Many books also use this framework as a reference, such as Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
I then noticed that PyTorch is often used for research, both in academia and in industry. So I started re-implementing in PyTorch some simple projects that I had already developed in TensorFlow, in order to have a basic understanding of both. Since I believe that the best way to learn is to explain to others, I decided to write this hands-on tutorial on developing a convolutional neural network for binary image classification in PyTorch.

Dataset

We will use the dogs vs cats dataset, which is freely licensed and available at the following link: https://www.kaggle.com/datasets/biaiscience/dogs-vs-cats. I will show you how to create a model to solve this binary classification task and how to use it for inference on new images.

To download this dataset, first log in to Kaggle with your credentials and download the kaggle.json file, which you get by clicking on the Create New API Token button.

Then we need to write the code that uploads our personal Kaggle token and downloads the dataset.
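Here is a minimal sketch of this step, assuming the Kaggle command-line tool is installed (`pip install kaggle`) and that kaggle.json sits in the current working directory. The function names `install_kaggle_token` and `download_dataset` are hypothetical helpers, not part of any library; the dataset slug is taken from the link above.

```python
import shutil
import stat
import subprocess
from pathlib import Path

DATASET_SLUG = "biaiscience/dogs-vs-cats"  # taken from the dataset URL


def install_kaggle_token(token_path, kaggle_dir=None):
    """Copy kaggle.json into the folder where the Kaggle CLI looks for it."""
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copy(token_path, target)
    # Restrict the token to the current user (read/write for owner only).
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target


def download_dataset(slug=DATASET_SLUG, dest="."):
    """Download the dataset archive with the Kaggle CLI (assumed installed)."""
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", slug, "-p", str(dest)],
        check=True,
    )
```

Calling `install_kaggle_token("kaggle.json")` followed by `download_dataset()` should leave the dataset archive in the current folder.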

Download data from Kaggle

Now we need to unzip the downloaded archive into a new folder that we will name data. Next, we will also unzip the two nested archives, test and train.
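One way to do this with the standard library is sketched below. The archive names (dogs-vs-cats.zip, and train.zip and test.zip inside it) are assumptions based on the dataset name and the description above, so adjust them to what you actually find on disk.

```python
import zipfile
from pathlib import Path


def unzip(archive, dest):
    """Extract a zip archive into dest, creating the folder if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def unzip_dataset(archive="dogs-vs-cats.zip", data_dir="data"):
    """Extract the main archive, then the nested train/test archives."""
    data_dir = unzip(archive, data_dir)
    for name in ("train", "test"):
        nested = data_dir / f"{name}.zip"  # assumed nested archive names
        if nested.exists():
            unzip(nested, data_dir)
    return data_dir
```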

Unzip Data

Structure and populate the subfolders

To make the dataset easier to handle, we create a simple folder structure.
The goal is to have a folder called training that contains two subfolders, dog and cat, each holding all the images of the respective pet.
The same structure is needed for the validation folder.
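The structure can be created with a few lines of `pathlib`. In this sketch the folder names training, validation, dog and cat come from the description above, while the root folder data and the helper name `create_subfolders` are assumptions.

```python
from pathlib import Path

SPLITS = ("training", "validation")
CLASSES = ("cat", "dog")


def create_subfolders(root="data", splits=SPLITS, classes=CLASSES):
    """Create root/<split>/<class> for every split and class."""
    created = []
    for split in splits:
        for cls in classes:
            folder = Path(root) / split / cls
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```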

Create subfolders structure

Now we only need to shuffle the data and populate these newly created subfolders.
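A possible implementation is sketched here. It assumes that the extracted training images are named like cat.0.jpg and dog.0.jpg, so that the label can be read from the file name, and it holds out 20% of the images for validation. Both the naming scheme and the split fraction are assumptions, and `populate_subfolders` is a hypothetical helper.

```python
import random
import shutil
from pathlib import Path


def populate_subfolders(src_dir, root, val_fraction=0.2, seed=42):
    """Shuffle the images in src_dir and copy them into root/<split>/<label>."""
    files = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)  # seeded for a reproducible split
    n_val = int(len(files) * val_fraction)
    counts = {"training": 0, "validation": 0}
    for i, path in enumerate(files):
        split = "validation" if i < n_val else "training"
        label = path.name.split(".")[0]  # "cat" or "dog" (assumed naming)
        target_dir = Path(root) / split / label
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, target_dir / path.name)
        counts[split] += 1
    return counts
```

Copying instead of moving keeps the original files intact, at the cost of extra disk space.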

Populate subfolders

Let’s plot some image examples.
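As a rough sketch, assuming matplotlib and Pillow are installed, we can sample a few images and show them in a grid, using the name of the parent folder as the title. The helper name `plot_examples` and the grid layout are illustrative choices.

```python
import random
from pathlib import Path

import matplotlib.pyplot as plt
from PIL import Image


def plot_examples(folder, n=8, cols=4, seed=0):
    """Show up to n random images from folder, titled with their class folder."""
    paths = sorted(Path(folder).rglob("*.jpg"))
    sample = random.Random(seed).sample(paths, min(n, len(paths)))
    rows = max(1, -(-len(sample) // cols))  # ceiling division
    fig, axes = plt.subplots(
        rows, cols, figsize=(3 * cols, 3 * rows), squeeze=False
    )
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, also on unused cells
    for ax, path in zip(axes.flat, sample):
        ax.imshow(Image.open(path))
        ax.set_title(path.parent.name)
    return fig
```

For example, `plot_examples("data/training")` followed by `plt.show()` displays eight random training images.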

Plot examples

Create Dataloaders

Now we are going to do three things:

1. Preprocess our data with transforms.Compose, which chains multiple preprocessing functions, such as normalization and data augmentation, and applies them to our dataset.
2. Use ImageFolder to create a PyTorch dataset. ImageFolder infers the classes automatically from the subdirectory names, provided the folder structure is well defined (as in our case).
3. Use DataLoader to split our data into batches.

Create Dataloaders

Training step function

A training step is always defined by three things: the model, the optimizer, and the loss function. So let’s write a function that takes these three objects as input and returns a train step function. This way, we don’t have to rewrite the same code again and again!
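A minimal sketch of such a factory is shown below. The name `make_train_step` is a hypothetical choice; the body is the usual PyTorch sequence of forward pass, loss computation, backward pass, and optimizer update.

```python
import torch
from torch import nn


def make_train_step(model, optimizer, loss_fn):
    """Return a function that performs one optimization step on a batch."""

    def train_step(x, y):
        model.train()              # enable training-mode layers
        yhat = model(x)            # forward pass
        loss = loss_fn(yhat, y)    # compute the loss
        loss.backward()            # compute gradients
        optimizer.step()           # update the trainable parameters
        optimizer.zero_grad()      # reset gradients for the next batch
        return loss.item()

    return train_step
```

The returned `train_step(x, y)` expects a batch of inputs and targets that are already on the same device as the model, and returns the loss as a Python float.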

Training step function

Build the model

When solving most Kaggle tasks, you don’t write a network from scratch. Instead, you take a pre-trained model, which we will call base_model, and adapt it to the task at hand. Think of base_model as a model that has already learned to recognize important features in images. What we want to do is adapt it by adding a head made of new dense layers. In our case, the last dense layer consists of a single neuron with a sigmoid activation function, so that the output is the probability that the image belongs to class 1 (dog) rather than class 0 (cat).

We must be careful not to retrain the base model, which has already been trained.

Let’s download a pretrained model (ResNet) and freeze all of its parameters. Then we replace the last linear layer to turn the model into a binary classifier. Remember that the model and the data must be on the same device (for example, the GPU).

Freeze parameters of a pretrained model

We now need to define loss, optimizer and train_step.
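As a sketch, assuming the model outputs a single raw logit, a natural loss is `BCEWithLogitsLoss`, which applies the sigmoid internally. The Adam optimizer and the learning rate are assumptions, and `build_loss_and_optimizer` is a hypothetical helper. Only parameters that still require gradients are handed to the optimizer.

```python
import torch
from torch import nn


def build_loss_and_optimizer(model, lr=1e-3):
    """Create the loss and an optimizer over the trainable parameters only."""
    loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    return loss_fn, optimizer
```

The train step is then obtained by passing the model, the optimizer, and the loss function to the train step factory described in the previous section.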

Define loss, optimizer and train step.

Train the model

Let’s write our training and evaluation loop. We will also implement early stopping and save a checkpoint whenever an epoch produces a new best model.
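A compact version of such a loop is sketched below. It assumes that `train_step` is a callable taking a batch `(x, y)`, that the loaders yield integer labels of shape `(batch,)` as ImageFolder does, and that the model outputs one logit per image. The patience value, the checkpoint file name, and the use of validation loss as the early stopping criterion are assumptions.

```python
import torch


def evaluate(model, loss_fn, loader, device):
    """Return the average loss of the model over a data loader."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device)
            y = y.to(device).float().unsqueeze(1)  # (batch,) -> (batch, 1)
            total += loss_fn(model(x), y).item() * len(x)
            n += len(x)
    return total / n


def fit(model, train_step, loss_fn, train_loader, val_loader, device="cpu",
        n_epochs=10, patience=3, checkpoint_path="best_model.pt"):
    """Train with early stopping, saving the weights of the best epoch."""
    best_loss, bad_epochs, history = float("inf"), 0, []
    for epoch in range(n_epochs):
        for x, y in train_loader:
            train_step(x.to(device), y.to(device).float().unsqueeze(1))

        val_loss = evaluate(model, loss_fn, val_loader, device)
        history.append(val_loss)

        if val_loss < best_loss:
            # New best model: reset the counter and save a checkpoint.
            best_loss, bad_epochs = val_loss, 0
            torch.save(model.state_dict(), checkpoint_path)
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # no improvement for `patience` epochs in a row
    return history
```

The function returns the validation loss of every completed epoch, and the best weights can be restored later with `model.load_state_dict(torch.load(checkpoint_path))`.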

Train and model evaluation

Since we started with a pretrained model and our binary classification task is very simple, in no time you should have a model capable of classifying the images in the dataset very accurately.

Inference

You can now use the model to predict the label of new images!
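Here is a hedged sketch of single-image inference. It assumes that the model outputs one raw logit, that class 0 is cat and class 1 is dog, and that `transform` is the same preprocessing pipeline used for the validation images. The helper name `predict_image` and the 0.5 threshold are illustrative choices.

```python
import torch
from PIL import Image


def predict_image(model, image_path, transform, device="cpu",
                  class_names=("cat", "dog")):
    """Return the predicted label and the probability of class 1."""
    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0).to(device)  # add a batch dimension
    model.eval()
    with torch.no_grad():
        prob = torch.sigmoid(model(batch)).item()  # logit -> probability
    label = class_names[int(prob >= 0.5)]
    return label, prob
```

The second returned value is the probability of the class with index 1, so a value close to 0 means the model is confident that the image shows a cat.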

Inference on new images

Conclusion

We’ve successfully built an image classifier that tells cats from dogs. Having also developed the same classifier with TensorFlow in another article, I must say that I found TensorFlow quicker to use for this simple project. But the bright side of PyTorch, from my point of view, is the more granular control over the various steps, from data preprocessing to model training. Let me know what you think!

The End

Marcello Politi
