
PyTorch Ignite Tutorial: Classifying Tiny ImageNet with EfficientNet

A step-by-step guide on using PyTorch Ignite to simplify your PyTorch deep learning implementation


Photo by Olga Bast on Unsplash

PyTorch is a powerful deep learning framework that has been adopted by tech giants like Tesla, OpenAI, and Microsoft for key research and production workloads.

Its open-source nature means that PyTorch’s capabilities can be readily leveraged by the public as well.

A problem with deep learning implementation is that the code can quickly grow repetitive and overly lengthy. This has sparked the creation of high-level libraries to streamline PyTorch code, one of which is PyTorch Ignite.

This article provides a clearly explained walkthrough on how to use PyTorch Ignite to simplify the development of deep learning models in PyTorch.


Contents

(1) About PyTorch Ignite
(2) Step-by-Step Implementation
(3) Wrapping things up


About PyTorch Ignite

Image used under BSD 3-Clause License

PyTorch Ignite is a high-level library that helps with training and evaluating neural networks in PyTorch flexibly and transparently.

It reduces the amount of code needed to build deep learning models while maintaining simplicity and maximum control throughout.

PyTorch Ignite code (left) vs Pure PyTorch code (right) | Image used under BSD 3-Clause License

The above image illustrates the extent to which PyTorch Ignite compresses pure PyTorch code into something more concise.

Besides eliminating low-level code, PyTorch Ignite also comes with utility support for metrics evaluation, experiment management, and model debugging.


Step-by-Step Implementation

The demonstration task in this tutorial is to build an image classification deep learning model on the Tiny ImageNet dataset.

Tiny ImageNet is a subset of the ImageNet dataset in the famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The dataset contains 200 classes of images downsized to 64×64 colored images. Each class has 500 training images, 50 validation images, and 50 test images, giving 100,000 training images in total.

Sample images from Tiny ImageNet dataset | Image by author

Let’s get to the steps where we detail the use of PyTorch and Ignite to classify these images as accurately as possible.


Step 1 – Initial setup

We will use Google Colab since it offers free access to GPUs, which we can readily utilize. Feel free to follow along with this completed demo Colab notebook.

Make sure that you have set your Colab runtime to GPU. Once done, execute the following steps as part of the initial setup:

  1. Install and import the necessary Python libraries
  2. Define GPU support for PyTorch (i.e., use CUDA), as sketched below.
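
A minimal sketch of this setup (the exact package list depends on your notebook; pytorch-ignite and efficientnet_pytorch are assumed here because we rely on them in later steps):

# Install libraries not pre-installed on Colab (assumed package choices)
# !pip install pytorch-ignite efficientnet_pytorch

import torch

# Use the Colab GPU (CUDA) when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)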

Step 2 – Download Tiny ImageNet dataset

There are two ways to download the Tiny ImageNet dataset, namely:

  • Download directly from Kaggle with the opendatasets library
  • Use GNU wget package to download from the official Stanford site

For this project, I used wget to retrieve the raw dataset (in a zip file). Once downloaded, we can unzip the file and set the respective folder paths for the extracted images.
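
Here is a sketch of that download step (the Stanford URL and the folder names below follow the archive layout at the time of writing):

# Download and extract the zipped dataset from the official Stanford site
!wget -q http://cs231n.stanford.edu/tiny-imagenet-200.zip
!unzip -q tiny-imagenet-200.zip

# Folder paths for the extracted images
DATA_DIR = "tiny-imagenet-200"
TRAIN_DIR = DATA_DIR + "/train"
VALID_DIR = DATA_DIR + "/val"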

If done correctly, you should see the folders appear on the Colab sidebar:

Image by author

Step 3 – Set up helper functions

We define helper functions to make our lives easier later on. Two groups of functions are created:

  • Display a single image or a batch of sample images

This allows us to visualize a random subset of images that we are working on.

  • Create DataLoaders for the image datasets

The job of a DataLoader is to generate mini-batches of data from a dataset, giving us the flexibility to choose from different sampling strategies and batch sizes.
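
A minimal sketch of such a DataLoader helper (the function name and defaults here are our own choices):

import torch
from torchvision import datasets
from torch.utils.data import DataLoader

def generate_dataloader(data_dir, transform, batch_size=64, shuffle=True):
    # ImageFolder infers class labels from the sub-folder names
    dataset = datasets.ImageFolder(data_dir, transform=transform)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=2,
        pin_memory=torch.cuda.is_available(),
    )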

In the code above, we used the ImageFolder class from torchvision.datasets to generate datasets. For ImageFolder to work, images in the training and validation folders must be arranged in the following structure:

Expected folder structure for the image data: root/label/filename | Image by author

Step 4 – Organize validation data folder

You will notice that the training folder meets the structure needed for ImageFolder in Step 3, but the validation folder does not.

The images in the validation folder are all saved within a single folder, so we need to reorganize them into sub-folders based on their labels.

The validation folder contains a val_annotations.txt file, which comprises six tab-separated columns: filename, class label, and bounding box details (x, y coordinates, height, width).

Data within the val_annotations.txt file | Image by author

We extract the first two columns to save the pairs of filename and corresponding class labels in a dictionary.
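
A sketch of that extraction (the dictionary name is our own; VALID_DIR comes from the Step 2 sketch):

# Map each validation filename to its class label
val_img_dict = {}
with open(VALID_DIR + "/val_annotations.txt") as f:
    for line in f:
        fields = line.strip().split("\t")    # tab-separated columns
        val_img_dict[fields[0]] = fields[1]  # filename -> class label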

Corresponding labels for each validation image | Image by author

To find out what each class label means, you can read the words.txt file. For example:

Sample of label descriptors for corresponding class label codes | Image by author

After that, we carry out the folder path reorganization:
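
A sketch of that reorganization, using Python's os module (folder names assume the archive layout from Step 2):

import os

# Move each validation image into a sub-folder named after its label
img_dir = VALID_DIR + "/images"
for filename, label in val_img_dict.items():
    label_dir = os.path.join(img_dir, label)
    os.makedirs(label_dir, exist_ok=True)
    src = os.path.join(img_dir, filename)
    if os.path.exists(src):
        os.rename(src, os.path.join(label_dir, filename))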


Step 5 – Define image pre-processing transformations

All pre-trained Torchvision models expect input images to be normalized in the same way (as part of pre-processing requirements).

These models require input images in 3-channel RGB format of shape (3 × H × W), where H (height) and W (width) are at least 224 pixels.

The pixel values then need to be normalized according to mean values of (0.485, 0.456, 0.406) and standard deviation values of (0.229, 0.224, 0.225).

On top of that, we can introduce various transformations (e.g., center crops, random flips, etc.) to augment the image dataset and improve model performance.

We place these transformations in a Torchvision Compose wrapper to link them all together.
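
One possible pipeline meeting the requirements above (the specific augmentation choices here are illustrative):

from torchvision import transforms as T

preprocess_transform = T.Compose([
    T.Resize(256),             # upscale the 64x64 images
    T.CenterCrop(224),         # meet the 224-pixel minimum
    T.RandomHorizontalFlip(),  # simple augmentation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])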


Step 6 – Create DataLoaders

We described the concept of DataLoaders in Step 3 and created a helper function for setting up DataLoaders. It is time to put the function to good use by creating DataLoaders for both the training and validation sets.

We specify the transformation steps from Step 5 and define a batch size of 64, meaning the DataLoader will push out 64 images each time it is called.
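
Putting the Step 3 helper to work might look like this (variable names follow the earlier sketches):

batch_size = 64

train_loader = generate_dataloader(TRAIN_DIR, preprocess_transform,
                                   batch_size=batch_size, shuffle=True)
val_loader = generate_dataloader(VALID_DIR + "/images", preprocess_transform,
                                 batch_size=batch_size, shuffle=False)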


Step 7 – Define model architecture

The Torchvision models subpackage torchvision.models comprises numerous pre-trained models for us to use. This includes popular architectures such as ResNet-18, VGG16, GoogLeNet, and ResNeXt-50.

We will do something different for this project by selecting a pre-trained model that is not within the default list of Torchvision models. In particular, we will be using EfficientNet.

EfficientNet is a convolutional neural network architecture and scaling method developed by Google in 2019. It has surpassed state-of-the-art accuracy with up to 10 times better efficiency (i.e., smaller and faster).

The graphs below illustrate how EfficientNet (red line) outperforms other architectures in accuracy (on ImageNet) and computing resources.

Image used under Apache 2.0 License

The different versions of EfficientNet (B0 to B7) differ based on the number of model parameters. A higher number of parameters leads to greater accuracy but at the expense of longer training time.

We will use the PyTorch implementation of EfficientNet to set up an EfficientNet-B3 architecture for this tutorial. I chose B3 because it provides a nice balance between accuracy and training time.
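
A sketch of the model setup, assuming the efficientnet_pytorch package (installed in Step 1):

import torch.nn as nn
from efficientnet_pytorch import EfficientNet

# Load ImageNet pre-trained weights for EfficientNet-B3
model = EfficientNet.from_pretrained("efficientnet-b3")

# Swap the final fully-connected layer for Tiny ImageNet's 200 classes
model._fc = nn.Linear(model._fc.in_features, 200)
model = model.to(device)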

The PyTorch implementation of the newer EfficientNet v2 is coming soon, so stay tuned to this GitHub repo for the latest updates.


Step 8 – Define loss function, hyperparameters, and optimizer

The most suitable loss function for the image classification task is categorical cross-entropy loss.

We will use a set of baseline values for model parameters such as learning rate, number of epochs, logging frequency, and type of optimizer.
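
As a sketch, the baseline values and optimizer choice below are illustrative (Adam is one reasonable default, not necessarily the only option):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # categorical cross-entropy

lr = 1e-3           # learning rate
epochs = 3          # number of epochs
log_interval = 100  # logging frequency (iterations)

optimizer = optim.Adam(model.parameters(), lr=lr)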


Step 9 – Instantiate trainer engine

The main essence of the Ignite framework is the Engine class, which executes processing functions over input data and returns an output.

When we create a trainer engine, we are initializing a class that will be repeatedly called upon to train the model on batches of data generated by the DataLoaders.

PyTorch Ignite comes with in-built helper functions to create trainer engines with just single lines of code. For our use case of supervised image classification, we utilize the create_supervised_trainer function.

These engines also allow us to attach useful event handlers, such as a progress bar to monitor training.
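
A minimal sketch of both steps (model, optimizer, and criterion come from the earlier sketches):

from ignite.engine import create_supervised_trainer
from ignite.contrib.handlers import ProgressBar

trainer = create_supervised_trainer(model, optimizer, criterion, device=device)

# Attach a progress bar event handler to monitor training
ProgressBar(persist=True).attach(trainer)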


Step 10 – Define evaluation metrics

The metrics for image classification model evaluation are accuracy (for us to interpret the model’s performance) and cross-entropy loss (for the model to improve iteratively).
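
In Ignite, these two metrics can be collected in a dictionary for the evaluator engines created later (the key names below are our own):

from ignite.metrics import Accuracy, Loss

metrics = {
    "accuracy": Accuracy(),
    "loss": Loss(criterion),
}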

You can also create your own custom metric before attaching it to an engine. For example, the F1 score can be derived arithmetically from the default Precision and Recall metrics:

from ignite.metrics import Precision, Recall

precision = Precision(average=False)
recall = Recall(average=False)

# Combine the component metrics arithmetically into a macro-averaged F1 score
F1 = (precision * recall * 2 / (precision + recall)).mean()

# Attach to an evaluator engine (e.g., the validation evaluator from Step 11)
F1.attach(engine, "F1")

Step 11 – Instantiate evaluator engines

After defining evaluation metrics, we can initialize evaluator engines to evaluate model performance. The evaluator engine will take the model and evaluation metrics (from Step 10) as arguments.

We define an evaluator engine for the training set and a separate evaluator engine for the validation set. This is because they have different roles in the whole model training process.

The validation evaluator will be used to save the best model based on validation metrics, while the training evaluator will only be logging metrics from the training set.
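
A sketch of both evaluators, using Ignite's create_supervised_evaluator helper and the metrics dictionary from Step 10:

from ignite.engine import create_supervised_evaluator

# One evaluator for the training set, one for the validation set
train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)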


Step 12 – Create event handlers

To improve the Engine's flexibility, an event system is introduced to facilitate interactions at each step of the training run, for events such as:

  • Engine started/completed
  • Epoch started/completed
  • Batch iteration started/completed

With the help of decorators, we can create custom code known as event handlers. Event handlers are functions that are executed when specific events occur. For example, we can log the metrics upon completion of each iteration (Events.ITERATION_COMPLETED) and epoch (Events.EPOCH_COMPLETED).

Furthermore, we want a checkpoint handler that saves our best models (as .pt files) based on validation accuracy. This is done easily with the helper method save_best_model_by_val_score from the common module.
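
A sketch of an epoch-level logging handler and the checkpoint helper (output path and n_saved are our own choices):

from ignite.engine import Events
from ignite.contrib.engines import common

@trainer.on(Events.EPOCH_COMPLETED)
def log_epoch_results(engine):
    # Re-run both evaluators at the end of every epoch
    train_evaluator.run(train_loader)
    evaluator.run(val_loader)
    m = evaluator.state.metrics
    print(f"Epoch {engine.state.epoch}: "
          f"val accuracy = {m['accuracy']:.3f}, val loss = {m['loss']:.3f}")

# Save the best models (.pt files) based on validation accuracy
common.save_best_model_by_val_score(
    output_path="best_models",
    evaluator=evaluator,
    model=model,
    metric_name="accuracy",
    n_saved=2,
    trainer=trainer,
)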

In the functions accompanying each event, you will notice that we use variables and engines that we have already built in the earlier steps.


Step 13 – Set up Tensorboard

Tensorboard is a useful toolkit to track and visualize metrics (such as loss and accuracy) as part of machine learning experimentation.

PyTorch is integrated with Tensorboard, so we can start by creating a Tensorboard logger handler and specifying the directory to store the logs.

With the Tensorboard logger initialized, we can attach output handlers to specify the events and corresponding metrics to save for visualization later on.
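
A sketch of the logger setup and one output handler (the log directory name is our own):

from ignite.engine import Events
from ignite.contrib.handlers import TensorboardLogger

# Store the logs in a dedicated directory
tb_logger = TensorboardLogger(log_dir="logs/tb_logs")

# Save validation metrics for visualization at the end of each epoch
tb_logger.attach_output_handler(
    evaluator,
    event_name=Events.EPOCH_COMPLETED,
    tag="validation",
    metric_names=["accuracy", "loss"],
    global_step_transform=lambda engine, event: trainer.state.epoch,
)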

Although we are using Tensorboard here, we can easily use other popular logging tools such as Weights and Biases, ClearML, and MLflow. Have a look at the common module documentation for more information.


Step 14 – Commence model training

We have finally reached the stage where we can start the actual model training. We do this by getting the trainer engine to run on the training set DataLoader.
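
With all the earlier pieces in place, this comes down to a single call:

trainer.run(train_loader, max_epochs=epochs)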

Here is what the first training epoch looks like in Colab:

Image by author

With just one epoch, EfficientNet-B3 has already achieved an impressive validation accuracy of 61.1%.

Once training is complete, we can run the following code to get the final evaluation metrics on the validation set:

print(evaluator.state.metrics)

The final accuracy score obtained after three epochs was 66.57%.


Step 15 – View Tensorboard in Colab

We call a set of magic commands to load the Tensorboard within the Colab notebook.
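
The magic commands look like this (the log directory matches the one assumed in the Step 13 sketch):

%load_ext tensorboard
%tensorboard --logdir logs/tb_logs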

After executing the above commands, the following Tensorboard interface will load in the Colab notebook. This visual dashboard provides us with metrics information obtained from the training runs.

Screenshot of Tensorboard | Image by author

Wrapping things up

In this tutorial, we covered the steps to leverage the flexibility and simplicity of the Ignite framework to build PyTorch deep learning models.

PyTorch Ignite has many other functionalities to suit the needs of more complex neural network designs, so feel free to explore the documentation and example notebooks.

For example, instead of the static learning rate used earlier, we can incorporate a learning rate scheduler (LRScheduler) handler to adjust the learning rate values during training. The flexibility also means that we can include other algorithms like FastAI’s learning rate finder in the setup.

Project Links

Photo by Aziz Acharki on Unsplash

Before You Go

I welcome you to join me on a Data Science learning journey. Follow this Medium page and check out my GitHub to stay in the loop of practical and educational data science content. Meanwhile, here’s wishing you the best of luck in your learning journey!



