
This practical tutorial shows you how to classify images using a pre-trained Deep Learning model with the PyTorch framework.
What sets this beginner-friendly image classification tutorial apart from others is that we are not building and training a deep neural network from scratch. In practice, only a few people train neural networks from scratch. Instead, most Deep Learning practitioners use a pre-trained model and fine-tune it to a new task.
The specific problem setting is to build a binary image classification model to classify images of cheetahs and lions based on a small dataset. For this purpose, we will fine-tune a pre-trained image classification model using PyTorch.
![Sample images from the dataset [1].](https://towardsdatascience.com/wp-content/uploads/2023/05/1kwMjQhwfHHH12m6CjHTdOg.jpeg)
This tutorial follows a basic Machine Learning workflow:
- Step 1: Prepare and explore the data
- Step 2: Build a baseline
- Step 3: Run experiments
- Step 4: Make predictions (inference)
You can follow along in my related Kaggle Notebook.
Prerequisites and Setup
Ideally, you should have some familiarity with Python.
As this is a practical tutorial, we will only cover how to build an image classification model at a high level. We will not cover a lot of theory, such as how convolutional layers or backpropagation work. Sections where you can dig deeper once you feel comfortable with the topic are marked with this sign: ⚒️
If you want to supplement this guide with some theoretical background information, I recommend the free Kaggle Learn courses on Deep Learning and Computer Vision.
Let’s begin by importing PyTorch and other relevant libraries:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np # data processing
import matplotlib.pyplot as plt # Data visualization
from tqdm import tqdm # Progress bar
The essential libraries are PyTorch (version 1.13.0) for deep learning, OpenCV (version 4.5.4) for image processing, and Albumentations (version 1.3.0) for data augmentation.
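If you want to verify that your environment roughly matches, you can print the installed versions (the numbers in the comments are the ones this tutorial was written with):

import torch, cv2, albumentations

print(torch.__version__)           # 1.13.0
print(cv2.__version__)             # 4.5.4
print(albumentations.__version__)  # 1.3.0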
Step 1: Prepare and Explore Data
The first step is to become familiar with the data. For this tutorial, we will keep the exploratory data analysis step short.
First, we will load the data. The example dataset [1] has two folders with images – one folder for each class.
![Example dataset [1] for binary image classification.](https://towardsdatascience.com/wp-content/uploads/2023/05/1Bn6EQlyX1ZuZDQ8mqT1rtA.png)
The following code goes through all subfolders and creates a Pandas dataframe containing the file name and its label.
import os
import pandas as pd

root_dir = ... # Insert your data here
sub_folders = ["Cheetahs", "Lions"] # Insert your classes here
labels = [0, 1]

data = []

for s, l in zip(sub_folders, labels):
    for r, d, f in os.walk(root_dir + s):
        for file in f:
            if ".jpg" in file:
                data.append((os.path.join(s, file), l))

df = pd.DataFrame(data, columns=['file_name', 'label'])
Insert your data here! – To follow along in this article, your dataset should look something like this:
![Example dataset [1] for binary image classification. Insert your data here.](https://towardsdatascience.com/wp-content/uploads/2023/05/1KI38TlMMuhVL_6-dm1QUBA.png)
We have about 170 photographs: roughly 85 lions and 85 cheetahs (see remark in [1]). This is a very small but balanced dataset. It’s perfect for fine-tuning!
import seaborn as sns
sns.countplot(data = df, x = 'label');

To get a feeling for the dataset, it is always a good idea to plot a few samples:
fig, ax = plt.subplots(2, 3, figsize=(10, 6))

idx = 0

for i in range(2):
    for j in range(3):
        label = df.label[idx]
        file_path = os.path.join(root_dir, df.file_name[idx])

        # Read an image with OpenCV
        image = cv2.imread(file_path)

        # Convert the image to RGB color space
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Resize image
        image = cv2.resize(image, (256, 256))

        ax[i, j].imshow(image)
        ax[i, j].set_title(f"Label: {label} ({'Lion' if label == 1 else 'Cheetah'})")
        ax[i, j].axis('off')
        idx = idx + 1

plt.tight_layout()
plt.show()
![Sample images from the dataset [1].](https://towardsdatascience.com/wp-content/uploads/2023/05/1kwMjQhwfHHH12m6CjHTdOg.jpeg)
By exploring a dataset like this, you can gain some insights. For example, as you can see here, the images are not limited to live animals but also include statues.
Before we go any further, let’s split the dataset into training and testing data. The training data will be used to build our model, and the test data will be a hold-out dataset to evaluate the final model’s performance on unseen data. In this example, we will set 10% of the data aside for testing.
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df,
                                     test_size = 0.1,
                                     random_state = 42)

Step 2: Build a Baseline
Next, we will build a baseline. A baseline consists of three key components:
- A data pipeline for loading images
- A model with loss function and optimizer
- A training pipeline, including a cross-validation strategy
In this section, we will go through each component and finally wrap it up nicely.
Because training a Deep Learning model includes a lot of experimentation, we want to be able to switch out specific parts of the code quickly. Thus, we will try to make the following code as modular as possible and work with a configuration for tuning:
from types import SimpleNamespace
cfg = SimpleNamespace(**{})
We will add the configurable parameters as we go along.
Build a data pipeline for loading images
First, you must build a pipeline to load, preprocess and feed your images to the neural network in batches (instead of all at once). PyTorch provides two core classes you can use for this purpose:
- `Dataset` class: loads and preprocesses the dataset. You will need to customize this class for your purpose.
- `DataLoader` class: loads batches of data samples to the neural network.

First, you need to customize the `Dataset` class. Its key components are:
- Constructor: to load the dataset as, e.g., a Pandas DataFrame
- `__len__()`: to get the length of the dataset. This usually requires only minimal adjustments related to how you pass in the dataset.
- `__getitem__()`: to get a sample from the dataset by index. This is usually the part where you modify most of the code, depending on any preprocessing you want to do.

Below you can find a template to customize the `Dataset` class.
class CustomDataset(Dataset):
    def __init__(self, df):
        # Initialize anything you need later here ...
        self.df = df
        self.X = ...
        self.y = ...
        # ...

    # Get the number of rows in the dataset
    def __len__(self):
        return len(self.df)

    # Get a sample of the dataset
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]
When loading your dataset, you can also perform any required preprocessing, such as transforms or image standardization. This happens in `__getitem__()`.

In this example, we first load the image from the root directory (`cfg.root_dir`) with OpenCV and convert it to the RGB color space. Then we apply basic transforms: resize the image (`cfg.image_size`) and convert it from a NumPy array to a tensor. Finally, we normalize the values of the image to the [0, 1] range by dividing them by 255.
cfg.root_dir = ... # Insert your data here
cfg.image_size = 256
class CustomDataset(Dataset):
    def __init__(self,
                 cfg,
                 df,
                 transform=None,
                 mode = "val"):
        self.root_dir = cfg.root_dir
        self.df = df
        self.file_names = df['file_name'].values
        self.labels = df['label'].values

        if transform:
            self.transform = transform
        else:
            self.transform = A.Compose([
                A.Resize(cfg.image_size, cfg.image_size),
                ToTensorV2(),
            ])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Get file_path and label for index
        label = self.labels[idx]
        file_path = os.path.join(self.root_dir, self.file_names[idx])

        # Read an image with OpenCV
        image = cv2.imread(file_path)

        # Convert the image to RGB color space
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Apply augmentations
        augmented = self.transform(image=image)
        image = augmented['image']

        # Normalize because ToTensorV2() doesn't normalize the image
        image = image / 255

        return image, label
Next, we need a `DataLoader` to feed the samples from the `Dataset` to the neural network in batches because we (probably) don't have enough RAM to feed all the images to the model at once.

You need to provide the `DataLoader` with the instance of the `Dataset` you want to iterate over, the batch size (`cfg.batch_size`), and the information on whether to shuffle the data.
cfg.batch_size = 32

example_dataset = CustomDataset(cfg, df)

example_dataloader = DataLoader(example_dataset,
                                batch_size = cfg.batch_size,
                                shuffle = True,
                                num_workers = 0,
                               )
The batch size should be fixed throughout the training and not be tuned [2]. Because training speed depends on the batch size, we want to use the largest batch size possible. Start with a batch size of 32 and then increase it in powers of two (64, 128, etc.) until you get a memory error; then use the last batch size that worked.
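If you want to automate this trial-and-error, here is a rough sketch (not part of the original pipeline) that probes batch sizes with a forward pass. It assumes a model is already on the GPU – we only create ours in the next section – and keep in mind that the backward pass needs additional memory, so leave some headroom:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
largest_working = 32

for candidate in [32, 64, 128, 256, 512]:
    try:
        probe_loader = DataLoader(example_dataset, batch_size=candidate, shuffle=False)
        images, _ = next(iter(probe_loader))
        with torch.no_grad():
            _ = model(images.to(device))  # assumes `model` already exists on `device`
        largest_working = candidate
    except RuntimeError:  # typically "CUDA out of memory"
        break

cfg.batch_size = largest_working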
When you iterate over the `DataLoader`, it will give you batches of samples from the customized `Dataset`. Let's retrieve the first batch for a sanity check:
for (image_batch, label_batch) in example_dataloader:
    print(image_batch.shape)
    print(label_batch.shape)
    break

torch.Size([32, 3, 256, 256])
torch.Size([32])
The `DataLoader` returns the image batch and a label batch. The `image_batch` is a tensor of the shape `(32, 3, 256, 256)`. This is a batch of 32 (`batch_size`) images with the shape `(3, 256, 256)` (`color_channels, image_height, image_width`). The `label_batch` is a tensor of the shape `(32)`. These are the corresponding labels of the 32 images.

This section explained how to build a data pipeline. In a later section (see Setup a training pipeline), we will use the `Dataset` and `DataLoader` to create separate pipelines for training, validation, and testing.
Before we train the model, we need to split the training data again into a training and a validation dataset. Training the model on a dataset and then evaluating the model on the same data is a methodological mistake because the model just needs to memorize the labels of the seen samples. Thus, instead of generalizing, the model will overfit to the training data.
To avoid overfitting, let's randomly partition the training data into training and validation sets with the `train_test_split()` function for now. This section will later be replaced with a cross-validation strategy.
X = df
y = df.label

train_df, valid_df, y_train, y_valid = train_test_split(X,
                                                        y,
                                                        test_size = 0.2,
                                                        random_state = 42)

With this split, we can now create `Dataset` and `DataLoader` objects for the training and validation data:
train_dataset = CustomDataset(cfg, train_df)
valid_dataset = CustomDataset(cfg, valid_df)

train_dataloader = DataLoader(train_dataset,
                              batch_size = cfg.batch_size,
                              shuffle = True)

valid_dataloader = DataLoader(valid_dataset,
                              batch_size = cfg.batch_size,
                              shuffle = False)
Prepare the model
This is the part where we would learn about building a neural network in PyTorch. When I started learning about Deep Learning, I thought building neural networks was an important part of training Deep Learning models. But the reality is that this is what researchers do for us. We, the practitioners, get to lean back and use the final models for our purposes.
The researchers try different model architectures, such as convolutional neural networks (CNNs), and usually train image classification models on large baseline datasets, such as ImageNet [3]. We call these models backbones.

Fine-tuning a pre-trained neural network works so well because the first few layers often learn general features (such as edge detection).
⚒️ Of course, you should understand how neural networks work in general, including backpropagation, and how different layers, such as convolutional layers, work. However, to follow along in this practical tutorial, you don’t need to understand these details right now. Once you have finished this tutorial, you can fill in some theoretical gaps with the free Kaggle Learn courses on Deep Learning and Computer Vision.
Fantastic backbones and where to find them – Now, which of these pre-trained models should you choose, and where do you get these from?
In this tutorial, we will use [timm](https://timm.fast.ai/) – a Deep Learning library containing a collection of state-of-the-art computer vision models created by Ross Wightman – to get pre-trained models. (You could also use `torchvision.models` for pre-trained models, but I personally find it easier to switch out backbones during experimentation with `timm`.)
import timm

cfg.n_classes = 2
cfg.backbone = 'resnet18'

model = timm.create_model(cfg.backbone,
                          pretrained = True,
                          num_classes = cfg.n_classes)
There is a lot to unpack in this little piece of code. Let’s go step-by-step:
`backbone = 'resnet18'` – In this example, we use a ResNet [5] with 18 layers. ResNet stands for Residual Network, and it is a type of CNN using so-called residual blocks.

⚒️ We will skip over the details of ResNet and residual blocks. If you are interested in the technical details, you can dig deeper into this post, for example.

There are many different models in the ResNet family, such as ResNet18, ResNet34, etc., where the number stands for how many layers the network has. As a (very rough) rule of thumb: the higher the number of layers, the better the performance. You can print `timm.list_models('*resnet*')` to see what other models are available.
⚒️ Learn about different backbones for computer vision/image classification like ResNet, DenseNet, and EfficientNet.
`pretrained = True` – This means we want the weights of the model trained on ImageNet [3]. If this is set to `False`, you will only get the model's architecture without the weights [6].

`num_classes = cfg.n_classes` – Because the model was pre-trained on ImageNet [3], you get a classifier with the 1000 classes of ImageNet. Thus, you need to remove the ImageNet classifier and define how many classes you have in your problem [6]. If you set `num_classes = 0`, you will get the model without a classifier [6].
To check the output size, you can pass in a sample batch `X` with 3 channels of random values in the image size [6]:

X = torch.randn(cfg.batch_size, 3, cfg.image_size, cfg.image_size)
y = model(X)

It will output `torch.Size([32, 2])`, i.e., (`cfg.batch_size`, `cfg.n_classes`) [6].
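Because the early layers of the backbone already encode general features, you can optionally freeze the pre-trained weights and train only the new classifier head. Below is a minimal sketch; it is not used in the rest of this tutorial, and if you go this route, you should pass only the trainable parameters to the optimizer later:

# Optional: freeze the pre-trained backbone and train only the classifier head
for param in model.parameters():
    param.requires_grad = False

for param in model.get_classifier().parameters():  # timm models expose get_classifier()
    param.requires_grad = True

trainable_params = [p for p in model.parameters() if p.requires_grad]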

Prepare loss function and optimizer
Next, to train a model, you need two key ingredients plus an optional third:
- a loss function (criterion),
- an optimization algorithm (optimizer), and
- optionally, a learning rate scheduler.
Loss function – Common loss functions are the following:
- Binary cross-entropy (BCE) loss for binary classification.
- Categorical cross-entropy loss for multi-class classification.
- Mean squared error (MSE) loss for regression.

Although this is a binary classification problem, we will use categorical cross-entropy loss here (it works for two classes as well). If you like, you can switch out the loss function for BCE.
criterion = nn.CrossEntropyLoss()
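If you prefer the BCE route, here is a minimal sketch (not used in the remainder of this tutorial): the model then needs a single output neuron and float targets, and `nn.BCEWithLogitsLoss()` combines the sigmoid with the BCE loss in a numerically stable way.

# Sketch of the BCE alternative: single output neuron plus float targets
bce_model = timm.create_model(cfg.backbone, pretrained=True, num_classes=1)
bce_criterion = nn.BCEWithLogitsLoss()

logits = bce_model(torch.randn(4, 3, cfg.image_size, cfg.image_size)).squeeze(1)
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])
bce_loss = bce_criterion(logits, targets)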
Optimizer – The optimization algorithm minimizes the loss function (in our case, the cross-entropy loss). There are many different optimizers available. Let’s use a popular one: Adam.
cfg.learning_rate = 1e-4
optimizer = torch.optim.Adam(
    model.parameters(),
    lr = cfg.learning_rate,
    weight_decay = 0,
)
Learning rate scheduler – A learning rate scheduler adapts the value of the learning rate during the training process. Although you don’t have to use a learning rate scheduler, using one can help the algorithm converge faster. This is because if the learning rate stays constant, it can prevent you from finding the optimum if it is too large, and it can take too long to converge if it is too small.
There are many different learning rate schedulers available, but Kaggle Grandmasters recommend using cosine decay as a learning rate scheduler for fine-tuning [2].
cfg.lr_min = 1e-5
cfg.epochs = 5
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max = np.ceil(len(train_dataloader.dataset) / cfg.batch_size) * cfg.epochs,
    eta_min = cfg.lr_min,
)

`T_max` defines the half period of the cosine and should be equal to the maximum number of iterations (`np.ceil(len(train_dataloader.dataset) / cfg.batch_size) * cfg.epochs`), because we will step the scheduler once per batch rather than once per epoch.
The resulting learning rates will look as follows over the course of a training run:
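You can reproduce such a plot yourself with a throwaway optimizer/scheduler pair (a sketch using the same settings as above), so that the state of the real optimizer is left untouched:

steps_per_epoch = int(np.ceil(len(train_dataloader.dataset) / cfg.batch_size))

dummy_opt = torch.optim.Adam([torch.zeros(1, requires_grad=True)], lr=cfg.learning_rate)
dummy_sched = torch.optim.lr_scheduler.CosineAnnealingLR(dummy_opt,
                                                         T_max=steps_per_epoch * cfg.epochs,
                                                         eta_min=cfg.lr_min)

lrs = []
for _ in range(steps_per_epoch * cfg.epochs):
    lrs.append(dummy_sched.get_last_lr()[0])
    dummy_opt.step()
    dummy_sched.step()

plt.plot(lrs)
plt.xlabel("Iteration")
plt.ylabel("Learning rate")
plt.show()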

Metric – While we’re at it, let’s also define a metric to evaluate the model’s overall performance. Again, there are many different metrics. For this example, we will use accuracy as the metric:
from sklearn.metrics import accuracy_score
def calculate_metric(y, y_pred):
    metric = accuracy_score(y, y_pred)
    return metric
Don't confuse the metric with the loss function. The loss function is used to optimize the model during training, while the metric measures the model's performance after training.
⚒️ Learn about different metrics and which ones are suited for which problems.
Setup a training pipeline
This is probably the most complex but also the most interesting part of this tutorial. Are you ready?

A model is typically trained in iterations. One full pass over the training data is called an epoch. Training from scratch usually requires many epochs, while fine-tuning requires only a few (roughly 5 to 10) epochs.

In each epoch, the model is trained on the full training data and then validated on the full validation data. We will now define two functions: one to train the model for an epoch (`train_one_epoch()`) and one to validate it for an epoch (`validate_one_epoch()`).
Below you can see the training function:
cfg.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
def train_one_epoch(dataloader, model, optimizer, scheduler, cfg):
    # Training mode
    model.train()

    # Init lists to store y and y_pred
    final_y = []
    final_y_pred = []
    final_loss = []

    # Iterate over data
    for step, batch in tqdm(enumerate(dataloader), total=len(dataloader)):
        X = batch[0].to(cfg.device)
        y = batch[1].to(cfg.device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        with torch.set_grad_enabled(True):
            # Forward: Get model outputs
            y_pred = model(X)

            # Forward: Calculate loss
            loss = criterion(y_pred, y)

            # Convert y and y_pred to lists
            y = y.detach().cpu().numpy().tolist()
            y_pred = y_pred.detach().cpu().numpy().tolist()

            # Extend original list
            final_y.extend(y)
            final_y_pred.extend(y_pred)
            final_loss.append(loss.item())

            # Backward: Optimize
            loss.backward()
            optimizer.step()
            scheduler.step()

    # Calculate statistics
    loss = np.mean(final_loss)
    final_y_pred = np.argmax(final_y_pred, axis=1)
    metric = calculate_metric(final_y, final_y_pred)

    return metric, loss
Let’s go through it step-by-step:
- Set the model to training mode. The model can also be in evaluation mode. This mode affects the behavior of the layers [Dropout](https://pytorch.org/docs/stable/_modules/torch/nn/modules/dropout.html) and [BatchNorm](https://pytorch.org/docs/stable/_modules/torch/nn/modules/batchnorm.html) in a model.
- Iterate over the training data in small batches. The samples and labels need to be moved to the GPU if you use one for faster training (`cfg.device`).
- Clear the last error gradient of the optimizer.
- Do a forward pass of the input through the model.
- Calculate the loss for the model output.
- Backpropagate the error through the model.
- Update the model to reduce the loss.
- Step the learning rate scheduler.
- Calculate the loss and metric for statistics. Because the predictions will be tensors on the GPU, just like the inputs, we need to detach them from the automatic differentiation graph and convert them to NumPy arrays.
Next, we define the validation function as shown below:
def validate_one_epoch(dataloader, model, cfg):
    # Validation mode
    model.eval()

    final_y = []
    final_y_pred = []
    final_loss = []

    # Iterate over data
    for step, batch in tqdm(enumerate(dataloader), total=len(dataloader)):
        X = batch[0].to(cfg.device)
        y = batch[1].to(cfg.device)

        with torch.no_grad():
            # Forward: Get model outputs
            y_pred = model(X)

            # Forward: Calculate loss
            loss = criterion(y_pred, y)

            # Convert y and y_pred to lists
            y = y.detach().cpu().numpy().tolist()
            y_pred = y_pred.detach().cpu().numpy().tolist()

            # Extend original list
            final_y.extend(y)
            final_y_pred.extend(y_pred)
            final_loss.append(loss.item())

    # Calculate statistics
    loss = np.mean(final_loss)
    final_y_pred = np.argmax(final_y_pred, axis=1)
    metric = calculate_metric(final_y, final_y_pred)

    return metric, loss
Let’s go through it step-by-step again:
- Set the model to the evaluation mode.
- Iterate over the validation data in small batches. The samples and labels need to be moved to GPU if you use one for faster training.
- Do a forward pass of the input through the model.
- Calculate the loss and metric for statistics.
At first glance, training and validating an epoch look similar. Let's look at a code comparison to make the differences clearer:

You can see the following differences:
- The model has to be in training or evaluation mode.
- For training the model, we need an optimizer and an optional scheduler. For validation, we only need the model.
- The gradient calculation is only active for training. For validation, we don’t need it.
Cross-validation strategy
Now, we are not yet done with the training pipeline. Earlier, we divided the training data into training and validation data. But partitioning the available data into two fixed sets limits the number of training samples.
Instead, we will use a cross-validation strategy by splitting the training data into k folds. The model is then trained in k separate iterations, in which the model is trained on k-1 folds and validated on one fold for each iteration while the folds switch at every iteration as shown below:

In this example, we are using [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) to create the splits. You could use `KFold` instead, but `StratifiedKFold` has the advantage that it preserves the class distribution across folds.
from sklearn.model_selection import StratifiedKFold

cfg.n_folds = 5

# Create a new column for cross-validation folds
df["kfold"] = -1

# Initialize the kfold class
skf = StratifiedKFold(n_splits=cfg.n_folds)

# Fill the new column
for fold, (train_, val_) in enumerate(skf.split(X = df, y = df.label)):
    df.loc[val_, "kfold"] = fold

for fold in range(cfg.n_folds):
    train_df = df[df.kfold != fold].reset_index(drop=True)
    valid_df = df[df.kfold == fold].reset_index(drop=True)
Adding data augmentation
When the difference between the training and validation metric is significant, this indicates that the model is overfitting to the training data. Overfitting occurs when a model is trained on only a few examples and learns irrelevant details or noise from the training data. This negatively affects the model’s performance when it’s presented with new examples. As a result, the model struggles to generalize on new images.
To overcome overfitting during the training process, you can use data augmentation. Data augmentation generates additional training data by randomly transforming existing images. This technique exposes the model to more aspects of the data, helping it to generalize better.
We can use some prepared data augmentations from the `albumentations` package, such as:
- Rotating images (`A.Rotate()`)
- Horizontal flipping (`A.HorizontalFlip()`)
- Cutout [4] (`A.CoarseDropout()`)
Earlier, we defined a basic transform to resize the image and convert it to a tensor. We will continue to use it for the validation and testing datasets because they don't need any augmentations. For the training dataset, we create a new transform `transform_soft`, which applies the three augmentations above in addition to the resizing and conversion to a tensor.
transform_soft = A.Compose([A.Resize(cfg.image_size, cfg.image_size),
                            A.Rotate(p=0.6, limit=[-45, 45]),
                            A.HorizontalFlip(p = 0.6),
                            A.CoarseDropout(max_holes = 1, max_height = 64, max_width = 64, p = 0.3),
                            ToTensorV2()])
You can control the percentage of images the augmentations are applied to with the parameter `p`.
If we visualize a few samples from the augmented dataset, we can see that the three augmentations are applied successfully:
- Rotation in images 0, 1, 2, 4
- Horizontal flip is difficult to detect if you don’t know the original image, but we can see that image 2 must be horizontally flipped
- Cutout (coarse dropout) in images 1 and 4
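Here is a rough sketch (not from the original notebook) of how you could plot such augmented samples yourself – the `Dataset` returns tensors in (channels, height, width) order, so we permute them for matplotlib:

augmented_dataset = CustomDataset(cfg, train_df, transform = transform_soft)

fig, ax = plt.subplots(1, 5, figsize=(15, 3))
for i in range(5):
    image, label = augmented_dataset[i]
    ax[i].imshow(image.permute(1, 2, 0))  # (C, H, W) -> (H, W, C)
    ax[i].set_title(f"Image {i} (label {label})")
    ax[i].axis('off')
plt.tight_layout()
plt.show()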

⚒️ Next, you can review and add other image augmentation techniques, e.g., Mixup and Cutmix, to your pipeline.
Cutout, Mixup, and Cutmix: Implementing Modern Image Augmentations in PyTorch
Putting it all together
Now that we have discussed each component of the baseline from the data pipeline to the model with loss function and optimizer, to the training pipeline, including a cross-validation strategy, we can put it all together as shown in the image below:

We will iterate over each fold of our cross-validation strategy. Within each fold, we set up a data pipeline for the training and validation data and a model with loss function and optimizer. Then for each epoch, we will train and validate the model.
Before we touch anything, let’s set ourselves up for success and fix the random seeds to ensure reproducible results.
import random
def set_seed(seed=1234):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)

    # In general seed PyTorch operations
    torch.manual_seed(seed)

    # If you are using CUDA on 1 GPU, seed it
    torch.cuda.manual_seed(seed)

    # If you are using CUDA on more than 1 GPU, seed them all
    torch.cuda.manual_seed_all(seed)

    # Certain operations in Cudnn are not deterministic, and this line will force them to behave!
    torch.backends.cudnn.deterministic = True

    # Disable the inbuilt cudnn auto-tuner that finds the best algorithm to use for your hardware
    torch.backends.cudnn.benchmark = False
Next, we will write a `fit()` function that fits the model for all epochs. The function iterates over the number of epochs, while the training and validation functions contain inner loops that iterate over the batches in the training and validation datasets, as discussed in the section about the training pipeline.
cfg.seed = 42
def fit(model, optimizer, scheduler, cfg, train_dataloader, valid_dataloader=None):
    acc_list = []
    loss_list = []
    val_acc_list = []
    val_loss_list = []

    for epoch in range(cfg.epochs):
        print(f"Epoch {epoch + 1}/{cfg.epochs}")
        set_seed(cfg.seed + epoch)

        acc, loss = train_one_epoch(train_dataloader, model, optimizer, scheduler, cfg)

        if valid_dataloader:
            val_acc, val_loss = validate_one_epoch(valid_dataloader, model, cfg)

        print(f'Loss: {loss:.4f} Acc: {acc:.4f}')
        acc_list.append(acc)
        loss_list.append(loss)

        if valid_dataloader:
            print(f'Val Loss: {val_loss:.4f} Val Acc: {val_acc:.4f}')
            val_acc_list.append(val_acc)
            val_loss_list.append(val_loss)

    return acc_list, loss_list, val_acc_list, val_loss_list, model

For visualization purposes, we will also create plots of the loss and accuracy on the training and validation sets:
def visualize_history(acc, loss, val_acc, val_loss):
    fig, ax = plt.subplots(1, 2, figsize=(12, 4))

    ax[0].plot(range(len(loss)), loss, color='darkgrey', label = 'train')
    ax[0].plot(range(len(val_loss)), val_loss, color='cornflowerblue', label = 'valid')
    ax[0].set_title('Loss')

    ax[1].plot(range(len(acc)), acc, color='darkgrey', label = 'train')
    ax[1].plot(range(len(val_acc)), val_acc, color='cornflowerblue', label = 'valid')
    ax[1].set_title('Metric (Accuracy)')

    for i in range(2):
        ax[i].set_xlabel('Epochs')
        ax[i].legend(loc="upper right")

    plt.show()

When we combine everything, it will look as follows:
for fold in range(cfg.n_folds):
    train_df = df[df.kfold != fold].reset_index(drop=True)
    valid_df = df[df.kfold == fold].reset_index(drop=True)

    train_dataset = CustomDataset(cfg, train_df, transform = transform_soft)
    valid_dataset = CustomDataset(cfg, valid_df)

    train_dataloader = DataLoader(train_dataset,
                                  batch_size = cfg.batch_size,
                                  shuffle = True,
                                  num_workers = 0,
                                 )
    valid_dataloader = DataLoader(valid_dataset,
                                  batch_size = cfg.batch_size,
                                  shuffle = False,
                                  num_workers = 0,
                                 )

    model = timm.create_model(cfg.backbone,
                              pretrained = True,
                              num_classes = cfg.n_classes)
    model = model.to(cfg.device)

    criterion = nn.CrossEntropyLoss()

    optimizer = torch.optim.Adam(model.parameters(),
                                 lr = cfg.learning_rate,
                                 weight_decay = 0,
                                )

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                           T_max = np.ceil(len(train_dataloader.dataset) / cfg.batch_size) * cfg.epochs,
                                                           eta_min = cfg.lr_min)

    acc, loss, val_acc, val_loss, model = fit(model, optimizer, scheduler, cfg, train_dataloader, valid_dataloader)

    visualize_history(acc, loss, val_acc, val_loss)
Step 3: Run Experiments
Data Science is an experimental science. Thus, the aim of this step is to find the configuration of hyperparameters, data augmentations, model backbone, and cross-validation strategy that achieves the best performance (or whatever your objective may be – e.g., the best trade-off between performance and inference time).
Setup experiment tracking
Before jumping into this step, take a minute to think about how you will track your experiments. Experiment tracking can be as simple as writing everything down with pen and paper. Alternatively, you can track everything in a spreadsheet or even use an experiment tracking system to automate the whole process.
If you are an absolute beginner, I recommend starting simple and tracking your experiments manually in a spreadsheet at first. Open an empty spreadsheet and create columns for all inputs, such as:
- backbone,
- learning rate,
- epochs,
- augmentations, and
- image size
as well as columns for the outputs you want to track, such as loss and metrics for training and validation.
The resulting spreadsheet could look something like this:

⚒️ Once you feel comfortable with the Deep Learning techniques, you can level up by implementing an experiment tracking system into your pipeline to automate experiment tracking, such as Weights & Biases, Neptune, or MLFlow.
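Before reaching for a full tracking system, you can also keep the manual spreadsheet in code. Below is a minimal sketch (the file name and columns are placeholders, not part of the original workflow) that appends one row per experiment to a CSV file:

def log_experiment(path, **params_and_results):
    # Append one row per experiment; write the header only if the file is new
    row = pd.DataFrame([params_and_results])
    row.to_csv(path, mode="a", header=not os.path.exists(path), index=False)

log_experiment("experiments.csv",
               backbone=cfg.backbone,
               learning_rate=cfg.learning_rate,
               epochs=cfg.epochs,
               image_size=cfg.image_size,
               val_acc=val_acc[-1],    # final-epoch validation metrics from the last fold, as an example
               val_loss=val_loss[-1])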
Experimentation and hyperparameter tuning
Now that you have an experiment tracking system in place, let's run some experiments. You can start by tweaking the following hyperparameters:
- Number of training epochs: range of 2 to 10
- Learning rate: range of 0.0001 to 0.001
- Image size: range of 128 to 1024
- Backbone: try different backbones. First, try deeper models from the ResNet family (print `timm.list_models('*resnet*')` to see what other models are available), then try a different backbone family like `timm.list_models('*densenet*')` or `timm.list_models('*efficientnet*')`.
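If you want to structure these experiments a bit, a minimal sketch of a manual sweep could look like this (the value grids are just examples, not recommendations); each iteration would re-run the cross-validation loop from Step 2:

import itertools

for backbone, lr in itertools.product(["resnet18", "resnet34"], [1e-4, 5e-4]):
    cfg.backbone = backbone
    cfg.learning_rate = lr
    print(f"Experiment: backbone={backbone}, learning rate={lr}")
    # ... rebuild the dataloaders, model, optimizer, and scheduler, then run the fold loop ...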
⚒️ Once you feel comfortable with the Deep Learning techniques, you can level up by automating this step with Optuna or Weights & Biases.
Now it’s your turn! – Tweak a few notches and see how the model’s performance changes. Once you’re happy with the results, move on to the next step.

Step 4: Make Predictions (Inference)
Drum roll, please! Now that we have found the configuration that will give us the best model, we want to put it to good use.
First, let’s fine-tune the model with the optimal configuration on the full dataset to take advantage of every data sample. We don’t split the data into training and validation data in this step. Instead, we only have one big training dataset.
train_df = df.copy()

train_dataset = CustomDataset(cfg, train_df, transform = transform_soft)

train_dataloader = DataLoader(train_dataset,
                              batch_size = cfg.batch_size,
                              shuffle = True,
                              num_workers = 0,
                             )
But the rest of the training pipeline stays the same.
model = timm.create_model(cfg.backbone,
                          pretrained = True,
                          num_classes = cfg.n_classes)
model = model.to(cfg.device)

criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(),
                             lr = cfg.learning_rate,
                             weight_decay = 0,
                            )

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max = np.ceil(len(train_dataloader.dataset) / cfg.batch_size) * cfg.epochs,
                                                       eta_min = cfg.lr_min)
acc, loss, val_acc, val_loss, model = fit(model, optimizer, scheduler, cfg, train_dataloader)
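Although the original workflow goes straight to inference, you may want to persist the fine-tuned weights at this point so you don't have to retrain later. A small optional sketch (the file name is a placeholder):

# Save the fine-tuned weights (file name is a placeholder)
torch.save(model.state_dict(), "cheetah_lion_resnet18.pt")

# To reload later:
# model = timm.create_model(cfg.backbone, pretrained=False, num_classes=cfg.n_classes)
# model.load_state_dict(torch.load("cheetah_lion_resnet18.pt", map_location=cfg.device))
# model = model.to(cfg.device)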
Inference – And finally, we will use the model to predict the hold-out test set.
test_dataset = CustomDataset(cfg, test_df)

test_dataloader = DataLoader(test_dataset,
                             batch_size = cfg.batch_size,
                             shuffle = False,
                             num_workers = 0,
                            )

dataloader = test_dataloader

# Validation mode
model.eval()

final_y = []
final_y_pred = []

# Iterate over data
for step, batch in tqdm(enumerate(dataloader), total=len(dataloader)):
    X = batch[0].to(cfg.device)
    y = batch[1].to(cfg.device)

    with torch.no_grad():
        # Forward: Get model outputs
        y_pred = model(X)

        # Convert y and y_pred to lists
        y = y.detach().cpu().numpy().tolist()
        y_pred = y_pred.detach().cpu().numpy().tolist()

        # Extend original list
        final_y.extend(y)
        final_y_pred.extend(y_pred)

# Calculate statistics
final_y_pred_argmax = np.argmax(final_y_pred, axis=1)
metric = calculate_metric(final_y, final_y_pred_argmax)

test_df['prediction'] = final_y_pred_argmax
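For a quick numeric summary beyond the accuracy stored in `metric`, an optional sketch (using scikit-learn, which we already rely on) could print the test accuracy and a confusion matrix:

from sklearn.metrics import confusion_matrix

print(f"Test accuracy: {metric:.4f}")
print(confusion_matrix(final_y, final_y_pred_argmax))  # rows: true class, columns: predicted class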
Below you can see the results of our model:


Summary and Next Steps
This tutorial showed you how to fine-tune a pre-trained image classification model for your specific task, evaluate it, and perform inference on unseen data using the PyTorch framework in Python.
Once you feel comfortable with the basics, you can review the sections marked with ⚒️ to level up to an intermediate level.
Enjoyed This Story?
Subscribe for free to get notified when I publish a new story.
Find me on LinkedIn, Twitter, and Kaggle!
References
Dataset
[1] MikołajFish99 (2023). Lions or Cheetahs – Image Classification in Kaggle Datasets.
License: According to the original image source (Open Images Dataset V6) the annotations are licensed by Google LLC under CC BY 4.0 license, and the images are listed as having a CC BY 2.0 license.
Note the original dataset contains 200 images, with 100 images of each class. But the dataset needed some cleaning, including removing images of other animals; thus, the final dataset is slightly smaller. To keep this tutorial short, we will skip the data cleaning process here.
Images
If not otherwise stated, all images are created by the author.
Literature
[2] S. Bhutani with H20.ai (2023). Best Practises for Training ML Models | @ChaiTimeDataScience #160 presented on YouTube in January 2023.
[3] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255). Ieee.
[4] DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552.
[5] K. He, X. Zhang, S. Ren, & J. Sun (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
[6] timmdocs (2022). Pytorch Image Models (timm) (accessed April 10th, 2023).