
In my previous article, I explored the very first way to predict whether an image belongs to a particular class or not – the Pixel Similarity Approach. This was just the beginning though, and we saw quite clearly why this approach does not pan out in practical applications of image classification as well as other Deep Learning tasks.
Here, we shall now dive into executing some real training through an actual Linear Neural Network model on the dataset we’ve been using for our journey into the world of fastai so far – The Rock, Paper, Scissors dataset from Kaggle.
In case you want to follow along from the beginning, here’s the link to the first article from this fastai series I’m doing. Happy learning! 🙂
Now let’s get into the basics of Stochastic Gradient Descent first and why exactly we need it for our image classification problem here.
SGD fundamentals in brief
A model can only get better by learning – the fault in our previous Pixel Similarity approach was that it didn’t have any set of parameters for us to modify in order to teach the model when it predicted right or wrong image labels. In case of SGD, we now have that flexibility. We can now:
- Initialise random weights and biases for our model
- Calculate the loss on the model’s predictions – it will be a lower value for an accurate prediction and a larger value for a wrong prediction
- Calculate the gradient of the loss with respect to each parameter, and adjust the parameters a small step in the opposite direction, with the size of the step controlled by the learning rate, so that the model begins to learn
- …and repeat the process from step 2 until it’s time to stop.
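The cycle above can be sketched in a few lines of plain Python. This is only an illustration on a made-up problem (fitting the line y = 2x + 1 with one weight and one bias), not the image model we build later; the data, the learning rate of 0.1 and the 50 epochs are all assumptions chosen to keep the example small.

```python
import random

random.seed(0)

# Toy data: points on the line y = 2x + 1 (a made-up target for illustration)
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]

# Initialise the parameters randomly
w, b = random.uniform(-1, 1), random.uniform(-1, 1)
lr = 0.1  # learning rate (assumed value)

def mean_squared_error(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mean_squared_error(w, b)

for epoch in range(50):
    data = list(zip(xs, ys))
    random.shuffle(data)            # "stochastic": visit samples in random order
    for x, y in data:
        pred = w * x + b            # predict
        error = pred - y            # the loss for this sample is error ** 2
        grad_w = 2 * error * x      # gradient of the loss with respect to w
        grad_b = 2 * error          # gradient of the loss with respect to b
        w -= lr * grad_w            # step against the gradient
        b -= lr * grad_b

final_loss = mean_squared_error(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.4f} -> {final_loss:.6f}")
```

The loop stops after a fixed number of epochs here; in practice you would stop when the loss or the validation accuracy stops improving.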
This entire cycle of predicting a class label from the model and updating the parameters is quite intuitive and easy to replicate from scratch on our own and I’ll be showing you how in just a bit. But first, it’s time to import our data into our jupyter notebook.
Getting the data
First comes the universal import we’ve been using throughout this series.
from fastai.vision.all import *
DATASET_PATH = Path('RockPaperScissors/data')
Let’s import the data and convert them into Pytorch tensors in one go.
rock_train = (DATASET_PATH/'train'/'rock').ls().sorted()
paper_train = (DATASET_PATH/'train'/'paper').ls().sorted()
scissors_train = (DATASET_PATH/'train'/'scissors').ls().sorted()
#cumulate all the images of three classes as Pytorch tensors
rock_tensors = [tensor(Image.open(o)) for o in rock_train]
paper_tensors = [tensor(Image.open(o)) for o in paper_train]
scissors_tensors = [tensor(Image.open(o)) for o in scissors_train]
stacked_rock = torch.stack(rock_tensors).float()/255
stacked_paper = torch.stack(paper_tensors).float()/255
stacked_scissors = torch.stack(scissors_tensors).float()/255
stacked_rock.shape, stacked_paper.shape, stacked_scissors.shape
The output is as we’ve been expecting.
Output:
(torch.Size([840, 300, 300, 4]),
 torch.Size([840, 300, 300, 4]),
 torch.Size([840, 300, 300, 4]))
Here, 840 is the number of training images in each class, 300*300 is the image size in pixels and 4 is the number of channels in each image (three RGB + one alpha channel).
Now that we have this, let’s go ahead and import the validation set as well.
rock_val = (DATASET_PATH/'valid'/'rock').ls()
paper_val = (DATASET_PATH/'valid'/'paper').ls()
scissors_val = (DATASET_PATH/'valid'/'scissors').ls()
stacked_rock_val = torch.stack([tensor(Image.open(o)) for o in rock_val])
stacked_rock_val = stacked_rock_val.float()/255
stacked_paper_val = torch.stack([tensor(Image.open(o)) for o in paper_val])
stacked_paper_val = stacked_paper_val.float()/255
stacked_scissors_val = torch.stack([tensor(Image.open(o)) for o in scissors_val])
stacked_scissors_val = stacked_scissors_val.float()/255
Making the training and validation sets
Pytorch expects our training and validation data in pairs of (X, y) where X will be our image and y is our class label. We need to zip them together using the simple Python zip function so that they can be fed into the model later.
train_x = torch.cat([stacked_rock, stacked_paper, stacked_scissors]).view(-1, 300*300*4)
train_x.shape
Output:
torch.Size([2520, 360000])
We now have a tensor with all our training images stacked upon each other in a sequence. The view function from PyTorch changes our rank-4 tensor (images, height, width, channels) into a rank-2 tensor with one flattened row per image. The parameter -1 is a wildcard that tells PyTorch to work out the number of rows from the total number of elements.
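Here is a small sketch of what view does with the -1 wildcard. The tensor sizes are made up (six 5x5 images with 4 channels) so that the shapes are easy to check by eye.

```python
import torch

# Six fake "images" of 5x5 pixels with 4 channels (sizes chosen to keep it small)
images = torch.zeros(6, 5, 5, 4)

# Flatten each image into one row; -1 lets PyTorch infer the number of rows
flat = images.view(-1, 5 * 5 * 4)
print(images.shape, "->", flat.shape)

# Spelling out the row count gives the same shape
same = images.view(6, 100)
print(flat.shape == same.shape)
```

Since 6 * 5 * 5 * 4 = 600 elements have to fit into rows of 100, PyTorch infers 6 rows.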
Now that we have the training images, we need to get the labels as well.
train_y = tensor([0] * len(rock_train) + [1] * len(paper_train) + [2] * len(scissors_train)).unsqueeze(1)
train_y.shape
Output:
torch.Size([2520, 1])
This makes sense, as the label is only a single-digit number of value 0, 1 or 2, pertaining to either the rock, the paper or the scissors class.
Likewise, we repeat the steps to get our tensors for valid_x and valid_y as well.
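Those repeated steps are not spelled out here, so the following is one possible way to wrap them up. make_xy is a hypothetical helper of my own (it is not part of fastai or PyTorch), and the demo runs on tiny random stand-in tensors rather than the real images.

```python
import torch

def make_xy(stacks):
    """Concatenate per-class image stacks and build matching integer labels.

    `stacks` is a list of tensors shaped [n_images, height, width, channels],
    ordered by class index (0, 1, 2, ...). Hypothetical helper for illustration.
    """
    x = torch.cat(stacks)
    x = x.view(x.shape[0], -1)  # one flattened row per image
    labels = [cls for cls, s in enumerate(stacks) for _ in range(len(s))]
    y = torch.tensor(labels).unsqueeze(1)  # shape [n, 1], like train_y
    return x, y

# Stand-in data: 3 classes with 2, 3 and 4 tiny random "images" each
fake_stacks = [torch.rand(n, 5, 5, 4) for n in (2, 3, 4)]
demo_x, demo_y = make_xy(fake_stacks)
print(demo_x.shape, demo_y.shape)
```

With the tensors from this article, the call would look like valid_x, valid_y = make_xy([stacked_rock_val, stacked_paper_val, stacked_scissors_val]).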
Finally, we zip the training data into one entity, and do the same with the validation data.
dataset = list(zip(train_x, train_y))
dataset_valid = list(zip(valid_x, valid_y))
That is it for the data consolidation process. Now we get to the model building part.
Creating a Linear model
According to the PyTorch docs, nn.Linear(n, m) is a module that applies a linear transformation with n input features and m output features, which makes it a single-layer feed-forward neural network.
Since we are dealing with multi-class classification, m will be 3, one output per class. The number of inputs is the same as the number of features in each training example, that is 300*300*4 = 360000.
model = nn.Linear(300*300*4, 3)
weight, bias = list(model.parameters())
weight.shape, bias.shape
Output:
(torch.Size([3, 360000]), torch.Size([3]))
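To make the weight and bias shapes less abstract, here is a small sketch of what the layer computes. It uses a stand-in layer with 8 inputs instead of 360000 (an assumption to keep it light), and checks that the layer output equals x @ weight.T + bias.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small stand-in for nn.Linear(300*300*4, 3): 8 input features, 3 classes
layer = nn.Linear(8, 3)
weight, bias = layer.weight, layer.bias  # shapes [3, 8] and [3]

x = torch.rand(2, 8)  # a batch of 2 flattened "images"

# The layer output is x @ weight.T + bias: one score per class per image
manual = x @ weight.T + bias
output = layer(x)
print(output.shape)
print(torch.allclose(output, manual, atol=1e-6))
```

Each row of the weight matrix holds one weight per input pixel value for one class, which is why the real model’s weight has shape [3, 360000].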
Are we ready to start predicting from our model? Yes, we are.
Predicting for a single image
Let’s apply our training pipeline to one image and see if it works. Later, we can go about doing so on batches of separate training and validation sets.
Get one random image.
random_img = train_x[1000]
random_img.shape
Output:
torch.Size([360000])
Our image is in the form of a rank-1 tensor. We need to convert it into a rank-2 tensor in the form of [1, 360000] so that our transition to using multiple images later is easier and more convenient. We do it via the view function.
x = random_img.view(1, 300*300*4)
x.shape
Output:
torch.Size([1, 360000])
Perfect. Now we’ll make a prediction with our model.
pred_raw = model(x)
pred_raw
Output:
tensor([[ 0.6482, 0.1519, -0.3730]], grad_fn=<AddmmBackward>)
Notice that the outputs for the three classes do not sum to 1. But do we need them to sum to 1? Yes, we do – we need a probability for the image falling within each class. Since we’re working with multi-class classification, a simple softmax function should be able to do the job.
pred = torch.softmax(pred_raw, dim = 1)
pred
Output:
tensor([[0.5079, 0.3092, 0.1829]], grad_fn=<SoftmaxBackward>)
Look carefully at the probabilities output by the model. We took the image at index 1000 of our training data. The first 840 images (indices 0 to 839) are from the rock class and the next 840 (indices 840 to 1679) are from the paper class, so our image clearly belongs to the paper class. However, the model gives the greatest probability to the rock class, which is a wrong prediction.
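The probabilities above can be reproduced by hand from the raw outputs. The sketch below is a plain-Python softmax written for illustration (the function is my own, not the PyTorch implementation): exponentiate each score, then divide by the sum so that everything adds up to 1.

```python
import math

# Raw model outputs copied from the prediction above
logits = [0.6482, 0.1519, -0.3730]

def softmax(values):
    # Subtracting the max first is a common trick for numerical stability
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
rounded = [round(p, 4) for p in probs]
print(rounded)     # [0.5079, 0.3092, 0.1829]
print(sum(probs))  # 1.0, up to floating-point rounding
```

The largest raw score always ends up with the largest probability, so softmax changes the scale of the outputs but not which class the model picks.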
In order for our prediction to be useful and our model to learn, we want the weights to be tuned. We want to compare the actual target (from train_y) to the predicted values, and we do it via the loss function.
Getting the loss
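For a multi-class classification task, we have the nn.CrossEntropyLoss function to use. This loss function takes the ground-truth integer index as its target, rather than a one-hot vector. One caveat: nn.CrossEntropyLoss applies log-softmax internally, so it is meant to receive the raw outputs (pred_raw) rather than probabilities. The call below passes the already-softmaxed pred, which means softmax is effectively applied twice and the printed value is not the true cross-entropy of the prediction. The training pipeline further down passes the raw outputs, as it should.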
criterion = nn.CrossEntropyLoss()
loss = criterion(pred, torch.Tensor([1]).long())
print(loss)
Output:
tensor(1.1318, grad_fn=<NllLossBackward>)
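It helps to recompute this number by hand. nn.CrossEntropyLoss applies log-softmax to its input and then takes the negative log-probability of the target class, so it is designed to receive raw scores. The plain-Python sketch below (the helper names are mine, not PyTorch’s) evaluates that formula twice: once on the raw outputs printed earlier, and once on the already-softmaxed probabilities.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    # Mirrors nn.CrossEntropyLoss for a single sample:
    # softmax over the scores, then negative log of the target's probability
    return -math.log(softmax(scores)[target])

logits = [0.6482, 0.1519, -0.3730]  # raw outputs printed earlier
target = 1                          # index of the paper class

loss_from_logits = cross_entropy(logits, target)
loss_from_probs = cross_entropy(softmax(logits), target)  # softmax applied twice

print(round(loss_from_logits, 4), round(loss_from_probs, 4))
```

The first value comes out at roughly 1.17, which is -log(0.3092), the negative log of the probability given to the paper class. The second comes out at roughly 1.13 and matches the value printed above, which shows that passing probabilities instead of raw outputs squashes the scores a second time and understates the loss.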
Nice! This seems to be working as expected. Now let’s go ahead and apply all of this on our train and validation data.
Establishing a training pipeline
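Just like it’s described in the fastai book, I’ll define some custom functions to make our job easier. The first function makes predictions for a batch, calculates the loss and calls backward() on it, which backpropagates to compute the gradient of the loss with respect to each parameter. The parameters themselves are updated afterwards, in the training loop.
def calc_grad(xb, yb, model):
    preds_raw = model(xb)                      # raw class scores for the batch
    loss = criterion(preds_raw, yb.flatten())  # cross-entropy against the integer labels
    loss.backward()                            # fills in .grad on every parameter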
Note: We flatten the target yb tensor in order to make it a rank-1 tensor which the loss function expects.
Now, we go ahead and define a simple dataloader which will pass the images to the model in batches.
dl = DataLoader(dataset, batch_size = 64)
valid_dl = DataLoader(dataset_valid, batch_size = 64)
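Next, we want a function that performs one training epoch. It goes through every batch, computes the gradients, takes one SGD step on each parameter and then resets that parameter’s gradient:
def train_epoch(model, params, lr):
    for xb, yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad * lr  # step against the gradient
            p.grad.zero_()         # reset, since backward() accumulates gradients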
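The two lines inside the parameter loop are the whole SGD step, and the call to zero_() matters more than it looks. Here is a minimal sketch on a single made-up parameter (the toy loss (3 * w) ** 2 and the learning rate of 0.01 are assumptions for illustration) showing that backward() adds to the stored gradient unless we reset it.

```python
import torch

# One trainable parameter and a toy loss: loss = (3 * w) ** 2, so d(loss)/dw = 18 * w
w = torch.tensor([1.0], requires_grad=True)
lr = 0.01

(3 * w).pow(2).sum().backward()
first_grad = w.grad.clone()        # 18 * 1.0 = 18

# Calling backward() again without zeroing ADDS to the stored gradient
(3 * w).pow(2).sum().backward()
accumulated_grad = w.grad.clone()  # 18 + 18 = 36

# The manual SGD step: reset, recompute, update outside autograd, reset again
w.grad.zero_()
(3 * w).pow(2).sum().backward()
w.data -= w.grad * lr              # 1.0 - 18 * 0.01 = 0.82
w.grad.zero_()

print(first_grad.item(), accumulated_grad.item(), w.item())
```

Updating through p.data keeps the step itself out of the autograd graph, which is the same pattern train_epoch uses.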
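We also need to calculate the accuracy after each epoch. We do it with another function, which works on one batch at a time.
def batch_accuracy(xb, yb):
    preds = torch.softmax(xb, dim = 1)
    predicted_class = torch.argmax(preds, dim = 1)
    # yb has shape [batch, 1]; flatten it so the comparison is element-wise
    # instead of broadcasting to a [batch, batch] matrix
    correct = (predicted_class == yb.flatten())
    return correct.float().mean()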
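One pitfall is worth checking in an accuracy function like this. Our targets are stored with shape [batch, 1], while argmax returns shape [batch], and comparing the two directly makes PyTorch broadcast them against each other. A small sketch with made-up predictions and labels shows the difference:

```python
import torch

predicted_class = torch.tensor([0, 1, 2, 1])  # shape [4]
yb = torch.tensor([[0], [1], [1], [1]])       # shape [4, 1], like train_y

# Comparing [4] with [4, 1] broadcasts to a [4, 4] grid of all pairs
pairwise = (predicted_class == yb)
pairwise_mean = pairwise.float().mean().item()
print(pairwise.shape, pairwise_mean)

# Flattening the targets first gives one comparison per sample
elementwise = (predicted_class == yb.flatten())
accuracy = elementwise.float().mean().item()
print(elementwise.shape, accuracy)
```

Here three of the four predictions are right, so the true accuracy is 0.75. The broadcast version averages over all 16 pairs and reports 0.4375 instead, without raising any error.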
Lastly, we need another function for calculating our validation accuracies and consolidating them.
def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb, yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
Now that everything is done, let’s go ahead and perform some training!
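lr refers to the learning rate. We keep it at 0.001. params holds the weight and bias of our model, which are the tensors that train_epoch updates.
lr = 0.001
params = list(model.parameters())  # the weight and bias of the model
# do for 5 epochs
for i in range(5):
    train_epoch(model, params, lr)
    print(validate_epoch(model), end = ',')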
Output:
0.2891, 0.3181, 0.3281, 0.3542, 0.3542
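These are the validation accuracies printed after each epoch, and we see that they level off at a very low value, close to the one-in-three chance of guessing among three classes. A model this close to chance is not overfitting; it is barely learning at all, which is underfitting. It seems we need a better model than the plain Linear model.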
Concluding…
This is the end of our simple training pipeline via SGD and we did it all with our own custom functions from scratch. Our model can be improved by adding some non-linearity to it in the future, but for now, I think this was a great learning experience for a start.
If you’re still with me, give yourself a pat on the back for following along all the way with me, you certainly deserve it!
Next up, I’ll be reading further in the Fastai book about ways to improve this model further and therefore increase the validation accuracies a bit. So if you want, follow me to stay tuned for future articles on the same! 😁
Thank you for reading and if you want to check out the GitHub repo for this project, here it is: