End-to-End ResNet

A quick recap of what we’ve done so far:
Part 1 – An overview of the entire project. No Python was required, only a Google and Kaggle account. Hopefully you have convinced yourself that you can run the code in its entirety, as well as test it.
Part 2 – We imported the libraries and downloaded the dataset from Kaggle, then unzipped (uncompressed) it into a folder. We also took a short look at how to figure out what kind of hardware you’re working with. GPU information is important when dealing with image datasets like ours.
Link to Colab Notebook: Cat & Dog – Resnet-18
Before we can ‘load’ images into the architecture for training, we must preprocess them.
Creating the Folder Structure
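The full cell lives in the linked notebook; it boils down to something like the sketch below (the variable names are my own guesses, but the idea is simply nested ‘os.makedirs’ calls):

```python
import os

# Root folder for the reorganized dataset
dataset_home = 'dataset_dogs_vs_cats/'

# Create train/ and val/ splits, each with dogs/ and cats/ subfolders
for subdir in ['train/', 'val/']:
    for label in ['dogs/', 'cats/']:
        os.makedirs(dataset_home + subdir + label, exist_ok=True)
```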
The code above creates the following structure – val and train under the main dataset folder, each with dogs and cats subfolders:
dataset_dogs_vs_cats
    train
        dogs
        cats
    val
        dogs
        cats
For those of you who want the nomenclature: we are performing supervised machine learning here, with labeled data. The way we ‘label’ the data is by putting it in a folder; the code will be told to use the name of the folder containing the images as the label.
An Aside
I will almost certainly trip up on this in the future, but the more you read about neural networks, the more the words ‘modeling’ and ‘model’ blur together, and you invariably end up producing statements like ‘We are modeling our problem with this particular model.’
When referring to the technical definition of the model, I encourage you to use the word architecture. I admit my code will have names like ‘train_model’, but in that case model = architecture, i.e. the ResNet-18 architecture. ‘We are modeling our problem with this particular architecture’ sounds a lot more palatable.
Sorting Our Data
The initial step is to split the data into training and validation sets. We will use the training data to train our architecture (adjust its parameters) – ResNet-18 has roughly 11 million trainable parameters. The validation set will be used to measure how well the architecture is performing, by showing it new data as a ‘test’.
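The full cell is in the notebook; a minimal sketch consistent with the description below (the ‘train/’ source-folder name comes from the Kaggle download and is my assumption) might look like this:

```python
import os
import random
import shutil

src_directory = 'train/'             # folder holding all the unzipped Kaggle images
dataset_home = 'dataset_dogs_vs_cats/'
val_ratio = 0.25                     # roughly 25% of files go to validation

for file in os.listdir(src_directory):
    dst_dir = 'train/'               # default destination is the training split
    if random.random() < val_ratio:  # random.random() is uniform on [0, 1)
        dst_dir = 'val/'
    if file.startswith('cat'):
        shutil.copyfile(src_directory + file, dataset_home + dst_dir + 'cats/' + file)
    elif file.startswith('dog'):
        shutil.copyfile(src_directory + file, dataset_home + dst_dir + 'dogs/' + file)
```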
The ‘src_directory’ contains all the images. ‘listdir’ gets a listing of every file in the source directory, and if a file name starts with ‘cat’, we place it in the cats folder; the same goes for ‘dog’. This is captured by the ‘file.startswith()’ checks.
You can see that by default the destination is the ‘train’ folder, unless a random number (drawn uniformly between 0 and 1) is less than 0.25 – which happens around 25% of the time – in which case the file gets shuttled to the ‘val’ folder.
Image Transforms
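The notebook defines a dictionary of transforms along these lines (a sketch – the 5-degree rotation limit is my assumption; the mean/std values are the standard ImageNet statistics):

```python
from torchvision import transforms

# Channel-wise ImageNet statistics expected by the pretrained torchvision models
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(5),       # small random rotation (degree limit assumed)
        transforms.RandomHorizontalFlip(),  # mirror left-right with probability 0.5
        transforms.RandomResizedCrop(224, scale=(0.96, 1.0), ratio=(0.95, 1.05)),
        transforms.ToTensor(),              # PIL image -> float tensor scaled to [0, 1]
        transforms.Normalize(mean, std),    # per channel: subtract mean, divide by std
    ]),
    'val': transforms.Compose([
        transforms.Resize([224, 224]),      # ResNet expects 224x224 inputs
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ]),
}
```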
A bit to unpack here. Let me get one line out of the way first – ‘Resize’. The ResNet architecture expects images that are 224×224 (the why is out of scope for this post), so both train and val (validation) images have to end up at that size.
Let’s do this backward. The line ‘ToTensor()’ sets our image up for computation: it turns the image into a multidimensional array (color channels, height, width) with values scaled to [0, 1]. The ‘Normalize’ function, in each of [train, val], normalizes the data across the three RGB channels – the first set of numbers is the means, the second set is the standard deviations. These particular values are used because the torchvision architectures (like ResNet) were pretrained on ImageNet, and the statistics were computed over its millions of images.
That leaves three transforms – RandomRotation, RandomHorizontalFlip, and RandomResizedCrop. The ‘what’ of these transforms is easier to understand than the ‘why’. Essentially, you want to present as many different training images to your network as possible. One way to increase the variety of these pictures is to use a huge dataset – in this case, almost 19,000 training images.
We can artificially increase the variety further by making small modifications to the images.
RandomRotation and RandomHorizontalFlip are self-explanatory.
RandomResizedCrop crops a random patch of the image – covering a fraction of the original area picked randomly between 0.96 and 1.0, with an aspect ratio picked randomly between 0.95 and 1.05 – and resizes the patch to 224×224.
Dataloaders
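A sketch of this cell (‘shuffle=True’ and ‘num_workers=2’ are my assumptions; everything else follows the description below):

```python
import os
import torch
from torchvision import datasets

batch_size = 16
data_dir = 'dataset_dogs_vs_cats'

# ImageFolder turns each subfolder name ('cats', 'dogs') into a class label
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}

# DataLoaders deliver the images to the architecture in shuffled batches
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                              shuffle=True, num_workers=2)
               for x in ['train', 'val']}

train_dataloader = dataloaders['train']
val_dataloader = dataloaders['val']

# Run computation on the first GPU
device = torch.device("cuda:0")
```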
Dataloaders allow for batch delivery of the dataset into the architecture for training. They are critical for ensuring a streamlined flow of data. Accordingly, the first variable you see set is batch_size = 16. If you have trouble training (especially GPU out-of-memory errors), decrease the batch size in powers of 2 (16 → 8 → 4).
‘data_dir’ sets the main directory of the training and validation images. ‘datasets.ImageFolder’ sets up the labels the way we described before – the name of the folder is the label itself (instead of the label being embedded in the filename or stored elsewhere). You can see that the ‘data_transforms’ we defined above are applied when building the ‘image_datasets’ variable.
‘torch.utils.data.DataLoader’ wraps ‘image_datasets’ to produce the ‘dataloaders’ variable.
In the last two lines, we split the ‘dataloaders’ variable into ‘train_dataloader’ and ‘val_dataloader’.
Finally, we make sure the GPU will be used: device = torch.device("cuda:0"). We will load the architecture onto the GPU in the coming steps.
Please leave any questions below and I will try to answer them as best I can.