The Applications and Benefits of a PreTrained Model –– Kaggle’s DogsVSCats

Published in Towards Data Science · 6 min read · Nov 4, 2020

For image recognition tasks, pre-trained models are a great choice. For one, they are easier to use because they give you the architecture for “free.” Additionally, they typically produce better results and need less training.

To see a real application of this theory, I will use Kaggle’s DogsVSCats dataset to compare the results of a custom CNN with those of a pre-trained model.

The steps will be as follows:

1) Imports
2) Download and Unzip Files
3) Organize the Files
4) Set up and Train Classic CNN Model
5) Test the CNN Model
6) Set up and Train Pre-Trained Model
7) Test the Pre-Trained Model

1. Imports

Every machine learning project starts with imports, and this one needs several.

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")

from google.colab import drive
drive.mount('/content/gdrive')  # connects to data stored on Google Drive

import os
os.chdir('/content/gdrive/My Drive/')

import shutil
import re
import zipfile

import torch
import torchvision
from torchvision import transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from google.colab import files

While most of these imports won’t be used until later, it is best to import them all now. They fall into two groups: those for PyTorch and those needed for Google Colab.

Google Colab offers a free GPU to everyone, so it is great to use, especially for beginners. Use the code below to check that you are using the GPU. If it does not print cuda, turn on your GPU!

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

PyTorch is similarly friendly to beginners thanks to its approachable interface, yet it still gets the job done for any machine learning task!

2. Download and Unzip Files

The data will be pulled directly from Kaggle, an online platform for data science competitions and datasets. The files are available on the competition’s Data page. Click Download All. Then, drag the zip file into your Drive and place it in any location desired.

Do not unzip directly from Google Drive or through your computer. This will take an extensive amount of time. Thus, it is in your best interest to program a way to unzip the files and organize them in Google Drive.

Before unzipping, create a DogsVSCats folder.

os.mkdir('/content/gdrive/My Drive/DogsVSCats')

Here, I will be storing the unzipped data.

To unzip, run the following.

with zipfile.ZipFile('/content/gdrive/My Drive/dogs-vs-cats.zip') as zf:
    zf.extractall('/content/gdrive/My Drive/DogsVSCats')

This will extract the dogs-vs-cats zip and place its contents in the new folder. After this, we need to unzip the test1.zip and train.zip files. You can delete the sampleSubmission.csv file; we will not need it for this tutorial.

To unzip the train.zip file, run the following:

with zipfile.ZipFile('/content/gdrive/My Drive/DogsVSCats/train.zip') as zf:
    zf.extractall('/content/gdrive/My Drive/DogsVSCats/')

To unzip the test1.zip file, run the following:

with zipfile.ZipFile('/content/gdrive/My Drive/DogsVSCats/test1.zip') as zf:
    zf.extractall('/content/gdrive/My Drive/DogsVSCats/')

It might take some time, but all the files should be in their appropriate folders after that.
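Because extraction on Drive is slow and a Colab session can disconnect partway through, it may help to make the step resumable. The helper below is only a sketch: the name extract_if_needed is hypothetical, and it assumes that a file already present at the target path was extracted completely on an earlier run.

```python
import os
import zipfile


def extract_if_needed(zip_path, dest_dir):
    """Extract zip_path into dest_dir, skipping members that already exist.

    Returns the number of members extracted during this call.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = 0
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            target = os.path.join(dest_dir, member)
            if os.path.exists(target):
                continue  # assumed to be complete from a previous run
            zf.extract(member, dest_dir)
            extracted += 1
    return extracted
```

Calling it a second time on the same archive should extract nothing, which makes it safe to rerun the cell after an interruption.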

3. Organize the Files

Once all the files are unzipped, it is crucial to organize them based on their true classification. For this dataset, we will organize the files into “dog” and “cat” folders.

Make the needed folders.

os.mkdir('/content/gdrive/My Drive/DogsVSCats/train/cat')
os.mkdir('/content/gdrive/My Drive/DogsVSCats/train/dog')

Set up the classifications.

train_dr = '/content/gdrive/My Drive/DogsVSCats/train/'
train_dog_dir = '/content/gdrive/My Drive/DogsVSCats/train/dog'
train_cat_dir = '/content/gdrive/My Drive/DogsVSCats/train/cat'
files = os.listdir('/content/gdrive/My Drive/DogsVSCats/train')

Then, organize.

for f in files:
    src = os.path.join(train_dr, f)
    if not os.path.isfile(src):
        continue  # skip the cat and dog folders created above
    if re.search("cat", f):
        shutil.move(src, train_cat_dir)
        print("moved!-cat")
    elif re.search("dog", f):
        shutil.move(src, train_dog_dir)
        print("moved!-dog")

You don’t really have to add the print statements, but it is cool to see the movement!
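The same sorting idea can be sketched without regular expressions. The version below assumes the files are named like cat.0.jpg and dog.0.jpg, so the class is the text before the first dot; the function name sort_by_prefix is hypothetical. It skips anything that is not a regular file, so the class folders themselves are never moved.

```python
import os
import shutil


def sort_by_prefix(src_dir, labels=("cat", "dog")):
    """Move files named '<label>.<id>.jpg' into src_dir/<label>/ and count them."""
    moved = {label: 0 for label in labels}
    for label in labels:
        os.makedirs(os.path.join(src_dir, label), exist_ok=True)
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip the class folders themselves
        label = name.split(".")[0]  # assumed naming scheme: label first
        if label in moved:
            shutil.move(path, os.path.join(src_dir, label, name))
            moved[label] += 1
    return moved
```

Returning the counts gives a quick sanity check that both classes received the number of images you expect.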

Next, we need to move some of the training data into a validation dataset.

os.makedirs('/content/gdrive/My Drive/DogsVSCats/val/cat')  # makedirs also creates the parent val folder
os.makedirs('/content/gdrive/My Drive/DogsVSCats/val/dog')
val_dog_dir = '/content/gdrive/My Drive/DogsVSCats/val/dog'
val_cat_dir = '/content/gdrive/My Drive/DogsVSCats/val/cat'

Relocate 1,000 files from the train dog folder and send them to the validation folder.

files = os.listdir(train_dog_dir)
for f in files:
    valDogSearch = re.search(r"5\d\d\d", f)  # file numbers 5000 through 5999
    if valDogSearch:
        shutil.move(f'{train_dog_dir}/{f}', val_dog_dir)

!ls "{val_dog_dir}" | head -n 5

Now, do the same for the cat training folder.

files = os.listdir(train_cat_dir)
for f in files:
    valCatSearch = re.search(r"5\d\d\d", f)  # file numbers 5000 through 5999
    if valCatSearch:
        shutil.move(f'{train_cat_dir}/{f}', val_cat_dir)

!ls "{val_cat_dir}" | head -n 5
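Selecting validation files by their number works, but a seeded random sample is a common alternative. The sketch below swaps in that technique in place of the regex approach above; the name move_random_sample and the seed value are hypothetical choices.

```python
import os
import random
import shutil


def move_random_sample(src_dir, dst_dir, n, seed=0):
    """Move n randomly chosen files from src_dir to dst_dir; return their names."""
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(
        f for f in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, f))
    )
    # A fixed seed makes the split reproducible across runs.
    chosen = random.Random(seed).sample(names, n)
    for name in chosen:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen
```

Under these assumptions, move_random_sample(train_dog_dir, val_dog_dir, 1000) would play the same role as the dog loop above.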

The last step for this part is to define the transforms and data loaders, so that every image is resized to the same shape and fed to the model in batches.

# named image_transforms so it does not shadow the imported transforms module
image_transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
])

train_image_folder = torchvision.datasets.ImageFolder('/content/gdrive/My Drive/DogsVSCats/train/', transform=image_transforms)
train_loader = torch.utils.data.DataLoader(train_image_folder, batch_size=64, shuffle=True, num_workers=4)

val_image_folder = torchvision.datasets.ImageFolder('/content/gdrive/My Drive/DogsVSCats/val/', transform=image_transforms)
val_loader = torch.utils.data.DataLoader(val_image_folder, batch_size=64, shuffle=True, num_workers=4)

4. Set up and Train Classic CNN Model

Now we have reached the actual training part! In this variation, we will be using a classic CNN model.

The model is as follows:

class DogDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 12, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(256*7*7*3, 196),  # 37,632 inputs = 12 channels * 56 * 56
            nn.Linear(196, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, features)
        x = self.linear_layers(x)
        return x

dog_detector = DogDetector()
dog_detector.cuda()

The final two statements instantiate the model as dog_detector and move it to the GPU.

Training the model is a simple procedure. Because we are using the Kaggle files just for testing and not for the direct competition, we do not need the test files. Instead, we can train on the training set and evaluate on the validation set, without the hassle of submitting to Kaggle.
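The first linear layer takes 37,632 inputs, which is 12 channels times 56 x 56 pixels. As a quick check, the standard output-size formula for convolution and pooling layers can be applied to a 224 x 224 input; the helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (square input)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=3, stride=1, padding=1)  # first conv keeps 224
size = conv_out(size, kernel=2, stride=2)             # first max-pool -> 112
size = conv_out(size, kernel=3, stride=1, padding=1)  # second conv keeps 112
size = conv_out(size, kernel=2, stride=2)             # second max-pool -> 56

flat_features = 12 * size * size  # 12 output channels from the second conv
print(size, flat_features)        # prints: 56 37632
```

If you change the input resolution or add another pooling layer, rerunning this arithmetic tells you what the first nn.Linear needs to accept.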

optimizer = optim.Adam(dog_detector.parameters(), lr=0.0001)
loss_func = nn.BCELoss().cuda()
EPOCHS = 5

for epoch in range(EPOCHS):
    print(f"epoch: {epoch}")
    for i, data in enumerate(train_loader):
        if i % 50 == 0:
            print(f"  batch: {i}")  # progress report every 50 batches
        X, y = data
        y = y.type(torch.FloatTensor).view(len(y), -1).cuda()  # labels as a (batch, 1) float tensor
        dog_detector.zero_grad()
        output = dog_detector(X.view(-1, 3, 224, 224).cuda())
        loss_val = loss_func(output, y)
        loss_val.backward()
        optimizer.step()
    print(f"loss: {loss_val}")  # loss of the last batch in the epoch

The loss values tend to hover between approximately 0.4 and 0.6.
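To put these numbers in context, it helps to know what binary cross-entropy looks like for an uninformed model. The sketch below reimplements the loss in plain Python with made-up predictions (it is not the PyTorch implementation); a model that always outputs 0.5 scores ln 2, roughly 0.693.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean binary cross-entropy for predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


targets = [1, 0, 1, 0]
coin_flip = binary_cross_entropy([0.5, 0.5, 0.5, 0.5], targets)
confident = binary_cross_entropy([0.9, 0.1, 0.8, 0.2], targets)
print(round(coin_flip, 3), round(confident, 3))  # prints: 0.693 0.164
```

A loss between 0.4 and 0.6 is therefore better than guessing, but still leaves a lot of room for improvement.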

5. Test the CNN Model

Because we are testing without submitting to Kaggle, we need to check the accuracy ourselves, which gives us more direct results. This can be done with the following:

correct = 0
total = 0

with torch.no_grad():
    for data in val_loader:
        X, y = data
        output = dog_detector(X.view(-1, 3, 224, 224).cuda())
        # round the sigmoid outputs to 0 or 1 and compare them with the labels
        correct_sum = output.round().transpose(0, 1).cpu() == y
        correct += correct_sum.sum().item()
        total += len(y)

print(f"Accuracy: {round(correct/total, 3)}")

When testing, the accuracy tends to hover around 73.6%. This isn’t bad, but it is far from great.
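The accuracy calculation boils down to thresholding each sigmoid output at 0.5 and counting matches. Here is a minimal plain-Python sketch with made-up outputs; the function name accuracy and the sample values are illustrative only.

```python
def accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    predictions = [1 if p >= threshold else 0 for p in probs]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


probs = [0.91, 0.12, 0.47, 0.66]   # made-up sigmoid outputs
labels = [1, 0, 1, 1]              # 0 = cat, 1 = dog (ImageFolder sorts class names)
print(accuracy(probs, labels))     # prints: 0.75
```

The third example is the only miss: the model outputs 0.47 for a dog, which falls below the threshold and is counted as a cat.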

6. Set up and Train Pre-Trained Model

The pre-trained model is much simpler to set up.

First, download the pre-trained model and replace its final layer with a single sigmoid output.

model_resnet18 = torchvision.models.resnet18(pretrained=True)
new_lin = nn.Sequential(
    nn.Linear(512, 1),
    nn.Sigmoid()
)
model_resnet18.fc = new_lin

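Freeze the layers that are not needed for training. Here everything is frozen except the batch-normalization layers and the new output layer, which still has to be learned.

for name, param in model_resnet18.named_parameters():
    if "bn" not in name and not name.startswith("fc"):
        param.requires_grad = False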
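A name-based filter is easy to get subtly wrong, so it can help to dry-run it on a few parameter names before touching the model. The sketch below uses a hypothetical helper, is_trainable, on names of the kind ResNet-18 exposes through named_parameters(); it needs no GPU or model download.

```python
def is_trainable(name, keep=("bn", "fc")):
    """Return True if a parameter name contains any substring in keep."""
    return any(key in name for key in keep)


# A handful of parameter names as torchvision's ResNet-18 exposes them;
# "fc.0.*" is the Linear layer inside the replacement nn.Sequential head.
names = [
    "conv1.weight",
    "bn1.weight",
    "layer1.0.conv1.weight",
    "layer1.0.bn1.bias",
    "layer2.0.downsample.1.weight",
    "fc.0.weight",
    "fc.0.bias",
]

for name in names:
    print(f"{name:32s} trainable={is_trainable(name)}")
```

Two things stand out. With a filter on "bn" alone, fc.0.weight and fc.0.bias would be frozen at their random initial values. And the batch-norm layers inside the downsample branches have no "bn" in their names, so they stay frozen under either filter.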

Then, send the model to the GPU.

model_resnet18.cuda()

That’s it! Now it is time to train again.

optimizer = optim.Adam(model_resnet18.parameters(), lr=0.0001)
loss_func = nn.BCELoss().cuda()
EPOCHS = 5

for epoch in range(EPOCHS):
    print(f"epoch: {epoch}")
    for i, data in enumerate(train_loader):
        if i % 50 == 0:
            print(f"  batch: {i}")  # progress report every 50 batches
        X, y = data
        y = y.type(torch.FloatTensor).view(len(y), -1).cuda()  # labels as a (batch, 1) float tensor
        model_resnet18.zero_grad()
        output = model_resnet18(X.view(-1, 3, 224, 224).cuda())
        loss_val = loss_func(output, y)
        loss_val.backward()
        optimizer.step()
    print(f"loss: {loss_val}")  # loss of the last batch in the epoch

The loss ranges between 0.2 and 0.4, which is noticeably lower than the 0.4 to 0.6 of the CNN model.

7. Test the Pre-Trained Model

correct = 0
total = 0

with torch.no_grad():
    for data in val_loader:
        X, y = data
        output = model_resnet18(X.view(-1, 3, 224, 224).cuda())
        # round the sigmoid outputs to 0 or 1 and compare them with the labels
        correct_sum = output.round().transpose(0, 1).cpu() == y
        correct += correct_sum.sum().item()
        total += len(y)

print(f"Accuracy: {round(correct/total, 3)}")

Testing the model gives an approximate accuracy of 97.3%, which is significantly better than the CNN model.

To conclude, the results of this test are clear: using a pre-trained model is beneficial for many image recognition tasks, for several reasons. First, a pre-trained model requires less training and less effort in building the architecture, because the definition of the model is given for “free.” Second, it is more accurate. Here, the pre-trained model reached about 97.3% accuracy, compared with about 73.6% for the custom-built convolutional neural network (CNN). Thus, it makes sense to start with a pre-trained model when doing image recognition tasks, as it is usually the best course of action.


Written by Ali Fakhry