Creating a Powerful COVID-19 Mask Detection Tool with PyTorch

A Case Study on Quickly Building Small, Accurate Image Classification Models

David Yastremsky
Towards Data Science

--

Photo by Matthew Waring on Unsplash

Over the last year, COVID-19 has taken a social, economic, and human toll on the world. AI tools can identify proper mask-wearing to help reopen the world and prevent future pandemics.

Of course, this has deeper social implications, such as the tradeoff of privacy versus security. These are discussed in our accompanying arXiv research paper, linked at the end of this post.

In this post, I will walk you through the project that we created.

After reading this post, you will have the knowledge to create a similar pipeline not only for mask detection but for any image recognition task.

Feature Engineering

Deep Learning COVID-19 Mask Dataset
Our Dataset

Using the MaskedFace-Net and Flickr Faces datasets, we resized the images to 128x128 pixels to save space. We split the images into three folders (train, val, and test), each with three subdirectories (correct/incorrect/no_mask). This structure allows them to be loaded easily with PyTorch’s torchvision.

import torch
import torchvision

batch = 16

# Each split loads from its own folder; subdirectory names become class labels
train_dataset = torchvision.datasets.ImageFolder('./Data_Sample/train', transform=train_transform)
val_dataset = torchvision.datasets.ImageFolder('./Data_Sample/val', transform=test_transform)
test_dataset = torchvision.datasets.ImageFolder('./Data_Sample/test', transform=test_transform)

train_dataset_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch, shuffle=True)
val_dataset_loader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch, shuffle=True)
test_dataset_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch, shuffle=False)

If you look at the code snippet, you’ll see that we used torchvision’s built-in arguments to apply transformations to our images.

For all data, we converted the images to tensors and normalized them with standard ImageNet values. For training and validation data, we applied additional transformations, such as randomly recoloring, resizing, and blurring the images. These are applied anew to each batch, meaning the model is trying to learn a moving target: training images with different noise applied each time.

import torchvision.transforms as transforms

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
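For reference, a train_transform in that spirit might look like the following. The specific augmentations and parameter values here are illustrative, not necessarily the ones we used:

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25),  # random recoloring
    transforms.RandomResizedCrop(128, scale=(0.8, 1.0)),  # random resizing/cropping
    transforms.GaussianBlur(kernel_size=3),  # random blurring
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])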

Transforms regularize your model to make it generalize better. As a result, it will perform better on real-world data.

Baseline Models

With a goal like real-time image classification, accuracy and speed are both key. These often have a tradeoff, as larger models can have higher accuracy. To get a baseline, we started by applying faster machine learning models.

Boy, were we in for a surprise! We realized that our dataset’s synthetically generated masks made prediction somewhat trivial. Under time pressure, and without an alternative dataset anywhere near the 180k images we were using, we pressed on to see how much further we could improve performance.

Baseline Model Accuracies

For these simpler models, we trained scikit-learn’s random forest, OpenCV’s (cv2) Haar cascade classifier, and a custom CNN using LeakyReLU activations and the AdamW optimizer.
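
As an illustration, the random forest baseline can be set up roughly like this. This is a minimal sketch using the datasets loaded earlier; our actual feature pipeline may have differed:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Flatten each (3, 128, 128) image tensor into a plain feature vector
X_train = np.stack([img.numpy().ravel() for img, _ in train_dataset])
y_train = np.array([label for _, label in train_dataset])

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

X_val = np.stack([img.numpy().ravel() for img, _ in val_dataset])
y_val = np.array([label for _, label in val_dataset])
print("Validation accuracy:", rf.score(X_val, y_val))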

Advanced Models

We next moved on to state-of-the-art models. Top researchers train and fine-tune these models using millions of dollars of compute, so for general problems, they tend to work best.

We could then train them on our data to see how they perform on our mask classification task. This transfer learning is much faster than training from scratch, as the pretrained weights are already close to optimal; our training would only fine-tune them.
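For example, setting up a torchvision model for transfer learning looks roughly like this (the choice of MobileNetV2 here is illustrative, not necessarily one of the models we used):

import torch.nn as nn
import torchvision.models as models

# Load ImageNet-pretrained weights, then replace the classification
# head so it predicts our three classes (correct / incorrect / no mask)
model = models.mobilenet_v2(pretrained=True)
model.classifier[1] = nn.Linear(model.last_channel, 3)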

The below function was used with different torchvision models to see how well they perform.

import copy
import time

import torch

# Takes a model and applies transfer learning
# Returns the model with the weights from its best validation epoch loaded
def train_model(model, dataloaders, optimizer, criterion, scheduler, device, num_epochs=20):
    start_time = time.time()
    # Number of samples per split, for averaging loss and accuracy
    dataloader_sizes = {split: len(loader.dataset) for split, loader in dataloaders.items()}
    top_model = copy.deepcopy(model.state_dict())
    top_acc = 0.0
    for epoch in range(num_epochs):
        for data_split in ['Train', 'Validation']:
            if data_split == 'Train':
                model.train()
            else:
                model.eval()
            # Track running loss and correct predictions
            running_loss = 0.0
            running_correct = 0
            for inputs, labels in dataloaders[data_split]:
                # Move the batch to the device
                inputs = inputs.to(device)
                labels = labels.to(device)
                # Zero out the gradient
                optimizer.zero_grad()
                # Forward pass; the gradient is only turned on for training
                with torch.set_grad_enabled(data_split == 'Train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    if data_split == 'Train':
                        loss.backward()
                        optimizer.step()
                # Update running loss and correct predictions
                running_loss += loss.item() * inputs.size(0)
                running_correct += torch.sum(labels.data == preds)
            # Step the learning-rate schedule once per epoch, after optimizing
            if data_split == 'Train':
                scheduler.step()
            epoch_loss = running_loss / dataloader_sizes[data_split]
            epoch_acc = running_correct.double() / dataloader_sizes[data_split]
            print('{} Loss: {:.2f}, Accuracy: {:.2f}'.format(data_split, epoch_loss, epoch_acc))
            # If this is the top model so far, deep copy it
            if data_split == 'Validation' and epoch_acc > top_acc:
                top_acc = epoch_acc
                top_model = copy.deepcopy(model.state_dict())
    print('Training took {:.0f}s'.format(time.time() - start_time))
    print('Highest validation accuracy: {:.2f}'.format(top_acc))
    # Load the best model's weights
    model.load_state_dict(top_model)
    return model
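
A call to this function might look like the following, using the data loaders from earlier (the hyperparameters here are illustrative):

import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
dataloaders = {'Train': train_dataset_loader, 'Validation': val_dataset_loader}

model = train_model(model, dataloaders, optimizer, criterion, scheduler, device)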

Since our focus was speed, we chose four of the smaller yet performant neural networks and achieved the below results:

Transfer Learning Model Accuracies

Massive improvement! But could we do better?

Distillation

This was the golden step.

Photo by Jan Ranft on Unsplash

Distillation is a bleeding-edge technique that trains smaller models to make faster predictions by distilling the knowledge of a larger network into them. It was perfect for our use case.

In distillation, you train a student model from a teacher model. Instead of training the student only on your data’s labels, you also train it on the teacher’s predictions. As a result, a smaller network can replicate the larger one’s results.
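Conceptually, the student’s loss blends the usual hard-label loss with a term that matches the teacher’s softened output distribution. A minimal sketch of this vanilla distillation loss, following Hinton et al. (the temperature T and weight alpha here are illustrative):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    # Hard targets: still learn from the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard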

Distillation can be challenging and resource-intensive to implement. Luckily, KD_Lib for PyTorch makes implementations from the research literature accessible as a library. The below code snippet was used for vanilla distillation.

import torch
import torch.optim as optim
from torchvision import datasets, transforms
from KD_Lib.KD import VanillaKD
# Define models
teacher_model = resnet
student_model = inception
# Define optimizers
teacher_optimizer = optim.SGD(teacher_model.parameters(), 0.01)
student_optimizer = optim.SGD(student_model.parameters(), 0.01)
# Perform distillation
distiller = VanillaKD(teacher_model, student_model, train_dataset_loader,
                      val_dataset_loader, teacher_optimizer, student_optimizer,
                      device='cuda')
distiller.train_teacher(epochs=5, plot_losses=True, save_model=True)
distiller.train_student(epochs=5, plot_losses=True, save_model=True)
distiller.evaluate(teacher=False)
distiller.get_parameters()

Using vanilla distillation with our DenseNet model as the teacher, our baseline CNN reached 99.85% accuracy. That CNN was faster than every state-of-the-art model we tested, at 775 inferences/second on an NVIDIA V100, and it did so with fewer than 15% as many parameters.
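As a side note, throughput numbers like that can be measured with a simple timing loop. A minimal sketch (the batch size, run count, and warm-up here are illustrative, and results depend heavily on hardware):

import time

import torch

def measure_throughput(model, device, input_shape=(1, 3, 128, 128), n_runs=500):
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(10):  # warm up before timing
            model(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # wait for queued GPU work
        start = time.time()
        for _ in range(n_runs):
            model(x)
        if device.type == 'cuda':
            torch.cuda.synchronize()
    return n_runs / (time.time() - start)  # inferences per second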

Even more impressively, running distillation again continued to improve accuracy. For example, results improved further with a combination of NoisyTeacher, SelfTraining, and MessyCollab, all available in KD_Lib.

Side note: self-training uses your model as both the teacher and the student. How cool!

Closing Thoughts

Creating a classifier for proper mask usage felt both timely and urgent.

The lessons from building out this pipeline apply to any image classification task. Hopefully, they help you understand and solve whatever problem you take on next. Good luck!

Associated Paper: https://arxiv.org/pdf/2105.01816.pdf

Associated Demo: https://www.youtube.com/watch?v=iyf4uRWgkaI

Mask Classifier (Separate YOLOv5 Pipeline, In Paper)
