Which Mask Are You Wearing? Face Mask Type Detection with TensorFlow and Raspberry Pi

Yifan Wang
Towards Data Science
9 min read · May 29, 2020


Real-time face mask detection using a Raspberry Pi 4, shown in the bottom right corner. (Image by author)

How I built a real-time face mask type detector with TensorFlow and Raspberry Pi to tell whether a person is wearing a face mask and what type of mask they are wearing

The code and the Google Colab notebook that I used are available on GitHub.

Motivation

Mandatory face mask rules are becoming more common in public settings around the world [1]. There is growing scientific evidence supporting the effectiveness of face masks in reducing the spread of the virus [2,3]. However, we have also seen backlash against face masks, which puts the people enforcing the rules in danger. In some parts of the United States, those people are often store employees, as vulnerable as everyone else. So can we put AI to good use?

This motivated me to develop a deep learning model that detects whether a person is wearing a face mask and what type of face mask it is. The type of face mask relates to its effectiveness [4]. Potentially, the model could be deployed at local supermarkets or school buildings to control an automatic door that only opens to people wearing face masks.

Having just completed the deep learning specialization on Coursera, I thought this was a perfect opportunity to gain hands-on experience with deep learning projects, and face mask detection is a highly relevant topic. This post covers the development pipeline from data collection, to model training (using the TensorFlow Object Detection API and Google Colab), to deployment on a mobile device (a Raspberry Pi 4). This is not a step-by-step tutorial, but I will point you to helpful resources and offer some tips for designing your own projects.

Data Collection

This is an important but often overlooked step. As the saying goes, "garbage in, garbage out": the data used for training should have a similar distribution to the target, i.e., real-life human faces with and without masks captured by webcams. Images from a video feed are subject to the quality of the camera and varying lighting conditions. Therefore, our dataset should be diverse not only in terms of faces (different genders, age groups, races, with and without glasses) but also in terms of image background.

I decided to train the model with 4 classes (types of face masks): homemade cloth coverings, surgical masks, N95 masks, and no mask. The corresponding labels are:

  • homemade
  • surgical
  • n95
  • bare

The images were collected from Google Images and Unsplash. I found the Chrome plug-in Fatkun, which batch-downloads images, very helpful. However, the images should be carefully selected to ensure the quality and diversity of the dataset. There is no good answer to how many images are enough. For a small model, I aimed for ~200 images per class, and they should be representative of the possible subclasses the model will encounter. Examples are shown below:

  • homemade
Homemade masks, including cloth masks or coverings. Note that some masks cover the entire neck. (Images from Unsplash. From left to right by Sharon McCutcheon, source; LOGAN WEAVER, source; Zach Vessels, source)
  • surgical
Surgical masks of different colors. (Images from Unsplash. From left to right by H Shaw, source; engin akyurt, source; Julian Wan, source)
  • n95
N95 masks of different shapes and colors. Both portrait and side shots are included. (Images from Unsplash. From left to right by Jandro Saayman, source; Hiep Duong, source; Ashkan Forouzani, source)
  • bare
Bare faces of different genders, age groups, races, and backgrounds. (Images from Unsplash. From left to right by David Todd McCarty, source; Joseph Gonzalez, source; Tim Mossholder, source)

I collected 247, 197, 184, and 255 images for the four classes, which took ~5 hours in total. Unfortunately, images of people wearing surgical and N95 masks were hard to find, especially in March (the early stage of the pandemic), when I was searching for them. Most of those images came from East Asian countries or were stock photos of health workers.

Model Training

I used my Windows 10 laptop for data preprocessing, testing, and converting the model to TensorFlow Lite. For model training, I used Google Colab with the free GPUs provided by Google.

Training an object detection model using the TensorFlow Object Detection API can be divided into the following major steps:

  1. Set up the TensorFlow environment and workspace

This step can be tricky because the Object Detection API is not yet available for the latest TensorFlow 2.x, so TensorFlow 1.x is required. A GPU build of TensorFlow is available if you want to use the CUDA cores on your local PC for a speedup. I found these two tutorials extremely helpful:

Some searches on Stack Overflow can fix the bugs and help you get TensorFlow running. Working inside a conda environment saves a lot of work managing messy dependencies. Also, don't forget to check configuration compatibility before installing a specific version of Python, TensorFlow, or CUDA.

Compatible configurations for TensorFlow GPU, Python and CUDA (Image from TensorFlow)

The workspace on both my laptop and Google Drive has the following directory structure:

TensorFlow
├─ models
│  ├─ official
│  ├─ research
│  ├─ samples
│  └─ tutorials
└─ workspace
   ├─ preprocessing
   ├─ annotations
   ├─ images
   │  ├─ test
   │  └─ train
   ├─ pre-trained-model
   ├─ training
   └─ trained-inference-graphs
      ├─ output_inference_graph_mobile.pb
      └─ TFLite_model

2. Preprocess the images for training

First, the images need to be annotated with labels. As you may notice, some of the images above have more than one person wearing a face mask, and some have complex backgrounds. Putting anchor boxes around the face masks helps the model learn faster by focusing on the local regions inside the boxes and picking up specific features. The process is rather tedious: it took me ~4 hours to label ~800 images using a Python package called LabelImg. Essentially, I drew a box around each person's mask (or the area below the eyes and above the neck for a bare face) and associated each box with a label.
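LabelImg saves each image's boxes as a Pascal VOC XML file, and a common preprocessing step is flattening those files into one CSV row per bounding box. A minimal, dependency-free sketch (paths and column names are illustrative, not from my actual scripts):

```python
# Convert LabelImg's Pascal VOC XML annotations into one CSV row per box.
import csv
import glob
import os
import xml.etree.ElementTree as ET

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]

def xml_to_rows(xml_path):
    """Yield one [filename, width, height, label, xmin, ymin, xmax, ymax] row per object."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        yield [
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ]

def convert_folder(xml_dir, csv_path):
    """Collect all XML annotations in a folder into a single CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for xml_path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            writer.writerows(xml_to_rows(xml_path))
```

The resulting CSV is a convenient intermediate for generating the TFRecord files described below.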

Next, the dataset was split into training and testing sets at a ratio of 9 to 1. I did not set up a validation set since this is only a prototype model, but it is always good practice to do so in machine learning.
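The 9:1 split can be scripted so it is reproducible; here is a small sketch that shuffles the images with a fixed seed and copies each image together with its matching LabelImg `.xml` file (directory names and the seed are placeholders):

```python
# Reproducible train/test split for (image, annotation) pairs.
import os
import random
import shutil

def split_dataset(image_dir, train_dir, test_dir, test_ratio=0.1, seed=42):
    """Shuffle images deterministically and copy each image + its .xml annotation."""
    images = sorted(f for f in os.listdir(image_dir)
                    if f.lower().endswith((".jpg", ".jpeg", ".png")))
    random.Random(seed).shuffle(images)
    n_test = int(len(images) * test_ratio)
    for i, name in enumerate(images):
        dest = test_dir if i < n_test else train_dir
        os.makedirs(dest, exist_ok=True)
        shutil.copy(os.path.join(image_dir, name), dest)
        xml = os.path.join(image_dir, os.path.splitext(name)[0] + ".xml")
        if os.path.exists(xml):  # keep the annotation with its image
            shutil.copy(xml, dest)
```

Keeping the image and its annotation in the same split avoids orphaned XML files when the TFRecords are generated later.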

3. Create Label Map and TensorFlow Records

TensorFlow reads in a label map saved as label_map.pbtxt, which maps each class label to an integer id. In this case, the file looks like:

item {
  id: 1
  name: 'homemade'
}
item {
  id: 2
  name: 'surgical'
}
item {
  id: 3
  name: 'n95'
}
item {
  id: 4
  name: 'bare'
}
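Rather than typing the label map by hand, you can render it from the class list; a minimal sketch (the generator function is illustrative, only the file name and labels come from this project):

```python
# Generate a TensorFlow label map (label_map.pbtxt) from a class list.
LABELS = ["homemade", "surgical", "n95", "bare"]

def make_label_map(labels):
    """Render label-map text; ids are 1-based, as the Object Detection API expects."""
    items = ["item {\n  id: %d\n  name: '%s'\n}" % (i, name)
             for i, name in enumerate(labels, start=1)]
    return "\n".join(items) + "\n"

with open("label_map.pbtxt", "w") as f:
    f.write(make_label_map(LABELS))
```

This keeps the label map in sync with the class list if you later add or rename a class.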

The annotations then need to be converted into TFRecord files, test.record and train.record.

4. Choose a model and start training

Deep learning models have millions of parameters, and training them from scratch requires far more data than we have here. A useful technique is transfer learning: take a model pre-trained on another image dataset and reuse it for a new task (an example is shown here). Most of the parameters stay fixed while some in the top layers are fine-tuned for the new task.

Since I want to run the model on a mobile device with limited computational power, speed is the priority. A set of pre-trained models can be found in the table of TensorFlow's detection model zoo. I chose a lightweight model, ssd_mobilenet_v2_coco [5], to balance the trade-off between speed and accuracy.

I tried training the model both on my laptop and on Google Colab. I would definitely recommend Colab if you don't have a fast GPU. Training this small model for ~20K steps took ~10 hours on my laptop with a GeForce MX250 GPU and ~2 hours on Colab. That is a 5× speedup, and in the meantime I could use my laptop for other things. The Colab notebook used for training can be found in this GitHub repository.

A cool thing about TensorFlow is that you can monitor the metrics (loss and accuracy) during training using TensorBoard. Seeing the loss gradually decrease and plateau, I decided not to wait any longer and stopped the training after 20K steps. Another cool feature is that you can always resume training later from where you left off.

TensorBoard example. (Image by author)

5. Export the model and test it

The model was exported as a frozen inference graph. It can be tested by running model_test_webcam.py.
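The exported detector returns normalized boxes, confidence scores, and class ids for each frame; a script like model_test_webcam.py typically filters these by a score threshold and scales the boxes to pixels before drawing. A dependency-free sketch of that post-processing step (the function name and the 0.5 threshold are my own choices, not from the post):

```python
# Filter raw detector outputs by confidence and convert boxes to pixel coords.
CATEGORY_INDEX = {1: "homemade", 2: "surgical", 3: "n95", 4: "bare"}

def filter_detections(boxes, scores, classes, frame_w, frame_h, min_score=0.5):
    """Keep confident detections; boxes arrive as normalized (ymin, xmin, ymax, xmax)."""
    kept = []
    for box, score, cls in zip(boxes, scores, classes):
        if score < min_score:
            continue  # drop low-confidence detections before drawing
        ymin, xmin, ymax, xmax = box
        kept.append({
            "label": CATEGORY_INDEX[int(cls)],
            "score": float(score),
            "box_px": (int(xmin * frame_w), int(ymin * frame_h),
                       int(xmax * frame_w), int(ymax * frame_h)),
        })
    return kept
```

The returned pixel boxes can be drawn directly onto the webcam frame, e.g. with OpenCV's rectangle and text functions.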

Initially, I didn't include a class for unmasked (bare) faces, since I assumed the model would simply not draw an anchor box on a bare face, indicating the person was not wearing a mask. The outcome proved me wrong: the model had not learned any features of bare faces and recognized one as a homemade mask. Therefore, I retrained the model with the new class and a bare-face dataset. What I learned from this process is the importance of dataset design: the model can only learn what you ask it to learn. It is also wise to build a model fast initially and then iterate.

Classification outcome of the initial model with no ‘bare’ class. (Image by author)

I validated the new model on more images from the Internet (not in the training/testing set) and on a friend. Now it can recognize bare faces!

Images for model validation from Unsplash. (In the top row, from left to right, images by Natalia Barros, source; Press Features, source; Jackson Simmer, source. In the bottom row, from left to right, images by H Shaw, source; Nathan Dumlao, source; Brian Wangenheim, source)
Testing with a friend in low light. All masks were labelled correctly! (Image by author)

6. Export the model to Raspberry Pi

Raspberry Pi 4 and a camera module (Images by Raspberry Pi, source: pi 4, camera module v2)

Raspberry Pi is a credit-card-sized mini computer, and it is very affordable (the 4th generation costs ~$35). With TensorFlow support, it is a perfect mobile device to deploy the model on. Aside from the Raspberry Pi itself, a camera module (~$30) needs to be purchased separately.

For computational speed and privacy reasons, TensorFlow Lite models are required for mobile devices like the Raspberry Pi. This tutorial covers how to convert a trained TensorFlow inference graph to a Lite model on a Windows PC and run it on a Raspberry Pi:

Finally, time to play with the model!

The final outcome is shown in the GIF at the beginning and in the pictures below. Overall model performance seems great. However, ~2.5 frames per second (FPS) is not too bad but not fast either, even though I chose the light SSD MobileNet. Further speedup could be achieved with an SSDLite or quantized model.

Test images from a Raspberry Pi camera. (Image by author)
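For measuring the frame rate on the Pi, a moving average over recent frame times gives a steadier number than timing a single frame; a small helper sketch (the class and its names are illustrative, not from my scripts):

```python
# Smooth FPS estimate from a sliding window of frame timestamps.
import collections
import time

class FPSMeter:
    """Report average FPS over the last `window` frames."""
    def __init__(self, window=30):
        self.stamps = collections.deque(maxlen=window)

    def tick(self, now=None):
        """Record a frame timestamp (injectable `now` for testing)."""
        self.stamps.append(time.monotonic() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

Calling `tick()` once per loop iteration and overlaying `fps()` on the video feed makes speedups from SSDLite or quantization easy to compare.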

Conclusion and takeaways

After building an object detection model myself, I find that structuring a deep learning project involves much more than parameter tuning and neural network architecture design.

The most tedious and time-consuming part was collecting and preprocessing the data. Setting up the TensorFlow environment can also be tricky, which hopefully will improve as TensorFlow's tooling matures. The main takeaways are:

  • Real-life data are complex; we should select representative images for each class
  • Transfer learning is useful when we have a small dataset
  • Stick to the guideline of fast prototyping and refining the model iteratively
  • And finally, without doubt: wear a mask in public places, no matter which kind

Thanks for reading and stay healthy!

Reference

[1] Chappell, B. (2020, May 28). 'No Mask — No Entry,' Cuomo Says As He Allows Businesses To Insist On Face Coverings. NPR

[2] Prather, K. A., et al. (2020, May 27). Reducing transmission of SARS-CoV-2. Science

[3] Howard, J.; Huang, A.; Li, Z.; Tufekci, Z.; Zdimal, V.; van der Westhuizen, H.; von Delft, A.; Price, A.; Fridman, L.; Tang, L.; Tang, V.; Watson, G.L.; Bax, C.E.; Shaikh, R.; Questier, F.; Hernandez, D.; Chu, L.F.; Ramirez, C.M.; Rimoin, A.W. Face Masks Against COVID-19: An Evidence Review. Preprints 2020, 2020040203 (doi: 10.20944/preprints202004.0203.v1).

[4] Parker-Pope, T. (2020, April 17). Coronavirus: Which Mask Should You Wear? The New York Times

[5] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. arXiv preprint arXiv:1512.02325


PhD student in computational chemical engineering. Engineer, coder, runner