Object Detection: Stopping Karens Before They Can Strike With Keras and OpenCV

Walkthrough on how to build your own facial detection system that can determine whether someone is wearing a mask and, if they are not, whether they are happy, neutral, or angry.

Samuel Mohebban
Towards Data Science


Model Demonstration (Photo by Author)

In the age of COVID-19, we have seen that wearing masks can substantially decrease the spread of the virus. However, as we sometimes see online, many people strongly disagree with this statement, and videos show them becoming very disgruntled when asked to follow this protocol. In this project, I will show you how to create a neural network that can detect whether someone is wearing a mask and, if they are not, detect the facial expression on their face. At scale, this could be applied to local businesses so that managers can tell whether someone is wearing a mask, and whether they are indeed a Karen ready to strike.

Photo From Giphy
Model Demonstration (Photo by Author)

**The code for this tutorial can be found on my GitHub**

Requirements

  • Keras
  • OpenCV
  • NumPy
  • Matplotlib
  • tqdm
  • Sklearn

Data

  1. Face Mask: ~12K Images Dataset (Kaggle)
  2. Emotion: ~30K Images Dataset (Kaggle)

Mask Detection

The first step in this project is to build a neural network that can detect whether a person is wearing a mask. For this portion, we will use MobileNet.

Before we build the model, we need to extract each image and preprocess it so that it can be fed into MobileNet. In the dataset above, there are three directories at the outermost layer: (1) Train, (2) Test, and (3) Validation. Below is the code for how to extract each image, preprocess it for MobileNet, and save the results to NumPy arrays.
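The original gist isn't embedded in this export, so here is a minimal sketch of what that extraction might look like. The WithMask/WithoutMask folder names match the Kaggle dataset, but the dataset path and the exact structure of the helper are my own assumptions:

```python
import os

import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet import preprocess_input
from tqdm import tqdm


def get_image_value(path, dim, model_type):
    """Read an image from disk, resize it, and preprocess it for the given model type."""
    img = cv2.imread(path)
    img = cv2.resize(img, dim)
    if model_type == 'mobilenet':
        # MobileNet expects pixel values scaled to the [-1, 1] range
        return preprocess_input(img)
    # The emotion model just needs pixels scaled to [0, 1]
    return img / 255.0


def get_mask_arrays(split_dir, dim=(224, 224)):
    """Build image and label arrays for one split (Train, Test, or Validation)."""
    images, labels = [], []
    # Label encoding assumption: 0 = no mask, 1 = mask
    for label, folder in enumerate(['WithoutMask', 'WithMask']):
        folder_path = os.path.join(split_dir, folder)
        for fname in tqdm(os.listdir(folder_path), desc=folder):
            images.append(get_image_value(os.path.join(folder_path, fname), dim, 'mobilenet'))
            labels.append(label)
    return np.array(images), np.array(labels)


# 'Face Mask Dataset' is a hypothetical path; point it at your Kaggle download
x_train, y_train = get_mask_arrays('Face Mask Dataset/Train')
x_test, y_test = get_mask_arrays('Face Mask Dataset/Test')
x_val, y_val = get_mask_arrays('Face Mask Dataset/Validation')
```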

After running the cell above, you should see a window like this.

Train Test Split (Photo by Author)

In the later sections, we will reuse the function get_image_value, which is why we pass a parameter stating which model type it should retrieve the image for.

Now that we have the image values, it is time to build the neural network that will be used to detect the mask. Make sure you have a folder called ModelWeights inside the current directory. In this folder, we will save the weights for each model that we train.
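The training cell itself isn't shown in this export, so below is a minimal sketch of how such a model could be built and trained, assuming the arrays from the extraction step above. The classification head, augmentation settings, and batch size are assumptions rather than the author's exact configuration:

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# MobileNet base pretrained on ImageNet, with a small binary head on top
base = MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(2, activation='softmax')(x)  # 0 = no mask, 1 = mask

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Light augmentation of the training images (the emotion model later skips this)
aug = ImageDataGenerator(rotation_range=15, zoom_range=0.1, horizontal_flip=True)

callbacks = [
    # Stop training once the validation loss fails to improve for 5 straight epochs
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # Save the best weights into the ModelWeights folder
    ModelCheckpoint('ModelWeights/Mobilenet_Masks.h5', monitor='val_loss',
                    save_best_only=True),
]

model.fit(aug.flow(x_train, y_train, batch_size=32),
          validation_data=(x_val, y_val),
          epochs=2000, callbacks=callbacks)
```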

Above, we specified that the model should train for 2000 epochs. Because the model almost certainly does not need 2000 epochs, we included an early stopping callback that halts training, and thus prevents overfitting, once the loss fails to improve for 5 consecutive epochs. After running the cell above, you should see a window like this:

After the training is complete, you should see a .h5 file within the ModelWeights folder named Mobilenet_Masks.h5. This file stores the model's weights and will be used later on when we apply the neural network to live video.

Above, we see the ROC score, confusion matrix, and loss/accuracy for the MobileNet training. Considering the amount of data we used, these metrics are pretty good. For the validation set, only 13 images were incorrectly classified. As for the loss and accuracy, the loss dropped below .1 and the accuracy was well above 94%. Finally, the ROC score shows great success, as each class had a perfect score of 1.0, while the F1 scores for each class were greater than .98.
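If you want to reproduce these metrics yourself, a quick scikit-learn sketch along the following lines would do it (the variable names follow the assumed extraction and training code above):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Predicted probabilities and hard labels on the validation set
probs = model.predict(x_val)
preds = np.argmax(probs, axis=1)

print(confusion_matrix(y_val, preds))       # misclassification counts
print(classification_report(y_val, preds))  # per-class precision/recall/F1
print(roc_auc_score(y_val, probs[:, 1]))    # ROC AUC for the mask class
```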

Emotion Detection

As with the last model, we must first extract the image values and place them into a NumPy array. As I mentioned earlier, we will reuse the get_image_value function within a new function designed to extract only the emotion images. The dataset contains 7 classes: angry, happy, neutral, sad, disgust, fear, and surprise. For this project, we will only focus on the first 3 classes: angry, happy, and neutral. Also, the model we will train takes an input image of size (48,48,3), which is much smaller than MobileNet's input dimensions of (224,224,3).
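The extraction cell isn't embedded here either, so here is a minimal sketch of what it might look like, continuing from the get_image_value helper above. The lowercase class folder names and the 'Emotion Dataset/train' path are my assumptions:

```python
import os

import numpy as np
from sklearn.model_selection import train_test_split

# Assumed folder names for the three classes we keep from the Kaggle dataset
EMOTIONS = ['angry', 'happy', 'neutral']


def get_emotion_arrays(base_dir, dim=(48, 48), cap=4000):
    """Reuse get_image_value to collect up to `cap` images per emotion class."""
    images, labels = [], []
    for label, emotion in enumerate(EMOTIONS):
        folder = os.path.join(base_dir, emotion)
        for fname in os.listdir(folder)[:cap]:  # limit each class to 4000 images
            images.append(get_image_value(os.path.join(folder, fname), dim, 'emotion'))
            labels.append(label)
    return np.array(images), np.array(labels)


x, y = get_emotion_arrays('Emotion Dataset/train')
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                    stratify=y, random_state=42)
```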

Emotion Train Test Split (Photo by Author)

As you can see above, we limited each class to a maximum of 4000 images. This was done so that training would be faster and so that we could properly track performance with balanced classes.

After running the code above, you should see a window similar to the one before:

Emotion Train Test Split (Photo by Author)

Now that we have the train-test split arrays, we will build the neural network that will detect the emotion on a person’s face. Below is the code for how to build this neural network. Unlike with MobileNet, we will not apply augmentation to the dataset.
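Since the gist isn't reproduced in this export, here is a minimal sketch of what such a network could look like. The exact layer counts and sizes are my assumptions, but the (48,48,3) input, the 3-class softmax output, and the Normal_Emotions.h5 checkpoint follow the text:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

# A small CNN for 48x48 face crops
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(3, activation='softmax'),  # angry, happy, neutral
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ModelCheckpoint('ModelWeights/Normal_Emotions.h5', monitor='val_loss',
                    save_best_only=True),
]

# No augmentation this time: the raw arrays are fed to the model directly
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=2000, batch_size=32, callbacks=callbacks)
```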

Emotion Training (Photo by Author)

After running the code above, you should see a window like this:

Emotion Training (Photo by Author)

Once the training is complete, you should find a .h5 file called Normal_Emotions.h5 which will contain the weights for the model. Like the previous model we trained, this will be used for the live video portion below.

Above, we see the ROC score, confusion matrix, and loss/accuracy for the emotion model. Considering the amount of data we used, these metrics are pretty good, but not as good as MobileNet's. For the training set, only 1,786 images out of the ~12,000 were incorrectly classified. As for the loss and accuracy, the loss dropped below .7 and the accuracy stayed between 70–75%. Finally, the ROC score shows pretty good success, as each class maintained a score greater than .9, while the F1 scores for each class were between .7 and .9.

Deployment

Now that we have each model trained, we will apply them both to live video using OpenCV. For this portion, you must have a webcam set up on your local machine.

Before we get into the code, let me first explain the steps for how this works.

  1. Using a Haar cascade classifier, we will find the coordinates of a face within a video frame (download link)
  2. After finding the coordinates of the face, we will extract that portion of the frame, which is also called our region of interest (ROI).
  3. Because we want to figure out whether a mask is present, we will first resize that ROI to (224,224) so that it can be fed into our MobileNet model.
  4. If the MobileNet model predicts there is a mask, it will continue without using the emotion model.
  5. However, if the MobileNet model predicts there is no mask, it will take the same ROI, resize it to (48,48), and feed it into the emotion model.

Below is the code for how it would work. At first glance, it seems like there are a lot of conditions within the for loop, but just read it slowly and it will make sense.
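The deployment gist isn't embedded in this export, so below is a hedged sketch of the loop described by the five steps above. The cascade file path, window title, and label encoding (class 1 = mask, matching the extraction sketch) are assumptions:

```python
import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet import preprocess_input
from tensorflow.keras.models import load_model

mask_model = load_model('ModelWeights/Mobilenet_Masks.h5')
emotion_model = load_model('ModelWeights/Normal_Emotions.h5')
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
EMOTIONS = ['Angry', 'Happy', 'Neutral']

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Step 1: find face coordinates with the Haar cascade
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]  # step 2: extract the region of interest
        # Step 3: resize the ROI to (224, 224) and run the mask model
        mask_input = preprocess_input(cv2.resize(roi, (224, 224)).astype('float32'))
        mask_pred = mask_model.predict(mask_input[np.newaxis])[0]
        if np.argmax(mask_pred) == 1:  # assumption: class 1 = mask
            label, color = 'Mask', (0, 255, 0)  # step 4: mask found, skip emotion model
        else:
            # Step 5: no mask, so resize the same ROI to (48, 48) for the emotion model
            emotion_input = cv2.resize(roi, (48, 48)) / 255.0
            emotion_pred = emotion_model.predict(emotion_input[np.newaxis])[0]
            label, color = 'No Mask: ' + EMOTIONS[np.argmax(emotion_pred)], (0, 0, 255)
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
    cv2.imshow('Mask + Emotion Detector', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
```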

Once you run the code above, a window should pop up like the one below. To stop the process, press the ‘q’ key on your keyboard.

Model Demonstration (Photo by Author)

Limitations

  • From what I can tell, the model has a hard time determining whether a mask is present when the person in the frame is wearing glasses.
  • More training data containing people wearing both masks and glasses would help address this limitation.

Future Directions

  • I plan to find a camera that can detect heat (if I can find one that is affordable).
  • Combining heat detection with the features above could help keep people out of stores when they are found to be running a fever.

Also, I am currently looking for a job in Data Science, so if you have any advice on that front, please reach out to me via LinkedIn!

**The code for this tutorial can be found on my GitHub**
