
Detecting Mask On/Off Or Incorrectly Worn Using Yolo-v4

Utilizing computer vision to solve current real-world problems.

Photo by Yoav Aziz on Unsplash

The past two years have been surreal, to say the least, for most of us. These days you look around and, regardless of which part of the world you are in, chances are that you will see someone wearing a mask.

In particular, if you are from a city or country with a strict mask mandate, you are likely to see most people wearing masks, a few wearing them incorrectly, and some not wearing them at all. Instead of having a unit of authority police the area, I wanted to see whether a machine vision algorithm could perform that exact same job.

Github Repo: https://github.com/kmt112/probable-lamp

Dataset references:

Original mask dataset: https://www.kaggle.com/andrewmvd/face-mask-detection

Additional mask dataset: https://github.com/cabani/MaskedFace-Net

Problem statement

First, since the machine vision algorithm has to make predictions in a real-time setting, we need one that can cope with a relatively low-FPS environment. Second, the model has to react quickly, since people usually move out of the camera's field of vision fast. Finally, it has to predict all three classes (mask worn, mask off, and mask worn incorrectly) accurately, so it is best to evaluate the model on mAP (mean Average Precision).
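As a quick refresher, mAP averages the per-class average precision (AP), where each AP is the area under that class's precision-recall curve:

$$\text{mAP} = \frac{1}{|C|} \sum_{c \in C} \text{AP}_c, \qquad \text{AP}_c = \int_0^1 p_c(r)\, dr$$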

Yolo-v4

Given the problem statement above, YOLOv4 is the ideal algorithm. YOLOv4 has consistently higher average precision at real-time detection speeds (FPS > 45), and as a one-stage object detector it is also computationally lighter. For more information on YOLOv4 performance, refer to https://blog.roboflow.com/pp-yolo-beats-yolov4-object-detection/.

Dataset

The initial dataset is from Kaggle and consists of 800+ labelled photos, including both group pictures and individual pictures. However, initial exploratory data analysis shows a huge imbalance between the different classes: the mask-worn-incorrectly class is heavily under-represented, so it needs more examples.
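A quick count of the labels makes the imbalance obvious. Below is a minimal sketch (not the original EDA code; the annotations/ folder name is a placeholder):

```python
# Count how often each class appears in the Pascal VOC style XML annotations.
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

counts = Counter()
for xml_file in Path("annotations").glob("*.xml"):
    for obj in ET.parse(xml_file).getroot().findall("object"):
        counts[obj.find("name").text] += 1

print(counts)  # e.g. with_mask, without_mask, mask_weared_incorrect
```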

Figure 1. Class imbalance (Image by Author)

Improving the dataset

A huge class imbalance can cause your model to perform poorly on under-represented classes, which is why it is important to have an equal representation of the different classes during training. Since the YOLOv4 model only accepts photos with the appropriate labelling format, the XML annotations have to be converted to TXT format. In addition, the bounding-box coordinates are normalized by the image dimensions, which makes the labels robust to changes in image resolution.
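As a rough sketch of this conversion (assuming Pascal VOC style XML annotations, with folder and class names as placeholders), each object becomes one line of `class_id x_center y_center width height`, with all coordinates expressed as fractions of the image size:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed label names and order; adjust to match your own class list.
CLASSES = ["with_mask", "without_mask", "mask_weared_incorrect"]

def voc_to_yolo(xml_path: Path, out_dir: Path) -> None:
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO stores the box centre and size normalised by the image dimensions.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    (out_dir / f"{xml_path.stem}.txt").write_text("\n".join(lines))

out_dir = Path("labels")
out_dir.mkdir(exist_ok=True)
for xml_file in Path("annotations").glob("*.xml"):
    voc_to_yolo(xml_file, out_dir)
```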

Adding new data. It is easy to tell apart people who wear a mask from those who don't, but it is much more difficult to determine whether a mask is worn incorrectly. Therefore, it was especially important to include more data on masks worn incorrectly. While many suitable datasets can be found on the internet, I used the MaskedFace-Net dataset [1, 2].

Figure 2. Sample data from MaskedFace-Net dataset (left), Sample of masked and mask not worn properly Image from kaggle (right)

Labelling new data. Unfortunately, there is no way to automate this process; since I wanted the data to be clean, I had to label it myself. The bounding boxes and classes were manually labelled using this Python program.
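For readers who want to do something similar, here is a minimal sketch (not the program I used) that relies on OpenCV's built-in ROI selector: draw a box, confirm with ENTER, type the class id, and a YOLO-format line is appended to the label file. File names and class ids below are placeholders.

```python
import cv2

image_path = "example.jpg"  # placeholder image
img = cv2.imread(image_path)
h, w = img.shape[:2]

# Drag a rectangle around the face, then press ENTER/SPACE to confirm.
x, y, bw, bh = cv2.selectROI("label", img, showCrosshair=True)
cls_id = int(input("class id (0=with_mask, 1=without_mask, 2=incorrect): "))

# Convert the pixel box to the normalised YOLO format and append it.
x_c, y_c = (x + bw / 2) / w, (y + bh / 2) / h
with open("example.txt", "a") as f:
    f.write(f"{cls_id} {x_c:.6f} {y_c:.6f} {bw / w:.6f} {bh / h:.6f}\n")
cv2.destroyAllWindows()
```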


Data modelling

We utilize pretrained weights from the COCO dataset for transfer learning. These are only used in the first iteration; thereafter we use the weights saved at each interval. Since YOLOv4 is written in C and CUDA, hyperparameter tuning has to be done in the configuration file. I have uploaded a configuration file that can be used, and below I briefly explain the changes I made to improve the mAP (a sketch of how to apply them is shown below).

  • width/height: changed the network resolution to 416; increasing the width and height of YOLOv4 improves the resolution.
  • batches: the batch size divided by the subdivisions determines the number of images that are processed in parallel.
  • saturation = 1.5, hue = 1.5: changes the saturation and hue of the training images.
  • mosaic = 1: mosaic data augmentation combines 4 training images into one in certain ratios (instead of two, as in CutMix). This prevents over-reliance on any single key feature.
  • blur = 1: blur is applied randomly 50% of the time.
  • jitter = 0.3: randomly changes the size of the image and its aspect ratio.

Image augmentation creates new training examples out of the existing training data, reducing the need to gather more data. It also makes the model more robust to perturbations in the images.
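Since darknet reads all of these values from the .cfg file, one simple (hypothetical) way to apply the tweaks above is to rewrite the relevant lines with a small script; the file name and values below only mirror the list and are placeholders:

```python
import re

changes = {
    "width": "416", "height": "416",
    "saturation": "1.5", "hue": "1.5",
    "mosaic": "1", "blur": "1", "jitter": "0.3",
}

with open("yolov4-custom.cfg") as f:
    cfg = f.read()

# Replace every "key=value" line for the keys we want to change.
for key, value in changes.items():
    cfg = re.sub(rf"(?m)^{key}\s*=.*$", f"{key}={value}", cfg)

with open("yolov4-custom.cfg", "w") as f:
    f.write(cfg)
```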


Results

To show the difference between an imbalanced dataset and a properly represented one, I trained a model on each. First, the models are compared on mAP.

Mean Average Precision

Figure 3. Train on unbalanced data (left), Train on balanced data (right), Image by Author

There isn't much difference in performance between the two models. In fact, the model trained on the more balanced data performed slightly worse in the initial iterations.

IOU threshold vs mAP

Figure 4. IOU threshold vs mAP, Image by Author

At this point it is worth explaining what IoU (Intersection over Union) and mAP are, and how they relate. Generally speaking, mAP and the IoU threshold are inversely related, because IoU can be thought of as a measure of how tight the bounding box is.

Figure 5. IoU Illustration, Image from https://www.mdpi.com/2073-8994/13/2/262/htm

If you set a high IoU threshold, your model has to produce very accurate bounding boxes for its detections to count. This can be seen in Figure 4: when the IoU threshold was set to 0.9, the mAP decreased significantly.
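IoU itself is just the overlap area of the predicted and ground-truth boxes divided by the area of their union. A small sketch:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection only counts as a true positive when IoU >= threshold,
# so a stricter threshold (e.g. 0.9) lowers the measured mAP.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.14
```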

Visualization of results

Here are some examples of how the model performed before and after the different classes were balanced.

Figure 6. Old model (left) & New model (right), Image by Author
Figure 7. Old model (left) & New model (right), Image by Author

When more mask-worn-incorrectly photos were added to the training data, the model became significantly better at detecting whether a mask was worn properly. This is also evident when comparing the old and new models in Figures 6 and 7.

Quick deployment time

The model also had no difficulty handling a large number of detections in a single frame.

Figure 8. Prediction across many people, Image from Kaggle
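For reference, trained darknet weights can be loaded directly with OpenCV's DNN module; the sketch below (file names assumed, not the exact pipeline used here) runs detection on a single image:

```python
import cv2

CLASSES = ["with_mask", "without_mask", "mask_weared_incorrect"]  # assumed order

net = cv2.dnn.readNetFromDarknet("yolov4-custom.cfg", "yolov4-custom_best.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("crowd.jpg")  # placeholder image
class_ids, confidences, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)

# Draw each detection with its class name and confidence.
for cls_id, conf, (x, y, w, h) in zip(class_ids, confidences, boxes):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, f"{CLASSES[int(cls_id)]} {float(conf):.2f}", (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2.imwrite("output.jpg", frame)
```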

Conclusion

Generally, YOLOv4 performed better than expected: not only were the results great, but the model was also quick to deploy and validate. While my initial plan was to deploy this in real life, my local public transport provider beat me to the chase and had a CV solution deployed at bus and train stops.

Looking forward, one issue I faced with YOLOv4 was my unfamiliarity with the configuration file and with the C and CUDA programming languages. Being more familiar with Python and PyTorch, I think YOLOv5 would allow me to better understand, and therefore tweak, the parameters to improve my model.

That being said, this tutorial drew heavily on the original darknet GitHub repo: https://github.com/pjreddie/darknet.

References

  1. Adnane Cabani, Karim Hammoudi, Halim Benhabiles, and Mahmoud Melkemi, "MaskedFace-Net – A dataset of correctly/incorrectly masked face images in the context of Covid-19", Smart Health, ISSN 2352-6483, Elsevier, 2020, DOI: 10.1016/j.smhl.2020.100144
  2. Karim Hammoudi, Adnane Cabani, Halim Benhabiles, and Mahmoud Melkemi, "Validating the correct wearing of protection mask by taking a selfie: design of a mobile application "CheckYourMask" to limit the spread of COVID-19", CMES-Computer Modeling in Engineering & Sciences, Vol. 124, No. 3, pp. 1049-1059, 2020, DOI: 10.32604/cmes.2020.011663
