Facial attribute detection using deep learning

A quick four-part walkthrough on real-time multi-label facial attribute detection using deep learning (ResNet50 with FastAI and PyTorch) and face detection and localization using Haar cascades (OpenCV).

Aayush Agrawal
Towards Data Science


The final output of the multi-label facial attribute detection project.

In this post, we are trying to achieve the above result. It walks through the end-to-end process of how I went about building this. The entire codebase for replicating the project is in my GitHub repository.

Part 1 — Data Acquisition and Understanding

For any deep learning model to give reasonable accuracy, we need to rely on a large amount of labeled data. Most of the repos on facial feature detection I found focus only on multi-class classification, like emotion detection or smile detection. I was looking for a dataset with multiple labels attached to each facial image, so that I could achieve something similar to what the Google Vision API does, as shown below —

Example of facial detection output from the Google Vision API

For this purpose, I found a dataset on Kaggle called the CelebFaces Attributes (CelebA) Dataset, which contains -

  • 202,599 face images of various celebrities
  • 10,177 unique identities, although the names of the celebrities are not given
  • 40 binary attribute annotations per image
  • 5 landmark locations

It’s a pretty decent-sized dataset for tackling various exciting problems in computer vision. For my purpose, I was only interested in the facial images and the 40 binary attribute annotations of those images. The 40 binary (Yes/No) attributes are listed here — 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair, Blurry, Brown_Hair, Bushy_Eyebrows, Chubby, Double_Chin, Eyeglasses, Goatee, Gray_Hair, Heavy_Makeup, High_Cheekbones, Male, Mouth_Slightly_Open, Mustache, Narrow_Eyes, No_Beard, Oval_Face, Pale_Skin, Pointy_Nose, Receding_Hairline, Rosy_Cheeks, Sideburns, Smiling, Straight_Hair, Wavy_Hair, Wearing_Earrings, Wearing_Hat, Wearing_Lipstick, Wearing_Necklace, Wearing_Necktie, Young. Here is an example -

Example from the CelebA dataset

The above image has these features labeled — Arched_Eyebrows, Attractive, Big_Lips, Heavy_Makeup, Narrow_Eyes, No_Beard, Pointy_Nose, Wearing_Lipstick, Young. Because the Male flag is False, we can infer that the label is Female.

Part 2 — Data Preprocessing

The entire code for data pre-processing is in this notebook.

2.1) On Images -

The key thought I had while processing the CelebA dataset was how I am going to use the model on a real video/webcam stream/image. CelebA data is tightly cropped around the face, but in a video/webcam/image the face can be anywhere, and it has to be detected first. There are many prebuilt tools to localize a face in an image, for example Face Recognition, which uses a deep learning network to detect faces. I wanted to keep this step simple, so I used Haar cascades, a traditional computer vision approach to detecting objects. A Haar cascade returns the bounding box coordinates of wherever a face is detected in an image; here is an example output of using a Haar cascade -

An example of Haar Cascade output.

To learn more about Haar cascades, refer to this blog. OpenCV ships with pre-built Haar cascade filters, and I am using the one for frontal face detection. Once the methodology for face detection was decided, the next step was to apply the same method to the CelebA dataset to detect faces and crop only the facial area of each image (with some added margin). This step helps ensure that

  • We remove any images where a frontal face is not detected by the Haar cascade, for example cases where the person is facing sideways
  • Our training images are in line with the actual usage of the model
Example of Haar cascade processing on CelebA dataset

Notice how, in the above case, the picture on the left is transformed into the picture on the right (which looks more zoomed-in) after the Haar cascade cropping. We also filtered the dataset down from 202,599 to 175,640 images, since the removed images don’t contain frontal faces. An example of a filtered-out image is shown below, followed by a minimal sketch of the cropping step -

Example of a filtered image.
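
Below is a minimal sketch of this detection-and-cropping step, assuming the opencv-python package (which exposes the bundled cascade files via cv2.data.haarcascades); the margin fraction and the rule of taking the first detected face are illustrative assumptions, not the exact values from my notebook.

import cv2

## Load OpenCV's pre-built frontal face Haar cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def crop_face(img_path, margin=0.2):
    """Return the face crop with some added margin, or None if no
    frontal face is detected (so the image gets filtered out)."""
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    mx, my = int(w * margin), int(h * margin)
    return img[max(y - my, 0):y + h + my, max(x - mx, 0):x + w + mx]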

2.2) On Label file -

Apart from pre-processing the images, we need to create a label file which can be used by the FastAI data loader.

Original label file

In the original label file, the multi-attribute labels contain a 1/-1 value for each of the 40 attributes, where 1 signifies that the feature is present and -1 signifies its absence. I just wrote a simple function to convert this file so that we only have one label column with space-separated labels (figure below) -

Modified Label file
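
A minimal sketch of such a conversion with pandas is below; the input file name and the image_id column name are assumptions based on the standard CelebA distribution, and my actual function may differ slightly.

import pandas as pd

## Assumed input: one row per image, an image_id column, and one 1/-1
## column per attribute (the standard CelebA attribute file layout)
df = pd.read_csv('list_attr_celeba.csv')
attr_cols = [c for c in df.columns if c != 'image_id']
## Join the names of all attributes that are present (value == 1) with spaces
df['tags'] = df[attr_cols].apply(
    lambda row: ' '.join(c for c in attr_cols if row[c] == 1), axis=1)
df[['image_id', 'tags']].to_csv('labels.csv', index=False)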

Part 3 — Model Training

Once we have pre-processed our data, the next step is to build a model which can detect these 40 attributes given a facial image. For this, we are going to use the FastAI v1 library, written on top of PyTorch 1.0. The model training notebook can be found on my GitHub here. I have divided the data into training and validation sets based on the recommended partitioning: images numbered 1–182637 for training and 182638 onwards for validation. It’s incredibly easy to train world-class models with a few lines of code in the FastAI library, so let’s go through the code -

Boilerplate library import commands —

import pandas as pd
import numpy as np
from fastai.vision import *
import matplotlib.pyplot as plt

Dataset Loading

path = Path('../data/celeba/faces/')
## Function to filter validation samples
def validation_func(x):
    return 'validation' in x

tfms = get_transforms(do_flip=False, flip_vert=False,
                      max_rotate=30, max_lighting=0.3)
src = (ImageItemList.from_csv(path, csv_name='labels.csv')
       .split_by_valid_func(validation_func)
       .label_from_df(cols='tags', label_delim=' '))

data = (src.transform(tfms, size=128)
        .databunch(bs=256).normalize(imagenet_stats))

Line 1 — Defines the path to the dataset folder.

Line 2–4 — Defines how we are going to identify the training and validation images.

Line 6–7 — Defines the transformations we want to apply to our data, like randomly rotating the image by up to 30 degrees and adjusting the lighting by up to 0.3.

Line 8 — Defines the images as an item list from the labels CSV.

Line 9 — Splits the data into training and validation sets by using the validation function from Line 2–4.

Line 10 — Gets our labels from the tags column of labels.csv and declares it a multi-label column where a space separates the labels.

Line 12 — Passes the transformations from Line 6–7 and resizes the images to 3*128*128.

Line 13 — Defines our batch size of 256 images and normalizes the data using the ImageNet statistics.
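
Note that split_by_valid_func applies validation_func to each item's file path, so the split above only works if the validation images live under a path containing 'validation'. A sketch of encoding the recommended split this way is below; the folder names are my assumption.

from pathlib import Path
import shutil

## CelebA images are named 000001.jpg ... 202599.jpg; images 1-182637 go to
## training and the rest to validation, per the recommended partitioning
faces = Path('../data/celeba/faces')
for img in faces.glob('*.jpg'):
    subset = 'train' if int(img.stem) <= 182637 else 'validation'
    (faces / subset).mkdir(exist_ok=True)
    shutil.move(str(img), str(faces / subset / img.name))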

Notice that we are using a smaller image size for the initial training of our model; later on, we will increase the image size to 3*256*256. This trick (progressive resizing) helps us train the model faster by allowing a bigger batch size, and lets us experiment more quickly to find a model configuration that works.

Model definition —

We are going to do transfer learning by using a pre-trained ResNet 50 model for this modeling exercise.

arch = models.resnet50
acc_02 = partial(accuracy_thresh, thresh=0.2)
acc_03 = partial(accuracy_thresh, thresh=0.3)
acc_04 = partial(accuracy_thresh, thresh=0.4)
acc_05 = partial(accuracy_thresh, thresh=0.5)
f_score = partial(fbeta, thresh=0.2)
learn = create_cnn(data, arch, metrics=[acc_02, acc_03, acc_04, acc_05, f_score])

Line 1 — Downloads a pre-trained ResNet 50 model.

Line 2–6 — In FastAI, we can track as many accuracy measures as we like on the validation data; these metrics are only for monitoring and are not used in training the model. We use partial functions to define accuracy at different thresholds, and we also track the F-score at a threshold of 0.2.

Line 7 — Creates a CNN by taking the pre-trained convolutional part of the ResNet 50 model and adding two new fully connected layers on top.
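
For intuition, thresholded multi-label accuracy can be sketched as follows: binarize the sigmoid outputs at the threshold, then average the per-attribute correctness. This is a simplified version of what fastai's accuracy_thresh computes.

import torch

def accuracy_at_thresh(preds, targets, thresh=0.2):
    ## preds: raw model outputs, targets: 0/1 attribute matrix;
    ## returns the fraction of (image, attribute) pairs classified correctly
    return ((preds.sigmoid() > thresh).float() == targets).float().mean()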

Another good thing about FastAI is that it saves a lot of training time by helping us find a suitable learning rate for the exercise at hand.

learn.lr_find()
learn.recorder.plot()
Learning rate finder from FastAI

Line 1 — Finds a suitable learning rate by trying multiple learning rates on a sample of the data.

Line 2 — Plots the loss at the various learning rates.

We want to choose a learning rate where the loss curve above is declining most steeply. In this case, it’s 1e-2.

lr = 1e-2
learn.fit_one_cycle(4, slice(lr))
Training snapshot

Now that we have trained our final fully connected layers a bit, let us unfreeze all the layers and train the full model. We are going to use the learning rate finder again to determine a suitable learning rate.

learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
Snapshot of learning rate finder in FastAI

Line 1 — Unfreezes all the layers.

Line 2–3 — Help us find a suitable learning rate.

Now we will use a different learning rate for each layer group in the model, exponentially decaying the learning rate as we go back toward the earlier layers.

learn.fit_one_cycle(5, slice(1e-5, lr/5))
learn.save('ff_stage-2-rn50')

Line 1 — Runs one-cycle training with the variable (discriminative) learning rates; slice(1e-5, lr/5) spreads the rates across the layer groups.

Line 2 — Saves our model under the specified name.

Training snapshot

Now we can increase the input image size to 3*256*256 and use transfer learning on the above-trained model to adjust it to the new input size.

data = (src.transform(tfms, size=256)
        .databunch(bs=64).normalize(imagenet_stats))

acc_05 = partial(accuracy_thresh, thresh=0.5)
f_score = partial(fbeta, thresh=0.5)
learn = create_cnn(data, models.resnet50, pretrained=False, metrics=[acc_05, f_score])

learn.load("ff_stage-2-rn50")

Line 1–2 — Creates a new data loader which resizes images to 3*256*256 and reduces the batch size to 64.

Line 4–6 — Define which metrics we want to track and create a ResNet 50 model with the same architecture as the previous one.

Line 8 — Loads the weights from our previously trained model into the newly created model.

Now we can train the model a bit more, following similar steps to those mentioned above. The training notebook also provides code to visualize the activations of intermediate layers, which helps in understanding which parts of the image drive the final result of the model.

The model’s intermediate activation heatmap overlaid on the actual image.

As we can see from the image above, the model is most activated where the face is in the image, which is exactly what we want from a facial feature detection model.
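
A minimal sketch of the idea using a plain PyTorch forward hook is below; the notebook's actual visualization code may differ, and treating learn.model[0] as the convolutional body (the layout fastai v1's create_cnn produces) is an assumption, as is the img_tensor input.

import torch

activations = {}

def save_activation(module, inp, out):
    ## Stash the feature maps produced by the convolutional body
    activations['feat'] = out.detach()

hook = learn.model[0].register_forward_hook(save_activation)
learn.model.eval()
with torch.no_grad():
    learn.model(img_tensor[None])  ## img_tensor: a normalized 3*256*256 image
hook.remove()

## Average over channels for a coarse spatial heatmap to overlay on the image
heatmap = activations['feat'][0].mean(0)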

Part 4 — Combining everything

Now that we have our model trained, the last part is to put it all together: a script which does facial attribute detection end to end. The code for this part is on my GitHub here.

Detection script process flow

The script does the following tasks (a minimal sketch follows the list) -

  1. Uses OpenCV to access the webcam and convert the input video into a series of image frames.
  2. For each frame, runs the Haar cascade model from OpenCV to locate faces and crop them out of the frame.
  3. Passes those cropped-out faces into our trained model to find the relevant facial features.
  4. Displays the bounding box and all the detected features back on the frame while the script is running.
  5. Optionally saves the video stream.
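
Here is a minimal sketch of that loop; predict_attributes is a hypothetical helper wrapping learn.predict on the cropped face, and the drawing details are simplified relative to the actual script.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)                      ## 1. webcam -> image frames
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        crop = frame[y:y + h, x:x + w]         ## 2. locate and crop the face
        tags = predict_attributes(crop)        ## 3. hypothetical model wrapper
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, ', '.join(tags), (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow('Facial attributes', frame)     ## 4. display box + features
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()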

Conclusion

In this blog, we saw how to do an end-to-end facial attribute detection project by combining various techniques, from traditional machine vision to deep learning.

I hope you enjoyed reading, and feel free to use my code on GitHub to try it out for your own purposes. Also, if you have any feedback on the code or just the blog post, feel free to reach out on LinkedIn or email me at aayushmnit@gmail.com. You can also follow me on Medium and GitHub for future blog posts and exploration project code I might write.
