Facial Data-Based Deep Learning: Emotion, Age and Gender Prediction

How can we draw information from facial data using deep learning?

Published in

Towards Data Science

9 min readJul 25, 2020

Deep Learning has found huge applications in the fields of Computer vision. Some of the most important applications of computer vision are in the fields that deal with facial data. Face Detection and recognition are being widely used in security-based applications. If you want to explore these two areas feel free to go through:

Face Detection: In this article, I have talked about an application based on face detection, and
Face Recognition: This article talks about how we can implement a security mechanism using face recognition.

I have tried to give a total explanation of how the mechanisms work in the above articles.

In this article, we are going to talk about two of the other most important applications of face-based deep learning, emotion detection, or facial expression detection, and age and gender prediction from a facial image.

So, let’s jump right in.

Emotion Detection

First, let us talk about Emotion detection or prediction.

For this part, we will be using Kaggle’s CKPlus Dataset.

Data Preprocessing

The dataset has 981 images in total. These images are classified into seven labels based on seven different expressions: Anger, Contempt, Disgust, Fear, Happy, Sadness, and Surprise. Each expression has a folder. So, there are seven folders in which the 981 images are stored.

First, we list the classes or folders in the order using listdir().

import os
files=os.listdir(fldr)
>>
['fear', 'contempt', 'happy', 'anger', 'surprise', 'disgust', 'sadness']Exp=['fear', 'contempt', 'happy', 'anger', 'surprise', 'disgust', 'sadness']

So, we save the expression in the order in a list which we will refer to create the labels.

Next, we move to preprocess the images.

import cv2
from google.colab.patches import cv2_imshow
i=0
last=[]
images=[]
labels=[]
for fle in files:
  idx=Exp.index(fle)
  label=idx
  
  total=fldr+'/'+fle
  files_exp= os.listdir(total)  for fle_2 in files_exp:
    file_main=total+'/'+fle_2
    print(file_main+"   "+str(label))
    image= cv2.imread(file_main)    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image= cv2.resize(image,(48,48))
    images.append(image)
    labels.append(label)
    i+=1
  last.append(i)

The above snippet opens the image reads it using OpenCV, resizes it to (48 x 48) dimension. I have converted it to an RGB image, so, it is having three channels. The size of each image is (48 x 48 x 3). We will append the images in the ‘images’ list, and the corresponding labels in the ‘labels’ list. Let’s visualize a few examples after preprocessing.

import tensorflow as tf
from sklearn.model_selection import train_test_split
import numpy as np
images_f=np.array(images)
labels_f=np.array(labels)images_f_2=images_f/255
labels_encoded=tf.keras.utils.to_categorical(labels_f,num_classes=num_of_classes)
X_train, X_test, Y_train, Y_test= train_test_split(images_f_2, labels_encoded,test_size=0.25)

Using the above snippet, we convert the labels and images to NumPy arrays and normalize the images array by dividing it with 255. I have used one-hot encoding to encode the labels into vectors. We will be using a test split of 25%.

Model

from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten,BatchNormalization
from tensorflow.keras.layers import Dense, MaxPooling2D,Conv2D
from tensorflow.keras.layers import Input,Activation,Add
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adamdef Convolution(input_tensor,filters):
    
    x = Conv2D(filters=filters,kernel_size=(3, 3),padding = 'same',strides=(1, 1),kernel_regularizer=l2(0.001))(input_tensor)
    x = Dropout(0.1)(x)
    x= Activation('relu')(x)    return x
def model(input_shape):
  inputs = Input((input_shape))
  
  conv_1= Convolution(inputs,32)
  maxp_1 = MaxPooling2D(pool_size = (2,2)) (conv_1)
  conv_2 = Convolution(maxp_1,64)
  maxp_2 = MaxPooling2D(pool_size = (2, 2)) (conv_2)
  conv_3 = Convolution(maxp_2,128)
  maxp_3 = MaxPooling2D(pool_size = (2, 2)) (conv_3)
  conv_4 = Convolution(maxp_3,256)
  maxp_4 = MaxPooling2D(pool_size = (2, 2)) (conv_4)
  flatten= Flatten() (maxp_4)
  dense_1= Dense(128,activation='relu')(flatten)
  drop_1=Dropout(0.2)(dense_1)
  output= Dense(7,activation="sigmoid")(drop_1)  model = Model(inputs=[inputs], outputs=[output])  model.compile(loss="categorical_crossentropy", optimizer="Adam",
	metrics=["accuracy"])
  return model

We are going to use the above model to predict the expressions.

History=Model.fit(X_train,Y_train,batch_size=32,validation_data=(X_test,Y_test),epochs=1000,callbacks=[callback_list])

The above snippet will train the model.

The model gives an accuracy of 100% on the test data.

Note: Whenever a model gives a 100% accuracy on test data, we need to check the training accuracy, if that is also 100%. It means the model is actually overfitting and the test set is having a very close distribution to the train set. So, it is showing great results. I think in these circumstances, it’s better to use cross-validation to get the correct intuition of how the model actually works.

Let’s continue with the evaluation.

Evaluation

The two curves show the learning of the model. The first curve shows the loss function decrease and the second shows the accuracy growth with epochs.

Pred=Model.predict(X_test)
Pred
>>
array([[1.68134073e-09, 5.25928086e-11, 5.46700324e-11, ...,         7.71565616e-01, 8.71616357e-05, 4.54742303e-06],        [5.06911943e-11, 5.20724059e-17, 2.85400745e-07, ...,         2.65912314e-12, 7.78120279e-01, 2.07833662e-14],        [5.95332267e-07, 7.41830490e-07, 1.73864496e-08, ...,         4.54492539e-01, 9.06203127e-07, 1.08237209e-05],        ...,        [1.56573861e-07, 3.44979071e-07, 3.86641860e-01, ...,         3.84031367e-08, 4.99448021e-08, 6.93729362e-13],        [1.91495033e-07, 7.53485918e-01, 1.24115175e-07, ...,         2.53645931e-06, 6.98523905e-09, 2.22882386e-06],        [5.07813091e-14, 1.79454021e-12, 1.35435105e-14, ...,         9.94049311e-01, 2.74002265e-09, 1.31444740e-08]], dtype=float32)

We obtain predictions as shown, Let us check the classification report and Confusion Matrix.

Classification report:

i=0 Y_test_l=[] 
Pred_l=[] 
while(i<len(Pred)):   
  Y_test_l.append(int(np.argmax(Y_test[i])))     
  Pred_l.append(int(np.argmax(Pred[i])))   
  i+=1report=classification_report(Y_test_l, Pred_l)

Confusion Matrix:

We have seen the confusion matrix and classification report for our model.

def test_image(ind,images_f,images_f_2,Model):
  cv2_imshow(images_f[ind])
  image_test=images_f_2[ind]
  print("Label actual:  " + Exp[labels[ind]]  )
  pred_1=Model.predict(np.array([image_test]))
  #print(pred_1)
  pred_class=Exp[int(np.argmax(pred_1))]
  print("Predicted Label: "+ pred_class)

The above snippet will let us have a look at some of the images, their true labels, and the predicted labels.

So, we have seen how to predict Emotion using Deep Learning. Let’s check out Age and Gender Prediction.

Age and Gender Prediction

We will use Kaggle’s UTKFace Dataset for predicting age and Gender.

Data Preprocessing

Here I have used the dataset having 9780 files. It has 9780 images of faces belonging to both males and females with ages ranging from 0 to 116. Each image has labels that show the corresponding age and gender. Male is given by 0 and Female is given by 1.

import cv2
ages=[]
genders=[]
images=[]for fle in files:
  age=int(fle.split('_')[0])
  gender=int(fle.split('_')[1])
  total=fldr+'/'+fle
  print(total)
  image=cv2.imread(total)  image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
  image= cv2.resize(image,(48,48))
  images.append(image)
  ages.append(age)
  genders.append(gender)

The above snippet helps to get the data and prepare the training sets. The ‘images’ list contains all the 9780 images, each image of size (48 x 48 x 3). The ‘ages’ list has the corresponding ages and the ‘genders’ list has the corresponding genders.

Let us look at the images after preprocessing.

The 1st image has ‘age: 62 and gender: 0’ and the second image has ‘age: 67 and gender:0

Now, we need to check the distribution of our sets.

The first bar graph shows the distribution of gender. It seems well balanced. The second line graph shows the variation of samples of different ages. We can see that the samples with age less than 40 is much more than the number of samples with age more than 40. This creates a skewness in the train set distribution.

We have seen in this case, we will actually need to predict both age and gender using the same model. So, to create the actual labels for our training set, we will need to do some processing.

labels=[]i=0
while i<len(ages):
  label=[]
  label.append([ages[i]])
  label.append([genders[i]])
  labels.append(label)
  i+=1

The above snippet takes the age and gender for each image sample index wise and converts each one into a list and appends them to the labels list. This is done to create the one-dimensional label vectors.

So, the shape of the ‘labels’ list will be:

[[[age(1)],[gender(1)]],
[[age(2)],[gender(2)]], ……………….[[age(n)],[gender(n)]]]

Next, we convert the labels and images list into NumPy arrays, normalize the images, and create the training and test data splits. We will use a 25% test split.

images_f=np.array(images)
labels_f=np.array(labels)
images_f_2=images_f/255
X_train, X_test, Y_train, Y_test= train_test_split(images_f_2, labels_f,test_size=0.25)

Currently, our Y_train and Y_test are of the form:

Y_train[0:5]
>>array([[[36],
        [ 0]],       [[50],
        [ 0]],       [[65],
        [ 0]],       [[ 3],
        [ 0]],       [[25],
        [ 1]]])

We need to transform them in a way such that Y_train[0] denotes the gender labels vector, and Y_train[1] denotes the age labels vector.

Y_train_2=[Y_train[:,1],Y_train[:,0]]
Y_test_2=[Y_test[:,1],Y_test[:,0]]

The simple snippet does the work for us.

Y_train_2[0][0:5]
>>array([[0],
       [0],
       [0],
       [0],
       [1]])
Y_train_2[1][0:5]
>>array([[36],
       [50],
       [65],
       [ 3],
       [25]])

Now, we are ready to proceed and design our model.

Model

from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten,BatchNormalization
from tensorflow.keras.layers import Dense, MaxPooling2D,Conv2D
from tensorflow.keras.layers import Input,Activation,Add
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
import tensorflow as tfdef Convolution(input_tensor,filters):
    
    x = Conv2D(filters=filters,kernel_size=(3, 3),padding = 'same',strides=(1, 1),kernel_regularizer=l2(0.001))(input_tensor)
    x = Dropout(0.1)(x)
    x= Activation('relu')(x)    return x
def model(input_shape):
  inputs = Input((input_shape))
  
  conv_1= Convolution(inputs,32)
  maxp_1 = MaxPooling2D(pool_size = (2,2)) (conv_1)
  conv_2 = Convolution(maxp_1,64)
  maxp_2 = MaxPooling2D(pool_size = (2, 2)) (conv_2)
  conv_3 = Convolution(maxp_2,128)
  maxp_3 = MaxPooling2D(pool_size = (2, 2)) (conv_3)
  conv_4 = Convolution(maxp_3,256)
  maxp_4 = MaxPooling2D(pool_size = (2, 2)) (conv_4)
  flatten= Flatten() (maxp_4)
  dense_1= Dense(64,activation='relu')(flatten)
  dense_2= Dense(64,activation='relu')(flatten)
  drop_1=Dropout(0.2)(dense_1)
  drop_2=Dropout(0.2)(dense_2)
  output_1= Dense(1,activation="sigmoid",name='sex_out')(drop_1)
  output_2= Dense(1,activation="relu",name='age_out')(drop_2)
  model = Model(inputs=[inputs], outputs=[output_1,output_2])
  model.compile(loss=["binary_crossentropy","mae"], optimizer="Adam",
	metrics=["accuracy"])
  
  return model

We are going to use the above model to predict both the sex and age

The above is a schematic diagram of our model. After the ‘flatten’ layer we are going to use two different dense layers and dropouts corresponding to the corresponding outputs. Now, gender prediction is a classification problem, while age prediction is a regression problem, so, we will use sigmoid as output activation for gender prediction and ReLU as the activation function for Age prediction. Similarly, we will use ‘binary cross-entropy’ as the loss function for gender and ‘mean absolute error’ as the loss function for the age.

from tensorflow.keras.callbacks import ModelCheckpoint import tensorflow as tf
fle_s='Age_sex_detection.h5'
checkpointer = ModelCheckpoint(fle_s, monitor='val_loss',verbose=1,save_best_only=True,save_weights_only=False, mode='auto',save_freq='epoch')
Early_stop=tf.keras.callbacks.EarlyStopping(patience=75, monitor='val_loss',restore_best_weights=True),
callback_list=[checkpointer,Early_stop]
History=Model.fit(X_train,Y_train_2,batch_size=64,validation_data=(X_test,Y_test_2),epochs=500,callbacks=[callback_list])

The above snippet will train the model.

The model gives an accuracy of 82% for the gender classification.

Evaluation

Let us look at the model’s loss curve.

This is the generated loss curve for our model.

Let’s look at the evaluation for age prediction:

The above curve shows the model traced linear regression line in black and the blue dots show the distribution of test samples. So, we can see our predicted line almost passes through the middle of the distribution. Above the age of 80, there were very few samples, so, maybe owing to that there our model didn’t perform so well.

Evaluation for gender prediction:

The above curve shows the increase in gender accuracy with epochs.

Classification Matrix for gender classification:

Our model obtained an F1 score of 0.84 for the female gender and 0.78 for Male gender. So, it classifies female gender better than males.

Let’s look at some samples from our set and their corresponding predicted age and gender.

def test_image(ind,images_f,images_f_2,Model):       cv2_imshow(images_f[ind])   
image_test=images_f_2[ind]   pred_1=Model.predict(np.array([image_test]))  
#print(pred_1)   
sex_f=['Male','Female']   
age=int(np.round(pred_1[1][0]))   
sex=int(np.round(pred_1[0][0]))   
print("Predicted Age: "+ str(age))   
print("Predicted Sex: "+ sex_f[sex])

The above snippet helps to generate our samples: