
Eigenfaces – Face Classification in Python

Not enough data for Deep Learning? Try Eigenfaces.

Nowadays we can use neural networks to perform state-of-the-art image classification, or face classification in this case. But what about taking a simpler approach? That’s what this article aims to cover.


Official repo: access it here to get data and code.

The idea of taking raw pixel values as input features might seem stupid at first – and it probably is, mostly because we’d lose all of the 2D information, and there are also convolutional neural networks designed to extract the important features (as not all pixels are relevant).

Today we’ll introduce the idea of the Eigenfaces algorithm – which is simply principal component analysis applied to the face recognition problem. By doing so, our hope is to reduce the dimensionality of the dataset, keeping only the components that explain the most variance, and then apply a simple classification algorithm (like an SVM) to handle the classification task.

Sounds like a plan, but what should you know before reading this article? Good question. You should be proficient in Python and its data analysis libraries, and also know what principal component analysis is, at least at a high level.

Still reading? Then I guess you’ve got the prerequisites covered. The last thing we want to discuss before jumping into the code is the article structure, and it can be listed as follows:

  • Imports and Dataset Exploration
  • Image Visualization
  • Principal Component Analysis
  • Model Training & Evaluation
  • Conclusion

Okay, without further ado, let’s get started!


Imports and Dataset Exploration

As you’ve probably expected, we’ll need the usual suspects – NumPy, Pandas, and Matplotlib. But we’ll also use a bunch of stuff from scikit-learn, like SVM, PCA, train/test split, and some metrics for evaluating model performance.

Down below are all of the imports:

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')

As for the dataset, we found it a while back on GitHub but can’t seem to find the original source anymore. You can download it from my GitHub page.

And here’s how you’d load it into Pandas:

df = pd.read_csv('face_data.csv')
df.head()

Now we can quickly check for the shape of the dataset:

df.shape
>>> (400, 4097)

So, 400 rows and 4097 columns, a strange combination. The first 4096 columns hold normalized pixel values (meaning values in the range [0, 1]), and the last column is the target, indicating which person is in the photo.

If we take a closer look at the number of unique elements of the target column, we’d get the total number of people in the dataset:

df['target'].nunique()
>>> 40

And since we have 4096 features, it’s a clear indicator of 64×64 images in a single color channel:

64 * 64
>>> 4096

Great, we now have some basic information about the dataset, and in the next section we will make some visualizations.


Image Visualization

To visualize a couple of faces, we’ll declare a function that reshapes a 1D vector into a 2D matrix and uses Matplotlib’s imshow functionality to show it as a grayscale image:

def plot_faces(pixels):
    # Plot the first 25 faces in a 5x5 grid
    fig, axes = plt.subplots(5, 5, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        # Reshape each 4096-pixel row into a 64x64 grayscale image
        ax.imshow(np.array(pixels)[i].reshape(64, 64), cmap='gray')
    plt.show()

But before plotting, we need to separate the features from the target; otherwise the extra target column would overflow the 64×64 matrix boundaries:

X = df.drop('target', axis=1)
y = df['target']

And that’s it, now we can use the declared function:
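For example, passing the whole feature matrix plots the first 25 faces (a minimal usage sketch):

plot_faces(X)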

And that’s pretty much it for this section. In the next one we’ll perform the train/test split and PCA.


Principal Component Analysis

The goal of this section is to reduce the dimensionality of our problem by keeping only those components that explain the most variance. That, in a nutshell, is the goal of PCA. But before doing so, we must split the dataset into training and testing portions:

X_train, X_test, y_train, y_test = train_test_split(X, y)

Now we can apply the PCA on the training features. Then it’s easy to plot the cumulative sum of the explained variance, so we can approximate how many principal components are enough:

pca = PCA().fit(X_train)

# Cumulative explained variance as a function of the number of components
plt.figure(figsize=(18, 7))
plt.plot(pca.explained_variance_ratio_.cumsum(), lw=3)
plt.show()

Just by looking at the chart, it seems like around 100 principal components will keep around 95% of the variance, but let’s verify that claim:

np.where(pca.explained_variance_ratio_.cumsum() > 0.95)

Yes, it looks like 105 components will do the trick. Keep in mind that 95% isn’t set in stone; feel free to go for a lower or higher percentage on your own.
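If you’d rather compute that cutoff programmatically instead of eyeballing the np.where output, a quick sketch like this works (the variable name is just for illustration):

# Index of the first component where the cumulative variance crosses 95%, plus one
n_components = np.argmax(pca.explained_variance_ratio_.cumsum() > 0.95) + 1
print(n_components)

Here np.argmax returns the index of the first True value in the boolean array, so adding one gives the number of components needed.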

Let’s perform the PCA again, but this time with the additional n_components argument:

pca = PCA(n_components=105).fit(X_train)

And finally, we must transform the training features:

X_train_pca = pca.transform(X_train)
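As a quick aside, the principal components themselves are the “eigenfaces” the technique is named after. Since each component is a 4096-dimensional vector, the plot_faces function from earlier can display the first 25 of them directly (a small sketch, assuming the PCA object fitted above):

# Each row of pca.components_ is one eigenface, reshapeable to 64x64
plot_faces(pca.components_)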

Great! That’s it for this section, and in the next one we’ll train and evaluate the SVM model.


Model Training & Evaluation

By now we have the training features transformed. The process of training the model is as simple as making an instance of it and fitting the training data:

classifier = SVC().fit(X_train_pca, y_train)

Awesome! The model is now trained, and to evaluate it on the test set we first need to bring the test features into the same feature space. Once that’s done, the SVM is used to make predictions:

X_test_pca = pca.transform(X_test)
predictions = classifier.predict(X_test_pca)

And now we can finally see its performance. For this we’ll use classification_report from scikit-learn, as it’s easier to read than a 40×40 confusion matrix:

print(classification_report(y_test, predictions))

So, around 90% accuracy, which certainly isn’t terrible for 40 different classes and a model with default hyperparameters.
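If you only care about the single overall accuracy number rather than the full report, scikit-learn’s accuracy_score gives it directly (a quick check, not part of the original code):

from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, predictions))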

And that does it for this article. Let’s quickly glance at possible areas of improvement in the next section.


Conclusion

This was a rather quick guide – intentionally. You are free to perform a grid search to find optimal hyperparameters for the classifier or even to use a completely different algorithm.
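For reference, here’s what a minimal grid search could look like with GridSearchCV. The parameter grid below is just an illustrative set of values to try, not something taken from the original article:

from sklearn.model_selection import GridSearchCV

# An illustrative parameter grid for the SVM classifier
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 0.001, 0.0001],
    'kernel': ['rbf', 'linear']
}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train_pca, y_train)

print(grid.best_params_)
print(grid.best_score_)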

Also, try opting for 90% and 99% of the explained variance ratio, to see how the model performance changes.

Thanks for reading, feel free to leave your thoughts in the comment section.



