I was (and still am) running a few deep learning networks for face detection on my humble machine, but I kept bumping into memory and CPU/GPU consumption issues. I could either reduce the number of images I train on or find a way to make my algorithm scale on my machine.
You could ask: why not rent a high-end machine or buy some server time? Both options are legitimate, but maybe there is a way to reduce the high-dimensional image data first (or maybe not), and I wanted to experiment a little before shifting direction.

It is a relatively old and much-used (in fact, overused) piece of literature, but I wanted to implement it from scratch anyway: https://www.mitpressjournals.org/doi/pdf/10.1162/jocn.1991.3.1.71
The steps were fairly simple:
1. Parse through the esoteric and terse paper quickly.
2. Understand it and draft an algorithm around the central idea.
3. Program it on test data of your choice and check the quality of the results.
1. Parsing
It took me around 50 minutes to parse through the paper, and I came up with a crude, quick-and-dirty step-by-step approach that I could comprehend and translate into code.
Oh, by the way, did you know that ‘eigen’ is a German word meaning characteristic or inherent?

Please note that matrix multiplication involves a lot of checks on dimensions, i.e. the inner dimensions of the two matrices must be equal; then and only then can we multiply them, so we often have to transpose a matrix to make the product work.
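As a quick illustration (a toy aside with made-up arrays, not something from the paper), here is how that shape check plays out in NumPy:
# A toy illustration: the inner dimensions must match before multiplying
import numpy as np
A = np.random.rand(3, 5)      # shape (3, 5)
B = np.random.rand(3, 2)      # shape (3, 2)
# np.dot(A, B) would raise an error: inner dimensions 5 and 3 do not match
C = np.dot(A.T, B)            # transpose A to (5, 3); (5, 3) x (3, 2) works
print(C.shape)                # (5, 2)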
OK, now that I understand the basics of the paper and have synthesized the steps I will follow in the code, let me summarise what the paper is saying.
2. Central idea/gist of the paper
The idea is that a large set of images can be represented in the form of a matrix, A, of N x M dimensions, where N is the number of images and M is the number of pixels in a single image, i.e. for r rows and c columns, M = r * c.
Once matrix A is produced, we try to find the commonality between its rows; in other words, we find the correlation between image 1 and image 2, image 1 and image 3, image 1 and image N, and so on. This is achieved by calculating a covariance matrix, B.
Matrix B can then be decomposed into its eigenvalues and eigenvectors. The eigenvector matrix captures the important information stored in matrix B: there will be N eigenvectors, and each of them captures specific features of the images, such as eyes, smiles, glasses, ears and hair, as well as ambient features such as low light or darkness. The Nth, the (N-1)th and the other vectors at the far end do not capture much information, so they can be discarded while reconstructing the faces; in this fashion the method also reduces the dimensions and helps with memory usage.
Essentially, every image is a linear combination of these eigenvectors.
Image 1 = p*eigenvector_1 + q*eigenvector_2 + … + k*eigenvector_N
p, q and k are the weights, which vary from image to image.
What are these weights, you ask?
Well, given the equation above, they are the amounts of each eigenvector that go into making up one image. In the example, Image 1 needs an amount ‘p’ of eigenvector_1, an amount ‘q’ of eigenvector_2, and so on.
Thus, we are able to reconstruct the input images using the eigenvectors, and that is why these eigenvectors are called Eigenfaces in this paper. :) (It was an enlightening moment for me to come full circle from eigenvectors to eigenfaces.)
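To make the weights concrete, here is a tiny, made-up example (mine, not the paper’s): with an orthonormal set of eigenvectors, the weights are simply the dot products of the image vector with each eigenvector.
# A made-up 3-pixel "image" expressed as a weighted sum of orthonormal eigenvectors
import numpy as np
e1 = np.array([1.0, 0.0, 0.0])            # stand-in for eigenvector_1
e2 = np.array([0.0, 1.0, 0.0])            # stand-in for eigenvector_2
image = 3.0 * e1 + 5.0 * e2               # image = p*eigenvector_1 + q*eigenvector_2
p, q = np.dot(image, e1), np.dot(image, e2)
print(p, q)                               # 3.0 5.0 -> the weights fall out as dot products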
3. Enough talk, let’s code
But first, I need some data.
I will use Yale’s faces dataset, which is available here. The images are in the /faces/ directory. There are 165 .pgm images (you can read them like any other .png or .jpg file) of 15 subjects in 11 different conditions, i.e. each of the 15 people has 11 photos: smiling, sad, with no glasses, under different lighting, and so on. 15 * 11 = 165, makes sense!
Load required libraries.
# Import important libraries
import os
import numpy as np
import matplotlib.cm as cm
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from numpy.linalg import eig
import glob
from os import path
%matplotlib inline
Define the path where we have kept the images.
# Change it according to your directory
image_path = '../YALE/faces'
Let’s load the images and see what we are dealing with here.
files_pattern = '*.pgm'
path_pattern = path.join(image_path, files_pattern)
files = glob.glob(path_pattern)
len(files) # This will yield 165
images = []
for image in files:
    im = Image.open(image)
    images.append(np.asarray(im))
print(images[0].shape)
#(243, 320)
The last statement above tells us the dimensions of the images, so we have 165 images of 243 * 320 shape. Please note that these are greyscale images because the output was (243, 320), i.e. 2-dimensional. Had they been coloured images, there would have been 3 dimensions, i.e. (243, 320, 3), with 3 for RGB.
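As an aside (my own note, not something the Yale data needs), if you ever point this code at coloured images, one simple option is to convert them to greyscale with Pillow before flattening:
# Hypothetical: squash a coloured image down to greyscale before flattening
im_gray = Image.open(files[0]).convert('L')   # 'L' = single 8-bit greyscale channel in Pillow
print(np.asarray(im_gray).shape)              # 2 dimensions again, e.g. (243, 320)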
Flatten the images
Let’s go to the next step, which is flattening each image. Instead of each image being a matrix of 243 x 320 dimensions, we create a single image vector from it, which will of course have 243 * 320 = 77760 components. For each of the 165 images we will have one vector of 77760 elements, and so we will have a matrix of 165 * 77760 dimensions.
flattened_images = []
for i in range(0, len(images)):
    flattened_images.append(np.reshape(images[i], (images[0].shape[0] * images[0].shape[1])))
flattened_images = np.asmatrix(flattened_images)
flattened_images.shape # This will yield (165, 77760)
Compute the mean face
Let’s find the mean face and see how it looks.
mean_face = np.mean(flattened_images, axis=0)
plt.subplot(121), plt.imshow(np.reshape(mean_face, (images[0].shape[0],images[0].shape[1])), cmap="gray"), plt.axis('off'), plt.title('Mean face')

This mean face is the average of all the faces we are dealing with. We calculate it so that we can subtract it from the flattened matrix, which does two things for us: first, it removes the common, unwanted features; second, it gives us mean-centred data. We will do exactly that in the next step.
# Subtract the average face vector from every image vector.
px_images = flattened_images - mean_face
px_images is the important data here, as most of the further work relies on it.
This is a crucial step of the work: you can either use standard libraries such as SVD/PCA and be done within a few minutes, or go along with me. The goal was to implement the paper verbatim, so I am going to calculate everything that needs to be calculated.
Compute covariance matrix
Let’s find the covariance matrix of the above ‘px_images’.
# Calculate covariance matrix of above matrix -> C = A*transpose(A)
covariance_matrix = np.dot(px_images, px_images.T)
The covariance matrix, a matrix of 165 * 165, tells us how each image varies with the others.
A point to keep in mind is that matrix multiplications are expensive operations. px_images has shape (165, 77760) and its transpose has shape (77760, 165), and that’s why the covariance matrix is of size 165 * 165 and we could compute it in no time. Imagine px_images had shape (77760, 165): its transpose would have been (165, 77760) and the resulting covariance matrix would be (77760, 77760). That would have been super slow and disastrous. So, before running the covariance matrix code, make sure the first dimension of px_images is smaller than the second one.
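A small guard like the one below (my own addition, not something the paper prescribes) makes that assumption explicit:
# Sanity check before the expensive product: keep the covariance matrix small
assert px_images.shape[0] < px_images.shape[1], \
    "Transpose px_images first, otherwise the covariance matrix will be enormous"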
Let’s find the eigenvectors and eigenvalues of the covariance matrix.
val, vec = np.linalg.eigh(covariance_matrix)
# val is the eigenvalue, vec is the eigenvector
prod = np.dot(px_images.T, vec)
This matrix ‘prod’ is the Eigenfaces matrix.
In the code above, vec holds the eigenvectors of A*transpose(A), where A is px_images. Given the way we created px_images, what we actually want are the eigenvectors of transpose(A)*A. To get them, we premultiply vec by the transpose of px_images, and thus prod is the matrix we are ultimately interested in. (Please refer to page 5 of the research paper to understand more.)
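If you want to convince yourself that this trick works, here is a small standalone check I wrote on a random matrix (it mirrors the argument on page 5 of the paper; the shapes are made up):
# Standalone check of the trick on a small random matrix (nothing to do with the face data)
import numpy as np
A = np.random.rand(5, 12)                      # 5 "images", 12 "pixels"
vals, vecs = np.linalg.eigh(np.dot(A, A.T))    # eigen-decompose A*transpose(A), a 5x5 matrix
u = np.dot(A.T, vecs[:, -1])                   # lift the top eigenvector with transpose(A)
lhs = np.dot(np.dot(A.T, A), u)                # apply transpose(A)*A to it
print(np.allclose(lhs, vals[-1] * u))          # True: u is an eigenvector of transpose(A)*A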
Sort the eigenvectors
Now, we have the Eigenfaces matrix with us. Before we use this prod matrix, let’s sort its columns by the corresponding eigenvalues, highest first.
idx = np.argsort(-val)
val = val[idx]
prod = prod[:, idx]
Also, normalise the values
for i in range(len(images)):
    prod[:, i] = prod[:, i] / np.linalg.norm(prod[:, i])
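As an aside, the same normalisation can be done in one vectorised line; it is equivalent to the loop above, so use one or the other:
# Equivalent vectorised normalisation: divide every column by its L2 norm
prod = prod / np.linalg.norm(prod, axis=0, keepdims=True)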
Finally, we have the Eigenfaces
Now that the prod matrix is sorted, we can plot the eigenfaces as images.
def show_EigenVecs():
    fig, axes = plt.subplots(11, 15, figsize=(15, 15), subplot_kw={'xticks':[], 'yticks':[]})
    for i, ax in enumerate(axes.flat):
        ax.imshow(np.reshape(np.array(prod)[:, i], (images[0].shape[0], images[0].shape[1])), cmap='gray')
show_EigenVecs()

These ghost-like images are the eigenfaces; you can see that the initial ones each capture some effect or other, while the amount of information captured peters out in the later eigenfaces.
Find the weights and reconstruct the images from eigenfaces
weights = np.dot(px_images, prod)
These are the weights that will be used for reconstruction of the images.
reconstructed_flattened_image_vector = mean_face + np.dot(weights, prod.T)
Let’s reconstruct the images
def show_reconstructed_images(pixels):
    fig, axes = plt.subplots(15, 11, figsize=(20, 20), subplot_kw={'xticks':[], 'yticks':[]})
    for i, ax in enumerate(axes.flat):
        ax.imshow(np.reshape(np.array(pixels)[i], (images[0].shape[0], images[0].shape[1])), cmap='gray')
    plt.show()
show_reconstructed_images(reconstructed_flattened_image_vector)

Voilà! We can reconstruct the images from the eigenfaces.
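Out of curiosity (my own addition, not part of the paper), we can also put a number on how faithful the reconstruction is with a per-image mean squared error:
# Mean squared error between the original and reconstructed flattened images
diff = np.asarray(flattened_images - reconstructed_flattened_image_vector)
errors = np.square(diff).mean(axis=1)      # one error value per image
print(errors.mean())                       # average reconstruction error over all 165 images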
But do we need all 165 eigenfaces to reconstruct the images? Let’s see!
Dimensionality reduction
It would be interesting to see how the variance explained changes with the number of components.
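Here is a sketch of how such a curve can be computed; the cumulative share of the sorted eigenvalues serves as the ‘variance explained’ (my reconstruction of the plotting code, not necessarily the exact original):
# Cumulative share of total variance captured by the first k eigenvalues
explained_variance = np.cumsum(val) / np.sum(val)
plt.plot(explained_variance)
plt.xlabel('Number of components')
plt.ylabel('Cumulative variance explained')
plt.show()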

The curve starts flattening somewhere between 50 and 60 components. It means that beyond 50 or 60 components not much information is left to capture, and one can reconstruct the images using those 50-odd components instead of all 165.
Let’s use K = 50, i.e. take the first 50 eigenfaces and reconstruct our faces from them.
K = 50
selected_eigenVectors = prod[:, 0:K]
selected_weights = np.dot(px_images, selected_eigenVectors)
reconstructed_flattened_image_vector_k = mean_face + np.dot(selected_weights, selected_eigenVectors.T)
show_reconstructed_images(reconstructed_flattened_image_vector_k)

Not bad at all! We can more or less identify the faces from the first 50 components.
Before wrapping up, let’s just look at the progress of reconstruction with an increasing number of components.
def reconstruction_progress(k):
    selected_eigenVectors = prod[:, 0:k]
    selected_weights = np.dot(px_images, selected_eigenVectors)
    reconstructed_flattened_image_vector_k = mean_face + np.dot(selected_weights, selected_eigenVectors.T)
    return reconstructed_flattened_image_vector_k
K = [10, 20, 30, 40, 50, 60, 70, 80, 100, 120]
E = []
for k in K:
    E.append(reconstruction_progress(k)[21])
def show_reconstructed_images_progress(pixels):
    fig, axes = plt.subplots(2, 5, figsize=(15, 15))
    for i, ax in enumerate(axes.flat):
        ax.imshow(np.reshape(np.array(pixels)[i], (images[0].shape[0], images[0].shape[1])), cmap='gray')
show_reconstructed_images_progress(E)

The reconstruction progress as we increase the components from K = 10 to 120, with 120 being the clearest of course.
The original data would have been 165 * 243 * 320 = 12,830,400 elements, but if we take only 50 components then (50 * 77760 + 50 * 165 + 1 * 77760) = 3,974,010 elements will be used.
That shrinks the problem to roughly a third of its original size, a reduction of about 69%. That’s a win.
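For the record, the arithmetic can be checked in a couple of lines (I am simply reproducing the element counts from the text; how exactly you account for what gets stored is a matter of bookkeeping):
# Element counts: full data vs. 50 eigenfaces + 50 weights per image + the mean face
full = 165 * 243 * 320
reduced = 50 * 77760 + 50 * 165 + 1 * 77760
print(full, reduced, reduced / full)   # 12830400 3974010 ~0.31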
What do I think?
It was an interesting but tiring day. I can see how easy it would be to get lost in the multiplications and transposes of the matrices. I remained stuck on the covariance matrix calculation and then on the reconstruction phase for a long time.
Cons: the major drawback I can see is that the images have to be of faces only, without any foreign objects, and low-light photos can hurt the accuracy greatly. So I can’t think of replacing the CNN with Eigenfaces at the moment.
Pros: although matrix multiplications are expensive, a well-chosen transpose lets you keep the matrices considerably smaller. The approach is straightforward and much easier to implement than the hyperparameter optimisation in a neural network.
I would like to use Eigenfaces on very well-defined datasets that don’t have much noise and whose images are nicely centred on the primary objects. There are a few papers that propose using Eigenfaces before feeding the data to convolutional nets. I would like to invest time in such a methodology and see for myself how the scalability issue pans out.
Please let me know your thoughts and comments.
The data and the code repo are present on my Github here.