
Non-Negative Matrix Factorization (NMF) for Dimensionality Reduction in Image Data

Discussing theory and implementation with Python and Scikit-learn

Original image by an_photos from Pixabay (Slightly edited by author)

I have already discussed different types of Dimensionality Reduction techniques in detail.

Principal Component Analysis (PCA), Factor Analysis (FA), Linear Discriminant Analysis (LDA), Autoencoders (AEs), and Kernel PCA are the most popular ones.

Non-Negative Matrix Factorization (NMF or NNMF) is another linear dimensionality reduction technique that can be applied to the feature matrix.

All dimensionality reduction techniques fall under the category of unsupervised machine learning in which we can reveal hidden patterns and important relationships in the data without requiring labels.

So, dimensionality reduction algorithms deal with unlabeled data. When training such an algorithm, the fit() method only needs the feature matrix, X, as input; it does not require the label column, y.

As its name implies, non-negative matrix factorization (NMF) needs the feature matrix to be non-negative.

Because of this non-negativity constraint, the use of NMF is limited to data with non-negative values, such as image data (pixel values always lie between 0 and 255, so there are no negative values in image data!).

What you will learn:
----------------------------------------------------
1. Maths behind NMF
2. NMF equation
3. Feature matrix, V
4. Transformed data matrix, W
5. Factorization matrix, H
6. Scikit-learn NMF() class
7. Arguments, methods and attributes of NMF() class
8. Load the MNIST dataset with Scikit-learn
9. Perform dimensionality reduction in image data

Other matrix decomposition methods:
----------------------------------------------------
1. Eigendecomposition
2. Singular value decomposition

The maths behind non-negative matrix factorization (NMF)

Non-negative matrix factorization comes from linear algebra. In simple words, it is the process of decomposing a matrix into the product of two smaller matrices.

To be more precise,

Non-negative matrix factorization (NMF) is the process of decomposing a non-negative feature matrix, V (n x p), into a product of two non-negative matrices, W (n x d) and H (d x p). All three matrices should contain non-negative elements.

Non-negative matrix factorization equation: V (n x p) ≈ W (n x d) × H (d x p) (Image by author)

The product of W and H matrices only gives an approximation to the matrix V. So, you should expect some information loss when applying NMF.

  • V (n x p): Represents the feature matrix where n is the number of observations (samples) and p is the number of features (variables). This is the data matrix we decompose.
  • W (n x d): Represents the transformed data matrix after applying NMF. We can use this transformed matrix in place of the original feature matrix, V. So, W is the most important output of NMF. It is obtained by calling Scikit-learn NMF’s fit_transform() method. n is the number of observations (samples) and d is the number of latent factors or components. In other words, d describes the amount of dimensionality that we want to keep. It is actually a hyperparameter that we need to specify with the Scikit-learn NMF’s n_components argument. This is an integer value that should be less than the number of features, p, and greater than 0. Selecting the right value for d is a real challenge when performing NMF. We need to consider the balance between the amount of information and the number of components that we want to keep.
from sklearn.decomposition import NMF

# W = transformed data matrix, V = original feature matrix
W = NMF(n_components=d).fit_transform(V)
  • H (d x p): Represents the factorization matrix. d and p have the same definitions as above. This matrix is less important for our purposes. However, it can be obtained by calling the Scikit-learn NMF’s components_ attribute.
from sklearn.decomposition import NMF

# H = factorization matrix
H = NMF(n_components=d).fit(V).components_
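
Before moving to the implementation details, here is a minimal sketch that ties the three matrices together on a small, made-up non-negative matrix (the numbers are purely illustrative, not from the MNIST example below):

import numpy as np
from sklearn.decomposition import NMF

# A small, made-up non-negative feature matrix V: 4 samples x 3 features
V = np.array([[1.0, 0.5, 2.0],
              [2.0, 1.0, 4.0],
              [3.0, 1.5, 6.0],
              [0.5, 0.2, 1.0]])

nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(V)    # (4 x 2) transformed data matrix
H = nmf.components_         # (2 x 3) factorization matrix

print(W.shape, H.shape)     # (4, 2) (2, 3)
print(np.round(W @ H, 2))   # approximately reconstructs V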

Python implementation of non-negative matrix factorization (NMF)

In Python, NMF is implemented using Scikit-learn’s NMF() class. As you already know, Scikit-learn is the go-to Python machine learning library.

All you need to do is to import the NMF() class and create an instance of it by specifying the required arguments.

# Import
from sklearn.decomposition import NMF

# Create an instance
nmf_model = NMF(n_components=d, init='random', random_state=0)  # d = number of components to keep

Important arguments of NMF() class

  • n_components: An integer that defines the number of components or latent factors or the amount of dimensionality that we want to keep. The most important hyperparameter! The value is less than the number of original features and greater than 0.
  • init: The initialization method used for the factorization. The output returned by the NMF model can vary significantly depending on the init method you choose.
  • random_state: Used when the initialization method is 'nndsvdar' or 'random'. Use an integer to get the same results across different executions.

Note: There are many arguments in the NMF() class. If we do not specify them, they take their default values when calling the NMF() function. To learn more about those arguments, refer to the Scikit-learn documentation.
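
To see the effect of the init argument, here is a quick sketch on a random, made-up non-negative matrix (the matrix and the chosen values are only for illustration); the learned factors, and usually the reconstruction error, differ between init methods:

import numpy as np
from sklearn.decomposition import NMF

# A random, made-up non-negative matrix, only for illustration
V = np.random.RandomState(0).rand(100, 20)

for init_method in ['random', 'nndsvd']:
    model = NMF(n_components=5, init=init_method, random_state=0, max_iter=500)
    W = model.fit_transform(V)
    # The learned factors (and the error) generally differ between init methods
    print(init_method, "-> first row of W:", np.round(W[0], 3))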

Important methods of NMF() class

  • fit(V): Learns an NMF model from the feature matrix, V. No transformation is applied here.
  • fit_transform(V): Learns an NMF model from the feature matrix, V and returns the transformed data matrix, W.
W = nmf_model.fit_transform(V)
  • transform(V): Returns the transformed data matrix, W after fitting the model.
nmf_model.fit(V) # Fitted model
W = nmf_model.transform(V)
  • inverse_transform(W): Transforms (recovers) the data matrix, W back to the original space. Very useful for visualization purposes!
recovered_data = nmf_model.inverse_transform(W)

Important attributes of NMF() class

  • components_: Returns the factorization matrix, H. This matrix is less important for our purposes.
H = nmf_model.components_
  • reconstruction_err_: Returns the beta divergence, as a float, measuring the distance between V and the product WH. The solver tries to minimize this error during training. Analyzing this error for different values of n_components is a great way of choosing the right number of components, d.
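
As a rough sketch of this idea, you can fit NMF with several candidate values of d and compare the errors. Here, the matrix is random and made-up just for illustration; with real data you would use your own feature matrix:

import numpy as np
from sklearn.decomposition import NMF

# A random, made-up non-negative matrix, only for illustration
V = np.random.RandomState(0).rand(200, 50)

# Fit NMF with several candidate values of d and compare the errors
for d in [2, 5, 10, 25]:
    model = NMF(n_components=d, init='random', random_state=0, max_iter=500)
    model.fit(V)
    print("d =", d, "-> reconstruction error:", round(model.reconstruction_err_, 4))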

Reducing dimensionality in image data using non-negative matrix factorization (NMF)

We’ll use the MNIST digits dataset for this task. We’ll perform NMF on the MNIST data to reduce its dimensionality with different numbers of components, and then compare each output with the original images.

Step 1: Load the MNIST dataset using Scikit-learn

The MNIST digits dataset can be loaded as follows using Scikit-learn.

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
image_data = mnist['data']

print("Shape:", image_data.shape)
print("Type:", type(image_data))
(Image by author)

The dataset is loaded as a Pandas data frame. The shape is (70000, 784). There are 70000 observations (images) in the dataset. Each observation has 784 features (pixel values). The size of an image is 28 x 28. When loading the MNIST dataset in this way, each image is represented as a 1D array that contains 784 (28 x 28) elements. This is the format we need for this task and no further modification is required for the dataset.

Alternatively, you can load the MNIST dataset using Keras. There, you’ll get a 28 x 28 2D array for each image instead of a 1D array. You can learn more here.
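
For reference, here is a minimal sketch of the Keras route (assuming TensorFlow’s bundled Keras is installed), including the reshape back to 784-element rows that NMF would need:

from tensorflow.keras.datasets import mnist

# Each image is loaded as a 28 x 28 2D array
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)       # (60000, 28, 28)

# Flatten each image into a 784-element row before applying NMF
X_train_flat = X_train.reshape(len(X_train), -1)
print(X_train_flat.shape)  # (60000, 784)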

Step 2: Visualize a sample of the original images

Now, we’ll visualize a sample of the first five images in the MNIST dataset. This sample can be used to compare with the outputs of the NMF model.

import matplotlib.pyplot as plt

n = 5
plt.figure(figsize=(6.75, 1.5))
for i in range(n):
  ax = plt.subplot(1, n, i+1)
  plt.imshow(image_data.iloc[i].values.reshape(28, 28), cmap="binary")
  ax.axis('off')

plt.show()
A sample of the original MNIST digits (Image by author)

Step 3: Apply NMF with 9 components (d = 9)

from sklearn.decomposition import NMF

nmf_model = NMF(n_components=9, init='random', random_state=0)
image_data_nmf = nmf_model.fit_transform(image_data)

print("Shape:", image_data_nmf.shape)
print("Type:", type(image_data_nmf))
(Image by author)

Now, the new dimensionality is 9. The original dimensionality was 784. Therefore, the dimensionality has been significantly reduced!

To get the shape of V, W and H matrices, we can run the following code.

print("V_shape:", image_data.shape)
print("W_shape:", image_data_nmf.shape)
print("H_shape", nmf_model.components_.shape)
(Image by author)

To get the reconstruction error or beta divergence between V and the product of WH, we can run the following code.

nmf_model.reconstruction_err_
(Image by author)

The reconstruction error is very high. This is because we have selected only 9 components out of 784. We can verify this by visualizing the output.

image_data_nmf_recovered = nmf_model.inverse_transform(image_data_nmf)

n = 5
plt.figure(figsize=(6.75, 1.5))
for i in range(n):
  ax = plt.subplot(1, n, i+1)
  plt.imshow(image_data_nmf_recovered[i, :].reshape(28, 28), cmap="binary")
  ax.axis('off')

plt.show()

NMF output: 9 components or d = 9

NMF output: 9 components or d = 9 (Image by author)

The digits are not clear. You can compare this output with the sample of the original images.
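
If you want to repeat the experiment with other values of d yourself, a minimal sketch like the following reuses image_data from Step 1 (larger values of d can take quite a while to run):

from sklearn.decomposition import NMF

# Re-run the same pipeline for several values of d
for d in [100, 225, 784]:
    model = NMF(n_components=d, init='random', random_state=0)
    W = model.fit_transform(image_data)        # transformed data matrix
    recovered = model.inverse_transform(W)     # back to the original 784-dimensional space
    print("d =", d, "-> reconstruction error:", round(model.reconstruction_err_, 2))
    # 'recovered' can be visualized with the same plotting code as in Step 2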

I have run the NMF algorithm by choosing 100, 225 and 784 components. Here are the results.

NMF output: 100 components or d = 100

NMF output: 100 components or d = 100 (Image by author)

The reconstruction error is 174524.20.

NMF output: 225 components or d = 225

NMF output: 225 components or d = 225 (Image by author)

The reconstruction error is 104024.62.

NMF output: 784 components or d = 784 (all components)

NMF output: 784 components or d = 784 (Image by author)

The reconstruction error is 23349.67.

Conclusions

As the number of components increases when running non-negative matrix factorization (NMF), the reconstructed images become clearer and the reconstruction error gets lower.

By looking at the output images and the reconstruction error, the right value for d can be chosen. For this, you need to run the NMF algorithm several times, which may be time-consuming depending on the computer resources you have.

With d = 784 (all components), you will still get a reconstruction error of 23349.67 instead of zero.

It is obvious that the product of W and H matrices only gives a non-negative matrix approximation to the feature matrix, V.

Can we run NMF with a negative matrix?

The answer is no. If you try to use NMF with a feature matrix that has negative values, you will get the following ValueError!

import numpy as np
from sklearn.decomposition import NMF

V = np.array([[1, 1, -2, 1], [2, 1, -3, 2], [3, 1.2, -3.3, 5]])

nmf_model = NMF(n_components=2, init='random', random_state=0)
W = nmf_model.fit_transform(V)

print("V_shape:", V.shape)
print("W_shape:", W.shape)
print("Reconstruction error:", nmf_model.reconstruction_err_)
ValueError! (Image by author)

You can’t break the non-negativity constraint when running non-negative matrix factorization (NMF). The feature matrix should always contain non-negative elements.


This is the end of today’s article.

Please let me know if you have any questions or feedback.


Thank you so much for your continuous support! See you in the next article. Happy learning to everyone!


MNIST dataset info

  • Citation: Deng, L., 2012. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), pp. 141–142.
  • Source: http://yann.lecun.com/exdb/mnist/
  • License: Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset which is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA). You can learn more about different dataset license types here.

Designed and written by: Rukshan Pramoditha

2023–05–06

