
Singular value decomposition (SVD) and eigendecomposition (ED) are both matrix factorization methods that come from linear algebra.
In the field of machine learning (ML), both can be used as data reduction methods (i.e., for dimensionality reduction).
Previously, we discussed eigendecomposition in detail. Today, we'll place more emphasis on SVD.
Principal component analysis (PCA) can be performed using both methods. PCA is the most popular linear dimensionality reduction technique in ML. SVD is considered the underlying mathematics behind PCA. The popular ML library Scikit-learn also uses SVD within its PCA() implementation to perform PCA. As a result, SVD is more widely used than eigendecomposition for dimensionality reduction.
NumPy provides high-level and easy-to-use functions to perform SVD and eigendecomposition.
Topics included:
----------------
01. What is singular value decomposition?
02. SVD equation and its terms
03. Singular value decomposition in NumPy - svd() function
04. What is eigendecomposition?
05. Eigendecomposition equation and its terms
06. Eigendecomposition in NumPy - eig() function
07. Performing PCA using singular value decomposition
08. Performing PCA using eigendecomposition
09. Compare the results of both methods
10. Conclusions
What is singular value decomposition?
Singular value decomposition (SVD) is a type of matrix factorization method. It is an important mathematical operation that comes from Linear Algebra.
There are multiple ways to factorize (decompose / break down) a matrix, just as we can factorize the number 16 in several ways: 2 x 8, 4 x 4, 2 x 2 x 4, or 2 x 2 x 2 x 2. Not all factorization methods are equally important; which one to use depends on the use case.
Likewise, a matrix can be factorized in multiple ways, some of which are more useful than others. Singular value decomposition (SVD) and eigendecomposition are two such important factorization methods.
Singular value decomposition is the process of decomposing a matrix A into the product of three matrices, as in the following equation.

A = U Σ V^T

- A: The matrix on which we perform SVD
- U: A square matrix whose columns are the left singular vectors of A.
- Σ: A rectangular diagonal matrix of the same size as A. This is called the singular value matrix.
- V^T: The transpose of a square matrix V whose columns are the right singular vectors of A. By default, NumPy’s SVD function returns V^T, which is the transpose of V.
The matrix A can be square or non-square as SVD is defined for both square and non-square matrices. In contrast, eigendecomposition is defined only for square matrices.
The matrix Σ contains the singular values, which are always non-negative (some of them can be zero).
The number of non-zero singular values equals the rank of matrix A.
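We can check this fact numerically with a small sketch (B is a made-up rank-2 matrix; note that np.linalg.matrix_rank itself computes the rank from the singular values):
import numpy as np
# A rank-2 matrix: the third row is the sum of the first two rows
B = np.array([[1, 2, 3],
              [4, 5, 6],
              [5, 7, 9]])
s = np.linalg.svd(B, compute_uv=False)   # singular values only
print(s)                                 # the last singular value is (numerically) zero
print(np.sum(s > 1e-10))                 # 2 non-zero singular values
print(np.linalg.matrix_rank(B))          # 2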
Singular value decomposition in NumPy
In NumPy, SVD can be easily performed using the svd() function. Here is an example.
import numpy as np
A = np.array([[2, 4, 1],
              [5, 7, 6],
              [1, 1, 3]])
U, s, Vt = np.linalg.svd(A)
print("A")
print(A)
print("\nU")
print(U)
print("\ns")
print(s)
print("\nVt")
print(Vt)

NumPy’s svd() function returns Σ (singular value matrix) as a vector (denoted by s), not a diagonal matrix. That vector contains all the singular values of A.
If you want to get Σ as a diagonal matrix of the same shape as A, you can construct it with the following code.
S = np.zeros(np.shape(A))
np.fill_diagonal(S,s)
print(S)
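As a quick sanity check, we can multiply the three factors back together and confirm that they reproduce A (continuing from the code above):
# Reconstruct A from its three factors: A = U Σ V^T
A_reconstructed = U @ S @ Vt
print(np.allclose(A, A_reconstructed))   # True (up to floating-point error)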

If you only need to compute singular values and don’t need U and Vt matrices, you can run the following code.
s = np.linalg.svd(A, compute_uv=False)
print(s)

That’s how we can perform SVD in NumPy. It is easier than you might think. Next, we’ll move on to eigendecomposition.
What is eigendecomposition?
Eigendecomposition is another important matrix factorization method.
Eigendecomposition is the process of decomposing a square matrix A in terms of its eigenvalues and eigenvectors, which satisfy the following equation.

A x = λ x

- A: The matrix on which we perform eigendecomposition. It should be a square matrix.
- λ: A scalar called the eigenvalue.
- x: A vector called the eigenvector.
The matrix A should be a square matrix as eigendecomposition is defined only for square matrices.
The eigenvalues can be positive or negative.
The eigenvalues and eigenvectors come in pairs. Such a pair is known as an eigenpair. So, matrix A can have multiple such eigenpairs. The above equation shows the relationship between A and one of its eigenpairs [ref: Eigendecomposition of a Covariance Matrix with NumPy]
Eigendecomposition in NumPy
In NumPy, eigendecomposition can be easily performed using the eig() function. Here is an example.
import numpy as np
A = np.array([[2, 4, 1],
              [5, 7, 6],
              [1, 1, 3]])
eigen_vals, eigen_vecs = np.linalg.eig(A)
print("A")
print(A)
print("\nEigenvalues")
print(eigen_vals)
print("\nEigenvectors")
print(eigen_vecs)

NumPy’s eig() function returns eigenvalues as a vector. That vector contains all the eigenvalues of A.
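To confirm the eigenpair relationship from earlier, we can check that A x = λ x holds for the first eigenpair returned by eig() (a quick check, continuing from the code above; the eigenvectors are stored as the columns of eigen_vecs):
# Verify A x = lambda x for the first eigenpair
lam = eigen_vals[0]
x = eigen_vecs[:, 0]
print(np.allclose(A @ x, lam * x))   # True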
We’ve performed both SVD and eigendecomposition on the same matrix, A. By looking at the outputs, we can say that:
Singular value decomposition and eigendecomposition are not the same thing, even when the matrix is square.
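To make the difference concrete, we can compare the two sets of values directly (a small sketch, assuming s and eigen_vals from the examples above are still in scope):
# Singular values of A: always non-negative, returned in descending order
print(np.sort(s)[::-1])
# Eigenvalues of A: can be negative (or even complex for some matrices);
# for this non-symmetric A they clearly differ from the singular values
print(np.sort(eigen_vals)[::-1])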
Performing PCA using singular value decomposition
PCA is often performed by applying SVD to the covariance matrix of standardized data. The covariance matrix of standardized data is essentially the correlation matrix of the non-standardized data (up to the convention of dividing by n or n - 1).
We standardize the data before SVD because the singular values are highly sensitive to the relative ranges of the original features.
To demonstrate the PCA process using SVD, we’ll use the Wine dataset which has 13 input features.
Step 1: Getting Wine data.
from sklearn.datasets import load_wine
wine = load_wine()
X = wine.data
y = wine.target
print("Wine dataset size:", X.shape)

Step 2: Standardizing data.
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
Step 3: Computing the covariance matrix of standardized data.
import numpy as np
cov_mat = np.cov(X_scaled.T)
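As a side note, we can numerically confirm the relationship between this covariance matrix and the correlation matrix of the original data (a quick check, assuming X and X_scaled from the previous steps; the factor (n - 1) / n accounts for np.cov dividing by n - 1 while StandardScaler divides by n):
# Covariance matrix of standardized data vs. correlation matrix of original data
corr_mat = np.corrcoef(X.T)
n = X.shape[0]
print(np.allclose(cov_mat * (n - 1) / n, corr_mat))   # True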
Step 4: Performing SVD on the covariance matrix and getting the singular values of the covariance matrix.
U, s, Vt = np.linalg.svd(cov_mat)
print(s)

The sum of these singular values equals the total variance in the data, and each singular value represents the amount of variance captured by the corresponding component. To see this more clearly, we convert the singular values to percentages of explained variance.
Step 5: Converting singular values to the variance explained.
exp_var = (s / np.sum(s)) * 100
print(exp_var)

The first component captures 36.2% of the variance in the data, the second component captures 19.2%, and so on.
Step 6: Visualizing singular values to select the right number of components
Not all components contribute equally to the model. We can drop the components that do not capture much variance in the data and keep only the most important ones. To decide how many to keep, we visualize the explained variance by creating a cumulative explained variance plot.
cum_exp_var = np.cumsum(exp_var)
# a = number of input features + 1
a = X.shape[1] + 1
import matplotlib.pyplot as plt
plt.bar(range(1, a), exp_var, align='center',
        label='Individual explained variance')
plt.step(range(1, a), cum_exp_var, where='mid',
         label='Cumulative explained variance', color='red')
plt.ylabel('Explained variance percentage')
plt.xlabel('Principal component index')
plt.xticks(ticks=list(range(1, a)))
plt.legend(loc='best')
plt.tight_layout()
plt.savefig("cumulative explained variance plot.png")

It is clear that the first 7 components capture about 90% of the variance in the data, so we can select the first 7 components for the Wine dataset.
See all selection criteria here.
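Once the number of components is chosen, the standardized data can be projected onto them. Here is a minimal sketch (assuming U and X_scaled from the steps above; because the covariance matrix is symmetric and positive semi-definite, the columns of U are the principal directions):
# Keep the first 7 principal directions (columns of U) and project the data
W = U[:, :7]             # 13 x 7 projection matrix
X_pca = X_scaled @ W     # transformed dataset with 7 features
print("Transformed dataset size:", X_pca.shape)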
Performing PCA using eigendecomposition
PCA can also be performed by applying eigendecomposition to the covariance matrix of standardized data.
The first 3 steps are the same as before. So, I will continue with the fourth step.
Step 1, Step 2, Step 3: Same as before.
Step 4: Performing eigendecomposition on the covariance matrix and getting the eigenvalues of the covariance matrix.
eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)
print(eigen_vals)

The eigenvalues are exactly the same as the singular values. The reason is that the covariance matrix is symmetric and positive semi-definite (all of its eigenvalues are non-negative).

In general, we can say that:
For a symmetric positive semi-definite matrix, the eigenvalues are exactly the same as the singular values.
In other words,
For a symmetric positive semi-definite matrix, singular value decomposition and eigendecomposition are the same thing.
Unlike the singular values returned by the svd() function, the eigenvalues returned by the eig() function are not necessarily in descending order. So, we need to manually sort them from largest to smallest.
# Sort the eigenvalues in descending order
eigen_vals = np.sort(eigen_vals)[::-1]
print(eigen_vals)

Step 5, Step 6: Same as before.
We can visualize the eigenvalues in the same way as before. You will get exactly the same plot.
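If you also want the transformed dataset from the eigendecomposition route, sort the eigenvalues and eigenvectors together so that each eigenpair stays matched, then project the data (a small sketch, assuming cov_mat and X_scaled from the earlier steps):
# Recompute the eigendecomposition and sort eigenpairs by descending eigenvalue
eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)
order = np.argsort(eigen_vals)[::-1]
eigen_vals = eigen_vals[order]
eigen_vecs = eigen_vecs[:, order]
# The sorted eigenvectors play the same role as the columns of U in the SVD approach
X_pca_eig = X_scaled @ eigen_vecs[:, :7]
print("Transformed dataset size:", X_pca_eig.shape)
(For symmetric matrices, NumPy also provides np.linalg.eigh(), which returns real eigenvalues sorted in ascending order.)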
Conclusions
By default, PCA is performed using SVD. It can also be performed using eigendecomposition. Both approaches give the same results because the covariance matrix is symmetric and positive semi-definite.
In general, singular value decomposition and eigendecomposition are two completely different things, but for a symmetric positive semi-definite matrix like the covariance matrix used in PCA, both are the same!
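As a final sanity check, the same explained-variance percentages can be obtained from Scikit-learn's PCA class, which uses SVD internally (a quick sketch, assuming X_scaled from the earlier steps):
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(X_scaled)
print(pca.explained_variance_ratio_ * 100)   # matches exp_var from Step 5 (up to rounding)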
This is the end of today’s article.
Please let me know if you have any questions or feedback.
Support me as a writer
I hope you enjoyed reading this article. If you’d like to support me as a writer, kindly consider signing up for a membership to get unlimited access to Medium. It only costs $5 per month and I will receive a portion of your membership fee.
Join my private list of emails
Never miss a great story from me again. By subscribing to my email list, you will directly receive my stories as soon as I publish them.
Thank you so much for your continuous support! See you in the next article. Happy learning to everyone!
Wine dataset info
- Dataset source: You can download the original dataset here.
- Dataset license: This dataset is available under the CC BY 4.0 (Creative Commons Attribution 4.0) license.
- Citation: Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Rukshan Pramoditha 2023–03–20