Simple SVD algorithms

Naive ways to calculate SVD

Risto Hinno
Towards Data Science


Intro

The aim of this post is to show some simple and educational examples of how to calculate the singular value decomposition using basic methods. Even if you are interested in industry-strength implementations, you might find this useful background.

SVD

Singular value decomposition (SVD) is a matrix factorization method that generalizes the eigendecomposition of a square matrix (n x n) to any matrix (n x m) (source).

If you don’t know what eigendecomposition or eigenvectors/eigenvalues are, you should google them or read this post. The rest of this post assumes that you are familiar with these concepts.

SVD is similar to Principal Component Analysis (PCA), but more general. PCA assumes that the input matrix is square; SVD doesn’t have this assumption. The general formula of SVD is:

M=UΣVᵗ, where:

  • M is the original matrix we want to decompose
  • U is the left singular matrix (its columns are left singular vectors), and its columns are eigenvectors of the matrix MMᵗ
  • Σ is a diagonal matrix containing the singular values (the square roots of the nonzero eigenvalues of MMᵗ and MᵗM)
  • V is the right singular matrix (its columns are right singular vectors), and its columns are eigenvectors of the matrix MᵗM (a quick numerical check of these relationships follows below)
SVD matrices (source)
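
As a quick sanity check of these relationships, here is a minimal sketch using numpy’s built-in SVD (the matrix M below is just an arbitrary example):

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])                       # 2 x 3, not square
U, S, Vt = np.linalg.svd(M, full_matrices=False)
# M is reconstructed from the three factors
print(np.allclose(M, U @ np.diag(S) @ Vt))                                # True
# squared singular values equal the eigenvalues of MMᵗ (and of MᵗM)
print(np.allclose(np.sort(S**2), np.sort(np.linalg.eigvalsh(M @ M.T))))   # True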

SVD is more general than PCA. From the previous picture we see that SVD can handle matrices with different numbers of columns and rows. The PCA formula is M=QΛQᵗ, which decomposes a matrix into an orthogonal matrix Q and a diagonal matrix Λ. Simply put, this can be interpreted as (see the short check after this list):

  • change of basis from the standard basis to basis Q (using Qᵗ)
  • applying the transformation matrix Λ, which changes length but not direction, as it is a diagonal matrix
  • change of basis from basis Q back to the standard basis (using Q)
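
A minimal sketch of this interpretation with numpy (the symmetric matrix M and the vector x are just illustrative examples; np.linalg.eigh stands in for the eigendecomposition):

import numpy as np

M = np.array([[2., 1.],
              [1., 2.]])                           # symmetric, so M = QΛQᵗ exists
eig_vals, Q = np.linalg.eigh(M)
Lam = np.diag(eig_vals)
print(np.allclose(M, Q @ Lam @ Q.T))               # True: M = QΛQᵗ
x = np.array([1., 0.])
# applying M step by step: change basis (Qᵗ), scale (Λ), change back (Q)
print(np.allclose(M @ x, Q @ (Lam @ (Q.T @ x))))   # True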

SVD does something similar, but it doesn’t return to the same basis from which we started the transformations. It can’t, because our original matrix M isn’t a square matrix. The following picture shows the change of basis and the transformations related to SVD.

SVD transformations and change of basis. (source)

From the graph we see that SVD performs the following steps (a small numerical check follows the list):

  • change of basis from the standard basis to basis V (using Vᵗ). Note that in the graph this is shown as a simple rotation
  • applying the transformation described by matrix Σ, which scales our vector in basis V
  • change of basis from V to basis U. Because our original matrix M isn’t square, matrix U can’t have the same dimensions as V, so we can’t return to our original standard basis (see the picture “SVD matrices”)
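
This step-by-step view can be checked numerically. Below is a minimal sketch using the same example matrix as before (the vector x is arbitrary):

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])                   # 2 x 3, not square
U, S, Vt = np.linalg.svd(M, full_matrices=False)
x = np.array([1., 1., 1.])                     # a vector in the 3-d input space
step1 = Vt @ x                                 # coordinates in basis V
step2 = S * step1                              # scaling by the singular values Σ
step3 = U @ step2                              # coordinates in basis U (2-d output space)
print(np.allclose(M @ x, step3))               # True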

There are numerous variants of SVD and ways to calculate it. I’ll show just a few of them.

Ways to calculate SVD

If you want to try the coding examples yourself, use this notebook, which has all the examples used in this post.

Power iteration

Power iteration starts with b₀, which might be a random vector. At every iteration this vector is updated using the following rule:

bₖ₊₁ = Abₖ / ‖Abₖ‖ (source)

First we multiply bₖ by the original matrix A (Abₖ) and divide the result by its norm (‖Abₖ‖). We continue until the result has converged (updates are smaller than a threshold).

The power method has a few assumptions:

  • b₀ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue. Initializing b₀ randomly minimizes the possibility that this assumption is not fulfilled
  • matrix A has a dominant eigenvalue whose magnitude is strictly greater than the magnitudes of the other eigenvalues (source).

These assumptions guarantee that the algorithm converges to a reasonable result. The smaller the difference between the dominant eigenvalue and the second-largest eigenvalue, the longer convergence might take.

A very simple example of the power method can be found here. I’ve made an example which also finds the eigenvalue; the core of that function is power iteration.
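
A minimal sketch of what such a function could look like, assuming a square, symmetric input such as the covariance matrix used below:

import numpy as np

def svd_power_iteration(A, epsilon=1e-10, max_iter=1000):
    # Sketch: dominant eigenvalue/eigenvector of a square (here symmetric)
    # matrix A via power iteration: b_{k+1} = A b_k / ||A b_k||
    b_k = np.random.rand(A.shape[0])            # random start vector b_0
    b_k /= np.linalg.norm(b_k)
    for _ in range(max_iter):
        b_next = A @ b_k                        # multiply by A
        b_next /= np.linalg.norm(b_next)        # normalize
        if np.allclose(np.abs(b_next), np.abs(b_k), atol=epsilon):
            b_k = b_next                        # updates below threshold: converged
            break
        b_k = b_next
    eigen_value = b_k @ A @ b_k                 # Rayleigh quotient (A symmetric)
    return eigen_value, b_k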

For a simple example we use the beer dataset (available here). We’ll construct the covariance matrix and try to determine the dominant singular value of the dataset. The full example with data processing is available in the notebook.

# construct the data: temperature and beer consumption columns
import numpy as np
import pandas as pd

df = pd.read_csv('data/beer_dataset.csv')
X = np.array([df['Temperatura Maxima (C)'],
              df['Consumo de cerveja (litros)']]).T
C = np.cov(X, rowvar=False)                 # 2 x 2 covariance matrix
eigen_value, eigen_vec = svd_power_iteration(C)

We can plot the dominant eigenvector together with the original data.

Dominant principal component/eigenvector of the beer data

As we can see from the plot, this method really found the dominant singular value/eigenvector. It looks like it is working. You can use the notebook to verify that the results are very close to those from the svd implementation provided by numpy. Next we’ll see how to get more than just the first dominant singular value.

Power iteration for n singular values

To get more than just the most dominant singular value from a matrix, we can still use power iteration. We’ll implement a new function which uses our previous svd_power_iteration function. Finding the first dominant singular value is easy: we can use the previously mentioned function. But how do we find the second singular value?

We should remove the dominant direction from the matrix and then repeat finding the most dominant singular value (source). To do that, we can subtract the component of the previous singular vector(s) from the original matrix (using the singular values and the left and right singular vectors we have already calculated):

A_next = A − σ₁u₁v₁ᵗ (where σ₁ is the first singular value)

Here is example code (borrowed from here, with minor modifications) for calculating multiple eigenvalues/eigenvectors.
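
What follows is a minimal sketch of the deflation idea rather than a full implementation. It reuses the svd_power_iteration sketch from above and assumes the leading singular values are nonzero (so we never divide by zero):

import numpy as np

def svd(A, k=None, epsilon=1e-10):
    # Sketch: top-k singular values/vectors of A by repeatedly running power
    # iteration on the Gram matrix and deflating the dominant direction
    A = np.array(A, dtype=float)
    n_rows, n_cols = A.shape
    k = min(n_rows, n_cols) if k is None else k
    values, left_vecs, right_vecs = [], [], []
    residual = A.copy()
    for _ in range(k):
        sigma_sq, v = svd_power_iteration(residual.T @ residual, epsilon)
        sigma = np.sqrt(np.abs(sigma_sq))       # eigenvalue of AᵗA is σ²
        u = residual @ v / sigma                # from Av = σu
        values.append(sigma)
        left_vecs.append(u)
        right_vecs.append(v)
        residual = residual - sigma * np.outer(u, v)   # A_next = A − σ₁u₁v₁ᵗ
    # return shapes follow numpy's convention: columns of U, rows of Vᵗ
    return np.array(values), np.array(left_vecs).T, np.array(right_vecs)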

When we apply it to our beer dataset, we get two eigenvalues and eigenvectors.

Beer data principal components/eigenvectors

Note that this example also works with matrices that have more columns than rows, or more rows than columns. The results are comparable to numpy’s svd implementation. Here is one example:

import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])
u, s, v = np.linalg.svd(mat, full_matrices=False)
values, left_s, right_s = svd(mat)
np.allclose(np.absolute(u), np.absolute(left_s))
# True
np.allclose(np.absolute(s), np.absolute(values))
# True
np.allclose(np.absolute(v), np.absolute(right_s))
# True

To compare our custom solution’s results with the numpy svd implementation, we take absolute values because the signs in the matrices might be opposite. This means that the vectors point in opposite directions but are still on the same line, and thus are still eigenvectors.

Now that we have found a way to calculate multiple singular values/singular vectors, we might ask: could we do it more efficiently?

Block version of the Power Method

This version is also known as simultaneous power iteration or orthogonal iteration. The idea behind it is pretty straightforward (source):

  • the other eigenvectors are orthogonal to the dominant one
  • we can use the power method and force the second vector to be orthogonal to the first one
  • the algorithm then converges to two different eigenvectors
  • we can do this for many vectors, not just two of them

At each step we multiply A not by just one vector but by multiple vectors, which we put in a matrix Q. At each step we normalize the vectors using QR decomposition, which decomposes a matrix into the following components (a quick check with numpy follows the list below):

A=QR, where

  • A is the original matrix we want to decompose
  • Q is an orthogonal matrix
  • R is an upper triangular matrix
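
A quick numerical check of these properties with numpy (the 4 × 3 matrix here is random):

import numpy as np

A = np.random.rand(4, 3)
Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))              # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))    # Q has orthonormal columns
print(np.allclose(R, np.triu(R)))         # R is upper triangular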

If the algorithm converges, the columns of Q will be the eigenvectors and the diagonal of R the corresponding eigenvalues. Here is example code:
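
What follows is a minimal sketch of simultaneous (block) power iteration for SVD rather than a full, robust implementation. It assumes we iterate on the smaller of the two Gram matrices, recover the other set of singular vectors from it, and that the top-k singular values are nonzero:

import numpy as np

def svd_simultaneous_power_iteration(A, k, epsilon=1e-10, max_iter=1000):
    # Sketch: top-k singular values/vectors of A via orthogonal (block)
    # power iteration on the smaller Gram matrix
    n_rows, n_cols = A.shape
    use_cols = n_cols <= n_rows
    gram = A.T @ A if use_cols else A @ A.T     # symmetric, eigenvalues are σ²
    Q = np.random.rand(gram.shape[0], k)        # random starting block of k vectors
    Q, _ = np.linalg.qr(Q)                      # orthonormalize the block
    for _ in range(max_iter):
        Z = gram @ Q                            # multiply all vectors at once
        Q_new, R = np.linalg.qr(Z)              # re-orthonormalize via QR
        if np.allclose(np.abs(Q_new), np.abs(Q), atol=epsilon):
            Q = Q_new                           # converged
            break
        Q = Q_new
    singular_values = np.sqrt(np.abs(np.diag(R)))
    if use_cols:
        V = Q                                   # right singular vectors
        U = A @ V / singular_values             # from Av = σu
    else:
        U = Q                                   # left singular vectors
        V = A.T @ U / singular_values           # from Aᵗu = σv
    return singular_values, U, V.T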

From the code we can see that calculating the singular vectors and values is a small part of it. Much of the code is dedicated to dealing with matrices of different shapes.

If we apply this function to the beer dataset, we should get similar results as we did with the previous approach.

Beer data principal components/eigenvectors from svd_simultaneous_power_iteration

The eigenvectors point in opposite directions compared to the previous version, but they are on the same line (with some small error) and thus are the same eigenvectors. In the notebook I have examples which compare the output with the numpy svd implementation.

Conclusion

To calculate the dominant singular value and singular vector, we can start from the power iteration method. This method can be adjusted to calculate the n dominant singular values and vectors. To calculate them simultaneously, we can use the block version of power iteration. These methods are not the fastest or most stable, but they are great for learning.

References

Beer Consumption — Sao Paulo, Kaggle

Eigenvalues and Eigenvectors, Risto Hinno

How to Compute the SVD

Power Iteration, ML Wiki

Power Iteration, Wikipedia

QR decomposition, Wikipedia

Singular value decomposition, Wikipedia

Singular Value Decomposition Part 2: Theorem, Proof, Algorithm, Jeremy Kun
