SVD — A geometric viewpoint

Unpacking the visual information encoded by singular value decomposition

Artificially Intelligent
Towards Data Science


(Image by author) A quest for perpendicular axes that remain perpendicular after the action of a matrix!

SVD in textbooks

Singular value decomposition, a.k.a. SVD, of a real matrix is often taught at the tail end of a first course in linear algebra. Prosaically speaking, the singular value decomposition of a real matrix is nothing but expressing the matrix as a product of three nicer matrices.

To be more precise but still prosaic, the actual result is as follows:

Let A be an n×m real matrix. Then A=UΣVᵀ, where

  • U is an n×n orthogonal¹ matrix
  • V is an m×m orthogonal matrix
  • Σ is an n×m matrix in which every entry is zero except possibly the first r diagonal entries, where r is the rank of A.
  • Further, the non-zero diagonal entries of Σ are precisely the so-called singular values² of A, arranged in descending order.
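
As a concrete illustration of the shapes involved, here is a quick numpy check on a random 3×2 matrix (a sketch of my own, not part of the original statement):

import numpy as np

# For an n×m matrix (here n=3, m=2), U is n×n and Vᵀ is m×m; numpy returns
# the singular values as a 1-D array of length min(n, m).
M = np.random.rand(3, 2)
U, s, VT = np.linalg.svd(M)
print(U.shape, s.shape, VT.shape)   # (3, 3) (2,) (2, 2)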

If that result looks a bit tame, let’s try to understand its power and utility with the aid of an example.

Information encoded by a matrix

Here’s A, a very down-to-earth 2×2 matrix:

(Image by author) A is the matrix [[6,2],[-7,6]]

It actually represents a linear map T: R² → R² with respect to the standard basis e₁=(1,0) and e₂=(0,1). The linear map T sends the usual basis elements e₁ ⇝ (6,−7) and e₂ ⇝ (2,6). Now, not only do we know how T acts on e₁ and e₂, we actually know how it acts on any (a,b), since by linearity T(a,b) = a·T(e₁) + b·T(e₂). For instance,

(Image by author) Finding what T(2,3) is
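
In code, this is just the matrix-vector product A(2,3)ᵀ; here is a minimal numpy check of the computation above (my own sketch, using the same matrix A):

import numpy as np

# T(2,3) = 2·T(e1) + 3·T(e2) = 2·(6,-7) + 3·(2,6) = (18, 4),
# which is exactly the matrix-vector product A @ (2, 3).
A = np.array([[6, 2], [-7, 6]])
print(A @ np.array([2, 3]))        # [18  4]
print(2 * A[:, 0] + 3 * A[:, 1])   # [18  4], by linearity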

Let’s visualize how our linear transformation T acts on the standard coordinate axes. We can see that the perpendicular axes e₁, e₂ get distorted by T. The images {T(e₁), T(e₂)} are no longer perpendicular.

(Image by author) Action of T on usual axes

Finding perpendicular axes that stay perpendicular

Because it is always easier to work with perpendicular axes, we would like to find perpendicular axes of our domain (or rather unit vectors along these axes) which remain perpendicular even after applying T. Clearly {e₁, e₂} was not a good choice. Is there a good choice? If so, how can we find it?

A basis of a vector space consisting of unit vectors which are pairwise perpendicular to each other is called an orthonormal basis of the vector space. Thus, our question in the more general context can be phrased as follows:

Given a linear map T: Rᵐ → Rⁿ (which is represented by an n×m matrix A with respect to the standard basis), can we find an orthonormal basis {v_1, v_2, …, v_m} of Rᵐ such that {T(v_1), T(v_2), …, T(v_m)} are still mutually perpendicular³ to each other?

Enter SVD!

The singular value decomposition of A precisely answers the above question!

Let the SVD of A be A=UΣVᵀ

(Image by author) Finding axes that stay perpendicular after action of T

Recall V in the SVD of A is an m×m orthogonal matrix. Thus, the columns of V form an orthonormal basis {v_1,v_2,…,v_m} of Rᵐ.

Similarly, U is an orthogonal n×n matrix, so the columns of U form an orthonormal basis {u_1, u_2, …, u_n} of Rⁿ.

But how are U and V related?

Right-multiplying the equation A=UΣVᵀ by V, gives

AV=(UΣVᵀ)V=UΣ(VᵀV)=UΣ

So the columns of UΣ are precisely {T(v_1), T(v_2), …, T(v_m)} ⊂ Rⁿ.

Further, since Σ is an n×m matrix whose only non-zero entries (if any) are the diagonal entries Σᵢ,ᵢ, the columns of UΣ are just the first m columns of U scaled by the respective diagonal entries of Σ.

(Image by author) What UΣ looks like

Thus T(v_i) = Σᵢ,ᵢ·u_i, and these images are mutually perpendicular to each other. Since the u_i are unit vectors, the length of T(v_i) is precisely the diagonal entry Σᵢ,ᵢ.
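
For the curious, here is a quick numerical check of this claim on our 2×2 example, using numpy’s SVD (which we come back to below); a sketch of my own, not part of the derivation:

import numpy as np

# For each column v_i of V, A @ v_i should equal the corresponding
# diagonal entry of Σ (i.e. s[i]) times the column u_i of U.
A = np.array([[6, 2], [-7, 6]])
U, s, VT = np.linalg.svd(A)
V = VT.T
for i in range(2):
    print(A @ V[:, i], s[i] * U[:, i])   # the two vectors agree for each i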

Summary

Given a linear map T: Rᵐ → Rⁿ (which is represented by an n×m matrix A with respect to the standard basis), let A=UΣVᵀ be the singular value decomposition of A. Then the columns of V form an orthonormal basis of Rᵐ. Further, the {T(v_i)} still remain perpendicular to each other in Rⁿ.

The lengths of the vectors {T(v_i)} are encoded by the diagonal entries of Σ. The unit vectors along {T(v_i)} can be completed to an orthonormal basis of Rⁿ, which is precisely given by the columns of U.

So, you see the SVD of a matrix gives us a LOT of very useful information!

Uniqueness of the SVD

Since the non-zero diagonal entries of Σ are precisely the singular values of A in descending order, Σ is unique for a given matrix A.

However, the columns of V (and hence U) are not unique. For a start, the columns of V can vary up to sign (and since T(v_i) is just u_i scaled by a constant, the sign of u_i changes along with it). To see why U and V need not be unique even beyond a change of sign, think about what the singular value decomposition of a rotation matrix A is.
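
To make the sign ambiguity concrete, here is a small numpy sketch of my own (not from the article) showing that flipping a matched column of U and of V yields another perfectly valid SVD of the same matrix:

import numpy as np

# Flip the sign of the first column of U together with the first column of V
# (equivalently, the first row of Vᵀ); the product UΣVᵀ is unchanged.
A = np.array([[6, 2], [-7, 6]])
U, s, VT = np.linalg.svd(A)
U2, VT2 = U.copy(), VT.copy()
U2[:, 0] *= -1
VT2[0, :] *= -1
print(np.allclose(U2 @ np.diag(s) @ VT2, A))   # True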

SVD computation code

Of course, one can (and should know how to) compute the SVD by hand. But that’s a post for another day. For now, we’ll use the SVD implementation in numpy.

import numpy as np

# The example matrix from above
A = np.array([[6, 2], [-7, 6]])

# numpy returns U, the singular values as a 1-D array s, and V transpose
U, s, VT = np.linalg.svd(A)
print(U)
print(s)
print(VT)

The above code gives

>>> U
array([[-0.4472136 ,  0.89442719],
       [ 0.89442719,  0.4472136 ]])
>>> s
array([10.,  5.])
>>> VT
array([[-0.89442719,  0.4472136 ],
       [ 0.4472136 ,  0.89442719]])
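
As a small sanity check of this output (mine, not part of the article), the three factors multiply back to A and U is indeed orthogonal:

import numpy as np

# U @ diag(s) @ Vᵀ reconstructs A, and UᵀU is the identity (see footnote 1).
A = np.array([[6, 2], [-7, 6]])
U, s, VT = np.linalg.svd(A)
print(np.allclose(U @ np.diag(s) @ VT, A))   # True
print(np.allclose(U.T @ U, np.eye(2)))       # True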

Wrapping up the example

(Image by author) SVD of A using numpy code

You can check that 100 and 25 are the eigenvalues of AᵀA, and hence 10 and 5 are the singular values of A.
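
A quick numpy check of this claim (again a sketch of my own, using the same A):

import numpy as np

# The eigenvalues of AᵀA are 100 and 25; their square roots, 10 and 5,
# are the singular values of A.
A = np.array([[6, 2], [-7, 6]])
print(np.linalg.eigvals(A.T @ A))   # [100.  25.] (possibly in a different order)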

If you actually work out the SVD by hand, you’ll see that this corresponds to

(Image by author) SVD of A by hand

Footnotes

  1. A square matrix X is orthogonal iff XᵀX = XXᵀ is the identity matrix.
  2. The singular values of an n×m matrix A are the square roots of the eigenvalues of AᵀA, listed with their algebraic multiplicities.
  3. By definition, the zero vector, 0, is perpendicular to every vector.
