SVD — A geometric viewpoint
Unpacking the visual information encoded by singular value decomposition
SVD in textbooks
Singular value decomposition (SVD) of a real matrix is often taught at the tail end of a first course in linear algebra. Prosaically speaking, the singular value decomposition of a real matrix expresses it as a product of three nicer matrices.
To be more precise but still prosaic, the actual result is as follows:
Let A be an n×m real matrix. Then A=UΣVᵀ
- U is an n×n orthogonal¹ matrix
- V is an m×m orthogonal matrix
- Σ is an n×m matrix in which every entry is zero except (possibly) the first r diagonal entries, where r is the rank of A.
- Further the non-zero diagonal entries of Σ are precisely the so-called singular values² of A arranged in descending order.
If that looks a bit tame, let’s try to understand the power and utility of this result with the aid of an example.
Information encoded by a matrix
Here’s A, a very down-to-earth 2×2 matrix:

A = [  6   2 ]
    [ −7   6 ]

It represents a linear map T:R²→R² with respect to the standard basis e₁=(1,0) and e₂=(0,1). The linear map T sends the basis elements e₁⇝(6,−7) and e₂⇝(2,6) (these are precisely the columns of A). Now, not only do we know how T acts on e₁ and e₂, we know how it acts on any (a,b), since by linearity T(a,b)=aT(e₁)+bT(e₂)=(6a+2b, −7a+6b).
Let’s visualize how our linear transformation T acts on the standard coordinate axes. We can see that the perpendicular axes e₁, e₂ get distorted by T: the images {T(e₁), T(e₂)} are no longer perpendicular.
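We can check this claim numerically. The images of e₁ and e₂ are just the columns of A, and their dot product is nonzero:

```python
import numpy as np

A = np.array([[6, 2],
              [-7, 6]])

# The images T(e1), T(e2) are the columns of A.
Te1, Te2 = A[:, 0], A[:, 1]

# 6*2 + (-7)*6 = -30 != 0, so the images are not perpendicular.
print(np.dot(Te1, Te2))  # -30
```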
Finding perpendicular axes that stay perpendicular
Because it is always easier to work with perpendicular axes, we would like to find perpendicular axes of our domain (or rather unit vectors along these axes) which remain perpendicular even after applying T. Clearly {e₁, e₂} was not a good choice. Is there a good choice? If so, how can we find it?
A basis of a vector space consisting of unit vectors which are pairwise perpendicular to each other is called an orthonormal basis of the vector space. Thus our question in the more general context can be phrased as follows:
Given a linear map T:Rᵐ→Rⁿ (which is represented by an n×m matrix A with respect to the standard bases), can we find an orthonormal basis {v_1,v_2,…,v_m} of Rᵐ such that {T(v_1),T(v_2),…,T(v_m)} are still mutually perpendicular³ to each other?
Enter SVD!
The singular value decomposition of A precisely answers the above question!
Let the SVD of A be A=UΣVᵀ
Recall V in the SVD of A is an m×m orthogonal matrix. Thus, the columns of V form an orthonormal basis {v_1,v_2,…,v_m} of Rᵐ.
Similarly, U is an orthogonal n×n matrix, so the columns of U form an orthonormal basis {u_1,u_2,…,u_n} of Rⁿ.
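As a quick numerical sanity check (using numpy on the 2×2 example above): a matrix Q is orthogonal exactly when QᵀQ is the identity, so both V and U should pass this test.

```python
import numpy as np

A = np.array([[6, 2],
              [-7, 6]])
U, s, VT = np.linalg.svd(A)
V = VT.T  # numpy returns V transposed

# The columns of V and of U form orthonormal bases.
print(np.allclose(V.T @ V, np.eye(2)))  # True
print(np.allclose(U.T @ U, np.eye(2)))  # True
```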
But how are U and V related?
Right-multiplying the equation A=UΣVᵀ by V gives
AV=(UΣVᵀ)V=UΣ(VᵀV)=UΣ
So the columns of AV=UΣ are precisely {T(v_1),T(v_2),…,T(v_m)} ⊂ Rⁿ.
Further, since Σ is an n×m matrix whose only nonzero entries (if any) are the diagonal entries Σ_{i,i}, the columns of UΣ are just the first m columns of U scaled by the respective diagonal entries of Σ.
Thus T(v_i)=Σ_{i,i} u_i, and these vectors are mutually perpendicular to each other. Since the u_i are unit vectors, the length of T(v_i) is precisely the diagonal entry Σ_{i,i}.
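Here is a sketch of these facts for our 2×2 example, verified with numpy:

```python
import numpy as np

A = np.array([[6, 2],
              [-7, 6]])
U, s, VT = np.linalg.svd(A)
V = VT.T

# Images of the columns of V under T.
Tv1, Tv2 = A @ V[:, 0], A @ V[:, 1]

# T(v1) and T(v2) are still perpendicular...
print(np.dot(Tv1, Tv2))  # ~0

# ...their lengths are the singular values...
print(np.linalg.norm(Tv1), np.linalg.norm(Tv2))  # 10.0 5.0

# ...and T(v_i) = sigma_i * u_i.
print(np.allclose(Tv1, s[0] * U[:, 0]))  # True
print(np.allclose(Tv2, s[1] * U[:, 1]))  # True
```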
Summary
Given a linear map T:Rᵐ→Rⁿ (which is represented by an n×m matrix A with respect to the standard bases), let A=UΣVᵀ be the singular value decomposition of A. Then the columns of V form an orthonormal basis {v_1,…,v_m} of Rᵐ, and the images {T(v_i)} remain mutually perpendicular in Rⁿ.
The lengths of the vectors {T(v_i)} are encoded by the diagonal entries of Σ. The unit vectors along the nonzero {T(v_i)} can be completed to an orthonormal basis of Rⁿ, which is precisely given by the columns of U.
So, you see the SVD of a matrix gives us a LOT of very useful information!
Uniqueness of the SVD
Since the non-zero diagonal entries of Σ are precisely the singular values of A in descending order, Σ is unique for a given matrix A.
However, the columns of V (and hence U) are not unique. For a start, the columns of V can vary up to sign (and since T(v_i) is just u_i scaled by a constant, u_i’s sign will change too). To see why U,V need not be unique and not just up to sign, think about what the singular value decomposition is for a rotation matrix A.
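To see this concretely, consider a 90° rotation. Its singular values are all 1, so Σ is the identity, and then U = AV gives a valid decomposition for every orthogonal V. A quick numerical illustration:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

U, s, VT = np.linalg.svd(R)
print(s)  # [1. 1.]  -- Sigma is the identity

# A rotation maps any orthonormal basis {v1, v2} to the orthonormal
# basis {R v1, R v2}, so U = R V works for every orthogonal V; in
# particular A = U @ Sigma @ VT reduces to U @ VT = R.
print(np.allclose(U @ VT, R))  # True
```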
SVD computation code
Of course, one can (and should know how to) compute the SVD by hand. But that’s a post for another day. For now, we’ll use the SVD implementation in numpy.
import numpy as np
A = np.array([[6, 2],
              [-7, 6]])
U, s, VT = np.linalg.svd(A)
print(U)
print(s)
print(VT)
The above code gives (values shown as they appear in an interactive session):

>>> U
array([[-0.4472136 ,  0.89442719],
       [ 0.89442719,  0.4472136 ]])
>>> s
array([10.,  5.])
>>> VT
array([[-0.89442719,  0.4472136 ],
       [ 0.4472136 ,  0.89442719]])
Wrapping up the example
You can check that 100 and 25 are the eigenvalues of AᵀA, and hence 10 and 5 are the singular values of A.
If you actually work out the SVD by hand, you’ll see that this corresponds to v_1=(−2,1)/√5, v_2=(1,2)/√5 and u_1=(−1,2)/√5, u_2=(2,1)/√5, exactly the columns of V and U reported by numpy.
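The eigenvalue claim can also be verified directly in numpy (np.linalg.eigvalsh applies here, since AᵀA is symmetric):

```python
import numpy as np

A = np.array([[6, 2],
              [-7, 6]])

# Eigenvalues of A^T A, returned in ascending order: [25., 100.]
evals = np.linalg.eigvalsh(A.T @ A)

# Their square roots, in descending order, are the singular values.
print(np.sqrt(evals)[::-1])  # [10.  5.]
```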
Footnotes
- A square matrix X is orthogonal iff XᵀX = XXᵀ is the identity matrix
- The singular values of an n×m matrix A are the square roots of the eigenvalues of AᵀA listed with their algebraic multiplicities.
- By definition, the zero vector, 0, is perpendicular to every vector