Symmetric matrices are matrices that are symmetric about the main diagonal, which means Aᵀ = A: the transpose of the matrix equals the matrix itself. Viewed as an operator, a symmetric matrix is self-adjoint (and it is indeed a big deal to think of a matrix as an operator and study its properties). Although we cannot read the geometric meaning directly off the symmetry, the most intuitive explanation of a symmetric matrix lies in its eigenvectors, which will give us a deeper understanding of these matrices.

A trivial example is the identity matrix. A non-trivial example can be something like:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$$
However, simple as the definition is, symmetric matrices have a lot of nice properties and show up in many places. In this post, we will look at the most important properties, explain them intuitively, and introduce some applications.
The Hermitian matrix is the complex extension of the symmetric matrix: in a Hermitian matrix, all the entries satisfy

$$a_{ij} = \overline{a_{ji}}, \qquad \text{i.e.} \qquad A^H = A$$
Real symmetric matrices are simply Hermitian matrices with real entries, since conjugation does nothing to real numbers. Therefore, a symmetric matrix has all the properties that a Hermitian matrix has.

In this post we will mostly talk about the real case, symmetric matrices, to keep things a bit simpler; in data science we also mostly encounter matrices with real entries, since we are dealing with real-world problems.
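Before moving on, here is a small MATLAB sketch of the two definitions (the matrices are made up purely for illustration). Note that in MATLAB the apostrophe ' is the conjugate transpose, while .' is the plain transpose:
S = [1 2 3; 2 4 5; 3 5 6];    % a real symmetric matrix: S equals its transpose
H = [2 1+1i; 1-1i 3];         % a complex Hermitian matrix: H equals its conjugate transpose
isequal(S, S.')               % returns 1 (true)
isequal(H, H')                % returns 1 (true); recent MATLAB versions also offer issymmetric and ishermitian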
The most important properties of symmetric matrices
Three properties of symmetric matrices are introduced in this section. They are considered the most important because they concern the behavior of the eigenvalues and eigenvectors of those matrices, which is the fundamental characteristic that distinguishes symmetric matrices from non-symmetric ones.
Property 1. Symmetric matrices have real eigenvalues.
This can be proved easily with algebra (a formal, direct proof, as opposed to induction, contradiction, etc.). First, a quick recap of eigenvalues and eigenvectors: the eigenvectors of a matrix A are the vectors whose direction does not change when A is applied to them; they are only scaled. This also shows why the property is non-trivial: real eigenvalues give us information about stretching or scaling in the linear transformation, unlike complex eigenvalues, which cannot be read as a plain scaling factor.
The factors by which the eigenvectors are scaled are the eigenvalues, which we denote by λ. Therefore we have the relation Ax = λx (Eq 1.1). The proof is fairly easy, but it uses some important pieces of linear algebra, so we will still go through it step by step.
Starting from here, we multiply Equation 1.1 from the left by the conjugate transpose of x, denoted xᴴ, and arrive at

$$x^H A x = x^H \lambda x \quad \text{(Eq 1.2)}$$
How do we move on? An important thing to notice is that λ is a scalar, which means multiplication involving λ is commutative. Therefore, we can move it to the left of xᴴ:

$$x^H A x = \lambda \, x^H x \quad \text{(Eq 1.3)}$$
xᴴx is the square of the Euclidean norm (or 2-norm) of x, which is defined as

$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2}, \qquad \text{so that} \qquad x^H x = \sum_{i=1}^{n} \overline{x_i}\, x_i = \|x\|_2^2 \quad \text{(Eq 1.4)}$$
In n-dimensional Euclidean space, ‖x‖₂ is simply the length of the vector with coordinates (x₁, …, xₙ). We can then write Eq 1.3 as

$$x^H A x = \lambda \, \|x\|_2^2 \quad \text{(Eq 1.5)}$$
Since A is real and symmetric, Aᴴ = Aᵀ = A, and since the conjugate transpose (operator H) reverses the order of a product just like the ordinary transpose (operator T), we can use the property that xᴴA = xᴴAᴴ = (Ax)ᴴ. The left-hand side of Eq 1.3 can therefore also be written as

$$x^H A x = (Ax)^H x \quad \text{(Eq 1.6)}$$
What is (Ax)ᴴ equal to? Here we use the relation Ax = λx again, but this time the conjugate transpose leaves us with the complex conjugate of λ, which we denote by λ̄ (λ with a bar):

$$x^H A x = (Ax)^H x = (\lambda x)^H x = \bar{\lambda}\, x^H x \quad \text{(Eq 1.7)}$$
We have seen xᴴx before in Eq 1.3. After substituting in the squared 2-norm again (we can do this because both λ̄ and xᴴx are scalars) and comparing with Eq 1.5, we get

$$\lambda \, \|x\|_2^2 = \bar{\lambda} \, \|x\|_2^2 \quad \text{(Eq 1.8)}$$
Since x is an eigenvector, it is non-zero, so ‖x‖₂² ≠ 0 and we can divide both sides by it. This leads to λ and its complex conjugate being equal:

$$\lambda = \bar{\lambda} \quad \text{(Eq 1.9)}$$
Equation 1.9 is valid only under one circumstance: λ being real. With this, we have finished the proof.
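Before moving to the next property, here is a quick numerical check of Property 1 in MATLAB (a small sketch; the matrices are made up for illustration):
A = [1 2 3; 2 4 5; 3 5 6];   % symmetric, so its eigenvalues must be real
B = [0 -1; 1 0];             % not symmetric: a 90-degree rotation
eig(A)                       % three real numbers
eig(B)                       % the complex pair 0+1i and 0-1i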
Property 2. The eigenvectors corresponding to different eigenvalues are orthogonal.
The proof is also a direct, formal proof, and it is simple. First we need to state our target, which is the following:

$$x_1 \cdot x_2 = x_1^T x_2 = 0 \quad \text{(Eq 1.10)}$$
Consider a symmetric matrix A, where x₁ and x₂ are eigenvectors of A corresponding to different eigenvalues (the reason why we need this condition will be explained a bit later). By the definition of eigenvalues and symmetric matrices, we get the following equations:

$$A x_1 = \lambda_1 x_1 \quad \text{(Eq 1.11)}, \qquad A x_2 = \lambda_2 x_2 \quad \text{(Eq 1.12)}, \qquad \text{with } \lambda_1 \neq \lambda_2 \text{ and } A^T = A$$
Now we need to work toward Equation 1.10. Let's try to bring x₁ and x₂ together: we start from λ₁(x₁ᵀx₂), replace λ₁x₁ by Ax₁, and use the symmetry of A:

$$\lambda_1 (x_1^T x_2) = (\lambda_1 x_1)^T x_2 = (A x_1)^T x_2 = x_1^T A^T x_2 = x_1^T A x_2 = x_1^T (\lambda_2 x_2) = \lambda_2 (x_1^T x_2) \quad \text{(Eq 1.13)}$$
In Eq 1.13, apart from the symmetry of A, two other facts are used: (1) matrix multiplication is associative (vectors are n-by-1 matrices) and (2) matrix-scalar multiplication is commutative, so we can move the scalar freely. Then, since the dot product is commutative (x₁ᵀx₂ and x₂ᵀx₁ are the same number), we can write x₁ᵀx₂ simply as the dot product x₁∙x₂, and comparing the two ends of the chain gives

$$(\lambda_1 - \lambda_2)\,(x_1 \cdot x_2) = 0 \quad \text{(Eq 1.14)}$$
where x₁∙x₂ denotes the dot product. If λ₁ ≠ λ₂, it must be the case that x₁∙x₂ = 0, which means the two eigenvectors are orthogonal. If λ₁ = λ₂, there are two different eigenvectors corresponding to the same eigenvalue (this can happen; think of the trivial example above, the identity matrix). Since the eigenvectors lie in the null space of A − λI (denoted N(A − λI)), when one eigenvalue corresponds to multiple independent eigenvectors, N(A − λI) has dimension larger than one. In that case we have infinitely many choices for those eigenvectors, and we can always choose them to be orthogonal.
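This can also be checked numerically. In the sketch below (with a made-up symmetric matrix), the eigenvector matrix returned by eig is orthogonal, so VᵀV is numerically the identity:
A = [2 1 0; 1 2 1; 0 1 2];   % a symmetric matrix with three distinct eigenvalues
[V, D] = eig(A);             % the columns of V are the eigenvectors
V'*V                         % approximately the 3-by-3 identity matrix
norm(V'*V - eye(3))          % on the order of machine precision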
Of course, there are cases where a real (non-symmetric) matrix has complex eigenvalues. This happens with rotation matrices. Why is that? Let Q be a rotation matrix. We know that eigenvectors do not change direction after Q is applied to them. But if Q rotates everything, how can a non-zero real vector x keep its direction? The conclusion is that the eigenvectors (and the corresponding eigenvalues) must be complex.
A rotation matrix R(θ) in two-dimensional space looks as follows:

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{(Eq 1.15)}$$
R(θ) rotates a vector counterclockwise by an angle θ. It is a real matrix with complex eigenvalues and eigenvectors.
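A quick check in MATLAB (θ = π/4 is chosen arbitrarily for this sketch):
theta = pi/4;                                          % rotate by 45 degrees
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
eig(R)                        % cos(theta) +/- 1i*sin(theta): a complex conjugate pair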
Property 3. Symmetric matrices are always diagonalizable. (The spectral theorem).
This is also related to the other two properties of symmetric matrices. The name of the theorem might be confusing; in fact, the set of all the eigenvalues of a matrix is called its spectrum, which is where the name comes from. We can also think about it like this:
the eigenvalue-eigenvector pairs tell us in which directions, and by how much, a vector is distorted by the given linear transformation.
This is shown in the following figure: after the transformation, the shape is stretched a lot in the direction of v₁, but not very much in the direction of v₂.

That a matrix A is diagonalizable means there exists a diagonal matrix D (all the entries outside the diagonal are zeros) such that P⁻¹AP = D, where P is an invertible matrix. Equivalently, a matrix is diagonalizable if it can be written in the form A = PDP⁻¹.
The decomposition is generally not unique, but it is unique up to permutation of the entries on the diagonal of D and scalar multiplication of the eigenvectors in P. We also need to note that diagonalization is equivalent to finding the eigenvectors and eigenvalues, whether the matrix is symmetric or not. However, for a non-symmetric matrix, P does not have to be an orthogonal matrix, whereas for a symmetric matrix Property 2 allows us to choose P orthogonal, with the orthonormal eigenvectors as its columns.
The two formulations are equivalent but can be interpreted differently (as a side benefit, the decomposition makes raising a matrix to a power very handy, since Aᵏ = PDᵏP⁻¹ and powers of a diagonal matrix are trivial). The second one, A = PDP⁻¹, tells us how A can be decomposed, while the first one, P⁻¹AP = D, is the one that tells us A can be diagonalized: it is possible to align the standard basis (given by the identity matrix) with the eigenvectors. For symmetric matrices this is enabled by the orthogonality of the eigenvectors, which is shown in Property 2.
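As a small illustration of the matrix-power remark above (the matrix below is made up for the sketch), once A = PDP⁻¹ is known, Aᵏ collapses to PDᵏP⁻¹, and raising the diagonal matrix D to the k-th power only means raising its diagonal entries:
A = [2 1; 1 3];          % an arbitrary symmetric matrix for illustration
[P, D] = eig(A);         % A = P*D*inv(P); here P is even orthogonal
k = 5;
norm(P*D^k/P - A^k)      % practically zero: P*D^k*inv(P) equals A^k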
This "align the standard basis with the eigenvectors" sounds very abstract. We need to think about this: what does matrix transformation do to the basis? A matrix consisting of basis α = {v₁, …, vₙ} (with those vectors in the column) transforms a vector x from the standard basis to the coordinate system formed by basis α, we denote this matrix by Aα. Therefore, in the process of the diagonalization P⁻¹AP = D, P sends a vector from the standard basis to the eigenvectors, A scales it, and then P⁻¹ sends the vector back to the standard basis. From the perspective of the vector, the coordinate system is aligned with the standard basis with the eigenvectors.

This alignment is shown in Figure 1.16; the matrix used in this example is

where V is the matrix with the unit-length eigenvectors as its columns, each of which corresponds to an eigenvalue in the diagonal matrix. As for the calculation, we can let eig in Matlab do the work.
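The specific matrix from the figure is not reproduced here, but the pattern is the same for any symmetric matrix; a minimal sketch with a made-up A looks like this:
A = [4 1 1; 1 3 0; 1 0 2];   % any symmetric matrix will do for this sketch
[V, D] = eig(A);             % V: unit-length eigenvectors as columns, D: eigenvalues on the diagonal
norm(A - V*D*V')             % practically zero: A = V*D*V', and V' = inv(V) since V is orthogonal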
This property follows directly from the spectral theorem, which says:
If A is Hermitian, there exists an orthonormal basis of the underlying vector space V consisting of eigenvectors of A, and each eigenvalue is real.
The theorem directly points out a way to diagonalize a symmetric matrix. To prove the property directly, we can use induction on the size (dimension) of the matrix; a detailed proof can be found in standard references such as [3].
The very basic idea of the proof: the base case, where A is a 1-by-1 matrix, is trivial. For the inductive step, take one eigenvector of the n-by-n matrix; the (n−1)-dimensional subspace orthogonal to it is mapped into itself by A, so by the induction hypothesis the restriction of A to that subspace has n−1 orthogonal eigenvectors. Together with the first eigenvector, this gives n orthogonal eigenvectors, so the n-by-n matrix is also diagonalizable.
Definiteness of matrices
When are those properties useful? Even before matrices were studied formally, they had long been used for solving systems of linear equations. Thinking of matrices as operators, the information of such a system is stored in the operator; in the same spirit, matrices can be used to study the behavior of functions.
Beyond symmetry, an even nicer property a matrix can have is positive-definiteness. A symmetric (or Hermitian) matrix is positive definite exactly when all of its eigenvalues are positive; if all of its eigenvalues are non-negative, it is positive semi-definite. For a matrix to be called positive definite, it is natural to require it to be symmetric: by Property 1 its eigenvalues are then real, and it only makes sense to ask whether a number is positive or negative, or how large it is, when it is real, as mentioned before.
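In MATLAB, one way to test this (a sketch, not the only way) is to look at the eigenvalues directly, or to attempt a Cholesky factorization, which succeeds exactly when the matrix is positive definite:
A = [2 -1 0; -1 2 -1; 0 -1 2];   % a classic positive-definite matrix
B = [1 0; 0 -3];                 % symmetric but indefinite
all(eig(A) > 0)                  % 1 (true): every eigenvalue is positive
all(eig(B) > 0)                  % 0 (false): one eigenvalue is negative
[~, p] = chol(A);                % p == 0 means the factorization succeeded, i.e. A is positive definite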
Eigenvalues, eigenvectors, and function behavior
A good application of this is the Hessian matrix; we will use it as an example to demonstrate how matrices can be used to analyze the behavior of a function. When we try to find a local extremum, it is always good news to find out that the Hessian is positive definite. The Hessian is the matrix of the second partial derivatives of a real-valued function. Formally, let f: ℝⁿ ➝ ℝ be a twice-differentiable function; the Hessian is defined as

$$\big(H(x)\big)_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}(x) \quad \text{(Def 1.18)}$$
and we call H(x) the Hessian of f, which is an n-by-n matrix. In Def 1.18 the Hessian is written very compactly; it is the same thing as

$$H(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[2ex] \vdots & \vdots & \ddots & \vdots \\[2ex] \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
How does this relate to function behavior? We will look at one very simple example. Consider the function f(x, y) = x² − y². The Hessian is computed as follows:

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2ex] \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$$
It can also be computed using the function hessian in Matlab (a short symbolic sketch is given after the plotting code below). Since H is a diagonal matrix, and the trace (the sum of the entries on the diagonal) equals the sum of the eigenvalues, we can immediately see that one of the eigenvalues is 2 and the other one is −2. They correspond to the eigenvectors v₁ = [1, 0]ᵀ and v₂ = [0, 1]ᵀ. This matrix is symmetric but not positive definite. Therefore, there is no local extremum on the whole of ℝ²; we can only find a saddle point at x = 0, y = 0. This means that in the direction of v₁, where the eigenvalue is positive, the function increases, but in the direction of v₂, where the eigenvalue is negative, the function decreases. The image of the function is shown below.

The image is generated using Matlab with the following code
clc; clear; close all            % start from a clean workspace and close old figures
xx = -5: 0.2: 5;                 % sample points along the x-axis
yy = -5: 0.2: 5;                 % sample points along the y-axis
[x, y] = meshgrid(xx, yy);       % 2-D grid of (x, y) coordinates
zz = x.^2 - y.^2;                % evaluate f(x, y) = x^2 - y^2 on the grid
figure
mesh(x, y, zz)                   % mesh (surface) plot of the saddle
view(-10, 10)                    % adjust the viewing angle
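Assuming the Symbolic Math Toolbox is available, the Hessian mentioned above can also be obtained symbolically; the same two commands work for the modified function in the next paragraph:
syms x y
f = x^2 - y^2;               % the saddle example
H = hessian(f, [x, y])       % symbolic Hessian: [2 0; 0 -2]
eig(double(H))               % eigenvalues -2 and 2: indefinite, so (0, 0) is a saddle point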
Now we flip the sign of the second term and change the function into f(x, y) = x² + y². The eigenvectors remain the same, but now both eigenvalues are positive, so the Hessian is positive definite. This means that in both the direction of v₁ and the direction of v₂ the function grows. Therefore, a local minimum can be found at x = 0, y = 0, where f(x, y) = 0; this is also the global minimum. The graph looks like

Summary
Matrices are widely used in numerous fields. When dealing with them, the concepts of positive-definiteness, eigenvectors, eigenvalues, and symmetric matrices are often encountered. In this post, the three most important properties of symmetric (Hermitian) matrices are introduced, all of which are related to the eigenvectors and eigenvalues of those matrices. The properties are explained geometrically, but some algebraic proofs are also included. Finally, one example of using matrices to analyze the behavior of a function is given.
Resources:
[1] Weisstein, Eric W. "Hermitian Matrix." From MathWorld – A Wolfram Web Resource.
[2] Symmetric matrices, accessed 19 September 2021.
[3] Horn, R. A., & Johnson, C. R. (2012). _Matrix analysis_. Cambridge university press.