
Intro
I first learned about eigenvalues and eigenvectors at university in a linear algebra course. It was very dry and mathematical, so I did not get what it was all about. In this post I want to present the topic in a more intuitive way, and I will use many animations to illustrate it.
First, we will look at how applying a matrix to a vector rotates and scales it. This will show us what eigenvalues and eigenvectors are. Then we will learn about principal components and see that they are the eigenvectors of the covariance matrix. This knowledge will help us understand our final topic, principal component analysis.
Matrix multiplication
To understand eigenvalues and eigenvectors, we have to first take a look at matrix multiplication.
Let’s consider the following matrix.

When we multiply a matrix with a vector (take their dot product), the resulting vector is a rotated and scaled version of the original one.


In Data Science, we mostly talk about data points rather than vectors, but they are essentially the same and can be used interchangeably. Data points are transformed by matrix multiplication in the same way as vectors.
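To make this concrete, here is a small numpy example. The matrix M below is just a stand-in I chose for illustration, not the matrix shown in the animations.
import numpy as np
# a 2x2 stand-in matrix (illustrative, not the one from the animations)
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 0.0])          # original vector / data point
w = M @ v                         # transformed vector
print(w)                                       # [2. 1.]
print(np.linalg.norm(v), np.linalg.norm(w))    # the length changes (scaling)
print(np.degrees(np.arctan2(w[1], w[0])))      # the direction changes (rotation)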

Even though matrix multiplication rotates and scales vectors, it is a linear transformation.
Why is matrix multiplication a linear transformation? Consider a bunch of data points (shown in red). Imagine a grid on which these points are located. When we apply the matrix to our data points and move the grid along with the data points, we see that the lines of the grid remain straight. If the lines curved, the transformation would be non-linear.
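We can also check linearity numerically: transforming a linear combination of two vectors gives the same result as combining the transformed vectors. This is again only a sketch using the stand-in matrix from above.
import numpy as np
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # stand-in matrix, as above
u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])
a, b = 2.0, -1.5
# linearity: M(a*u + b*v) == a*M(u) + b*M(v)
print(np.allclose(M @ (a * u + b * v), a * (M @ u) + b * (M @ v)))   # True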

Eigenvectors
We consider the same matrix as above.

When applying this matrix to different vectors, they behave differently. Some of them get rotated and scaled, some only rotated, some only scaled, and some do not change at all.
Eigenvectors are the vectors which
- only get scaled,
- or **do not change at all**.

You can see that the eigenvectors stay on the same line, while other (generic) vectors get rotated by some angle.

A 2×2 matrix has at most two independent eigenvector directions, and they are not always orthogonal to each other.
Eigenvalues
Each eigenvector has a corresponding eigenvalue. It is the factor by which the eigenvector gets scaled when it is transformed by the matrix. We consider the same matrix, and therefore the same two eigenvectors, as above.

One of the two eigenvectors of this matrix (I call it Eigenvector 1, but this is arbitrary) is scaled by a factor of 1.4.

Eigenvector 2 also gets scaled by a factor of 1.4, but its direction gets inverted. Therefore, eigenvalue 2 is -1.4.
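We can verify this behaviour with numpy. Since the matrix from the animations is not reproduced here, I use a stand-in matrix that also has the eigenvalues 1.4 and -1.4.
import numpy as np
M = np.array([[0.0, 1.4],
              [1.4, 0.0]])        # stand-in matrix with eigenvalues 1.4 and -1.4
eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)                # 1.4 and -1.4 (order may vary)
# applying M to an eigenvector only scales it by its eigenvalue
v1 = eigenvectors[:, 0]
print(M @ v1)
print(eigenvalues[0] * v1)        # same vector as above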

Applications of eigenvectors and eigenvalues in Data Science
Principal components
Using eigenvalues and eigenvectors, we can find the main axes of our data. The first main axis (also called "first principal component") is the axis in which the data varies the most. The second main axis (also called "second principal component") is the axis with the second largest variation and so on.
Let’s consider a 2-dimensional dataset.

To find the principal components, we first calculate the Variance-Covariance matrix C.

We can use numpy to calculate it. Note that our data X must be arranged like a pandas data frame: each row is an observation and each column represents a different variable/feature.
import numpy as np
C = np.cov(X, rowvar=False)
And then we can calculate the eigenvectors and eigenvalues of C.
import numpy as np
eigenvalues, eigenvectors = np.linalg.eig(C)
The eigenvectors give us the directions of the main axes (principal components) of our data. The greater the eigenvalue, the greater the variation along its axis, so the eigenvector with the largest eigenvalue corresponds to the axis with the most variance.
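A small sketch (continuing from the snippets above, where C is the covariance matrix of X) of how we could sort the components by eigenvalue:
import numpy as np
eigenvalues, eigenvectors = np.linalg.eig(C)
order = np.argsort(eigenvalues)[::-1]        # indices from largest to smallest eigenvalue
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]        # eigenvectors are the columns
print(eigenvectors[:, 0])                    # direction of the first principal component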

We should remember that matrices represent linear transformations. When we multiply the covariance matrix with our data, the center of the data does not change: the data gets stretched along the eigenvector with the larger eigenvalue (variance) and squeezed along the eigenvector with the smaller one.
Data points lying directly on the eigenvectors do not get rotated.

Principal component analysis (PCA)
Principal component analysis uses the power of eigenvectors and eigenvalues to reduce the number of features in our data, while keeping most of the variance (and therefore most of the information). In PCA we specify the number of components we want to keep beforehand.
The PCA algorithm consists of the following steps (a small code sketch follows the list).
- Standardize the data by subtracting the mean and dividing by the standard deviation.
- Calculate the covariance matrix.
- Calculate its eigenvalues and eigenvectors.
- Merge the eigenvectors into a matrix and apply it to the data. This rotates and scales the data so that the principal components are aligned with the coordinate axes of the new features.
- Keep the new features which account for the most variation and discard the rest.
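The following is a minimal numpy sketch of these steps (for illustration only, not a replacement for a library implementation such as scikit-learn's PCA):
import numpy as np

def pca_sketch(X, n_components):
    # 1. standardize: subtract the mean, divide by the standard deviation
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. covariance matrix (rows are observations, columns are features)
    C = np.cov(X_std, rowvar=False)
    # 3. eigenvalues and eigenvectors of the covariance matrix
    eigenvalues, eigenvectors = np.linalg.eig(C)
    # 4. sort the eigenvectors by eigenvalue and keep the first n_components
    order = np.argsort(eigenvalues)[::-1]
    W = eigenvectors[:, order[:n_components]]
    # 5. project the data onto the kept principal components
    return X_std @ W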
Let’s look at what PCA does on a 2-dimensional dataset. In this example, we do not reduce the number of features. Reducing the number of features is mainly useful for high-dimensional data, where many features can be dropped while most of the variance is preserved.

Let’s load the iris dataset. It contains measurements of three different species of iris flowers. Those species are iris-virginica, iris-versicolor and iris-setosa.
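The loading step is not shown above; one possible way to do it (an assumption on my part, using scikit-learn) is:
from sklearn.datasets import load_iris
# load the iris data into a DataFrame called iris_df (the variable name used below)
iris = load_iris(as_frame=True)
iris_df = iris.frame              # four measurement columns plus a 'target' column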
Let’s take a quick glimpse at the dataset.
print(iris_df.head())

We can create a so-called "scree plot" to see how much of the variability in the data each principal component accounts for. For this, we perform a first PCA, as shown below.
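A possible sketch of this first PCA and the scree plot, assuming scikit-learn and matplotlib and the iris_df DataFrame (with its 'target' column) from above:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# standardize the four measurement columns and fit a PCA that keeps all components
X = StandardScaler().fit_transform(iris_df.drop(columns="target"))
pca = PCA()
pca.fit(X)
# scree plot: explained variance ratio per principal component
plt.bar(range(1, len(pca.explained_variance_ratio_) + 1), pca.explained_variance_ratio_)
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.show()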

As we can see, the first two components account for most of the variability in the data. Therefore, I have decided to keep only the first two components and discard the rest. Once we have determined the number of components to keep, we can run a second PCA in which we reduce the number of features.
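A sketch of this second PCA, reusing the standardized data X from the scree-plot snippet and producing the iris_transformed array used below:
from sklearn.decomposition import PCA
# keep only the first two principal components
pca2 = PCA(n_components=2)
iris_transformed = pca2.fit_transform(X)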
Let’s take a look at our transformed data, which is now an array.
print(iris_transformed[:5,:])

We can see that we have only two columns left. These columns/variables are linear combinations of our original features and do not correspond to any single feature of the original dataset (like sepal width, sepal length, …).
Let’s visualize our data.
We see our new combined features on the x and y axes. The plant species is indicated by the color of a data point.
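A sketch of such a plot with matplotlib (assuming iris_transformed and iris_df from the snippets above):
import matplotlib.pyplot as plt
# scatter plot of the two new features, colored by species
plt.scatter(iris_transformed[:, 0], iris_transformed[:, 1], c=iris_df["target"])
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()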

We can see that much of the information in the data has been preserved, and we could now train an ML model that classifies the data points according to the three species.
Conclusion
I will now summarize the most important concepts.
Matrix multiplication
When we multiply a matrix with a vector, the vector gets transformed linearly. This linear transformation is a mixture of rotating and scaling the vector. The vectors which only get scaled and not rotated are called eigenvectors. The factor by which they get scaled is the corresponding eigenvalue.
Principal components
Principal components are the axes along which our data shows the most variation. The first principal component explains the biggest part of the observed variation, the second principal component the second largest part, and so on. The principal components are the eigenvectors of the covariance matrix. The first principal component corresponds to the eigenvector with the largest eigenvalue.
PCA
Principal component analysis is a technique to reduce the number of features in our dataset. It consists of the following processing steps.
- Standardize the data by subtracting the mean and dividing by the standard deviation.
- Calculate the covariance matrix.
- Calculate its eigenvalues and eigenvectors.
- Merge the eigenvectors into a matrix and apply it to the data. This rotates and scales the data so that the principal components are aligned with the coordinate axes of the new features.
- Keep as many of the new features as we specified and discard the rest. We keep the features which explain the most variation in the data.
Sources
https://datascienceplus.com/understanding-the-covariance-matrix/
https://en.wikipedia.org/wiki/Iris_flower_data_set
https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
Datasets
Iris
The Iris dataset and license can be found under:
It is licensed under creative commons which means you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking for permission.
Want to connect?
LinkedIn: https://www.linkedin.com/in/vincent-m%C3%BCller-6b3542214/
Facebook: https://www.facebook.com/profile.php?id=100072095823739
Twitter: https://twitter.com/Vincent02770108
Medium: https://medium.com/@Vincent.Mueller
You can become a Medium member and support me at the same time: https://medium.com/@Vincent.Mueller/membership
Math appendix
Calculating eigenvalues and eigenvectors
If you like Mathematics and want to dive deeper, I have summarized some of the math used in this blog post.
We can easily calculate the eigenvectors and eigenvalues of a matrix M in Python.
import numpy as np
eigenvalues, eigenvectors = np.linalg.eig(M)
If we want to calculate them by hand, it gets a little bit more complicated.
As we have seen, when we multiply the matrix M with an eigenvector (denoted by 𝑣), it is the same as scaling 𝑣 by its eigenvalue 𝜆:
M⋅𝑣 = 𝜆⋅𝑣
We now rearrange the equation.
M⋅𝑣 − 𝜆⋅I⋅𝑣 = 0, or equivalently (M − 𝜆⋅I)⋅𝑣 = 0
Where I is the identity matrix, which has ones on the diagonal and zeros elsewhere. It has the same shape as M.


The only way for this equation to be true for a non-zero vector 𝑣 is when the determinant of the matrix (M − 𝜆⋅I) becomes 0:

det(M − 𝜆⋅I) = 0
The determinant of a matrix is the factor by which the matrix scales areas (in the case of a 2×2 matrix) or volumes (in the case of a 3×3 matrix). If the determinant is zero, the matrix (M − 𝜆⋅I) squashes space down onto a lower dimension through the origin (the zero point), so that some non-zero vectors get mapped onto the origin. This is the only way for a non-zero vector to become a zero vector.
So we search for all eigenvalues 𝜆 which make the determinant 0.
After we have found the eigenvalues, we can solve this equation for 𝑣:

(M − 𝜆⋅I)⋅𝑣 = 0

and thereby find the eigenvectors.
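As a small worked example, take the stand-in matrix M = [[0, 1.4], [1.4, 0]] from earlier (my example, not necessarily the matrix from the animations). The determinant condition becomes det(M − 𝜆⋅I) = (−𝜆)⋅(−𝜆) − 1.4⋅1.4 = 𝜆² − 1.96 = 0, so 𝜆 = 1.4 or 𝜆 = −1.4. Plugging 𝜆 = 1.4 into (M − 𝜆⋅I)⋅𝑣 = 0 gives −1.4⋅v₁ + 1.4⋅v₂ = 0, so every vector with v₁ = v₂, for example (1, 1), is an eigenvector for 𝜆 = 1.4. In the same way, (1, −1) is an eigenvector for 𝜆 = −1.4.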
Covariance matrix
The variance-covariance matrix can be estimated from data using the following formula:
C = 1/(n−1) ⋅ Σᵢ (xᵢ − x̄)(xᵢ − x̄)ᵀ

where n is the number of data points, xᵢ is the i-th data point (as a column vector) and x̄ is the mean of all data points.
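We can check this formula against np.cov on some random data (a small sketch):
import numpy as np
X = np.random.default_rng(0).normal(size=(100, 3))     # 100 data points, 3 features
X_centered = X - X.mean(axis=0)
C_manual = X_centered.T @ X_centered / (X.shape[0] - 1)
print(np.allclose(C_manual, np.cov(X, rowvar=False)))   # True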