[ Only Numpy ] Having Fun with Eigenvalues/Vectors with Interactive Code in Numpy

Jae Duk Seo
Towards Data Science
8 min read · Jul 10, 2018



Principal Component Analysis, Singular Value Decomposition, and Independent Component Analysis are all dimension-reduction techniques, and all of them depend on eigenvalues and eigenvectors. Today, I wanted to go beyond the high-level APIs and into the details.

Please note that this post is purely me explaining these concepts to myself, so it might read a little differently.

Why do we even need Eigenvectors / Eigenvalues?


We can actually think of every matrix, even a medical-records CSV, as a transformation matrix. The way I think about this is a bit strange; for example, we can have some data like that shown below.

Data from Sklearn Wine

The data set is for a classification task, so there is a target value that is not shown in the matrix above. But when we plot a 2D scatter plot of the above data, we get something like the figure below.

As seen above, we can think of each data point as a transformation of the basis vectors (which are [1, 0] for od280/od315_of_diluted_wines and [0, 1] for flavanoids).

Covariance Matrix / Correlation Matrix


Now, before going on, it is essential to understand the concepts of the covariance matrix as well as the correlation matrix, so please take a few minutes to review the material. If you want to see the NumPy way of calculating these, please click here.
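As a quick refresher, here is a minimal sketch of computing both matrices by hand and checking them against NumPy's built-ins (the data values are made up for illustration):

```python
import numpy as np

# Toy data: 5 samples, 2 features (hypothetical values for illustration)
X = np.array([[2.0, 4.1],
              [1.5, 3.2],
              [3.0, 5.9],
              [2.2, 4.5],
              [2.8, 5.1]])

# Covariance "by hand": center the data, then (X_c^T X_c) / (n - 1)
Xc = X - X.mean(axis=0)
cov_manual = Xc.T @ Xc / (X.shape[0] - 1)

# Correlation: covariance normalized by the product of standard deviations
std = X.std(axis=0, ddof=1)
corr_manual = cov_manual / np.outer(std, std)

# Compare against NumPy's built-ins (rowvar=False: rows are samples)
print(np.allclose(cov_manual, np.cov(X, rowvar=False)))        # True
print(np.allclose(corr_manual, np.corrcoef(X, rowvar=False)))  # True
```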

Example of Changing Basis Vectors / Sample Data

Let's just do something very simple: the red dots represent the eigenvectors for the given two blue data points. Let's set those as the new basis vectors and project our two blue data points.

As seen above, our data has now been flipped. If you wish to see the code for achieving this, please see below.
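A minimal sketch of this kind of change of basis, using two made-up points and a basis that swaps the axes (which flips each point across the line y = x):

```python
import numpy as np

# Two hypothetical 2-D data points (rows)
X = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# A new basis: the standard basis with its axes swapped
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Change coordinates by multiplying each point by the basis matrix
X_new = X @ B
print(X_new)  # [[1. 2.], [3. 1.]]
```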

Now let's take a look at the data set that we are going to play with.

Data from Sklearn Wine

You have already seen this graph above, since we are going to use the wine data set from sklearn. However, we are not going to use all of the dimensions; rather, we will use only the two most highly correlated attributes.
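Loading the data and selecting those two columns can be sketched like this (I am assuming the two attributes from the scatter plot above, flavanoids and od280/od315_of_diluted_wines):

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
names = list(wine.feature_names)

# Pick the two highly correlated attributes used in the post
cols = [names.index('flavanoids'),
        names.index('od280/od315_of_diluted_wines')]
X = wine.data[:, cols]
y = wine.target

print(X.shape)  # (178, 2)
# Their correlation is strongly positive
print(np.corrcoef(X, rowvar=False)[0, 1])
```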

Change of Basis Using the Covariance Matrix

Left Image → Covariance matrix calculated with NumPy
Right Image → Covariance matrix calculated with the built-in function

First things first: we need to calculate the covariance matrix, and from it we can get the eigenvectors and eigenvalues. (These are the principal directions of variance.)

Red Dots → Eigenvectors of the Covariance Matrix

And as seen above, after getting each vector we can draw it as a line. The length of the line is not important, but we can see that the direction of greatest variance runs diagonally through the data points.

Finally, we can project all of the data points onto these basis vectors. Unfortunately, we lose the attribute information; however, we get a clearer view of the data.
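The whole covariance route above can be sketched in a few lines; here a hypothetical correlated sample stands in for the two wine attributes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D toy data standing in for the two wine attributes
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)

# Eigendecomposition of the covariance matrix (symmetric -> use eigh)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project the data onto the eigenvector basis (change of basis)
X_proj = X @ eigvecs

# In the new basis the features are decorrelated: off-diagonals ~ 0
print(np.cov(X_proj, rowvar=False).round(3))
```

Note the use of `np.linalg.eigh` rather than `np.linalg.eig`: the covariance matrix is symmetric, so `eigh` is the appropriate (and more stable) choice.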

Change of Basis Using the Correlation Matrix

Left Image → Correlation matrix calculated with NumPy
Right Image → Correlation matrix calculated with the built-in function

Now, I know the colors are off, but the values are the same, as seen below.

And just as last time, let's find the eigenvectors of this matrix.

The resulting vectors are symmetric to one another, which makes sense, since the correlation matrix itself is symmetric.

Nevertheless, above is the resulting projection when we use the eigenvectors generated from the correlation matrix.
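A sketch of the correlation-matrix variant, again on made-up data; since correlation is defined on standardized variables, the data is z-scored before the change of basis:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=200)

# Correlation matrix is symmetric, so its eigenvectors are orthogonal
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Standardize first (correlation works on z-scores),
# then change basis with the correlation eigenvectors
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Z_proj = Z @ eigvecs

print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))  # True: orthonormal
```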

Singular Value Decomposition (2D)

Now that we have gotten this far, let's go further: we can easily perform singular value decomposition. As seen above, we first calculate AᵀA and find the eigenvectors of that matrix; these give us the right singular vectors V, and from them we can recover the singular values and U. Finally, as seen in the red box above, we can observe that there is no difference between the original matrix and the reconstructed matrix. (If you don't understand this method, please click here.)
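The steps above can be sketched as follows, with a random toy matrix standing in for the wine data. The eigenvectors of AᵀA give V and the squared singular values; U then follows from U = A V Σ⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))  # stand-in for the n x 2 wine matrix

# Eigendecomposition of A^T A: eigenvectors are the right singular
# vectors V, eigenvalues are the squared singular values
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]          # sort descending
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(eigvals)

# Left singular vectors: U = A V Sigma^{-1}
U = A @ V / sigma

# Reconstruct A = U Sigma V^T and compare with the original
A_rec = (U * sigma) @ V.T
print(np.allclose(A, A_rec))  # True
```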

Now let's drop the least significant singular value to perform dimension reduction. And as seen above, we can clearly see that the reconstructed matrix is no longer the same as the original matrix.
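A sketch of that truncation step, here using `np.linalg.svd` directly for brevity rather than the manual eigendecomposition route:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2))  # toy stand-in for the data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the largest singular value (rank-1 approximation)
s_trunc = s.copy()
s_trunc[1:] = 0.0
A_rank1 = (U * s_trunc) @ Vt

print(np.allclose(A, A_rank1))         # False: information was lost
print(np.linalg.matrix_rank(A_rank1))  # 1
```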

And when we project our data onto the 2D plane, we can observe that after the SVD, most of the variation along the diagonal direction is captured.

Singular Value Decomposition (3D)

Finally, let's finish off by having fun with some 3D plots. As seen above, I have added one additional attribute, 'alcalinity_of_ash'.

Left Image → Covariance matrix calculated with NumPy
Right Image → Covariance matrix calculated with the built-in function

Again, let's first see what we can do with just a change of basis. When we plot the eigenvectors generated from the covariance matrix, we get something like the figure below.

I altered the starting points a little bit, but the story stays the same. Now let's perform the projections.

We can observe that our eigenvectors have now become the basis of the space: they are all perpendicular (orthogonal) to one another. Additionally, we can perform SVD on the 3D data as well.

Just like in the previous example, we can see that the reconstructed matrix is the same as the original matrix. Now for the dimension reduction.

When we drop the least significant singular value, our 3D data collapses into a plane. However, we can still observe that the classes remain quite separable from one another.

But when we drop two singular values, we can clearly see that our data is no longer separable, since it has collapsed onto a line.
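Both 3D truncations can be sketched together; a random 100 × 3 matrix stands in for the three-attribute wine data, and the matrix rank confirms the collapse to a plane (rank 2) and then a line (rank 1):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))  # stand-in for the 3-attribute data

U, s, Vt = np.linalg.svd(X, full_matrices=False)

def truncate(k):
    """Reconstruction keeping only the k largest singular values."""
    s_k = np.where(np.arange(len(s)) < k, s, 0.0)
    return (U * s_k) @ Vt

plane = truncate(2)  # drop one singular value -> points lie on a plane
line = truncate(1)   # drop two -> points collapse onto a line

print(np.linalg.matrix_rank(plane))  # 2
print(np.linalg.matrix_rank(line))   # 1
```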

Interactive Code

For Google Colab, you need a Google account to view the code. Also, you can't run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding!

To access the code for this post, please click here.

Final Words

It was actually really fun to play around with eigenvalues/eigenvectors.

If any errors are found, please email me at jae.duk.seo@gmail.com; if you wish to see the list of all of my writing, please view my website here.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also implemented Wide Residual Networks; please click here to view the blog post.

Reference

  1. Eigen values and Eigen vectors in 3 mins | Explained with an interesting analogy. (2018). YouTube. Retrieved 9 July 2018, from https://www.youtube.com/watch?v=5UjQVJu89_Q
  2. sklearn.datasets.load_wine — scikit-learn 0.19.1 documentation. (2018). Scikit-learn.org. Retrieved 9 July 2018, from http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html#sklearn.datasets.load_wine
  3. numpy.linalg.inv — NumPy v1.14 Manual. (2018). Docs.scipy.org. Retrieved 9 July 2018, from https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.inv.html
  4. eigenvectors created by numpy.linalg.eig don't seem correct. (2018). Stack Overflow. Retrieved 9 July 2018, from https://stackoverflow.com/questions/32926861/eigenvectors-created-by-numpy-linalg-eig-dont-seem-correct
  5. Inverse of a Matrix. (2018). Mathsisfun.com. Retrieved 9 July 2018, from https://www.mathsisfun.com/algebra/matrix-inverse.html
  6. How do you change the size of figures drawn with matplotlib? (2018). Stack Overflow. Retrieved 9 July 2018, from https://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib
  7. Naive Classification using Matrix Dot Product / Change of Basis with Interactive Code in Numpy. (2018). Towards Data Science. Retrieved 9 July 2018, from https://towardsdatascience.com/naive-classification-using-matrix-dot-product-change-of-basis-with-interactive-code-in-numpy-4808e5aa955e
  8. seaborn.heatmap — seaborn 0.8.1 documentation. (2018). Seaborn.pydata.org. Retrieved 9 July 2018, from https://seaborn.pydata.org/generated/seaborn.heatmap.html
  9. Change y range to start from 0 with matplotlib. (2018). Stack Overflow. Retrieved 9 July 2018, from https://stackoverflow.com/questions/22642511/change-y-range-to-start-from-0-with-matplotlib
  10. How to use a decimal range() step value? (2018). Stack Overflow. Retrieved 9 July 2018, from https://stackoverflow.com/questions/477486/how-to-use-a-decimal-range-step-value
  11. (2018). Users.stat.umn.edu. Retrieved 9 July 2018, from http://users.stat.umn.edu/~helwig/notes/datamat-Notes.pdf
  12. color example code: colormaps_reference.py — Matplotlib 2.0.2 documentation. (2018). Matplotlib.org. Retrieved 9 July 2018, from https://matplotlib.org/examples/color/colormaps_reference.html
  13. Customizing plots with style sheets — Matplotlib 1.5.3 documentation. (2018). Matplotlib.org. Retrieved 9 July 2018, from https://matplotlib.org/users/style_sheets.html
  14. Singular Value Decomposition (SVD) Tutorial: Applications, Examples, Exercises. (2017). Stats and Bots. Retrieved 10 July 2018, from https://blog.statsbot.co/singular-value-decomposition-tutorial-52c695315254

