Naive Classification using Matrix Dot Product / Change of Basis with Interactive Code in Numpy

Jae Duk Seo
Towards Data Science
6 min read · Jul 5, 2018


GIF from this website

Yesterday I played around with scalar projection and the dot product. Now I want to take this a step further, into the matrix dot product and change of basis, and again perform simple classification on the sklearn breast cancer data set.

Please note that this post is meant to deepen my own understanding of linear algebra.

A Quick Review of the Matrix Dot Product

Image from this website

The image above shows the mechanics of the matrix dot product, but it does not convey the intuition behind the dot product, which is extremely powerful. For more on that intuition, please see my older blog post.
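To connect the picture to NumPy, here is a minimal sketch (the matrices are made up for illustration) showing that every entry of a matrix product is just the dot product of one row of the left matrix with one column of the right matrix.

```python
import numpy as np

# Two small example matrices (values chosen only for illustration).
A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])           # shape (3, 2)

# NumPy's matrix product: np.dot(A, B), or equivalently A @ B.
C = np.dot(A, B)

# Rebuild the same result entry by entry: C[i, j] is the dot product
# of row i of A with column j of B.
C_manual = np.array([[np.dot(A[i, :], B[:, j]) for j in range(B.shape[1])]
                     for i in range(A.shape[0])])

print(C)  # [[ 58  64]
          #  [139 154]]
```

Seen this way, multiplying a data matrix by a matrix of vectors computes many dot products in one call, which is what the classification below relies on.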

Change of Basis

Image from this website

In the two-dimensional case, most coordinate systems use the basis vectors [1,0] and [0,1]. However, it does not have to be that way: a simple change of basis changes the reference frame in which a vector is expressed. In the data domain, I think of this as forming combinations of attributes.
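A minimal sketch of a change of basis in NumPy, assuming a hypothetical alternative basis b1 = [1,1] and b2 = [-1,1] stored as matrix columns. Finding the new coordinates of a vector means solving for the weights that rebuild it from the new basis vectors.

```python
import numpy as np

# Standard basis in 2-D: [1, 0] and [0, 1] (columns of the identity).
standard = np.eye(2)

# Hypothetical alternative basis, stored as columns: b1 = [1, 1], b2 = [-1, 1].
B = np.array([[1.0, -1.0],
              [1.0,  1.0]])

v = np.array([3.0, 1.0])         # a vector written in standard coordinates

# Coordinates of v in the new basis: solve B @ c = v for c.
c = np.linalg.solve(B, v)        # c is [2, -1]

# Going back: v is the weighted sum 2*b1 + (-1)*b2 of the new basis vectors.
v_back = B @ c
```

Each new coordinate mixes both original coordinates, which is the "combination of attributes" reading mentioned above.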

Classifying Fake Data Set

Suppose we have a cluster of data as seen above. There are many ways to perform classification, but I am only going to use two vectors and the matrix dot product. The vector [1,0] captures how much of the x-axis component each data point has (in the data domain this could be any attribute, such as heart rate or blood pressure), and the vector [0,1] captures how much of the y-axis component each data point has; again, this can be any attribute.

As seen above, we can then simply take the argmax (maximum argument) of the resulting matrix for each data point. Data points with a larger [1,0] component get an argmax of 0 and the others get an argmax of 1. (This assumes [1,0] is the first column of the classification matrix; swapping the order swaps the labels, but the logic stays the same.)

When we draw a scatter plot using the argmax as the colour, we can see that the data have been classified well.
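The whole fake-data procedure fits in a few lines. This is a minimal sketch, not the author's code: the two Gaussian clusters are hypothetical stand-ins for the plotted data, and the two classification vectors are stacked as the columns of one matrix so that a single matrix dot product scores every point against both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fake data: one cluster with large x, one with large y.
cluster_x = rng.normal(loc=[4.0, 1.0], scale=0.5, size=(100, 2))
cluster_y = rng.normal(loc=[1.0, 4.0], scale=0.5, size=(100, 2))
data = np.vstack([cluster_x, cluster_y])            # shape (200, 2)

# The two classification vectors, stacked as the columns of one matrix.
W = np.column_stack([[1.0, 0.0],    # column 0 measures the x component
                     [0.0, 1.0]])   # column 1 measures the y component

# One matrix dot product scores every point against both vectors at once.
scores = data @ W                                   # shape (200, 2)

# Label = index of the vector with the larger dot product.
labels = np.argmax(scores, axis=1)
```

Passing `labels` as the colour argument, e.g. `plt.scatter(data[:, 0], data[:, 1], c=labels)` with matplotlib, produces the kind of coloured scatter plot described above.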

Naively Classifying Breast Cancer Data Set

Now let's look at a real-world use case. Above is the scatter plot of the two attributes most strongly correlated with the breast cancer outcome, 'worst concave points' and 'worst perimeter' (the correlation matrix is shown below).

Correlation matrix ("target" is the outcome)
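Picking the attributes from the correlation matrix can be sketched with NumPy alone. The helper name `top_correlated_features` is hypothetical, and ranking by the absolute value of the Pearson correlation is my assumption, since the post does not say whether the sign was taken into account.

```python
import numpy as np

def top_correlated_features(X, y, names, k=2):
    """Hypothetical helper: return the k feature names whose Pearson
    correlation with y is largest in absolute value, with those values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Correlation matrix of all features plus the target (columns are
    # variables); the last row holds each feature's correlation with y.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)[-1, :-1]
    # Sort by strength of the relationship, ignoring its sign.
    order = np.argsort(-np.abs(corr))[:k]
    return [(names[i], corr[i]) for i in order]
```

With scikit-learn installed, `data = load_breast_cancer()` followed by `top_correlated_features(data.data, data.target, data.feature_names)` applies the same ranking to the real data set.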

Now we take exactly the same approach: one classification vector captures how much of each data point lies along worst concave points, and the other captures how much lies along worst perimeter. Taking the argmax gives the result below.

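The graph looks this way because of the decision boundary these two vectors create, shown below. With [1,0] and [0,1], the argmax switches exactly where the two dot products are equal, i.e. along the diagonal line where both attributes have the same value.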

We can also see the two classification vectors we chose, drawn in pink.
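For any two classification vectors w0 and w1, the argmax changes where p·w0 = p·w1, that is where p·(w0 − w1) = 0. This is a line through the origin, perpendicular to w0 − w1. The sketch below (the helper name is hypothetical) computes a unit vector along that line; because the boundary is always forced through the origin, it cannot shift to wherever the two classes actually separate.

```python
import numpy as np

def argmax_boundary_direction(w0, w1):
    """Hypothetical helper: unit vector lying ON the argmax decision
    boundary between classification vectors w0 and w1."""
    n = np.asarray(w0, dtype=float) - np.asarray(w1, dtype=float)  # boundary normal
    d = np.array([-n[1], n[0]])     # rotate the normal by 90 degrees
    return d / np.linalg.norm(d)

w0, w1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
direction = argmax_boundary_direction(w0, w1)   # along the diagonal y = x
```

Any point on this line scores the same against both vectors, so it sits exactly on the tie between the two labels.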

Workaround 1 for Classifying the Breast Cancer Data Set

One way to work around this and get a better (although not perfect) result is to use a single vector, compute the scalar projection of each data point onto it, and apply a threshold to set the boundary line.

This is not the best solution, but it is a better one.
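A minimal sketch of the scalar projection plus threshold idea. The direction `v` and the threshold value are hypothetical, chosen only for illustration, since the post does not list the ones used. The boundary is the line perpendicular to `v` at a distance `threshold` from the origin, so unlike the argmax boundary above it no longer has to pass through the origin.

```python
import numpy as np

def scalar_projection(X, v):
    """Signed length of each row of X projected onto direction v."""
    v = np.asarray(v, dtype=float)
    return np.asarray(X, dtype=float) @ v / np.linalg.norm(v)

def classify_by_threshold(X, v, threshold):
    """Label 1 where the projection exceeds the threshold, else 0."""
    return (scalar_projection(X, v) > threshold).astype(int)

# Hypothetical points, direction, and threshold, for illustration only.
X = np.array([[3.0, 4.0], [0.5, 0.5], [2.0, 2.0]])
v = np.array([1.0, 1.0])
labels = classify_by_threshold(X, v, threshold=2.0)   # [1, 0, 1]
```

The projections here are about 4.95, 0.71 and 2.83, so only the middle point falls below the threshold.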

Workaround 2 for Classifying the Breast Cancer Data Set

The second method is simply to change the basis vectors: as seen above, we flip all of the data points across the y axis. In other words, we change the basis vectors to [-1,0] and [0,1], so the x axis is now negated (using the code below).
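Here is a minimal sketch of that flip written as a change of basis, using hypothetical points. The new basis vectors [-1,0] and [0,1] are the columns of `F`; since `F` is symmetric and its own inverse, multiplying the row-vector data on the right by `F` gives the new coordinates.

```python
import numpy as np

# New basis vectors, stored as the columns of F: [-1, 0] and [0, 1].
F = np.array([[-1.0, 0.0],
              [ 0.0, 1.0]])

# Hypothetical points, one per row: (x, y).
X = np.array([[0.2, 1.5],
              [0.1, 0.7]])

# Express every point in the new basis; only the x coordinate changes sign.
X_flipped = X @ F

# Applying the same flip twice returns the original points.
X_restored = X_flipped @ F
```

For this particular basis the same result could be written as `X * np.array([-1.0, 1.0])`; the matrix form just makes the change of basis explicit.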

Now we can take exactly the same argmax approach as before.

As seen above, taking the dot product with the two vectors [0,0.5] and [1,0] and then taking the argmax gives much better results.
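A minimal sketch of this last step, assuming hypothetical points that are already expressed in the flipped basis (x negated). The vectors [0,0.5] and [1,0] from the text are stacked as columns, so column 0 scores half of the y component and column 1 scores the flipped x component.

```python
import numpy as np

# Hypothetical points (x', y) already in the flipped basis, where x' = -x.
X_flipped = np.array([[-1.0,  1.0],
                      [ 1.0, -1.0],
                      [-0.5, -2.0]])

# The two classification vectors from the text, as columns.
W = np.column_stack([[0.0, 0.5],    # column 0: half of the y component
                     [1.0, 0.0]])   # column 1: the flipped x component

scores = X_flipped @ W              # rows: [0.5, -1], [-0.5, 1], [-1, -0.5]
labels = np.argmax(scores, axis=1)  # [0, 1, 1]
```

Because the argmax compares 0.5·y with the flipped x, these two vectors only split the data usefully when the attributes take both signs (for example after standardizing); the post does not say whether that was done.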

Interactive Code

To access the code for this post please click here.

Final Words

So much of this powerful material is hidden behind the term "machine learning". Working with high-level APIs is fun, but it doesn't give you these insights.

If you find any errors, please email me at jae.duk.seo@gmail.com. If you wish to see the list of all of my writing, please view my website here.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also implemented Wide Residual Networks; please click here to view the blog post.

Reference

  1. How to Multiply Matrices. (2018). Mathsisfun.com. Retrieved 5 July 2018, from https://www.mathsisfun.com/algebra/matrix-multiplying.html
  2. sklearn.datasets.load_breast_cancer — scikit-learn 0.19.1 documentation. (2018). Scikit-learn.org. Retrieved 5 July 2018, from http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer
  3. headers?, C. (2018). Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/20763012/creating-a-pandas-dataframe-from-a-numpy-array-how-do-i-specify-the-index-colum
  4. Controlling figure aesthetics — seaborn 0.8.1 documentation. (2018). Seaborn.pydata.org. Retrieved 5 July 2018, from https://seaborn.pydata.org/tutorial/aesthetics.html
  5. Plotting a diagonal correlation matrix — seaborn 0.8.1 documentation. (2018). Seaborn.pydata.org. Retrieved 5 July 2018, from https://seaborn.pydata.org/examples/many_pairwise_correlations.html
  6. matplotlib?, H. (2018). How do you change the size of figures drawn with matplotlib?. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib
  7. pandas.DataFrame.plot.scatter — pandas 0.23.1 documentation. (2018). Pandas.pydata.org. Retrieved 5 July 2018, from https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.scatter.html
  8. method, I. (2018). Inconsistency when setting figure size using pandas plot method. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/42215252/inconsistency-when-setting-figure-size-using-pandas-plot-method
  9. Visualization — pandas 0.23.1 documentation. (2018). Pandas.pydata.org. Retrieved 5 July 2018, from https://pandas.pydata.org/pandas-docs/stable/visualization.html
  10. possible?, P. (2018). Python using lambda to apply pd.DataFrame instead for nested loop is it possible?. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/19178762/python-using-lambda-to-apply-pd-dataframe-instead-for-nested-loop-is-it-possible
  11. Naive Classification using Scalar Projection with Interactive Code. (2018). Towards Data Science. Retrieved 5 July 2018, from https://towardsdatascience.com/naive-classification-using-scalar-projection-with-interactive-code-298279afb11f
  12. mplot3d example code: scatter3d_demo.py — Matplotlib 2.0.0 documentation. (2018). Matplotlib.org. Retrieved 5 July 2018, from https://matplotlib.org/2.0.0/examples/mplot3d/scatter3d_demo.html
  13. palettes?], H. (2018). How to color `matplotlib` scatterplot using a continuous value [`seaborn` color palettes?]. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/39735147/how-to-color-matplotlib-scatterplot-using-a-continuous-value-seaborn-color
  14. numpy.dot(). (2018). www.tutorialspoint.com. Retrieved 5 July 2018, from https://www.tutorialspoint.com/numpy/numpy_dot.htm
  15. git add, c. (2018). git add, commit and push commands in one?. Stack Overflow. Retrieved 5 July 2018, from https://stackoverflow.com/questions/19595067/git-add-commit-and-push-commands-in-one
  16. markers — Matplotlib 2.2.2 documentation. (2018). Matplotlib.org. Retrieved 5 July 2018, from https://matplotlib.org/api/markers_api.html
  17. Plot randomly generated classification dataset — scikit-learn 0.19.1 documentation. (2018). Scikit-learn.org. Retrieved 5 July 2018, from http://scikit-learn.org/stable/auto_examples/datasets/plot_random_dataset.html
  18. Change of Basis — HMC Calculus Tutorial. (2018). Math.hmc.edu. Retrieved 5 July 2018, from https://www.math.hmc.edu/calculus/tutorials/changebasis/
