Independent Component Analysis via Gradient Ascent in Numpy and Tensorflow with Interactive Code

Jae Duk Seo
Towards Data Science
Jul 14, 2018


I wanted to know more about Independent Component Analysis, and I found out that there are actually multiple methods to compute this operation.

Please note that this post is for my deeper understanding of ICA.

Evolution of Computing Independent Component Analysis

To start, there is the original ICA using the infomax principle, by Bell and Sejnowski (presented in 1991 / 1994).

Paper from this website

Then there is FastICA, which uses a fixed-point algorithm (presented in 1997).

Paper from this website

And there is FasterICA from Inria (presented in 2017).

Paper from this website

One of the reasons I am so fascinated by ICA is its long tradition. As seen above, from 1991 to 2017, the method has been studied continuously by a variety of researchers. Finally, I am not exactly sure when this PDF was created, but there is also a method by Andrew Ng that performs ICA via stochastic gradient ascent.

Paper from this website

ICA with Gradient Ascent

Andrew Ng’s Gradient Ascent (left), Shireen Elhabian’s Gradient Ascent (right) (source)

There isn’t a single fixed formula to follow when performing ICA using gradient ascent. When we do not know the sources’ densities, Professor Ng recommends using the sigmoid function as the cumulative distribution function, whereas Professor Elhabian uses the tanh function. (So I guess this really depends on the situation.)

I won’t go into the details of the gradient ascent methods, since I HIGHLY recommend reading the papers themselves, but here is a summary.

Let x = As, where x is the transformed (mixed) data, A is the transformation matrix, and s is the original signal. What we want to do is estimate A so that we can recover the original signal. If we set W = A^-1, then s = Wx, which is where W comes from. Now let’s actually take a look at how this method compares to FastICA.
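To make the summary concrete, here is a minimal NumPy sketch of gradient ascent ICA with the sigmoid as the CDF. It is not the code from this post: the function names, the Laplace-distributed sources, and the mixing matrix are all assumptions for illustration. It also averages the gradient over the whole batch instead of updating per sample, so the learning rate here is not comparable to the 0.00003 used in the post.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def log_likelihood(W, X):
    """Average log-likelihood of X (n_features, n_samples) when the sigmoid is the source CDF."""
    Y = W @ X
    # log g'(y) = log sigmoid(y) + log(1 - sigmoid(y)), written in a numerically stable form
    log_density = -np.logaddexp(0.0, -Y) - np.logaddexp(0.0, Y)
    return log_density.sum(axis=0).mean() + np.log(abs(np.linalg.det(W)))

def ica_gradient_ascent(X, n_iter=5000, lr=0.05):
    """Estimate the unmixing matrix W by batch gradient ascent on the log-likelihood."""
    n_features, n_samples = X.shape
    W = np.eye(n_features)
    for _ in range(n_iter):
        Y = W @ X
        # Batch-averaged form of the update (1 - 2 g(Wx)) x^T + (W^T)^-1
        grad = (1.0 - 2.0 * sigmoid(Y)) @ X.T / n_samples + np.linalg.inv(W.T)
        W += lr * grad  # ascent, so we add the gradient
    return W

# Hypothetical toy problem: two independent Laplace sources and a made-up mixing matrix.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 4000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

W = ica_gradient_ascent(X)
P = W @ A  # close to a scaled permutation matrix when separation works
print(np.round(P, 2))
```

ICA can only recover the sources up to order, sign, and scale, so the thing to look for in `P` is a single dominant entry per row, not the identity matrix.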

2D Data Separation Using ICA

Let’s start with the simplest case. As seen above, we have two truly independent sources of data, shown as a 2D scatter plot. After some transformation, we get the scatter plot on the right. We can already see that for PCA the new basis vectors are orthogonal to one another and point in the directions of greatest variance in the data. For ICA, however, the new basis vectors point in the directions along which the data are most independent of one another.
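The orthogonality of the PCA basis is easy to check numerically. The sketch below computes PCA from the eigendecomposition of the covariance matrix. The uniform sources, the mixing matrix, and the helper name `pca_basis` are assumptions for illustration, not the data used in this post.

```python
import numpy as np

def pca_basis(X):
    """X: (n_samples, n_features). Returns variances (descending) and components as rows."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order].T

# Hypothetical data: two independent uniform sources mixed by a non-orthogonal matrix.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(1000, 2))
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
X = S @ A.T

variances, components = pca_basis(X)
print(components @ components.T)  # identity: the PCA axes are orthonormal
print(A[:, 0] @ A[:, 1])          # nonzero: the true mixing directions are not orthogonal
```

The columns of `A` are the directions an ICA method would ideally find, and nothing forces them to be at right angles, which is why PCA alone cannot recover them in general.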

Before moving on to the results, below is a screenshot of the code that I have implemented. We are going to use FastICA, PCA, ICA with sigmoid, and ICA with tanh.

As seen above, at the final stage we can observe that every method has tried (in its own way) to recover the original signals. Note that I used 10000 total iterations (epochs) with a learning rate of 0.00003.

Next, when we plot a correlation matrix, we can observe something like the above. We can safely ignore everything except the first two rows. (Let’s take a closer look at them.)
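For readers who want to reproduce this kind of comparison, here is a small sketch of how the rows of interest can be computed with `np.corrcoef`. The helper name and the toy "recovered" signals are assumptions. The estimates are deliberately swapped, sign-flipped, and rescaled to mimic what an ICA method typically returns.

```python
import numpy as np

def source_estimate_correlation(S, Y):
    """Correlation between true sources S (k, n_samples) and estimates Y (m, n_samples).

    Returns a (k, m) block: entry [i, j] is the Pearson correlation of S[i] with Y[j].
    """
    k = S.shape[0]
    full = np.corrcoef(np.vstack([S, Y]))  # (k + m, k + m) matrix over all signals
    return full[:k, k:]                    # keep only the source-vs-estimate block

# Hypothetical example: estimates that are swapped, sign-flipped, and rescaled sources.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))
Y = np.vstack([-3.0 * S[1], 0.5 * S[0]])

C = source_estimate_correlation(S, Y)
print(np.round(C, 2))
```

A good decomposition shows one entry per row with absolute value close to 1 and the rest close to 0. Which column holds the large entry, and whether it is positive or negative, does not matter.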

We can observe that FastICA did a great job of decomposing the data into independent components. (The variables have flipped, but that is not a huge problem.) For PCA, we can see that there is still some correlation among the different variables. Finally, of the two gradient ascent variants, the tanh function did a better job at decomposing the data.

When we look at the scatter plot for each method, we can clearly see that FastICA and ICA with the tanh function did the best job at decomposition.

Wave Data Separation Using ICA

Now, using the example from the scikit-learn ICA documentation, let’s see how gradient ascent ICA does on wave data. Again, we can see that in this example three distinct signals were combined via some random transformation matrix A.
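A dataset of this kind can be sketched in plain NumPy as follows. The three waveforms and the mixing matrix are assumptions modeled loosely on the scikit-learn blind source separation example, not necessarily the exact values used here. The sawtooth is built by hand to keep the snippet NumPy-only.

```python
import numpy as np

n_samples = 2000
time = np.linspace(0, 8, n_samples)

s1 = np.sin(2 * time)                    # sinusoidal signal
s2 = np.sign(np.sin(3 * time))           # square wave
s3 = 2 * (time - np.floor(time + 0.5))   # sawtooth in [-1, 1) with period 1

S = np.c_[s1, s2, s3]                    # shape (n_samples, 3), one source per column
S /= S.std(axis=0)                       # standardize each source

# Assumed mixing matrix (any invertible 3x3 matrix would do).
A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
X = S @ A.T                              # observed mixtures, shape (n_samples, 3)

# Sanity check: with the true A known, unmixing is just a matrix inverse.
S_check = X @ np.linalg.inv(A).T
print(np.allclose(S_check, S))
```

ICA has to do the same unmixing without ever seeing `A`, which is what makes the comparison between methods interesting.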

As seen above, let’s first use FastICA and PCA to recover the original signals.

To make things more interesting, let’s use the normal gradient update rule as well as the Adam optimizer to update the weight W (and see how they compare). The ICA gradient ascent method above uses the sigmoid function as the CDF.
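As a rough sketch of what swapping in Adam means for an ascent problem, the function below applies one Adam step with a plus sign instead of the usual minus. The function name, the hyperparameters, and the toy concave objective are assumptions. In the ICA setting `grad` would be the log-likelihood gradient with respect to W.

```python
import numpy as np

def adam_ascent_step(W, grad, m, v, t, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update that moves W uphill along grad (t is the 1-based step count)."""
    m = beta1 * m + (1.0 - beta1) * grad       # running mean of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # running mean of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    W = W + lr * m_hat / (np.sqrt(v_hat) + eps)  # plus sign: gradient ascent
    return W, m, v

# Toy concave objective f(W) = -||W - target||^2, whose maximum is at W = target.
target = np.array([[1.0, -2.0],
                   [0.5, 3.0]])
W = np.zeros((2, 2))
m = np.zeros_like(W)
v = np.zeros_like(W)
for t in range(1, 3001):
    grad = -2.0 * (W - target)
    W, m, v = adam_ascent_step(W, grad, m, v, t)

print(np.round(W, 2))
```

Because Adam rescales each entry of the gradient by its running magnitude, the step size per entry stays on the order of `lr` regardless of how large the raw gradient is.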

Finally, let’s use the tanh function as the CDF and see which performs better. Additionally, I have implemented the same technique in TensorFlow as well.
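With tanh in the role of the CDF g, the implied source density is proportional to g'(y) = 1 - tanh(y)^2, so the only thing that changes in the gradient is the nonlinearity: 1 - 2 sigmoid(y) becomes -2 tanh(y). The sketch below writes out that gradient and checks it against finite differences. The function names and the random test data are assumptions, and this is one common form of the tanh update, not necessarily the exact code behind the results here.

```python
import numpy as np

def tanh_log_likelihood(W, X):
    """Average log-likelihood (up to a constant) with source density proportional to 1 - tanh(y)^2."""
    Y = W @ X
    # log(1 - tanh(y)^2) = -2 * log(cosh(y))
    return (-2.0 * np.log(np.cosh(Y))).sum(axis=0).mean() + np.log(abs(np.linalg.det(W)))

def tanh_gradient(W, X):
    """Analytic gradient of tanh_log_likelihood with respect to W."""
    Y = W @ X
    return -2.0 * np.tanh(Y) @ X.T / X.shape[1] + np.linalg.inv(W.T)

def numerical_gradient(f, W, h=1e-6):
    """Central finite differences, one entry of W at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = h
        G[idx] = (f(W + step) - f(W - step)) / (2.0 * h)
    return G

# Hypothetical test data and a slightly perturbed identity as the starting W.
rng = np.random.default_rng(0)
X = rng.laplace(size=(3, 500))
W = np.eye(3) + 0.1 * rng.standard_normal((3, 3))

analytic = tanh_gradient(W, X)
numeric = numerical_gradient(lambda M: tanh_log_likelihood(M, X), W)
print(np.max(np.abs(analytic - numeric)))
```

A finite-difference check like this is a cheap way to catch sign or transpose mistakes before running thousands of update steps.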

As seen above, when all of the decompositions are done, we can observe that every method produces its own version of the recovered signals.

Again, we can plot the correlation matrix to see how each of the methods has performed.

Again, we only have to look at the first three rows. Clearly, FastICA did the best job at decomposing the signals. For PCA, we can see that two of the signals were (partially) recovered, but there is still some correlation among the different signals. Among the gradient ascent methods, tanh combined with the normal gradient update rule produced the best result.

When we visualize the resulting signals, we can see that FastICA did the best job.

Interactive Code

For Google Colab, you need a Google account to view the code. Also, you can’t run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy Coding!

For the code for 2D Data separation please click here.
For the code for Wave Data separation please click here.

Final Words

It was an honor to study the work of Andrew Ng and Professor Elhabian, and since we can now do this directly via backprop, we can definitely do something very, very fun 😉

If any errors are found, please email me at jae.duk.seo@gmail.com. If you wish to see the list of all of my writing, please view my website here.

Meanwhile follow me on my twitter here, and visit my website, or my Youtube channel for more content. I also implemented Wide Residual Networks, please click here to view the blog post.

