Introduction to ICA: Independent Component Analysis

Jonas Dieckmann
Towards Data Science
9 min read · Feb 14, 2023


Have you ever found yourself in a situation where you were trying to analyze a complex and highly correlated data set and felt overwhelmed by the amount of information? This is where Independent Component Analysis (ICA) comes in. ICA is a powerful technique in the field of data analysis that allows you to separate and identify the underlying independent sources in a multivariate data set.

Image by Unsplash

ICA is important because it provides a way to understand the hidden structure of a data set, and it can be used in a variety of applications, such as signal processing, brain imaging, finance, and many other fields. In addition, ICA can help extract the most relevant information from data, providing valuable insights that would otherwise be lost in a sea of correlations.

In this article, we will delve into #1 the fundamentals of ICA by discussing what cocktail parties might have to do with it, #2 the 3-step-ICA-algorithm, and #3 how you can implement it in your data analysis projects. So, if you’re ready to unlock the full potential of your data, come along and join this journey!

#1: Introduction and main idea

Independent Component Analysis is an unsupervised learning algorithm, which means it does not require labeled training data before we can use it. The method originates from signal processing, where we try to separate a multivariate signal into additive subcomponents. Let us jump into an explanation of the main idea:

Image by author

Imagine two independent signals or variables, represented as signal curves: the first signal at the top and the second at the bottom of the image above. Our measurements, however, do not give us the signals themselves but two distinct linear combinations in which the signals are mixed. The objective of ICA is to recover the original, unknown signals by unmixing the measured data. The ultimate aim is to reconstruct the data such that each dimension is mutually independent.

To make this concept more tangible, the most well-known example of ICA, the “cocktail party problem,” will be utilized.

Image by Unsplash

The cocktail party problem

Imagine attending a cocktail party where multiple individuals speak simultaneously, making it difficult to follow a single conversation. Remarkably, humans possess the ability to separate individual voice streams in such situations. For a machine, however, this task is considerably harder.

Cocktail party problem. Image by author.

Suppose we record the conversations of two groups at the party using two microphones. This yields two mixed signals: the first measurement is dominated by the first group with a smaller contribution from the second, while the second measurement is dominated by the second group.

Image by author

The general framework for this can be represented in vector notation in the grey box above. The measurements in vector X are the signals from vector S multiplied by mixing coefficients collected in matrix A. Since we want to extract the full conversations (the original signals), we need to solve this equation for vector S.

Image by author
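Spelled out for our two-microphone setup (the coefficients a11 … a22 are simply illustrative labels for the entries of the mixing matrix A), the grey-box notation reads:

x1 = a11·s1 + a12·s2
x2 = a21·s1 + a22·s2

or compactly X = A · S. Provided A is invertible, the sources follow as S = A^(-1) · X, which is exactly what the algorithm below will estimate.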

ICA vs. PCA

You have probably already guessed that ICA is in some way related to principal component analysis (PCA). That intuition is correct: the ideas underlying both methods are closely related, but they differ in the final stage, as we will see later.

Let’s summarise what PCA basically does: Suppose we have two variables that appear to be related. Using the eigenvectors and eigenvalues of their covariance matrix, PCA finds the directions of maximal variance and converts the variables into principal components. In this particular example, PCA does a good job of identifying the principal direction of the relationship.

Let’s return to the cocktail party example. In a very simple representation, we could imagine that the two measurements from microphones one and two form something like a cross pattern. If we applied PCA in this case, we would get misleading results, because PCA looks for a single set of orthogonal variance directions and therefore fails on data sets with more than one main direction.

Image by author

ICA, on the other hand, solves this problem by focusing on independent components instead of main components.

Image by author
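To make the contrast concrete, here is a small comparison sketch in R. The uniformly distributed toy sources and the mixing matrix are assumptions made for this illustration (uniform sources are convenient because they are clearly non-Gaussian):

# compare the directions found by PCA and by ICA on linearly mixed sources
library(fastICA)

set.seed(42)
S <- cbind(runif(1000, -1, 1), runif(1000, -1, 1)) # independent, non-Gaussian sources
A <- matrix(c(1, 0.5, 0.7, 1), 2, 2)               # arbitrary mixing matrix
X <- S %*% A                                       # mixed measurements

prcomp(X)$rotation # PCA: orthogonal variance directions only
fastICA(X, 2)$A    # ICA: estimate of the (non-orthogonal) mixing matrix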

It is important to recall the conceptual framework established above: the readings obtained from the microphones correspond to the original signals multiplied by the mixing matrix A. Rearranging the equation for vector S shows that the only missing piece needed to determine the original variables is the inverse of matrix A. However, matrix A is unknown.

Image by author

Hence, to gain a comprehensive understanding of matrix A and ultimately calculate the vector S, it is necessary to undertake inverse operations through a series of steps. These sequential inverse operations comprise the three stages of the ICA algorithm, which will now be analyzed in greater detail.

#2: Separation process | the 3-step-ICA-algorithm

Before proceeding to a practical demonstration in R, it is important to understand the three steps of the algorithm. The goal of the algorithm is to multiply the measurement vector X by the inverse of the mixing matrix A. This inverse can be decomposed into three constituent parts, which are applied to the data one after another:

Image by author
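Written out for the two-dimensional case, the decomposition can be derived from the singular value decomposition A = U · ∑ · V^T, so that (as a sketch of the standard formulation, not the only possible one):

A^(-1) = V · ∑^(-1) · U^T

with

U^T = [ cos(Theta) sin(Theta) ; -sin(Theta) cos(Theta) ]  (rotation by Theta, step 1)
∑^(-1) = [ 1/sigma1 0 ; 0 1/sigma2 ]                      (scaling by inverse standard deviations, step 2)
V = [ cos(Phi) sin(Phi) ; -sin(Phi) cos(Phi) ]            (rotation by Phi, step 3)

Applying these three matrices to the measurements X then yields the estimated sources: S = V · ∑^(-1) · U^T · X.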

Step 1: Find the angle with maximal variance to rotate | estimate U^T
The first component of the algorithm is the matrix U^T, which is defined by the first angle Theta. Theta can be derived from the principal direction of the data, as determined through Principal Component Analysis (PCA). This step rotates the data into the position shown above.

Step 2: Find the scaling of the principal components | estimate ∑^(-1)
The second component stretches the data, which is achieved by applying ∑^(-1). This step uses the standard deviations sigma 1 and sigma 2 of the principal components, similar to the whitening performed in PCA.

Step 3: Independence and kurtosis assumptions for rotation | estimate V
The final component, which distinguishes this algorithm from PCA, is the rotation of the signals by the angle Phi. This step aims to recover the original dimensions of the signals by exploiting the assumptions of independence and non-Gaussianity (measured by kurtosis) to determine the rotation.

In summary, the algorithm takes the measurements, rotates them by Theta, stretches them using the standard deviations sigma 1 and sigma 2, and finally rotates them by Phi. The mathematical background of these steps boils down to the decomposition sketched above.

As you can see, we can determine the inverse of matrix A using only the two angles and the variances of the data, which is all we need to run the ICA algorithm: take the measurements, rotate and scale them, and finally rotate them once more to obtain the independent dimensions.
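If you would like to see these three steps in code, here is a minimal, illustrative sketch in R for the two-dimensional case. The function name ica_3step and the simple grid search over Phi are my own constructions for this article; this is not how the fastICA() function used below works internally:

# A toy three-step ICA for a two-column, numeric measurement matrix X.
ica_3step <- function(X) {
  X <- scale(X, center = TRUE, scale = FALSE)      # center the measurements
  # Step 1: rotate by Theta, i.e. project onto the principal directions (U^T)
  pca <- prcomp(X)
  rotated <- X %*% pca$rotation
  # Step 2: scale each principal component to unit variance (Sigma^-1)
  whitened <- sweep(rotated, 2, pca$sdev, "/")
  # Step 3: grid-search the angle Phi whose rotation maximizes absolute kurtosis
  kurt <- function(z) mean(z^4) / mean(z^2)^2 - 3  # excess kurtosis
  phis <- seq(0, pi / 2, length.out = 180)
  scores <- sapply(phis, function(phi) {
    V <- matrix(c(cos(phi), -sin(phi), sin(phi), cos(phi)), 2, 2)
    sum(abs(apply(whitened %*% V, 2, kurt)))
  })
  phi <- phis[which.max(scores)]
  V <- matrix(c(cos(phi), -sin(phi), sin(phi), cos(phi)), 2, 2)
  whitened %*% V                                   # estimated source signals
}

The grid search only works here because a 2x2 rotation has a single free angle; production implementations such as fastICA use a fixed-point iteration instead.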

#3: Code examples with R using fastICA()

I hope you have understood the basic idea of the ICA algorithm so far. It is not necessary to follow every single step mathematically, but it helps to understand the concept behind it. With this in mind, let us work through a practical example that demonstrates the ICA algorithm using the fastICA() function in R.

# install fastICA package in R
install.packages("fastICA")

# load required libraries
library(MASS) # To use mvrnorm()
library(fastICA)

We create two random data sets, signal 1 and signal 2, which can be imagined as the voice signals of our two cocktail party groups:

# random data for signal 1
s1 <- as.numeric(0.7*sin((1:1000)/19 + 0.57*pi) + mvrnorm(n = 1000, mu = 0, Sigma = 0.004))
plot(s1, col = "red", main = "Signal 1", xlab = "Time", ylab = "Amplitude")

# random data for signal 2
s2 <- as.numeric(sin((1:1000)/33) + mvrnorm(n = 1000, mu = 0.03, Sigma = 0.005))
plot(s2, col = "blue", main = "Signal 2", xlab = "Time", ylab = "Amplitude")
Screenshots from R-output: original signals. Image by author

The red curve stands for the first signal and the blue curve for the second. The shape does not matter in this case. What you should see is that both signals are different from each other. Let’s mix them now!

# measurements with mixed data x1 and x2
x1 <- s1 - 2*s2
plot(x1, main = "Linearly Mixed Signal 1", xlab = "Time", ylab = "Amplitude")

x2 <- 1.73*s1 + 3.41*s2
plot(x2, main = "Linearly Mixed Signal 2", xlab = "Time", ylab = "Amplitude")
Screenshots from R-output: measurements. Image by author

As you can see above, we simulate two measurements by mixing both signals; the signals within the measurements are therefore no longer independent. The two mixed signals can be imagined as the recordings of our two microphones in the cocktail example. From now on, we forget about the original signals and pretend that these two measurements are the only information we have about the data.

Hence we want to separate them to finally get two independent signals:

# apply fastICA function to identify independent signals
measurements <- t(rbind(x1,x2))

estimation <- fastICA(measurements, 2, alg.typ = "parallel", fun = "logcosh",
                      alpha = 1, method = "C", row.norm = FALSE, maxit = 200,
                      tol = 0.0001, verbose = TRUE)

plot(estimation$S[,1], col="red", main = "Estimated signals", xlab = "Time", ylab = "Amplitude")
lines(estimation$S[,2], col="blue")
mtext("Signal 1 estimation in red, Signal 2 estimation in blue")
Screenshots from R-output: independent signals separated again. Image by author
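For completeness: fastICA() returns a list, so the individual results can be inspected directly. A quick sanity check could look like this:

# inspect the fastICA result
str(estimation)     # list containing X, K, W, A and S
estimation$A        # estimated mixing matrix
cor(estimation$S)   # estimated sources should be (nearly) uncorrelated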

The result of the algorithm is shown above: the red curve is the estimate of signal 1, while the blue curve estimates signal 2. And, no surprise, the algorithm has recovered something very close to the original signals. You may have noticed that the red curve matches the expectation almost perfectly, while the blue curve appears to be inverted. This is because ICA cannot recover the sign or the exact amplitude of the source activity; the sources are only identified up to scale and sign. Apart from that, the reconstruction has done a really good job here.
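Since this is a simulation and we still have the original signals, the sign flip can also be verified numerically. Keep in mind that ICA does not guarantee the order of the components, so the indices may need to be swapped:

# correlations near +1 or -1 indicate recovery up to sign and scale
cor(estimation$S[,1], s1)
cor(estimation$S[,2], s2)   # a value near -1 reflects the inverted blue curve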

Limitation and conclusion

Let’s start with the bad news: ICA can only separate linearly mixed sources, and it cannot separate sources that are Gaussian-distributed, because a Gaussian would break the third step of our algorithm. Furthermore, while ICA assumes independent sources mixed in linear combinations, sources that are not truly independent will simply be mapped to components that are as independent as possible.
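To see why Gaussian sources break step 3: the excess kurtosis of a Gaussian is zero, so every rotation angle looks the same to the algorithm. A minimal check in R:

# excess kurtosis: roughly 0 for Gaussian noise, clearly non-zero for a sine wave
kurt <- function(z) mean((z - mean(z))^4) / var(z)^2 - 3
kurt(rnorm(100000))                              # close to 0
kurt(sin(seq(0, 100 * pi, length.out = 100000))) # about -1.5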

Now for the good news: the ICA algorithm is a powerful method for many areas and is easily usable through open-source packages for R, MATLAB, and other systems. There are various examples where ICA algorithms have been used in applications such as face recognition [3] and stock-market prediction [2], among many others. It is therefore an important and well-respected method in practical usage.

In a nutshell:
We introduced Independent Component Analysis as an unsupervised learning algorithm. The main idea is to separate measurements that are linear combinations of source signals back into the original signals. This is called reconstruction and uses the three-step ICA algorithm. The most popular example for visualizing the problem behind this method is the cocktail party problem. But enough problems with cocktails for now.

Time for real cocktail parties 🍹

References

[1]: Bell, A.J.; Sejnowski, T.J. (1997). "The independent components of natural scenes are edge filters". Vision Research. 37 (23): 3327–3338. doi:10.1016/s0042-6989(97)00121-1. PMC 2882863. PMID 9425547.

[2]: Back, A.D.; Weigend, A.S. (1997). "A first application of independent component analysis to extracting structure from stock returns". International Journal of Neural Systems. 8 (4): 473–484. doi:10.1142/s0129065797000458. PMID 9730022.

[3]: Bartlett, M.S. (2001). Face Image Analysis by Unsupervised Learning. Boston: Kluwer International Series on Engineering and Computer Science.

[4]: Comon, Pierre (1994). "Independent component analysis, a new concept?" Signal Processing. 36 (3): 287–314. ISSN 0165-1684. doi:10.1016/0165-1684(94)90029-9.
