
Non-circular PCA and Neural Decoder

Introduction to PCA and neuroscience

What to expect in this article:

  1. Neuroscience and PCA
  2. Complex-valued PCA
  3. Simulation of neural signals with complex numbers
Photo by Morgan Housel on Unsplash

  1. Introduction to PCA and neuroscience

PCA (principal component analysis) is a common statistical technique in many signal processing applications. Its purpose is to condense a data set of many variables into a few variables that are linear combinations of the originals. After applying PCA to a data set of signals, the discarded vector spaces often contain mostly noise, so PCA can act as a denoiser. The retained latent vectors, on the other hand, capture most of the variance of the data set, which means they estimate the true signal subspace from the recorded signals. By projecting the data set onto this estimated signal subspace, and thus throwing away the less influential directions, PCA yields denoised data. Because it performs both dimensionality reduction and noise reduction on signal data, PCA is widely used in speech enhancement, speech modeling, and many other signal processing settings.
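
To make the projection step concrete, here is a minimal sketch of real-valued PCA denoising, assuming a data matrix `X` of shape (n_features, n_samples) and a chosen subspace dimension `k` (both hypothetical names, not from the article):

```python
import numpy as np

def pca_denoise(X, k):
    """Project data onto its top-k principal subspace.

    X: (n_features, n_samples) data matrix; k: retained dimensions.
    """
    Xc = X - X.mean(axis=1, keepdims=True)   # center each feature
    C = Xc @ Xc.T / Xc.shape[1]              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    U = eigvecs[:, -k:]                      # top-k eigenvectors = signal subspace
    return U @ (U.T @ Xc)                    # projection = denoised data
```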


As for its limitations, PCA was originally designed for real-valued data, so it has typically been applied to time series of signals. However, under the Fourier or Laplace transform, a signal is represented by a complex function of frequency: the content of the signal at each frequency is given by a complex number. The original PCA is not suitable for such transformed data because it cannot thoroughly describe complex random vectors. Unlike real random variables, the covariance between two complex random variables involves the complex conjugate. A complex random vector also has a second second-order moment, known as the pseudo-covariance matrix or relation matrix, which is computed without the conjugate. This matrix is rarely introduced in signal processing because it is usually assumed to be zero. The pseudo-covariance captures how Z is correlated with its rotated version exp(jα)Z for any α. (Circularity is visualized in the second figure and explained further in the Circularity section.) By definition, data is circular/proper when the pseudo-covariance is zero; otherwise, it is non-circular/improper. An improper/non-circular PCA method is therefore essential to avoid missing this unique structure of complex data, which the proper/circular method ignores.
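
A quick numerical sketch of the two second-order moments (the exact generator below is illustrative, not from the article): the covariance uses the conjugate, the pseudo-covariance does not, and only the latter distinguishes circular from noncircular data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Circular complex Gaussian: independent real and imaginary parts, unit power.
z_circ = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

cov = np.mean(z_circ * np.conj(z_circ))   # covariance E[Z Z*]
pseudo = np.mean(z_circ * z_circ)         # pseudo-covariance E[Z Z]
print(cov)      # ~1.0
print(pseudo)   # ~0.0 -> circular/proper

# Noncircular example: unequal real/imaginary power, so E[Z Z] is far from zero.
z_nc = 2.0 * rng.standard_normal(n) + 0.5j * rng.standard_normal(n)
print(np.mean(z_nc * z_nc))   # ~ 4 - 0.25 = 3.75 -> improper
```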


As a student researcher in a computational neuroscience lab, I studied this new PCA method because we are always seeking better ways to analyze our neural data. If we think of the brain by analogy to a computer, a decoder should distinguish zeros and ones from the neural recordings. PCA then denoises the data and, through dimensionality reduction, makes the decoder less prone to over-fitting. Local field potentials (LFPs), which are represented by complex numbers in the frequency domain, have an advantage over other signals: recording them consumes less power than recording spiking activity. In particular, the development of electrophysiological devices such as brain-computer interfaces (BCIs) depends on decoders that can control robotic limbs with good decodability and low power consumption.


  2. Signal Simulation
Figure by the author, Virtual Sensors and Signals Sources in Simulation

This simulation determines how a decoder with circular PCA and a decoder with non-circular PCA perform on complex data of different circularity. In this study, the signal is generated by three virtual sources and transmitted to five virtual sensors. The signal matrix therefore has size 3 by the number of trials, and the noisy sensor matrix has size 5 by the number of trials, unless a different number of sensors is noted. Each signal population follows a Gaussian distribution and contains hit and miss groups with two different means (blue and red dots in the figure). The two groups can be thought of as the 0s and 1s of a neural function, or as the signal from the earth's crust during a large earthquake versus a stationary crust. The signal is simulated with T as the total number of signal samples. The latent matrix encodes the distance and direction from each source to each sensor, so that one latent vector is defined per sensor. White Gaussian noise is added to each trial. The task of the model is then to find the latent matrix by PCA and distinguish the hit and miss trials in the full feature matrix, of size number of sensors by trials.
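
A minimal sketch of this generative model, assuming hypothetical values for the class means, mixing matrix, and noise scale (the article does not state the exact parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_sens, n_trials = 3, 5, 1000

# Hit/miss labels and class-dependent source means (hypothetical values).
labels = rng.integers(0, 2, n_trials)            # 0 = miss, 1 = hit
mu_hit, mu_miss = 1.0 + 1.0j, 0.0 + 0.0j
means = np.where(labels == 1, mu_hit, mu_miss)   # (n_trials,)

# Complex Gaussian sources: (n_src, n_trials).
S = means + (rng.standard_normal((n_src, n_trials))
             + 1j * rng.standard_normal((n_src, n_trials))) / np.sqrt(2)

# Latent (mixing) matrix standing in for source-to-sensor distance/direction.
A = (rng.standard_normal((n_sens, n_src))
     + 1j * rng.standard_normal((n_sens, n_src)))

# Sensor feature matrix with additive white complex Gaussian noise.
noise = (rng.standard_normal((n_sens, n_trials))
         + 1j * rng.standard_normal((n_sens, n_trials))) * 0.3
X = A @ S + noise                                # (n_sens, n_trials)
```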


  3. Circularity
Figure by the author, Circular and Noncircular Data with Gaussian Distribution

When generating circular and noncircular signals, the correlation between the real and imaginary parts of the data is adjusted to produce a clear difference in circularity (very circular or very noncircular). This works because correlating the two parts is equivalent to correlating any complex number Z with its rotated form exp(jα)Z for any α. The more non-circular the data, the more correlated its real and imaginary parts. Circular data forms a circle when plotted on the real and imaginary axes, while non-circular data takes the shape of a football (left and right plots in the figure). When the data is decoded (the detailed process is in the following section), the non-circular PCA method should capture additional information in the data that the proper PCA method misses, and may therefore improve the performance of the decoder compared to the normal/proper PCA.
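
One way to implement this is to draw the real and imaginary parts from a bivariate Gaussian with a tunable correlation ρ; this is a sketch under that assumption, not necessarily the article's exact generator (the exact-simulation approach of [6] is another option):

```python
import numpy as np

def complex_gaussian(n, rho, rng):
    """Draw n complex samples whose real/imaginary parts have correlation rho.

    rho = 0 gives circular data (a round cloud); |rho| near 1 gives strongly
    noncircular data (an elongated, football-shaped cloud).
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    re, im = rng.multivariate_normal([0, 0], cov, size=n).T
    return re + 1j * im

rng = np.random.default_rng(2)
z_circular = complex_gaussian(5000, 0.0, rng)
z_noncircular = complex_gaussian(5000, 0.9, rng)
print(np.mean(z_circular**2))      # pseudo-covariance ~ 0
print(np.mean(z_noncircular**2))   # pseudo-covariance ~ 2j*0.9, clearly nonzero
```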


  4. Decoder Cross-validation

The simulated data is split into a train set and a test set. The train set is used to obtain the eigenvectors, U, of the estimated signal subspace, using the code provided by Li [2]. Uc is the result of normal PCA, which ignores non-circularity, and Unc is the matrix obtained when non-circularity is taken into account. The training features are then projected onto the estimated signal subspace, and a multinomial logistic regression finds the best fit to distinguish the hit and miss trials in each Uc or Unc subspace. The fitted logistic regression model is validated by computing the area under the curve (AUC), which indicates the fraction of correct decoding, on the test set; the test data is projected onto the Uc or Unc subspace obtained from the corresponding training set. For each simulated data set, cross-validation is performed 10 times with 90% of the data for training and 10% for testing. The average AUC is returned as the output of the decoder function.
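
A sketch of this cross-validation loop, where `estimate_subspace` is a hypothetical stand-in for the circular or noncircular PCA routine (e.g., Li's code [2]), and stacking the real and imaginary parts of the projection is one possible featurization that the article does not specify:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def decode_auc(X, y, estimate_subspace, k=3, n_splits=10, seed=0):
    """Cross-validated decoding: fit on 90% of trials, score AUC on 10%.

    X: (n_sensors, n_trials) complex feature matrix; y: hit/miss labels.
    estimate_subspace: stand-in for the Uc or Unc PCA routine; it should
    return an (n_sensors, k) subspace basis U.
    """
    aucs = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in cv.split(np.zeros((y.size, 1)), y):
        U = estimate_subspace(X[:, train_idx], k)      # (n_sensors, k)

        def features(X_part):
            # Project onto the estimated subspace, then stack real and
            # imaginary parts as real-valued regression features.
            Z = U.conj().T @ X_part                    # (k, n_part)
            return np.vstack([Z.real, Z.imag]).T       # (n_part, 2k)

        clf = LogisticRegression(max_iter=1000)
        clf.fit(features(X[:, train_idx]), y[train_idx])
        scores = clf.predict_proba(features(X[:, test_idx]))[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aucs))
```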


  5. Simulation with Different Conditions

When developing a decoder, decent decodability is essential, and many variables affect it: the number of sensors detecting the signal, the amplitude of the noise, the sample size, and the separation between hit and miss trials. The noise amplitude is quantified by the signal-to-noise ratio (SNR) in decibels. For each setting of these variables, data with low or high circularity coefficients are simulated, the decoder is run 30 times independently, and the results are averaged into a final performance index. The result is a series of AUCs as a function of the number of sensors, the sample size, the noise level, and the category separation.
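
For reference, one way to turn a target SNR in decibels into a noise scale (a sketch, assuming the standard definition SNR_dB = 10·log10(P_signal / P_noise)):

```python
import numpy as np

def noise_for_snr(signal, snr_db, rng):
    """White complex Gaussian noise scaled to a target SNR in decibels.

    SNR_dB = 10 * log10(P_signal / P_noise), so the required noise power
    is P_noise = P_signal / 10**(snr_db / 10).
    """
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return (rng.standard_normal(signal.shape)
            + 1j * rng.standard_normal(signal.shape)) * np.sqrt(p_noise / 2)
```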


So far…

Complex data has an additional second-order feature, and we need to take it into account when decoding complex data with PCA. Previous research provided a novel PCA method that considers this unique characteristic of complex data [2]. In this project, I designed simulations of neural signals to evaluate the novel and regular PCA methods under different signal conditions.


  6. Results: Simulation with Different Conditions

As described in the methods (Part I), the area under the curve (AUC) is calculated as a function of each varying condition on the simulated data. In addition to this first performance index of the decoder, a second index, the normalized distance between the true signal subspace (U_true) and the estimated signal subspace (U), is calculated for each condition (Eq. 6). The distance is displayed on a base-10 logarithmic scale in the graphs. This index should be smaller, or at most equal, for the non-circular method compared to the circular method. In all plots, red diamonds indicate the results of the non-circular PCA (Unc) and blue circles the results of the normal PCA (Uc), for both AUCs and normalized distances. The simulation is run for four different varying conditions.
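
The article's Eq. 6 is not reproduced here; one standard normalized subspace distance, used as an assumption in this sketch, compares the orthogonal projectors of the two subspaces:

```python
import numpy as np

def subspace_distance(U_true, U_est):
    """Normalized distance between two subspaces via orthogonal projectors.

    A standard choice (an assumption, standing in for the article's Eq. 6):
    d = ||P_true - P_est||_F / ||P_true||_F, where P = U U^H.
    """
    P_true = U_true @ U_true.conj().T
    P_est = U_est @ U_est.conj().T
    return np.linalg.norm(P_true - P_est) / np.linalg.norm(P_true)
```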


6.1 Variation of Category Separation

First, the simulation is run with varying separation between the hit and miss categories; the result is shown in the figure. The separation is the difference between the hit and miss means, and it increases monotonically from 0.1 to 2 in arbitrary units (10 points in total). The variance is the same for both categories (1 for each). As the upper panel shows, AUC increases as the separation grows, which makes sense intuitively: the model can decode precisely when the two categories are far apart. The real question, however, is how the Uc (circular) and Unc (noncircular) methods perform on circular and noncircular data. For circular data, the AUC of the Uc method is significantly lower than that of the Unc method at separations of 1.58, 1.79, and 2 (Wilcoxon rank-sum test, p < 0.05), and the distance is larger for the Uc method at those same separations. For noncircular data, the distance of the Uc method is significantly larger at every separation except 0.1 and 0.52, yet the AUC of the Uc method is significantly lower only at a separation of 1.79. Since the distance metric for the noncircular PCA is smaller than that for the circular PCA (mostly at large separations), the Unc method clearly captures the circularity of the data, but the performance of the decoder is essentially the same for both the Uc and Unc PCA methods.
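
Significance at each point is assessed with the Wilcoxon rank-sum test over the 30 independent runs per condition; a sketch with scipy, using made-up AUC values:

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical per-run AUCs for one condition (30 independent runs each).
rng = np.random.default_rng(3)
auc_uc = rng.normal(0.90, 0.02, size=30)   # circular (Uc) decoder
auc_unc = rng.normal(0.92, 0.02, size=30)  # noncircular (Unc) decoder

stat, p = ranksums(auc_uc, auc_unc)
print(f"Wilcoxon rank-sum p = {p:.4f}")    # significant if p < 0.05
```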

Figure by the author, Variation of Category Separation

6.2 Variation of Number of Sensors

The simulation is run with a varying number of sensors; the result is shown in the figure. The number of sensors ranges from 3 to 13 (11 points in total). For circular data, there is no significant difference in either distance or AUC between the Uc and Unc methods (Wilcoxon rank-sum test, p < 0.05). For noncircular data, the AUCs of the Uc and Unc methods are not significantly different, while the distance for the Unc method is significantly shorter than for the Uc method except at 6, 7, and 8 sensors. As with the varying separation, the distance metric for the noncircular PCA is smaller at some dimensions, but the decodability is unaffected.

Figure by the author, Variation of Number of Sensors

6.3 Variation of Sample Size

The simulation is run with varying sample size; the result is shown in the figure. The number of trials ranges from 200 to 2000 (21 points in total). For circular data, the distance of the Unc method is significantly smaller than that of the Uc method at 290 and 380 trials (Wilcoxon rank-sum test, p < 0.05), but the AUC does not differ at any sample size. For noncircular data, the distance of the Unc method is significantly shorter at 200–650, 1010–1190, 1370–1820, and 2000 trials; however, the AUC is not significantly different, just as with circular data. As with the category separation and the number of sensors, the decodability does not change under this condition.

Figure by the author, Variation of Sample Size

6.4 Variation of Noise Amplitude

The simulation is run with varying noise amplitude; the result is shown in the figure. The noise amplitude ranges from -4 to 8 dB with monotonic increase (20 points in total). The distance decreases as SNR increases, which matches the simulation in the previous study [2]. For circular data, the distance of the Unc method is significantly smaller at -2.11 dB (Wilcoxon rank-sum test, p < 0.05). For noncircular data, the distance of the Unc method is significantly smaller than that of the Uc method from -0.21 to 3.58 dB and from 4.84 to 6.11 dB. However, the AUC shows no difference between the Uc and Unc methods for either circular or noncircular data. Again, the decodability does not change even though the Unc method estimates the subspace better, as shown by the shorter distance.

Figure by the author, Variation of Noise Amplitude

6.5 Multiple Variables

Combinations of two of the previous conditions are tested as well: the variation of sample size and of noise amplitude are combined to analyze the effect of non-circularity (Fig. 1 and Fig. 2). The rows of each heatmap are sample sizes and the columns are noise amplitudes [dB], and the simulation is run only once per pair of conditions. For both the AUC (Fig. 1) and the distance index (Fig. 2), the top two maps show simulations with circular data and the bottom two with non-circular data; the left two use the circular/proper (Uc) PCA method and the right two the non-circular/improper (Unc) method. As Figure 1 shows, circular data should have, and does have, the same decodability under the Uc and Unc methods, but the non-circular data shows no apparent difference either. This matches the single-condition results, indicating that varying sample size and noise amplitude produce no significant difference between the two methods even when the data is non-circular. The distance index, on the other hand, behaves as in the single-condition simulations: the larger the sample size, the closer the estimated subspace gets to the true one, and the higher the signal-to-noise ratio, the closer it gets. Every heatmap in Figure 2 follows that pattern, and the Unc method estimates the subspace better on non-circular data, as seen in the bottom two maps: the dark blue area is larger for the Uc method than for the Unc method, meaning the distances of the subspaces estimated by the Unc method are relatively shorter. Overall, the split between dark and light blue is linear in the distance maps, suggesting that the effects of sample size and noise amplitude on the simulation are independent.

Figure 1 by the author, AUC with Variation of Sample Size and Noise Amplitude
Figure 2 by the author, Distances with Variation of Sample Size and Noise Amplitude

  7. Discussion

In this research, the subspace of noncircular data is better estimated by the noncircular PCA method in all conditions. This is reasonable, since the non-circular method accounts for the second-order non-circularity of the data [5] and so handles non-circular data better than the circular method, and it is consistent with the simulations in previous research [2]. However, no single factor affected the performance of a decoder using the improper/non-circular PCA: regardless of the category separation, the number of sensors, the sample size, or the noise amplitude, the estimated subspace let the decoder distinguish the two categories of noncircular data with the same accuracy under both the circular and non-circular methods. Moreover, for the combination of conditions tested, the variation of sample size and that of noise amplitude appear to have independent effects on the distance between the true and estimated subspaces.

Overall, the simulation successfully visualized the effect of non-circularity on decoding under different conditions of the simulated data. However, each condition was run independently only 30 times; to reduce the effect of outliers, more than 100 independent runs would support a more precise analysis of non-circularity. Moreover, the simulation in this research used a limited number of signal sources, whereas a real feature matrix of multielectrode recordings would have more than 20 dimensions. Further simulations are therefore necessary to grasp the effect of non-circularity when decoding complex neural data.


References:

[1] N. Even-Chen, D. G. Muratore, S. D. Stavisky, et al. Power-saving design opportunities for wireless intracortical brain-computer interfaces (2020). Nature Biomedical Engineering.

[2] X. Li, T. Adali, and M. Anderson. Noncircular principal component analysis and its application to model selection (2011). IEEE Transactions on Signal Processing, 59(10):4516–4528.

[3] David A. Markowitz, Yan T. Wong, Charles M. Gray, and Bijan Pesaran. Optimizing the decoding of movement goals from local field potentials in macaque cortex (2011). Journal of Neuroscience, 31(50):18412–18422.

[4] C. Mehring et al. Inference of hand movements from local field potentials in monkey motor cortex (2003). Nature Neuroscience, 6:1253–1254.

[5] B. Picinbono. Second-order complex random vectors and normal distributions (1996). IEEE Transactions on Signal Processing, 44(10):2637–2640.

[6] A. M. Sykulski and D. B. Percival. Exact simulation of noncircular or improper complex-valued stationary Gaussian processes using circulant embedding (2016). IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, pages 1–6.

