This month: Facebook AI vs brain signals 👍, dendrites in artificial neural networks 🚨, understanding illusions in retina dynamics 👁️

Deep Recurrent Encoder: A scalable end-to-end network to model brain signals
Omar Chehab, Alexandre Défossez, Jean-Christophe Loiseau, Alexandre Gramfort, Jean-Rémi King, Paper, Code
This paper comes directly from a collaboration between Facebook AI Research, Université Paris-Saclay and the École Normale Supérieure. The main aim of the paper is to devise a new method that helps the neuroscience community analyse brain responses to sensory inputs. In particular, the authors focus on brain signals recorded during a reading task, characterising how word length and word usage frequency are reflected in the brain.
Brain recordings from human tasks are, as a matter of fact, often very noisy and high-dimensional. Just to give an idea, a small eye blink can corrupt the recording, and the heartbeat also has to be taken into account to limit the amount of noise. Up to now, the neuroscience community has mostly deciphered these recordings with linear techniques, such as the Temporal Receptive Field (TRF) or the Recurrent Temporal Receptive Field (RTRF), which, however, cannot cope with the many non-linearities arising in the recorded signals.
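For reference, a TRF is essentially a lagged linear regression from stimulus features to each MEG channel. Below is a minimal sketch of this baseline; the lag window and ridge penalty are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_trf(features, meg, n_lags=40, alpha=1.0):
    """Temporal Receptive Field baseline: a lagged linear map from
    stimulus features (time x n_features) to MEG data (time x n_channels)."""
    n_times, n_features = features.shape
    # Design matrix made of time-lagged copies of the stimulus features
    X = np.zeros((n_times, n_features * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_features:(lag + 1) * n_features] = features[:n_times - lag]
    # Ridge regression keeps the many correlated lagged regressors well-behaved
    return Ridge(alpha=alpha).fit(X, meg)  # coef_ shape: (n_channels, n_features * n_lags)
```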
To tackle this challenge, the authors propose a specific end-to-end deep learning architecture, the Deep Recurrent Encoder (DRE), trained to predict brain responses from multiple subjects at once. The architecture is based on two modified long short-term memory (LSTM) blocks stacked on top of each other. The input data are encoded through a convolutional layer and a ReLU function, the LSTM blocks then process the sequence of hidden states, and these are finally converted back to MEG activity estimates through a 1D transposed convolution layer and a ReLU activation.
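To make the layout concrete, here is a minimal PyTorch sketch of such an encoder-LSTM-decoder stack; the layer sizes are assumptions for illustration and subject-specific details are omitted, so this is not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DRESketch(nn.Module):
    """Rough sketch of a Deep-Recurrent-Encoder-like model:
    Conv1d + ReLU encoder -> two stacked LSTMs -> ConvTranspose1d + ReLU decoder."""

    def __init__(self, n_features=4, n_channels=273, hidden=256):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.decode = nn.Sequential(
            nn.ConvTranspose1d(hidden, n_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):                        # x: (batch, n_features, time)
        h = self.encode(x)                       # (batch, hidden, time)
        h, _ = self.rnn(h.transpose(1, 2))       # LSTMs expect (batch, time, hidden)
        return self.decode(h.transpose(1, 2))    # (batch, n_channels, time) MEG estimate
```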
The model was tested on 68 subjects, who performed a one-hour reading task while being recorded with a 273-channel CTF MEG scanner. The task consisted of reading approximately 2,700 words flashed on a screen in rapid series. Four well-known features were the focus of this study: word length, word frequency in natural language, and binary indicators for the first and last words of each sequence.
Results show that not only is DRE better at predicting brain responses than the classical TRF method, but its feature-importance analysis also highlights which features are the most prominent in the brain's response. Fig. 1 shows the permutation-importance results for word length and word frequency as a function of spatial location across the MEG channels. The word-length response peaked after 150 ms in posterior MEG channels, while the word-usage-frequency response peaked at around 400 ms in fronto-temporal MEG channels. Furthermore, DRE was able to track an additional phenomenon, the lateralization of lexical processing in the brain: for word frequency, peak responses are present in both hemispheres, but with a significantly larger amplitude in the left hemisphere, in the frontal region.
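Feature importance here follows the usual permutation recipe: shuffle one feature over time and measure how much the prediction score drops. A generic sketch follows (the model, scoring function and feature layout are placeholders, not the paper's code):

```python
import numpy as np

def permutation_importance(model, features, target, score_fn, n_repeats=10, seed=0):
    """Mean drop in score when each feature column is shuffled over time."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(model.predict(features), target)
    importances = []
    for j in range(features.shape[1]):           # one column per feature (e.g. word length)
        drops = []
        for _ in range(n_repeats):
            shuffled = features.copy()
            rng.shuffle(shuffled[:, j])          # destroy this feature's temporal structure
            drops.append(baseline - score_fn(model.predict(shuffled), target))
        importances.append(np.mean(drops))
    return np.array(importances)                 # larger drop = more important feature
```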

Do biological constraints impair dendritic computation?
Ilenna Simone Jones, Konrad Paul Kording, Paper
The authors study the effect of dendritic non-linearities in artificial neural networks (ANNs). The starting point is to ask whether the dominant idea of modelling dendritic connections as linear integrators makes sense. Dendrites are, as a matter of fact, highly nonlinear, due to their voltage-dependent ion channels. Three points are identified as pivotal for modelling more biologically plausible dendrites in ANNs:
- Dendrites show a non-linear activation function, which resembles a Leaky Rectified Linear Unit (LReLU) but can be modelled in a more biologically plausible way with a NaCaK function (a sum of sodium, calcium and potassium voltage dependencies);
- Dendritic inputs are usually binary (0/1), but they can instead be passed through a conductance-based synaptic nonlinearity;
- Dendrites’ weight parameters can be left both positive and negative, or constrained to non-negative values, by analogy with the axial resistance between compartments.
These three points can be seen as biological improvements (or constraints) for ANNs. The constraints are implemented in a binary tree model, modifying the classical k-tree structure proposed in earlier papers, as depicted in Fig. 2, and performance on 7 machine learning datasets is compared against a control 2-layer Fully Connected Neural Network (FCNN).
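A minimal sketch of one such binary dendritic tree is shown below, using a LeakyReLU as a stand-in for the NaCaK nonlinearity and an optional non-negativity constraint on the weights; the exact NaCaK form, the conductance-based synapse and the input-repetition scheme from the paper are not reproduced, so all choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BinaryDendriticTree(nn.Module):
    """Binary k-tree of 'dendritic' nodes: each node weights its two child
    branches, sums them, and applies a pointwise nonlinearity."""

    def __init__(self, n_inputs=1024, nonneg_weights=False, negative_slope=0.01):
        super().__init__()
        assert (n_inputs & (n_inputs - 1)) == 0, "n_inputs must be a power of 2"
        self.nonneg_weights = nonneg_weights
        self.act = nn.LeakyReLU(negative_slope)   # stand-in for the NaCaK function
        self.branch_weights = nn.ParameterList()
        width = n_inputs
        while width > 1:                          # one weight per branch at each tree level
            self.branch_weights.append(nn.Parameter(0.1 * torch.randn(width)))
            width //= 2

    def forward(self, x):                         # x: (batch, n_inputs)
        for w in self.branch_weights:
            if self.nonneg_weights:
                w = w.abs()                       # optional non-negativity constraint
            x = x * w                             # weight each incoming branch
            x = x.view(x.shape[0], -1, 2).sum(dim=2)  # pairwise sum into the parent node
            x = self.act(x)
        return x                                  # (batch, 1): the 'somatic' output
```

For a dataset like MNIST, the 784 pixels would have to be repeated or padded up to a power of two before being fed to the tree, and a readout layer added on top for classification.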

By comparing the k-tree results with the FCNN on the MNIST, CIFAR-10, FMNIST, EMNIST, KMNIST, SVHN and USPS datasets, the authors drew the following conclusions:
- The NaCaK activation function outperforms the ReLU, LReLU and sigmoid activations on MNIST, FMNIST, KMNIST and EMNIST, while reaching similar performance on the SVHN, USPS and CIFAR-10 datasets;
- The synapse nonlinearity constraint, which nonlinearly maps a dendrite's inputs to realistic millivolt units, affects the k-tree performance: accuracy is higher than the FCNN on MNIST, FMNIST, KMNIST and EMNIST, and at the same level on SVHN, USPS and CIFAR-10;
- Adding non-negative weights to the k-tree with nonlinear synapses performs at the same level as, or better than, the k-tree with both positive and negative weights and synapses; on MNIST, KMNIST and CIFAR-10 the model outperforms the FCNN.
In spite of these encouraging results, the current k-tree dendritic model has some limitations. First, the NaCaK function is an approximation: dendrites contain more than just three ion channels, although sodium, potassium and calcium are the most representative ones. Secondly, non-negative weights arise from the distribution of ion channels throughout the entire dendritic morphology. Thus, a combination of more realistic morphologies than the k-tree and a learnable NaCaK function for each node might introduce more biologically relevant degrees of freedom, which could impact the model's computational performance. Finally, the k-tree structure presents symmetries that are not found in nature. A further study on multi-synaptic input repetitions, and on how these inputs are routed to different dendritic sub-trees, is needed to train such an architecture better. Overall, this study shows how important neurobiological constraints are and how they should be reflected in the current ANN literature, in order to find better neural network architectures.

Information Synergy in the Anticipatory Dynamics of a Retina
Qi-Rong Lin, Po-Yu Chou, C. K. Chan, Paper
Our visual perception is prone to errors, which are known as optical illusions. For example, the flash-lag phenomenon is a protection mechanism triggered by our retina in response to moving objects. At the same time, anticipation is a temporal illusion, which allows animals to perceive future events from past experiences. In this paper, the authors investigate and show that the retina's anticipation can be described as a negative-group-delay (NGD) phenomenon. NGD is a physical model in which "anticipations" are created by neural networks based on delayed feedback of past perceptions.
To study this effect, the authors ran experimental and computational studies on bullfrog retinas. A small patch of retina was cut and fixed on a 60-channel multi-electrode array (MEA), perfused and oxygenated for up to 10 hours. Stochastic light stimulations x(t) were delivered to the retina through an LED whose intensity I was proportional to x. The retina's responses were recorded by the MEA under stimulation at various frequencies fc (1, 2, 3.5 and 5 Hz) and an intensity time of 0.5 s, at 25 °C.
Fig. 3 shows the averaged stimulation results for the anticipatory effect, over about 20 retinas. x(t) is the stochastic signal and r(t) is the retinal firing rate, namely the electrical spikes retrieved from the stimulated retina. In particular, r(t) may contain anticipatory information created by the retina from the signal x(t) and its time derivative ẋ(t). The retina's response is analysed through the mutual information as a function of a time lag δt, for an experiment where the stimulation frequency fc = 1 Hz (i.e. 1 cycle per second), which is low enough for the retina to produce an anticipatory response. One can clearly see that the peak of I(r;x) is displaced from the origin (0.0 s), indicating that the responses (spikes) of the retina are anticipatory with respect to x(t). Furthermore, an additional shifted peak can be noticed for I(r;ẋ), again indicating anticipatory behaviour. Fig. 3(b) shows the decomposition of I(r;{x,ẋ}) through partial information decomposition (PID), which further supports the presence of the anticipatory phenomenon.
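As a rough illustration of this lag analysis, here is a generic histogram-based estimate of the mutual information between the firing rate r(t) and the shifted stimulus x(t + δt); the binning, lag range and estimator are arbitrary assumptions, not the estimator used in the paper.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def lagged_mutual_information(r, x, lags, n_bins=16):
    """Estimate I(r(t); x(t + lag)) for a range of integer lags (in samples)."""
    mi = []
    for lag in lags:
        if lag >= 0:
            r_seg, x_seg = r[:len(r) - lag], x[lag:]
        else:
            r_seg, x_seg = r[-lag:], x[:len(x) + lag]
        # Discretise both signals, then estimate MI from their joint histogram
        r_bins = np.digitize(r_seg, np.histogram_bin_edges(r_seg, n_bins))
        x_bins = np.digitize(x_seg, np.histogram_bin_edges(x_seg, n_bins))
        mi.append(mutual_info_score(r_bins, x_bins))
    return np.array(mi)   # the sign of the peak lag reveals whether r leads or lags x
```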

Next, the authors modelled the retina's response with a computational linear model. The model turns out to depend on a parameter λ(t), which varies as a function of time. λ(t) appears to depend on the stimulation frequency fc and on the properties of the ganglion cells responsible for the firing of the "predictive" cells in the retina, namely the cells that trigger the anticipation event.
To summarize, this paper sheds light on the anticipatory events happening in the retina. The phenomenon arises from a group of cells recombining information that is not present in the original stimulation. In particular, there appears to be a synergy between the input signal x(t) and its time derivative ẋ(t) that gives rise to the retina's anticipation. Thus, the retinal circuit can somehow extract information about ẋ(t) from x(t) and then recombine the two to produce the spiking response r.
I hope you liked this review of March 2021 neuroscience arxiv.org papers. Please feel free to send me an email with questions or comments at: [email protected]