
Using Brain Computer Interfaces & EEG Signals to Classify Emotions

What can we learn from brainwaves?

Image by Josh Riemer on Unsplash.com

Neuroscience and EEG Signals

An electroencephalogram (EEG) is a recording of brain activity measured by electrodes. EEG signals were first recorded in 1924 by Hans Berger², a discovery that opened an area of research still actively pursued today, with many unknowns remaining. Collecting EEG signals is non-invasive: the electrodes are placed on the scalp with gel or paste. The most common medical uses of EEG include epilepsy research and sleep studies; EEG is also used to detect brain injuries, brain inflammation, and strokes.

DEAP Dataset

The dataset was created by Queen Mary University of London and can be accessed at https://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html. The dataset is open for public use and requires signing a release form.

The dataset consists of:

- 32 participants
- 40 channels (the first 32 are EEG)
- 40 one-minute videos
- Labels: Valence, Arousal, Dominance, Liking, Familiarity, Order
- EEG channel names: 'Fp1', 'AF3', 'F3', 'F7', 'FC5', 'FC1', 'C3', 'T7', 'CP5', 'CP1', 'P3', 'P7', 'PO3', 'O1', 'Oz', 'Pz', 'Fp2', 'AF4', 'Fz', 'F4', 'F8', 'FC6', 'FC2', 'Cz', 'C4', 'T8', 'CP6', 'CP2', 'P4', 'P8', 'PO4', 'O2'

Project Motivation

Although EEG signals are routinely used in medical practice, this research focuses on whether and how EEG signals could support more subjective applications, such as detecting human emotions and sentiment. By aligning the participants' surveys about the music videos (subjective) with the EEG data (objective), we can begin to understand whether it is possible to predict emotions from EEG signals.

Visualizations and Signal Processing Python Library

The Python library predominantly used in this research is MNE-Python¹, an open-source package for analyzing human neurophysiological data, including MEG, EEG, and other signals.

Sensor Locations

Electrodes, which are small metal disks, are placed strategically on the scalp according to the 10-20 system. The placement of these electrodes is measured relative to fixed locations on the subject's head. Below we can see the sensor locations for the BioSemi system used in this dataset's experiment.

Sensor Locations on Head for Biosemi32 System (Image by Author)

The BioSemi system in this experiment is a 32-channel system used in research settings rather than for medical purposes. Even numbers refer to the right side of the head and odd numbers to the left. The naming convention for each sensor relies on its location:

Raw, Unprocessed EEG Signals (Image by Author)

F – Frontal lobe
T – Temporal lobe
C – Central region (there is no central lobe; these sensors lie over the central sulcus)
P – Parietal lobe
O – Occipital lobe
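This naming rule can be checked mechanically: the leading letters give the region, and the trailing character gives the hemisphere. A small illustrative helper (not part of the original analysis):

```python
def hemisphere(name: str) -> str:
    """Odd trailing digit -> left hemisphere, even -> right, 'z' -> midline."""
    if name.endswith("z"):
        return "midline"
    return "left" if int(name[-1]) % 2 == 1 else "right"

# A few of the DEAP channel names:
print(hemisphere("F3"), hemisphere("F4"), hemisphere("Fz"))
# left right midline
```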

Processing EEG Signals

The purpose of preprocessing signals is to improve the signal-to-noise ratio and to help detect experimental effects. This is done with band-pass filtering, which passes frequencies within a certain range while rejecting frequencies outside that range. During preprocessing, dead channels are also dropped and artifacts are removed.

Marking Bad Channels

It is important to mark bad channels (channels that are malfunctioning, unused, or not showing any signal) in order to exclude them from the analysis. We can see in the first plot below that there is at least one bad channel; the error message tells us it is GSR2. We will mark it as bad and redo the plot to see whether the other channels become clearer. We also know from the original dataset description that the channels "Erg1" and "Erg2" are unused, so we will mark them as bad as well. The third plot below shows that the unused channels are no longer included.

(Image by Author)

We also know there are other non-EEG channels in the dataset (for example, galvanic skin response), so we will exclude those as well since we won't be using them in the models. Now that we have identified all of the unused and non-EEG channels, we can set the montage. This assigns a location to each sensor and ties the data to the system it was collected with (BioSemi32). In the final plot above, only the EEG channels are shown, colored according to their sensor locations.

Power Spectral Density Plot (Image by Author)

Filtering data through desired passband

Most of the useful information in brainwaves lies below 30 Hz. We can see in the power spectral density plot that power drops off between 30 and 40 Hz, so we will use a 30 Hz cutoff for this research.
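The effect of such a band-pass can be sketched with SciPy (MNE's `raw.filter` does the equivalent internally with its own filter design). The 128 Hz sampling rate, the 1-30 Hz band, and the test frequencies below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 128.0                               # assumed sampling rate for illustration
t = np.arange(0, 4, 1 / fs)
alpha = np.sin(2 * np.pi * 10 * t)       # 10 Hz component we want to keep
noise = np.sin(2 * np.pi * 45 * t)       # 45 Hz component we want to reject

# 4th-order Butterworth band-pass from 1 to 30 Hz, applied forward and
# backward (filtfilt) so the filter adds no phase shift.
b, a = butter(4, [1.0, 30.0], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, alpha + noise)
# The 10 Hz rhythm survives; the 45 Hz component is strongly attenuated.
```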

Detecting Artifacts and Removing with ICA

It is important to visualize and observe the artifacts in the data before deciding which method to use to repair them. Three types of artifacts disturb EEG data: environmental artifacts (power lines, doors slamming, elevator noise, cell phones, air conditioning, etc.), instrumentation artifacts (poor scalp connection, electromagnetic interference), and biological artifacts (heartbeats, blinking, swallowing).

(Image by Author)

First we will remove the SSP (Signal-Space Projection) projectors from the data. SSP is a matrix multiplication that reduces the rank of the data by projecting it onto a lower-dimensional subspace.

It is important to detect the artifacts (ocular and heartbeat) in order to determine whether they are significant enough to need repair, and to decide which tool to use to repair them. We will use Independent Component Analysis (ICA), which attempts to decompose a multivariate signal into independent non-Gaussian source signals.

Eye blink artifacts manifested across the different channels (Image by Author)

Finding Events and Epochs

Once the signals are processed, we are able to identify the events within them. Events are marked by a stimulus channel; for this dataset, the stimulus channel is called "Status". The Status channel has 7 event markers, which mark events such as the start of the experiment, when the music starts, when there is a fixation screen, the end of the experiment, etc. These event markers can be used to slice the EEG signal into epochs, which are specific time windows extracted from the continuous signal. The first image below shows all of the event markers in one sample of data. The second image is a small window of time with event markers overlaid on top of the signal.

Plotted events in one sample of data (Image by Author)
Plotted events overlaid with EEG data (Image by Author)

Rejected Epochs

While searching for events, we can also reject data segments that contain an eye blink (which, as noted above, is an artifact). The graph below is interesting because the sensors over the frontal lobes had the largest percentage of rejected data. This matches intuition: the frontal sensors are closest to the eyes, where blinks occur, and therefore pick up the strongest artifact.

Percentage of Rejected Epochs (Image by Author)

Machine Learning – Supervised

Several classification models were tested on the dataset, including KNN, Decision Tree, Random Forest, and Bagged Trees classifiers. GridSearchCV was also used to tune the hyperparameters.
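A minimal sketch of the model-selection step with scikit-learn, using a KNN classifier. The features here are synthetic stand-ins: in the real pipeline they would be per-epoch statistics extracted from the cleaned EEG, with the participant's labels as the target; the parameter grid is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in features and labels for illustration.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid-search over KNN hyperparameters with 5-fold cross-validation.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```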

Target label: Participant familiarity with video (Image by Author)

Conclusions and Moving Forward

Unfortunately, EEG signals have some disadvantages. They have poor spatial resolution (on the order of 10 cm² at the scalp), so substantial interpretation is needed to understand which areas of the brain are activated by a particular response. As this research also shows, EEG signals are difficult to process; one way to improve these results, moving forward, is further processing and testing of the signals. In addition, the machine learning models here only experimented with "Familiarity"; a regression formulation could address valence, arousal, and dominance.

Additionally, the KNN model initially performed the best of all of the models, but none performed especially well due to the small size of the dataset (a single participant's sample). Because there are 32 participants in the study, it would be interesting to feed all of the EEG signals into a larger machine learning model.

References

  1. A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, M. Hämäläinen, "MEG and EEG data analysis with MNE-Python", Frontiers in Neuroscience, Volume 7, 2013, ISSN 1662–453X, [DOI]
  2. Haas, L. F. (2003). "Hans Berger (1873–1941), Richard Caton (1842–1926), and electroencephalography". Journal of Neurology, Neurosurgery & Psychiatry, 74(1): 9. doi:10.1136/jnnp.74.1.9. PMC 1738204. PMID 12486257.
  3. K. R. Scherer, "What are emotions? And how can they be measured?", Social Science Information, vol. 44, no. 4, pp. 695–729, 2005.
  4. S. Koelstra, C. Muehl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, "DEAP: A Database for Emotion Analysis using Physiological Signals", IEEE Transactions on Affective Computing, Special Issue on Naturalistic Affect Resources for System Building and Evaluation, in press.
