Entity-Level Tweets Emotion Analysis Dataset (ELTEA17) is a dataset for fine-grained emotion analysis of tweets, which I have made publicly available here. It is a by-product of my 2017 research on structured emotion prediction of tweets with co-extraction of cause, holder, and target.
ELTEA17 consists of 2034 anonymized tweets, each annotated with one of the six Ekman classes along with a binary sarcasm annotation. In the token-level annotation, the dataset tags each emotion keyword with one or more Ekman classes, and additionally tags the holder, cause, and target of the overall emotion. If you find this useful for your research or use case, head over to the ELTEA17 repository on GitHub.
Use case
The massive amount of user-generated data from microblog services such as Twitter is a great source that reflects the public's opinions and reactions to phenomena and situations, from political and world events to consumer products. Since statistical and machine learning approaches have become the methods of choice for constructing a wide variety of Natural Language Processing (NLP) applications, emotion analysis has become a popular topic in NLP research, focusing mostly on users' emotional reactions on social media.
A large variety of information processing applications deal with natural language texts. These applications require extracting and processing textual entities, in addition to processing their surface forms. Applications such as sentiment analysis, opinion mining, and emotion analysis benefit greatly from having access to predefined textual entities. For example, aspect-based sentiment analysis aims to extract important entities such as opinion targets, opinion expressions, target categories, and opinion polarities. Just like aspect-based sentiment analysis, the extraction of textual entities such as the holder, cause, and target of an emotion can provide remarkable coverage over emotion-bearing tweets.
Emotion Taxonomy
The first issue in emotion analysis is to determine the taxonomy of emotions. Researchers have proposed various lists of primary emotions for this purpose. For ELTEA17, I have adopted Ekman's emotion classification, which identifies six primary emotions: happiness, sadness, anger, fear, surprise, and disgust. In order to perform fine-grained emotion classification, identifying the holder, cause, and target along with emotion keywords can provide meaningful features for classifiers. This kind of information is extremely interesting since it can provide pragmatic knowledge about the emotional polarity of tweets. For instance, the interaction between this triple and the overall emotional polarity of a tweet gives us the ability to answer: "Who has expressed the emotion?", "What is the emotion?", "At what is the emotion aimed?", and "Why does such an emotion arise?"
Dataset Construction
The ELTEA17 dataset has both sentence-level and token-level annotation. In the sentence-level annotation, each sentence is annotated with one of the six Ekman classes. In the token-level annotation, each lexical unit is tagged with its role, namely emotion holder, cause, or target, along with emotion keywords.
At a high level, ELTEA17 has gone through the following steps:
- Automatically obtain random posts from Twitter, considering emotion keywords, emotion hashtags, emoticons, length of texts, etc.
- Remove distractions such as web links, geotags, duplicate posts, etc.
- Define an annotation policy for emotion roles and emotion keyword labeling
- Define a policy for when multiple emotions exist or when the expressed emotion is ambiguous
- Manually annotate tweets w.r.t. the emotion roles and emotion keywords (token-level annotation)
- Manually annotate tweets into the six Ekman classes of emotion (sentence-level annotation)
Data Collection
The goal of data collection is to capture data that is reliable for the target dataset. That is, the dataset must contain tweets with specific keywords that convey emotion, as well as linguistic cues for causality. Linguistic cues for causality do not necessarily yield tweets with an emotion cause mentioned in them; instead, they help us collect tweets with general causality relations. Since emotion cause detection is a special case of general cause detection, this is a helpful process for filtering out unwanted data before annotation.
Twitter Data
For the purpose of this research, I built a data collector job using Twitter's basic API and tweepy; a minimal sketch follows the list below. The script takes sets of keywords for filtering out unwanted tweets. Here is a detailed explanation of each filter:
- Language: Set to English for the whole ingestion process
- Emoticons: A predefined list of the most frequently used emoticons, such as :) and :D, with their respective sentiments. This list is adopted from an unpublished work by Marc Lamberti.
- Emotion hashtags: Based on the finding of this study that "93% of the tweets with a user-provided emotional hashtag are emotion-relevant", I adopted a minimal version of the emotion hashtag list proposed there; each hashtag corresponds to an Ekman class.
- Emotion keywords: I used the NRC emotion lexicon. This list contains 3462 entries, each classified into Plutchik classes.
- Causative linguistic cues: In order to collect more data in which the cause of emotion is explicitly mentioned, I constructed a list of causative linguistic cues, such as because, due to, etc.
- War-related keywords: After analyzing a sample of the data, I noticed that most of the collected data was biased toward happiness, e.g. because of happy birthday messages. So I decided to also collect with a list of keywords that convey war and political conflict, such as attack, invade, troops, etc.
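To give a concrete picture, here is a minimal sketch of such a collector, assuming tweepy 3.x; the credentials and keyword lists are placeholders, not the actual ones used for ELTEA17:

import json
import tweepy

EMOTION_HASHTAGS = ["#happy", "#angry", "#scared"]  # illustrative subset

class TweetCollector(tweepy.StreamListener):
    """Streams tweets matching a keyword filter and prints raw records."""
    def __init__(self, filter_name):
        super().__init__()
        self.filter_name = filter_name  # e.g. "EHT", "CE", "NRC", "war"

    def on_status(self, status):
        record = {
            "id": status.id,
            "text": status.text,
            "datetime": status.created_at.isoformat(),
            "filter": self.filter_name,  # remember which filter caught it
        }
        print(json.dumps(record))

    def on_error(self, status_code):
        if status_code == 420:  # rate limited: disconnect the stream
            return False

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth, TweetCollector(filter_name="EHT"))
stream.filter(track=EMOTION_HASHTAGS, languages=["en"])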
To achieve a better distribution over topics, the data collection process ran for almost one month, each time at a different time of day. For each run of the data collector, all of the aforementioned keyword filters, as well as a no-keyword stream, were used. In this diagram, you can see the distribution of collected data based on the filters used.

The script returns each tweet as a Python dictionary with keys such as "created_at", "id", "text", etc. Since this data contains a lot of information that is unnecessary for the dataset, in the next step I trimmed away the undesired fields. After extracting hashtags, emojis, mentions, links, and emoticons, each record looks like this:
{
  "id": 9025500993849*****,
  "text": "Josh's Family Must Be so Proud of him ...",
  "datetime": "Tue Aug 29 15:14:46 +0000 2017",
  "links": [],
  "emojis": [],
  "mentions": ["@*****"],
  "emoticons": [],
  "hashtags": [],
  "filter": "CE",
  "num_words": n
}
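As a rough illustration, the trimming step could look like the sketch below; the regular expressions and the input dictionary are illustrative, not the exact ones used for ELTEA17 (emoji extraction is omitted for brevity):

import re

EMOTICONS = [":)", ":D", ":(", ":P"]  # illustrative subset

def trim(raw):
    """Reduce a raw tweet dictionary to the fields kept in the dataset."""
    text = raw["text"]
    record = {
        "id": raw["id"],
        "datetime": raw["created_at"],
        "links": re.findall(r"https?://\S+", text),
        "mentions": re.findall(r"@\w+", text),
        "hashtags": re.findall(r"#\w+", text),
        "emoticons": [e for e in EMOTICONS if e in text],
        "emojis": [],  # emoji extraction omitted in this sketch
    }
    # remove the extracted links and mentions from the text itself
    for item in record["links"] + record["mentions"]:
        text = text.replace(item, "")
    record["text"] = text.strip()
    record["num_words"] = len(record["text"].split())
    return record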
Filtering
The overall number of collected tweets surpasses 1 million. Each record was eventually stored in a SQLite database for faster access and for applying queries for further record selection and cleanup. To deal with duplicate tweets, I set the text field of the tweet table as a unique identifier, so the database automatically refuses to store duplicates. After deduplication, the tweet table holds almost 500,000 records. The following snapshot shows the database schema.

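Here is a minimal sketch of this deduplicating store, using sqlite3 from the standard library; the column set mirrors the queries below, but the exact schema is hypothetical:

import sqlite3

conn = sqlite3.connect("tweets.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tbl_tweets (
        id      INTEGER PRIMARY KEY,
        text    TEXT UNIQUE,  -- the UNIQUE constraint rejects duplicate tweets
        filter  TEXT,
        hashtag TEXT,
        emoji   TEXT,
        link    TEXT,
        length  INTEGER
    )
""")

def store(record):
    """Insert one trimmed record; duplicates (same text) are silently skipped."""
    try:
        conn.execute(
            "INSERT INTO tbl_tweets (id, text, filter, hashtag, emoji, link, length) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (record["id"], record["text"], record["filter"],
             "T" if record["hashtags"] else "F",
             "T" if record["emojis"] else "F",
             "T" if record["links"] else "F",
             record["num_words"]))
        conn.commit()
    except sqlite3.IntegrityError:
        pass  # duplicate text

Let's check some of the queries applied for further filtering: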
# selects random tweets that have a hashtag and were ingested with the emotion hashtag (EHT) filter
SELECT * FROM tbl_tweets WHERE (hashtag != 'F' AND filter = 'EHT') ORDER BY RANDOM()
# selects random tweets that were ingested with the causative keyword (CE) filter
SELECT * FROM tbl_tweets WHERE (filter = 'CE') ORDER BY RANDOM()
# selects random tweets that contain an emoji
SELECT * FROM tbl_tweets WHERE emoji != 'F' ORDER BY RANDOM()
# selects <N> tweets of at most 20 tokens that were ingested with the 'war' keyword filter (<N> is a placeholder for the desired number)
SELECT * FROM tbl_tweets WHERE (filter = 'war' AND length <= 20) ORDER BY length DESC LIMIT <N>
# selects tweets that have a hashtag, were ingested with the NRC emotion lexicon filter, contain no link, and are between 5 and 35 tokens long
SELECT * FROM tbl_tweets WHERE (hashtag != 'F' AND filter = 'NRC' AND link = 'F' AND (length BETWEEN 5 AND 35))
The motivation behind the length constraints is that longer tweets contain more contextual information for machine learning purposes. It is also better to have fewer tweets with links, because those are mostly advertisements, which contain less opinion content.
Combining the outcomes of all queries results in 3348 tweet instances that contain enough emotion keywords and cause mentions.
Annotation
Sentence-Level
One of the main purposes of this dataset is to feed text classifiers. Two sets of tags were considered for sentence-level annotation: the six Ekman emotion classes, and a binary annotation for sarcasm. This annotation does not require a specific platform and was done simply using a text editor. The following criteria were applied when annotating each tweet:
- Only tweets that express a certain type of the Ekman classes are annotated.
- When multiple emotions exist, the dominant one is selected.
- If the emotion is too hard or too vague to decide, the tweet is discarded.
This process led to successfully annotating 2034 tweet instances. Here you can see the distribution of the annotated data at the sentence level over the six classes:

Token-Level
In this section, I first describe the linguistic phenomena in emotional expressions, and then explain the details of the annotation scheme. Another purpose of this dataset is identifying the holder, cause, target, and emotion keywords of tweets in a form that can be easily fed into sequence labeling algorithms such as HMMs, CRFs, or LSTMs.
Emotion Keywords
In written text, there may be keywords that are used to express emotion. In the context of emotion roles, finding the appropriate class of an emotion keyword during annotation is the prerequisite to identifying its role. The annotation focuses on explicit emotions, which are often expressed by keywords such as "shocked" in "I was shocked after hearing about his death". However, the presence of a keyword does not necessarily convey emotional information, for reasons such as sense ambiguity. For example, "wishes" is a word of happiness in "He wishes for good weather", but it can also be the name of a song in a different context. Thus the annotation must consider the context of the tweet.
The granularity level for emotion keyword annotation is a lexical unit, which can be a single word or a short phrase. For each entry, the keywords are annotated with their respective Ekman class. Since more than one emotion may be present in an entry, the annotators were allowed to select multiple appropriate classes (maximum 3). For example, "outrage" clearly belongs to the anger category. On the other hand, "terrorism", depending on the context, can be labeled with sadness, anger, or fear. A lexical unit can be classified into the following categories:
- One of the six emotions: The annotators indicate the emotion they think is most appropriate (if there is one).
- Multiple classes at the same time: The annotators indicate a combination of multiple classes that they think is closest to the emotion of the lexical unit (note that combinations of more than 3 Ekman classes are not allowed).
- None of the six emotion classes: The annotators choose one of the three suggested classes: anticipation, trust, or other.
Emotion Cause
According to most theories, an emotion is generally invoked by an external event. Causality is a semantic relation defined as "the relationship between cause and effect". To identify the cause of an emotional state, the annotators should be able to answer the question: "Why does the holder feel that emotion?"
In text, an emotion cause is considered a proposition that evokes the presence of the corresponding emotion. It is generally assumed that a proposition has a verb that optionally takes a noun occurring before it as the subject and a noun after it as the object. However, a cause is sometimes expressed as a nominal. Since emotion-cause detection is a special case of general cause detection, the annotators can rely on typical linguistic patterns such as because, thus, as a result, therefore, and due to, as well as other linguistic cues that potentially indicate emotion causes, such as the causative verbs get, have, make, and let. The annotation of emotion causes involves two basic constraints:
- The explicit constraints qualify a single prominent emotion-cause that is directly involved with the emotional expression (can be easily detected using causative linguistic cues).
- The implicit constraints qualify all direct and indirect emotion-causes (the cause can be inferred from the meaning).
While annotating the emotion cause, if a specific cause exists, the annotators should associate each emotion keyword with its corresponding cause. An emotion keyword can sometimes be associated with multiple causes; in such a case, all causes are marked. Note that the presence of emotion keywords does not necessarily guarantee the existence of an emotion cause either. Tweets without explicitly expressed causes mainly occur for the following reasons:
- The tweet is too short, thus there is not enough contextual information, for instance: "I am angry and bloated".
- The tweet is sufficiently long, but the cause may be beyond the context.
- The cause is obscure, which can be due to high abstraction.
Emotion Holder
The source of a tweet is its author, while the source of an emotional state is the person whose emotion is being expressed. When a tweet is emotion-bearing, the author might be a holder, having expressed his or her own emotion in the post; but the author may also write about other people's emotions, leading to multiple sources in a single sentence.
Since emotion holder identification is a sub-domain of semantic role labeling research, it is reasonable to use general linguistic patterns for identification. For the corpus construction, I defined the emotion holder as a phrase with the label holder, whether the expression of emotion is implicit or explicit. Instead of limiting the emotion holder to a person, it can be any entity that expresses emotion. Emotion holders are usually noun phrases and sometimes prepositional phrases. For instance, in the following sentences, the spans shown in boldface denote the emotion holders.
**Jeff** felt very happy. **Jenny** felt happy about **Jeff**'s happiness.
In the second example, "Jeff" should also be identified. Thus, there are two emotion holders in one sentence.
Emotion Target
The emotion target is the entity that the emotion is about; more specifically, targets are entities and their attributes at which emotions are aimed. In tweets, emotion targets are quite diverse, since there is a large range of topics: named entities and noun phrases that are the object of emotion. Within subjective texts, emotion targets tend to be accompanied by emotion keywords. For example, in "I hate the rainy weather", rainy weather is the target of the emotion. One might argue that rainy weather can also be interpreted as the cause of the emotional state; in informal text like tweets, emotion expressions can indeed have overlapping emotion causes and emotion targets. This is why this dataset incorporates cause extraction together with target identification to improve performance [3]. Emotion targets are important because without knowing them, the emotions expressed in a tweet are of limited use.
There are linguistic relations between emotion keywords and targets, due to the fact that emotion keywords are used to modify targets. In subjective expressions, the emotion holder, emotion keyword, and emotion target are often correlated with the subject, the modifier, and the direct object, respectively.
A holder has a particular emotional state, which may be described in terms of a specific cause that invokes it and a topic that categorizes the target of the emotion. There can also be a circumstance in which the response occurs, or a reason why the cause evokes the particular response in the holder, but these are not considered in the annotation. In other words, any kind of information beyond emotion keywords, emotion cause, and emotion target is not considered in the annotation process. Here is the final distribution of the annotated data at the token level:

Training Dataset Format
For sequence labeling problems, the "industry standard" encoding is the BIO encoding. It tags each token as being outside of an entity (O), the beginning of an entity (B-X), or the continuation of an entity (I-X). In ELTEA17, annotated spans from different label groups can overlap, so a single BIO sequence cannot represent them all; therefore, a separate BIO representation is needed for each group of labels. Specifically, there is one BIO encoding per each of the six emotion classes plus the three emotion roles, which leads to nine layers of BIO encoding. The table below shows the annotation representation in BIO encoding in the different layers; a small conversion sketch follows it. Note that the token killing has been annotated with both fear and sadness.

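The conversion from annotated spans to layered BIO tags is straightforward; here is an illustrative sketch (not the actual conversion script) using the example tweet from the table:

LAYERS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust",
          "holder", "cause", "target"]

def to_bio(tokens, spans):
    """spans maps a layer name to (start, end) token ranges, end exclusive."""
    layers = {layer: ["O"] * len(tokens) for layer in LAYERS}
    for layer, ranges in spans.items():
        for start, end in ranges:
            layers[layer][start] = "B-" + layer
            for i in range(start + 1, end):
                layers[layer][i] = "I-" + layer
    return layers

tokens = "I need a tutor because chemistry exam is killing me".split()
spans = {
    "cause": [(5, 7)],    # "chemistry exam"
    "target": [(6, 7)],   # "exam" overlaps with the cause span
    "fear": [(8, 9)],     # "killing"
    "sadness": [(8, 9)],  # "killing" carries two emotion layers
    "holder": [(9, 10)],  # "me"
}
for layer, tags in to_bio(tokens, spans).items():
    print(layer, tags)

Because the overlapping spans live in separate layers, each layer remains a valid, non-overlapping BIO sequence.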
Data Statistics
Let's take a look at some preliminary statistics over ELTEA17. The table below shows the overall summary of the corpus. For evaluation, I used 5-fold cross-validation; the number of folds was chosen by applying a randomized search algorithm.
Next, let's look at some statistics over the different entities. The table below shows the proportion of tweets in which the emotion cause, holder, and target are mentioned. For mention of cause, the highest proportion belongs to the surprise category; interestingly, authors who express surprise about a situation or phenomenon are more willing to state the reason for their emotion. For mention of target, the lowest proportion belongs to the fear category, meaning that in contexts where fear is expressed, holders are less willing to mention the entity at which the fear is aimed. Lastly, for mention of holder, the lowest proportion belongs to the anger category.
The next table shows the maximum, minimum, and average length of the tweets w.r.t. each class.
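Such per-class statistics are simple to reproduce; for example, assuming the sentence-level annotations are loaded into a pandas DataFrame with hypothetical columns text and label:

import pandas as pd

# hypothetical file name; the repository layout may differ
df = pd.read_csv("eltea17_sentence_level.csv")
df["length"] = df["text"].str.split().str.len()  # length in tokens
print(df.groupby("label")["length"].agg(["max", "min", "mean"]))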
Annotation Tool
There are many tools that provide an environment for manually annotating entities and relationships. General Architecture for Text Engineering (GATE) is a well-established open-source suite of tools for NLP tasks, originally developed at the University of Sheffield; it provides a collaborative environment for semantic annotation and ontological data modeling. Brat (the brat rapid annotation tool) is an online environment for collaborative text annotation, designed in particular for structured annotation. WebAnno is a web-based annotation tool for a wide range of linguistic annotations, including various layers of morphological, syntactic, and semantic annotation. In WebAnno, custom annotation layers can be defined, and multiple users are supported for collaborative annotation projects.
My choice of annotation tool was eventually WebAnno, and the sole reason was that it makes it easy to build annotations in different layers. The figure below shows the working environment of WebAnno.

This tool can export the annotation in a variety of file formats; among them, TSV 3 is the easiest to convert to BIO encoding. An exported file includes a header and a body section. The header section contains information about the different types of annotation layers and features used in the file. Below you can see the header markers of a WebAnno TSV 3 file along with an example annotation.
#FORMAT=WebAnno TSV 3.1
#T_SP=webanno.custom.Emotion|Emotion|Role
#Text=I need a tutor because chemistry exam is killing me
305-1	33289-33290	I	_	_
305-2	33291-33295	need	_	_
305-3	33296-33297	a	_	_
305-4	33298-33303	tutor	_	_
305-5	33304-33311	because	_	_
305-6	33312-33321	chemistry	*[3364]	cause[3364]
305-7	33322-33326	exam	*[3364]|*[3365]	cause[3364]|target[3365]
305-8	33327-33329	is	_	_
305-9	33330-33337	killing	sadness[3366]|fear[3367]	*[3366]|*[3367]
305-10	33338-33340	me	*	holder
Layers are marked with "#" followed by "T_SP=" for span types, and features are separated with "|". Sentences are presented following the text marker "#Text=". Each token annotation starts with a sentence-token number, followed by the begin-end offsets and the token itself, separated by TAB characters. Here, for the first token "I", 305 indicates the sentence number, 1 the token number, 33289 the beginning offset, and 33290 the end offset of the token. For every feature of a span, the annotation value is presented in the same row as the token/sub-token annotation, separated by a TAB character. If there is no annotation for the given span layer, a "_" character is placed in the column. If the feature has no value, or the span layer has no feature at all, a "*" character represents the annotation. For the token "me", the asterisk means that the token is associated with no emotion type (since the first annotation column is dedicated to emotion types), while its role is holder. Multiple span annotations on a token carry a numbered disambiguation reference enclosed in brackets, [N], where N refers to the Nth annotation on the layer.
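Under the column layout described above, reading the body of such an export back into token-level labels is a short exercise. This sketch is illustrative only and ignores sub-token annotations and other edge cases:

def parse_tsv3(lines):
    """Yield (token, emotions, roles) triples from WebAnno TSV 3 body lines."""
    def clean(column):
        # drop the "_" no-annotation marker and the [N] disambiguation references
        return [a.split("[")[0] for a in column.split("|") if a != "_"]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip header markers, #Text= lines, and blank separators
        sent_tok, offsets, token, emotion, role = line.split("\t")
        yield token, clean(emotion), clean(role)

with open("annotation.tsv") as f:  # hypothetical export file
    for token, emotions, roles in parse_tsv3(f):
        print(token, emotions, roles)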
Next Steps
- Stay tuned to learn how I combined CRF and CNN for emotion classification
- Feel free to explore ELTEA17
Further Reference
[1] Plutchik, Robert. Emotion: Theory, research, and experience. In Theories of Emotion, vol. 1. Academic Press, New York, NY, USA, 1980.
[2] Ren, F. and Shi, H. A general ontology-based multi-lingual multi-function multimedia intelligent system. In Proceedings of the 2000 IEEE International Conference on Systems, Man and Cybernetics, 2000.
[3] Chen, Ying, Sophia Yat Mei Lee, Shoushan Li, and Chu-Ren Huang. Emotion cause detection with linguistic constructions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 179–187, 2010.
[4] Li, Weiyuan and Xu, Hua. Text-based emotion classification using emotion cause extraction. Expert Systems with Applications 41(4):1742–1749, 2014.
[5] WebAnno User Guide: https://webanno.github.io/webanno/releases/3.2.2/docs/user-guide.html