
Sentence Embeddings and CoreNLP’s Recursive Sentiment Model

Understand and implement CoreNLP's sentiment model.


Hello there! A couple of weeks ago I posted the first of a series of articles about the CoreNLP library and, more specifically, its sentiment analysis model. That first article was an introduction to the Java package and its main features, particularly targeted at people who, like myself, are used to working with Python. As promised, this second article of the series will go more in depth into CoreNLP’s sentiment annotator: why it is not your usual sentiment classifier, the recursive model behind it, and how to implement it with some simple Java code (which you can find on my GitHub!).

I guess, before I start, I should warn the reader: ⛔️ this post talks very little about sentiment analysis and a lot about sentence embeddings ⛔️. Fear not, hopefully this will make sense to you as you read!

Photo by Jason Leung on Unsplash

Let’s start by… googling CoreNLP’s sentiment model! Clicking through to the official page of CoreNLP’s sentiment classifier, we find the following description:

Most sentiment prediction systems work just by looking at words in isolation, giving positive points for positive words and negative points for negative words and then summing up these points. That way, the order of words is ignored and important information is lost. In contrast, our new deep learning model actually builds up a representation of whole sentences based on the sentence structure. It computes the sentiment based on how words compose the meaning of longer phrases. This way, the model is not as easily fooled as previous models.

After reading this paragraph, we can already tell two things:

  1. This is not a common sentiment predictor, but rather something a lot more interesting (and potentially a lot more effective!)
  2. The core difference between this and other sentiment models seems not to be the classifier itself, but rather the representation of the input text.

Point 2 already uncovers what is going to be the central theme of this post: text representation. In order to go in depth into why CoreNLP’s sentiment model is so powerful and effective, we will firstly need to understand the importance of appropriately representing input text. And this is what we are going to talk about below!

I will firstly introduce the complexities of text representation (which I will call semantic modelling), as well as the limitations of well-known word embedding models such as word2vec. I will then talk about the concept of semantic compositionality and how CoreNLP used it to create a very powerful recursive sentiment analysis model. Finally, I will give an example of a simple implementation in Java. Let’s gooo!

Text Representation or Semantic Modelling

I like to think of semantic modelling like this: humans can read and understand a sequence of letters and symbols (like this sentence); ML algorithms, however, can only understand numerical sequences. In order for a sentiment classifier, or any other model, to process text, it must be translated from a human-readable form to a computer-readable form.

Turns out there are maaany ways to represent text: as a bag of words, one-hot encodings, logic-based frameworks, semantic vectors in an embedding space… And it is important to choose the best possible one, since this will directly impact the performance of your model. Think about it: how can we expect the model to classify the input text if it cannot even understand it!
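To make the contrast concrete, here is a minimal sketch (my own toy example, not part of CoreNLP) of the simplest option above, a bag-of-words count, which keeps word frequencies but throws away order and syntax entirely:

import java.util.LinkedHashMap;
import java.util.Map;

public class BagOfWords {
    // builds a bag-of-words representation: a word -> count map;
    // word order, syntax and grammar are completely discarded
    public static Map<String, Integer> vectorize(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
    public static void main(String[] args) {
        // "not very good" and "very not good" end up with the exact same representation
        System.out.println(vectorize("not very good"));
        System.out.println(vectorize("very not good"));
    }
}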

Out of all the semantic modelling techniques I have named above, the use of semantic vectors is regarded as one of the preferred text representation options in the NLP literature. In particular, word embeddings have become a very popular method and have attracted a lot of attention in recent years. Think word2vec, GloVe, FastText…

  • Word Embeddings and their Limitations

Word embeddings were built on the idea that it is possible to infer the meaning of a given word from its linguistic context (Mitchell and Lapata, 2010).

In this framework, words are basically represented by vectors that carry latent semantic information on the particular word (Socher, 2013). Figure 1 represents a simple semantic two-dimensional space. Note that words that are similar appear closer together.

Fig.1 (Mitchell and Lapata, 2010, Fig. 1, p.1390) Words represented in a semantic vector space. Proximity between them indicates semantic similarity.
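As a toy illustration of what "closer together" means, here is a small sketch using made-up two-dimensional vectors (real embeddings typically have hundreds of dimensions) and cosine similarity as the proximity measure:

public class CosineSimilarity {
    // cosine similarity: close to 1.0 means the vectors point in a similar
    // direction (semantically similar words), close to 0.0 means unrelated
    static double cosine(double[] u, double[] v) {
        double dot = 0, normU = 0, normV = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            normU += u[i] * u[i];
            normV += v[i] * v[i];
        }
        return dot / (Math.sqrt(normU) * Math.sqrt(normV));
    }
    public static void main(String[] args) {
        double[] dog = {0.9, 0.8};       // made-up vectors, for illustration only
        double[] cat = {0.85, 0.75};
        double[] economy = {0.1, 0.95};
        System.out.println("dog vs cat:     " + cosine(dog, cat));
        System.out.println("dog vs economy: " + cosine(dog, economy));
    }
}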

However, despite their widespread use, the limitations of these models become quite evident when we are interested in computing the representation for a phrase or sentence. Word embedding models are only able to represent words in isolation and fail to account for the syntactical and grammatical associations between them.

One of the solutions most frequently used when we want to represent a sentence but only have its words’ embeddings is to average its word vectors in order to obtain a sentence vector. This approach has proven to work well enough in some cases, but I personally find it very reductive, since the specific syntax, grammar and dependencies of the sentence are simply ignored.
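For reference, here is a minimal sketch of that averaging approach, assuming we already have fixed-size word vectors from some embedding model; note that reordering the words does not change the result at all:

public class AverageEmbedding {
    // averages word vectors (all of the same dimensionality) into one
    // sentence vector; syntax, grammar and word order are ignored
    static double[] average(double[][] wordVectors) {
        int dim = wordVectors[0].length;
        double[] sentence = new double[dim];
        for (double[] word : wordVectors) {
            for (int i = 0; i < dim; i++) {
                sentence[i] += word[i] / wordVectors.length;
            }
        }
        return sentence;
    }
}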

  • Compositionality

As a person with a linguistics and literature background, considering syntax, grammar and word order when doing text analysis is something that I wouldn’t even question!! Having experienced the limitations of word embeddings myself, I came across the Principle of Semantic Compositionality (Frege, 1980) and started thinking about how cool it would be to apply it to this semantic modelling task. This principle states that:

"the meaning of a (syntactically complex) whole is a function only of the meaning of its (syntactic) parts together with the method by which these parts were combined" (Pelletier, 1994, p.11).

In the NLP literature, the most common interpretation and use of Frege’s principle is as a theoretical ground which dictates that one should be able to explain the meaning of a complete sentence in terms of its phrases and that, similarly, it should also be possible to explain these phrases in terms of their words.

Therefore, when it comes to the representation of language, the modeller should be aware that each syntactic operation additionally implies a semantic operation (Mitchell and Lapata, 2010). Partee (1995, p. 313) formally suggested formula 1 for expressing the composition of two elements u and v, where f is the composition function acting on the two constituents and R accounts for the syntactic relationship between u and v (Mitchell and Lapata, 2010).

p = f(u, v, R)
Formula 1 (Mitchell and Lapata, 2010, p. 1393)

Building on word embeddings, some authors have attempted to incorporate methods for compositionality with the aim of embedding aspects of language such as word order and syntax (Socher, 2013). An example of this is Richard Socher and his recursive model.

Recursive Neural Networks for Sentence Embeddings

I got super excited when I found out about Socher’s work because he was basically trying to incorporate this compositional idea into his model for building a more complete form of sentence embedding. In order to do that he presented a new approach based on Recursive Neural Networks (RecNN).

This model is based on the idea of computing a sentence embedding starting from its simpler elements (i.e. the words), and then using the same composition function recursively in a bottom-up fashion. This way of breaking down the sentence and then building it up in a recursive, bottom-up approach would allow the final output to better capture complex information concerning semantics, syntax and sentiment of the sentence. And I find this so cool!

  • Model Overview

I will now give an overview of the intuition behind the method proposed in Socher et al. (2013). Figure 2 illustrates a simplified version of the compositional idea on which the recursive model is built. Throughout this section I will use this tri-gram as an example.

Figure 2 depicts the sentence and its internal elements as nodes of a parse tree. This kind of parse tree is said to be binary because each parent node has exactly two child nodes. The basic elements of the RecNN model are the leaves of the tree; therefore, processing begins by splitting the sentence into words and computing the word embedding for each. In our example, the first step is to compute the representations a, b and c for the words ‘not’, ‘very’ and ‘good’ respectively.

Fig.2 (Socher et al, 2013, Fig. 4, p.4)

Subsequently, two of these word representations will be paired in order to compute a higher-level phrase representation. The computation will be done by a composition function. In figure 2, p1 is computed by applying a composition function g to the word embeddings b and c.

This compositional method for pairing nodes will be repeated recursively, bottom-up, until reaching the root of the tree. In figure 2, the root node is p2, and it is computed by applying the composition to the word embedding a and the phrase embedding p1. The root node is the highest node in the tree, which usually represents the full sentence.

It is important to note that the composition function g is the same for both compositions in this example. Similarly, for longer sentences the composition function will always remain the same and will always take as input any pair of vectors. These vectors can represent any node at any level (e.g. a word, subphrase, phrase), but they must always have the same size. When combining the vectors b and c, the output of the composition p1 will also have the same dimensionality as b and c. Similarly, the output p2 of the combination of a and p1 will also have the same size. This is fundamental in order to allow for the recursive usage of the same composition function.
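To make the recursion concrete, here is a simplified sketch of that bottom-up composition. This is not CoreNLP’s actual model (which uses a more elaborate recursive architecture); I am assuming a basic composition function g(b, c) = tanh(W · [b; c] + bias) that maps two d-dimensional children to one d-dimensional parent, which is exactly what allows the same g to be reused at every level of the tree:

public class RecursiveComposition {
    static final int DIM = 4;                        // toy embedding size
    static double[][] W = new double[DIM][2 * DIM];  // composition weights (learned in practice)
    static double[] bias = new double[DIM];

    // g(b, c) = tanh(W * [b; c] + bias): two d-dim children -> one d-dim parent
    static double[] compose(double[] b, double[] c) {
        double[] parent = new double[DIM];
        for (int i = 0; i < DIM; i++) {
            double sum = bias[i];
            for (int j = 0; j < DIM; j++) {
                sum += W[i][j] * b[j] + W[i][j + DIM] * c[j];
            }
            parent[i] = Math.tanh(sum);
        }
        return parent;
    }

    // for the tri-gram "not very good" with word vectors a, b, c:
    // p1 = g(b, c) and p2 = g(a, p1), the root embedding of the whole phrase
    static double[] embedTriGram(double[] a, double[] b, double[] c) {
        double[] p1 = compose(b, c);
        return compose(a, p1);
    }
}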

  • Sooo, where’s the sentiment?

You might be wondering where the sentiment is in all of this!

The sentiment is actually predicted at every node using a softmax classifier, which uses the node vector as input features. This is represented in figure 2 by the coloured circles that emerge from nodes c, p1 and p2.

Furthermore, the sentiment classification is multi-class, so the sentiment score ranges from 0 to 4: 0 being very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.

The sentiment predicted at the root node of the tree will be the final sentiment assigned to the particular sentence. In the example of figure 2 we can see that the root node has been classified as negative, therefore the whole sentence will be negative.
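As a sketch of that per-node classification step (again, a simplification rather than CoreNLP’s exact model), a softmax layer with its own weight matrix, here called Ws, turns any node vector (word, phrase or root) into a distribution over the five sentiment classes:

public class NodeSentiment {
    static final int DIM = 4;      // node vector size, same as the embeddings
    static final int CLASSES = 5;  // 0 = very negative ... 4 = very positive
    static double[][] Ws = new double[CLASSES][DIM]; // classifier weights (learned in practice)

    // softmax(Ws * node): probability of each sentiment class for this node
    static double[] classify(double[] node) {
        double[] probs = new double[CLASSES];
        double sum = 0;
        for (int k = 0; k < CLASSES; k++) {
            double score = 0;
            for (int j = 0; j < DIM; j++) score += Ws[k][j] * node[j];
            probs[k] = Math.exp(score);
            sum += probs[k];
        }
        for (int k = 0; k < CLASSES; k++) probs[k] /= sum;
        return probs;
    }

    // the predicted class is the argmax; at the root node this is the sentence sentiment
    static int predictedClass(double[] node) {
        double[] probs = classify(node);
        int best = 0;
        for (int k = 1; k < CLASSES; k++) if (probs[k] > probs[best]) best = k;
        return best;
    }
}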

The Implementation

I will now present a very short extension of the script from the previous post, in order to run some input text through the sentiment classifier and get some metrics about the prediction output.

The first step would be to include parse and sentiment in our list of annotators (we need parsing in order to run sentiment analysis).

// set the list of annotators to run
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,parse,sentiment");
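For context, here is a minimal sketch of the surrounding setup, essentially the pipeline from the first article trimmed down (I am hard-coding the input text here for brevity; the full script reads it from a file):

import java.util.Properties;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class SentimentExample {
    public static void main(String[] args) {
        // set the list of annotators to run
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,parse,sentiment");
        // build the pipeline and annotate the input text
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        CoreDocument document = new CoreDocument("I love this sentence so much!");
        pipeline.annotate(document);
        // each annotated sentence now carries its own sentiment tree
        for (CoreSentence sentence : document.sentences()) {
            System.out.println(sentence.text() + " -> " + sentence.sentiment());
        }
    }
}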

Having done that, we know that the input text will now be passed through the sentiment predictor, so we just have to retrieve the results. We are firstly interested in the final sentiment score of a particular sentence (the prediction at the root node).

// requires: edu.stanford.nlp.trees.Tree and edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
Tree tree = sentence.sentimentTree();
// get the overall score predicted at the root node
int sentimentScore = RNNCoreAnnotations.getPredictedClass(tree);
// print the score to the terminal
System.out.println("Final score " + sentimentScore);

We do that in order to save the final score in a variable named sentimentScore. This number will always be either 0, 1, 2, 3 or 4.

Furthermore, we would like to know the probability that the predictor assigned to each of the classes for a given sentence. We obtain this information by doing:

// requires: org.ejml.simple.SimpleMatrix
SimpleMatrix simpleMatrix = RNNCoreAnnotations.getPredictions(tree);
// each element of the prediction matrix is the probability of one sentiment class,
// here rounded to a percentage
float veryneg = (float)Math.round(simpleMatrix.get(0)*100d);
float neg = (float)Math.round(simpleMatrix.get(1)*100d);
float neutral = (float)Math.round(simpleMatrix.get(2)*100d);
float pos = (float)Math.round(simpleMatrix.get(3)*100d);
float verypos = (float)Math.round(simpleMatrix.get(4)*100d);

Probabilities will be stored in variables veryneg, neg, neutral, pos and verypos.
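Continuing directly from the snippet above, a short usage example that prints the distribution for each sentence (since each value is rounded to a percentage, the five numbers should add up to roughly 100):

System.out.println("Very negative: " + veryneg + "%");
System.out.println("Negative:      " + neg + "%");
System.out.println("Neutral:       " + neutral + "%");
System.out.println("Positive:      " + pos + "%");
System.out.println("Very positive: " + verypos + "%");
// the rounded percentages should sum to approximately 100
System.out.println("Total: " + (veryneg + neg + neutral + pos + verypos));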

Let’s now run the whole file coreNLP_pipeline3_LBP.java to get an example output. We will use the following text as input in order to observe the changes in prediction: "This is a terrible sentence. I love this sentence so much! This is a normal sentence". This text is saved as coreNLP_input2.txt. Run the script using the following command:

java -cp "*" coreNLP_pipeline3_LBP.java

On one hand, results will be printed to the terminal, as in the screenshot below. We can observe that the assigned scores (the number right after "Final score") make sense for the sentences: negative, positive and neutral. We also see that the probabilities are consistent and add up to 100.

Terminal output

Furthermore, all results are printed to a .txt document, coreNLP_output2.txt, which can easily be imported as a DataFrame in Python using the command below. The resulting DataFrame will have 13 columns: ‘par_id’, ‘sent_id’, ‘words’, ‘lemmas’, ‘posTags’, ‘nerTags’, ‘depParse’, ‘sentiment’, ‘veryneg’, ‘neg’, ‘neu’, ‘pos’ and ‘verypos’.

import pandas as pd
df = pd.read_csv('coreNLP_output.txt', delimiter=';',header=0)
DataFrame created from output .txt file

For next time…

That’s it for now! Hope you enjoyed it and that you got as excited about including syntax in a sentence vector as I got when I first came across this model! I feel that for literature majors like myself this is a very satisfying model to go through, since it’s built on actual linguistic foundations.

Next time we will continue talking about sentence embeddings! We will go through how to extract them from the CoreNLP annotation object, compare them with other, more basic, sentence embeddings, and explore their informativeness using some feature reduction and visualisation methods. We will also use these vectors to compute more comprehensive document embeddings in order to perform sentiment analysis at the document level! ✌🏻

GitHub: https://github.com/laurabravopriegue/coreNLP_tutorial

Bibliography

Frege, G., 1980. The foundations of arithmetic: A logico-mathematical enquiry into the concept of number. Northwestern University Press.

Mitchell, J. and Lapata, M., 2010. Composition in distributional models of semantics. Cognitive science, 34(8), pp.1388–1429. Available at: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1551-6709.2010.01106.x

Partee, B., 1995. Lexical semantics and compositionality. An invitation to cognitive science: Language, 1, pp.311–360.

Socher, R., Manning, C.D. and Ng, A.Y., 2010, December. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 deep learning and unsupervised feature learning workshop (Vol. 2010, pp. 1–9). Available at: https://nlp.stanford.edu/pubs/2010SocherManningNg.pdf

Socher, R., Lin, C.C., Manning, C. and Ng, A.Y., 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 129–136). Available at: https://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf

Socher, R., Huval, B., Manning, C.D. and Ng, A.Y., 2012, July. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1201–1211). Association for Computational Linguistics.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631–1642). Available at: https://www.aclweb.org/anthology/D13-1170
