
Interpreting an LSTM through LIME

Learn how to interpret a Keras LSTM through LIME and dive into the internal workings of the LIME library for text classifiers

Photo by Glen Carrie on Unsplash

An overview

  • I decided to write this blog because I couldn’t find many tutorials out there that show how to use LIME with Keras models. Sure enough, many articles focus on scikit-learn based models, but using LIME with Keras requires us to write an additional prediction function, as we will see.
  • Secondly, I would like to explain how LIME adapts to text classification problems, something I was only able to understand once I went through the library’s code for text explainers (spoiler: it wasn’t straightforward).

Now, coming to the content, here is what I will be covering:

  1. Why do we need to interpret models
  2. What is LIME
  3. How LIME works for text
  4. Constructing an LSTM for a classification problem
  5. Interpreting the LSTM through LIME, with fancy visual explanations

Why do we need to interpret our model?

Sometimes, especially in critical domains such as healthcare and finance, we want to know the reasoning behind our model’s predictions. When the margin for error is narrow, we want to be sure that our model is making logical, human-like decisions. Simply getting the right answer is not enough; we want to know how our model found that answer. Say our classifier is predicting sentiment: in that case, we would like to be certain that the predictions are based on words that indicate sentiment, words like “happy” and “horrible”, and not on irrelevant words such as someone’s name, which do not contribute to sentiment. More importantly, interpreting our model can help reduce bias; for instance, we can check whether our model is biased towards someone’s race or gender. Algorithms that help us interpret models fall under the broad category of Explainable AI (XAI).

The intuition behind LIME

Firstly, we categorise models based on their interpretability. Models such as linear regressors or (small) decision trees are easily interpretable by humans. However, models such as neural networks and LSTMs have thousands of weights and many layers, making it difficult for humans to interpret them.

Here is what LIME does to solve this problem. The following diagram is from the original LIME paper.

How LIME works, as mentioned in the original LIME paper [1]

Consider the above to be a binary classification problem (red class vs. blue class).

  1. The curved red and blue regions indicate the decision space of our original model (which we shall call the black-box model)
  2. Say we want to interpret our model’s decision regarding the enlarged instance (denoted by the red plus). First, we create samples in and around this red instance.
  3. Now we weigh them according to their proximity to the instance we want to interpret and then generate predictions for these instances using our black-box model.
  4. Now that we have new, locally synthesized data and labels, we train an INTERPRETABLE model (LIME uses a ridge regressor by default) on this data. While training, we give more importance to data points close to the instance we want to interpret.
  5. Boom! We can now observe the weights of the trained model to gain insights about the features (and their values) that influenced the black-box model’s prediction. A minimal sketch of this procedure is shown right after this list.
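To make these steps concrete, here is a minimal, self-contained sketch of the idea (not LIME’s actual implementation): a toy black-box classifier, Gaussian perturbations around the instance, an assumed exponential proximity kernel, and a weighted ridge regressor as the interpretable surrogate.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy black-box binary classifier: returns P(blue class) for 2-D points.
def black_box_predict(X):
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

instance = np.array([0.5, -0.2])  # the instance we want to explain

# Steps 2-3: sample perturbations around the instance and weigh them
# by proximity (an assumed exponential kernel on Euclidean distance).
rng = np.random.default_rng(0)
samples = instance + rng.normal(scale=0.5, size=(5000, 2))
distances = np.linalg.norm(samples - instance, axis=1)
weights = np.exp(-(distances ** 2) / 0.25)

# Step 4: label the samples with the black box and fit a weighted,
# interpretable surrogate (a ridge regressor, as LIME does by default).
labels = black_box_predict(samples)
surrogate = Ridge(alpha=1.0).fit(samples, labels, sample_weight=weights)

# Step 5: the surrogate's coefficients approximate local feature importance.
print(surrogate.coef_)
```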

How does LIME work for text data?

  1. Given a sentence, we first construct a bag of words (BoW) representation for the sentence.
  2. Now the library generates 5,000 perturbed sentences (the default) by randomly removing words from the original sentence, giving new combinations of the original words.
  3. These samples are weighted by how similar they are to the original sentence, using the cosine distance between their bag-of-words representations.
  4. Now that we have new samples of vectorised sentences and we know their proximity to the original, LIME follows the same process as mentioned in the above section. A simplified sketch of this perturbation step follows the list.
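Below is a simplified sketch of the perturbation step; the sentence, the number of samples (kept tiny for readability) and the kernel width are illustrative assumptions, not the library’s exact code.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

sentence = "the coffee was nice but the service was horrible"
words = sentence.split()

# Perturb the sentence by randomly removing words: 1 = keep, 0 = drop.
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(5, len(words)))  # 5 samples for brevity
masks[0, :] = 1                                   # first sample = original sentence

perturbed = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]

# Weigh each sample by its cosine distance to the original (all-ones) mask.
distances = cosine_distances(masks, masks[:1]).ravel()
weights = np.exp(-(distances ** 2) / 0.25)        # assumed kernel width

for d, s in zip(distances, perturbed):
    print(f"distance={d:.2f}  '{s}'")
```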

Using LIME to interpret an LSTM

The dataset

We will work on the Yelp Coffee reviews dataset from Kaggle. I have preprocessed and cleaned the data and adapted it to a binary classification task. You can view the entire code here.

Here is what the dataset looks like after preprocessing and cleaning

The model

I have used an LSTM model with a hidden state of 100 dimensions, preceded by an embedding layer of 32 dimensions. You can see the model summary here.
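The exact code is in the linked notebook; below is a minimal sketch of an equivalent architecture. Here, vocab_size and max_len are placeholders for the tokenizer settings, which aren’t reproduced in this excerpt.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size = 10000  # assumed tokenizer vocabulary size
max_len = 100       # assumed padded sequence length

model = Sequential([
    Input(shape=(max_len,)),
    Embedding(input_dim=vocab_size, output_dim=32),  # 32-dimensional embeddings
    LSTM(100),                                       # 100-dimensional hidden state
    Dense(1, activation="sigmoid"),                  # binary sentiment output
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```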

Training and Results

After training the model for a mere 2 epochs, we achieve very high accuracy on the training data. Moreover, we achieve similar accuracy on the testing data along with a very good F1-score, which shows us that the high accuracy isn’t just a result of the data imbalance.
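The exact numbers are in the notebook; the sketch below shows how such metrics can be computed, assuming X_train, y_train, X_test and y_test are the padded review sequences and their binary labels from the preprocessing step.

```python
from sklearn.metrics import accuracy_score, f1_score

# Train briefly; two epochs were enough in this experiment.
model.fit(X_train, y_train, epochs=2, batch_size=64, validation_split=0.1)

# F1 guards against accuracy that is inflated by class imbalance.
probs = model.predict(X_test).ravel()
preds = (probs >= 0.5).astype(int)
print("accuracy:", accuracy_score(y_test, preds))
print("F1-score:", f1_score(y_test, preds))
```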

Interpreting the model using LIME Text Explainer

Firstly, pip install lime.

Now instantiate the text explainer using our class labels.
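A minimal sketch, assuming the label order is negative first and positive second:

```python
from lime.lime_text import LimeTextExplainer

# The class order must match the columns of the probability array that
# our prediction function (defined below) will return.
class_names = ["negative", "positive"]
explainer = LimeTextExplainer(class_names=class_names)
```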

And for the most important part: since our Keras model doesn’t implement a predict_proba function the way scikit-learn models do, we need to create one manually. Here is how you do it.

Simply apply the same preprocessing and tokenization steps, then use the model’s predictions to return an array of shape (number of input samples × number of target classes, which is 2 in our case).
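Here is a sketch of such a wrapper, assuming tokenizer and max_len are the tokenizer and padding length used during training, and sample_review is a raw review string we want to explain (all three names are placeholders from the notebook, not part of LIME’s API).

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_proba(texts):
    # Apply the same tokenisation and padding used during training.
    seqs = tokenizer.texts_to_sequences(texts)
    padded = pad_sequences(seqs, maxlen=max_len)
    # The sigmoid output is P(positive); stack to shape (len(texts), 2).
    pos = model.predict(padded).ravel()
    return np.column_stack([1 - pos, pos])

# Ask LIME to explain a single review and render the highlighted words.
exp = explainer.explain_instance(sample_review, predict_proba, num_features=10)
exp.show_in_notebook()
```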

Results

An explanation for a given sentence (Image By Author)

Firstly, we note that the actual rating is 2/5, which isn’t a great review. Our model correctly classifies the review, assigning a probability of 0.82 to negative sentiment. Looking at the negative and positive words, we can see that words like “meh” and “pricing” contribute to negative sentiment, whereas words like “nice”, “vegan” and “fancy” contribute to positive sentiment. The weights you see here (for each highlighted word) are calculated using our local (interpretable) model.

Conclusion

We saw how to interpret Keras models using LIME through a custom predict_proba function. However, note that LIME isn’t without its flaws: as we can see, “little” has been marked as positive, and “food” has been marked as positive when it should not be highlighted at all (at least in my opinion). LIME is still powerful enough to interpret simple problems such as this one and can even be used to generate global explanations (as opposed to interpreting each instance separately). I’ll leave that for another article.

Check out my GitHub for some other projects. You can contact me here. Thank you for your time!

If you liked this article, here are some more!

Locality Sensitive Hashing in NLP

Dealing with features that have high cardinality

Regex essential for NLP

Powerful Text Augmentation using NLPAUG!

References

[1] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144).

LIME code: https://github.com/marcotcr/lime

