Rise of Modern NLP and the Need for Interpretability!

Capabilities and Challenges of Modern NLP: Analysability, Understandability, Transparency, Explainability, and Vulnerabilities.

Keyur Faldu
Towards Data Science

--

At Embibe (an AI platform for learning outcomes), we are leveraging modern NLP to solve problems like content ingestion, knowledge graph completion, smart meta tagging, question generation, question answering, concept summarisation, conversational assistants for students, vernacular academic translation, evaluation of descriptive answers, etc. Applying modern NLP to real-world applications demands interpretability, to make the system more transparent, explainable, and robust. Let's look into the rise of modern NLP and the need for interpretability.

Modern NLP is at the forefront of computational linguistics, which is concerned with computational modelling of natural language.

Interpretability: Sun + Rain => Reflection, Refraction and Dispersion => Rainbow. (Photo by Karson on Unsplash)

Chomsky's apprehension during the 1950s about the potential of computational linguistics, specifically about the theoretical foundation of its statistical models, was analogous to Einstein's reaction to quantum physics: "God does not play dice." These are pivotal moments when the world witnessed the rise of alternative theories. Nevertheless, the foundation Chomsky laid for linguistic theory remains relevant and aids the progress, analysis, and understanding of computational linguistics.

“It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.” — Noam Chomsky

He suggests that this notion of success is not really success. The lacuna may lie in the theoretical foundations, but empirically it can be thought of as "interpretability", which accounts for the analysability, transparency, accountability, and explainability of these computational models.

The major advancement of computational linguistics can be attributed to three successive phases: statistical modeling, classical machine learning, and deep learning. Each phase is progressively harder to interpret.

Statistical modeling dealt with statistical analysis and inference from data, and it gained predictive power in the form of machine learning. There are three important aspects of solving problems using machine learning:

  • Designing Input Features.
  • Deriving Features Representation.
  • Architecting Model Internals.

Classic ML techniques have always given a sense of control, as features were explicitly specified and mostly driven by human intuition. Feature representations, which were aggregative and statistical in nature (e.g. TF-IDF based vector representations), also fell within the realm of interpretability. ML models like decision trees, logistic regression, support vector machines, or other parametric models were also easy to reason about. Extensions of these models became more complex with the use of techniques like non-linear kernels, ensembles, and boosting to further improve performance. However, it was still possible to understand the model internals.
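As a minimal sketch of that interpretability, the snippet below fits a TF-IDF plus logistic regression classifier with scikit-learn on a made-up toy corpus; every learned weight maps back to a human-readable vocabulary term.

```python
# Classic, interpretable NLP: explicit TF-IDF features + a linear model.
# The corpus and labels are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

corpus = ["the movie was great", "the movie was terrible",
          "great acting and plot", "terrible acting and plot"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)       # explicitly specified features
clf = LogisticRegression().fit(X, labels)

# Each coefficient belongs to a concrete term, so the decision is easy to reason about.
for term, weight in zip(vectorizer.get_feature_names_out(), clf.coef_[0]):
    print(f"{term:>10s}  {weight:+.3f}")
```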

Continuous efforts to improve performance on classic NLP tasks like named entity recognition, sentiment analysis, and classification, and the constant push toward increasingly complex tasks such as question answering, summarization, and machine translation, have attracted growing attention from the research community.

The rise of modern NLP is attributed to the evolution of a simple model: the perceptron. With the advent of deep neural networks, the extension of the perceptron was not merely a second-order improvement like ensembles or boosting, but an exponential leap.

“I am convinced that machines can and will think in our lifetime.” — Oliver Selfridge (The Thinking Machines — 1961).

A look back at the journey of the tiny perceptron turning into the deep learning tsunami marks a few important milestones: the birth of the perceptron in 1958, coupled with the research foresight of "thinking machines" in the 1960s; the popularization of backpropagation in the 1980s [7]; and the proliferation of data coupled with super-compute capabilities in the early 2010s. All of these have compounded the chemistry of millions of perceptrons interacting with each other, and hence the rise of deep learning and modern NLP.

Naturally, deep learning has revitalized computational linguistics; latent statistical patterns learned through neural mechanisms gave incredible performance. To reinforce the point, deep learning models have outperformed human baselines on certain well-defined NLP tasks of increasing complexity, year after year. The compositional nature of images made Convolutional Neural Networks a huge success, whereas natural language differs from images in that it has not only compositional dependencies but also sequential state. Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks outperformed the prior state of the art, and more recently, the attention mechanism brought unprecedented success with the Transformer architecture.

The key success of modern NLP is also attributed to self-supervised pre-training objectives for learning contextual embeddings, and to the ability to transfer that learning to downstream task-specific models. Self-supervised pre-training removes the need for massive labeled datasets, while transfer learning removes the need to pay a huge computational cost for every downstream task. As a result, we see exponential growth in model complexity.

Fig 1. The exponential growth of NLP model complexity (Image: Turing-NLG [8])
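As a hedged sketch of what this pre-train-then-transfer recipe looks like in practice, the snippet below loads a pre-trained encoder with the Hugging Face transformers library and attaches a small classification head for fine-tuning; the model name and two-class task are illustrative assumptions.

```python
# Transfer learning sketch: a self-supervised, pre-trained encoder (BERT)
# reused for a downstream classification task. Model name and task are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The encoder weights arrive pre-trained on unlabeled text; only the small
# classification head (and optionally the encoder) is fine-tuned on labeled data.
inputs = tokenizer("Interpretability builds trust in NLP systems.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```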

So what?

  • Deep Learning has made feature engineering redundant and hence extinct!
  • The underlying representations of tokens became dense and complex.
  • The internals of complex deep neural network architectures became difficult to understand.

As a result, we cannot directly see how a decision is made, which features are important, or where the causation comes from. The success of modern NLP amplifies the challenges of interpretability.

Interpretability plays a key role in domain adoption and builds confidence in real-world applications. Ongoing research efforts to interpret neural NLP models can be clustered around the following questions:

  1. Is linguistic knowledge learned or ignored?
  2. Why does the model work the way it works?
  3. Can we explain model predictions?
  4. What makes NLP models vulnerable?
  5. How can Knowledge Graph advance modern NLP and its interpretability?

Let's dive deeper to understand what we mean by each of these questions.

  • Linguistic Knowledge: Ignored or Learnt?

Fig 2. Linguistic Knowledge in a Sentence

Linguistics is the study of language and its structure, including grammar, syntax, and phonetics. It is intuitive to humans that the ability to understand, reason about, and generate natural language would not be possible unless the system learned these linguistic components. In classical NLP, linguistic features like part-of-speech (POS) tags, named entities, dependency trees, subject-verb agreement, coreference resolution, etc. were derived using rule-driven or statistical learning approaches. Deep neural network models like RNNs, LSTMs, and Transformers do not need these hand-crafted features, yet they still excel on well-defined real-world tasks like classification, semantic analysis, question answering, summarization, and text generation. So, the question to be answered is: what linguistic knowledge, if any, is learned by modern NLP models?
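To make concrete what such linguistic features are, here is a small sketch that extracts them explicitly with spaCy (assuming the en_core_web_sm model is installed); probing studies then ask whether deep models recover similar information without ever being given it.

```python
# Classical linguistic features computed explicitly, as a point of reference
# for what "linguistic knowledge" means. The example sentence is illustrative.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this small English model is installed
doc = nlp("Embibe builds an AI platform for learning outcomes.")

for token in doc:
    # part-of-speech tag, dependency label, and syntactic head for each token
    print(token.text, token.pos_, token.dep_, token.head.text)

# named entities recognized in the sentence
print([(ent.text, ent.label_) for ent in doc.ents])
```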

  • Why Does the Model Work the Way It Works?

Black-box systems are good for modularity and integration, but a system needs to be transparent to be analyzed and improved. Transparency is a key pillar of interpretability. "Model understanding" is a niche area that deals with the internals of models. It requires a detailed analysis of what each layer or block in a given DNN learns, how they interact with each other, and hence how they contribute to the model's decision.

Fig 3. How attention heads in different layers of the BERT model attend to other tokens while processing a particular token.

Basically, how can a model's learning be attributed to its building blocks and underlying mechanisms? A deeper understanding of how the model works would facilitate interpretability and open up opportunities to improve the system further. For instance, the attention mechanism is a key idea behind the success of state-of-the-art LSTM and Transformer models. "How Attention Enables Learning in NLP Models?" (coming soon) would be an interesting topic to study in more depth.
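As a hedged sketch of this kind of model-internal analysis, in the spirit of Clark et al. [10], the snippet below pulls the raw attention weights out of a pre-trained BERT via the Hugging Face transformers library; the model name and sentence are illustrative.

```python
# Inspecting BERT attention heads: which token each position attends to most,
# for one layer and one head. Model name and sentence are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The rainbow appears after the rain.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, shaped (batch, heads, tokens, tokens)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer0_head0 = outputs.attentions[0][0, 0]
for src, row in zip(tokens, layer0_head0):
    print(f"{src:>10s} attends most to {tokens[int(row.argmax())]}")
```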

  • Predictions May Be Okay, but Can We Explain Them?

Fig 4. Explanation of Model Prediction (Image: Ribeiro et al [11])

Knowing what linguistic knowledge a model has learned, and how its underlying mechanisms enable learning, are building blocks of NLP interpretability. It is of utmost importance to move "Towards Plausible and Faithful Explanations for NLP Models" (coming soon). This requires an in-depth study of how input tokens impact model decisions, so that predictions can be attributed back to tokens and token importance can be derived. How can we generate explanations from these important tokens? Are the generated explanations faithful? What is the best way to generate a faithful explanation? Can such explanations play an active role in understanding the underlying robustness of a model? This is an active line of research in which a lot of progress has been made recently.
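One simple, model-agnostic way to attribute a prediction back to tokens is leave-one-out occlusion, sketched below; `predict_proba` is a hypothetical function that returns the probability of the predicted class for a text. Perturbation-based methods such as LIME [11] refine this basic idea.

```python
# Leave-one-out occlusion: score each token by how much the prediction drops
# when that token is removed. `predict_proba` is a hypothetical classifier call.
def token_importance(text, predict_proba):
    tokens = text.split()
    base = predict_proba(text)
    scores = []
    for i, tok in enumerate(tokens):
        occluded = " ".join(tokens[:i] + tokens[i + 1:])  # drop one token
        scores.append((tok, base - predict_proba(occluded)))
    # tokens whose removal hurts the prediction most are the most "important"
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```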

  • On the Backdrop of Success, What makes Modern NLP models Vulnerable?

Modern NLP has made modest progress in real-world applications, e.g. conversational chatbots, real-time translation, automated question answering, and hate speech or fake news detection. Is it possible to hack these models for malicious intent, for instance to legitimize fake news, or to steal models without access to the training data?

Fig 5: Adversarial Attack Example

A transparent, interpretable, and explainable system would be better prepared for "The Challenges and Mitigations of Modern NLP Vulnerabilities" (coming soon), where the risks of adversarial attacks, underlying bias, unreliable evaluation criteria, and the possibility of extracting a model's learned state can be understood, and steps can be taken to mitigate them.
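As a toy sketch of the kind of perturbation studied in adversarial-attack surveys such as Zhang et al. [9], the snippet below makes a small, human-readable edit that a brittle classifier may nonetheless misread; `classify` is a hypothetical model call.

```python
# Toy adversarial perturbation: swap two inner characters of one word.
# Humans still read the sentence; a brittle model may change its prediction.
import random

def perturb(text, seed=0):
    random.seed(seed)
    words = text.split()
    i = random.randrange(len(words))
    w = words[i]
    if len(w) > 3:
        j = random.randrange(1, len(w) - 2)
        w = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    words[i] = w
    return " ".join(words)

original = "The vaccine announcement is genuine news."
adversarial = perturb(original)
# print(classify(original), classify(adversarial))  # hypothetical model; outputs may disagree
```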

  • What about the Knowledge Graph? Can It Advance Modern NLP and Interpretability Further?

Traditionally, the Knowledge Graph (structured information represented in the form of a graph) has been at the heart of information-retrieval systems for domain-specific use cases, mainly because Knowledge Graphs can be built deterministically by experts, and are easy to understand, seamless to integrate, effective for specific use cases, and straightforward to interpret. Hence, systems relying on knowledge graphs are easily adopted in different domains. Retrieval systems before the dawn of modern NLP were mainly built on top of Knowledge Graphs.

Fig 6. Knowledge Infused Learning (Image: Kurşuncu et al [5])

Self-supervised learning enables modern NLP to learn statistical patterns without requiring expert intervention. These systems are scalable and powerful across varied and complex use cases, but they may fail on very simple tasks when plain facts are ignored for lack of statistical support in the data. That is where integrating Knowledge Graphs with modern NLP systems would bring the best of both worlds and make systems comprehensive. Knowledge Graphs can also help align the internal representations of features to make them more meaningful. "Knowledge Inception for Advanced and Interpretable NLP" (coming soon) is likely to be an active area of research in the coming times.
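A minimal sketch of the deterministic side of that combination: facts stored as (subject, relation, object) triples can be queried exactly, regardless of how rarely they appear in training text. The triples and query below are illustrative.

```python
# A tiny triple store: explicit facts an NLP system can fall back on,
# independent of their statistical frequency in a corpus. Contents are illustrative.
triples = {
    ("rainbow", "caused_by", "dispersion_of_light"),
    ("dispersion_of_light", "requires", "sunlight"),
    ("dispersion_of_light", "requires", "water_droplets"),
}

def query(subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [o for (s, r, o) in triples if s == subject and r == relation]

print(query("rainbow", "caused_by"))            # ['dispersion_of_light']
print(query("dispersion_of_light", "requires")) # sunlight, water_droplets (any order)
```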

Exploring the limits of modern NLP along the above dimensions gives a good understanding of why interpretability matters, what the challenges are, what progress has been made, and which questions remain open. Although we have tried to be as broad as possible, this is by no means an exhaustive survey of the current state of NLP. It is intriguing to see how modern NLP will become analyzable, transparent, robust, faithful, explainable, and secure in the coming times. It is equally fascinating to consider the integration of KGs and NLP, which would not only make NLP more interpretable but also improve its adoption in domains such as education, healthcare, and agriculture.

I would like to acknowledge the efforts of all collaborators for publishing this article, specifically, reviews and feedback given by Prof. Amit Sheth, and the support of Aditi Avasthi.

References

[1] Manning CD. Computational Linguistics and Deep Learning, MIT Press 2015

[2] Norvig P. On Chomsky and the two cultures of statistical learning, Springer 2017

[3] Belinkov and Glass. Analysis Methods in Neural Language Processing: A Survey, MIT Press 2019

[4] Manning and Schutze. Foundations of statistical natural language processing, 1999

[5] Kurşuncu, Gaur, Sheth, Wickramarachchi and Yadav. Knowledge-infused Deep Learning, ACM 2020

[6] Arrieta et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Elsevier, 2020

[7] Rumelhart, Hinton and Williams. Learning representations by back-propagating errors, Nature 1986

[8] Turing-NLG: A 17-billion-parameter language model by Microsoft, Microsoft Research Blog, 2020

[9] Zhang, Sheng, Alhazmi and Li. Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey, ACM 2020

[10] Clark, Khandelwal, Levy and Manning. What Does BERT Look At? An Analysis of BERT’s Attention, ACL Workshop BlackboxNLP 2019

[11] Ribeiro, Singh and Guestrin. “Why should I trust you?” Explaining the predictions of any classifier, ACM 2016

[12] Bender, Koller. “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, ACL 2020
