Natural Language Processing

Introduction
Deep Learning: the solution to the problems of mankind. Over the past few years, Deep Learning has advanced humanity in novel ways, and one of its biggest beneficiaries is the entire field of Natural Language Processing (NLP). But before we get into how, let’s explore the field of Deep Learning.
I’m pretty sure we’ve all seen the Venn diagram that illustrates the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning. If not, it simply shows that Deep Learning is a subfield of Machine Learning, which is in turn a subfield of Artificial Intelligence.
Deep Learning differs from traditional Machine Learning in that it is concerned with algorithms whose structure and function are loosely inspired by the human brain; these algorithms are called Artificial Neural Networks.
How Deep Learning is Advancing NLP
Much of the software industry is moving towards machine intelligence and data-driven decisions, and other industries, such as healthcare, have noted the effect and are taking an interest. A good deal of this growth in AI boils down to Machine Learning and Deep Learning (along with, of course, other factors that make performing these tasks possible). In recent years, though, Deep Learning has gained far more popularity because of its superiority on metrics such as accuracy, especially when we have a lot of data to feed the algorithm.
For instance, in text classification tasks, Recurrent Neural Network (RNN) based models have surpassed the performance of standard Machine Learning techniques; these are tasks that were once solved with then-revolutionary models such as the Naive Bayes classifier and Support Vector Machines. Additionally, LSTMs (which belong to the RNN family) have outperformed Conditional Random Field (CRF) models in sequence-labeling tasks such as entity extraction.
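To make that concrete, here is a minimal sketch of an LSTM-based text classifier in Keras. The toy sentences, vocabulary size, and hyperparameters are my own illustrative assumptions, not values from any benchmark.

```python
# A minimal sketch of an LSTM text classifier (Keras). Data and sizes are made up.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["great product, loved it", "terrible service, never again",
         "works as expected", "completely broken on arrival"]
labels = np.array([1, 0, 1, 0])  # 1 = positive, 0 = negative

# Turn raw text into padded integer sequences.
tokenizer = Tokenizer(num_words=1000, oov_token="<unk>")
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# Embedding -> LSTM -> sigmoid output for binary classification.
model = Sequential([
    Embedding(input_dim=1000, output_dim=32),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(sequences, labels, epochs=5, verbose=0)

# Predict on a new sentence (probability of the positive class).
print(model.predict(pad_sequences(tokenizer.texts_to_sequences(["loved it"]), maxlen=10)))
```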
Of late, there is new talk of the town: Transformer models. Transformer models are extremely powerful and have become the state of the art for most Natural Language Processing tasks. Going into the details of the Transformer is beyond the scope of this article, but leave a response if you want me to cover it in a later tutorial.
The Downfalls
Despite the monstrous success of Deep Learning, it’s still not yet the silver bullet for every NLP task. Therefore, practitioners shouldn’t rush to build the biggest RNN or transformer when faced with a problem in NLP, and here is why:
Overfitting
Deep Learning models tend to have far more parameters than you’d expect from a traditional Machine Learning model. As a result, they possess much more expressivity than traditional models, which is what makes them so powerful. But that same power is also the source of their greatest weakness: many Deep Learning models overfit small datasets, leading to poor generalization and, ultimately, horrible performance in production.
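As a rough illustration, the sketch below trains an over-parameterized network on a tiny, made-up dataset; the gap between training and validation accuracy is the classic overfitting signature. All sizes and data here are invented purely for demonstration.

```python
# A hedged sketch of overfitting on a tiny dataset (Keras). Everything is illustrative.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # tiny dataset: only 100 examples
y = (X[:, 0] > 0).astype("float32")     # simple underlying rule

model = Sequential([
    Dense(512, activation="relu", input_shape=(20,)),  # far more parameters than needed
    Dropout(0.5),                                      # dropout helps, but only so much
    Dense(512, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, validation_split=0.3, epochs=50, verbose=0)

# A large gap between these two numbers signals overfitting.
print("train acc:", history.history["accuracy"][-1])
print("val acc:  ", history.history["val_accuracy"][-1])
```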
Lack of Few-shot Learning Techniques
I’m not usually one to compare, but in this instance we have to: take a look at Computer Vision. Deep Learning has made great advances in CV partly because of techniques such as few-shot learning, i.e. learning from very few training instances, which has led to much wider adoption of Deep Learning for solving real-world problems. In NLP, we just haven’t seen similar techniques adopted as successfully as they have been in Computer Vision.
Domain Adaptation
Transfer Learning has been revolutionary for improving the performance of Deep Learning models when we don’t have lots of training data. However, taking a Deep Learning model that was trained on a common domain, say newspaper articles, and applying it to a very different domain, like social media posts, may result in poor overall performance. A much simpler, but still very effective, solution may involve traditional Machine Learning models or domain-specific rule-based models.
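For a sense of what that simpler route can look like, here is a minimal sketch of a TF-IDF plus logistic regression baseline trained directly on in-domain text. The social-media-style examples and labels are invented for illustration.

```python
# A simple in-domain baseline: TF-IDF features + logistic regression (scikit-learn).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

posts = ["lol this app is fire", "ugh worst update ever",
         "ngl pretty smooth so far", "crashes every 5 mins smh"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
baseline.fit(posts, labels)

print(baseline.predict(["this update is fire"]))
```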
Interpretability
Interpretability is something that has risen to prominence in recent years, and it makes sense: wouldn’t you want to know why you were rejected for a loan, for instance? Though techniques are being developed to interpret Deep Learning models, for the most part they still work like a black box. In situations like this, it may be more useful to use a model like Naive Bayes, which makes it easy to explain why a certain classification was made.
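As a quick sketch of why Naive Bayes is easy to explain: its per-class, per-word log probabilities can be inspected directly, so you can list the words that push a document towards a given class. The loan-style texts and labels below are made up for illustration.

```python
# Inspecting a Naive Bayes classifier's per-word evidence (scikit-learn).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["approve the loan application", "reject due to missing income proof",
         "approve after income verification", "reject insufficient credit history"]
labels = [1, 0, 1, 0]  # 1 = approve, 0 = reject

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

# Words with the highest log probability under the "reject" class (class index 0):
# a direct, human-readable explanation of what drives that prediction.
words = np.array(vectorizer.get_feature_names_out())
top = np.argsort(clf.feature_log_prob_[0])[::-1][:5]
print(list(words[top]))
```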
Cost
Cost is important. Building a solution with Deep Learning has the potential to be pretty expensive in terms of both time and money. Unlike on Kaggle, datasets don’t come labeled in the real world, and don’t forget we’d need a large dataset so that our model doesn’t overfit. Collecting and labeling a large dataset can be very time-intensive, and then deploying the model and maintaining it in production may be expensive in terms of hardware requirements and effort.
Final Thoughts
In no way is this an exhaustive list of why Deep Learning may not yet be the silver bullet for NLP tasks, but it gives a real sense of the kinds of scenarios that can play out in a real-world project. These symptoms often lead to a project cycle that’s much longer than it needs to be, to say nothing of the higher costs of getting the project over the line and maintaining it once it’s been delivered. Additionally, the performance gain over traditional Machine Learning models is often not very significant (i.e. little, if any, increase in accuracy).
Thank you for reading! Connect with me on LinkedIn and Twitter to stay up to date with my posts about Data Science, Artificial Intelligence, and Freelancing.
Related Articles
Combating Overfitting in Deep Learning