From 94% to 95%

I will share the various techniques I used to improve my NLP sentiment analysis on the IMDB dataset from approximately 94% to 95%.

Rrohan.Arrora
Towards Data Science


Image source — Google

Trick — 1

  • Semi-supervised learning — There is a branch of learning known as semi-supervised learning, which allows us to use unlabelled data to train our language model. The IMDB dataset provided by fastai includes unlabelled data under the unsup folder, which we can use to train our model. Most companies spend years labelling their data completely, but semi-supervised learning lets us train the model without labelling the entire dataset. We can teach our language learner the specifics of a language using the unlabelled data as well. So, use unlabelled data too whenever you find it.
unsup folder included
  • Since we are also using the unlabelled data, we have more data and can therefore train for more epochs.
  • Mixed precision training — instead of using single-precision floats, use half-precision floats where possible. Traditionally, 32 bits were used for float operations because that was what it took to maintain accuracy. But for deep learning models, approximate values are often good enough. Some GPUs support half-precision floating point: NVIDIA added support in their GPUs and Google in their TPUs, and it speeds up training. Although 16-bit floats are less precise, they work fine in many parts of a neural network. In certain places, though, such as calculating the gradients and the loss, we still need single precision, because half-precision numbers can underflow to 0. So, in practice, we use mixed precision, where the model uses single or half precision depending on the type of operation. That sounds like a tough choice to make, but NO PROBLEM. Fastai supports this internally, and we can also enable it explicitly, as in the sketch after this list. Thanks to fastai.
  • drop_mult — The language model that we use for transfer learning is AWD_LSTM, a type of recurrent neural network, and it defines 5 different types of dropout. It becomes very tedious to declare every type of dropout and experiment with the values. Fastai provides sensible base values for the dropouts, and what we pass is drop_mult, which internally multiplies every dropout by that number. If the model is badly overfitted, you can increase the value, i.e. add more regularization. Since we aim to create the most accurate language model possible, we generally use lower values of drop_mult, because heavier regularization leads to a poorer language model and thus poorer prediction of the next word.
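
Putting the pieces of Trick 1 together, here is a minimal sketch using the fastai v1 text API. The batch size, learning rate, number of epochs and the drop_mult value of 0.3 are illustrative assumptions, not the exact values behind the 95% result.

```python
from fastai.text import *

# Download the IMDB dataset shipped with fastai; it contains train, test and unsup folders
path = untar_data(URLs.IMDB)

# Build the language-model data, including the unlabelled 'unsup' reviews (semi-supervised idea)
data_lm = (TextList.from_folder(path)
           .filter_by_folder(include=['train', 'test', 'unsup'])  # use the unlabelled data too
           .split_by_rand_pct(0.1)                                # hold out 10% for validation
           .label_for_lm()                                        # the labels are the next words
           .databunch(bs=64))

# drop_mult scales all five AWD_LSTM dropouts at once; to_fp16() turns on mixed precision
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3).to_fp16()

# With the extra unlabelled data we can afford more epochs (10 here is just an example)
learn.fit_one_cycle(10, 1e-2)
learn.save_encoder('fine_tuned_enc')  # illustrative file name, reused by the classifier later
```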

But here is a catch

  • In recent research by the fastai team, if we increase the amount of regularization, we get a poorer language model but a more accurate sentiment classifier. Let us wait for that research to be published. Generally, values lie between 0.1 and 1.
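
So when we move on to the classifier, it can make sense to use a higher drop_mult than we did for the language model. A rough sketch, assuming the labelled classification databunch data_clas has already been built and the encoder was saved as 'fine_tuned_enc' above (the 0.5 value and the names are illustrative):

```python
# Heavier regularization for the classification stage than for the language model
learn_fwd = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5).to_fp16()

# Reuse the encoder fine-tuned on the (partly unlabelled) IMDB text
learn_fwd.load_encoder('fine_tuned_enc')

learn_fwd.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7))
```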

Trick — 2

  • Backward — predict the previous word given the following words; something unusual but helpful. Fastai allows us to train the model backwards so that our learner also learns to predict words in the reverse direction. We can then ensemble the forward classifier built on the forward learner with the backward classifier built on the backward learner, which increases the accuracy. What we do is create a databunch that is the reverse of the actual data and then train the language learner on that, as in the sketch below.
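
Here is a rough sketch of the backward pipeline and the ensemble with fastai v1. It assumes a version where backwards=True can be passed when creating the databunch, and that learn_fwd is the forward classifier trained earlier; all names and hyperparameter values are illustrative.

```python
# Backward language-model data: the same reviews, read in reverse word order
data_lm_bwd = (TextList.from_folder(path)
               .filter_by_folder(include=['train', 'test', 'unsup'])
               .split_by_rand_pct(0.1)
               .label_for_lm()
               .databunch(bs=64, backwards=True))

learn_lm_bwd = language_model_learner(data_lm_bwd, AWD_LSTM, drop_mult=0.3).to_fp16()
learn_lm_bwd.fit_one_cycle(10, 1e-2)
learn_lm_bwd.save_encoder('fine_tuned_enc_bwd')

# Backward classifier data and learner, built the same way as the forward one
data_clas_bwd = (TextList.from_folder(path, vocab=data_lm_bwd.vocab)
                 .split_by_folder(valid='test')
                 .label_from_folder(classes=['neg', 'pos'])
                 .databunch(bs=64, backwards=True))
learn_bwd = text_classifier_learner(data_clas_bwd, AWD_LSTM, drop_mult=0.5).to_fp16()
learn_bwd.load_encoder('fine_tuned_enc_bwd')
learn_bwd.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7))

# Ensemble: average the class probabilities of the forward and backward classifiers
preds_fwd, targets = learn_fwd.get_preds(ordered=True)
preds_bwd, _ = learn_bwd.get_preds(ordered=True)
accuracy((preds_fwd + preds_bwd) / 2, targets)
```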

I will keep sharing more techniques as I find them. Till then, keep exploring fastai.
