Introduction to Sequence Modeling Problems

Niranjan Kumar
Towards Data Science
6 min read · Oct 30, 2019

Consider the problem of predicting a person's health risk from multiple health parameters, where we have decided to model the true relationship between the input and output using a feed-forward neural network (also known as a multi-layered network of neurons).

In Feed-forward Neural Networks (FNN), the output for one data point is completely independent of the previous inputs, i.e., the health risk of the second person does not depend on the health risk of the first person, and so on.

Similarly, in Convolutional Neural Networks (CNN), the output from the softmax layer in an image-classification setting is entirely independent of the previous input image.

FNNs and CNNs therefore share two characteristics:

  • Outputs are independent of previous inputs.
  • Input is of fixed length.

Citation Note: The content and structure of this article are based on my understanding of the deep learning lectures from One-Fourth Labs (PadhAI).

Sequence Modeling

Sequence modeling is the task of predicting what word or letter comes next. Unlike FNNs and CNNs, in sequence modeling the current output depends on the previous inputs, and the length of the input is not fixed.

In this section, we will discuss some of the practical applications of sequence modeling.

Auto-Completion

Let’s look at the problem of auto-completion in the context of sequence modeling. Here, whenever we type a character (say, ‘d’), the system tries to predict the next possible character based on the characters typed so far.

In other words, the network tries to predict the next character out of the 26 letters of the English alphabet, given that we have typed ‘d’. The network would have a softmax output of size 26 representing the probability of the next letter given the previous letters. Since the inputs to this network are characters, we need to convert each one into a one-hot encoded vector of size 26: the element at the index of the letter is set to 1, and everything else is set to 0.
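As a sketch, the one-hot encoding described above takes only a few lines of NumPy (the `char_to_idx` mapping is an illustrative helper, not something from the article):

```python
import numpy as np

# Map each lowercase letter to an index 0-25.
alphabet = "abcdefghijklmnopqrstuvwxyz"
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

def one_hot(char):
    """Return a one-hot vector of size 26 for a single lowercase letter."""
    vec = np.zeros(26)
    vec[char_to_idx[char]] = 1.0   # element at the letter's index is set to 1
    return vec

v = one_hot('d')   # 'd' is index 3, so v[3] == 1.0 and all other entries are 0
```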

Parts of Speech Tagging

In part-of-speech tagging, we are given a sequence of words, and for every word we need to predict its part-of-speech tag (e.g., verb, noun, pronoun). Again, the output depends not only on the current input (the current word) but also on the previous inputs. For example, the probability of tagging the word ‘movie’ as a noun is higher if we know that the previous word is an adjective.

Sequence Classification

Do we need to produce output at every time step?

Imagine that you want to gauge the overall sentiment of a movie by analyzing its reviews. In this scenario, we don’t need an output after every word of the input; we just need to understand the mood, positive or negative, after reading the entire sentence. Determining the mood of a text with machine learning is called sentiment analysis.
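One way to picture this many-to-one setup: run a recurrence over the whole review and read out a single prediction only from the final state. This is a minimal sketch with made-up weight shapes, not the article’s exact model:

```python
import numpy as np

def classify_sequence(xs, U, W, V, b, c):
    """Consume the whole sequence, then emit ONE softmax output
    (e.g., positive vs. negative) from the final hidden state."""
    s = np.zeros(W.shape[0])
    for x in xs:                        # no output at intermediate steps
        s = np.tanh(U @ x + W @ s + b)
    logits = V @ s + c                  # single read-out at the very end
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy usage: 5 random word vectors of size 3, 4 hidden units, 2 classes.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3)); W = rng.normal(size=(4, 4)); b = np.zeros(4)
V = rng.normal(size=(2, 4)); c = np.zeros(2)
probs = classify_sequence([rng.normal(size=3) for _ in range(5)], U, W, V, b, c)
```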

Modeling Sequence Learning Problems

In the previous section, we learned about practical applications of sequence learning problems, but how do we model such problems?

In sequence learning problems, we know that the true output at time step ‘t’ depends on all the inputs the model has seen up to time step ‘t’. Since we don’t know the true relationship, we need to come up with an approximation: a function that depends on all the previous inputs.

The key thing to note here is that the task does not change from one time step to the next, whether we are predicting the next character or tagging the part of speech of a word. What changes at every time step is the input to the function, because for longer sentences the function needs to keep track of a larger set of words.

In other words, we need to define a function that has these characteristics:

  • Ensure that the output yₜ is dependent on the previous inputs.
  • Ensure that the function can deal with a variable number of inputs.
  • Ensure that the function executed at each time step is the same.
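The three requirements can be sketched in plain Python: a single function f is applied at every step, its state threads through inputs of any length, and each output depends on everything seen so far (f here is a placeholder, not a trained network):

```python
def f(state, x):
    # Placeholder combiner; a real RNN would use learned weights here.
    return state + x

def run_sequence(inputs, initial_state=0):
    state = initial_state
    outputs = []
    for x in inputs:            # handles a variable number of inputs
        state = f(state, x)     # the SAME function at every time step
        outputs.append(state)   # output at step t depends on x_1 .. x_t
    return outputs
```

For inputs [1, 2, 3] this returns [1, 3, 6]: each output accumulates everything before it, mirroring the dependency we want.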

Recurrent Neural Networks

Recurrent Neural Networks (RNN) are a type of neural network in which the output from the previous step is fed as input to the current step.

In RNN, you can see that the output of the first time step is fed as input along with the original input to the next time step.

The input to the function at each time step is denoted in orange and represented as xᵢ. The weights associated with the input are collected in a matrix U, and the hidden representation sᵢ of the word is computed as a function of the output of the previous time step and the current input, along with a bias:

sᵢ = σ(U·xᵢ + W·sᵢ₋₁ + b)

where W holds the weights applied to the previous hidden state and σ is a non-linear activation such as tanh.

Once we compute the hidden representation of the input, the final output yᵢ from the network is a softmax function of the hidden representation and the weights associated with it, along with a bias:

yᵢ = softmax(V·sᵢ + c)

With this, we have come up with an approximate function that satisfies all three conditions we set for solving sequence learning problems.
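Putting the two steps together, a vanilla RNN forward pass can be sketched in NumPy; the dimensions, tanh activation, and random initialization here are illustrative assumptions, not the article’s exact setup:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, c):
    """Forward pass of a vanilla RNN over a list of input vectors xs."""
    s = np.zeros(W.shape[0])                 # initial hidden state s_0
    ys = []
    for x in xs:
        s = np.tanh(U @ x + W @ s + b)       # s_t = tanh(U x_t + W s_{t-1} + b)
        ys.append(softmax(V @ s + c))        # y_t = softmax(V s_t + c)
    return ys

# Tiny example: two 26-dim one-hot characters, 8 hidden units.
rng = np.random.default_rng(0)
U = rng.normal(size=(8, 26)); W = rng.normal(size=(8, 8)); b = np.zeros(8)
V = rng.normal(size=(26, 8)); c = np.zeros(26)
x = np.zeros(26); x[3] = 1.0                 # one-hot for 'd'
ys = rnn_forward([x, x], U, W, V, b, c)      # one distribution per time step
```

Note that the same weights U, W, and V are reused at every time step — this is what makes the function at each step identical.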

Continue Learning

If you want to learn more about artificial neural networks using Keras & Tensorflow 2.0 (Python or R), check out the Artificial Neural Networks course by Abhishek and Pukhraj from Starttechacademy. They explain the fundamentals of deep learning in a simple manner.

Conclusion

In this article, we briefly looked at why the characteristics of Feed-Forward Neural Networks and Convolutional Neural Networks make them ill-suited for tasks with variable-length, dependent inputs. After that, we discussed some practical applications of sequence modeling. We then looked at a new type of neural network that helps us model sequence learning problems.

Recommended Reading

In my next post, we will discuss Recurrent Neural Networks and their learning algorithm in depth. So make sure you follow me on Medium to get notified as soon as it drops.

Until then, Peace :)

NK.

Author Bio

Niranjan Kumar is Senior Consultant Data Science at Allstate India. He is passionate about deep learning and AI. Apart from writing on Medium, he also writes for Marktechpost.com as a freelance data science writer. Check out his articles here.

You can connect with him on LinkedIn or follow him on Twitter for updates about upcoming articles on deep learning and machine learning.

Disclaimer — There might be some affiliate links in this post to relevant resources. You can purchase the bundle at the lowest price possible. I will receive a small commission if you purchase the course.
