Deep Learning

Beautifully Illustrated: NLP Models from RNN to Transformer

Explaining their complex mathematical formulas with working diagrams

Albers Uzila
Towards Data Science
12 min read · Oct 11, 2022

Photo by Rubaitul Azad on Unsplash
Table of Contents
· Recurrent Neural Networks (RNN)
  ∘ Vanilla RNN
  ∘ Long Short-term Memory (LSTM)
  ∘ Gated Recurrent Unit (GRU)
· RNN Architectures
· Attention
  ∘ Seq2seq with Attention
  ∘ Self-attention
  ∘ Multi-head Attention
· Transformer
  ∘ Step 1. Adding Positional Encoding to Word Embeddings
  ∘ Step 2. Encoder: Multi-head Attention and Feed Forward
  ∘ Step 3. Decoder: (Masked) Multi-head Attention and Feed Forward
  ∘ Step 4. Classifier
· Wrapping Up

Natural Language Processing (NLP) is a challenging problem in deep learning because computers can't work with raw words directly. To harness computing power, we need to convert words into vectors before feeding them into a model. The resulting vectors are called word embeddings.
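To make the idea concrete, here is a minimal sketch of a word-embedding lookup. It assumes PyTorch; the toy vocabulary, the 8-dimensional embedding size, and the example sentence are illustrative choices, not details from the article.

```python
# Minimal sketch (assumption: PyTorch; vocabulary, embedding size, and the
# example sentence are illustrative, not taken from the article).
import torch
import torch.nn as nn

# Toy vocabulary mapping each word to an integer index.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

# An embedding layer is a learnable lookup table:
# each of the len(vocab) rows is a dense vector of size embedding_dim.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# Convert a sentence to indices, then to word embeddings.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
indices = torch.tensor([vocab[w] for w in sentence])  # shape: (6,)
vectors = embedding(indices)                          # shape: (6, 8)

print(vectors.shape)  # torch.Size([6, 8]) -- one 8-dim vector per word
```

These vectors start out random and are trained jointly with the rest of the model, so words used in similar contexts end up with similar embeddings.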

Those embeddings can be used to solve the desired task, such as sentiment classification, text generation, named entity recognition, or machine translation. They are processed in a clever way such that the performance of the model for some tasks becomes on par with that of…
