A no-frills guide to most Natural Language Processing Models — The Transformer (XL) Era

From the Universal Sentence Encoder to OpenAI's GPT-2, (AL)BERT, XLNet & Turing-NLG

Ilias Miraoui
Towards Data Science
7 min read · Mar 11, 2020

LSTMs were immensely popular, but they also had significant constraints: they are computation-heavy and, despite their name, tend to struggle with long-term dependencies. In 2017, researchers at Google published the paper "Attention Is All You Need," which introduced the Transformer architecture and showed that we could overcome many of the flaws of recurrent neural networks and…
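The core idea behind the Transformer is scaled dot-product attention: instead of processing tokens one step at a time like an RNN, every token attends directly to every other token, so long-range dependencies are a single step away. Here is a minimal NumPy sketch of that mechanism (self-attention with random toy embeddings; all variable names are illustrative, not from the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    # Similarity of each query to every key, scaled to stabilize gradients
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row of weights sums to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted average of the value vectors
    return weights @ V, weights

# Toy self-attention: 3 token embeddings of dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, weights = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one contextualized vector per token
```

Because the score matrix relates all token pairs at once, the operation parallelizes across the sequence, in contrast to the inherently sequential recurrence of an LSTM.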
