To Distil or Not To Distil: BERT, RoBERTa, and XLNet

Transformers are the undisputed kings of Natural Language Processing. But with so many different models around it can be tough to choose just one. Hopefully, this will help!

Thilina Rajapakse
Towards Data Science
8 min read · Feb 7, 2020

This is becoming a bit of a cliché, but Transformer models have transformed Natural…
