Multilingual Transformers

Why BERT is not the best choice for multilingual tasks

Simone Romano
Towards Data Science
5 min read · Jan 17, 2020

Image obtained by translating “multilingual transformers” with https://translatr.varunmalhotra.xyz/ and using https://www.wordclouds.com/

Last year, we saw rapid improvements in transformer architectures. With the GLUE benchmark being the main reference point for the state of the art in language understanding tasks, most research efforts focused on English data. BERT, RoBERTa, DistilBERT, XLNet — which one to use? provides an overview of recent transformer…
