Multilingual Transformers
Why BERT is not the best choice for multilingual tasks
Jan 17, 2020 · 5 min read
Last year, we saw rapid improvements in transformer architectures. With the GLUE benchmark serving as the main reference point for the state of the art in language understanding tasks, most research efforts focused on English data. The article "BERT, RoBERTa, DistilBERT, XLNet — which one to use?" provides an overview of recent transformer…