Longformer: The Long-Document Transformer
Understanding Transformer-Based Self-Supervised Architectures
Dec 1, 2020
Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot “attend” to long sequences, since self-attention scales quadratically with sequence length. For example, BERT is limited to a maximum of 512 tokens…
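To make the limit concrete, here is a minimal sketch (an illustration, not from the original article) using the Hugging Face `transformers` library. It shows that BERT's positional embeddings cap out at 512 positions, so longer inputs must be truncated before they ever reach the model:

```python
# Minimal sketch, assuming the Hugging Face `transformers` library is installed.
from transformers import BertConfig, BertTokenizer

# BERT's architecture only defines positional embeddings for 512 positions.
config = BertConfig()
print(config.max_position_embeddings)  # 512

# Any document longer than that has to be cut down at tokenization time.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
long_text = "word " * 1000  # a document far longer than the limit
encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # 512 -- everything beyond is simply discarded
```

In practice this means information past the 512th token is thrown away (or the document must be split into chunks), which is exactly the problem Longformer sets out to solve.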