Longformer: The Long-Document Transformer

Understanding Transformer-Based Self-Supervised Architectures

Rohan Jagtap
Towards Data Science
7 min read · Dec 1, 2020



Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot “attend” to longer sequences. For example, BERT is limited to a maximum of 512 tokens.
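To make that limit concrete, here is a minimal sketch (assuming the Hugging Face transformers library) that tokenizes a long document with a standard BERT tokenizer; anything past the 512-token cap is simply cut off.

```python
# Minimal sketch, assuming the Hugging Face "transformers" library is installed.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.model_max_length)  # 512 for bert-base-uncased

# A long document gets truncated to the model's maximum length.
long_text = "word " * 10_000
encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # 512 -- everything beyond is discarded
```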

