What is Two-Stream Self-Attention in XLNet
Understand the Two-Stream Self-Attention in XLNet intuitively
Published in
8 min readJul 29, 2019
In my previous post What is XLNet and why it outperforms BERT, I mainly talked about the difference between XLNet (AR language model) and BERT (AE language model) and the Permutation Language Modeling.