A review of BERT based models
Also: some recent clues/insights into what makes BERT so effective
Published in
11 min readJun 17, 2019
Attention — the simple idea of focussing on salient parts of input by taking a weighted average of them, has proven to be the key factor in a wide class of neural net models. Multihead attention in particular has proven to be reason for the success of state-of-art natural language processing models such…