A review of BERT-based models

Also: some recent clues/insights into what makes BERT so effective

Ajit Rajasekharan
Towards Data Science
11 min read · Jun 17, 2019


Image source for the BERT & Ernie figures

Attention, the simple idea of focusing on the salient parts of an input by taking a weighted average of them, has proven to be a key ingredient in a wide class of neural network models. Multi-head attention in particular has proven to be a major reason for the success of state-of-the-art natural language processing models such…
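To make the "weighted average" idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The names Q, K, V and the toy input are illustrative only, not taken from any particular BERT implementation; multi-head attention simply runs several such computations in parallel over different learned projections of the input and concatenates the results.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return a weighted average of the value vectors V, where the weights
    measure how well each query in Q matches each key in K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted average of the rows of V

# Toy example: 4 token vectors of dimension 8 attending over themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
context, attn = scaled_dot_product_attention(X, X, X)
print(attn.round(2))   # one row of weights per token; each row sums to 1
```

Each output row of `context` is a mixture of all the input vectors, with more weight placed on the inputs judged most relevant to that position.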
