BERT Classifier: Just Another Pytorch Model
9 min read · Jun 10, 2019
At the end of 2018, Google released BERT, which is essentially a 12-layer network trained on all of Wikipedia. The training protocol is interesting because, unlike other recent language models, BERT is trained to take into account language context from both directions, rather than just the words to the left of a given word. In pretraining, BERT masks out random words in a given sentence and uses the rest of the…
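The masking idea above can be sketched in a few lines of plain Python. This is an illustrative toy, not BERT's actual preprocessing code: the function name `mask_tokens`, the 15% masking rate, and the use of string tokens instead of subword IDs are all simplifying assumptions made here for clarity.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Toy sketch of BERT-style masked language modeling:
    randomly replace a fraction of tokens with a mask token,
    keeping the originals as the prediction targets.
    (Hypothetical helper, not the real BERT pipeline.)"""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)   # the model must predict this word
        else:
            masked.append(tok)
            targets.append(None)  # not a prediction target
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(sentence)
```

During pretraining the model sees the masked sentence and is scored on how well it recovers the hidden words, which is what forces it to use context from both the left and the right of each mask.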