What exactly happens when we fine-tune BERT?

A closer look into some of the recent BERTology research

Samuel Flender
Towards Data Science
6 min read · Feb 21, 2022


Illustration of the pre-training / fine-tuning approach. Three different downstream NLP tasks, MNLI, NER, and SQuAD, are all solved with the same pre-trained language model by fine-tuning on the specific task. Image credit: Devlin et al., 2019.

Google’s BERT was a paradigm shift in natural language modeling, in particular because it introduced the pre-training / fine-tuning approach: after pre-training in an unsupervised way on a massive amount of text data, the model can be rapidly fine-tuned on a specific downstream task…
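
To make the paradigm concrete, here is a minimal sketch of fine-tuning a pre-trained BERT checkpoint on a downstream classification task using the Hugging Face transformers library. The dataset (IMDB) and hyperparameters are illustrative assumptions, not details from the article.

```python
# Minimal sketch: fine-tuning pre-trained BERT on a binary classification task.
# Dataset and hyperparameters are illustrative, not taken from the article.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset

# Load the pre-trained checkpoint; only the small classification head
# on top is initialized from scratch.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Any labeled text-classification dataset works; IMDB is used as an example.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# A few epochs with a small learning rate are typically enough, since the
# pre-trained weights already encode general language knowledge.
args = TrainingArguments(
    output_dir="bert-finetuned",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

The key point is that all three tasks in the figure above follow this same recipe: the pre-trained weights stay the same, and only a brief task-specific training run adapts them.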
