Maximizing BERT model performance

An approach to evaluating a pre-trained BERT model to increase its performance

Ajit Rajasekharan
Towards Data Science
14 min read · Nov 4, 2020


Figure 1. Training pathways to maximize BERT model performance. For application domains where generic entity types (people, locations, organizations, etc.) are dominant, pathway 1a-1d suffices: start with a publicly released BERT model (bert-base/large, cased/uncased, or one of the tiny BERT variants), optionally train it further (1c, continual pre-training), and then fine-tune it for a specific task with labeled data (1d). For a domain where people, locations, organizations, etc. are not the dominant entity types, continual pre-training of the original BERT model (1c) on a domain-specific corpus, followed by fine-tuning, may not boost performance as much as pathway 2a-2d, because the vocabulary in pathway 1a-1d is still the original BERT vocabulary, with its entity bias towards people, locations, and organizations. Pathway 2a-2d instead trains a BERT model from scratch using a vocabulary generated from the domain-specific corpus. Note: any form of training (pre-training, continual pre-training, or fine-tuning) modifies both the model weights and the vocabulary vectors; the varying shades of the model (beige) and of the vocabulary (blue/green) across the training stages, left to right, illustrate this. The box labeled "?" is the focus of this article: evaluating a pre-trained or continually pre-trained model to improve model performance. Image by Author
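
To make pathway 1c concrete, the sketch below shows one way to continue pre-training a released BERT checkpoint on a domain-specific corpus with the masked-language-model objective, using the Hugging Face transformers library. The corpus file name, hyperparameters, and output directory are illustrative assumptions, not values from the article.

```python
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# Pathway 1c: start from a released checkpoint and keep its original vocabulary.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Hypothetical domain-specific corpus, one sentence or document per line.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="domain_corpus.txt",
    block_size=128,
)

# Standard BERT masking: 15% of tokens are selected for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="bert-base-cased-continued",  # illustrative output location
    num_train_epochs=1,
    per_device_train_batch_size=16,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("bert-base-cased-continued")
```

Fine-tuning (1d) would then start from the saved checkpoint, adding a task-specific head and training on labeled data.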

TL;DR

Training a BERT model from scratch on a domain-specific corpus, such as the biomedical domain, with a custom vocabulary generated from that corpus has proven critical to maximizing model performance in that domain. This is largely because the original BERT vocabulary is biased towards generic entity types such as people, locations, and organizations, whereas a vocabulary generated from the domain-specific corpus captures the terms that dominate it.
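
As one illustration of how a custom vocabulary for pathway 2a-2d might be generated, the sketch below trains a WordPiece vocabulary on a domain-specific corpus with the Hugging Face tokenizers library; the corpus file, vocabulary size, and output directory are assumptions for illustration only.

```python
import os

from tokenizers import BertWordPieceTokenizer

# Pathway 2a: build a WordPiece vocabulary from the domain-specific corpus
# instead of reusing the original BERT vocabulary.
tokenizer = BertWordPieceTokenizer(lowercase=False)
tokenizer.train(
    files=["domain_corpus.txt"],  # hypothetical corpus, e.g. biomedical text
    vocab_size=30522,             # same size as the original BERT vocabulary
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Writes vocab.txt, which from-scratch pre-training (2c) would consume.
os.makedirs("domain_vocab", exist_ok=True)
tokenizer.save_model("domain_vocab")
```

From-scratch pre-training (2c) would then initialize a BERT model with this vocabulary and train it on the same corpus with the MLM objective, before fine-tuning (2d) on a labeled task.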
