Unsupervised NER using BERT

Ajit Rajasekharan
Towards Data Science
21 min read · Feb 28, 2020


Figure 1 illustrates tagged sentence samples of unsupervised NER performed using BERT (bert-large-cased) with no fine-tuning. The examples highlight just a few of the entity types tagged by this approach. Tagging 500 sentences yielded about 1,000 unique entity types, of which a select few were mapped to the synthetic labels shown above. The bert-large-cased model is unable to distinguish between GENE and PROTEIN because descriptors for these entities fall within the same tail of the predicted distributions for masked terms (nor are they distinguishable in the base vocabulary). Distinguishing closely related entities like these may require MLM fine-tuning on a domain-specific corpus, or pre-training a model from scratch with a custom vocabulary (examined below).
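
To make the notion of "descriptors for masked terms" concrete, the sketch below shows one way to inspect the vocabulary terms BERT predicts for a masked position, using the Hugging Face transformers library. This is only an illustrative snippet, not the full tagging pipeline described later in the article; the example sentence and the top-k cutoff are assumptions made for the demo.

```python
# Minimal sketch: inspect BERT's predictions for a masked position.
# These predicted vocabulary terms serve as context-sensitive "descriptors"
# of the masked entity. Assumes Hugging Face transformers and PyTorch.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
model = BertForMaskedLM.from_pretrained("bert-large-cased")
model.eval()

# Hypothetical example sentence; the masked term is the entity of interest.
sentence = "Her [MASK] level was elevated after the treatment."
inputs = tokenizer(sentence, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits

# Top-k vocabulary terms predicted for the masked position.
top_ids = logits[0, mask_index].topk(10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

When two entity types (e.g. GENE and PROTEIN) produce heavily overlapping descriptor lists for their masked contexts, the model cannot separate them without fine-tuning, which is the limitation the caption above points out.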

TL;DR

An improved version of this approach, published in January 2022, describes how to scale it to a large number of entity types (e.g., 68 entity types spanning biology as well as PHI entities such as person, location, and organization).

In natural language processing, identifying entities of interest in a sentence (NER), such as person, location, or organization, requires labeled data…
