
Who’s Who and What’s What: Advances in Biomedical Named Entity Recognition (BioNER)

An overview of Named Entity Recognition research to help tackle the challenges related to the biomedical domain.

Photo by National Cancer Institute on Unsplash


Introduction

At Slimmer AI, we have been exploring NER in the biomedical domain (BioNER). This exploration is important as there are a few challenges that do not always exist in other domains, including:

  • Data is often not freely available, especially in the clinical case.
  • Annotation of data requires expert knowledge.
  • The space of biomedical concepts is enormous, making it unlikely that NER systems will transfer beyond the specific settings for which they were annotated.

This article serves as an overview of Named Entity Recognition research that helps tackle the challenges of the biomedical domain, covering biomedical datasets and techniques for dealing with their imperfections. I will also discuss the importance of transfer learning and multi-task learning in this domain, and briefly touch on some alternative methods for BioNER beyond popular deep learning-based models like BERT.


A brief recap of NER development

Named entity recognition (NER) is one of the building blocks of NLP and is used in many downstream tasks like question answering, topic modelling and information retrieval. [1] The task involves tagging entities in unstructured text using categories such as person, organization, location, etc.

The goal of named entity recognition is to identify words as entities and classify what entity type they belong to. Image by author, inspired by MonkeyLearn.

Early NER systems used handcrafted rules, lexicons, orthographic features and ontologies. [1,2] Models such as these have some benefits, such as the fact that they do not require annotated training data. However, these models have several downsides. For example, the lexicons need to be exhaustive, and the related dictionaries need to be proactively kept up to date by domain experts.

As the field progressed, researchers started using machine learning, but this approach also had its drawbacks, such as laborious feature engineering. Most recently, end-to-end deep learning methods have been introduced, removing the need to manually engineer features for each specific dataset. These feature-inferring networks outperform feature-engineered systems, despite the lack of domain-specific rules and knowledge. [1]


Dataset noise & bias

Many annotated datasets have been introduced in the biomedical domain, with entity categories such as cell line, chemical, disease, gene, protein, and species. The majority of them use PubMed articles as their source, annotated by several domain experts. For an overview of popular BioNER datasets, the overview provided by Flair or this overview on GitHub are good references.

One of the most prominent problems with NER datasets, and one not restricted to the biomedical domain, is that datasets are imperfect. For example, most – if not all – suffer from annotator disagreement or missing annotations.

Li et al. highlighted data annotation quality as one of the major challenges in the field of NER. [2] To give a few examples: the creators of the MedMentions corpus assessed annotation quality by having biologists review the annotations made by professional annotators and reported a 97.3% agreement. [3] The JNLPBA task from 2004 received a revised version in 2019, attempting to fix the imperfections in the original corpus. [4,5] There were two rounds of annotation, with annotators discussing several disagreements in between rounds. In the first round there was 79.5% agreement, while in the second this rose to 91.4% – still quite far from 100%. Wang et al. analyzed the widely adopted CoNLL03 NER dataset and were able to identify mistakes in about 5.38% of test sentences. [6,7] They further stress the significance of this, given that state-of-the-art test scores are already around 93%.

Depending on knowledge or perspective, illustrations or words can be interpreted differently. This can lead to annotator disagreement and imperfect datasets. Image by Pixy.

Spotting annotation mistakes requires deep domain expertise and a significant time investment. To deal with them in another way, Wang et al. introduced a framework called CrossWeigh. [6] Their solution is simple, but it does require considerable time to run. The idea is to train multiple models, each one on a different split of the training data. The splits, or folds, are set up in such a way that the training set does not contain any entities from the respective test fold. Incorrect predictions are aggregated over all test folds, and this process is repeated for a few iterations (with different random seeds to create the folds). The intuition is that if a term is consistently misclassified in every fold, the annotation is probably incorrect. These annotations are then suppressed by giving them a smaller weight during training. In addition to a 0.2–0.4% improvement in F1-score, they observed that their models became more stable, with a lower standard deviation in F1-score over multiple runs.
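As a rough illustration, here is a minimal sketch of the CrossWeigh weighting idea, assuming each sentence is a dict with "id", "tokens", "labels" and "entities" fields; train_model_fn and predict_fn are hypothetical stand-ins for your own NER training and inference routines, so this is a sketch of the procedure rather than the authors' implementation.

```python
# Minimal sketch of the CrossWeigh weighting idea (not the authors' code).
# Assumes each sentence is a dict with "id", "tokens", "labels", "entities".
import random
from collections import Counter

def crossweigh_weights(sentences, train_model_fn, predict_fn,
                       k=5, iterations=3, epsilon=0.7):
    """Down-weight sentences whose gold labels are repeatedly contradicted
    by models trained on entity-disjoint folds."""
    mistakes = Counter()
    for iteration in range(iterations):
        rng = random.Random(iteration)        # different folds per iteration
        shuffled = list(sentences)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in folds:
            # Entity-disjoint split: drop training sentences containing any
            # entity surface form that occurs in the held-out fold.
            held_out_entities = {e for s in held_out for e in s["entities"]}
            train = [s for s in sentences
                     if s not in held_out
                     and not held_out_entities.intersection(s["entities"])]
            model = train_model_fn(train)                 # hypothetical helper
            for s in held_out:
                if predict_fn(model, s["tokens"]) != s["labels"]:
                    mistakes[s["id"]] += 1
    # A sentence flagged c times gets weight epsilon ** c in the final training run.
    return {s["id"]: epsilon ** mistakes[s["id"]] for s in sentences}
```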

As is apparent in many machine learning tasks, datasets often contain bias, and (Bio)NER is no exception. Identifying entity mentions that were seen during training (memorization) and dealing with morphological variations (synonym generalization) usually cause no problems for big models like (Bio)BERT. [8] However, Kim and Kang found that BioNER models often exploit dataset biases and fail to generalize to concepts that have not been seen during training. [8] They analyzed 50 classification mistakes in the BC5CDR dataset and found that BioBERT used statistical cues in 34% of these cases.

To explain what kind of cues these models exploit, let us first quickly look at the most common format used in NER datasets: the inside-outside-beginning (IOB) annotation scheme. When an entity consists of multiple (sub)words, for example the organization ‘Slimmer AI’, the first word ‘Slimmer’ is prefixed with B-, indicating the beginning of an entity, and ‘AI’ is prefixed with I-, indicating that the word is part of the preceding entity. Words not belonging to an entity are annotated with O. This format is useful to discern between consecutive entities in a text.

Example sentence annotated with the IOB scheme. Slimmer AI is an org[anization] entity and consists of two parts: Slimmer, prefixed with B-, and AI, prefixed with I-. The other words aren’t entities and are annotated with O. Image by author.
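To make the scheme concrete, here is a toy snippet with a hypothetical example sentence, showing IOB labels and how B-/I- tagged tokens can be grouped back into entity spans.

```python
# Toy illustration of the IOB scheme (hypothetical example sentence).
tokens = ["I", "work", "at", "Slimmer", "AI", "."]
labels = ["O", "O", "O", "B-ORG", "I-ORG", "O"]

def extract_entities(tokens, labels):
    """Group B-/I- tagged tokens back into (entity text, entity type) spans."""
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):                 # a new entity starts here
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current:   # continuation of the entity
            current.append(token)
        else:                                      # outside any entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

print(extract_entities(tokens, labels))  # [('Slimmer AI', 'ORG')]
```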

Kim and Kang found that some words only occur with B- in the training set and were therefore always classified as B- in the test set, resulting in incorrect predictions. [8] To tackle this bias, they propose a de-biasing procedure based on the bias product method. [9] As a result, the models did take a small hit on memorization performance, but they improved in the areas of synonym and concept generalization. Their debiasing method might, however, reduce the opportunity to use valid patterns in entities. Examples of such patterns are suffixes like "… street" and "… disease", which are very relevant to exploit in order to boost concept generalization. The authors note this as future work.
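For intuition, below is a minimal sketch of the bias product idea from Clark et al. [9]: during training, the main model's per-token logits are combined with the log-probabilities of a frozen "bias-only" model (for instance, per-word label frequencies), so the main model is pushed to explain what the bias model cannot. This is a sketch of the general technique, not Kim and Kang's exact setup.

```python
# Minimal sketch of a bias-product (product-of-experts) training loss.
import torch
import torch.nn.functional as F

def bias_product_loss(main_logits, bias_log_probs, gold_labels):
    """main_logits: (batch, seq_len, num_labels) from the NER model.
    bias_log_probs: (batch, seq_len, num_labels) from a frozen bias-only model.
    gold_labels: (batch, seq_len) integer label ids."""
    combined = F.log_softmax(main_logits, dim=-1) + bias_log_probs
    combined = F.log_softmax(combined, dim=-1)   # renormalize the product
    return F.nll_loss(combined.reshape(-1, combined.size(-1)),
                      gold_labels.reshape(-1))

# At prediction time only the main model is used, so it no longer relies on
# the statistical cues the bias model already captured.
```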

Another big problem in the biomedical domain is that it is often difficult to obtain large datasets annotated by experts. If you do happen to find such a gold-standard corpus, it is typically small. To tackle the problem of small datasets you can utilize transfer learning, multi-task learning or few-shot learning.


Transfer & few-shot learning

Using pre-trained models and applying transfer learning techniques is common practice in the NLP community nowadays and can boost performance on downstream tasks significantly. The BioNER task is no exception, as several models trained on large amounts of biomedical data have been open-sourced and are frequently used. One of the most prominent models is BioBERT, pre-trained first on general domain corpora and subsequently on biomedical domain corpora like PubMed. [10] To illustrate why open-sourcing these models is so important: the biomedical part of the pre-training alone took a whopping 23 days on 8 Nvidia V100 GPUs. That is quite the investment for small and medium-sized businesses. Another model that has had remarkable success is the HunFlair model, trained on 23 biomedical datasets and outperforming BioNER models like SciSpaCy and HUNER by a big margin. [11,12,13] The evidence that using domain-specific models over the vanilla variant improves performance on downstream tasks is simply overwhelming. [8,10,11,14–18]
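Reusing such a pre-trained encoder is straightforward with the Hugging Face transformers library; a minimal sketch is shown below. The checkpoint name and the label set are assumptions for illustration, so check the model hub for the exact BioBERT variant you want.

```python
# Minimal sketch: load a pre-trained biomedical encoder for token classification.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "dmis-lab/biobert-base-cased-v1.1"   # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=5,  # e.g. O, B-Disease, I-Disease, B-Chemical, I-Chemical
)
# `model` can now be fine-tuned on your (much smaller) annotated BioNER corpus.
```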

High-level overview showing the concept of transfer learning. Part of the model, trained on one dataset, is transferred and further fine-tuned on another task. Image by author, inspired by TopBots.

The HunFlair model made use of pre-training and gained 0.80–4.75 percentage points in F1-score for different entity classes compared to a randomly initialized LSTM that used embeddings pre-trained on general corpora. [11] The observed increase was primarily driven by higher recall. Their model and training data are available on their GitHub page and, as it is part of the popular Flair Python library, you can easily extend the model to your liking. [19]
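Using HunFlair then comes down to a few lines; the snippet below follows the HunFlair tutorial at the time of writing, and the exact loader names may differ in newer Flair releases.

```python
# Tag a biomedical sentence with the pre-trained HunFlair models.
from flair.data import Sentence
from flair.models import MultiTagger

tagger = MultiTagger.load("hunflair")   # loads the taggers for all entity types
sentence = Sentence("Mutations in the TP53 gene are associated with many cancers.")
tagger.predict(sentence)
for entity in sentence.get_spans():
    print(entity)
```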

Peng et al. introduced the Biomedical Language Understanding Evaluation (BLUE) benchmark, of which BioNER is one of the tasks, to facilitate research in the development of pre-trained models (for more information on BLUE check out their respective GitHub page). [16] They evaluated several BERT and ELMo baselines and found that a BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes achieved the best results. [20] Their model, aptly named BlueBERT, outperformed ELMo, BioBERT, and other state-of-the-art models on all datasets. The best model setting improved on BioBERT by as much as 8.9 F1 points on a single dataset and 1.8 on average.

Giorgi and Bader utilized silver-standard corpora, corpora that tend to be much larger but have lower quality. [17] Such a corpus could be generated by using existing NER models to annotate a large, unlabeled corpus. The authors achieved a reduction in error of 11% across several datasets by first training on these silver-standard corpora and then fine-tuning on a gold-standard corpus. Improvements were particularly large for datasets with few annotations (< 6000).
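Schematically, the recipe looks like the sketch below, where existing_model, train_fn and fine_tune_fn are hypothetical placeholders for a tagger you already trust and for your own training loops.

```python
# Minimal sketch of the silver-then-gold training scheme.
def silver_then_gold(new_model, existing_model, unlabeled_sentences, gold_corpus,
                     train_fn, fine_tune_fn):
    # 1) Build a large, noisy silver-standard corpus with an existing tagger.
    silver_corpus = [(tokens, existing_model.predict(tokens))
                     for tokens in unlabeled_sentences]
    # 2) Pre-train on the silver corpus, then fine-tune on the small gold corpus.
    new_model = train_fn(new_model, silver_corpus)
    new_model = fine_tune_fn(new_model, gold_corpus)
    return new_model
```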

With pre-training also comes the opportunity to apply techniques like few-shot learning: fine-tuning your model using only a handful of labels. This is a super useful technique, especially in domains where annotated data is scarce. Hofer et al. evaluated five techniques for dealing with such small datasets in the medical domain, going as far as using only 10 training samples. [21] As a baseline, they used a model based on the then state-of-the-art model tuned for CoNLL-2003 and OntoNotes 5.0. [7,22] One of the techniques to improve their few-shot learning task involved pre-training on a related dataset with many more samples. After testing on several datasets, they improved the F1-score in their best setting by 4.52% compared to the baseline. Using a dataset within the same domain clearly outperformed using one from another domain.

A different setting utilized multiple individual datasets and pre-trained on all of them combined. Interestingly, the authors reported a negative impact of 1.66% compared to using a single dataset for pre-training. They note that this could be caused by their training strategy, as they used hyperparameters that were optimized for separate datasets. They further argue that the weights obtained by pre-training on the first dataset might have been unsuitable for the second. The last technique worth noting used word embeddings specifically trained on biomedical text as input, as compared to using vanilla GloVe embeddings. [23] This improved their F1-score by an additional 3.78%. All techniques combined raised their F1-score from 69.30% to 78.87%, which is a nice improvement.


Multi-task learning

In addition to applying transfer learning, another popular method is multi-task learning: training not only on the task at hand but also on other related tasks to boost performance on the main task. The concept behind this is that similar tasks or datasets have semantic and syntactic similarities which can help in training a more optimized model for the task at hand. Additionally, it can reduce the chance of overfitting. Popular tasks to include are tagging part-of-speech (POS) labels, syntactic constituents, and dependency relations. [2,14] Furthermore, data for these tasks is relatively easy to obtain, as syntactic features can be generated using off-the-shelf toolkits like NLTK or Stanford CoreNLP.
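For example, POS labels for an auxiliary tagging task can be generated in a couple of lines with NLTK (resource names may vary slightly between NLTK versions):

```python
# Generate cheap auxiliary part-of-speech labels with NLTK.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Mutations in BRCA1 increase the risk of breast cancer.")
print(nltk.pos_tag(tokens))
# e.g. [('Mutations', 'NNS'), ('in', 'IN'), ('BRCA1', 'NNP'), ...]
```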

An example architecture for applying multi-task learning. The first layers are shared amongst different, but related, tasks. Image by author, inspired by Ruder.io.

Tian et al. attempted to incorporate syntactic features into a BioBERT model using a key-value memory network and saw their performance increase by about 0.2–0.9% F1. [14] This might not sound like much. However, considering that some models on some datasets already perform in the 90%+ range, such an increase can be significant.

Wang et al. viewed training on multiple datasets with different entity types as a multi-task problem. [18] They trained different BioNER models by iteratively going through the datasets, while sharing some of the parameters across these models. They outperformed the previous state-of-the-art on 14 out of 15 datasets, with improvements ranging from 0.2 to 1.8% F1-score.

Khan et al. applied a similar approach, but they viewed their model as a single model with shared lower layers and specific top layers for each task. [15] For the lower layers they used a pretrained BioBERT model. Training on three datasets yielded the best performance, improving the F1-score by 0.2–1.3% compared to a single-task learning model and further improving on the model of Wang et al. by 0.7–2.3% F1.
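In PyTorch terms, this amounts to one shared encoder with a separate classification head per dataset; the sketch below is a simplified illustration of that layout (not the authors' exact architecture), with the encoder checkpoint name left to you as an assumption.

```python
# Minimal sketch: shared pre-trained encoder, task-specific classification heads.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskNER(nn.Module):
    def __init__(self, encoder_name, labels_per_task):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # shared lower layers
        hidden = self.encoder.config.hidden_size
        # One task-specific top layer per dataset / entity-type set.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_labels) for n_labels in labels_per_task]
        )

    def forward(self, input_ids, attention_mask, task_id):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.heads[task_id](hidden_states)  # (batch, seq_len, n_labels)

# Training iterates over the datasets, routing each batch through its own head
# while the shared encoder is updated by every task.
```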

A unique way of applying multi-task learning is to have multiple single models collaborate. CollaboNet is a framework that applies this trick and uses specific models trained on different datasets for different tasks. [24] The authors note that regular multi-task learning yields models which score well on recall but have lower precision. Because those models are trained on several different entity types, they tend to have difficulty predicting the correct entity type. Additionally, the authors identify an issue in the biomedical domain where entities can be labeled as different entity types depending on context: some words denote a gene in one context and a disease in another.

With CollaboNet they tried to solve this by having expert models for each entity type collaborate. During training, these models take turns being updated or functioning as a collaborator. Each model receives the output of the collaborator models and uses it as auxiliary input. As a result, each model becomes an expert in its own domain, while improving the performance of the other models by leveraging multi-domain information. Compared to Wang et al., CollaboNet achieved an increase of 0.2–5.0% F1-score. [18] It is good to note that CollaboNet, which uses a BiLSTM-CRF architecture, is already outperformed by BioBERT. However, the framework itself could still be applied to more state-of-the-art models, improving their scores even more. One will need ample memory and time to be able to apply it, however.

One mitigation to the problem of memory and time is to use smaller networks, or networks which utilize weight sharing. One such promising model is BioALBERT, based on (you guessed it) ALBERT. [25,26] Apart from the parameter reduction techniques that speed up the model, the authors trained their model not only on the task of NER but also applied sentence-order prediction: taking two consecutive segments from the training data and predicting whether they appear in their original order or have been swapped. This allows the model to better learn context-dependent representations.
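Constructing such sentence-order-prediction examples is simple; a toy version, assuming you already have a list of consecutive sentences, could look like this:

```python
# Toy construction of sentence-order-prediction (SOP) training pairs:
# positives are two consecutive segments in their original order,
# negatives are the same two segments with the order swapped.
import random

def sop_examples(sentences, seed=0):
    rng = random.Random(seed)
    examples = []
    for first, second in zip(sentences, sentences[1:]):
        if rng.random() < 0.5:
            examples.append(((first, second), 1))   # correct order
        else:
            examples.append(((second, first), 0))   # swapped order
    return examples
```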

BioALBERT consistently outperforms BioBERT on several datasets, often by a big margin, and their reported scores are quite impressive. And with the parameter reduction, the training time is about 2–3x lower: a huge win! The authors have made their model publicly available on GitHub.


Knowledge based NER and ontologies

When you do not have a lot of labeled data at your disposal, due to the costly and labor-intensive annotation process, there may be another way of training your model: knowledge-based NER. Knowledge-based NER models classify entities based on their correspondence with an ontology rather than on manual annotations.

Photo by Devon Divine on Unsplash

The challenging MedMentions dataset bases its annotations on the extensive UMLS ontology. [3] This rich ontology, containing millions of concepts, is used in several studies that focus on creating BioNER systems. The MedMentions dataset is ‘only’ annotated with 352k mentions of UMLS concepts, but it is still one of the most challenging datasets out there. The high number of concepts makes it difficult for any model to memorize or to generalize. It is therefore not surprising that the best-performing BERT-based models on this dataset achieve a test score of around 56% F1. [27] A follow-up blog post by my colleague Stephan Tulkens will attempt to tackle the MedMentions dataset using unsupervised methods, so stay tuned!

Back to knowledge-based NER: in 2013, Zhang and Elhadad created signature vectors for each occurring UMLS entity type by averaging the TF-IDF vectors of each occurrence in the corpus. [28] These vectors were then compared to the vectors of all noun-phrase chunks to identify entities. The approach outperformed traditional dictionary-based methods but fell short of more recent supervised methods.
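A toy version of this signature-vector idea, with made-up seed mentions standing in for the UMLS-derived ones, could look like this:

```python
# Toy sketch of type signature vectors built from averaged TF-IDF vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical seed mentions per entity type (in practice drawn from UMLS).
seed_mentions = {
    "Disease":  ["lung cancer", "type 2 diabetes", "cystic fibrosis"],
    "Chemical": ["aspirin", "sodium chloride", "ibuprofen"],
}

all_mentions = [m for mentions in seed_mentions.values() for m in mentions]
vectorizer = TfidfVectorizer().fit(all_mentions)

# One signature vector per entity type: the mean TF-IDF vector of its mentions.
signatures = {
    etype: np.asarray(vectorizer.transform(mentions).mean(axis=0))
    for etype, mentions in seed_mentions.items()
}

def classify_chunk(noun_phrase):
    """Assign a candidate noun phrase to the entity type with the closest signature."""
    vec = vectorizer.transform([noun_phrase]).toarray()
    scores = {etype: cosine_similarity(vec, sig)[0, 0]
              for etype, sig in signatures.items()}
    return max(scores, key=scores.get)

print(classify_chunk("pancreatic cancer"))  # -> "Disease"
```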

Ghiasvand and Kate outperformed Zhang and Elhadad by making several improvements. [28,29] They only used unambiguous UMLS terms as seed terms to generate both positive and negative examples. Syntactic and semantic features of these examples were used to train an ensemble of decision trees. The output of this model was then used to extend the examples to also include ambiguous UMLS terms, and this cycle was repeated several times. The method outperformed other unsupervised methods and, on some entity classes, performed comparably to supervised systems that used manual annotations.

Similarly, De Vine et al. learned concept embeddings from free-text notes by first extracting concepts using a matcher included with UMLS. [30] They replaced the matched spans in the text with the corresponding concept IDs and then learned concept embeddings by applying skip-gram (e.g., word2vec). These embeddings were evaluated by correlating the cosine distances between concepts with human judgments on two small datasets, with positive results.
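A minimal sketch of that idea with gensim, using a placeholder CUI-style identifier instead of a real UMLS matcher, might look like this:

```python
# Learn embeddings for concept IDs with skip-gram (word2vec) after replacing
# recognized mentions with their concept identifiers.
from gensim.models import Word2Vec

# Placeholder corpus: "C0011849" stands in for a concept ID inserted by a matcher.
corpus = [
    ["patient", "diagnosed", "with", "C0011849", "and", "hypertension"],
    ["C0011849", "managed", "with", "metformin", "and", "diet"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1, sg=1)
print(model.wv.most_similar("C0011849", topn=3))
```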

Beam et al. applied a similar approach and later made their embeddings publicly available. [31] Phan et al. learned name embeddings by using a kind of ‘Siamese’ network: they used pre-trained word embeddings and character embeddings as input to a BiLSTM. [32] The output of the BiLSTM was then used to calculate three losses: one that penalizes the distance between synonyms (i.e., synonyms should be close together), one that penalizes the distance between a name and the concept it represents, and one that penalizes the distance to the "local context" (i.e., the average of the word embeddings surrounding a concept's occurrences). Their model outperformed other baselines on many retrieval, similarity, and relatedness tasks.

Researchers at Facebook AI proposed the unsupervised knowledge-augmented language model (KALM) which augments a traditional language model with a knowledge base, trained end-to-end to optimize the perplexity. [33] During training, it uses a gating mechanism to control whether a word is a general word or if it should be modeled as a reference to an entity, given the context observed so far and a knowledge base as input. It subsequently uses this gating mechanism during prediction time to predict whether a word is an entity. It does not require any additional information such as labeled named entity tags in a text corpus and still achieves performance comparable with state-of-the-art supervised models.

Karadeniz and Özgür tackle the problem of entity normalization: Mapping entities to an ontology/dictionary, which is necessary to make sense of the identified entities. [34] This is not a trivial problem to solve in the biomedical domain for several reasons. As previously mentioned, there is often an ambiguity problem – where entities may have a different semantic meaning based on context. Further, there is also the challenge of identifying concepts which occur in varying surface forms in a text (e.g., present and past tense, or abbreviations).

The authors utilize both semantic and syntactic information in an unsupervised way, by combining pre-trained word embeddings with syntactic parsing information. They achieve a new state-of-the-art precision score, beating the previous state-of-the-art by 2.9 percentage points.

But what do you do when the entities that you want to extract are not explicitly mentioned in the text, but are rather implicit? To give an example, suppose that a sentence contains the word hydrolyzed. The words water or H2O are not mentioned explicitly, but you could infer that water is involved in the process.

Shoshan and Radinsky coined the task Latent Entity Extraction (LEE), where you try to identify these implicit entities. [35] In their research the authors focused on the biochemical reactions domain using the Reactome ontology. They trained several one-versus-all classifiers using pre-trained word embeddings and a BiLSTM classifier on top, one classifier for each type of entity. Multi-task learning was applied here as well to not only train each classifier on its designated entity type, but also on other related types. The authors showed that their model reaches high performance in identifying these latent entities. The authors conclude that the LEE task will significantly improve many NER systems and applications built on top of them.


Concluding remarks

The challenges in biomedical Named Entity Recognition are not to be underestimated. Fortunately for all of us actively applying AI in software product development, there are many bright minds helping us tackle this complicated field.

I am encouraged by the numerous ‘tricks’ that we can employ to get further still. For example, you could use the CrossWeigh framework to deal with noisy annotations, identify and debias your data, or use silver-standard corpora to pre-train your model. And you most certainly should utilize transfer learning and multi-task learning on related tasks if you have the chance to do so.

There are many pre-trained models available, and you can choose from a selection of multi-task learning schemes. And, if you do not have annotated data available, I recommend turning to knowledge-based systems which leverage ontologies or any other kind of knowledge base. Lastly, consider whether there could be implicit entities in your texts. If so, be sure not to neglect them and attempt to extract them anyway.

Final note: I am certain we have missed some interesting and promising techniques available in the literature. If you know of any, please share them with the rest of the community by leaving a reply in the comments (you are awesome). And good luck on your BioNER journey!

Thanks to my colleagues Michiel van de Steeg and Stephan Tulkens for contributing to the research.


References

[1] V. Yadav and S. Bethard, A Survey on Recent Advances in Named Entity Recognition from Deep Learning models (2018), Proceedings of the 27th International Conference on Computational Linguistics.

[2] J. Li, A. Sun, J. Han and C. Li, A Survey on Deep Learning for Named Entity Recognition (2020), IEEE Transactions on Knowledge and Data Engineering.

[3] S. Mohan and D. Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts (2019), Proceedings of the 2019 Conference on Automated Knowledge Base Construction.

[4] J.D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi and N. Collier, Introduction to the Bio-entity Recognition Task at JNLPBA (2004), Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications.

[5] M. Huang, P. Lai, R.T. Tsai and W. Hsu, Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task (2019), arXiv:1901.10219 [cs.IR].

[6] Z. Wang, J. Shang, L. Liu, L. Lu, J. Liu, and J. Han, CrossWeigh: Training Named Entity Tagger from Imperfect Annotations (2019), arXiv:1909.01441 [cs.CL].

[7] E.F. Tjong Kim Sang and F. De Meulder, Introduction to the CoNLL-2003 shared task: language-independent named entity recognition (2003), Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003.

[8] H. Kim and J. Kang, How Do Your Biomedical Named Entity Models Generalize to Novel Entities? (2021), arXiv:2101.00160 [cs.CL].

[9] C. Clark, M. Yatskar and L. Zettlemoyer, Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases (2019), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.

[10] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. Ho So and J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining (2020), Bioinformatics.

[11] L. Weber, M. Sänger, J. Münchmeyer, M. Habibi, U. Leser and A. Akbik, HunFlair: An Easy-to-Use Tool for State-of-the-Art Biomedical Named Entity Recognition (2021), Bioinformatics.

[12] M. Neumann, D. King, I. Beltagy and W. Ammar, ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing (2019), arXiv:1902.07669 [cs.CL].

[13] L. Weber, J. Münchmeyer, T. Rocktäschel, M. Habibi, U. Leser, HUNER: improving biomedical NER with pretraining (2019), Bioinformatics.

[14] Y. Tian, W. Shen, Y. Song, F. Xia, M. He and K. Li, Improving Biomedical Named Entity Recognition with Syntactic Information (2020), BMC Bioinformatics.

[15] M.R. Khan, M. Ziyadi and M. Abdelhady, MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers (2020), arXiv:2001.08904 [cs.CL].

[16] Y. Peng, S. Yan, and Z. Lu, Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (2019), Proceedings of the 18th BioNLP Workshop and Shared Task.

[17] J.M. Giorgi and G.D. Bader, Transfer Learning for Named-Entity Recognition with Neural Networks (2018), Proceedings of the Eleventh International Conference on Language Resources and Evaluation.

[18] X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz and J. Han, Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning (2018), Bioinformatics.

[19] A. Akbik, D.A. Blythe and R. Vollgraf, Contextual String Embeddings for Sequence Labeling (2018), Proceedings of the 27th International Conference on Computational Linguistics.

[20] A.E.W. Johnson, T.J. Pollard, L. Shen, L.H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L.A. Celi and R.G. Mark, MIMIC-III, a freely accessible critical care database (2016), Scientific Data.

[21] M. Hofer, A. Kormilitzin, P. Goldberg and A. Nevado-Holgado, Few-shot Learning for Named Entity Recognition in Medical Text (2018), arXiv:1811.05468 [cs.CL].

[22] E. Hovy, M. Marcus, M. Palmer, L. Ramshaw and R Weischedel, OntoNotes: the 90% solution (2006), Proceedings of the Human Language Technology Conference of the NAACL.

[23] J. Pennington, R. Socher and C.D. Manning, GloVe: Global Vectors for Word Representation (2014), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.

[24] W. Yoon, C. So and J. Lee, CollaboNet: collaboration of deep neural networks for biomedical named entity recognition (2019), BMC Bioinformatics.

[25] U. Naseem, M. Khushi, V. Reddy, S. Rajendran, I. Razzak and J, Kim, BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition (2020), arXiv:2009.09223 [cs.CL].

[26] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma and R. Soricut, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (2019), arXiv:1909.11942 [cs.CL].

[27] K.C. Fraser, I. Nejadgholi, B. Bruijn, M. Li, A. LaPlante and K.Z. Abidine, Extracting UMLS Concepts from Medical Text Using General and Domain-Specific Deep Learning Models (2019), arXiv:1910.01274 [cs.CL].

[28] S. Zhang and N. Elhadad, Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts (2013), Journal of Biomedical Informatics.

[29] O. Ghiasvand and R.J. Kate, Learning for clinical named entity recognition without manual annotations (2018), Informatics in Medicine Unlocked.

[30] L. De Vine, G. Zuccon, B. Koopman, L. Sitbon and P. Bruza, Medical Semantic Similarity with a Neural Language Model (2014), Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management.

[31] A.L. Beam, B. Kompa, A. Schmaltz, I. Fried, G. Weber, N. Palmer, X. Shi, T. Cai and I.S. Kohane, Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data (2018), Pacific Symposium on Biocomputing.

[32] M.C. Phan, A. Sun and Y. Tay, Robust Representation Learning of Biomedical Names (2019), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.

[33] A. Liu, J. Du and V. Stoyanov, Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition (2019), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[34] İ. Karadeniz and A. Özgür, Linking entities through an ontology using word embeddings and syntactic re-ranking (2019), BMC Bioinformatics.

[35] E. Shoshan and K. Radinsky, Latent Entities Extraction: How to Extract Entities that Do Not Appear in the Text? (2018), Proceedings of the 22nd Conference on Computational Natural Language Learning.

