
Effortless NLP using pre-trained Hugging Face pipelines

Learn how to do natural language processing with just 3 lines of code

Photo by Tengyart on Unsplash

Recently, the BERT model has gained popularity in the natural language processing space, since it manages to combine state-of-the-art performance with limited computing power. In this article, I will show you how you can use the model yourself with just 3 lines of code using the Hugging Face Transformers library! But first, let’s see how the BERT model works.


What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. It is a recent language model that manages to obtain cutting-edge results in a wide range of NLP tasks.

One of the main advantages of BERT is that it is bidirectional, which means that the model considers an entire sequence of words at once. In contrast to a left-to-right approach, this allows BERT to use all surrounding words (on both the left and right side) to contextualize each word.

Two stages of the BERT model. Image from the original research paper

Furthermore, you can use the BERT model with limited computational power, because it makes use of transfer learning: first a model is trained on some general task (pre-training), and then the acquired knowledge is ‘transferred’ to a related NLP task (fine-tuning). Let’s have a look at these two steps in more detail.

Pre-training

First, the model is pre-trained on a large plain-text corpus such as Wikipedia. Pre-training should be generic, so that the model can be used for a wide range of objectives later on. In addition, the pre-training is done in a self-supervised way, so the input does not need to be labeled, which in turn means we have an almost infinite supply of training data. The pre-training of the BERT model is done on two tasks:

  • Masked Language Modeling (MLM): 15% of the words in the corpus are masked, and the objective is to predict the masked words. For example, a masked sentence might be Paris is the [MASK] of France, and the model would try to predict capital (see the short fill-mask sketch after this list).
  • Next Sentence Prediction (NSP): two random sentences from the corpus are combined. The objective is to predict whether or not these two sentences also appeared next to each other in the original corpus. For example, the two sentences could be the man went to the store and he bought a gallon of milk, which could logically follow each other. However, the sentences could also be the man went to the store and penguins are flightless, which would be rather unlikely to appear consecutively.
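
As a concrete illustration of masked language modeling, here is a minimal sketch using the fill-mask pipeline from the Hugging Face Transformers library (the pipeline API is introduced later in this article; the choice of the bert-base-uncased checkpoint is an assumption made for this example):

```python
from transformers import pipeline

# Fill-mask sketch: BERT predicts the masked word in the example sentence.
# The model choice (bert-base-uncased) is illustrative.
unmasker = pipeline('fill-mask', model='bert-base-uncased')
predictions = unmasker('Paris is the [MASK] of France.')
print(predictions[0]['token_str'])  # most likely fill, ideally 'capital'
```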

The combination of these tasks makes BERT learn both the relations between words and those between sentences. Pre-training only needs to be done once (saving computational power), and pre-trained models are widely available online, both monolingual and multilingual, and for cased and uncased text.

Fine-tuning

However, the pre-trained BERT model is still very general. To be able to use it for sentiment analysis, named entity recognition, summarization, translation, or something else, we therefore need to fine-tune the model for our specific use-case. The big advantage is that this fine-tuning is relatively inexpensive: most of the heavy lifting is already done in the pre-training stage, and only needs to be done once.

Photo by Sergey Svechnikov on Unsplash

If you don’t have a labeled training set, models that have already been fine-tuned are widely available online as well, for example on the Hugging Face model hub. This is the approach I will use in this article.

For more information about BERT, I would recommend github.com/google-research/bert or, for more advanced readers, the original research paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".


Hugging Face Transformers

One of the easiest ways to use BERT models is with Hugging Face Transformers: a state-of-the-art NLP library built on PyTorch and TensorFlow. Through their model hub, Hugging Face currently provides over 7500 pre-trained models on a variety of NLP tasks and in a variety of languages. This way, you can almost always find a model that corresponds to your specific objectives.

A look at the Hugging Face model library, which contains over 7500 pre-trained models. (Image by author)

Each of these models can be fine-tuned on your own dataset using the simple methods provided by the Hugging Face Transformers library (a brief sketch follows the task list below). Even more easily, though, the models can be used out of the box, with minimal programming, through one of the pipelines that Hugging Face Transformers offers for these eleven tasks:

  • Feature extraction
  • Sentiment analysis
  • Named entity recognition
  • Question answering
  • Mask filling
  • Summarization
  • Translation
  • Language generation
  • Text to text generation
  • Zero-shot classification
  • Multi-turn conversations
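
As a rough sketch of what such fine-tuning can look like with the library's Trainer class (the toy texts, labels, model choice, and hyperparameters below are all placeholders, not part of the original article):

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy labeled data; in practice you would use your own dataset.
texts = ["What a great movie!", "The plot was terrible."]
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps the tokenized inputs and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()
```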

For more info, have a look at github.com/huggingface/transformers


Using Pipelines (with just 3 lines of code!)

Make sure you first install the Hugging Face Transformers library, for example by running pip install transformers in your terminal. Then, you can start using the Hugging Face models with as little as 3 lines of code! For example, look at this code for sentiment analysis:
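
A minimal sketch of what those three lines can look like (the input sentence is just an illustrative example):

```python
from transformers import pipeline

# 1. Import the library, 2. initialize a pipeline, 3. run the model.
classifier = pipeline('sentiment-analysis')  # downloads the default model
print(classifier('I love dogs!'))  # a list with a label and a confidence score
```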

See, that was easy! All you have to do is import the library, initialize a pipeline, and you’re ready to start using the model! As mentioned before, these pipelines use pre-trained models from the Hugging Face model hub. By default, the sentiment analysis pipeline uses the distilbert-base-uncased-finetuned-sst-2-english model, but you can use any of the models from the model hub.

Let’s look at two extensions: picking different models from the model hub, and solving different tasks.

Using one of the 7500+ models from the Model Hub

You can easily use a different model than the default one by setting the model parameter when creating a pipeline. For example, let’s say we’re working on a project and want to predict financial sentiment. A quick search on the model hub gives us the ProsusAI/finbert model, which is specifically trained on the sentiment of financial content. Using this model is as simple as the previous example; we just include the model parameter:
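
A sketch of what that looks like (the example sentence is my own illustration):

```python
from transformers import pipeline

# Same pipeline as before, now with an explicit model from the model hub.
classifier = pipeline('sentiment-analysis', model='ProsusAI/finbert')
print(classifier('Stocks rallied after the strong earnings report.'))
```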

Hugging Face Transformers will automatically download the selected model for you!

Other NLP Tasks

Pipelines are currently able to handle 11 different tasks, from named entity recognition to translation. You can select a task by changing 'sentiment-analysis' to something else when creating the pipeline. For example, let’s try to translate ‘I love dogs!’ from English to German. Go to the model hub, filter on the task ‘Translation’ and the language ‘de’, and you will see over 100 models. I will use the t5-small model:
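
A sketch of the translation pipeline with that model (the task string 'translation_en_to_de' selects English-to-German translation):

```python
from transformers import pipeline

# English-to-German translation using the t5-small checkpoint.
translator = pipeline('translation_en_to_de', model='t5-small')
print(translator('I love dogs!'))  # a list with a 'translation_text' entry
```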

That’s it! For a complete list of all tasks that the pipelines can do, have a look at this wiki page.


Conclusion

In this article, you have read how the BERT model works and how it is trained. Furthermore, you have seen how powerful and easy it is to use through the Hugging Face Transformers pipelines. Truly, NLP is accessible to everyone!


This article was written as part of the Turing Machine & Deep Learning course at Erasmus University Rotterdam, the Netherlands. For more of my content, have a look at my Medium page or GitHub profile: RvMerle.

