Multilingual Serverless XLM RoBERTa with HuggingFace, AWS Lambda
Learn how to build a Multilingual Serverless BERT Question Answering API with a model size of more than 2GB, and then test it in German and French
Introduction
Currently, we have 7.5 billion people living in around 200 nations. Only about 1.2 billion of them speak English, and far fewer are native speakers. This leads to a lot of unstructured non-English textual data.
Most tutorials and blog posts demonstrate how to build text classification, sentiment analysis, question-answering, or text generation models with BERT-based architectures in English. To help close this gap, we are going to build a multilingual Serverless Question Answering API.
Multilingual models are machine learning models that can understand different languages. An example of a multilingual model is mBERT from Google Research, which supports and understands 104 languages.
We are going to use the new AWS Lambda Container Support to build a Question-Answering API with an xlm-roberta model. For this, we use the Transformers library by HuggingFace, the Serverless Framework, AWS Lambda, and Amazon ECR.
The special characteristic of this architecture is that we serve a “State-of-the-Art” model of more than 2GB in a Serverless Environment.
Before we start, I wanted to encourage you to read my blog philschmid.de, where I have already written several blog posts about Serverless, how to deploy BERT in a Serverless Environment, and how to fine-tune BERT models.
You can find the complete code for it in this Github repository.
Services included in this tutorial
Transformers Library by Huggingface
The Transformers library provides state-of-the-art machine learning architectures like BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, T5 for Natural Language Understanding (NLU), and Natural Language Generation (NLG). It also provides thousands of pre-trained models in 100+ different languages.
AWS Lambda
AWS Lambda is a serverless computing service that lets you run code without managing servers. It executes your code only when required and scales automatically, from a few requests per day to thousands per second.
Amazon Elastic Container Registry
Amazon Elastic Container Registry (ECR) is a fully managed container registry. It allows us to store, manage, and share docker container images, either privately within your organization or publicly worldwide for anyone.
Serverless Framework
The Serverless Framework helps us develop and deploy AWS Lambda functions. It’s a CLI that offers structure, automation, and best practices right out of the box.
Tutorial
Before we get started, make sure you have the Serverless Framework configured and set up. You also need a working docker environment. We use docker to create our own custom image including all needed Python dependencies and our multilingual xlm-roberta model, which we then use in our AWS Lambda function. Furthermore, you need access to an AWS Account to create an IAM User, an ECR Registry, an API Gateway, and the AWS Lambda function.
We design the API in the following way: we send a context (a small paragraph) and a question to it, and it responds with the answer to the question. As the model, we are going to use xlm-roberta-large-squad2, trained by deepset.ai, from the transformers model hub. The model size is more than 2GB. It's huge.
What are we going to do:
- create a Python Lambda function with the Serverless Framework
- add the multilingual xlm-roberta model to our function and create an inference pipeline
- create a custom docker image and test it
- deploy the custom docker image to ECR
- deploy the AWS Lambda function with the custom docker image
- test our Multilingual Serverless API
You can find the complete code in this Github repository.
Create a Python Lambda function with the Serverless Framework
First, we create our AWS Lambda function by using the Serverless CLI with the aws-python3 template.
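The CLI call could look like the following; the service path serverless-multilingual is an assumption that matches the directory name used later in this tutorial.

```bash
serverless create --template aws-python3 --path serverless-multilingual
```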
This CLI command will create a new directory containing a handler.py, a .gitignore, and a serverless.yaml file. The handler.py contains some basic boilerplate code.
Add the multilingual xlm-roberta model to our function and create an inference pipeline
To add our xlm-roberta model to our function, we have to load it from the HuggingFace model hub. For this, I have created a small Python script. Before we can execute this script, we have to install the transformers library in our local environment and create a model directory inside our serverless-multilingual/ directory.
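The two setup steps might look like this; version pins are intentionally left out so pip resolves current releases (torch is needed locally to load the model weights).

```bash
pip3 install torch transformers
mkdir model
```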
After installing transformers, we create a get_model.py file and include the script below.
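A minimal version of this script could look like the following sketch. It needs the transformers library and downloads more than 2GB of weights from the model hub, so it takes a while.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

def get_model(model_name="deepset/xlm-roberta-large-squad2", model_dir="./model"):
    """Download the model and tokenizer from the HuggingFace model hub
    and save them into the local model directory, so we can later bake
    them into our docker image."""
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)
    model.save_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.save_pretrained(model_dir)

if __name__ == "__main__":
    get_model()
```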
To execute the script, we run python3 get_model.py in the serverless-multilingual/ directory.
Tip: add the model directory to .gitignore.
The next step is to adjust our handler.py and include our serverless_pipeline(), which initializes our model and tokenizer. It then returns a predict function, which we can use in our handler.
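The structure of handler.py follows this pattern. In this sketch the model loading and inference are stubbed out with a placeholder so the closure pattern is visible; in the real handler.py the stub is replaced by transformers calls that load the tokenizer and model from ./model.

```python
import json

def serverless_pipeline(model_path="./model"):
    # Real version: load the tokenizer and model from model_path with the
    # transformers library here. This runs once per container (cold start),
    # not once per request.
    def predict(question, context):
        # Stand-in for tokenization, model inference, and answer decoding.
        return {"answer": context.split(".")[0].strip()}
    return predict

# Initialized at module level, so warm invocations reuse the loaded model.
question_answering = serverless_pipeline()

def handler(event, context):
    body = json.loads(event["body"])
    answer = question_answering(body["question"], body["context"])
    return {"statusCode": 200, "body": json.dumps(answer)}
```

The important design choice is that serverless_pipeline() is called at module level: the expensive model loading happens only at cold start, while every warm invocation just calls the returned predict function.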
Create a custom docker image and test it
Before we can create our docker image, we need to create a requirements.txt file with all the dependencies we want to install into it. We are going to use a lighter PyTorch version and the transformers library.
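The requirements.txt could look like this; the exact pins and the CPU-only torch wheel index are assumptions that keep the image much smaller than a full CUDA build would be.

```
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.7.1+cpu
transformers==4.1.1
```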
To containerize our Lambda function, we create a dockerfile in the same directory and copy the following content.
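A dockerfile for this could look like the following sketch, built on the public AWS Lambda Python base image; the Python version and file layout are assumptions.

```dockerfile
FROM public.ecr.aws/lambda/python:3.8

# install the python dependencies
COPY requirements.txt ./
RUN python3 -m pip install -r requirements.txt --no-cache-dir

# copy the saved model and the function code into the image
COPY model/ ./model/
COPY handler.py ./

# entrypoint of the function: <file>.<function>
CMD ["handler.handler"]
```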
Additionally, we can add a .dockerignore file to exclude files from the container image.
To build our custom docker image, we run:
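For example, with the image name multilingual-lambda (an assumption, chosen to match the ECR repository created later):

```bash
docker build -t multilingual-lambda .
```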
We can start our docker container by running:
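The AWS base image ships with the Lambda Runtime Interface Emulator, which listens on port 8080 inside the container:

```bash
docker run -p 8080:8080 multilingual-lambda
```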
Afterwards, in a separate terminal, we can locally invoke the function using curl or a REST client.
Beware: we have to stringify our body since we are passing it directly into the function (only for testing).
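A small sketch of what the stringified payload looks like (the German sentence is just an illustration):

```python
import json

# API Gateway hands the HTTP body to Lambda as a *string* in event["body"],
# so the payload has to be JSON-encoded twice when we call the function directly.
payload = {
    "context": "Ich heiße Philipp und lebe in Nürnberg.",
    "question": "Wie heiße ich?",
}
event = json.dumps({"body": json.dumps(payload)})
print(event)
```

This string can then be sent with curl -d to the local endpoint http://localhost:8080/2015-03-31/functions/function/invocations.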
Deploy the custom docker image to ECR
Since we now have a local docker image, we can deploy it to ECR. For this, we need to create an ECR repository with the name multilingual-lambda.
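With the aws CLI this is one command:

```bash
aws ecr create-repository --repository-name multilingual-lambda
```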
To be able to push our images, we need to log in to ECR. We are using the aws CLI v2.x and define some environment variables to make deploying easier.
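The login and the helper variables could look like this; the region is a placeholder.

```bash
aws_region=eu-central-1   # placeholder: use your region
aws_account_id=$(aws sts get-caller-identity --query Account --output text)

aws ecr get-login-password --region $aws_region \
  | docker login --username AWS --password-stdin $aws_account_id.dkr.ecr.$aws_region.amazonaws.com
```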
Next, we need to tag / rename our previously created image to the ECR format. The format for this is {AccountID}.dkr.ecr.{region}.amazonaws.com/{repository-name}.
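For example (account id and region are placeholders):

```bash
docker tag multilingual-lambda:latest 000000000000.dkr.ecr.eu-central-1.amazonaws.com/multilingual-lambda:latest
```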
To check if it worked, we can run docker images and should see an image with our tag as the name.
Finally, we push the image to ECR Registry.
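Again with placeholder account id and region:

```bash
docker push 000000000000.dkr.ecr.eu-central-1.amazonaws.com/multilingual-lambda:latest
```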
Deploy the AWS Lambda function with the custom docker image
I provide the complete serverless.yaml for this example, but we go through all the details we need for our docker image and leave out all standard configurations. If you want to learn more about the serverless.yaml, I suggest you check out Scaling Machine Learning from ZERO to HERO, in which I went through each configuration and explained its usage.
Attention: we need at least 9GB of memory and a timeout of 300s.
To use a docker image in our serverless.yaml, we have to add the image key to our function section. Its value is the URL of our docker image.
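The relevant fragment of the serverless.yaml could look like this sketch; the function name, region, HTTP path, and image digest are placeholders.

```yaml
provider:
  name: aws
  region: eu-central-1
  memorySize: 10240   # at least 9GB is needed to load the model
  timeout: 300

functions:
  questionanswering:
    image: 000000000000.dkr.ecr.eu-central-1.amazonaws.com/multilingual-lambda@sha256:6bb600b4d6e1d7cf521097177dd0c4e9ea373edb91984a505333be8ac9455d38
    events:
      - http:
          path: qa
          method: post
```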
For an ECR image, the URL should look like this: <account>.dkr.ecr.<region>.amazonaws.com/<repository>@<digest> (e.g. 000000000000.dkr.ecr.sa-east-1.amazonaws.com/test-lambda-docker@sha256:6bb600b4d6e1d7cf521097177dd0c4e9ea373edb91984a505333be8ac9455d38). You can get the ECR URL via the AWS Console.
In order to deploy the function, we run serverless deploy.
After this process is done we should see something like this.
Test our Multilingual Serverless API
To test our Lambda function we can use Insomnia, Postman, or any other REST client. Just add a JSON with a context and a question to the body of your request. Let's try it first with a German example and then with a French one.
Be aware that the first time you invoke your function, there will be a cold start. Due to the model size, the cold start takes longer than 30s, so the first request will run into an API Gateway timeout.
German:
Our serverless_pipeline() answered our question correctly with 40,4 Prozent.
French:
Our serverless_pipeline() answered our question correctly with 40,4%.
Conclusion
The release of the AWS Lambda Container Support and the memory increase up to 10GB enable a much wider use of AWS Lambda and Serverless. They fix many existing problems and give us greater scope for the deployment of serverless applications.
We deployed a docker container containing a “State-of-the-Art” multilingual NLP model bigger than 2GB in a Serverless Environment without the need to manage any server.
It will automatically scale up to thousands of parallel requests without any worries.
The future looks more than golden for AWS Lambda and Serverless.
You can find the GitHub repository with the complete code here.
Thanks for reading. If you have any questions, feel free to contact me or comment on this article. You can also connect with me on Twitter or LinkedIn.
Originally published at https://www.philschmid.de on December 17, 2020.