Unlock the Latest Transformer Models with Amazon SageMaker

A quick tutorial on extending and customising AWS’ Deep Learning Containers

Heiko Hotz
Towards Data Science


Image by author using Midjourney

What is this about?

AWS Deep Learning Containers (DLCs) have become a popular choice for training and deploying Natural Language Processing (NLP) models on Amazon SageMaker (SM), thanks to their convenience and ease of use. However, sometimes the latest versions of the Transformers library are not available in the prebuilt DLCs. In this blog post, we will extend these DLCs to train & deploy the latest Hugging Face models on AWS. Whether you are new to DLCs or an experienced user, this post will provide valuable insights and techniques for running Hugging Face models on AWS.

The code for this tutorial is available in this GitHub repo.

Why is this important?

Amazon SageMaker is a popular platform for running AI models, but running the latest Transformer models (e.g. Whisper, BLOOM) requires an up-to-date version of the transformers library. The latest available DLCs on AWS, however, only support version 4.17, while, at the time of writing, the latest version is 4.25.1. When trying to use the latest DLC for a model that is not supported, users will encounter an error message; see this thread on the Hugging Face (HF) discussion forum for an example.

A workaround for this problem involves injecting a requirements.txt file into the model, but this can be a slow and tedious process. It requires downloading the model, adding the requirements.txt file, tarballing the model, and uploading it to S3, which can take several hours for large models. In this blog post we will exploit the fact that AWS’ pre-built DLCs can be extended, which spares users this workaround and saves them many hours every time they want to train or deploy a new model.

How to deploy Transformer models on SageMaker

Deploying Transformer models on SM is usually a breeze, especially if you want to use a pre-trained model as is, without any further training. In this case we can deploy the model directly from the HF Model Hub to SM:
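
Below is a minimal sketch using the SageMaker Python SDK; the model ID, task, instance type, and version pins are example values rather than fixed requirements.

```python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Model configuration for pulling a model straight from the HF Model Hub
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
    "HF_TASK": "text-classification",
}

# The transformers/pytorch version pins determine which pre-built DLC is used
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)

# Deploy the model to a real-time endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```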

Here is what happens in the background when running this code:

Image by author

Once the deploy() command is executed, SM spins up an EC2 instance and fetches the specified DLC image, which is determined by the version numbers of the transformers and pytorch libraries. We can find a list of available DLCs here; these images are hosted in a specific AWS account and are publicly available, see this example.
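
To make this mapping concrete, here is a hedged sketch of how such an image URI can be looked up with the SageMaker SDK; the region, versions, and instance type are example values.

```python
from sagemaker import image_uris

# Resolve the DLC image for a given transformers/pytorch version combination
dlc_uri = image_uris.retrieve(
    framework="huggingface",
    region="us-east-1",
    version="4.17.0",                        # transformers version
    base_framework_version="pytorch1.10.2",  # underlying framework version
    py_version="py38",
    image_scope="inference",
    instance_type="ml.g4dn.xlarge",
)
print(dlc_uri)  # points to an image hosted in AWS' public DLC account on ECR
```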

As we can see in this list, the latest available transformers version in the DLCs is 4.17, but many models require a higher version than this.

The problem with the latest Transformer models

We can see this when trying to run a model that requires a version higher than 4.17 with the latest DLC, for example a BLOOM checkpoint.
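
As an illustration, here is a sketch of such a deployment attempt; the model ID and instance type are example values.

```python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# BLOOM support was only added to transformers after version 4.17
hub = {
    "HF_MODEL_ID": "bigscience/bloom-560m",  # example checkpoint
    "HF_TASK": "text-generation",
}

# Deploy with the latest available pre-built DLC (transformers 4.17)
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
```

The deployment itself goes through without complaints, but as soon as we send an inference request to the endpoint we get the error message below: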

Image by author

And in the CloudWatch logs we see more detail:

Image by author

This means that deploying the model we were trying to use (in this case BLOOM) via the latest DLC is simply not possible.

The workaround — injecting a requirements.txt file

As mentioned, there is a workaround: the HF SM Inference Toolkit allows for custom inference code as well as for specifying additional required libraries via a requirements.txt file. We can use this mechanism by simply pinning the latest transformers version in the requirements.txt file like so:
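
The file only needs a single pinned line (the exact version below is an example; use whatever version your model needs).

```text
transformers==4.25.1
```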

To inject this requirements.txt file into the model, however, some additional steps are required:

Image by author

In this case we first need to manually download the model from the HF Model Hub. We then add the requirements.txt file to the model directory and tarball everything. Next, the model archive needs to be uploaded to an S3 bucket, and only then can we deploy the model, pointing the endpoint to its S3 location. When the EC2 instance for the endpoint spins up, the requirements.txt file is read and the latest transformers version is installed.
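
For completeness, here is a rough sketch of these steps in Python; the model ID, bucket, and the code/ folder layout are assumptions you may need to adapt to your setup.

```python
import os
import tarfile

from huggingface_hub import snapshot_download  # assumes a recent huggingface_hub
from sagemaker.s3 import S3Uploader

# 1. Download the model from the HF Model Hub (model ID is an example)
local_dir = snapshot_download(repo_id="bigscience/bloom-560m", local_dir="bloom-560m")

# 2. Add requirements.txt; the Inference Toolkit looks for it in a code/ folder
os.makedirs(os.path.join(local_dir, "code"), exist_ok=True)
with open(os.path.join(local_dir, "code", "requirements.txt"), "w") as f:
    f.write("transformers==4.25.1\n")

# 3. Tarball the model directory
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add(local_dir, arcname=".")

# 4. Upload the archive to S3 (bucket name is a placeholder)
model_data = S3Uploader.upload("model.tar.gz", "s3://my-sagemaker-bucket/bloom-560m")

# 5. Deploy a HuggingFaceModel with model_data=model_data instead of the hub config
```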

For a relatively small model this is cumbersome but doesn’t take too long. For the full BLOOM model, however, the whole procedure can take up to 12 hours (believe me, I tried 😕).

The solution — extending the pre-built DLCs

Instead we want to keep deploying directly from the HF Model Hub, especially if neither custom inference code nor model fine-tuning is required. In this case we can just write a Dockerfile that first pulls the latest DLC from the public AWS ECR and then adds our own requirements, in this case just a “pip install” command to update to the latest transformers version:
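
The Dockerfile can be as small as two lines. The registry account, region, and image tag below are examples; look up the exact image URI for your region in the DLC list mentioned above.

```dockerfile
# Start from the latest pre-built HF inference DLC (account/region/tag are examples)
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

# Upgrade transformers to the version required by the newer models
RUN pip install --upgrade "transformers==4.25.1"
```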

We can then follow the official AWS tutorial on extending DLCs; we just need to make sure we adapt the naming and that the role we run this script with has the privileges to read from and write to the ECR service:
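
A build-and-push script along the lines of that tutorial might look like the sketch below; the repository name and region are placeholders, and the first login is only needed to pull the base image from AWS’ public DLC account.

```bash
#!/usr/bin/env bash
set -e

REGION=us-east-1                                # placeholder
REPO=huggingface-pytorch-inference-extended     # placeholder
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Log in to AWS' public DLC registry (to pull the base image) and to our own ECR (to push)
aws ecr get-login-password --region $REGION | \
    docker login --username AWS --password-stdin 763104351884.dkr.ecr.$REGION.amazonaws.com
aws ecr get-login-password --region $REGION | \
    docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com

# Create the target repository if it doesn't exist yet
aws ecr describe-repositories --repository-names $REPO --region $REGION >/dev/null 2>&1 || \
    aws ecr create-repository --repository-name $REPO --region $REGION

# Build, tag, and push the extended image
docker build -t $REPO .
docker tag $REPO:latest $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest
docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest
```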

Once this script has finished after a few minutes, we should see the new DLC in our ECR repository.

Testing the new DLC

Now that the new DLC is ready, we can test the deployment from before again, but this time with our extended DLC:
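
The only change compared to the deployment from before is that we pass our own image_uri instead of the transformers/pytorch version pins; the account ID, region, and repository name below are placeholders.

```python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

hub = {
    "HF_MODEL_ID": "bigscience/bloom-560m",  # example model requiring transformers > 4.17
    "HF_TASK": "text-generation",
}

# Point to the extended DLC in our own ECR instead of specifying framework versions
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/huggingface-pytorch-inference-extended:latest",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
```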

Now when we run an inference request we get a proper response:
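
For example, a simple text-generation request against the new endpoint could look like this (the prompt is arbitrary):

```python
predictor.predict({"inputs": "The best thing about extending Deep Learning Containers is"})
```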

Image by author

And this is what now happens in the background:

Image by author

We had to make a one-time effort to extend the DLC, but now, whenever we want to deploy models from the HF Model Hub that require an up-to-date version of the transformers library, we can just reuse our DLC instead of the official one from AWS 🤗

Conclusion

In this tutorial we extended the official HF DLCs from AWS to update the transformers library to a later version, which is required by many new Transformer models. By doing so we have created a reusable DLC of our own that enables us to deploy directly from the HF Model Hub, saving us hours of tedious workarounds.

Note that this blog post has mostly focused on extending the inference DLC; the same idea, however, can be applied to the training DLCs as well.
