Serverless Model Hosting with Docker, AWS Lambda and API Gateway

Make the most of Docker support in Lambda to host your models without the need for dedicated servers.

Jonathan Readshaw
Towards Data Science

Previously, AWS Lambda deployment packages were limited to a maximum unzipped size of 250MB, including requirements. This proved to be an obstacle when attempting to host Machine Learning models using the service, as common ML libraries and complex models led to deployment packages far larger than the 250MB limit.

However, in December 2020 AWS announced support for packaging and deploying Lambda functions as Docker images. Critically, in the context of Machine Learning, these images can be up to 10GB in size. This means that large dependencies (e.g. TensorFlow) and medium-sized models can be included in the image, and model predictions can therefore be served using Lambda.

In this article we’ll work through an example build and deployment for model hosting on Lambda. All of the relevant code used can be found here.

Architecture Overview

The solution can be broken down into three components:

  1. Docker Image: The Docker image contains our dependencies, trained model pipeline and function code. AWS provides base images for the various runtimes that can be built upon to ensure compatibility with the service.
  2. Lambda Function: Serverless resource that runs the function code in the Docker image based on incoming events/requests.
  3. API Gateway Endpoint: Used as a trigger for the Lambda function and the entry point for client requests. When prediction requests are received at the endpoint, the Lambda function is triggered with the request body included in the event sent to the function. The value(s) returned by the function are then returned to the client as a response.

Model

In this example our model will be a simple KNN implementation trained on the Iris classification dataset. Training won’t be covered in this post; the outcome is a scikit-learn Pipeline object consisting of the following steps:

1. StandardScaler: Standardises inputs based on the mean and standard deviation of the training samples.

2. KNeighborsClassifier: The actual pretrained model. Trained with K = 5.

The pipeline is saved to ‘model_pipeline.joblib’ using joblib, as recommended by scikit-learn for model persistence.
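
Training itself isn’t the focus of this post, but for context a minimal sketch of how such a pipeline might be produced and saved is shown below. The file name matches the one used in the article; everything else (including training on the full dataset) is illustrative, and the linked repo contains the authoritative training code.

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    import joblib

    # Load the Iris dataset (4 numeric features, 3 classes)
    X, y = load_iris(return_X_y=True)

    # Chain the scaler and the KNN classifier (K = 5) into a single pipeline
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ])
    pipeline.fit(X, y)

    # Persist the fitted pipeline so it can be baked into the Docker image
    joblib.dump(pipeline, "model_pipeline.joblib")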

Function Code

Let’s start by considering the function code that will be used by Lambda to handle prediction request events (predict/app.py).

The lambda_handler function has the two arguments (event and context) required of any Lambda handler. The pipeline object is loaded outside of the handler to avoid loading it on every invocation: Lambda keeps a container alive for a period after it handles an event, so loading the model once when the container starts means it can be reused for as long as the container stays warm.

The handler function itself is pretty simple: the required inputs are extracted from the event body and used to generate a prediction, which is then returned as part of the JSON response. API Gateway passes the function’s response back to the client. A few checks ensure the inputs are as expected, and any prediction errors are caught and logged.
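
The full handler is in the repo; a minimal sketch of what predict/app.py might look like is shown below. The input field names (sepal_length and so on) and the exact response shape are assumptions for illustration, while the model file name matches the one saved earlier.

    import json
    import logging

    import joblib

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # Load the pipeline once at import time so warm invocations can reuse it
    model = joblib.load("model_pipeline.joblib")

    # Illustrative input field names; the repo's code defines the real ones
    FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


    def lambda_handler(event, context):
        try:
            body = json.loads(event["body"])

            # Basic input validation: all four features must be present
            missing = [f for f in FEATURES if f not in body]
            if missing:
                return {
                    "statusCode": 400,
                    "body": json.dumps({"error": f"Missing fields: {missing}"}),
                }

            features = [[float(body[f]) for f in FEATURES]]
            prediction = model.predict(features)[0]
            return {
                "statusCode": 200,
                "body": json.dumps({"prediction": int(prediction)}),
            }
        except Exception:
            logger.exception("Prediction failed")
            return {
                "statusCode": 500,
                "body": json.dumps({"error": "Prediction failed"}),
            }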

Docker Image

The Dockerfile is structured as follows:

1. Pull AWS’ base Python 3.6 image

2. Copy required files from local directory to the root of the image

3. Install requirements

4. Run handler function
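
A Dockerfile following these steps might look roughly like this. The base image tag, file names and layout are assumptions based on AWS’ documented pattern for Lambda container images; the repo contains the authoritative version.

    # 1. AWS-provided base image for the Python 3.6 Lambda runtime
    FROM public.ecr.aws/lambda/python:3.6

    # 2. Copy function code, requirements and the trained pipeline into the task root
    COPY predict/app.py requirements.txt model_pipeline.joblib ${LAMBDA_TASK_ROOT}/

    # 3. Install dependencies into the image
    RUN pip install -r requirements.txt

    # 4. Tell the Lambda runtime which handler to invoke
    CMD ["app.lambda_handler"]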

Deployment

To manage deployment and AWS resources we will use the AWS Serverless Application Model (SAM) CLI. Instructions for installing SAM and its dependencies can be found here.

To use SAM for build and deployment, a template file must be configured. This specifies the required AWS resources and their associated configuration.

For this project the SAM template contains the following:

  • Miscellaneous info such as the stack name, plus global configuration such as the Lambda timeout.
  • MLPredictionFunction Resource: This is the Lambda function we want to deploy, and this section contains the bulk of the required config:
    • Properties: here we specify that the function will be defined using a Docker image (PackageType: Image) and that it will be triggered via API Gateway (an event of Type: Api). The API path and request method are also defined here.
    • Metadata contains the tag that will be used for built images and the location/name of the Dockerfile used for building the image.
  • Outputs lists the values SAM will report back for the resources it creates: in this case the API Gateway endpoint, the Lambda function and its associated IAM role.
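
Putting those pieces together, a sketch of such a template.yaml might look like the following. MLPredictionFunction is the name used in the article; the path, method, timeout value, tag and output names are illustrative assumptions.

    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: Serverless model hosting with Docker, Lambda and API Gateway

    Globals:
      Function:
        Timeout: 60

    Resources:
      MLPredictionFunction:
        Type: AWS::Serverless::Function
        Properties:
          PackageType: Image
          Events:
            Predict:
              Type: Api
              Properties:
                Path: /predict
                Method: post
        Metadata:
          Dockerfile: Dockerfile
          DockerContext: .
          DockerTag: python3.6-v1

    Outputs:
      PredictApi:
        Description: API Gateway endpoint URL for the prediction function
        Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/predict/"
      MLPredictionFunctionArn:
        Description: Prediction Lambda function ARN
        Value: !GetAtt MLPredictionFunction.Arn
      MLPredictionFunctionIamRole:
        Description: Implicit IAM role created for the prediction function
        Value: !GetAtt MLPredictionFunctionRole.Arn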

Running the command below builds the application image locally using the defined SAM template:

!sam build

If successful the function can be invoked locally with a sample event using the following command (see repo for example event):

!sam local invoke -e events/event.json

Once the function has been tested locally the image needs to be pushed to AWS ECR. First create a new repository:

!aws ecr create-repository --repository-name ml-deploy-sam

You will need to log in to ECR’s managed Docker service before the image can be pushed:

!aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account id>.dkr.ecr.<region>.amazonaws.com

Now you can deploy your application using:

!sam deploy -g

This will run the deployment in “guided” mode, where you will need to confirm the name of the application, AWS region and the image repository created earlier. Accepting the default options for the remaining settings should be fine in most instances.

The deployment process will then begin and AWS resources will be provisioned. Once complete, each resource will be displayed in the console.
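
As a quick end-to-end check, you can send a prediction request to the new endpoint from any HTTP client. A sketch using Python’s requests library is shown below; the URL is a placeholder for the endpoint printed by SAM, and the field names assume the handler sketch shown earlier.

    import requests

    # Replace with the API Gateway endpoint reported by `sam deploy`
    url = "https://<api id>.execute-api.<region>.amazonaws.com/Prod/predict/"

    # Illustrative Iris measurements matching the handler's expected fields
    payload = {
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2,
    }

    response = requests.post(url, json=payload)
    print(response.status_code, response.json())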

Considerations

  • Image updates: to deploy an updated model or function code, you can simply rebuild the image locally and re-run the deploy command. SAM will detect which aspects of the application have changed and update the relevant resources accordingly.
  • Cold Start: Each time Lambda spins up a new container with our function code, the model has to be loaded before processing can start. This leads to a cold-start scenario where the first request is significantly slower than those that follow. One way to combat this is to periodically trigger the function using a scheduled CloudWatch event so that a warm container with the model already loaded is always available (see the sketch after this list).
  • Multiple Functions: It is possible to deploy multiple Lambda functions to be served by a single API. This could be useful if you have multiple models to serve, or if you want to have a separate pre-processing/validation endpoint. To configure this you can simply include the additional functions in the SAM template resources.
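
For the cold-start point above, a scheduled warm-up trigger can be declared alongside the API event in the SAM template. The snippet below is a sketch; the event name, rate and payload are arbitrary.

    # In template.yaml, under MLPredictionFunction -> Properties, alongside the existing Api event:
    Events:
      WarmUp:
        Type: Schedule
        Properties:
          Schedule: rate(5 minutes)
          # Optional payload so the handler can detect and short-circuit warm-up invocations
          Input: '{"warmup": true}'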
