
Building a serverless, containerized machine learning model API using AWS Lambda & API Gateway and…

Dockerized Lambdas

The goal of this post is to set up serverless infrastructure, managed in code, to serve predictions of a containerized machine learning model via a REST API as simply as:

$ curl \
  -X POST \
  --header "Content-Type: application/json" \
  --data '{"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}' \
  https://my-api.execute-api.eu-central-1.amazonaws.com/predict/
{"prediction": {"label": "virginica", "probability": 0.9997}}

We will use Terraform to manage our infrastructure, including AWS ECR, S3, Lambda, and API Gateway. AWS Lambda will run the model code, in fact within a container, which is a very recent feature. AWS API Gateway will serve the model via a REST API, and the model artifact itself will live in S3. You can find the full code here.

Architecture diagram – image done by author using draw.io

Prerequisites

We use Terraform v0.14.0 and aws-cli/1.18.206 (Python/3.7.9 Darwin/19.6.0 botocore/1.19.46).

We need to authenticate to AWS to:

  • set up the infrastructure using Terraform,
  • train the model and store the resulting model artifact in S3, and
  • test the infrastructure using the AWS CLI.

The AWS credentials can be set up within a credentials file, i.e. ~/.aws/credentials, using a profile named lambda-model:

[lambda-model]
aws_access_key_id=...
aws_secret_access_key=...
region=eu-central-1

This allows us to tell both Terraform and the AWS CLI which credentials to use. The Lambda function itself will authenticate using a role, so no explicit credentials are needed there.

Moreover, we need to define the region, bucket name, and some other variables; these are also defined within the Terraform variables, as we will see later:

export AWS_REGION=$(aws --profile lambda-model configure get region)
export BUCKET_NAME="my-lambda-model-bucket"
export LAMBDA_FUNCTION_NAME="my-lambda-model-function"
export API_NAME="my-lambda-model-api"
export IMAGE_NAME="my-lambda-model"
export IMAGE_TAG="latest"

Creating a containerized model

Let us build a very simple containerized model on the iris dataset. We will define:

  • model.py: the actual model code
  • utils.py: utility functions
  • train.py: a script to trigger model training
  • predict.py: a script to generate predictions (for testing purposes)
  • app.py: the Lambda handler

To store the model artifact and to load the training data, we will define a few helper functions in utils.py that communicate with S3 and load files from public endpoints.
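A minimal sketch of what these helpers could look like follows; the function names, the pickle-based serialization, and the public iris CSV endpoint are assumptions rather than a copy of the repo code:

import os
import pickle

import boto3
import pandas as pd

BUCKET_NAME = os.environ["BUCKET_NAME"]
# Hypothetical public endpoint for the iris training data.
DATA_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"


def load_data() -> pd.DataFrame:
    """Load the iris training data from a public endpoint."""
    print("Loading data.")
    return pd.read_csv(DATA_URL)


def save_model_to_s3(model, key: str = "model.pkl") -> None:
    """Serialize the model and upload the artifact to the S3 bucket."""
    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET_NAME, Key=key, Body=pickle.dumps(model))


def load_model_from_s3(key: str = "model.pkl"):
    """Download and deserialize the model artifact from the S3 bucket."""
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=BUCKET_NAME, Key=key)
    return pickle.loads(response["Body"].read())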

Furthermore we need a wrapper class for our model:

  • to train it using external data
  • to keep state and save and load the model artifact
  • to pass payloads for inference

This will be defined in model.py.
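A sketch of what the wrapper could look like, assuming a scikit-learn logistic regression and the helper functions sketched above (the actual estimator and internals in the repo may differ):

from sklearn.linear_model import LogisticRegression

from utils import load_data, load_model_from_s3, save_model_to_s3

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


class ModelWrapper:
    def __init__(self):
        self.model = None

    def train(self):
        """Train the model on external data and persist the artifact to S3."""
        df = load_data()
        print("Creating model.")
        self.model = LogisticRegression(max_iter=1000)
        print(f"Fitting model with {len(df)} datapoints.")
        self.model.fit(df[FEATURES].values, df["species"].values)
        print("Saving model.")
        save_model_to_s3(self.model)

    def load_model(self):
        """Load the model artifact from S3 into memory."""
        print("Loading model.")
        self.model = load_model_from_s3()

    def predict(self, data: dict):
        """Return the predicted label and its probability for a single payload."""
        row = [[data[feature] for feature in FEATURES]]
        probabilities = self.model.predict_proba(row)[0]
        best = probabilities.argmax()
        return str(self.model.classes_[best]), float(probabilities[best])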

To train and predict without the actual Lambda infrastructure, we will also set up two scripts, a train.py and a predict.py. The training script can be very simple; we could also pass other data sources to the train method.

from model import ModelWrapper
model_wrapper = ModelWrapper() 
model_wrapper.train()

And a simple predict.py that prints predictions to the console:

import json
import sys
from model import ModelWrapper
model_wrapper = ModelWrapper()
model_wrapper.load_model()
data = json.loads(sys.argv[1])
print(f"Data: {data}")
prediction = model_wrapper.predict(data=data)
print(f"Prediction: {prediction}")

Lastly, we need the handler to pass data to the model wrapper; this is what the Lambda function will call. We will keep it very minimalistic: the handler simply passes the request to the wrapper and transforms the returned prediction into the output format expected by API Gateway.
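A sketch of such a handler in app.py, matching the response format shown further below (the exact body parsing is an assumption):

import json

from model import ModelWrapper

# Load the artifact once per container start, outside the handler.
model_wrapper = ModelWrapper()
model_wrapper.load_model()


def handler(event, context):
    """Pass the request body to the model and return an API Gateway compatible response."""
    body = event["body"]
    # API Gateway delivers the body as a JSON string; a direct Lambda invoke may pass a dict.
    data = json.loads(body) if isinstance(body, str) else body
    label, probability = model_wrapper.predict(data=data)
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": {"label": label, "probability": round(probability, 4)}}),
        "isBase64Encoded": False,
    }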

We will put all this into Docker (more specifically a Dockerfile) and make use of one of the AWS Lambda base images:

FROM public.ecr.aws/lambda/python:3.8
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py utils.py model.py train.py predict.py ./
CMD ["app.handler"]
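The requirements.txt itself is not shown here; for the sketches above it would need to contain at least the following packages (an assumption, version pins omitted):

boto3
pandas
scikit-learn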

Creating ECR and S3 resources

Let us now define the ECR repository and the S3 bucket via Terraform. The properly organized Terraform code can be found in the GitHub repo.

We define some config (variables and locals) and AWS as the provider; alternatively, the variables could also be loaded from the environment. Moreover, we define the S3 bucket and the ECR repository.

Let us create our S3 bucket and ECR repository:

(cd terraform && \
  terraform apply \
  -target=aws_ecr_repository.lambda_model_repository \
  -target=aws_s3_bucket.lambda_model_bucket)

Building and pushing the docker image

We can now build our docker image and push it to the repository (alternatively, this could be done in a null_resource provisioner in Terraform). We export the registry ID to construct the image URI to which we want to push the image:

export REGISTRY_ID=$(aws ecr \
  --profile lambda-model \
  describe-repositories \
  --query 'repositories[?repositoryName == `'$IMAGE_NAME'`].registryId' \
  --output text)
export IMAGE_URI=${REGISTRY_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}
# ecr login
$(aws --profile lambda-model \
  ecr get-login \
  --region $AWS_REGION \
  --registry-ids $REGISTRY_ID \
  --no-include-email)

Now building and pushing is as easy as:

(cd app && \
  docker build -t $IMAGE_URI:$IMAGE_TAG . && \
  docker push $IMAGE_URI:$IMAGE_TAG)

Training the model

Let us train our model now using the train.py entry point of our newly created docker container:

docker run \
  -v ~/.aws:/root/.aws \
  -e AWS_PROFILE=lambda-model \
  -e BUCKET_NAME=$BUCKET_NAME \
  --entrypoint=python \
  $IMAGE_URI:$IMAGE_TAG \
  train.py
# Loading data.
# Creating model.
# Fitting model with 150 datapoints.
# Saving model.

Testing the model

Using the predict.py entry point we can also test it with some data:

docker run \
  -v ~/.aws:/root/.aws \
  -e AWS_PROFILE=lambda-model \
  -e BUCKET_NAME=$BUCKET_NAME \
  --entrypoint=python \
  $IMAGE_URI:$IMAGE_TAG \
  predict.py \
  '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
# Loading model.
# Data: {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2}
# Prediction: ('setosa', 0.9999555689374946)

Planning our main infrastructure with Terraform

We can now plan the inference part of the infrastructure, namely the Lambda and API Gateway setup:

  • the Lambda function, including a role and policy to access S3 and produce logs,
  • the API Gateway, including the necessary permissions and settings.

We can now apply this using the Terraform CLI again, which will take around a minute.

(cd terraform && terraform apply)

Testing the infrastructure

To test the Lambda function, we can invoke it using the AWS CLI and store the response in response.json:

aws --profile lambda-model \
  lambda \
  invoke \
  --function-name $LAMBDA_FUNCTION_NAME \
  --payload '{"body": {"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}}' \
  response.json
# {
#     "StatusCode": 200,
#     "ExecutedVersion": "$LATEST"
# }

The response.json will look like this:

{
    "statusCode": 200,
    "body": "{"prediction": {"label": "virginica", "probability": 0.9997}}",
    "isBase64Encoded": false
}

We can also test our API using curl or Python. We need to find out our endpoint URL first, for example by using the AWS CLI again or, alternatively, the Terraform outputs.

export ENDPOINT_ID=$(aws \
  --profile lambda-model \
  apigateway \
  get-rest-apis \
  --query 'items[?name == `'$API_NAME'`].id' \
  --output text)
export ENDPOINT_URL=https://${ENDPOINT_ID}.execute-api.${AWS_REGION}.amazonaws.com/predict
curl \
  -X POST \
  --header "Content-Type: application/json" \
  --data '{"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}' \
  $ENDPOINT_URL
# {"prediction": {"label": "virginica", "probability": 0.9997}}

Alternatively, we can send POST requests with Python:

import os

import requests

# Send a single prediction request to the deployed endpoint.
endpoint_url = os.environ["ENDPOINT_URL"]
data = {"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}
response = requests.post(endpoint_url, json=data)
print(response.json())

More remarks

To update the container image we can use the CLI again:

aws --profile lambda-model \
  lambda \
  update-function-code \
  --function-name $LAMBDA_FUNCTION_NAME \
  --image-uri $IMAGE_URI:$IMAGE_TAG

If we want to remove our infrastructure we have to empty our bucket first, after which we can destroy our resources:

aws s3 --profile lambda-model rm s3://${BUCKET_NAME}/model.pkl
(cd terraform && terraform destroy)

Conclusion

With the new feature of containerized Lambdas, it has become even easier to deploy machine learning models into the AWS serverless landscape. There are several AWS alternatives (ECS, Fargate, SageMaker), but Lambda comes with many tools out of the box, for example request-based logging and monitoring, and it allows fast prototyping with ease. Nevertheless, it also has some downsides, for example the request latency overhead and the use of a somewhat proprietary cloud service that is not fully customizable.

Another advantage is that containerization allows us to isolate the machine learning code and properly maintain package dependencies. If we keep the handler code to a minimum we can test the image carefully and make sure our development environment is very close to the production infrastructure. At the same time we are not locked into AWS technology – we can very easily replace the handler with our own web framework and deploy it to Kubernetes.

Lastly, we can potentially improve our model infrastructure by running the training remotely (for example using ECS), adding versioning and CloudWatch alerts. If needed, we could add a process to keep the Lambda warm, since a cold start takes a few seconds. We should also add authentication to the endpoint.


Originally published at https://blog.telsemeyer.com.

