The goal of this post is to set up a serverless infrastructure, managed in code, to serve predictions of a containerized machine learning model via REST API as simply as:
$ curl \
-X POST \
--header "Content-Type: application/json" \
--data '{"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}' \
https://my-api.execute-api.eu-central-1.amazonaws.com/predict/
{"prediction": {"label": "virginica", "probability": 0.9997}}
We will use Terraform to manage our infrastructure, including AWS ECR, S3, Lambda and API Gateway. We will use AWS Lambda to run the model code, in fact within a container, which is a very recent feature at the time of writing. AWS API Gateway will serve the model via REST API, and the model artifact itself will live in S3. You can find the full code here.

Prerequisites
We use Terraform v0.14.0 and aws-cli/1.18.206 Python/3.7.9 Darwin/19.6.0 botocore/1.19.46.
We need to authenticate to AWS to:
- set up the infrastructure using Terraform,
- train the model and store the resulting model artifact in S3,
- test the infrastructure using the AWS CLI.
The AWS credentials can be set up within a credentials file, i.e. ~/.aws/credentials, using a profile named lambda-model:
[lambda-model]
aws_access_key_id=...
aws_secret_access_key=...
region=eu-central-1
This allows us to tell both Terraform and the AWS CLI which credentials to use. The Lambda function itself will authenticate using a role, so no explicit credentials are needed there.
Moreover, we need to define the region, the bucket name and some other variables; these are also defined within the Terraform variables, as we will see later:
export AWS_REGION=$(aws --profile lambda-model configure get region)
export BUCKET_NAME="my-lambda-model-bucket"
export LAMBDA_FUNCTION_NAME="my-lambda-model-function"
export API_NAME="my-lambda-model-api"
export IMAGE_NAME="my-lambda-model"
export IMAGE_TAG="latest"
Creating a containerized model
Let us build a very simple containerized model on the iris dataset. We will define:
- model.py: the actual model code
- utils.py: utility functions
- train.py: a script to trigger model training
- predict.py: a script to generate predictions (for testing purposes)
- app.py: the Lambda handler
To store the model artifact and load data for model training, we define a few helper functions within utils.py that communicate with S3 and load files with training data from public endpoints.
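The exact helpers live in the GitHub repo; a minimal sketch could look like the following, where the public iris CSV URL is an assumption of this sketch (the model.pkl key matches the object we remove during teardown later in the post):
import os
import pickle

import boto3
import pandas as pd

BUCKET_NAME = os.environ["BUCKET_NAME"]
MODEL_KEY = "model.pkl"
# public endpoint with the iris training data (assumed URL)
DATA_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"

def load_data():
    # load the training data from the public endpoint
    return pd.read_csv(DATA_URL)

def save_model(model):
    # serialize the model and upload the artifact to S3
    boto3.client("s3").put_object(Bucket=BUCKET_NAME, Key=MODEL_KEY, Body=pickle.dumps(model))

def load_model():
    # download and deserialize the model artifact from S3
    response = boto3.client("s3").get_object(Bucket=BUCKET_NAME, Key=MODEL_KEY)
    return pickle.loads(response["Body"].read())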
Furthermore we need a wrapper class for our model:
- to train it using external data
- to keep state and save and load the model artifact
- to pass payloads for inference
This will be defined in model.py.
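Again, the full implementation is in the repo; a rough sketch of the wrapper, here assuming a scikit-learn logistic regression as the estimator and the species column of the assumed data source above (the print statements mirror the log output shown further below):
import pandas as pd
from sklearn.linear_model import LogisticRegression

import utils

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

class ModelWrapper:
    def __init__(self):
        self.model = None

    def train(self):
        print("Loading data.")
        data = utils.load_data()
        print("Creating model.")
        self.model = LogisticRegression(max_iter=1000)
        print(f"Fitting model with {len(data)} datapoints.")
        self.model.fit(data[FEATURES], data["species"])
        print("Saving model.")
        utils.save_model(self.model)

    def load_model(self):
        print("Loading model.")
        self.model = utils.load_model()

    def predict(self, data):
        # turn a single payload dict into a one-row frame and
        # return the most probable label together with its probability
        probabilities = self.model.predict_proba(pd.DataFrame([data])[FEATURES])[0]
        best = probabilities.argmax()
        return self.model.classes_[best], probabilities[best]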
To train and predict without the actual Lambda infrastructure, we will also set up two scripts, train.py and predict.py. The training script can be very simple; we could also pass other data sources to the train method:
from model import ModelWrapper
model_wrapper = ModelWrapper()
model_wrapper.train()
And a simple predict.py
that prints predictions to the console:
import json
import sys
from model import ModelWrapper
model_wrapper = ModelWrapper()
model_wrapper.load_model()
data = json.loads(sys.argv[1])
print(f"Data: {data}")
prediction = model_wrapper.predict(data=data)
print(f"Prediction: {prediction}")
Lastly, we need the handler to pass data to the model wrapper. This is what will be called by the Lambda function. We will keep it very minimalistic: the handler simply passes the request to the wrapper and transforms the returned prediction into the output format expected by API Gateway.
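A sketch of such a handler (the body parsing is an assumption of this sketch: with the API Gateway proxy integration the body arrives as a JSON string, while the direct invocation shown later passes a dict):
import json

from model import ModelWrapper

# load the model artifact once per container start
model_wrapper = ModelWrapper()
model_wrapper.load_model()

def handler(event, context):
    body = event["body"]
    # API Gateway sends the body as a JSON string, a direct invoke may pass a dict
    if isinstance(body, str):
        body = json.loads(body)
    label, probability = model_wrapper.predict(data=body)
    # response format expected by API Gateway
    return {
        "statusCode": 200,
        "isBase64Encoded": False,
        "body": json.dumps(
            {"prediction": {"label": str(label), "probability": round(float(probability), 4)}}
        ),
    }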
We will put all this into Docker (more specifically a Dockerfile) and make use of one of the AWS Lambda base images:
FROM public.ecr.aws/lambda/python:3.8
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py utils.py model.py train.py predict.py ./
CMD ["app.handler"]
Creating ECR and S3 resources
Let us now define the ECR repository and the S3 bucket via Terraform. The properly organized Terraform code can be found within the GitHub repo.
We define some config (variables and locals) and AWS as the provider; alternatively, the variables could also be loaded from the environment. Moreover, we define the S3 bucket and the ECR repository.
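The exact Terraform files are organized in the repo; a condensed sketch of these resources could look roughly like this (the variable names and defaults are assumptions, while the resource names match the -target flags used below):
provider "aws" {
  profile = "lambda-model"
  region  = var.region
}

variable "region" {
  default = "eu-central-1"
}

variable "bucket_name" {
  default = "my-lambda-model-bucket"
}

variable "image_name" {
  default = "my-lambda-model"
}

resource "aws_s3_bucket" "lambda_model_bucket" {
  bucket = var.bucket_name
}

resource "aws_ecr_repository" "lambda_model_repository" {
  name = var.image_name
}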
Let us create our S3 bucket and ECR repository:
(cd terraform &&
terraform apply \
-target=aws_ecr_repository.lambda_model_repository \
-target=aws_s3_bucket.lambda_model_bucket)
Building and pushing the Docker image
We can now build our Docker image and push it to the repository (alternatively, this could be done via a null_resource provisioner in Terraform). We export the registry ID to construct the image URI to which we want to push the image:
export REGISTRY_ID=$(aws ecr \
--profile lambda-model \
describe-repositories \
--query 'repositories[?repositoryName == `'$IMAGE_NAME'`].registryId' \
--output text)
export IMAGE_URI=${REGISTRY_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}
# ecr login
$(aws --profile lambda-model \
ecr get-login \
--region $AWS_REGION \
--registry-ids $REGISTRY_ID \
--no-include-email)
Now building and pushing is as easy as:
(cd app &&
docker build -t $IMAGE_URI . &&
docker push $IMAGE_URI:$IMAGE_TAG)
Training the model
Let us train our model now using the train.py
entry point of our newly created docker container:
docker run \
-v ~/.aws:/root/.aws \
-e AWS_PROFILE=lambda-model \
-e BUCKET_NAME=$BUCKET_NAME \
--entrypoint=python \
$IMAGE_URI:$IMAGE_TAG \
train.py
# Loading data.
# Creating model.
# Fitting model with 150 datapoints.
# Saving model.
Testing the model
Using the predict.py
entry point we can also test it with some data:
docker run \
-v ~/.aws:/root/.aws \
-e AWS_PROFILE=lambda-model \
-e BUCKET_NAME=$BUCKET_NAME \
--entrypoint=python \
$IMAGE_URI:$IMAGE_TAG \
predict.py \
'{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
# Loading model.
# Data: {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2}
# Prediction: ('setosa', 0.9999555689374946)
Planning our main infrastructure with Terraform
We can now plan the inference part of the infrastructure, i.e. the Lambda and API Gateway setup:
- the Lambda function, including a role and policy to access S3 and produce logs,
- the API Gateway, including the necessary permissions and settings (a sketch of both follows below).
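The full definitions, including the IAM policy that grants S3 access and CloudWatch logging, live in the repo; a heavily condensed sketch of the central resources, with assumed resource names, assumed variables and an assumed stage name of predict, might look like this:
resource "aws_iam_role" "lambda_model_role" {
  name = "my-lambda-model-role"
  # allow the Lambda service to assume this role; the S3 and
  # logging policy attachments are omitted here for brevity
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_lambda_function" "lambda_model_function" {
  function_name = var.lambda_function_name
  role          = aws_iam_role.lambda_model_role.arn
  # run the container image we pushed to ECR
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.lambda_model_repository.repository_url}:latest"
  timeout       = 30
  memory_size   = 512
  environment {
    variables = {
      BUCKET_NAME = var.bucket_name
    }
  }
}

resource "aws_api_gateway_rest_api" "lambda_model_api" {
  name = var.api_name
}

# POST on the root resource, proxied to the Lambda function
resource "aws_api_gateway_method" "predict" {
  rest_api_id   = aws_api_gateway_rest_api.lambda_model_api.id
  resource_id   = aws_api_gateway_rest_api.lambda_model_api.root_resource_id
  http_method   = "POST"
  authorization = "NONE"
}

resource "aws_api_gateway_integration" "lambda" {
  rest_api_id             = aws_api_gateway_rest_api.lambda_model_api.id
  resource_id             = aws_api_gateway_rest_api.lambda_model_api.root_resource_id
  http_method             = aws_api_gateway_method.predict.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.lambda_model_function.invoke_arn
}

resource "aws_api_gateway_deployment" "predict" {
  depends_on  = [aws_api_gateway_integration.lambda]
  rest_api_id = aws_api_gateway_rest_api.lambda_model_api.id
  stage_name  = "predict"
}

# allow API Gateway to invoke the function
resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.lambda_model_function.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.lambda_model_api.execution_arn}/*/*"
}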
We can now apply this using the Terraform CLI again, which will take around a minute.
(cd terraform && terraform apply)
Testing the infrastructure
To test the Lambda function we can invoke it using the AWS CLI and store the response in response.json:
aws --profile lambda-model \
lambda \
invoke \
--function-name $LAMBDA_FUNCTION_NAME \
--payload '{"body": {"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}}' \
response.json
# {
# "StatusCode": 200,
# "ExecutedVersion": "$LATEST"
# }
The response.json
will look like this:
{
"statusCode": 200,
"body": "{"prediction": {"label": "virginica", "probability": 0.9997}}",
"isBase64Encoded": false
}
And we can also test our API using curl or Python. We need to find our endpoint URL first, for example by using the AWS CLI again, or alternatively via the Terraform outputs.
export ENDPOINT_ID=$(aws \
--profile lambda-model \
apigateway \
get-rest-apis \
--query 'items[?name == `'$API_NAME'`].id' \
--output text)
export ENDPOINT_URL=https://${ENDPOINT_ID}.execute-api.${AWS_REGION}.amazonaws.com/predict
curl \
-X POST \
--header "Content-Type: application/json" \
--data '{"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}' \
$ENDPOINT_URL
# {"prediction": {"label": "virginica", "probability": 0.9997}}
Alternatively, we can send POST requests with Python:
import requests
import os
endpoint_url = os.environ['ENDPOINT_URL']
data = {"sepal_length": 5.9, "sepal_width": 3, "petal_length": 5.1, "petal_width": 1.8}
req = requests.post(endpoint_url, json=data)
req.json()
More remarks
To update the container image we can use the CLI again:
aws --profile lambda-model \
lambda \
update-function-code \
--function-name $LAMBDA_FUNCTION_NAME \
--image-uri $IMAGE_URI:$IMAGE_TAG
If we want to remove our infrastructure we have to empty our bucket first, after which we can destroy our resources:
aws s3 --profile lambda-model rm s3://${BUCKET_NAME}/model.pkl
(cd terraform && terraform destroy)
Conclusion
With the new feature of containerized Lambdas it has become even easier to deploy machine learning models into the AWS serverless landscape. There are many AWS alternatives (ECS, Fargate, SageMaker), but Lambda comes with many tools out of the box, for example request-based logging and monitoring, and it allows fast prototyping with ease. Nevertheless it also has some downsides, for example the request latency overhead and the use of a somewhat proprietary cloud service that is not fully customizable.
Another advantage is that containerization allows us to isolate the machine learning code and properly maintain package dependencies. If we keep the handler code to a minimum we can test the image carefully and make sure our development environment is very close to the production infrastructure. At the same time we are not locked into AWS technology – we can very easily replace the handler with our own web framework and deploy it to Kubernetes.
Lastly, we can potentially improve our model infrastructure by running the training remotely (for example using ECS), adding versioning and CloudWatch alerts. If needed, we could add a process to keep the Lambda warm, since a cold start takes a few seconds. We should also add authentication to the endpoint.
Originally published at https://blog.telsemeyer.com.