AWS SageMaker Endpoint as REST service with API Gateway

Vahe Sahakyan
Towards Data Science
4 min read · Jul 3, 2019


There are many ways to expose an AWS SageMaker endpoint as a REST service; one of them, using a Lambda function, is described on the AWS Blog. In this article I’m going to show how it can be done without involving active components like Lambda. The architecture we are going to set up is very simple: a client calls an API Gateway REST API, which forwards the request directly to the SageMaker endpoint.

Model Preparation

First, let’s train a demo model for SageMaker, MNIST in particular. The code snippet below does the training and creates a model.tar.gz file in the current directory.

Make sure to run pip install tensorflow before executing the training script.
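A minimal sketch of such a training script could look like this. The network architecture and hyperparameters are illustrative and it assumes a recent TensorFlow 2.x release; what matters for serving is the ./model/{timestamp} directory layout, which gives TensorFlow Serving a model named "model" with a numeric version.

import tarfile
import time

import tensorflow as tf

# Load and flatten MNIST; scale pixels to [0, 1]
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# A small dense classifier, just enough for a demo
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128)

# Export a SavedModel into ./model/{timestamp}; TF Serving treats
# "model" as the model name and the timestamp as the version
export_dir = "./model/{}".format(int(time.time()))
tf.saved_model.save(model, export_dir)

# Package the whole ./model directory into model.tar.gz
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model")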

The script saves the model files in the ./model/{timestamp} directory and creates model.tar.gz in the current directory. Generally, when creating models for SageMaker you also need to provide a Docker image that contains your inference code, web server, relevant libraries, etc. But since we are using a TensorFlow model, we can use the AWS-managed sagemaker-tensorflow-serving container as the inference image. All the heavy lifting and boilerplate code is already packaged in this image; we only need to provide the model file and, optionally, custom code for inference pre/post-processing.

The serving container accepts only the application/json content type as input. We could add custom inference code to handle additional content types, such as image payloads, but that is out of the scope of this article.

Deployment

After making the model file we can proceed with the AWS deployment. Below is a more detailed view of the AWS resources used in this setup.

Detailed diagram of AWS resources

Please note that all examples are built using Terraform 0.12 (with HCL 2 syntax).

First we will create an S3 bucket and upload the model package into it.
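A minimal sketch, assuming the bucket name below. Bucket names must be globally unique, so pick your own; keeping “sagemaker” in the name also lets the demo AmazonSageMakerFullAccess policy read from it.

resource "aws_s3_bucket" "model" {
  bucket = "mnist-test-sagemaker-models"  # illustrative name, must be globally unique
  acl    = "private"
}

resource "aws_s3_bucket_object" "model" {
  bucket = aws_s3_bucket.model.bucket
  key    = "model.tar.gz"
  source = "${path.module}/model.tar.gz"
  etag   = filemd5("${path.module}/model.tar.gz")  # re-upload when the archive changes
}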

Now we can create the SageMaker model and deploy the endpoint.

Please note that the IAM role used here is for demo purposes only; don’t use the AmazonSageMakerFullAccess policy in production.
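A sketch of such a demo role (the resource and role names are illustrative):

resource "aws_iam_role" "sagemaker" {
  name = "mnist-test-sagemaker-role"

  # Allow the SageMaker service to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}

# Demo only: far too broad for production use
resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}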

The endpoint deployment consists of two main parts (besides the IAM role).

First, the model is created. It encapsulates the role, the model file in the S3 bucket, the inference Docker image and some environment variables. As mentioned before, we are using the AWS-managed inference image, which is owned by the 520713654638 AWS account.
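A sketch of the model resource. The image tag and the environment variable are assumptions: pick a sagemaker-tensorflow-serving tag that matches the TensorFlow version you trained with.

resource "aws_sagemaker_model" "mnist" {
  name               = "mnist-test"
  execution_role_arn = aws_iam_role.sagemaker.arn

  primary_container {
    # AWS-managed TensorFlow Serving image from account 520713654638;
    # the tag is an assumption, match it to your TensorFlow version
    image          = "520713654638.dkr.ecr.${var.region}.amazonaws.com/sagemaker-tensorflow-serving:1.13-cpu"
    model_data_url = "s3://${aws_s3_bucket.model.bucket}/${aws_s3_bucket_object.model.key}"

    environment = {
      # optional: which top-level directory of the archive to serve by default
      SAGEMAKER_TFS_DEFAULT_MODEL_NAME = "model"
    }
  }
}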

Second, the endpoint configuration and the endpoint itself are created.
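A sketch of both resources. The instance type is an assumption; the variant name main matches the invoke output shown later.

resource "aws_sagemaker_endpoint_configuration" "mnist" {
  name = "mnist-test"

  production_variants {
    variant_name           = "main"
    model_name             = aws_sagemaker_model.mnist.name
    initial_instance_count = 1
    instance_type          = "ml.t2.medium"  # assumption: a small, cheap demo instance
  }
}

resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = "mnist-test"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.mnist.name
}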

The deployment usually takes 5–6 mins, so be patient :)

Now let’s test the endpoint. We need some test data in JSON format. It can be extracted from the MNIST dataset with the code snippet below.
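A minimal sketch of such a snippet. The {"instances": [...]} wrapper is the TensorFlow Serving REST predict format, and the flattening/scaling must match how the model was exported above.

import json

import tensorflow as tf

_, (x_test, y_test) = tf.keras.datasets.mnist.load_data()

idx = 0  # change this to extract another image
image = (x_test[idx].reshape(784) / 255.0).tolist()

# One image wrapped in a list: the API can score several images at once
with open("payload.json", "w") as f:
    json.dump({"instances": [image]}, f)

print("expected label:", y_test[idx])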

It will save a single image into the payload.json file and print the expected result to stdout (7 in this case). You can change the idx variable to extract other images. Please note that the whole result is wrapped in an additional array, because the API can run inference on multiple images at once. Now we can call sagemaker-runtime invoke-endpoint to do a prediction.

$ aws sagemaker-runtime invoke-endpoint --endpoint-name mnist-test --body file://payload.json --content-type application/json result.json
{
    "ContentType": "application/json",
    "InvokedProductionVariant": "main"
}
$ cat result.json
{
    "predictions": [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
}

So far so good, but we have still only deployed the SageMaker endpoint. Now comes the interesting part: the integration with API Gateway.

Let’s create the regular stuff first, i.e. the API Gateway REST API, resource, method and method response.
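A sketch of these resources (the API name and the /predict path are illustrative; authorization is disabled for brevity):

resource "aws_api_gateway_rest_api" "api" {
  name = "mnist-test"
}

resource "aws_api_gateway_resource" "predict" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  parent_id   = aws_api_gateway_rest_api.api.root_resource_id
  path_part   = "predict"
}

resource "aws_api_gateway_method" "predict" {
  rest_api_id   = aws_api_gateway_rest_api.api.id
  resource_id   = aws_api_gateway_resource.predict.id
  http_method   = "POST"
  authorization = "NONE"  # demo only; see the authorization options in the conclusion
}

resource "aws_api_gateway_method_response" "ok" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  resource_id = aws_api_gateway_resource.predict.id
  http_method = aws_api_gateway_method.predict.http_method
  status_code = "200"
}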

And finally, let’s create the integration with SageMaker. For the integration to work, we need another IAM role granting API Gateway InvokeEndpoint access, and the integration resource itself.
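A sketch of both, with role and resource names that are illustrative. An integration response is included so the 200 from SageMaker is passed back through the method response, and an aws_api_gateway_deployment (standard API Gateway boilerplate) is still needed to actually publish the API.

resource "aws_iam_role" "apigw" {
  name = "mnist-test-apigw-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "apigateway.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "apigw_invoke" {
  name = "invoke-endpoint"
  role = aws_iam_role.apigw.id

  # Only allow invoking our single endpoint
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action   = "sagemaker:InvokeEndpoint"
      Effect   = "Allow"
      Resource = aws_sagemaker_endpoint.endpoint.arn
    }]
  })
}

resource "aws_api_gateway_integration" "sagemaker" {
  rest_api_id             = aws_api_gateway_rest_api.api.id
  resource_id             = aws_api_gateway_resource.predict.id
  http_method             = aws_api_gateway_method.predict.http_method
  type                    = "AWS"
  integration_http_method = "POST"
  credentials             = aws_iam_role.apigw.arn
  uri                     = "arn:aws:apigateway:${var.region}:runtime.sagemaker:path//endpoints/${aws_sagemaker_endpoint.endpoint.name}/invocations"
}

resource "aws_api_gateway_integration_response" "ok" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  resource_id = aws_api_gateway_resource.predict.id
  http_method = aws_api_gateway_method.predict.http_method
  status_code = aws_api_gateway_method_response.ok.status_code

  depends_on = [aws_api_gateway_integration.sagemaker]
}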

The most important line in this code is the URI in the integration resource, which is neither documented nor covered by any example in the AWS docs.

arn:aws:apigateway:${var.region}:runtime.sagemaker:path//endpoints/${aws_sagemaker_endpoint.endpoint.name}/invocations

Please note that the double slash (//) is intentional; without it the integration will not work. After a successful deployment, Terraform will output the invocation endpoint URL. We can use our payload.json to make a prediction using curl:

$ curl -XPOST https://{api-gw-id}.execute-api.eu-central-1.amazonaws.com/predict -H "Content-Type: application/json" -d @payload.json
{
    "predictions": [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
}

Limits

Check the API Gateway and SageMaker endpoint limits for more details. Notable ones are:

  • Max payload is limited to 10 MB by API Gateway
  • Max 60 sec execution time by SageMaker

Conclusion

This is a very quick, easy, stable and cost-efficient way of making flexible REST-ish inference endpoints. It can be extended to also support:

  • Cognito or static key based authorization
  • VPC private deployments
  • and all the other features of API Gateway
  • AutoScaling with SageMaker
  • Getting GPU power by using Elastic Inference

Thanks for reading.



I am a tech enthusiast interested in many aspects of the IT industry. Currently my main focus is the cloud. https://github.com/spirius