Deploying a Custom Docker Model with SageMaker to a Serverless Front-end with S3

How to effectively deploy a model with AWS

Max Brenner
Towards Data Science


Introduction

Deploying a model with AWS SageMaker is a great way to allow users or customers to interact with it. While you can use the many algorithms and models that come with SageMaker’s Python SDK, there are also many cases where you might want to create your own algorithm. This requires the use of Docker.

The easiest way to let people interact with your model is to use AWS Lambda and API Gateway to set up an API for sending POST requests to the model.

Finally, having an API for sending requests and receiving inference responses from the model is great, but a simple and nice looking web app for users to interact with the data is even better. This is where hosting a static website with S3 and accessing the endpoint with AJAX comes into play.

All of these AWS components working in unison allow for a sleek development and inference-serving framework. This tutorial also covers using AWS CloudWatch to understand ambiguous errors, such as 500 Internal Server Errors from the custom model.

Check out the repo for some of the code mentioned below, here.

Docker Model

To deploy a custom model with SageMaker it must be wrapped by SageMaker’s Estimator class. This can be done by creating a Docker image that interfaces with this class.

Also, check out this post if you don’t know what Docker is and why it's so important nowadays.

Install Docker

Install Docker for your respective OS with these links: Mac, Ubuntu and Windows. Windows Home edition is a bit more difficult so I will cover some of the steps here, since Windows is the OS I use. Follow these steps first and see how far you get. Refer to the steps below if you get stuck.

WSL 2 is required in Windows Home edition to run Docker Desktop.

First, you must update to Windows 10 version 2004. Do this by clicking “Check for updates” and installing updates until no more appear, or by finding the 2004 update itself (warning: this will take a lot of time and disk space). Then you can install WSL 2 using these directions.

Docker Image

This tutorial does a great job of explaining each step of setting up a Docker image for our custom model. So follow that tutorial, but keep these issues in mind first:

A problem I ran into with the provided template is installing gevent. As you can see in my Dockerfile, I use easy_install instead of pip:

RUN easy_install gevent

Important: If you create the image code on Windows, you have to make the files Unix compatible, since the line endings are different. This can be done with the command dos2unix:

find some/directory -type f -exec dos2unix {} \;

where some/directory is the directory containing the image code.

Also, make sure that files such as model/serve and model/train are executable:

chmod +x file/path

Now follow the steps of the other tutorial until you get to the AWS ECR section. Make sure to create the image and locally test your model as described.
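
For orientation, the serving side of the image boils down to a small web app that answers the two routes SageMaker’s hosting layer calls: GET /ping for health checks and POST /invocations for predictions (the template wires this up behind nginx and gunicorn on port 8080). Here is a minimal sketch of that idea, illustrative only and not the linked tutorial’s exact code; the “inference” line is a placeholder for your own logic:

import flask

app = flask.Flask(__name__)
model = None  # placeholder: load your model artifact from /opt/ml/model here

@app.route('/ping', methods=['GET'])
def ping():
    # Health check: SageMaker calls this route repeatedly; return 200 when healthy.
    return flask.Response(response='\n', status=200, mimetype='application/json')

@app.route('/invocations', methods=['POST'])
def invocations():
    # The request body is whatever the client sent (CSV or JSON, as discussed later).
    data = flask.request.data.decode('utf-8')
    prediction = str(data.count('\n'))  # placeholder "inference" so the sketch runs as-is
    return flask.Response(response=prediction, status=200, mimetype='text/plain')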

Inference-only Model

Sometimes you may not need to actually train your model before inference. That was the case for the application I show at the end of this article. Unfortunately, SageMaker still requires that the model be fit before deploying. However, you can set up a dummy training script in the Docker image very easily; for example, you can just open the model path and write a dummy string as “training”.

Here is (some of) my training script: docker-image-code/anomaly-model/train
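
For reference, a dummy train script can be as small as the sketch below (along the lines of the file above, not a verbatim copy). SageMaker mounts the model output directory at /opt/ml/model inside the container, and writing any file there is enough for fit to succeed:

#!/usr/bin/env python
# Placeholder training: write a dummy artifact so SageMaker considers the model "fit".
import os

MODEL_PATH = '/opt/ml/model'  # SageMaker's model output directory inside the container

if __name__ == '__main__':
    os.makedirs(MODEL_PATH, exist_ok=True)
    with open(os.path.join(MODEL_PATH, 'dummy.txt'), 'w') as f:
        f.write('inference-only model: no real training performed')
    print('dummy training complete')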

Register Docker Image

Once your model image has been created and works locally, it must be stored in AWS Elastic Container Registry (ECR) so that SageMaker can access it.

Go to AWS ECR and click “Get Started”. Click “Create repository” in orange.

Enter a name that represents your model under “Repository name”, then click “Create repository” again. You don’t need to change any of the other options.

Then select the new repo and click “View push commands”.

Follow the “macOS / Linux” commands shown on your page. If this is your first time doing something like this, you will need the AWS Command Line Interface (CLI): follow these instructions, and authenticate Docker with this. Once the CLI and authentication are set up, you won’t have to do it again.

Whenever you want to update your model: update the image code, re-create the image, test it locally, and follow the push commands again.

Phew, that probably took a while and some googling, especially if you are on Windows, but hopefully I got you through it successfully. If not, let me know in the comments where the problem lies. At this point, you should have a working Docker image for your model, stored in the registry.

Deploying with SageMaker

Now we get into using other parts of AWS. First, we have to deploy the model with SageMaker, then use AWS Lambda and API Gateway to set up an API for posting data to your model and receiving an inference response.

In a little more detail: the client calls the API created with API Gateway and passes in data for inference. API Gateway passes this data to the Lambda function, where it is parsed and sent to the SageMaker model endpoint (known as “invoking” the endpoint). The model performs prediction on this data, and the output is sent back through Lambda and API Gateway, which responds to the client with the predicted value.

Architecture diagram of this request flow, from this blog post

SageMaker Notebook

First, we are going to deploy our model with SageMaker.

Create a new notebook instance (or use an existing one). You can use the default settings; just set a name. Upload this notebook from my repo. The only thing that needs to be set is the docker_image_name variable: the URI of your registered Docker image, which can be found in ECR under “Repositories”.

Let me explain the code for the notebook below:

First, install SageMaker version 1.72 (I could not get this to work with the most recent version, but it might work now). The session, client and execution role are grabbed to create a bucket for the model output. The Docker image name is the repo URI of your uploaded image.

Next, create the estimator, which takes many arguments. The one you can change based on your needs is train_instance_type. This is the AWS cloud compute instance that will be used to train the model when fit is called. Note: ml.m4.xlarge seems to be the least expensive one allowed for training. More instance types that may or may not work for training are listed here. If you need more memory for training, or just want it to be faster, try a more expensive instance.

Next, we call fit to train the model. This runs the train script in the Docker image’s model directory. You will need to call fit even if the model’s only purpose is inference, as was the case for my project.

Finally, we create the endpoint by deploying the model. endpoint_name should be something recognizable. The instance types for inference can be even less intensive than for training, such as ml.t2.medium, which is the cheapest.
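
Put together, the notebook boils down to something like the following sketch. It assumes the v1.72 SDK described above (argument names differ in SDK v2), and the image URI and endpoint name are placeholders to replace with your own:

import sagemaker as sage
from sagemaker.estimator import Estimator

sess = sage.Session()
role = sage.get_execution_role()

# URI of the image pushed to ECR, e.g. <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest
docker_image_name = '<your-ecr-image-uri>'

estimator = Estimator(
    image_name=docker_image_name,        # called image_uri in SDK v2
    role=role,
    train_instance_count=1,
    train_instance_type='ml.m4.xlarge',  # cheapest instance type allowed for training
    output_path='s3://{}/output'.format(sess.default_bucket()),
    sagemaker_session=sess,
)

# Runs the image's train script; required even for an inference-only model.
estimator.fit()

# Creates the persistent endpoint that the Lambda function will invoke later.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',        # cheapest instance type for hosting
    endpoint_name='anomaly-detection-endpoint',
)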

Here is more about the price of SageMaker. Use AWS Cost Management to track training and deployment cost of your model (costs can quickly get out of hand if you are not careful about this).

Now we have the model deployed! Let’s figure out how to actually allow people to interact with it.

Creating an API with Lambda & API Gateway

We have to create a Lambda function to invoke the endpoint. Go to AWS Lambda and create a new function. Name it something helpful, change the runtime to Python 3.6, and then either select an existing execution role that has permission to invoke a model endpoint or create a new execution role.

IAM Role Permissions

To give your role permission to invoke model endpoints, go to AWS IAM and then “Roles” in the sidebar.

Click on the role, which for the above example would be “myLambdaFunction-role-…”. In the “Permissions” tab that opens, click the only policy.

A JSON description of the policy should come up. Click “Edit policy” and then the “JSON” tab, and add the line “sagemaker:InvokeEndpoint” to the policy’s list of actions. This allows the role to interact with SageMaker endpoints. Click “Review policy” and then “Save changes” in blue at the bottom.

Back in Lambda, replace the existing code in the “lambda_function” code panel with this code.

First you will notice the ENDPOINT_NAME variable. This links our model’s endpoint to this function, which in my case is anomaly-detection-endpoint:

SageMaker code for deploying model; shows the endpoint_name you chose

Edit the “Environment variables” section below the function code: add an environment variable with the key ENDPOINT_NAME and the value set to the endpoint name from SageMaker (in my case, anomaly-detection-endpoint), then save it.

Looking back at the Lambda function code, the next important thing is ContentType='application/json' when invoking the endpoint. In my case the input data is JSON because I included some hyperparameters for inference, not just the input data for the model. However, if you don’t need inference hyperparameters, you can make the type 'text/csv' and the steps are all pretty much the same.
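
For reference, the handler amounts to something like the sketch below. It is a simplified version of the linked code, not a verbatim copy, and the response handling is an assumption that depends on what your container returns:

import os
import json
import boto3

ENDPOINT_NAME = os.environ['ENDPOINT_NAME']   # set in the Lambda environment variables
runtime = boto3.client('runtime.sagemaker')   # SageMaker runtime client

def lambda_handler(event, context):
    # API Gateway passes the POST body through as a string; fall back to the raw event.
    payload = event['body'] if isinstance(event, dict) and 'body' in event else json.dumps(event)

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',   # use 'text/csv' if you send raw CSV instead
        Body=payload,
    )

    result = response['Body'].read().decode('utf-8')
    return {
        'statusCode': 200,
        'headers': {'Access-Control-Allow-Origin': '*'},   # helps with CORS from the S3 site
        'body': result,
    }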

Creating an API Gateway

The last step of accessing the deployed model with a POST request is setting up an API. Go to AWS API Gateway and click “Create API” in the top right after logging in. Then choose “HTTP API” as the type and click “Build”. Add a Lambda integration and choose the correct AWS Region and the name of the Lambda function you just made. Then give the gateway a name.

Click “Next” and add a route. Select “POST” for the method. In “Resource path”, enter the Lambda name again, such as /myLambdaFunction, and under “Integration target” select the Lambda function yet again.

For stages, add a stage called something like test and click the switch so that this stage is auto-deployed. This will be a part of the POST URI.

Finally, hit “Create” at the bottom.

If you go back to your Lambda function you will see API Gateway in the diagram at the top. Click it and expand the details. “API endpoint” is the URL for the POST request that people can use to get predictions from your model. It follows the format:

https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/{resource_name}

Finally, you will probably want to allow all origins and headers in the CORS configuration for this API Gateway; otherwise you are going to run into CORS issues when making a POST request from your static S3 website with AJAX.

Go back to API Gateway and click on your API. Then go to “CORS” under “Develop” in the sidebar.

Click “Configure” and add * for “Access-Control-Allow-Origins”, * for “Access-Control-Allow-Headers” and POST for “Access-Control-Allow-Methods”. Then save it.

You may wish to be more restrictive with these options, since the wildcard * allows, for example, any origin to access this API.

Well, that was another big section completed! Now the API is set up to return predictions from your model. In the next section I will describe how to test the calls with Postman and use AWS CloudWatch to fix any ambiguous errors.

Testing with Postman & CloudWatch

Let’s test the inference code of the model with Postman, an API development platform that is helpful for easily checking our new API. Install Postman for your OS if you haven’t already. Then create a new request and select POST as the type. In the URL field, enter the URL mentioned above.

You can change certain things about the request such as authorization or headers, although I did not change any of that from the default.

Next, we have to actually input the body of the request. This will contain the data we want the model to use to make a prediction. I’ve found that the easiest way to do this is to use the “raw” option under “Body” for both JSON and CSV data. This avoids any encoding of the data (unless you need that).

For a CSV body, paste the CSV data itself as a raw string. In the format of:

"First,Line\r\nSecond,Line\n\r......"

For a JSON body, write the JSON string. If the actual model input itself is still in CSV format then you can add a key such as “data” with the same value as above. Something like:

{"hyperparameter_1": #, "hyperparameter_2": #, "data": "First,Line\r\nSecond,Line\n\r......"}

Back in Postman, click “Send”! Chances are something didn’t work on the first try. Make sure your endpoint is still up in SageMaker (the notebook itself does NOT need to be running), that Lambda is connected to the proper endpoint (via the environment variable), and that API Gateway is connected to Lambda, with the proper URL for the request. If you get a “500 Internal Server Error” or “ModelError”, then something is amiss in your Docker image code OR Lambda function code. This is where AWS CloudWatch comes in handy.

AWS CloudWatch

The best way to figure out exactly what line of code is triggering an ambiguous error is to use AWS CloudWatch. I’m not going to go into a lot of detail on CloudWatch, but I will mention some helpful things. AWS CloudWatch allows you to monitor running AWS processes through the logs they record; for example, SageMaker training jobs, notebook instances and endpoints can all be tracked.

To figure out what is triggering an error, go to AWS CloudWatch and, in the left sidebar, go to “Log groups” under “Logs”. You should have some log groups created. First check your Lambda function’s log group. Click on it and you will see a bunch of log streams.

The topmost one is the most recent, so click that and check out the specific messages. If there are issues in this log group, then your Lambda function is to blame.

The other log group to check is the one attached to your endpoint. Any issue raised here indicates that something is incorrect in your Docker image code. Even if it ran fine locally, there can be issues running it in AWS. Also, there will be a lot of pings (health checks) shown here; ignore them or filter them out.

Hopefully it’s running smoothly now and you get your expected output; if not, leave a comment and I will help fix it.

Finally, we have a working and accessible model. You can definitely stop there if all you need is the POST URL. However, another useful thing to do might be to make a nice looking front-end for people to input their own data into the model. An easy way to do this is to use S3 to create a static website to access the POST URL.

Creating a Static Website Front-end with S3

This Medium article does a great job at describing step-by-step how to create a static website with S3. So instead of recreating this myself, check it out and go through all of its steps (Note: you don’t need to set up a CNAME record).

Once you are done with that, the only thing I would recommend adding is a CORS policy: go to “Permissions” -> “CORS configuration” and paste this:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
<AllowedOrigin>*</AllowedOrigin>
<AllowedHeader>*</AllowedHeader>
<AllowedMethod>POST</AllowedMethod>
</CORSRule>
</CORSConfiguration>

which deals with CORS issues the same way we dealt with them in API Gateway.

AJAX POST Request to Model

Lastly, let’s hook up the API call with AJAX in our website. The call is a simple POST to the API endpoint URL with the request data passed as a string in the body.

Note: Set the content type to text/plain to avoid AJAX encoding your request data, just like with Postman (unless you want that).

That about wraps up the end-to-end development chain of deploying a custom model with AWS and setting it up with a nice user-facing front-end. If you want to make changes to your model code, re-make the Docker image, push it back to ECR, and then redeploy. Also, delete the model endpoint when you don’t want it deployed, either from the SageMaker console or from the notebook code.

My Application — Anomaly Detection for Google Trends Data

I created a custom model that takes Google Trends data (CSV) and analyses the series for anomalies with either Density-Based Spatial Clustering of Applications with Noise (DBSCAN) or Seasonal-Trend decomposition using Loess (STL).


Briefly, DBSCAN is a machine learning clustering algorithm, but it can be used for anomaly detection because, unlike k-means, it does not assign every point to a cluster. The leftover, unclustered points can be considered outliers/anomalies.


STL is a statistical decomposition of time-series data, used to detect anomalies or explore elements of the data. It decomposes the data into its seasonal pattern, trend and residual. You can then run outlier-detection tests on the residual, such as the Generalized Extreme Studentized Deviate (GESD) test. STL tends to work better than DBSCAN for data with a seasonal pattern; however, if there is no seasonal pattern, STL cannot be used effectively.
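
To make the two approaches concrete, here is a rough sketch of how each can flag anomalies in a series of Google Trends values. It is illustrative only, not the exact code behind my endpoint, and the parameter values are assumptions:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from statsmodels.tsa.seasonal import STL

def dbscan_anomalies(values, eps=0.5, min_samples=5):
    # Cluster (time index, value) pairs; DBSCAN labels unclustered points as -1,
    # and those leftover points are treated as anomalies.
    X = np.column_stack([np.arange(len(values)), values]).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # scale features so eps is meaningful
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.where(labels == -1)[0]

def stl_anomalies(series, period=52, z_thresh=3.0):
    # Decompose into seasonal + trend + residual, then flag residuals far from the
    # mean (a simple z-score stand-in for the GESD test mentioned above).
    resid = STL(series, period=period).fit().resid
    z = (resid - resid.mean()) / resid.std()
    return series.index[np.abs(z) > z_thresh]

# Example: three years of weekly interest values with two injected spikes
interest = pd.Series(np.random.default_rng(0).normal(50, 5, 156))
interest.iloc[[30, 90]] = 100
print(dbscan_anomalies(interest.values))
print(stl_anomalies(interest, period=52).tolist())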

Check out my deployed model front-end here. Search a term on Google Trends and download the CSV.

Then upload it to the website and, if there is seasonality to the data, fill in the hyperparameters provided.

Plot it, and it will look something like this:

Plot of Lakers popularity with anomalous points

Note: The website may not be up all the time.

Let me know in the comments if it's not up and you would like to check it out, or if you have any problems with these steps! And check out the accompanying code here.
