
Let’s say you’re working with multiple models and want to pick which one to invoke based on your application’s use case. Enter SageMaker Multi-Model Endpoints as your scalable, cost-efficient solution. With SageMaker Multi-Model Endpoints (MME) you can bring thousands of models to one endpoint and specify which model you want to invoke for your use case. The main constraint with this inference option is that all the models need to be in the same framework: all TensorFlow or all PyTorch, not a mixture of both. If you want a combination of frameworks, check out SageMaker Multi-Container Endpoints. For this article, we’ll walk through an example in which we bring two custom TensorFlow models for simplicity’s sake. We’ll go through the example end to end and see how each model can be defined and invoked with a simple Boto3 API call. Before getting started, please make sure to read the Prerequisites/Setup section, as a decent amount of AWS & ML knowledge is necessary to fully understand this demonstration. If you would just like to grab the code, check out the following link.
Table of Contents (ToC)
- Prerequisites/Setup
- Multi-Model Endpoint Overview
- Example Walkthrough
- Entire Code & Conclusion
Prerequisites/Setup
This article assumes an intermediate level of knowledge of AWS services, particularly S3 and ECR, as they integrate heavily with SageMaker capabilities. It is also essential to understand how SageMaker containers generally operate and what’s happening under the hood. Luckily for us, SageMaker already provides managed TensorFlow containers, so we can train our models through a simpler functionality known as Script Mode. With Script Mode we can pass a custom model, in a training script that we provide, to a SageMaker TensorFlow estimator that has a managed container behind the scenes. To follow an end-to-end example of TensorFlow Script Mode with a custom model, check out this article. We’ll be using Script Mode in this example with two different TensorFlow models for our multi-model endpoint setup.
Regarding setup and instance type, make sure to create an IAM role with the appropriate permissions for S3 and ECR. For instance type, a free-tier ml.t3.medium instance should suffice, but for more complicated or computationally intense models check out the different compute instances SageMaker offers.
The datasets we will be working with are both tabular regression problems. The first is Boston Housing and the second is a Petrol Consumption dataset from Kaggle.
Multi-Model Endpoint Overview
With a Multi-Model Endpoint, there’s still just one container/instance under the hood. You train your models with Script Mode, then push the trained model artifacts to a common S3 bucket location. Note that the model data must be in tar.gz format for SageMaker. You can then populate your endpoint with these different models and specify in each endpoint invocation which model you are working with.
Example Walkthrough
S3 Setup and Imports
Before getting started with any of the training or inference code, we need to make sure we have all the necessary imports and set up the S3 bucket that we will be using for this example.
Note that our S3 bucket setup is essential, as our multi-model endpoint expects all model artifacts in the same S3 location. We’ll use this bucket with varying prefixes to specify input data, output data, the MME location, and more.
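Here’s a minimal sketch of that setup cell. I’m assuming the default SageMaker bucket for simplicity; any bucket works as long as both models’ artifacts eventually land in the same one.

```python
import boto3
import sagemaker
from sagemaker import get_execution_role

# SageMaker session, execution role, and region for all subsequent calls
sagemaker_session = sagemaker.Session()
role = get_execution_role()
region = sagemaker_session.boto_region_name

# one bucket for everything: input data, training output, and the MME artifacts
bucket = sagemaker_session.default_bucket()
s3 = boto3.client("s3")
```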
Boston Housing Training & Model Creation
The next step involves the first dataset we will be working with: Boston Housing. Using scikit-learn we can download the dataset and push it to S3 as we get our data ready for training.
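A rough sketch of that data prep follows. Note that load_boston ships with older scikit-learn versions (it was removed in 1.2), and the prefix name here is just a placeholder.

```python
import pandas as pd
from sklearn.datasets import load_boston  # available in scikit-learn < 1.2

# build a single CSV with the target (median home value) as the last column
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df["MEDV"] = boston.target
df.to_csv("boston.csv", index=False)

# upload the training data under its own prefix
s3.upload_file("boston.csv", bucket, "boston-data/boston.csv")
boston_train_uri = f"s3://{bucket}/boston-data"
```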
Now that we have our Boston dataset prepared in S3, we can build the training script that we will feed into our TensorFlow estimator. The training script contains the TensorFlow ANN model we’ve built, along with the hyperparameters we’re passing through the TensorFlow estimator.
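Below is a sketch of what such a script can look like. The network shape, hyperparameter names, and file names are my own assumptions for illustration; the important pieces are reading the SageMaker environment variables and saving the SavedModel into a numbered directory for TensorFlow Serving.

```python
# train_boston.py -- Script Mode entry point (a sketch, not the exact model)
import argparse
import os

import pandas as pd
import tensorflow as tf

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=50)
    # SageMaker injects these paths into the training container
    parser.add_argument("--sm-model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    args, _ = parser.parse_known_args()

    df = pd.read_csv(os.path.join(args.train, "boston.csv"))
    x, y = df.drop("MEDV", axis=1).values, df["MEDV"].values

    # a small feed-forward regression network
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(x.shape[1],)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=args.epochs)

    # TensorFlow Serving expects a numbered SavedModel version directory
    model.save(os.path.join(args.sm_model_dir, "1"))
```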
We can now pass this script to the TensorFlow estimator, which will fit on the input data we have prepared and start training.
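A sketch of the estimator call, assuming the script above was saved as train_boston.py; the framework version and instance type are placeholders you’d match to your own environment.

```python
from sagemaker.tensorflow import TensorFlow

boston_estimator = TensorFlow(
    entry_point="train_boston.py",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.3.0",
    py_version="py37",
    hyperparameters={"epochs": 50},
)

# fit on the channel we uploaded earlier; this kicks off the training job
boston_estimator.fit({"train": boston_train_uri})
```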
Training will take a few minutes to complete. Once it finishes successfully, we need to create a model from the training artifacts with a simple call. Before we can do that, however, we need to prepare an inference script. This script lets us dictate the type of data we are passing into our endpoint (JSON, JSON Lines, CSV, etc.). The inference file makes it clear to our endpoint what type of data we accept and output, and we can use the same inference file for both models that we will be creating.
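Here’s a minimal sketch of such an inference script, using the input_handler/output_handler interface that the SageMaker TensorFlow Serving container supports; this version assumes JSON in and JSON out.

```python
# inference.py -- shared by both models; JSON in, JSON out (a sketch)
import json


def input_handler(data, context):
    """Convert the incoming request into a TensorFlow Serving predict payload."""
    if context.request_content_type == "application/json":
        payload = json.loads(data.read().decode("utf-8"))
        return json.dumps({"instances": payload})
    raise ValueError(f"Unsupported content type: {context.request_content_type}")


def output_handler(response, context):
    """Pass the TensorFlow Serving response straight back to the client."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, "application/json"
```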
Now we can create our model and pass in this inference script as well.
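A sketch of that model creation, assuming the script above is saved as inference.py:

```python
from sagemaker.tensorflow import TensorFlowModel

# wrap the trained artifacts and the shared inference script in a model object
boston_model = TensorFlowModel(
    model_data=boston_estimator.model_data,
    role=role,
    entry_point="inference.py",
    framework_version="2.3.0",
)
```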
Petrol Consumption Training & Model Creation
Now we can repeat the exact same process with the Petrol Consumption dataset. I’ll skip the data upload as it is the exact same procedure, but make sure to upload the data properly to S3 or follow the code repository for guidance.
Once again we’ll build a training script, tailored this time to the Petrol dataset, pass it to our estimator, and later create a model.
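The estimator call mirrors the Boston one. Here train_petrol.py is a hypothetical script that reads the Petrol CSV instead, and petrol_train_uri assumes the data was uploaded the same way as above.

```python
petrol_estimator = TensorFlow(
    entry_point="train_petrol.py",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.3.0",
    py_version="py37",
    hyperparameters={"epochs": 50},
)
petrol_estimator.fit({"train": petrol_train_uri})
```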
Multi-Model Endpoint Creation
Awesome, we have both of our models ready to add to our endpoint. Now, to create the endpoint, we need to place both models’ artifacts in one S3 location, making sure each is in the appropriate tar.gz format and in the same bucket.
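One way to sketch that consolidation step is to copy each estimator’s artifact into a shared prefix. The key names below are placeholders, but whatever names you choose become the TargetModel values at invocation time.

```python
# copy a trained model.tar.gz into the shared multi-model prefix
def copy_artifact(model_data_uri, target_key):
    src_bucket, src_key = model_data_uri.replace("s3://", "").split("/", 1)
    s3.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )

copy_artifact(boston_estimator.model_data, "mme-artifacts/boston.tar.gz")
copy_artifact(petrol_estimator.model_data, "mme-artifacts/petrol.tar.gz")
model_data_prefix = f"s3://{bucket}/mme-artifacts/"
```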
Using the MultiDataModel class we can feed in the model data prefix that contains the artifacts for both of our models. It does not matter whether you pass in model_1 or model_2 for the model information, as both operate in the same container. We can make sure both of our models are present with the following Boto3 call.

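A sketch of both steps; the model name here is a placeholder.

```python
from sagemaker.multidatamodel import MultiDataModel

# either model object works here since both run in the same TensorFlow container
mme = MultiDataModel(
    name="tensorflow-mme",
    model_data_prefix=model_data_prefix,
    model=boston_model,
)

# confirm both artifacts are visible under the shared prefix
response = s3.list_objects_v2(Bucket=bucket, Prefix="mme-artifacts/")
print([obj["Key"] for obj in response.get("Contents", [])])
```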
We can then deploy our endpoint just as we would a single-model endpoint.

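A sketch of the deployment; the instance type is again a placeholder.

```python
# deploy returns a Predictor that serves both models behind one endpoint
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```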
Multi-Model Endpoint Invocation
Now we can test both of our models with the same endpoint.

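A sketch of a Boston invocation; the 13 feature values are illustrative only, and the artifact name matches what we uploaded to the shared prefix.

```python
# one row of the 13 Boston features (illustrative values)
boston_payload = [[0.1, 0.0, 6.0, 0.0, 0.5, 6.0, 65.0, 4.0, 1.0,
                   296.0, 15.0, 396.0, 5.0]]

# initial_args routes the request to a specific artifact in the shared prefix
result = predictor.predict(
    boston_payload,
    initial_args={"TargetModel": "boston.tar.gz"},
)
print(result)
```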
Now let’s do the same with the Petrol Consumption dataset to see both models at work.

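And the same call pointed at the second artifact; the Petrol dataset has four feature columns, and these values are again illustrative only.

```python
# one row of the four Petrol Consumption features (illustrative values)
petrol_payload = [[9.0, 3571.0, 1976.0, 0.525]]

result = predictor.predict(
    petrol_payload,
    initial_args={"TargetModel": "petrol.tar.gz"},
)
print(result)
```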
Through initial args we can pass in the model that we want to call, and we can see the amazing functionality of the multi-model endpoint.
Entire Code & Conclusion
SageMaker-Deployment/Inference/Multi-Model-Endpoint/TensorFlow at master ·…
To access the entire code for the example, check out the link above. The repository also contains various other SageMaker inference examples that I have built and compiled for reference. Multi-Model Endpoints are incredibly powerful and cost efficient, as you can load numerous models onto one endpoint rather than associating an endpoint with each model. There are also examples for SKLearn and PyTorch in case you have use cases with those frameworks.
I hope this article has been useful for people working with Amazon SageMaker. Feel free to leave any feedback in the comments or connect with me on LinkedIn if interested in chatting about ML & AWS. Make sure to follow me on Medium if interested in more of my work. Thank you for reading.