Deploying Models to Production with MLflow and Amazon SageMaker

Kyle Gallatin
Towards Data Science
6 min read · Jul 19, 2019


Update 8/21/19: Changed arguments and some hardcoded variables to work with mlflow 1.2.0

As data science continues to mature in 2019, there is increasing demand for data scientists to move beyond the notebook. matplotlib and some performance metrics are no longer enough to deliver business value. Models need to be deployed and consumable in a scalable manner, and real-time model inference should be both fault tolerant and efficient.

Is Naruto in Sage Mode too nerdy even for this post? Probably…the SageMaker pun is weak as hell too but whatever — Source

Traditionally, model deployment is handled by an engineering team. Data scientists pass models to engineers for code refactoring and deployment. However, a lack of standardization and good DevOps practices on the data science side creates friction between the two teams, making deployment cumbersome and inefficient. In response to this trend, Databricks (the company founded by the creators of Apache Spark) has been working on mlflow — an open source machine learning platform for model tracking, evaluation and deployment. See the introductory release post.

Mlflow plays well with managed deployment services like Amazon SageMaker or AzureML. You can even use it to build custom open-source deployment pipelines like this one at Comcast. Given the recent release of mlflow 1.0.0, I wanted to provide some minimalist guidance for data scientists on deploying and managing their own models.

Setup

For this tutorial you need:

  • An AWS account
  • Docker installed on your local machine
  • Python 3.6 with mlflow>=1.0.0 installed

Let’s get started.

Configure Amazon

The first thing you’ll need to do is configure the AWS CLI on your local machine so that you can interact with your account programmatically. Create an account here if you haven’t already, then run the commands below from a terminal.

pip install awscli --upgrade --user
aws configure

The second command will prompt you for your keys and region, which you can obtain from your account when signed into the console. See the full guide here. You can also create IAM users with more specific and limited permissions if you prefer; just make sure to set the AWS_DEFAULT_PROFILE environment variable to the correct profile if you have more than one set of credentials.

View of the AWS Console

Lastly, you’ll need a role with access to SageMaker. On AWS, go to the IAM management console and create a new role. Then attach the “AmazonSageMakerFullAccess” policy to the role; you’ll need this role later to interact with SageMaker.
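
If you’d rather skip the console, here’s a minimal boto3 sketch of the same role setup (the role name is a placeholder):

import json
import boto3

iam = boto3.client("iam")

# Trust policy that allows SageMaker to assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# The role name is a placeholder; note the ARN for later
role = iam.create_role(RoleName="mlflow-sagemaker-role",
                       AssumeRolePolicyDocument=json.dumps(assume_role_policy))
print(role["Role"]["Arn"])

iam.attach_role_policy(RoleName="mlflow-sagemaker-role",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess")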

Other Installs

Follow the proper procedure for installing Docker on your OS. Then make sure you start the Docker daemon (check the fun little whale in your menu bar).

Not going to bother explaining how to install Python — we have to be beyond that, right? Just make sure you’ve installed the latest mlflow with pip install mlflow. You can also run a remote mlflow server if you’re working with a team; just make sure you specify a location for mlflow to log models to (an S3 bucket). See the server command below:

mlflow server --default-artifact-root s3://bucket --host 0.0.0.0

The service should start on port 5000. Just make sure both the host you started mlflow on and your local machine have write access to the S3 bucket. If you’re just working locally, you don’t need to start mlflow.
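
If you do go the remote route, here’s a minimal sketch of pointing your training code at that server (the hostname and experiment name are placeholders):

import mlflow

# Point the mlflow client at the remote tracking server (placeholder hostname)
mlflow.set_tracking_uri("http://your-mlflow-host:5000")

# Runs will be grouped under this (arbitrary) experiment name
mlflow.set_experiment("iris-sagemaker-demo")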

Model Tracking with Mlflow

Cool, now we’re ready to dive into the actual modeling. Mlflow lets you log parameters and metrics, which is incredibly convenient for model comparison. Creating an experiment in your code will create a directory called “mlruns” in which all the information from your experiments is stored.

Don’t hate me, but I’m going to use the iris dataset because it’s stupidly easy and the purpose here is to illustrate how mlflow plays with SageMaker. Below, we can see an example of logging our hyperparameters, metrics of model performance, and the actual model itself.
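
Here is a minimal sketch of that pattern, assuming a scikit-learn logistic regression (the experiment name and hyperparameter values are placeholders):

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("iris-sagemaker-demo")

# Load iris and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"C": 1.0, "max_iter": 200}  # placeholder hyperparameters

with mlflow.start_run():
    model = LogisticRegression(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_params(params)                 # hyperparameters
    mlflow.log_metric("accuracy", accuracy)   # model performance
    mlflow.sklearn.log_model(model, "model")  # the model artifact itself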

It isn’t immediately clear what’s going on, but if you open up another terminal and type mlflow ui in your current working directory, you can inspect everything we just logged in a convenient application. Mlflow allows you to filter by parameters and metrics, and look at any artifacts you may have logged, like models, environments, metadata, etc.

Look at the Mlflow UI (not our models) — Source

When mlflow logs the model, it also generates a conda.yaml file. This is the environment your model needs to run, and it can be heavily customized based on your needs. There is way more you can do with mlflow models, including custom preprocessing and deep learning. The library comes with a variety of model “flavors”, so that you aren’t stuck with sklearn but can also use Pytorch or TF. Check the docs here.
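
As a quick illustration of the flavor system, any logged model can be loaded back through the generic pyfunc interface regardless of the framework that trained it (the run ID below is a placeholder):

import mlflow.pyfunc
import pandas as pd
from sklearn.datasets import load_iris

# URI of the model artifact logged above; the run ID is a placeholder
model = mlflow.pyfunc.load_model("runs:/<RUN_ID>/model")

# pyfunc models expose a uniform predict() over a pandas DataFrame
iris = load_iris()
print(model.predict(pd.DataFrame(iris.data[:2], columns=iris.feature_names)))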

Amazon ECR Image

Now that we’ve saved our model artifact, we need to start thinking about deployment. The first step is to provide a Docker image to Amazon’s Elastic Container Registry which we can use to serve our model. If you’re unfamiliar with Docker check out the documentation.

Amazon ECR — Source

The mlflow Python library has functions for this part, but I was having some issues with them at the time of writing, so I used the CLI. Open a new terminal and type the following into the command line:

mlflow sagemaker build-and-push-container

If you’ve set up AWS correctly with the proper permissions, this will build an image locally and push it to your image registry on AWS. To check that it worked, go to the AWS console and click the “ECR” service listed under compute in the services drop down menu. You should see a single repository called mlflow-pyfunc, and there should be one image listed inside.

Deploying to Sagemaker

Now there are only a few things left to set up before deploying our model endpoint for consumption. Basically, all we have to do is give mlflow our image URL and desired model, and then we can deploy it to SageMaker.

Amazon SageMaker Workflow — Source

You’ll need your AWS account ID, which you can get from the console or by typing aws sts get-caller-identity --query Account --output text into a terminal. Additionally, you’ll need the ARN for the SageMaker role you created when setting up Amazon. Go to the IAM management console, click on the role and copy the ARN. If your models are hosted somewhere other than your local system, you’ll also have to edit the model path.
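
Here’s a sketch of that deployment call using mlflow’s SageMaker module; the app name, region, account ID, role ARN, and run ID below are all placeholders, and the exact keyword arguments can vary a bit between mlflow versions:

import mlflow.sagemaker as mfs

app_name = "mlflow-iris-demo"  # placeholder endpoint name
region = "us-east-1"           # placeholder region
aws_id = "123456789012"        # your AWS account ID
role = "arn:aws:iam::123456789012:role/mlflow-sagemaker-role"  # placeholder role ARN

# URI of the model logged earlier; the run ID is a placeholder
model_uri = "runs:/<RUN_ID>/model"

# Image pushed by `mlflow sagemaker build-and-push-container`;
# the tag typically matches your mlflow version
image_url = "{}.dkr.ecr.{}.amazonaws.com/mlflow-pyfunc:1.2.0".format(aws_id, region)

mfs.deploy(app_name=app_name,
           model_uri=model_uri,
           region_name=region,
           mode="create",
           execution_role_arn=role,
           image_url=image_url)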

If your AWS credentials are set up properly, this should connect to SageMaker and deploy a model! It just may take a little bit to reach the “InService” state. Once it is, you can programmatically check to see if your model is up and running using the boto3 library or by going to the console. This code was adapted from the Databricks tutorial here.
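
Here’s a sketch of that status check plus a test query with boto3, reusing the placeholder names from above and the split-oriented pandas JSON format that mlflow’s scoring server accepts:

import json
import boto3
import pandas as pd
from sklearn.datasets import load_iris

app_name = "mlflow-iris-demo"  # placeholder endpoint name
region = "us-east-1"           # placeholder region

# Check whether the endpoint has reached the "InService" state
sm = boto3.client("sagemaker", region_name=region)
status = sm.describe_endpoint(EndpointName=app_name)["EndpointStatus"]
print("Application status is: {}".format(status))

# Send a single iris row to the endpoint as split-oriented JSON
iris = load_iris()
query = pd.DataFrame(iris.data[-1:], columns=iris.feature_names).to_json(orient="split")

runtime = boto3.client("sagemaker-runtime", region_name=region)
response = runtime.invoke_endpoint(EndpointName=app_name,
                                   Body=query,
                                   ContentType="application/json; format=pandas-split")
print("Received response: {}".format(json.loads(response["Body"].read().decode("utf-8"))))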

The output should look something like this:

Application status is: InService
Received response: [2.0]

Now that you’re making calls to your endpoint, you can view stats on usage through the AWS console by going to the SageMaker service. There is much more you can do with SageMaker, but I’ll leave that to the numerous other tutorials available. Once you’re done playing with your new machine learning model, you can delete the endpoint.

mfs.delete(app_name=app_name, region_name=region)

And that’s it! This was a very minimal tutorial, but hopefully it gives you a glimpse into the wide variety of possibilities for production-grade machine learning. Remember, never release or upload AWS keys to someplace like GitHub — someone will steal them and mine a bunch of bitcoin on your dime. Have fun!
