Training Machine Learning Models on Amazon SageMaker

Ephemeral clusters, experiments, visualization and more

Emily Webber
Towards Data Science
8 min read · May 11, 2020


Deep learning visualization with SageMaker Debugger
Visualize deep learning models with SageMaker Experiments

It’s midnight. You’ve spent hours fine-tuning your script, and you’re racing to get it onto the server before your deadline tomorrow. You’re building Naive Bayes, Logistic Regression, XGBoost, KNN, and any model under the sun in your massive for-loop. You’ve finally ironed out the kinks on your local machine, and you’re ready to scale your precious script, but when it starts to run you see … what, exactly? Random print statements? How do you know it’s working? What would you do if it broke? How do you even know your models are doing what you want them to?

The reality is that you don’t need to go it alone. There are tens of thousands of other data scientists and machine learning engineers who have walked the same path as you, and fortunately, there’s quite a bit of technology on the table you can put to work in pursuit of your goals.

Here I’ll walk you through training machine learning models on Amazon SageMaker, a fully-managed solution for building, training, and deploying machine learning models, lovingly developed by Amazon Web Services. We’ll cover how to bring your own model on SageMaker, analyze training jobs with the debugger, manage projects with experiments, and scale jobs over multiple EC2 instances.

Let’s jump in! And who knows, by the end of this tutorial, you may even get your own slick visualization to show for it.

Set up Amazon SageMaker Studio

Studio Decouples Development from Compute

First, let’s get your environment set up. SageMaker Studio is a fully-integrated IDE for machine learning. It decouples development from compute, letting you easily modify and configure your EC2 instances separately while maintaining your IDE. You can set up Studio with either IAM or SSO credentials, with more details right here.

Clone the repository

Next, clone the Github repository for SageMaker Examples. Open up Studio, create a new terminal, and run this command.

git clone https://github.com/awslabs/amazon-sagemaker-examples.git

Next, navigate to this directory: amazon-sagemaker-examples/sagemaker-debugger/mnist_tensor_plot

Then open up the notebook!

Visualizing deep learning models with SageMaker Debugger

Let’s step through this together. Right off the bat, make sure you’re adding a few dependencies.

!python -m pip install plotly
!python -m pip install smdebug
!python -m pip install sagemaker
!python -m pip install awscli
!python -m pip install nbformat==4.2.0

Once you’ve gotten those installed, we’re ready to get your job started!

These first lines are pretty generic; you’ll see most of them across the SageMaker examples. We’re importing the SageMaker Python SDK, then pointing to the new SageMaker Debugger library. This has both a Debugger Hook Config and a Collection Config. We’re going to need both of these here.
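That setup cell looks something like the sketch below. This assumes SageMaker Python SDK v2 import paths; check the notebook for the exact cell:

```python
import sagemaker
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

# A session handles talking to the SageMaker service on your behalf.
sagemaker_session = sagemaker.Session()

# The IAM role your training job will run with.
role = sagemaker.get_execution_role()
```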

Next, let’s set up our estimator!

Estimators are how we configure training jobs on SageMaker. You’ve got your execution role with the permissions to run the job, then the EC2 instance config. See how minimal that is? It’s just 2 lines of code to specify that you need an ml.m4.xlarge.

Next, we’re pointing to our entry point script. This is a file that comes with your example; it’s using the MXNet estimator to completely abstract away the Dockerfile.

Remember, as long as you can run your code on Docker, you can run it on SageMaker.

Here, we’re using what’s called script mode, or the ability to bring your own model script and scale it up on SageMaker using ephemeral clusters. These are EC2 instances that spin up when your job starts, and spin down when your job is over. This makes them a lot easier to scale, secure, and pay for.

After defining the model, along with the version of the framework, we’re ready to add the Debugger Hook Config. This tells SageMaker that (A) we want to add Debugger to our job, and (B) what we want it to do. In this case, we’re grabbing all tensors.
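Put together, the estimator cell looks roughly like this. It’s a sketch using SageMaker Python SDK v2 parameter names (the notebook may use the older train_instance_* names), and the entry-point filename and framework version here are assumptions:

```python
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

role = sagemaker.get_execution_role()

estimator = MXNet(
    entry_point="mnist.py",        # assumption: the script shipped with the example
    role=role,
    framework_version="1.6.0",     # assumption: use the version pinned in the notebook
    py_version="py3",
    instance_count=1,              # the two lines that define your cluster
    instance_type="ml.m4.xlarge",
    debugger_hook_config=DebuggerHookConfig(
        collection_configs=[
            # A custom collection that grabs every tensor the framework emits.
            CollectionConfig(
                name="all_tensors",
                parameters={"include_regex": ".*", "save_interval": "100"},
            )
        ]
    ),
)
```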

SageMaker Debugger Demystified

Don’t be misled by the name: SageMaker Debugger is a very advanced solution! The net is, you’re going to collect the tensors in your deep learning model, then analyze them. SageMaker Debugger comes with 18 rules out of the box, and you can apply these to your deep learning models with zero code changes. That is to say, as long as you are using the SageMaker deep learning containers, you don’t need to modify your script to start using SageMaker Debugger.

Launch Your Training Job

Now, fit your model, and we’re off to the races!
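Launching the job is a single call. The channel name and S3 path below are placeholders for wherever the notebook stages the MNIST data:

```python
# wait=True streams the training logs straight into your notebook.
estimator.fit({"training": "s3://my-bucket/mnist/train"}, wait=True)
```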

Creating a new training job on Amazon SageMaker

If you’re a savvy AWS user, you’ll know to navigate back to the AWS console to investigate your training job. Navigate to Training on the left-hand side of the SageMaker landing page, then select Training Jobs.

This will open up a view into your job! You’ll see its status show as Starting, along with all the details about where your data is, which image you’re using, where your model artifact is going, and the date and time.

Remember, all of your jobs in SageMaker are stored and logged by default.

This means it should be breathtakingly easy not just to monitor your jobs while they’re running, but also to go back and pick up a model exactly where you left off, even months later.

Now let’s go get those tensors.

Copy the tensors from your model to your local Studio instance

Next, run a quick command to grab the path in S3 where your tensor data is stored.

Next, run the command to copy your data from S3 to your local Studio instance.

I have to admit, these two lines are quite possibly my favorite commands on the AWS cloud. They’re so simple, and yet so effective: !aws s3 cp and !aws s3 sync. All we’re doing is moving our data from S3 to our code on Studio. And it works like a charm.
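For reference, the two commands look something like this; the bucket and job name are placeholders for the debug-output path your own job printed:

```shell
# Copy the debugger artifacts down to the local Studio volume.
aws s3 cp s3://my-bucket/my-training-job/debug-output/ debug-output/ --recursive

# Or mirror the prefix, skipping anything already downloaded.
aws s3 sync s3://my-bucket/my-training-job/debug-output/ debug-output/
```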

Visualize your Deep Learning Model

Now, here comes the fun part. Let’s use a package called tensor_plot to set up an interactive visualization of your network!

We’re pointing to a folder called debug-output; you’ll want to make sure that’s where you copied those tensors from S3.
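The call looks roughly like the sketch below. Note that tensor_plot is a helper module bundled with the example notebook, not a pip package, so the exact parameter names may differ from the version in the repository:

```python
import tensor_plot  # helper module that ships with the mnist_tensor_plot example

# Illustrative parameters only; check the notebook for the real signature.
visualization = tensor_plot.TensorPlot(
    regex=".*",           # plot every tensor we saved
    path="debug-output",  # the local folder synced from S3
    steps=10,             # how many training steps to render
)
```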

Create an interactive visual for any deep learning model with SageMaker Debugger

Pretty awesome! Remember, you can set up this visualization for any deep learning model you train on SageMaker. You just need to add that debug hook config. You can also run this on XGBoost models!

Manage Projects with SageMaker Experiments

Next, we’ll learn how to manage our projects using SageMaker Experiments.

I’ll walk you through a few code snippets from this notebook, available here.

amazon-sagemaker-examples/sagemaker-experiments/mnist-handwritten-digits-classification-experiment.ipynb
Manage machine learning projects with SageMaker Experiments

First, it’s helpful to know how this is organized in SageMaker. You have experiments, trials, and trial components. The experiment is the overarching learning project you’re working on, like your computer vision solution, or a forecasting model you’re building.

SageMaker Experiments Breakdown

That experiment is going to be broken down into multiple trials, and each trial will roughly correspond to a training job. So if you’re testing out XGBoost, KNN, Logistic Regression, and SVM for a classification project, you would list each attempt as a trial. And of course, you can specify the objective metrics that you’re interested in for those trials.

Next, each trial will have a trial component. These are going to be the steps of that trial, such as preprocessing techniques you applied, SageMaker Processing jobs you ran, or training jobs.

This is how you can trace back your results. After you’ve found a reasonable model, as long as it’s been tracked via Experiments, you can literally follow the lineage of that model to recreate it downstream.

Here are a few code snippets, along with the slick experiments visuals!

Install dependencies for SageMaker Experiments

First, just make sure to get the dependencies installed.

Next, get your imports set up, along with your SageMaker credentials.
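A minimal version of that cell might look like this, assuming the sagemaker-experiments package installed in the previous step:

```python
import boto3
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.tracker import Tracker

# The Experiments SDK talks to SageMaker through a plain boto3 client.
sm = boto3.Session().client("sagemaker")
```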

Now, let’s create an experiment via the SDK! A few things to note: (1) this is literally only four lines of code, and (2) you get one chance at naming. You’ll want to develop a naming and versioning strategy if, like me, you take a few tries to get the one you really want.

version_num = 'v1'

my_experiment = Experiment.create(
    experiment_name="my-model-{}".format(version_num),
    description="My first experiment",
    sagemaker_boto_client=sm)

Next, let’s add preprocessing steps to your experiment. Here, we’re going to log parameters that we set. In this case, that’s the mean and standard deviation we’ll use for normalization.

Note that here you can log any parameters you’re setting. You can also log SageMaker Processing jobs this way, to associate your feature engineering steps with model training.
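A sketch of that tracking cell, using illustrative normalization values (substitute whatever your preprocessing actually applies):

```python
from smexperiments.tracker import Tracker

# Record the preprocessing settings as a trial component.
with Tracker.create(display_name="Preprocessing",
                    sagemaker_boto_client=sm) as tracker:
    tracker.log_parameters({
        "normalization_mean": 0.1307,  # assumption: standard MNIST constants
        "normalization_std": 0.3081,
    })
```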

Log parameters with SageMaker Experiments

Follow the notebook to create the estimator. In that example, you’re actually going to loop through a list of candidate numbers of hidden layers to build your network around.
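Inside that loop, each candidate architecture gets its own trial attached to the experiment. The names below are illustrative:

```python
from smexperiments.trial import Trial

num_hidden_channel = 32  # e.g. one value from the list the notebook iterates over

# Trial names must be unique within your account and region.
cnn_trial = Trial.create(
    trial_name="cnn-training-job-{}-hidden-channels".format(num_hidden_channel),
    experiment_name=my_experiment.experiment_name,
    sagemaker_boto_client=sm,
)
```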

Once you’ve got the estimator created, here is how we’ll add an experiment config to it.

estimator.fit(
    inputs={'training': inputs},
    job_name=cnn_training_job_name,
    experiment_config={
        "TrialName": cnn_trial.trial_name,
        "TrialComponentDisplayName": "Training",
    }
)

On calling estimator.fit(), you’re adding the rest of the components you need to associate this training job with your experiment, i.e. logging it as a trial.

Now, click on the Experiments tab on the left, then right-click on your experiment to open the trial components view.

SageMaker Experiments Provides Visuals for Trial Components

Pretty fancy! Remember, every time you create an experiment in Studio, including with the SDK, it’s going to show up in the Experiments tab. To view the trial component list, just right-click.

Next, highlight one of the trials, and right-click to open in trial details.

This will give you the training job-specific drill down.

Once you’re in the trial details view, you can inspect the lineage of your project by immediately referencing both the training job, and the preprocessing steps.

On top of that, you can create charts to analyze your model performance in that specific trial.

Visualize results with SageMaker Experiments

And that’s a wrap! I hope you enjoyed this tutorial. Remember, you can always dig into more examples on our GitHub page right here, or the rest of our developer resources right here.
