
Automating AWS SageMaker notebooks

Learn how to schedule automatic execution of AWS SageMaker notebooks using CloudWatch, Lambda and Lifecycle configurations

Taufeeq Rahmani · Towards Data Science · Jun 26, 2020


Introduction

SageMaker provides multiple tools and functionalities to label, build, train, and deploy machine learning models at scale. One of the most popular is Notebook Instances, which are used to prepare and process data, write code to train models, deploy models to Amazon SageMaker hosting, and test or validate those models. I was recently working on a project that involved automating a SageMaker notebook.

There are multiple ways to deploy models in SageMaker using AWS Glue, as described here and here. You can also deploy models using an endpoint API. But what if you are not deploying models at all, and instead need to execute the same script again and again? SageMaker does not currently have a direct way to automate this. Also, what if you want to shut down the notebook instance as soon as the script finishes executing? That will save you money, given that AWS charges for notebook instances for as long as they are running.

How do we achieve this?

Additional AWS features and services being used

  • Lifecycle Configurations: A lifecycle configuration provides shell scripts that run only when you create the notebook instance or whenever you start one. They can be used to install packages or configure notebook instances.
  • AWS CloudWatch: Amazon CloudWatch is a monitoring and observability service. It can be used to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side and take automated actions.
  • AWS Lambda: AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume — there is no charge when your code is not running.

Broad steps used to automate:

  • A scheduled CloudWatch rule triggers the execution by invoking a Lambda function.
  • The Lambda function starts the respective notebook instance.
  • As soon as the notebook instance starts, the Lifecycle configuration gets triggered.
  • The Lifecycle configuration executes the script and then shuts down the notebook instance.

Detailed Steps

Lambda Function

We use a Lambda function to start the notebook instance. Let’s say the Lambda function is called ‘test-lambda-function’. Make sure to choose an execution role that has permissions to access both Lambda and SageMaker.
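A minimal sketch of what such a function might look like in Python with boto3 is shown below; the handler name and return value are illustrative assumptions, and error handling is omitted.

    import boto3

    def lambda_handler(event, context):
        # Start the stopped notebook instance; the lifecycle configuration
        # attached to it will run as soon as the instance comes up.
        client = boto3.client('sagemaker')
        client.start_notebook_instance(
            NotebookInstanceName='test-notebook-instance'
        )
        return {'status': 'Starting test-notebook-instance'}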

Here ‘test-notebook-instance’ is the name of the notebook instance we want to automate.

CloudWatch

  • Go to Rules > Create rule.
  • Enter the schedule, i.e. how frequently the rule should fire.
  • Choose the Lambda function name: ‘test-lambda-function’. This is the same function we created above (a programmatic equivalent is sketched after this list).
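The same rule can also be created programmatically. Below is a rough boto3 sketch assuming a daily schedule; the rule name, statement ID, and the placeholder Lambda ARN are made up for illustration and should be replaced with your own values.

    import boto3

    events = boto3.client('events')
    lambda_client = boto3.client('lambda')

    # Scheduled rule that fires once a day; adjust the expression as needed,
    # e.g. 'cron(0 6 * * ? *)' for 6 AM UTC every day.
    rule = events.put_rule(
        Name='run-test-notebook-daily',
        ScheduleExpression='rate(1 day)',
        State='ENABLED',
    )

    # Point the rule at the Lambda function created earlier.
    events.put_targets(
        Rule='run-test-notebook-daily',
        Targets=[{
            'Id': 'test-lambda-function',
            'Arn': 'arn:aws:lambda:<region>:<account-id>:function:test-lambda-function',
        }],
    )

    # Allow CloudWatch Events to invoke the function.
    lambda_client.add_permission(
        FunctionName='test-lambda-function',
        StatementId='allow-cloudwatch-schedule',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule['RuleArn'],
    )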

Lifecycle Configuration

We will now create a lifecycle configuration for our ‘test-notebook-instance’. Let us call this lifecycle configuration ‘test-lifecycle-configuration’.

The code:
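What follows is a minimal sketch of such an on-start script. The environment name, notebook path, and idle time are assumptions to adjust for your setup; the auto-stop script is downloaded from the aws-samples lifecycle configuration samples repository.

    #!/bin/bash
    set -e

    # 1. Activate a Python environment (the environment name is an assumption).
    ENVIRONMENT=python3
    NOTEBOOK_FILE="/home/ec2-user/SageMaker/test-notebook.ipynb"
    source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"

    # 2. Execute the notebook in the background so the lifecycle script itself
    #    finishes within SageMaker's five-minute limit for lifecycle scripts.
    nohup jupyter nbconvert --to notebook --inplace --execute "$NOTEBOOK_FILE" &
    source /home/ec2-user/anaconda3/bin/deactivate

    # 3. Download the AWS sample auto-stop script.
    wget -O /home/ec2-user/autostop.py https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py

    # 4. Treat the instance as idle after 60 seconds; increase or decrease as required.
    IDLE_TIME=60

    # 5. Cron job that checks every five minutes and stops the instance once it is idle
    #    (the running notebook counts as activity, so it is not stopped mid-execution).
    (crontab -l 2>/dev/null; echo "*/5 * * * * /usr/bin/python /home/ec2-user/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -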

Brief explanation of what the code does:

  1. Activate a Python environment
  2. Execute the Jupyter notebook
  3. Download an AWS sample Python script containing auto-stop functionality
  4. Wait one minute (this can be increased or decreased as required)
  5. Create a cron job to execute the auto-stop Python script

After this, we attach the lifecycle configuration to our notebook instance.
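This can be done from the console by editing the notebook instance, or programmatically while the instance is stopped. A minimal boto3 sketch using the names assumed above:

    import boto3

    sagemaker = boto3.client('sagemaker')

    # The notebook instance must be stopped before its lifecycle
    # configuration can be changed.
    sagemaker.update_notebook_instance(
        NotebookInstanceName='test-notebook-instance',
        LifecycleConfigName='test-lifecycle-configuration',
    )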

I would love to connect on LinkedIn: https://www.linkedin.com/in/taufeeqrahmani/

Check out the work of my friends at Data Sleek for solutions on all things data.
