AWS SageMaker

Build, Train and Deploy a ML model on Amazon SageMaker

Vysakh Nair
Towards Data Science


Let’s start with a short and simple introduction to SageMaker to understand what we are working with; later we’ll dive into the ML tutorial!

What is AWS SageMaker?

Amazon SageMaker is a cloud machine-learning platform (just like your Jupyter notebook environment, but on the cloud) that helps users build, train, tune and deploy machine learning models in a production-ready hosted environment.

Some Benefits of Using AWS SageMaker

  • Highly Scalable
  • Fast Training
  • Maintains Uptime — processes keep running without interruption.
  • High Data Security

Machine Learning with SageMaker

SageMaker comes with a lot of built-in, optimized ML algorithms that are widely used for training. To build a model, we need data. We can either collect and prepare the training data ourselves or pull it from Amazon S3 buckets, AWS’s storage service (kind of like the hard drive in your system). Let’s see how we can make use of these services to build an end-to-end ML project.

Now Let’s Start Building our Model on SageMaker

The main focus of this tutorial is working with SageMaker and the libraries used; ML concepts themselves won’t be explained here.

NOTE: You should have an AWS account for performing these tasks.

1. Create Notebook Instance

Just like you create a Jupyter notebook on your own system, we will create one on the SageMaker platform. Below are the steps for doing that:

  • Sign into the AWS SageMaker Console.
  • Click on Notebook Instances and then choose create notebook instance.
  • On the next page, name your notebook, keep the instance type and elastic inference as default and select the IAM role for your instance.
Image by Author | From the dropdown, select create a new role and select Any S3 bucket

IAM (Identity and Access Management) Role:

In short, SageMaker and S3 are separate services provided by AWS. Our notebook instance needs the data we store in the S3 bucket to build the model, but one AWS service can’t directly access another. Therefore a role must be attached so that the notebook instance can read data from the S3 bucket (you can verify the attached role from inside the notebook; see the snippet after these steps). You can grant the instance access to specific S3 buckets or to all S3 buckets.

  • After creating the role, click on create notebook instance.
  • It takes a couple of minutes for the instance to be created. After that, click Open Jupyter and select the notebook environment you want to work with.
Image by the Author | I have chosen conda_python3 for this tutorial

There you have it. Your notebook has been created.
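Once the notebook is running, you can confirm which IAM role it is using. A minimal sketch with the SageMaker Python SDK:

from sagemaker import get_execution_role
role = get_execution_role()  # the IAM role attached to this notebook instance
print(role)                  # ARN of the role SageMaker will use to access S3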

2. Understanding Libraries Used

In this section, we will look at all the libraries required to perform the task:

Libraries Used

As mentioned before, SageMaker contains a lot of built-in ML algorithms that we can use. To use those algorithms we need the sagemaker library. The built-in algorithms are packaged as container images, so get_image_uri helps us access those containers.

If you are using sagemaker, you also need the boto3 library. Just like you use pandas to read data from your local system, boto3 helps us access data in S3 buckets, provided access to those buckets has been granted (remember the IAM role?). Finally, to use a SageMaker instance we have to create a session; the Session class is used for that.
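Putting that together, here is a minimal sketch of the imports used in the rest of this tutorial (written against the older SageMaker Python SDK v1, where get_image_uri lives; in SDK v2 the equivalent helper is sagemaker.image_uris.retrieve):

import boto3                                                  # access S3 and other AWS services
import sagemaker                                              # SageMaker Python SDK
from sagemaker import get_execution_role                      # fetch the notebook's IAM role
from sagemaker.session import Session                         # manages SageMaker API interactions
from sagemaker.amazon.amazon_estimator import get_image_uri   # container URI of a built-in algorithm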

3. Creating S3 Buckets

Image by the Author | Initially there are no buckets

S3 buckets can be created manually or from our notebook instance using boto3. In this tutorial we will be using boto3 to create one.

Code for creating the S3 bucket

In AWS there are multiple regions and each user works in their own region. By default, the bucket is created in the US East (N. Virginia) Region (us-east-1), so if your region is anything other than us-east-1, you have to explicitly specify it while creating the bucket.

s3 = boto3.resource('s3')                        # S3 service resource
bucket_name = 'your-unique-bucket-name'          # bucket names are globally unique
my_region = boto3.session.Session().region_name  # this gives you your region
s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})  # explicitly add the location constraint

Bucket names are GLOBALLY unique! AWS will give you an ‘IllegalLocationConstraintException’ error if your name collides with an already existing bucket and you’ve specified a region different from the region of that bucket. If you happen to guess the correct region of the existing bucket, it will give you a BucketAlreadyExists exception instead.
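If you want to catch these cases in code, here is a hedged sketch; boto3 surfaces them as a botocore ClientError whose error code names the problem:

from botocore.exceptions import ClientError
try:
    s3.create_bucket(Bucket=bucket_name,
                     CreateBucketConfiguration={'LocationConstraint': my_region})
except ClientError as e:
    code = e.response['Error']['Code']
    if code in ('BucketAlreadyExists', 'BucketAlreadyOwnedByYou', 'IllegalLocationConstraintException'):
        print('Bucket name collision or region mismatch:', code)
    else:
        raise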

Along with that, there are some naming conventions that have to be kept in mind while naming them:

  • Bucket names must be between 3 and 63 characters long.
  • Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
  • Bucket names must begin and end with a letter or number.
  • Bucket names must not be formatted as an IP address (for example, 192.168.5.4)
  • Bucket names can’t begin with xn-- (for buckets created after February 2020).
  • Bucket names must be unique within a partition. A partition is a grouping of Regions. AWS currently has three partitions: aws (Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud [US] Regions).
  • Buckets used with Amazon S3 Transfer Acceleration can’t have dots (.) in their names. For more information about transfer acceleration, see Amazon S3 Transfer Acceleration.

4. Loading Data into S3

We will first divide our data into train and test. Then we will load it into S3.

An important point to keep in mind while using SageMaker is that the built-in algorithms expect the dependent (target) feature to be the first column of the dataset. So if your dataset’s first column is not the dependent feature, make sure you move it there, as in the snippet below.
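For example, a hypothetical snippet that moves a target column (here assumed to be named 'y') to the front of a pandas DataFrame named df:

# df is assumed to be a pandas DataFrame and 'y' its target column
cols = ['y'] + [c for c in df.columns if c != 'y']   # reorder so the target comes first
df = df[cols]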

Loads both train and test data into the S3 Bucket

s3_input_train and s3_input_test contain the paths of the uploaded train and test data in the S3 bucket, which will be used later during training.
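As a sketch of what that upload looks like (SDK v1-style sagemaker.s3_input; the prefix and the train_data/test_data DataFrames are assumed names, with the target already in the first column; SDK v2 replaces s3_input with sagemaker.inputs.TrainingInput):

import os
prefix = 'xgboost-tutorial'                                  # assumed key prefix inside the bucket
train_data.to_csv('train.csv', index=False, header=False)    # built-in XGBoost expects headerless CSV
test_data.to_csv('test.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
s3_input_test = sagemaker.s3_input(s3_data='s3://{}/{}/test'.format(bucket_name, prefix), content_type='csv')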

5. Building and Training the Model

Model

The container retrieves the built-in XGBoost image by specifying the region name. The Estimator handles the end-to-end Amazon SageMaker training and deployment tasks; we specify the algorithm we want to use by passing that container as the image. s3_input_train and s3_input_test specify the locations of the train and test data in the S3 bucket, the paths we identified in step 4.
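A hedged sketch of that step, again in SDK v1 style (the hyperparameters, XGBoost version and instance type below are placeholders, not the article's exact values):

container = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='1.0-1')  # built-in XGBoost image
xgb = sagemaker.estimator.Estimator(container,
                                    role=sagemaker.get_execution_role(),
                                    train_instance_count=1,
                                    train_instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
                                    sagemaker_session=sagemaker.Session())
xgb.set_hyperparameters(max_depth=5, eta=0.2, objective='binary:logistic', num_round=100)  # placeholder values
xgb.fit({'train': s3_input_train, 'validation': s3_input_test})  # launches the training job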

6. Deployment

xgb_predictor = xgb.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

The trained model can then be deployed using the above line of code. initial_instance_count specifies the number of instances that will host the endpoint; more instances let the endpoint serve more prediction requests in parallel.

7. Prediction

The deployed endpoint can then be used for predicting results; a sketch of the prediction code is shown below.
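A hedged sketch of running predictions against the endpoint (SDK v1-style serializers; test_data is the assumed DataFrame with the target in the first column):

import numpy as np
from sagemaker.predictor import csv_serializer

test_array = test_data.drop(test_data.columns[0], axis=1).values   # features only, target column dropped
xgb_predictor.content_type = 'text/csv'                            # send the payload as CSV
xgb_predictor.serializer = csv_serializer                          # serialize the numpy array to CSV
predictions = xgb_predictor.predict(test_array).decode('utf-8')    # comma-separated scores come back as bytes
predictions_array = np.array(predictions.split(','), dtype=float)  # parse into a numpy array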

8. Clean Up

In this step, you terminate all the resources you used. Terminating resources that are not actively being used reduces costs and is a best practice. Not terminating your resources will result in charges to your account.

# Delete your deployed endpoint and model
xgb_predictor.delete_endpoint()
xgb_predictor.delete_model()
# Empty your S3 bucket (deletes all objects inside it)
bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
bucket_to_delete.objects.all().delete()

Finally, stop and delete your SageMaker notebook instance:

  1. Open the SageMaker Console.
  2. Under Notebooks, choose Notebook instances.
  3. Choose the notebook instance that you created for this tutorial, then choose Actions, Stop. The notebook instance takes up to several minutes to stop. When Status changes to Stopped, move on to the next step.
  4. Choose Actions, then Delete.
  5. Choose Delete.

There you have it! This is how you can build an end-to-end ML model using AWS SageMaker.

The entire code for this tutorial can be accessed from my GitHub.

Feel free to connect with me on LinkedIn.

Hope you enjoyed this Tutorial. Thanks for reading :)
