
Exploring SageMaker Canvas

No-code Machine Learning

Image from Unsplash by Sandie Clark

Building Machine Learning models takes knowledge, experience, and a lot of time. Personas such as Business Analysts, or other professionals without ML experience, may have an ML use case they want to address but lack the expertise to do so. Even ML Engineers and Data Scientists with plenty of ML experience may simply want a model built quickly.

This brings us to the domain of AutoML. We're now seeing a plethora of AutoML solutions, from open-source APIs to individual services and platforms geared toward automating the ML workflow. SageMaker Canvas completely removes the code aspect from ML: everything is done from a visual interface. In this article we'll explore a basic example of how to work with Canvas.

What Does Canvas Support?

At the moment, Canvas supports the following data types: Categorical, Numeric, Text, and Datetime. This means you can mainly work within the Tabular and Time-Series domains for your ML use cases; Computer Vision/image data is not supported at the moment.

The other key point here is that the only supported file format is CSV. Your CSV files must come from one of the following external data sources: an S3 bucket, a Redshift database, or Snowflake. You can also give your users permission to upload local files directly to Canvas. For this example we'll work with a dataset that we've uploaded to an S3 bucket.
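Since CSV is the only accepted format, it's worth verifying that your file round-trips cleanly before importing it. A minimal sketch with pandas — the file name, column names, and bucket name are placeholders of my own, not anything Canvas requires:

```python
import pandas as pd

# Write a tiny sample CSV and read it back to confirm the header and
# delimiter survive the round trip. Swap in your real dataset.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "species": ["setosa", "setosa"],
})
sample.to_csv("iris.csv", index=False)

df = pd.read_csv("iris.csv")
print(list(df.columns))  # ['sepal_length', 'species']

# From here you could push the file to S3 for Canvas to import, e.g.:
# boto3.client("s3").upload_file("iris.csv", "my-canvas-bucket", "iris.csv")
```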

SageMaker Canvas Example

To set up SageMaker Canvas you need to create a SageMaker Domain, which is the same process as for working with SageMaker Studio. The simplest way to onboard is using Quick Setup, which you can find in the documentation.

After setup you should see the User you have just created. Click on "Launch App" and an option for Canvas will appear.

Option for Canvas (Screenshot by Author)

This application should launch within a few minutes and you’ll be able to see the UI.

Canvas UI (Screenshot by Author)

Now we can get to work on our sample Canvas job. For this example, we'll take the popular Iris dataset. Note that you may need a larger version of this dataset, because a Standard Build with Canvas requires at least 500 rows and 2 columns.
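The classic Iris dataset has only 150 rows, short of that 500-row minimum. One quick workaround is to resample the original rows — sketched here with scikit-learn; the resampling approach, row count, and column names are my own choices, not a Canvas requirement:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris as a DataFrame and replace the integer target with class names.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

# Resample with replacement to clear the 500-row Standard Build minimum.
big = df.sample(n=600, replace=True, random_state=42).reset_index(drop=True)
big.insert(0, "id", big.index)  # an ID column we'll drop later in Canvas
big.to_csv("iris_large.csv", index=False)
print(len(big))  # 600
```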

Click on the Datasets icon and you can access the S3 bucket where you've uploaded your dataset.

Upload Dataset (Screenshot by Author)

After you've uploaded your dataset you should be able to see it in the Datasets tab. A great feature of Canvas is that you can upload multiple datasets for model building. Canvas supports different SQL join statements without your having to write any actual SQL code. Check it out in the documentation here if you have a relevant use case for that scenario.
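Under the hood, a Canvas join behaves like a standard SQL or pandas join. A rough pandas equivalent, using two hypothetical tables of my own for illustration:

```python
import pandas as pd

# Two hypothetical tables sharing an "id" key.
measurements = pd.DataFrame({"id": [1, 2, 3], "sepal_length": [5.1, 4.9, 6.3]})
labels = pd.DataFrame({"id": [1, 2], "species": ["setosa", "setosa"]})

# An inner join keeps only the ids present in both tables.
joined = measurements.merge(labels, on="id", how="inner")
print(joined)  # only ids 1 and 2 survive
```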

Uploaded Datasets (Screenshot by Author)

We'll now head over to building a model. As the first step, I'll select my target dataset.

Select Dataset (Screenshot by Author)

Using this target dataset, we can head over to the Build tab. This is where we can view different automated visualizations and EDA. We can see statistics such as missing data, unique values, and feature importance that would generally take pandas, scikit-learn, or other package code to compute.
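For reference, the statistics Canvas surfaces automatically are roughly what you would otherwise derive by hand. A quick sketch with pandas on the raw Iris frame:

```python
import pandas as pd
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

missing = df.isna().sum()  # missing values per column
unique = df.nunique()      # unique values per column
print(missing)
print(unique)
print(df.describe())       # summary statistics per numeric column
```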

Automated Analysis (Screenshot by Author)

Here we can also drop columns that aren't relevant for our model, such as the ID column. This is then reflected in the Model Recipe that will be used to bake the model.
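Dropping a column from the Canvas recipe corresponds to a one-liner in pandas. A sketch — the "id" column and the tiny frame here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "sepal_length": [5.1, 4.9],
    "species": ["setosa", "setosa"],
})

# Remove the non-predictive ID column before modeling.
df = df.drop(columns=["id"])
print(list(df.columns))  # ['sepal_length', 'species']
```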

Dropping Column (Screenshot by Author)

We can also get a graphical representation of the distribution of our feature values.

Data Distribution (Screenshot by Author)

Lastly, we can also filter our datasets. Say we want to see all values for a specific column that are above a certain threshold. We can adjust this by entering our values into the filter tab.
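The filter tab is the visual counterpart of a boolean mask. Sketched in pandas on Iris — the 6.0 cm threshold is an arbitrary choice of mine:

```python
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

# Keep only rows where sepal length exceeds the threshold.
long_sepals = df[df["sepal length (cm)"] > 6.0]
print(len(long_sepals), "of", len(df), "rows pass the filter")
```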

Filtering Values (Screenshot by Author)

Now that we've taken time to understand our data, we can click on "Model Build". There are two options here: Quick Build and Standard Build. Quick Build takes 2–15 minutes and iterates as fast as possible to give you a ready-made model. Standard Build takes a few hours but is far more extensive: it also provides information such as the different candidate models, their metric scores, and the training jobs. Since we want this greater visibility, we'll go ahead with a Standard Build. Your screen should look like the following once it's done.

Standard Build Done (Screenshot by Author)

If we click on "Share with SageMaker Studio", we should be able to create a SageMaker Studio link that we can copy and paste to get more information on our Canvas job.

Here we can see the Best Model and Input dataset that we used.

Studio Link (Screenshot by Author)

If we click on AutoML Job next to our Input Dataset we can see a list of all the models that Canvas worked with under the hood.

Models Canvas Trained (Screenshot by Author)

If we click on our Best Model we can see the different parameters such as algorithm/model used.

Algorithm Used For Best Model (Screenshot by Author)

We can see here that the best algorithm for our dataset was the SageMaker XGBoost algorithm. We also get further visibility into the hyperparameters the best model used.
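Canvas trains SageMaker's managed XGBoost, which we can't reproduce exactly here. As a rough local stand-in, gradient-boosted trees from scikit-learn illustrate the kind of hyperparameters an AutoML search tunes for you — the values below are illustrative, not what Canvas chose:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameters like these are what an AutoML search sweeps over.
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```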

We also get a Performance tab that contains metrics and a confusion matrix for our classification problem.
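The Performance tab's confusion matrix can also be reproduced locally against any model. With scikit-learn, using a simple logistic regression stand-in rather than the actual Canvas model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)  # rows = true class, columns = predicted class
```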

Confusion Matrix (Screenshot by Author)

If we want complete visibility we can take a look at the Artifacts tab in the Studio link. This provides resources such as the feature engineering script and model data file for you to unpack and work with on your own if interested.

If we return to the SageMaker Canvas UI, we can run predictions using either Batch Inference or a single data point.
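A single prediction scores one row, while batch inference scores an entire dataset. The distinction, sketched with a locally trained stand-in model rather than the Canvas endpoint:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

iris = load_iris(as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(iris.data, iris.target)

single = iris.data.iloc[[0]]      # one data point, kept as a 1-row frame
print(model.predict(single))      # single prediction

preds = model.predict(iris.data)  # batch inference over the whole dataset
print(len(preds))
```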

Single Prediction (Screenshot by Author)

Additional Resources & Conclusion

SageMaker Canvas is a great place to get started if you want to remove code from the ML picture entirely. The service is newly launched, and I'm excited to see the capabilities it adds as it continues to expand. In the meantime, check out the official AWS Blog on Canvas for more information. If you're interested in a more flexible AutoML option within SageMaker, check out SageMaker JumpStart.

I hope this article was a good primer for SageMaker Canvas. Feel free to leave any feedback and check out my other SageMaker/AWS articles in the following list.


If you enjoyed this article feel free to connect with me on LinkedIn and subscribe to my Medium Newsletter. If you’re new to Medium, sign up using my Membership Referral.

