Step-by-step guide to scaled ML environment provisioning on AWS
Joint post with Nivas Durairaj
The Gutenberg printing press was revolutionary in its time. Suddenly, publishers could print thousands of book pages per day, compared to the few handwritten pages produced before. It enabled the rapid dissemination of knowledge in Europe and helped open the era of the Renaissance.

Today, large enterprises need to deliver hundreds of ML projects to their business, and to do so in a secure and governed manner. To accelerate ML delivery, they need to provision ML environments with guardrails in minutes: a printing press of sorts, giving ML teams quick access to working environments and to the scaffolding needed to operationalize their solutions.
I recently published a guide on how to do that on AWS. Here, we will put it into practice. I will share, in three steps, how you can modernize your cloud operations to scale ML delivery.

Walkthrough overview
We will tackle this in three steps:
- First, we will set up our ML platform foundations with AWS Control Tower and AWS Organizations. We will adopt a multi-account strategy, with each ML project operating in a separate account.
- Then, we will enable self-service provisioning of templated ML environments with AWS Service Catalog and Amazon SageMaker, allowing ML teams to provision approved environments in their accounts in minutes.
- Finally, we will see how ML teams can launch and access their governed ML environments.
Prerequisites
To follow this post, make sure you:
- Visit Using AWS Control Tower to Govern Multi-Account AWS Environments if Control Tower is new to you.
- Read Setting up secure, well-governed machine learning environments on AWS, as we will apply the concepts presented there.
- Familiarize yourself with Enabling self-service provisioning of Amazon SageMaker Studio resources, as we will reuse the same approach and Service Catalog portfolio for self-service.
Step 1: Enabling ML projects with modern cloud operations
First, we want ML teams to access a secure and compliant AWS account every time they have a new project. Here, we keep it simple and create one AWS account per project.

Setting up your Landing Zone and creating a Workloads OU
Navigate to the AWS Control Tower console to set up your landing zone. See Getting started with AWS Control Tower for details on how to launch one.

Once launched, it should take about half an hour for the process to finish.

Note – The log archive account under the Security OU can act as a consolidation point for log data gathered from all the accounts under the Workloads OU. It can be used by your security, operations, audit, and compliance teams to support regulatory requests.
Applying Control Tower guardrails and SCPs for ongoing governance
You can set up Control Tower guardrails to provide ongoing governance for your overall AWS environment, and service control policies (SCPs) to control the maximum available permissions for all accounts under the Workloads OU.
For illustrative purposes, we will use the same example SCP as in this blog. It prevents ML teams from launching SageMaker resources in their accounts unless a VPC subnet is specified:
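The SCP looks along these lines (a sketch: the exact list of denied SageMaker actions should match the policy in the referenced blog; the `sagemaker:VpcSubnets` condition key denies the call when no subnet is supplied):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenySageMakerWithoutVpcSubnets",
      "Effect": "Deny",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateNotebookInstance"
      ],
      "Resource": "*",
      "Condition": {
        "Null": {
          "sagemaker:VpcSubnets": "true"
        }
      }
    }
  ]
}
```

You attach the SCP to the Workloads OU so it applies to every project account created under it.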


Creating project accounts with the account factory
You can now create new accounts on demand, using the Control Tower Account Factory.

The account factory is itself a Service Catalog product, so you can create accounts through the console. At scale, you may prefer to create them programmatically.
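Programmatic creation can be sketched with the AWS CLI as below (the product, artifact, account, and email values are placeholders you would replace with your own; the command requires credentials in the management account):

```shell
# Find the Account Factory product and its current provisioning artifact
aws servicecatalog search-products \
  --filters FullTextSearch="AWS Control Tower Account Factory"

# Provision a new project account under the Workloads OU (all values are examples)
aws servicecatalog provision-product \
  --product-id prod-EXAMPLE1111 \
  --provisioning-artifact-id pa-EXAMPLE2222 \
  --provisioned-product-name MLProjectA \
  --provisioning-parameters \
      Key=AccountName,Value=MLProjectA \
      Key=AccountEmail,Value=ml-project-a@example.com \
      Key=SSOUserEmail,Value=jane.doe@example.com \
      Key=SSOUserFirstName,Value=Jane \
      Key=SSOUserLastName,Value=Doe \
      Key=ManagedOrganizationalUnit,Value=Workloads
```

This is the same action the account factory UI performs, which makes it easy to wire account creation into your own automation later.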
Managing user authentication and permissions
Next, you need to manage user authentication and permissions in the accounts. Here, I use AWS SSO, and you can follow the process in this video. Feel free to use the identity provider of your choice:


SageMaker provides service-specific resources, actions, and condition context keys that you can add to the permission sets. Also see this page for managing permissions to other AWS services.
You can create permission sets for different ML project personas. Here are a few examples:
- Data scientist – experiments with ML approaches.
- ML engineer – handles CI/CD, model monitoring, and artifacts.
- ML platform engineer (admin) – administers the ML platform.
- Audit and compliance team – has read access to the Log Archive account, where it can verify the compliance of workloads.
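As an illustration, a data scientist permission set could carry an inline policy along these lines (a sketch only; the action list is an assumption you should scope to your own needs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DataScientistSageMakerAccess",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedDomainUrl",
        "sagemaker:Describe*",
        "sagemaker:List*",
        "sagemaker:Search"
      ],
      "Resource": "*"
    }
  ]
}
```

Pairing narrow permission sets like this with the SCPs from earlier gives you defense in depth: the SCP caps what any principal in the account can do, and the permission set grants only what the persona needs.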

You should now have multi-account foundations in your ML platform. When a new ML project starts, you can create a new AWS account with guardrails, and provide users with access to it. The process takes only a few minutes.
Step 2: Self-serving templated ML environments
Now that your ML teams have access to accounts within minutes, they need to access working environments, and scaffolding to operationalize their solutions.
We will create a Service Catalog portfolio in the Control Tower management account and share it with the ML project accounts.


Creating a Service Catalog portfolio in the management account
For this you can reuse the approach from Enabling self-service provisioning of Amazon SageMaker Studio resources. It will allow you to automate the deployment of SageMaker products using the AWS Service Catalog Factory.

Note – For illustrative purposes, I put the Service Catalog Factory in the Control Tower management account. In real life, your ML platform team may have dedicated accounts to build, test, and deploy products and portfolios.
Sharing the portfolio with the ML project accounts
Now we will share the Service Catalog portfolio with all accounts under the Workloads OU.

The process is very easy and you can follow the steps in this video:
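If you prefer the CLI, the organization-wide share can be sketched as follows (the portfolio and OU IDs are placeholders; run this in the management account):

```shell
# One-time: allow Service Catalog to share through AWS Organizations
aws servicecatalog enable-aws-organizations-access

# Share the portfolio with every account under the Workloads OU (IDs are examples)
aws servicecatalog create-portfolio-share \
  --portfolio-id port-EXAMPLE3333 \
  --organization-node Type=ORGANIZATIONAL_UNIT,Value=ou-abcd-11112222
```

Sharing at the OU level means accounts created later by the account factory inherit the share automatically.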

As an ML platform admin, you can log in to the project accounts and accept the portfolio.


Then, you can provide ML teams with access to the imported portfolio in their account.
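The accept-and-grant steps can also be scripted from within a project account (the portfolio ID, account ID, and role name are placeholders):

```shell
# In the project account: accept the portfolio shared through Organizations
aws servicecatalog accept-portfolio-share \
  --portfolio-id port-EXAMPLE3333 \
  --portfolio-share-type AWS_ORGANIZATIONS

# Grant the ML team's role access to the imported portfolio (ARN is an example)
aws servicecatalog associate-principal-with-portfolio \
  --portfolio-id port-EXAMPLE3333 \
  --principal-type IAM \
  --principal-arn arn:aws:iam::111122223333:role/MLTeamRole
```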

From now on, creating new ML project accounts should take no more than a few minutes. Updates you make to the Service Catalog portfolio will automatically be reflected in the project accounts, allowing you to continuously deploy new products.
Step 3: Launching an ML environment in a project account
Now for the easy part! We will use one of our SSO users to log in to the MLProjectA account and launch SageMaker Studio.



Note – For illustrative purposes, the example Studio domain will look for the default VPC with a public subnet. I create one with the aws ec2 create-default-vpc CLI command. In the real world, you will want the Studio domain to run in a private subnet.

Conclusion
A multi-account strategy and self-service provisioning of governed ML environments allow enterprises to scale their ML delivery, letting ML teams start working in approved environments within minutes.
In this post, I have shared how an ML platform team can quickly provision secure, well-governed ML environments with AWS Control Tower, AWS Organizations, AWS Service Catalog, and Amazon SageMaker.
To go further, you can visit the AWS Management and Governance Lens and Industrializing an ML platform with Amazon SageMaker Studio.