Azure Machine Learning Service — What is the Target Environment?

Part 5: An introduction to AML Environments and Compute Targets

Pankaj Jainani
Towards Data Science

--

Abstract

At this stage, we have the business problem, data from the problem domain, enough experiments, experiment metadata, and the trained model. Now, as a developer or MLOps engineer, we are still unclear about how to manage the various environments that support model deployment. This post is a starter for understanding the execution environment and compute management options available with Azure Machine Learning Service.

This post is extracted from the Kaggle notebook hosted here. Use the link to set up and execute the experiment.

Photo by Annie Spratt on Unsplash

Introduction

The runtime context for each AML experiment run consists of two elements: the Environment for the script, and the Compute Target on which the environment is deployed and the script is run.

The code runs in a virtual environment that defines the runtime and the packages installed in it via Conda or pip.

Environments are usually created as Docker containers, which are portable and can be hosted on a target compute, e.g., a dev computer, virtual machines, or container services in various public clouds.

Environment

Azure Machine Learning service manages environment creation, registration, and package installation through the Environment abstraction. This makes it possible to define consistent, reusable runtime contexts for your experiments.

Creating the Environment

In the experiments we have run so far, we used the default Conda environment via ‘local’ compute. For production-like scenarios, to gain better control over the execution environment, the AML service provides an abstraction to define a custom environment specific to the experiment’s needs. This environment definition can then be applied repeatedly to any execution environment.

The code below defines such an environment by instantiating a CondaDependencies object and passing the conda_packages and pip_packages the experiment requires.

Additional Info: There are many other ways to create and manage packages in Azure ML; see this link.

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
iris_env = Environment("iris_trn_environment")
iris_env.python.user_managed_dependencies = False
#iris_env.docker.enabled = True ## Set to True if Docker needs to be enabled
iris_env.docker.enabled = False
iris_deps = CondaDependencies.create(conda_packages=["scikit-learn","pandas","numpy"],
                                     pip_packages=["azureml-defaults",'azureml-dataprep[pandas]'])
iris_env.python.conda_dependencies = iris_deps

Register the Environment

Once the environment definition is created, we need to register it in the workspace so that it can be reused later.

iris_env.register(workspace=ws)
# carefully notice the json returned
Environment registered response as JSON. Image from my Kaggle Notebook, refer [1]

To fetch the registered environment —
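A specific registered environment can be retrieved by name with Environment.get. A minimal sketch, assuming the workspace object ws from the earlier setup (this requires a live Azure ML workspace to run):

```python
from azureml.core import Environment

# Fetch the environment registered earlier by its name
reg_env = Environment.get(workspace=ws, name="iris_trn_environment")
print(reg_env.name)
```
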

The following is the list of all registered environments, including the ‘Default’ environments already available with the AML service workspace.

## Get the list of already registered and custom environments
for env_name in Environment.list(workspace=ws):
    print("Environment Name:", env_name)
List of registered environments in the workspace. Image from my Kaggle Notebook, refer [1]

Compute Targets

AML Compute Targets are the physical or virtual machines on which we run our experiments. In general, code can be developed and tested on low-cost ‘local’ compute, trained on high-end GPU machines, and deployed for inference on low-cost CPU machines. The main advantage of using cloud computing capabilities is the ability to start/stop and scale in/out on demand, managing costs by paying only for what you use.

Types of Compute Targets

  • Local — runs the experiment on the same compute target as the code that initiates it.
  • Training Clusters — for highly scalable training requirements; distributed compute with CPU/GPU nodes, enabled and scaled on-demand.
  • Inference Clusters — containerized clusters that deploy the trained model for inference as part of an overall application.
  • Attached Compute — attaches an already provisioned Azure VM or Databricks cluster.
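As an illustration of the Inference Cluster type, an AKS-based cluster can be provisioned through the same ComputeTarget API. A hedged sketch, assuming the workspace ws; the cluster name and VM size here are illustrative, not from the original experiment:

```python
from azureml.core.compute import ComputeTarget, AksCompute

# Provision a small AKS inference cluster (name and sizing are illustrative)
aks_config = AksCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                   agent_count=3)
aks_cluster = ComputeTarget.create(ws, 'aks-inference', aks_config)
aks_cluster.wait_for_completion(show_output=True)
```
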

Create Compute Targets

One can provision or attach the compute either from the Compute page option in AML studio or using the SDK —

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
cluster_name = 'aml-cluster'
try:
    trn_cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D1', max_nodes=4)
    trn_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
trn_cluster.wait_for_completion(show_output=True)

The code above provisions the training cluster aml-cluster: it first tries to fetch an existing cluster by name and, if none exists, defines the VM size and maximum node count and registers the new cluster with the workspace.
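To keep costs down between runs, the cluster's scale settings can be adjusted so it releases nodes when idle. A sketch assuming the trn_cluster object created above; the specific values are illustrative:

```python
# Scale the training cluster down to zero nodes when idle (values are illustrative)
trn_cluster.update(min_nodes=0,
                   max_nodes=4,
                   idle_seconds_before_scaledown=1200)

# Or remove the cluster entirely once experimentation is finished
# trn_cluster.delete()
```
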

Provisioned ‘aml-cluster’ for training purposes. Image from my Kaggle Notebook, refer [1]

Training using the Registered Environment and Managed Compute

Finally, it is time to apply our learning to an actual experiment. We will be using a similar experiment to the one described in my earlier blog posts:

The experiment will take quite a bit longer this time, because a container image must be built with the Conda environment, and then the cluster nodes must be started and the image deployed before the script can run.
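If you want to avoid paying that image-build cost on the first run, the Docker image for a registered environment can be built ahead of time. A sketch, assuming the workspace ws and the registered environment fetched earlier:

```python
from azureml.core import Environment

# Fetch the registered environment and pre-build its Docker image
reg_env = Environment.get(workspace=ws, name='iris_trn_environment')
build = reg_env.build(workspace=ws)
build.wait_for_completion(show_output=True)
```
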

from azureml.train.sklearn import SKLearn ## note - directly using SKLearn as estimator, hence avoid using conda_packages
from azureml.core import Experiment, Dataset, Environment
from azureml.widgets import RunDetails
# Get the previously registered Environment.
reg_env = Environment.get(ws,'iris_trn_environment')
# Get the previously registered tabular flower dataset
tab_dataset = ws.datasets.get('flower tab ds')
# Create an estimator, look into the 'compute_target' & 'environment_definition' additional parameters and their values
estimator = SKLearn(source_directory=tab_ds_experiment, ## the experiment folder for this context
                    entry_script='iris_simple_DTexperiment.py',
                    compute_target=cluster_name, ## the cluster created above
                    environment_definition=reg_env, ## the registered Environment created earlier
                    use_docker=False,
                    inputs=[tab_dataset.as_named_input('flower_ds')] ## pass 'tab_dataset' as input to the experiment
                    )
# Create an experiment
experiment_name = 'iris-compute-experiment'
experiment = Experiment(workspace = ws, name = experiment_name)
# Run the experiment based on the estimator
run = experiment.submit(config=estimator)
# Get Run Details
RunDetails(run).show()
# Wait to complete the experiment. In the Azure Portal we will find the experiment state as preparing --> finished.
run.wait_for_completion(show_output=True)
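Once the run has finished, its logged metrics can be inspected and the trained model registered in the workspace for later deployment. A sketch assuming the run object above; the model name and output path are assumptions about what the training script saved, not values from the original notebook:

```python
# Inspect metrics logged during the remote run
metrics = run.get_metrics()
for name, value in metrics.items():
    print(name, value)

# Register the trained model from the run's outputs (path is an assumption)
model = run.register_model(model_name='iris_dt_model',
                           model_path='outputs/iris_model.pkl')
print(model.name, model.version)
```
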

Below are the snapshots from the successful experiment execution —

Submission of experiment and triggered nodes ready. Image from my Kaggle Notebook, refer [1]
Experiment completed successfully on AML-CLUSTER. Image from my Kaggle Notebook, refer [1]

Conclusion

In this part, we have covered the most beautiful part of our MLOps lifecycle: enabling deployments of our models/experiments across various environments. The intention was to provide a brief introduction to how the AML service handles the environment and compute lifecycle, so the extent of this post is necessarily limited. I strongly recommend looking into the Microsoft documentation for the Azure Machine Learning service, as new capabilities are added constantly.

What Next?

This is a series of blog posts giving a detailed overview of various Azure Machine Learning capabilities; the URLs for the other posts are as follows:

Connect with me on LinkedIn to discuss further

References

[1] Notebook & Code — Azure Machine Learning — Execution & Compute, Kaggle.
[2] Azure ML Service, What are compute targets in Azure Machine Learning?, Official Documentation, Microsoft Azure.
[3] Azure Machine Learning Service Official Documentation, Microsoft Azure.


Here to learn about Artificial Intelligence & Cloud Computing | LinkedIn: http://www.linkedin.com/in/p-jainani | Twitter: @pankajjainani