Azure Automated ML Listens to Its Designers

Part 3 in this autoML series: Microsoft, the Jedi Master

Image by Gerd Altmann from Pixabay (source: https://pixabay.com/illustrations/artificial-intelligence-brain-think-3382507/)

For the third installment of this AutoML series, I continue to test the ease of use of various autoML tools. Focusing on the basic out-of-the-box offerings and free trials, I’m now in the world of Microsoft. I sent our "Watson" training file into Microsoft’s autoML offering, Azure Automated ML. Overall, the Azure autoML user experience was much better than both AWS AutoPilot and Google autoML Tables. The experience felt more polished, even though the studio is still in preview. You could almost feel that designers had a say in the layout and navigation of the screens. Nice job!

Why Azure Automated ML?

Microsoft is the Jedi Master! Recent rulings regarding the Pentagon’s JEDI cloud computing contract have put investment and development back on the front burner. That infusion of funds can only mean that anything machine learning or artificial intelligence related is going to get some quality attention. While the ML studio is in ‘preview’ status, it appears to work without issue.

preview notice – screenshot by the author

The setup

If you don’t already have an account, you can sign up for an Azure free subscription. This deal will give you a credit of $200 you can use within 30 days.

I followed the setup in this tutorial. While it walks you through the various steps, I found the screens to be pretty intuitive. Just start in the Portal – https://portal.azure.com/#home.

Automated ML is accessible via Machine Learning Services. Automated ML requires the Enterprise edition, so choose that option when you come across it.

screenshot by the author
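
If you’d rather script the setup than click through the portal, the same workspace can be created with the Python SDK. A minimal sketch, assuming the azureml-core package and placeholder subscription details (the workspace and resource group names here are made up):

```python
# pip install azureml-core
from azureml.core import Workspace

ws = Workspace.create(
    name="automl-demo",                   # hypothetical workspace name
    subscription_id="<subscription-id>",  # placeholder -- substitute your own
    resource_group="automl-rg",           # placeholder resource group
    create_resource_group=True,
    location="eastus",
    sku="enterprise",  # Automated ML required the Enterprise edition at the time
)
ws.write_config()  # saves config.json so later scripts can use Workspace.from_config()
```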

Once the service is ready for you, the studio is available. Automated ML options are evident on the home page.

screenshot by the author

The Data

To keep parity across the tools in this series, I’m sticking with the Kaggle training file from Contradictory, My Dear Watson: detecting contradiction and entailment in multilingual text using TPUs. In this Getting Started competition, we classify pairs of sentences (consisting of a premise and a hypothesis) into three categories: entailment, contradiction, or neutral.

6 columns × 13k+ rows – Stanford NLP documentation

  • id
  • premise
  • hypothesis
  • lang_abv
  • language
  • label

Model training costs

When Microsoft waved the ‘free’ $200, I didn’t research the cluster costs much. At the same time, the pricing wasn’t right in my face as it was with AWS and Google. Curious. I did find a pricing doc, but it wasn’t straightforward.

I looked on the console for Billing! Check your Billing! Nope, it is under the more sinister-sounding Cost Management. Training this example knocked 50 cents off my Azure Bucks total. Not bad.

screenshot by the author

Loading the data

A nice feature of this tool is that you can load the data as part of your current workflow. Unlike with Google, there’s no need to go outside the studio to stage the file in cloud storage. No jumping in and out of screens.

screenshot by the author
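
The upload can also be scripted. A sketch, assuming azureml-core, the config.json saved earlier, and the Kaggle train.csv sitting in the working directory:

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()

# Push the Kaggle file to the workspace's default datastore...
datastore = ws.get_default_datastore()
datastore.upload_files(files=["train.csv"], target_path="watson/", overwrite=True)

# ...then register it as a tabular dataset so Automated ML can consume it.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "watson/train.csv"))
dataset = dataset.register(workspace=ws, name="contradictory-watson")
```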

Training your model

Next, I had to configure the training run. I created a new compute cluster using low-priority settings rather than dedicated resources. Low-priority VMs are generally cheaper, and I want to stretch my 200 Azure Bucks as far as I can.

screenshot by the author
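
The same cluster can be provisioned from the SDK. A sketch, assuming azureml-core and a made-up cluster name:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Low-priority VMs can be preempted mid-run, but cost a fraction of dedicated ones.
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",  # assumption: any modest general-purpose size will do
    vm_priority="lowpriority",  # the cheaper option chosen in the UI
    min_nodes=0,                # scale to zero when idle so the cluster stops billing
    max_nodes=4,
)
cluster = ComputeTarget.create(ws, "low-pri-cluster", config)
cluster.wait_for_completion(show_output=True)
```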

On the next screen, I checked Classification and enabled deep learning. I should have perused the additional configuration settings a bit more than I did, because I later realized the default maximum run time was 24 hours. No thanks.

screenshot by the author
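
For reference, the equivalent run configuration in the SDK looks roughly like this. A sketch, assuming the azureml-train-automl package, the dataset and cluster from the earlier sketches, and a one-hour cap in place of that 24-hour default:

```python
from azureml.core import Experiment, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# `dataset` and `cluster` are the objects created in the earlier sketches.
automl_config = AutoMLConfig(
    task="classification",       # matches the Classification checkbox
    training_data=dataset,       # the registered tabular dataset
    label_column_name="label",
    primary_metric="accuracy",
    enable_dnn=True,             # the "enable deep learning" toggle
    experiment_timeout_hours=1,  # override that 24-hour default
    compute_target=cluster,
)
run = Experiment(ws, "watson-automl").submit(automl_config, show_output=True)
```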

There is a status indicator that lets you know where the process is. My favorite status was Generation Dataset Feature. What kind of features?

gif by the author

During the training, I poked around the interface. I was able to see a data profile of the input file.

screenshot by the author

Evaluate Training Results

When the models finish training, the leader is picked out and displayed on a dashboard. I was disappointed that the algorithm names were not links to details about those algorithms; they just took me to a similar details page. I’ll have to assume they refer to [sklearn.preprocessing.MaxAbsScaler](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing) and an ensemble of random forests.

screenshot by the author
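
For what it’s worth, those names map onto a plain scikit-learn pipeline along these lines. This is a sketch of my assumption about what the winner amounts to, not Azure’s actual internals:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# MaxAbsScaler scales each feature to [-1, 1] without destroying sparsity,
# which suits the sparse text features Automated ML generates.
pipeline = make_pipeline(
    MaxAbsScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
# pipeline.fit(X_train, y_train) then pipeline.predict(X_test)
```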

There is a tab called Data Guardrails. The concept is good, and I’m glad to see it included. I would have liked a tooltip on this page with more information; I tried searching for Data Guardrails and had trouble finding details. Even with minimal information, I like it. It will reappear in my fantasy autoML tool.

screenshot by the author

Hit Go and See What Happens

The actual results right out of the box? The models didn’t perform as well as the Google models. I liked the charts provided, and they made it clear the winner barely performed better than chance. I haven’t focused much on the actual metric results because I know, as well as the next data scientist, that hyperparameter tuning is required, even for autoML. As such, I want to stick to the basics to keep the comparison even across platforms.

screenshot by the author
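
Even for a run launched from the studio, the numbers can be pulled back programmatically. A sketch, assuming azureml-train-automl and a run ID copied from the UI (the experiment name and run ID here are hypothetical):

```python
from azureml.core import Experiment, Workspace
from azureml.train.automl.run import AutoMLRun

ws = Workspace.from_config()
experiment = Experiment(ws, "watson-automl")           # hypothetical experiment name
run = AutoMLRun(experiment, run_id="AutoML_<run-id>")  # paste the ID from the studio

# get_output() returns the best child run and its fitted model.
best_run, fitted_model = run.get_output()
print(best_run.get_metrics())  # accuracy, AUC, and the rest of the leaderboard metrics
```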

I did like that they also listed the challengers, not just the winning model. The availability of the challengers is vital because, in some cases, you may have to forgo the most accurate model for one with better explainability.

screenshot by the author

Speaking of explainability, I appreciate the thought, but the result is lacking.

screenshot by the author
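
If the built-in explanation view leaves you wanting, the underlying feature importances can be downloaded with the interpret client. A sketch, assuming the azureml-interpret package and the best_run fetched in the earlier sketch:

```python
from azureml.interpret import ExplanationClient

client = ExplanationClient.from_run(best_run)
explanation = client.download_model_explanation()

# Global feature importances, highest first -- a raw alternative to the
# studio's explanation charts.
print(explanation.get_feature_importance_dict())
```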

Conclusions

I can see myself using the Azure Machine Learning studio again. Overall, I was very pleased with the workflow. The experience was seamless, though there are areas where the autoML outputs can be improved. Useful visualizations are essential, especially when we are trying to convince our business sponsors that we have enough to move forward on a project.

The value of great visualizations will be highlighted in my next article, on DataRobot. But you pay for the good stuff: that tool will be out of reach for anyone without an enterprise budget. The same goes for H2O Driverless AI. But they are beautiful.

In case you missed the first two articles in this series:

Is AWS Sagemaker Studio Autopilot ready for prime-time?

Experience Google autoML Tables for Free

