
Watson AutoAI

Sixth in an autoML series: Big Blue impresses with interactive progress viz

Image by Gerd Altmann from Pixabay

Surprised! Pleasantly surprised, that is, in this sixth installment of my autoML series. While I used IBM’s Data Science cloud platform a few years ago, I had no prior exposure to Watson Studio or its AutoAI offering. I wasn’t sure what I was going to find, but I was pleased with what I saw. As someone who wants (needs) constant feedback on progress during training runs, I loved the wealth of information provided. The accuracy isn’t as tuned as DataRobot or H2O Driverless AI, but that is to be expected given the extreme difference in price (tens of thousands of dollars a year).

Why IBM Watson Studio AutoAI?

Big Blue isn’t ‘just’ mainframes and SPSS. I used the IBM data science cloud notebook environment a few years ago. Watson has a storied history at IBM. While you aren’t running ‘on’ Watson, the reputation brings authority.

The Cost

I was able to run this experiment for FREE. The pricing tiers are very reasonable for the individual data scientist.

pricing screenshot by the author

The Setup

Try IBM Watson Studio

"These capabilities are available as part of a fully managed starter set of Cloud Pak for Data services on the IBM Cloud. Provision the integrated Lite versions of Watson Studio and Watson Machine Learning for free today as part of Cloud Pak for Data as a Service."

The link above will take you to a Try AutoAI on Watson Studio button. AutoAI is a hosted cloud solution, so the process of setting up a new project is pretty straightforward.

Once you are in Watson Studio, you add an AutoAI asset.

screenshot by the author

From there, you can set up a new experiment.

screenshot by the author

The Data

To keep parity across the tools in this series, I will stick to the Kaggle training file from the Contradictory, My Dear Watson competition: detecting contradiction and entailment in multilingual text using TPUs. In this Getting Started competition, we’re classifying pairs of sentences (consisting of a premise and a hypothesis) into three categories: entailment, contradiction, or neutral.

6 Columns x 13k+ rows – Stanford NLP documentation

  • id
  • premise
  • hypothesis
  • lang_abv
  • language
  • label
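Before uploading the file, it can be worth confirming that it has the six columns listed above. The following is a minimal sketch using only the Python standard library. The function name `summarize_training_file` and the local file name `train.csv` are assumptions for illustration, not part of AutoAI or the Kaggle tooling.

```python
import csv
from collections import Counter
from typing import Iterable

# Column names as listed above for the Kaggle training file.
EXPECTED_COLUMNS = ["id", "premise", "hypothesis", "lang_abv", "language", "label"]


def summarize_training_file(lines: Iterable[str]) -> dict:
    """Check the header and count rows, labels, and languages.

    `lines` is any iterable of CSV text lines, for example an open file.
    This is a hypothetical helper written for this article.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")

    rows = 0
    labels = Counter()
    languages = Counter()
    for row in reader:
        rows += 1
        labels[row["label"]] += 1
        languages[row["language"]] += 1

    return {"rows": rows, "labels": dict(labels), "languages": dict(languages)}


# Usage (assumes the Kaggle file was saved locally as train.csv):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     print(summarize_training_file(f))
```

The label counts show how balanced the three classes are, which helps when reading the accuracy numbers later on.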

Loading the data

It couldn’t be simpler.

screenshot by the author

Training your model

The interface to configure and run your experiment is very minimal. There are some other options under Experiment Settings, but I wanted to keep the run as basic as I could. Pick your label and hit Run Experiment.

screenshot by the author

This is where it gets fun! There is an interactive visualization that allows you to see where in the process you are with the experiment. I love this! Leaderboards are also available for you to review during the training.

training gif by the author

Evaluate Training Results

There is a small indicator that the experiment has completed. I would have expected something more eye-catching, but I am happy it finished in a reasonable amount of time, 22 minutes.

screenshot by the author

The leaderboards provide information on accuracy and other success metrics, as well as the model type and the enhancements made (such as feature engineering). I didn’t see a wide variety of models attempted, which explains the short training time.
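To put a leaderboard accuracy in context for a three-class problem, it helps to compare it against a majority-class baseline. The sketch below is a hedged illustration only: both function names are hypothetical, and the label values in the usage comment are made up, not results from this experiment.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(labels: Sequence) -> float:
    """Accuracy of always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Usage with made-up labels (illustration only):
# y_true = [0, 0, 1, 2]
# y_pred = [0, 1, 1, 2]
# print(majority_baseline_accuracy(y_true), accuracy(y_true, y_pred))
```

A pipeline is only interesting to the extent that its accuracy clears the baseline by a meaningful margin.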

screenshot by the author
screenshot by the author

For Pipeline 3, I looked at the engineered features. You have to look into the details because ‘NewFeature_2’ isn’t very descriptive.

screenshot by the author

Conclusions

Of interest, after the training, I got a popup introducing feature engineering on multiple datasets. Definitely something to investigate! If it can identify relationships between datasets and create new features, that would be amazing.

Overall, I enjoyed the Watson AutoAI experience itself. The process ran FAST (22 minutes versus H2O Driverless AI’s 4+ hours), but at the expense of model variety and accuracy right out of the box. Additional experiment configuration would be needed to close that gap. But at a price $78k lower than DataRobot and H2O Driverless AI, that might be an acceptable trade-off.

I would encourage you to consider trying AutoAI. This IBM offering is a reasonably-priced autoML tool.


If you missed one of the articles in the series, here are the links.

Is AWS Sagemaker Studio Autopilot ready for prime-time?

Experience Google autoML Tables for Free

Azure Automated ML Listens to their Designers

DataRobot makes life easy

H2O Driverless AI

