
Welcome to the second article in my series of AutoML tool user experience reviews. My goal is to compare the ease of use and access to key information across several AutoML tools. Today, I am focusing on one of the Google AutoML tools, called Tables. AutoML Tables is designed to process semi-structured tabular data (your typical .csv file). Google announced a beta release of AutoML Tables on April 10, 2019. With Tables still in beta and features still rolling out, I wanted to check out the Google offering.
Why Google AutoML Tables?
I’ve read some good things about the experience with Google Machine Learning. I’ve used Google Cloud before for some consulting work and found the experience to be pretty straightforward, so I was eager to explore their machine learning services. I chose Tables because it matches the Kaggle file type I have been using in this series.
As I mentioned before, Tables is in beta. As such, I reviewed the release notes and known issues pages to make sure there weren’t any roadblocks. There were no blockers, though this notice on the issues page made me chuckle:
"User experience with Microsoft Edge and Microsoft Internet Explorer browsers might be suboptimal."
The Setup
The setup to get ready for AutoML Tables was more manual than it was for AWS AutoPilot. The "before you begin" documentation was excellent but seems to be required before each new project you start:
- Create a new project
- Make sure your billing is enabled
- Register your application for the Cloud Storage, Cloud AutoML, Google Cloud Storage JSON, and BigQuery APIs in Google Cloud Platform
- Install the gcloud command-line tool
- Install Google Cloud SDK
- Create a service account
- Set environmental variables
- Update IAM roles
Not to worry: once you have completed the prereqs, the experience improves immensely. A quick sanity check, like the sketch below, confirms everything is wired up.
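Here is a minimal sketch of that sanity check, assuming the beta Python client library (pip install google-cloud-automl); the project ID, region, and key path are placeholders:

```python
import os
from google.cloud import automl_v1beta1 as automl

# Point at the service-account key created during setup (hypothetical path).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

# Tables lives in us-central1 while in beta.
client = automl.TablesClient(project="my-project-id", region="us-central1")

# Listing datasets is a cheap way to confirm the APIs are enabled
# and the IAM roles are set correctly.
for dataset in client.list_datasets():
    print(dataset.display_name)
```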
The Data
As with AWS SageMaker AutoPilot, I used a Kaggle competition dataset.
Contradictory, My Dear Watson: detecting contradiction and entailment in multilingual text using TPUs. In this Getting Started competition, we’re classifying pairs of sentences (consisting of a premise and a hypothesis) into three categories: entailment, contradiction, or neutral.
6 columns × 13k+ rows, per the Stanford NLP documentation. The columns are listed below; a quick pandas peek follows the list.
- id
- premise
- hypothesis
- lang_abv
- language
- label
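Here is that peek: a minimal sketch, assuming a local copy of the Kaggle train.csv; the label encoding in the comment comes from the competition description:

```python
import pandas as pd

# Hypothetical local copy of the Kaggle training file.
df = pd.read_csv("train.csv")

print(df.shape)                # roughly 13k+ rows x 6 columns
print(df.columns.tolist())

# 0 = entailment, 1 = neutral, 2 = contradiction
print(df["label"].value_counts())
```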
Model training costs
Model training costs $19.32 per hour. Since you do not need any licenses and are only charged for compute resources during training, this is quite reasonable. And for your first models, there is a free trial: six free node hours for training and batch prediction. The free trial does NOT include deploying models and then calling them online. Six hours is plenty of time to give it a try; my example used 1.2 hours, which would have cost about $23 (1.2 × $19.32) without the trial.
Kicking off the training
Navigate to Google AI. There you will find a "Train with AutoML" menu item.

Click the "Try AutoML" dropdown and select Tables.

Loading the data
Importing the training dataset is simple. You can also create a bucket for the output datasets directly from this screen.
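The same step can be scripted; a hedged sketch using the beta client from the setup snippet, with placeholder bucket and dataset names:

```python
# Create an empty Tables dataset (display name is a placeholder).
dataset = client.create_dataset(dataset_display_name="contradictory_watson")

# Import the training CSV from a Cloud Storage bucket; import_data
# returns a long-running operation, so .result() blocks until it finishes.
import_response = client.import_data(
    dataset=dataset,
    gcs_input_uris=["gs://my-bucket/train.csv"],
)
import_response.result()
```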

Once the data is imported, you get access to some basic data analytics.

The details viz is a bit rough. Pies and donuts are for eatin’, not vizzin’.

Training your model
For the training budget, I entered my six free node hours. There aren’t many other parameters you can access and tune on this screen.
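In the API, the budget is expressed in milli node hours, so the six free hours become 6000. A minimal sketch, continuing from the dataset above (the target column is the label field from the Kaggle file):

```python
# Tell Tables which column to predict.
client.set_target_column(dataset=dataset, column_spec_display_name="label")

# Kick off training; 6000 milli node hours = the six free node hours.
create_response = client.create_model(
    "watson_model",  # placeholder display name
    dataset=dataset,
    train_budget_milli_node_hours=6000,
)
model = create_response.result()  # blocks until training completes
```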

Waiting for the magic
While the UI does provide some updates on the status of the infrastructure spinning up and tearing down, there are few indications of how far along you are. I waited for about a half-hour and then decided to trust my budget settings. I did other things until I was notified of completion. The email notification is a nice feature.


Evaluate Training Results
Once the email arrives, the Evaluate tab is populated. There is a chart of the standard metrics, such as AUC, precision, recall, and log loss. I like that a confusion matrix is available.

I am unable to find any details about the model itself other than that it is a multi-class classifier. To learn more, I exported the model. What is the underlying architecture? What are the hyperparameters? There is not nearly enough information for my liking.
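The evaluation metrics, at least, can be pulled programmatically; a hedged sketch, with the v1beta1 metric field names as I understand them:

```python
# Each evaluation carries the classification metrics shown in the Evaluate tab.
for evaluation in client.list_model_evaluations(model_display_name="watson_model"):
    metrics = evaluation.classification_evaluation_metrics
    print("AUC ROC:", metrics.au_roc)
    print("Log loss:", metrics.log_loss)
```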
Scoring
When running in batch, the output must go to BigQuery.

It is at this point that the user experience ends. The output file loads into BigQuery. I like that the data is available for querying or for direct use in Data Studio visualizations, but I would prefer some analytics readily available right in the AutoML console.
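For reference, the batch-scoring call looks roughly like this; a sketch with placeholder BigQuery URIs (results land in a new prediction dataset inside the project):

```python
# Batch-score a BigQuery table; output goes to a new dataset in the project.
batch_response = client.batch_predict(
    model_display_name="watson_model",
    bigquery_input_uri="bq://my-project.my_dataset.test_table",
    bigquery_output_uri="bq://my-project",
)
batch_response.result()  # long-running operation
```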
Conclusions
I’m a big fan of the documentation and tips provided; they are easy to follow. The email updates are great as well.
Where the experience falls short is in identifying details about the final model itself. The best you can do is download a TensorFlow model package and evaluate it outside of Google Cloud.
The visualizations were disappointing. It is almost better to have no distribution charts at all than a handful of unusable pie charts. Since the product is still in beta, I hope the team is collecting feedback from users on the features they need.
Overall, the lack of transparency might prevent me from using this tool on a regular basis. Though, it IS beta…check back soon!