Building Image Detection with Google Cloud AutoML

How to build an image detection app with zero coding knowledge

Po Stevanus Andrianta
Towards Data Science

--

Just recently, Google launched AutoML Vision (https://cloud.google.com/vision/overview/docs#automl-vision), which enables us to build image classification or even object detection with no code at all. The feature doesn’t stop at training: it also covers deployment, whether in the cloud or on edge devices such as Android, iOS, or a web browser.

In this post, I will demonstrate how to build an image classifier with AutoML Vision. Today we are going to build a snack classifier: a machine learning system able to tell which snack appears in an image. To keep things simple, let’s limit the possible snacks to five options (from here on, we will refer to these as labels): Good Time, Piattos, Cheetos, Qtela and Richeese Nabati. There is no specific reason I chose these snacks; it was a completely random pick while walking the aisle of the minimart.

My snacks laying side by side waiting to be eaten. Image by author.

Getting Started

Before we begin, three things need to be prepared:

  • Snacks
  • Device to capture the snacks images
  • Google Cloud Account (if you don’t have one yet, you are eligible for USD 300 in free credit for up to 1 year)

Image Gathering

To build an image classifier in Google Cloud AutoML, we need to prepare at least ten images per label. Since I was feeling a bit energetic (I think that came from the sugar in my caffè latte), I gathered 30 images for each label, 150 in total. To get a good image classifier, it is suggested that you capture the object from different angles.

In this case, I took the pictures with my phone and transferred them to my laptop. Once you have the images on your laptop, we can continue to the next step.

Image Labelling

For this part, and until the end of this post, we will work inside the Google Cloud Console.

entry point to get into Cloud Vision

First, log in to your Google Cloud Console, then go to the Vision section. You can find it by clicking the burger icon (top-left, beside “Google Cloud Platform”) → Artificial Intelligence → Vision, or, more easily, by using the search bar in the console.

create new dataset box

The first thing we need to do is Create Dataset. When we create the dataset, there are three options to choose from:

  • Single Label Classification
    Each image has exactly one label, e.g. mountain/lake/beach.
  • Multi-Label Classification
    Each image has multiple labels, e.g. we want to predict the painter, genre and the emotion from a single painting.
  • Object Detection
    Identify every object in the image, e.g. we want to find every truck/bus/car from a single image.

For our case, every image has exactly one label (cheetos, qtela, etc.), so we will go with Single Label Classification. After we create a new dataset, the next step is to import the data.

Import data set page

On the import data set page, there are two things we need to do. First, we upload the images; then we set the Google Cloud Storage bucket where our files will be saved.

add possible labels

The image upload will take a while to finish, especially when we upload a lot of high-quality images. When the upload is done, we can see all the images in the IMAGES tab. In the beginning nothing is labelled, so all 150 images sit in the Unlabelled section. Before we can assign an image to a label, we need to create the label first; to do this, click ADD NEW LABEL.

assign each image to the labels

Right after we add the label, we can assign an image to a label. Select multiple images that belong to a label, then click Assign Labels, select the correct labels, and voila~.

Repeat this process until all the images are labelled. I know this is a slow, tedious and painful process, especially when you have a massive number of images. Please bear with it, because having a lot of correctly labelled images is the key to a good image classifier.
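If clicking through every image is too painful, AutoML Vision also accepts a CSV import file in which each row pairs a Cloud Storage image path with its label, so the images arrive pre-labelled. Here is a minimal sketch of generating such a file; the bucket name and one-folder-per-label layout are my own assumptions:

```python
import csv
import io

def build_import_csv(files_by_label, bucket="my-snacks-bucket"):
    """Build AutoML Vision import CSV rows: one `gs://path,label` pair per image.

    `files_by_label` maps a label name to the image file names stored under
    that label's folder in the bucket (the folder layout is an assumption).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for label, files in files_by_label.items():
        for name in files:
            writer.writerow([f"gs://{bucket}/{label}/{name}", label])
    return buf.getvalue()

print(build_import_csv({"cheetos": ["img_001.jpg"], "qtela": ["img_002.jpg"]}))
```

Upload the resulting CSV to the same bucket and point the import dialog at it instead of uploading raw images.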

finish assigned all images

Phew! At this stage we have finished labelling all 150 images with their correct labels. The next step is probably the most fun part, which is *drum rolls sound effect*

… building our image classifier

Model Training

A crucial part before we even do the model training is to split our dataset into three parts:

  • Training Set
    Data that we are going to use to train our model.
  • Validation Set
    Data for hyperparameter tuning and finding the best model.
  • Test Set
    Final data to see how well our model performs in a real-world scenario.

When we label our images, Cloud Vision automatically assigns each one to train/validation/test. While it is nice to have this automated, as of now there is no way to move an image from training to validation, or from validation to test.

On average, we have around ~24 images in training, ~3 in validation and ~3 in test. While these numbers are pretty low, they will be enough for our demo purposes.
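That ~24/3/3 assignment is simply the usual 80/10/10 ratio. If you import via CSV, AutoML Vision also lets you pin each row to a set yourself by prefixing it with TRAIN, VALIDATION or TEST; a rough sketch (the file names and the hard-coded `cheetos` label are hypothetical):

```python
def assign_splits(uris, train=0.8, validation=0.1):
    """Deterministically assign each image URI to an AutoML set.

    Rows prefixed TRAIN/VALIDATION/TEST in the import CSV override
    AutoML's automatic split, which you otherwise cannot change.
    """
    n_train = int(len(uris) * train)
    n_val = int(len(uris) * validation)
    rows = []
    for i, uri in enumerate(uris):
        if i < n_train:
            set_name = "TRAIN"
        elif i < n_train + n_val:
            set_name = "VALIDATION"
        else:
            set_name = "TEST"
        rows.append(f"{set_name},{uri},cheetos")
    return rows

rows = assign_splits([f"gs://bucket/cheetos/{i:03d}.jpg" for i in range(30)])
print(rows[0])  # a TRAIN row
```

With 30 images per label this reproduces the 24/3/3 split described above.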

configure the model name; if we haven’t trained any model yet, the `snack_single_label` model will not appear on the left side of the image.

Okay, we have finally arrived at building the actual machine learning model. For the training step, we need to do two things. First, set the model name (in this case I went with snack_single_label), and then decide how we are going to deploy the model. The options are:

  • Cloud Hosted
    The model will be deployed inside Google Cloud; we can access it through the gcloud UI, the gcloud CLI, or a REST API.
  • Edge
    We can download the model and run it on an edge device, such as Android, iOS, or even a browser.

In this demo, we will deploy inside Google Cloud, so let’s go with option #1.

The unique part about AutoML is that Google automatically trains different model architectures and parameters and gives us the best-performing model. The only thing we need to configure is how many node hours we can afford. The minimum is eight node hours (i.e. an hour on an eight-core machine).

If we are using a new Google Cloud account, we get 40 free node hours to train our ML model. Since we are still on the free tier, let’s go with the minimum node hours.

After we fill in the node hour budget, click START TRAINING and wait for the training to finish. The estimated finish time is shown below the node hour budget text box.

Model Evaluation

The last step before we deploy (or serve) our machine learning model is to find out how well it performs in a real-world scenario.

We can find out how good our model is by going to the EVALUATE tab. There are some useful metrics, such as:

  • Precision
    Of all the positive predictions, what percentage are actually positive?
  • Recall
    Of all the actual positive cases, what percentage does our model predict as positive?
    I try to keep precision and recall as short as possible; for a more detailed explanation, take a look at Google’s machine learning course here (https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall).
  • Confusion Matrix
    While precision and recall give us percentages, the confusion matrix gives a more granular result. Each row shows the true label of the images, while each column shows the prediction result. For example, the cell at row 3, column 2 counts how many images of cheetos the model identified as qtela.
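To make these definitions concrete, here is a tiny pure-Python sketch (with made-up predictions) computing per-label precision, recall, and confusion-matrix counts:

```python
from collections import Counter

def precision_recall(y_true, y_pred, label):
    """Precision and recall for one label, matching the definitions above."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)   # all positive predictions
    actual = sum(1 for t in y_true if t == label)      # all actual positives
    return tp / predicted, tp / actual

def confusion(y_true, y_pred):
    """Counts of (true label, predicted label) pairs: rows = truth, columns = prediction."""
    return Counter(zip(y_true, y_pred))

truth = ["cheetos", "cheetos", "qtela", "piattos"]
preds = ["cheetos", "qtela", "qtela", "piattos"]
print(precision_recall(truth, preds, "cheetos"))   # (1.0, 0.5)
print(confusion(truth, preds)[("cheetos", "qtela")])  # 1 cheetos mistaken for qtela
```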

If we look at the metrics, we have 100% precision and 100% recall, which is brilliant, but since our test set is very small, it’s hard to actually trust the numbers.

Model Serving

When we decide that our model is good enough, we can finally deploy it (yay!). To do this, go to the TEST & USE tab.

model serving page

Before we can try our model, we need to deploy it first. Note that because we did not check the “deploy model to 1 node after training” checkbox, we need to deploy it manually by clicking DEPLOY MODEL.

deploy model box after you click DEPLOY MODEL

To deploy, we need to set how many nodes we are going to use. Cloud Vision tells us roughly how many requests/second each node can handle, so we just need to estimate how much traffic will hit this API.

For example, in this image we can see that each node can serve up to 3.2 requests/second, so if our production traffic is around ten requests/second, we need at least four nodes to handle it.
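That sizing rule is just a ceiling division, sketched below; the 3.2 requests/second per node is the figure shown on my deploy page, so check your own dialog for the current number:

```python
import math

def nodes_needed(traffic_rps, per_node_rps=3.2):
    """Smallest node count whose combined throughput covers the expected traffic."""
    return math.ceil(traffic_rps / per_node_rps)

print(nodes_needed(10))  # 4 nodes for ~10 requests/second
```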

It will take a few minutes for the model to be deployed; when it finishes, you will see the page below.

use & test page after deployment process is finished

And after all that data preparation, training and deployment, we are finally able to test our image classifier. There are multiple ways to use the model: the REST API, the Google Vision Python library, or uploading a picture on this page.

Let’s upload an image and leave the REST API and Python client for another day :D. I’m going to take a new picture outside our training/validation/test set and hope the model returns the correct prediction.
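For when we do come back to the REST API: prediction is a single authenticated POST whose JSON body carries the base64-encoded image. A hedged sketch of building that body follows; PROJECT_ID and MODEL_ID are placeholders, and the exact field names are from my reading of the AutoML REST reference, so verify them before relying on this:

```python
import base64
import json

def build_predict_request(image_bytes):
    """JSON body for AutoML Vision's models:predict REST method.

    The raw image goes base64-encoded under payload.image.imageBytes
    (field names per the v1 REST reference; double-check against the docs).
    """
    return {
        "payload": {
            "image": {"imageBytes": base64.b64encode(image_bytes).decode("ascii")}
        }
    }

# The body would then be POSTed with an OAuth bearer token to a URL like:
# https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID:predict
body = build_predict_request(b"\xff\xd8fake-jpeg-bytes")
print(json.dumps(body)[:60])
```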

yay, the model correctly classifies the image as cheetos

I uploaded an image of me holding a Cheetos, and luckily the model correctly classified it as cheetos (phew!~)

Conclusion

While Google AutoML Vision proves easy to train and easy to deploy, there are some limitations you need to consider before using it.

  • Pricey
    All these no-code features come at a cost: the minimum price for cloud-hosted training is USD 25.20 per hour (a minimum of eight cores at USD 3.15 per core-hour).
    For deployment, we need at least 1 node at USD 1.25 per hour, which comes to roughly USD 900 per month.
  • Completely black box
    While it is convenient to let AutoML decide what the best model is, we have no control over model building and deployment. On top of that, we cannot download the model and do model understanding locally.
  • No autoscaling for cloud deployment
    As of now, there is no option to enable autoscaling for cloud deployment. This might be a turn-off for anyone who wants to use it in a client-facing production environment.

So that is the end of our journey today. Next, I will write about object detection in Google AutoML Vision and how to recreate this snack classifier locally with TensorFlow 2.0. I hope you enjoyed this post; feel free to drop me a message if you have any comments or thoughts.


Software Engineer focusing in Data and Machine Learning @ thebonza.com | Passionate towards Data Engineering and Machine Learning