Google Cloud AutoML Vision for Medical Image Classification

Pneumonia Detection using Chest X-Ray Images

The normal chest X-ray (left panel) shows clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse “interstitial” pattern in both lungs. (Source: Kermany, D. S., Goldbaum, M., et al. 2018. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. https://www.cell.com/cell/fulltext/S0092-8674(18)30154-5)

Google Cloud AutoML Vision simplifies the creation of custom vision models for image recognition use-cases. The concepts of neural architecture search and transfer learning are used under the hood to find the best network architecture and the optimal hyperparameter configuration that minimizes the loss function of the model. This article uses Google Cloud AutoML Vision to develop an end-to-end medical image classification model for Pneumonia Detection using Chest X-Ray Images.

About the Dataset

The dataset contains:

  • 5,232 chest X-ray images from children.
  • 3,883 of those images are samples of bacterial (2,538) and viral (1,345) pneumonia.
  • 1,349 samples are healthy lung X-ray images.

The dataset is hosted on Kaggle and can be accessed at Chest X-Ray Images (Pneumonia): https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

Part 1: Enable AutoML Cloud Vision on GCP

(1). Go to the cloud console: https://cloud.google.com/

Google Cloud Homepage

(2). Open Cloud AutoML Vision by clicking the triple-dash menu at the top-left corner of the GCP dashboard, then select Vision under the Artificial Intelligence product section.

Open AutoML Vision

(3). Select Image Classification under AutoML Vision.

Image Classification under AutoML Vision

(4). Set up the project APIs, permissions, and the Cloud Storage bucket that will store the image files for modeling and other assets.

Setup Project APIs and Permissions

(5). Select your GCP billing project from the drop-down when asked. Now we are ready to create a Dataset for building the custom classification model on AutoML. We will return here after downloading the raw dataset from Kaggle to Cloud Storage and preparing the data for modeling with AutoML.

In this case, the automatically created bucket is called: gs://ekabasandbox-vcm.

Part 2: Download the Dataset to Google Cloud Storage

(1). Activate the Cloud Shell (in red circle) to launch an ephemeral VM instance, which we will use to stage the dataset download from Kaggle, unzip it, and upload it to the storage bucket.

Activate Cloud Shell

(2). Install the Kaggle command-line interface. This tool will allow us to download datasets from Kaggle. Run the following code:

sudo pip install kaggle

Note, however, that the Cloud Shell instance is ephemeral and does not persist system-wide changes when the session ends. Also, if a dataset is particularly large, other options exist, such as spinning up a Compute Engine VM, downloading the dataset, unzipping it, and then uploading it to Cloud Storage. It is also possible to design more advanced data pipelines to get data into GCP for analytics/machine learning.

(3). Download the Kaggle API token that enables the Kaggle CLI to authenticate and authorize against Kaggle to download the desired datasets.

  • Login to your Kaggle account.
  • Go to: https://www.kaggle.com/[KAGGLE_USER_NAME]/account
  • Click on: Create New API Token.
Create API Token
  • Download the token to your local machine and upload it to the cloud shell.
  • Move the uploaded .json key into the .kaggle directory, creating the directory first if it does not already exist. Use the code below:
mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle/kaggle.json

(4). Download the dataset from Kaggle to the Cloud Shell instance:

kaggle datasets download paultimothymooney/chest-xray-pneumonia

(5). Unzip the downloaded dataset:

unzip chest-xray-pneumonia.zip
unzip chest_xray.zip

(6). Move the dataset from the ephemeral Cloud Shell instance to the Cloud Storage bucket created earlier. Replace the bucket name below with your own.

gsutil -m cp -r chest_xray gs://ekabasandbox-vcm/chest_xray/

Part 3: Preparing the Dataset for Modeling

(1). Launch Jupyter Notebooks on the Google Cloud AI Platform.

Notebooks of GCP AI Platform

(2). Create a new Notebook Instance.

Start a new JupyterLab instance

(3). Select an instance name and create.

Choose an instance name and create

(4). Open JupyterLab

Open JupyterLab

(5). Before building a custom image recognition model with AutoML Cloud Vision, the dataset must be prepared in a particular format:

  1. For training, the JPEG, PNG, WEBP, GIF, BMP, TIFF, and ICO image formats are supported, with a maximum size of 30 MB per image.
  2. For inference, the JPEG, PNG, and GIF image formats are supported, with a maximum size of 1.5 MB per image.
  3. It is best to place each image category in its own sub-folder within the image directory. For example,
    (image-directory) > (image-class-1-sub-dir) ... (image-class-n-sub-dir)
  4. Next, create a CSV file that points to the paths of the images and their corresponding labels. AutoML uses the CSV file to locate the training images and their labels. The CSV file is placed in the same Cloud Storage bucket as the image files. Use the bucket automatically created when AutoML Vision was configured; in our case, this bucket is named gs://ekabasandbox-vcm. A minimal sketch of this step is shown below.
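
As an illustration, the Python snippet below builds such a CSV from the sub-folder structure. This is a minimal sketch: the bucket name and folder prefix are assumptions based on this walkthrough, and the preprocessing.ipynb notebook in the accompanying repo performs the actual preparation.

from google.cloud import storage

bucket_name = "ekabasandbox-vcm"   # assumed: the bucket AutoML Vision created
prefix = "chest_xray/train/"       # assumed: folder with one sub-folder per class

client = storage.Client()
rows = []
for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.lower().endswith((".jpeg", ".jpg", ".png")):
        label = blob.name.split("/")[-2]  # the sub-folder name is the label
        rows.append("gs://{}/{},{}".format(bucket_name, blob.name, label))

# Write the CSV locally, then upload it to the same bucket.
with open("data.csv", "w") as csv_file:
    csv_file.write("\n".join(rows))

client.bucket(bucket_name).blob("data.csv").upload_from_filename("data.csv")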

(6). Clone the preprocessing script from GitHub. Click on the icon circled in red and labeled (1), and enter the GitHub URL https://github.com/dvdbisong/automl-medical-image-classification to clone the repo containing the preprocessing code.

Clone preprocessing script

(7). Run all the cells in the notebook preprocessing.ipynb to create the CSV file containing the paths and labels of the images, and upload this file to Cloud Storage. Be sure to change the bucket_name parameter to your own bucket.

Run notebook cells

Part 4: Modeling with Cloud AutoML Vision

(1). Click on “New Dataset” from the AutoML Vision Dashboard.

Create New Dataset

(2). Fill in the dataset name and select the CSV file from the Cloud Storage bucket created by AutoML.

Create Dataset

(3). For now, you may dismiss the error message stating that duplicate files were found. To the best of my knowledge, judging by the file names, this is not the case.

Cloud AutoML Processed Images

(4). Click on Train as shown in red in the image above to initiate model building with Cloud AutoML.

Start AutoML Training

(5). Select how the model will be hosted and set the training budget.

Select training parameters

(6). After model training is complete, click on Evaluate to view the model's performance metrics.

Evaluate model performance

(7). Assess the performance metrics (precision, recall, and the confusion matrix).

Evaluation page. Left: Precision and Recall score. Right: Confusion matrix and precision, recall graphs

Part 5: Testing the Model

(1). Click on the Predict tab to test the model.

Predict model

(2). Here’s an example of a test. The image is a chest X-ray scan that was not used to train the model. The medical experts reading this can verify the accuracy of the model's prediction.

Model test

The custom image recognition model is also exposed as a REST or Python API that can be integrated into software applications as a prediction service for inference.
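
As a rough sketch, the google-cloud-automl Python client can be used as follows. The project ID, model ID, and image file name below are placeholders; the model ID is shown on the AutoML Vision dashboard after training completes.

from google.cloud import automl_v1beta1 as automl

project_id = "your-gcp-project"   # placeholder: your GCP billing project ID
model_id = "ICN1234567890"        # placeholder: the trained model's ID

prediction_client = automl.PredictionServiceClient()
model_name = "projects/{}/locations/us-central1/models/{}".format(project_id, model_id)

# Read a chest X-ray image that was not used during training.
with open("chest_xray_sample.jpeg", "rb") as image_file:
    payload = {"image": {"image_bytes": image_file.read()}}

response = prediction_client.predict(model_name, payload)
for result in response.payload:
    print(result.display_name, result.classification.score)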

Part 6: Conclusion

This article provided a walkthrough for designing powerful vision models for custom use-cases by leveraging Google Cloud AutoML Vision. Moreover, the model is hosted on the cloud for inference as a prediction service. This is a powerful mechanism for quickly prototyping and building image classification or object detection use-cases before deciding whether to proceed with more fine-grained and/or hand-tuned modeling efforts. I trust this will serve as a template for the vision problems you care about. Also, be sure to delete your models and datasets when they are no longer needed; this will save cloud costs.

Excerpts of this article are taken from the book “Building Machine Learning and Deep Learning Models on Google Cloud Platform” to be published soon by Apress.