
Detecting Cancer Growth Using AI and Computer Vision

AI for Social Good: Applications in Medical Imaging

Introduction

Breast cancer is one of the deadliest forms of cancer in women. As per the World Health Organization (WHO), in 2020 alone, around 2.3 million new cases of invasive breast cancer were diagnosed worldwide that resulted in 685,000 deaths.

Even though developing countries account for one-half of all breast cancer cases, they represent 62% of all deaths caused by the disease. Survival from breast cancer for at least 5 years after diagnosis ranges from more than 90% in high-income countries to 66% in India and 40% in South Africa.

A key step in determining the stage of the cancer is microscopic examination of lymph nodes adjacent to the breast to understand whether the cancer has metastasised (a medical term meaning spread to other sites in the body). This step is not only sensitive, time-intensive and laborious but also requires highly skilled medical pathologists. It impacts decisions related to treatment, including considerations about radiation therapy, chemotherapy, and the potential surgical removal of more lymph nodes.

With the advent and advancement of AI and Computer Vision techniques, particularly Convolutional Neural Networks (CNNs), we have been able to improve accuracy on a wide range of computer vision tasks such as image recognition, object detection, and segmentation. These have been beneficial in solving some of the most challenging healthcare problems, especially in regions with limited access to advanced medical facilities.

Building on that, in this article I will present a framework leveraging state-of-the-art CNNs and computer vision techniques to aid the detection of metastases in lymph nodes. A successful solution holds great promise to reduce the workload of pathologists while also reducing the subjectivity in diagnosis.

Methodology and Approach

Given a whole slide image of lymph node sections, our objective is to generate a mask that indicates potential cancerous regions (cells with tumors) in the section. An example is depicted in Figure 2, which shows an image of a tissue on the slide alongside a mask where the yellow region depicts areas in the tissue that are cancerous.

Image segmentation is one of the classic computer vision tasks, where the objective is to train a neural network to output a pixel-wise mask of the image (something similar to the mask in Figure 2). There are several deep-learning techniques available for image segmentation that are elaborately described in this paper. TensorFlow by Google also has a great tutorial that uses an encoder-decoder approach to image segmentation.

Instead of using an encoder-decoder, which is commonly used in image segmentation problems, we will treat this as a binary classification problem where each custom-defined region on the slide is classified as healthy or tumorous using a neural network. These individual regions of a whole-slide image can then be stitched together to form the desired mask.

We will use the standard ML process for building the CV model: Data Collection → Preprocessing → Train-Test Split → Model Selection → Fine-tuning & Training → Evaluation

Data Collection and Preprocessing

The dataset is sourced from the CAMELYON16 Challenge which, as per the challenge website, contains "a total of 400 whole-slide images (WSIs) of sentinel lymph nodes collected in Radboud University Medical Center (Nijmegen, the Netherlands), and the University Medical Center Utrecht (Utrecht, the Netherlands)".

Whole-slide images are stored in a multi-resolution pyramid structure and each image file contains multiple downsampled versions of the original image. Each image in the pyramid is stored as a series of tiles to facilitate rapid retrieval of subregions of the image (see Figure 3 for illustration).

More information about Whole Slide Imaging can be found here.

The ground truth for the slides is provided as WSI binary masks indicating the regions in the slides that contain cells with tumors (see figure 2 above as an example).

WSIs in our dataset have 8 zoom levels that allow us to zoom the images from 1x all the way to 40x. Level 0 is the highest resolution (40x) and level 7 is the lowest (1x).

Due to their enormous size (each WSI in our dataset is well over 2GB), standard image tools cannot read them into system RAM. We used the OpenSlide library's Python implementation to efficiently read the images in our dataset; it also provides an interface to navigate across different zoom levels.
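To illustrate how this works in practice, here is a minimal sketch of opening a WSI with openslide-python and navigating its pyramid levels (the file name and coordinates are purely illustrative):

```python
# Minimal sketch: open a WSI and inspect its resolution pyramid with OpenSlide.
import openslide

slide = openslide.OpenSlide("tumor_001.tif")   # illustrative CAMELYON16 slide name

print(slide.level_count)         # number of pyramid levels (8 in our dataset)
print(slide.level_dimensions)    # (width, height) of each level; level 0 is full resolution
print(slide.level_downsamples)   # downsample factor of each level relative to level 0

# Read a 299 x 299 region at level 2. Note that the (x, y) location is always
# given in level-0 coordinates, regardless of the level being read.
region = slide.read_region((50000, 40000), 2, (299, 299)).convert("RGB")
slide.close()
```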

Training a CNN on the whole dataset of 400 WSIs is computationally very expensive (imagine training on roughly 2GB × 400 = 800GB of data). We had access to the free tier of Google Colab, which has limited GPU support available. Therefore, we randomly subsampled 22 WSIs from the dataset. At first, a set of 22 images might seem like a tiny dataset to accurately train a Convolutional Neural Network but, as previously mentioned, we extract small patches from each of these enormous WSIs and treat each patch as an independent image that can be used to train our model, as depicted in Figure 5.

At the highest zoom level (level 0 = 40x zoom), each image is approximately 62,000 x 54,000 pixels – extracting 299 x 299 patches would give us about 35,000 individual images from each WSI. We extracted patches at each zoom level. As the zoom level increases, the resolution decreases and the number of patches we can extract from the WSI also decreases. At level 7, we can extract fewer than 200 patches per image.
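As a rough sketch of what this patch extraction can look like (the non-overlapping tiling strategy and helper name are assumptions for illustration, not our exact pipeline):

```python
# Sketch: tile one zoom level of a WSI into non-overlapping 299 x 299 patches.
import openslide

PATCH_SIZE = 299  # matches the Inception v3 input size

def extract_patches(slide_path, level):
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.level_dimensions[level]
    downsample = slide.level_downsamples[level]
    patches = []
    for y in range(0, height - PATCH_SIZE, PATCH_SIZE):
        for x in range(0, width - PATCH_SIZE, PATCH_SIZE):
            # read_region expects level-0 coordinates, so scale by the downsample factor
            location = (int(x * downsample), int(y * downsample))
            patch = slide.read_region(location, level, (PATCH_SIZE, PATCH_SIZE)).convert("RGB")
            patches.append(patch)
    slide.close()
    return patches
```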

Furthermore, every WSI has a lot of empty area where tissue cells are not present. To maintain data sanity, we discarded patches that contained less than 30% tissue, a fraction we calculated programmatically from the intensity of the grey area.
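One simple way to implement such a filter is to threshold on grey intensity and keep only patches with enough dark (tissue) pixels; the 0.8 intensity cut-off below is an assumption for illustration, while the 30% tissue requirement follows the rule described above:

```python
# Sketch: keep a patch only if at least 30% of it looks like tissue rather than background.
import numpy as np
from PIL import Image

def tissue_fraction(patch: Image.Image, background_threshold: float = 0.8) -> float:
    gray = np.asarray(patch.convert("L"), dtype=np.float32) / 255.0
    # Bright pixels are glass/background; darker pixels are treated as tissue.
    return float((gray < background_threshold).mean())

def keep_patch(patch: Image.Image, min_tissue: float = 0.3) -> bool:
    return tissue_fraction(patch) >= min_tissue
```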

The dataset was balanced to have approximately the same number of patches that contained healthy and tumorous cells. An 80–20 train-test split was done on this final dataset.

Model Training

We built multiple CNN models which were trained on the image patches generated using the mechanism described in the previous section.

Objective Function

Our primary optimization objective was sensitivity (recall), but we also closely monitored the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) to ensure that we were not producing an excessive number of false positives.

In the context of cancer detection, it’s crucial that we minimize the number of false negatives, i.e., instances where the model incorrectly classifies a cancerous sample as non-cancerous. A high number of false negatives could delay the diagnosis and treatment for patients who indeed have cancer. Sensitivity (or recall) measures the proportion of actual positives that are correctly identified, and by optimizing for high recall, we aim to correctly identify as many actual positive cases as possible.

However, focusing on sensitivity alone could lead the model to predict most samples as positive, thereby increasing the number of false positives (cases where a non-cancerous sample is classified as cancerous). This is undesirable as it could lead to unnecessary medical interventions and cause undue anxiety for patients. This is where monitoring the AUC-ROC becomes extremely important.
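In Keras, for instance, both quantities can be tracked as metrics during training; a minimal sketch:

```python
# Sketch: track recall (sensitivity) and ROC AUC alongside accuracy during training.
import tensorflow as tf

METRICS = [
    tf.keras.metrics.Recall(name="recall"),            # sensitivity: TP / (TP + FN)
    tf.keras.metrics.AUC(name="auc"),                  # area under the ROC curve
    tf.keras.metrics.BinaryAccuracy(name="accuracy"),
]
```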

Model Building

We started off by building a baseline: a very simple architecture comprising 2 convolutional layers with max pooling and dropout for regularization. To improve over the baseline, we fine-tuned state-of-the-art image recognition models such as VGG16 and Inception v3 on our dataset.
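A minimal sketch of such a baseline (the filter counts and dropout rate are illustrative, not our exact configuration) might look like this:

```python
# Sketch: baseline CNN with two convolutional blocks, max pooling and dropout.
import tensorflow as tf
from tensorflow.keras import layers

baseline = tf.keras.Sequential([
    layers.Input(shape=(299, 299, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),  # tumorous vs. healthy patch
])
```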

As we had images available at different zoom levels, we trained multiple models, each of which consumed images from one zoom level, to see whether viewing images at a particular zoom level enhances the performance of the network. Due to the limited number of patches available at the lower zoom levels (3, 4 and 5), images from these levels were combined into a single training set. Separate models were built for images at zoom levels 0, 1 and 2.

Interestingly, the best performing model was the Inception v3 model pre-trained with ImageNet weights and an additional Global Max Pooling layer (see Figure 6). The sigmoid activation function takes any real number and squashes it into the range between 0 and 1. This is particularly useful in our scenario, where we would like to map predictions to probabilities of the two classes (0 and 1).
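In Keras, this architecture can be sketched as follows (only the ImageNet-pretrained backbone, the Global Max Pooling layer and the sigmoid output are taken from the description above; everything else is an assumption):

```python
# Sketch: Inception v3 backbone with ImageNet weights, Global Max Pooling and a sigmoid head.
import tensorflow as tf
from tensorflow.keras import layers

backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3)
)

model = tf.keras.Sequential([
    backbone,
    layers.GlobalMaxPooling2D(),
    layers.Dense(1, activation="sigmoid"),  # probability that the patch is tumorous
])
```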

Model Configurations

We used cross-validation to learn the best hyperparameters for the model. Below are the final configurations of our augmented Inception v3, including the optimizer, learning rate, rho, epochs and batch size used. By using class weights, we enhanced the model's focus on the minority class (tumorous cases), improving its ability to correctly identify and diagnose cancer cases, an essential requirement in this critical health context.
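For illustration, a training setup along these lines might look as follows; the learning rate, rho, epoch count and class weights shown are placeholders rather than the tuned values, and train_ds / val_ds are hypothetical pre-batched tf.data pipelines:

```python
# Sketch: compile and train the model with class weights (placeholder hyperparameters).
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4, rho=0.9),
    loss="binary_crossentropy",
    metrics=METRICS,  # recall and AUC, as defined earlier
)

history = model.fit(
    train_ds,                        # batch size is set when the tf.data pipeline is built
    validation_data=val_ds,
    epochs=10,
    class_weight={0: 1.0, 1: 2.0},   # up-weight the tumorous (minority) class
)
```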

Model Evaluation

We looked at the loss, AUC and Recall for training runs with different hyperparameters and sampled image patches at different zoom levels.

As mentioned earlier, images at zoom levels 3, 4 and 5 were combined into a single training set and separate models were built for images at zoom levels 0, 1 and 2. The charts below show the performance at different zoom levels on the validation set. Performance was best at zoom level 1 in terms of AUC and recall on the modified Inception v3.

Inference

Once the model has been fine-tuned, we can use it to generate ‘masks’ for any new whole-slide image. To do this, we first need to generate 299 x 299 patches (the input size for the standard Inception v3 architecture) from the image at the zoom level we are interested in (either level 1 or level 2).

The individual patches are then passed through the fine-tuned model to classify each as containing tumorous or non-tumorous cells, and the results are stitched together to generate the mask.
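A sketch of this patch-by-patch inference and stitching (the preprocessing and the 0.5 decision threshold are assumptions):

```python
# Sketch: classify each patch of a slide and paint its grid cell in a coarse output mask.
import numpy as np

def predict_mask(slide, model, level, patch_size=299, threshold=0.5):
    width, height = slide.level_dimensions[level]
    n_cols, n_rows = width // patch_size, height // patch_size
    downsample = slide.level_downsamples[level]
    mask = np.zeros((n_rows, n_cols), dtype=np.uint8)
    for row in range(n_rows):
        for col in range(n_cols):
            location = (int(col * patch_size * downsample), int(row * patch_size * downsample))
            patch = slide.read_region(location, level, (patch_size, patch_size)).convert("RGB")
            x = np.asarray(patch, dtype=np.float32)[None] / 127.5 - 1.0  # scale to [-1, 1]
            prob = model.predict(x, verbose=0)[0, 0]
            mask[row, col] = 1 if prob >= threshold else 0
    return mask  # one cell per patch; upsample to overlay on the slide if desired
```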

Here are the outputs and the actual masks for two whole-slide images in our test set. As you can see, the masks output by our model reasonably resemble the actual masks.

Concluding remarks

In this post, we explored how computer vision models can be fine-tuned to detect cancer metastases on gigapixel pathology images. The image below summarizes the workflow for training the model and the inference process used to classify new images.

Integrated into the existing workflow of pathologists, this model can act as an assistive tool. It can be of high clinical relevance, especially in organizations with limited resources, and can also serve as a first line of defence for diagnosing the underlying disease in a timely manner.

Further work needs to be done to assess the impact on real clinical workflows and patient outcomes. Nonetheless, we maintain a positive outlook that meticulously verified deep learning technologies, alongside thoughtfully crafted clinical instruments, have the potential to enhance the precision and accessibility of pathological diagnoses globally.

_Do check out the source code on my Github: https://github.com/saranggupta94/detecting_cancer_metastasis._

You can find the final results to the CAMELYON competition here: https://jamanetwork.com/journals/jama/article-abstract/2665774


If you are interested in collaborating on a project or would like to connect, feel free to connect with me on LinkedIn or drop me a message at [email protected].

Thanks to Niti Jain for her contributions to this article.

