Humans Have Struggled to Understand Rare Cancers, Let’s Give Artificial Intelligence a Turn.

The Challenges of Cancer Subtyping & How Artificial Intelligence Can Help

Dr. Hassan Muhammad
Towards Data Science
Mar 21, 2019

There is a certain level of loneliness that comes with being diagnosed with a rare cancer like intrahepatic cholangiocarcinoma (ICC). Most people wouldn't even know where that cancer occurs. It is a cancer of the bile duct, a group of tube-like structures that extend out of the liver, and has an incidence of 1 in 140,000 people in the United States.

Because ICC is so rare, it has not been well studied, and doctors cannot rely on previous cases to predict the outcome of a new patient. Survival times can range from a few short months to many long years, and there is no way of knowing which survival group a patient will fall into: there is no known genetic or cellular marker to predict this. Such survival markers are important, not only so that patients can come to terms with their likely outcome, but also so that their doctors can decide on treatments.

“Subtyping” ICC, or finding subgroups within the cancer as a whole, can help alleviate this issue, as I will explain in the next section. However, this is a difficult endeavor. Researchers can spend decades on manual work to subtype a cancer and still fail because there are not enough cases to support strong scientific conclusions. With the recent boom in artificial intelligence (AI), we now have the power to automate repetitive processes, understand complex visual data, and extract valuable information from even small datasets. AI offers new tools to finally understand rare cancers in more depth, and gives patients some light in the darkness of the unknown.

What does it mean to subtype a cancer?

One of the greatest difficulties in fighting cancer stems from the sheer diversity that the disease can exhibit across a population. Because of the genetic variation across tumors, two patients with the same type of cancer can have significantly different survival outcomes and treatment protocols.

To facilitate treatment decisions, clinicians divide each cancer into subgroups based on cellular and, sometimes, genetic profiles. This is referred to as subtyping a cancer. In a well-subtyped cancer, all patients in each subgroup should have similar disease severity and survival outcomes, and these should differ from those of patients in other subgroups of the same cancer.

Being able to compare and group patients together by severity of disease and differences in tumor biology is important for determining appropriate treatments and predicting survival outcomes for new patients. This task primarily lies in the hands of a pathologist, a clinician who is trained to look at tissue under a microscope to describe the characteristics of a disease.

When a patient is diagnosed with cancer, it is common procedure to take a biopsy of the cancerous mass to understand its cellular and structural makeup. This tissue is then prepared on a microscope slide in a way that highlights cellular morphology.

For example, tissue samples from prostate tumors have five major patterns mainly defined by how “differentiated” the cancer cells are, or more simply, how closely they resemble normal cells. Well-differentiated cells grow slowly and indicate an early, less aggressive stage of the disease. Poorly differentiated cells indicate more cell death, or a more aggressive cancer, and poorer survival outcomes. These five categorical patterns are used to create a Gleason Score, an internationally recognized grading tool for prostate cancer.
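As a small worked example of how the patterns become a score: in standard practice, a Gleason Score is the sum of the grades (1 to 5) of the most common and the second most common patterns in the sample, so it ranges from 2 to 10. The function name below is hypothetical and the sketch only illustrates the arithmetic.

```python
def gleason_score(primary, secondary):
    """Sum the grades (1-5) of the most common and second most common patterns."""
    for grade in (primary, secondary):
        if grade not in range(1, 6):
            raise ValueError(f"pattern grade must be 1-5, got {grade}")
    return primary + secondary


# A tumor that is mostly pattern 3 with some pattern 4 is written as 3 + 4.
score = gleason_score(3, 4)
print(score)
```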

Survival probability of prostate cancer patients over 15 years. Those with a higher Gleason Score (G) have a lower probability of survival as time progresses. [1]
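Survival curves of this kind are commonly estimated with the Kaplan-Meier method. The sketch below assumes that approach; whether the figure from [1] was produced this way is an assumption, and the follow-up times, group names, and function name are made up for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time a death was observed.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if follow-up simply ended (censored)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone who leaves the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data in years for two groups of patients.
low_score = kaplan_meier([5, 8, 12, 15, 15], [0, 1, 0, 0, 0])
high_score = kaplan_meier([2, 4, 6, 9, 15], [1, 1, 1, 0, 0])
print(low_score)
print(high_score)
```

In this toy data the high-score group's curve drops faster, which is the qualitative pattern the figure describes.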

For prostate cancer, subtypes relate directly to cell differentiation. The same rule doesn't necessarily apply to other kinds of cancer. For example, in certain skin cancers, one of the characteristic features is elongation of cells.

How are subtypes discovered?

Prostate cancer has one of the most well-developed subtyping protocols. It was originally developed by Dr. Donald Gleason, chief of pathology at Minneapolis VA Medical Center, over the course of twenty years starting in the 1960s. In an effort to standardize care for patients suffering from prostate cancer, he personally observed thousands of tissue samples and recorded recurring patterns to be compared to their known survival outcomes. Through multiple revisions and many years of validation, we have prostate subtyping as we know it today.

Subtyping a cancer by arranging patients into groups based on their pathology is challenging and extremely time-consuming. Looking at one microscope slide is like looking at all of New York City on Google Maps, where each street represents a group of cells. Imagine looking at thousands of them, as Gleason did! Further, one pathologist may see something different from what another sees, so human bias is another factor to overcome.

Rare cancers face another challenge. Prostate cancer medicine has the luxury of a well-defined grading system because of the unfortunately high number of cases; rare cancers do not. Often, a single research institution does not have enough samples to do a comprehensive analysis of all the patterns that occur in the tissue.

How can computational pathology help?

Computational pathology and artificial intelligence-based models offer answers to the challenges that traditional approaches to cancer subtyping face. While it may take years for a human to assess thousands of slides, a state-of-the-art graphics processing unit (GPU) can do it in less than a week. Moreover, AI models have the potential to extract more information from small datasets, opening up the possibility of developing subtypes for rare cancers for the first time.

Intrahepatic cholangiocarcinoma (ICC) is a rare primary liver cancer of the bile duct. Because of its rarity, there are no established subtypes for this cancer, making treatment decisions difficult.

Our computational pathology group at Memorial Sloan-Kettering Cancer Center recently published a pre-print examining 246 digitized slides of ICC and building an AI-based model to search for potential subtypes within the dataset [2]. Although the number of cases seems small, it is the largest dataset of ICC in the world.

Our new algorithm works similarly to a human pathologist. Using methods similar to those that let self-driving cars identify different features on the road, our algorithm first scans all of the digitized slides and finds visually similar patterns that recur in the tissue across all patients. Next, we use statistical analysis to compare each of these patterns to the known survival outcome of each patient. For example, treating each pattern as an indicator of a potential subtype, the analysis assesses how many of the patients whose tissue contains that pattern are at high risk of recurrence.
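The two steps can be sketched in miniature. This is not the model from the preprint: it assumes each slide has already been cut into tiles and each tile reduced to a small feature vector, it swaps in plain k-means to group similar tiles, and it compares median survival where a real analysis would use a method that handles censored follow-up. All names and numbers are hypothetical.

```python
from statistics import median


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans(points, init_centroids, iterations=20):
    """Plain k-means; returns a cluster label (visual pattern) for every point."""
    centroids = [list(c) for c in init_centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical data: (patient id, tile feature vector) and survival in months.
tiles = [
    ("A", (0.1, 0.2)), ("A", (0.2, 0.1)),
    ("B", (0.0, 0.3)), ("B", (0.2, 0.2)),
    ("C", (0.9, 1.0)), ("C", (1.0, 0.8)),
    ("D", (0.8, 0.9)), ("D", (0.1, 0.1)),
]
survival_months = {"A": 40, "B": 36, "C": 8, "D": 12}

# Step 1: group visually similar tiles across all patients.
features = [f for _, f in tiles]
labels = kmeans(features, init_centroids=[features[0], features[4]])


# Step 2: compare patients whose tissue shows a pattern with those whose does not.
def compare_pattern(pattern):
    carriers = {pid for (pid, _), lab in zip(tiles, labels) if lab == pattern}
    with_pattern = [survival_months[p] for p in carriers]
    without = [m for p, m in survival_months.items() if p not in carriers]
    return median(with_pattern), (median(without) if without else None)


for pattern in sorted(set(labels)):
    print(pattern, compare_pattern(pattern))
```

In this toy data, patients whose tissue contains pattern 1 have a much shorter median survival than those without it, which is the kind of signal that would mark a pattern as a candidate subtype indicator.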

In this case, artificial intelligence can aid pathologists, who are overworked and in short supply, while contributing positively to medical research. In the future, until cancer no longer exists on this earth, people who face the unfortunate diagnosis of ICC and other rare cancers will have a better understanding of what is to come.

Read about the study in detail in the preprint [2].

Hassan Muhammad is a PhD Candidate at Memorial Sloan-Kettering Cancer Center and Weill Cornell Medicine. He is a member of the Thomas Fuchs Lab, a research group dedicated to developing AI models to improve medical image analysis and cancer research. This work is supported by a Cycle for Survival grant awarded to Dr. Amber Simpson.

References

  1. Egevad L, et al. “Prognostic value of the Gleason score in prostate cancer.” BJU International 89.6 (2002): 538–542.
  2. Muhammad H, et al. “Towards Unsupervised Cancer Subtyping: Predicting Prognosis Using A Histologic Visual Dictionary.” arXiv preprint arXiv:1903.05257 (2019).
