A Machine Learning Approach to Predicting MGMT Methylation Status in Glioblastoma Patients

Radiomics in Oncology

Jarrett Evans
Towards Data Science


Photo by National Cancer Institute on Unsplash

Introduction

Today, we’re going to explore a study of glioblastoma patients that was published in Scientific Reports, a Nature Portfolio journal: Improving MGMT methylation status prediction of glioblastoma through optimizing radiomics features using genetic algorithm-based machine learning approach. The goal of the study was to predict O6-methylguanine-DNA methyltransferase (MGMT) methylation status. Predicting this status matters because it can be a good indicator of how effective the chemotherapy drug temozolomide (TMZ) will be.

Temozolomide Overview

TMZ is an alkylating agent that works by damaging the DNA in cancer cells, which ultimately leads to cell death. TMZ also makes the cells more sensitive to radiation. This matters in cancer treatment because radiation is also used to kill cancer cells.

This study sets out to find a new way to predict MGMT methylation status through machine learning. If successful, this could reduce reliance on the invasive procedures currently needed to obtain tumor specimens for testing, and avoid some of the technical limitations of those tests.

It is important that glioblastoma (GBM) is treated efficiently and effectively because of how deadly it is for the patients who develop it. GBM has a median survival of 14–16 months and accounts for about 45% of all malignant central nervous system tumors.

Methods

The team used a two-stage approach to predict MGMT methylation status: first eliminating noisy radiomics features, then embedding classification algorithms in a genetic algorithm to identify the best predictive features.

Several machine-learning techniques were tested during this research, with the aim of finding the most meaningful radiomics features for prediction. The radiomics features were extracted from multimodal magnetic resonance imaging (MRI) scans. The two-stage feature selection method started with an eXtreme Gradient Boosting (XGBoost) model, followed by a genetic algorithm (GA)-based wrapper model. GAs work in a similar fashion to natural selection: candidate feature sets compete, and the ‘fittest’ set of features for prediction is identified.
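As a rough illustration of the first stage, here is a minimal sketch of filtering features by importance score. In practice the scores would come from a fitted gradient-boosting model (for example, the `feature_importances_` attribute exposed by scikit-learn-style estimators). The feature names, the scores, and the rule of keeping anything above zero are all assumptions made for illustration, not details taken from the study.

```python
def filter_noisy_features(importances, threshold=0.0):
    """Keep the names of features whose importance is strictly above `threshold`."""
    kept = [name for name, score in importances.items() if score > threshold]
    # Sort by importance, highest first, so the ranking is easy to inspect.
    kept.sort(key=lambda name: importances[name], reverse=True)
    return kept


# Hypothetical importance scores for four made-up radiomics features.
example_importances = {
    "texture_glcm_contrast": 0.31,
    "histogram_skewness": 0.12,
    "volume_enhancing": 0.0,
    "intensity_mean": 0.05,
}

selected = filter_noisy_features(example_importances)
print(selected)
```

Only the features that survive this filter would be passed on to the genetic algorithm stage.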

The data consisted of preprocessed and segmented multimodal MRI features from The Cancer Genome Atlas. In total, 53 GBM patients were included and 704 radiomics features were obtained.

The genetic algorithm workflow consisted of six steps: generation of the initial population, fitness assessment, selection of parents, crossover, mutation, and population replacement for the next generation. The formula that was used for selection probability (where the features are selected based on their performance in the fitness assessment stage) is shown below:

Selection Probability Formula
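One common way to define a selection probability is fitness-proportionate (roulette-wheel) selection, where each candidate's probability is its fitness divided by the total fitness of the population. This is an assumption used here for illustration and is not necessarily the exact formula used in the study. A minimal sketch:

```python
def selection_probabilities(fitness_scores):
    """Fitness-proportionate probabilities: p_i = f_i / sum of all f_j."""
    total = sum(fitness_scores)
    if total <= 0:
        # No usable fitness signal: fall back to a uniform choice.
        return [1.0 / len(fitness_scores)] * len(fitness_scores)
    return [score / total for score in fitness_scores]


# Three hypothetical candidates; the third is twice as fit as the others.
print(selection_probabilities([1, 1, 2]))
```

Candidates with higher fitness are more likely to be picked as parents, but weaker candidates still have a chance, which helps keep the population diverse.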

Once the initial features had been selected by the XGBoost model, the next step was classification: using those features to predict whether each patient fell into the MGMT methylated or unmethylated class. Classification performance served as the fitness assessment. The team tried three different machine learning algorithms in the fitness assessment step of the genetic algorithm workflow: Random Forest (RF), XGBoost, and Support Vector Machines (SVM). To implement the algorithms, they used the Python machine learning library scikit-learn.
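To make the six-step workflow more concrete, here is a minimal, dependency-free sketch of a genetic algorithm searching over 0/1 feature masks. In the study the fitness would be a classifier's performance on the selected features; the `toy_fitness` function, the `INFORMATIVE` set, and every hyperparameter below are hypothetical stand-ins so the sketch can run on its own.

```python
import random


def run_ga(n_features, fitness, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask with high fitness (fitness must be >= 0)."""
    rng = random.Random(seed)
    # Step 1: initial population of random feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for generation in range(generations + 1):
        # Step 2: fitness assessment of every candidate mask.
        scores = [fitness(mask) for mask in population]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score
        if generation == generations:
            break  # the final population has been scored; stop breeding
        # Step 3: fitness-proportionate parent selection. The small constant
        # keeps the weights valid even if every score is zero.
        weights = [score + 1e-9 for score in scores]
        children = []
        while len(children) < pop_size:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            # Step 4: single-point crossover.
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Step 5: bit-flip mutation.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        # Step 6: the children replace the previous generation.
        population = children
    return best_mask, best_score


# Hypothetical stand-in for classifier accuracy: reward three "informative"
# features and penalise every extra feature that is switched on.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return max(0.0, hits - 0.25 * extras)


best_mask, best_score = run_ga(n_features=10, fitness=toy_fitness)
print(best_mask, best_score)
```

Swapping `toy_fitness` for a function that trains a classifier on the masked features and returns its cross-validated accuracy would turn this into a wrapper-style feature selector of the kind described above.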

Results

Once they had trained the three versions of the algorithm, they assessed model performance with three measurements: accuracy, sensitivity (recall), and specificity. To compare performance, they took the mean over 20 runs of cross-validation and used a Kolmogorov-Smirnov test for evaluation.
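The comparison step can be sketched as follows: average each model's scores over repeated runs, then compute the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical distributions of scores. The numbers below are hypothetical and are not results from the study; in practice a library routine such as `scipy.stats.ks_2samp` would also supply a p-value.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of a's values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b's values <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical accuracies from five repeated cross-validation runs per model.
model_a_runs = [0.92, 0.93, 0.91, 0.94, 0.92]
model_b_runs = [0.85, 0.88, 0.86, 0.87, 0.90]

print(mean(model_a_runs), mean(model_b_runs))
print(ks_statistic(model_a_runs, model_b_runs))
```

A statistic near 1 means the two sets of scores barely overlap, while a value near 0 means the models are hard to tell apart.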

The best result was achieved by the Random Forest algorithm incorporated into the Genetic Algorithm (GA-RF). This technique outperformed the others in all of the evaluation metrics with an accuracy of 0.925, a sensitivity of 0.894, and a specificity of 0.966.
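For readers less familiar with these metrics, here is a minimal sketch of how accuracy, sensitivity, and specificity are computed from true and predicted labels. Treating the methylated class as the positive class (label 1) is an assumption for illustration, and the labels below are made up rather than taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of truly positive cases that the model caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of truly negative cases that the model correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = methylated, 0 = unmethylated.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```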

After the GA-RF model finished, there was an optimal subset of 22 radiomics features: 17 textural features, 3 histogram-based features, 1 volume feature, and 1 intensity feature. Textural features can help clinicians by reflecting spatial intensity correlations and distributions of voxels that could help quantify the “multiregional variations” in blood flow, edema, and necrosis. Histogram features illustrate the frequency distribution of intensity values that occur in an image.

Expanded Use-Case

To test the wider applicability of the model, the team applied the learned features to a new dataset, this time of patients with Low-Grade Gliomas (LGG). They applied the learned features directly and didn’t do any additional feature selection.

The results for the GA-RF model on the LGG dataset were an accuracy of 0.75, a sensitivity of 0.78, and a specificity of 0.62. Given that no transfer learning or fine-tuning was done, these are promising results.

By obtaining reasonable performance with the features applied to the LGG dataset, the researchers showed that these features could potentially be reused for other similar diseases.

Limitations

A potential limitation of this technique is that with a small number of patients, researchers may experience a high-dimensionality-related problem. This occurs when the number of features is high compared to the amount of training data that is available. If this happens it can be challenging to learn accurate relationships between the data and the target variable.

High-dimensionality problems are a widespread issue within radiomics, not just in this study. To mitigate this limitation, the team used cross-validation to gain more confidence in the results they were getting.

Cross-validation Overview

Cross-validation works by splitting the data into groups (folds) and, on each iteration, holding out a different group for testing while training on the rest. By doing this multiple times and taking the mean of the results, you have more confidence that your model can repeatedly obtain the results it is giving.
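A minimal sketch of the splitting logic is shown below. The choice of five folds and the `evaluate` callback are assumptions for illustration; the sample count of 53 simply mirrors the number of GBM patients mentioned earlier.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, k, evaluate, seed=0):
    """Average the score returned by `evaluate(train_idx, test_idx)` over folds."""
    return mean(evaluate(train, test)
                for train, test in k_fold_indices(n_samples, k, seed))


splits = list(k_fold_indices(n_samples=53, k=5))
print([len(test) for _, test in splits])
```

Each sample lands in the test set exactly once, so every patient contributes to both training and evaluation across the folds. A real `evaluate` function would fit a model on the training indices and return its score on the test indices.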

In cancer treatment, clinicians rely on tumor characteristics and grades to optimize treatment with chemotherapy, radiation therapy, and surgery.

Conclusion

If this technique continues to be developed and is eventually put into use, it could give doctors a noninvasive way to determine the MGMT methylation status of their patients. This information can help them make more informed treatment decisions that could lead to better prognoses for patients. It also opens the door to other potential uses of radiomics in oncology.

Do, D.T., Yang, MR., Lam, L.H.T. et al. Improving MGMT methylation status prediction of glioblastoma through optimizing radiomics features using genetic algorithm-based machine learning approach. Sci Rep 12, 13412 (2022). https://doi.org/10.1038/s41598-022-17707-w

Eberhart, Karin, Ozlem Oral, and Devrim Gozuacik. “Chapter 13 — Induction of Autophagic Cell Death by Anticancer Agents.” ScienceDirect, 2014, https://www.sciencedirect.com/topics/neuroscience/temozolomide#:~:text=Temozolomide%20(TMZ)%20is%20a%20small,damage%20and%20tumor%20cell%20death. Accessed 12 July 2023.

National Cancer Institute. “Trials Produce Practice-Changing Results for Brain Cancer.” Cancer Currents Blog, 9 June 2016, https://www.cancer.gov/news-events/cancer-currents-blog/2016/asco-temozolomide-brain. Accessed 18 July 2023.

American Cancer Society. “How Chemotherapy Drugs Work.” American Cancer Society, 22 Nov. 2019, https://www.cancer.org/treatment/treatments-and-side-effects/treatment-types/chemotherapy/how-chemotherapy-drugs-work.html. Accessed 25 July 2023.

European Society for Medical Oncology. “MGMT Promoter Methylation in Glioma: ESMO Biomarker Factsheet.” Oncology Pro, [updated 2019 Jan 18], https://oncologypro.esmo.org/education-library/factsheets-on-biomarkers/mgmt-promoter-methylation-in-glioma. Accessed 9 July 2023.

Creative Commons license link: https://creativecommons.org/licenses/by/4.0/
