Acute Lymphoblastic Leukemia (ALL) is the most common pediatric cancer and the most frequent cause of death from cancer before 20 years of age. In the 1960s ALL had a survival rate of only 10%, but advancements in diagnostic testing and refinements to chemotherapies have increased survival rates to 90% in developed countries. [1]

Researchers are attempting a variety of personalized approaches, mainly using epigenetic screenings and genome-wide association studies (GWAS) to identify potential targets for inhibition, to push survival rates even higher. [2, [3](https://www.nature.com/articles/bcj201753)] About 80% of ALL cases occur in children, but, as Terwilliger and Abdul-Hay note, there is a second peak of ALL incidence at around 50 years of age, and long-term remission rates in this older group are lower than in children, at about 30–40%. [3]
ALL is characterized by the uncontrolled proliferation and arrested differentiation of lymphoid progenitor cells in the bone marrow. Important cellular processes, such as the regulation of lymphoid differentiation, cell cycle regulation, growth factor and tumor-suppressor receptor signaling, and epigenetic modification, are perturbed. Additionally, chromosomal translocations are present in about a third of ALL cases. These can cause overexpression of oncogenes by relocating them to actively transcribed regions of the genome, or underexpression of tumor-suppressor genes by relocating them to non-transcribed regions. [[1](https://www.nejm.org/doi/full/10.1056/NEJMra1400972), 3] ALL is also commonly polyclonal, which further complicates treatment because some sub-populations are likely to be resistant to any single treatment. [1]
ALL Cell Morphology
ALL cells can be split into three distinct morphological subtypes, which makes identification difficult, even for experienced practitioners. L1 cells are small and homogeneous, round with no clefting, and have no obvious nucleoli or vacuoles; these are the most likely to pass as normal lymphoblasts. L2 cells are larger and heterogeneous, irregularly shaped and often clefted, with defined nucleoli and vacuoles. L3 cells have the shape of L1 cells but prominent nucleoli and vacuoles. [4, 5]

Data Collection
The data consists of more than 10,000 single-cell microscopy images of acute lymphoblastic leukemia cells and normal lymphoblasts, with a class imbalance of about 2:1 ALL to normal. Since there were more images than my computing resources could comfortably handle, I downsampled the positive ALL class to manage the class imbalance. Training was therefore completed with 4,000 images at a 57:43 class split.
Images are 450×450 RGB images stored as .bmp files, an uncompressed raster (bitmap) format that stores each image as a grid of pixel values.
Data can be found here. It was sourced from the University of Arkansas for Medical Sciences (UAMS) study on ALL microscopy.
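To make the preprocessing concrete, here is a minimal sketch of loading the .bmp files and downsampling the ALL class, assuming a hypothetical folder layout with one directory per class; the actual paths and sampling code in the repo may differ.

```python
# Minimal sketch: load .bmp images and downsample the majority (ALL) class.
# The "data/all" and "data/normal" folder names are hypothetical placeholders.
import random
from pathlib import Path

import numpy as np
from PIL import Image

random.seed(42)

DATA_DIR = Path("data")  # hypothetical root folder
all_paths = list((DATA_DIR / "all").glob("*.bmp"))
normal_paths = list((DATA_DIR / "normal").glob("*.bmp"))

# Downsample the ALL class to roughly a 57:43 split against the normal class
n_all = int(len(normal_paths) * 57 / 43)
all_paths = random.sample(all_paths, min(n_all, len(all_paths)))

def load_image(path):
    """Read a 450x450 .bmp file into a (450, 450, 3) uint8 array."""
    return np.array(Image.open(path).convert("RGB"))

X = np.stack([load_image(p) for p in all_paths + normal_paths])
y = np.array([1] * len(all_paths) + [0] * len(normal_paths))  # 1 = ALL, 0 = normal
print(X.shape, y.mean())  # (n, 450, 450, 3), fraction of ALL images
```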
Exploratory Data Analysis
Here are some normal cells from our set. We see spherical, non-clefted cells with homogeneous chromatin and few vacuoles.

Here are some ALL cells from our set. We see irregularly shaped, clefted cells with heterogeneous chromatin and multiple nucleoli and vacuoles.

Average Images
Looking at the average image for each class, we see that the interiors of the cells vary too much to reveal meaningful differences, but it is clear that ALL cells are much larger on average than normal cells. This should not be surprising, as cancerous cells grow in an unregulated way.
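For reference, here is a small sketch of how the per-class average images can be computed, assuming the X array and 0/1 labels y from the loading sketch above.

```python
# Compute and display the mean image for each class.
import matplotlib.pyplot as plt
import numpy as np

avg_normal = X[y == 0].mean(axis=0).astype("uint8")
avg_all = X[y == 1].mean(axis=0).astype("uint8")

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(avg_normal)
axes[0].set_title("Average normal cell")
axes[1].imshow(avg_all)
axes[1].set_title("Average ALL cell")
for ax in axes:
    ax.axis("off")
plt.show()
```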

Modeling
I used this post from Paul Breton and the corresponding GitHub repo for guidance on using Keras with SageMaker.
I utilized the Keras framework in AWS SageMaker by specifying the neural network architecture and compilation hyperparameters in a separate Python script located in the Model_Scripts directory. Training was run on an ml.m4.xlarge notebook instance, which allowed hundreds of epochs in a tractable training time.
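For context, this is the general script-mode pattern the referenced post describes. The sketch below assumes the SageMaker Python SDK v2 argument names; the entry-point filename, S3 path, and hyperparameter values are hypothetical placeholders, not the repo's actual ones.

```python
# Hedged sketch: hand the Keras training script to a SageMaker TensorFlow estimator.
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

estimator = TensorFlow(
    entry_point="train.py",          # script defining the Keras model (hypothetical name)
    source_dir="Model_Scripts",      # directory holding the model/training scripts
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    framework_version="2.3",
    py_version="py37",
    hyperparameters={"epochs": 200, "batch_size": 32},  # illustrative values, parsed by the script
)

estimator.fit({"train": "s3://my-bucket/all-images/train"})  # hypothetical S3 path
```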
I adopted an iterative approach to modeling based on the CRISP-DM process. A dummy classifier predicting the majority class had an accuracy of 57%. I then created a vanilla model with a single Conv2D layer and a single Dense layer (sketched below), which reached 68% accuracy, already better than the dummy. From there I built successively larger and more complex architectures by adding Conv2D layers and blocks of layers separated by MaxPooling layers.
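A minimal sketch of that vanilla baseline; the filter count and kernel size are illustrative assumptions, not the exact values used.

```python
# Vanilla baseline: one Conv2D layer, then a single sigmoid output unit.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten

vanilla = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(450, 450, 3)),
    Flatten(),
    Dense(1, activation="sigmoid"),  # binary output: ALL vs. normal
])
```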
The most complex had nine convolutions in three blocks of three layers, but this was not the most successful model, as it appeared to overfit the training data. It became clear that deep but narrow blocks achieved higher metrics than wider blocks. The best model was a 2-2-1-1 architecture with six convolutions in total. Dropout and Batch Normalization layers were added after the MaxPooling and Dense layers to combat overfitting, but were not present in the final model.
I attempted to use recall as a secondary metric for model selection; however, this pushed the model toward always predicting ALL and drove accuracy down to the level of the dummy classifier. I therefore decided to drop recall and focus on accuracy. With recall that high and accuracy sacrificed, the model would be useless as a tool, because every image would have to be reviewed by a human physician anyway, negating its benefits.
Final Network Architecture

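To make the description concrete, here is a hedged reconstruction of the 2-2-1-1 model in Keras. Only the block structure (2, 2, 1, and 1 Conv2D layers separated by MaxPooling, six convolutions total, no Dropout or Batch Normalization) comes from the text; the filter counts, kernel sizes, and dense head are assumptions.

```python
# Hedged reconstruction of the 2-2-1-1 architecture described above.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

model = Sequential([
    # Block 1: two convolutions
    Conv2D(32, (3, 3), activation="relu", input_shape=(450, 450, 3)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Block 2: two convolutions
    Conv2D(64, (3, 3), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Block 3: one convolution
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Block 4: one convolution
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Classification head (illustrative)
    Flatten(),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])
```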
Model Compilation Hyperparameters
I used binary cross-entropy for the loss function as this is a binary classification problem, and RMSprop and Adam for optimization. The learning rate was set to 0.001 with a decay of 0.0001.
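A minimal sketch of that compilation step; note that how the decay is passed depends on the Keras version.

```python
# Compile with binary cross-entropy and RMSprop at learning rate 0.001.
from tensorflow.keras.optimizers import RMSprop

model.compile(
    optimizer=RMSprop(learning_rate=0.001),  # older Keras versions also accept decay=0.0001 directly
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```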
Model Deployment
The best-performing model was deployed using an AWS SageMaker endpoint so that not-yet-seen images from the testing set could be sent to it to generate predictions. The deployment is located in the 003_Modeling_AWS notebook, under model training, in my GitHub.
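A hedged sketch of what that deploy-and-predict flow looks like with the SageMaker SDK, reusing the estimator from the training sketch above; the hosting instance type and the X_test array are illustrative assumptions.

```python
# Deploy the trained estimator to a real-time endpoint and query it.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

# Send one held-out image (scaled to [0, 1]); large images may need resizing
# or a binary serializer to stay under the endpoint's payload limit.
sample = X_test[0:1] / 255.0          # X_test is a hypothetical array of unseen images
result = predictor.predict(sample)
print(result)

predictor.delete_endpoint()           # tear the endpoint down when finished
```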
Misclassified Images
Here we see an image the model misclassified as Normal when it was actually ALL. The model appears to be responding to the lack of interior vacuoles and clefting, as well as the dense chromatin, in classifying it as Normal. The irregular shape should have indicated that it was ALL.

Here we see an image the model misclassified as ALL when it was actually Normal. The model is likely responding to the irregular cell outline and the lighter, heterogeneous areas in the interior, which suggest vacuoles or unpacked chromatin, in classifying it as ALL. This is a genuinely challenging cell to sort correctly.

Insights and Recommendations
The model achieved an accuracy of 84%, making it a potentially useful tool for identifying ALL in novel cases. As blood-sample microscopy is already the default diagnostic test for ALL, this model could be used to verify a physician's assessment or to flag cases it is not confident about for further review; an example of such confidence-based flagging is sketched below. As diagnosing ALL is difficult even for humans, a robust, accurate verification model could improve both the speed and the rigor of diagnosis. Because ALL is an acute leukemia, it is especially vital that it is consistently identified early; left untreated, it can kill within a few weeks or months.
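A small sketch of how such confidence-based flagging could work, assuming the model outputs a probability of ALL per image; the 0.3 to 0.7 uncertainty band is an illustrative choice, not a validated threshold.

```python
# Flag predictions in an uncertain probability band for human review.
import numpy as np

probs = model.predict(X_test / 255.0).ravel()   # X_test: hypothetical unseen images
needs_review = (probs > 0.3) & (probs < 0.7)    # uncertain band -> route to a physician
auto_label = np.where(probs >= 0.5, "ALL", "Normal")
print(f"{needs_review.mean():.0%} of cases flagged for manual review")
```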
Next Steps
Model Improvements
There are several potential avenues for improving this model. I attempted to use the Adam optimizer, which adds momentum and bias correction to the adaptive gradients used by RMSprop, as well as Batch Normalization, though neither has helped modeling thus far. I could also implement Early Stopping and Model Checkpoints to combat overfitting by stopping training once validation performance stops improving; a sketch of these callbacks is shown below. I experimented with several levels of Dropout, settling on 25%, but further investigation could yield better results.
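A minimal sketch of those callbacks in Keras; the patience value and checkpoint filename are illustrative.

```python
# Early Stopping halts training when validation loss stops improving;
# Model Checkpoint keeps only the best weights seen so far.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]

# model.fit(X_train, y_train, validation_split=0.2, epochs=200, callbacks=callbacks)
```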
Product Improvements
Model interpretability is often as important as, or more important than, model accuracy, especially for medical diagnostics. It is very important in real-world applications that a doctor can see why the model has reached a certain decision. To that end, building an image segmentation model that identifies and marks important features within an image, such as the presence and number of vacuoles, non-spherical cells, or clefted edges, could greatly improve the model's usability. Additionally, deploying the model and allowing live integration of new imaging would keep it up to date.
Connect
Here is the GitHub repo for this project. You can connect with me on LinkedIn and Twitter, or visit my website for more articles.
Sources
[1] S.P. Hunger and C.G. Mullighan, Acute Lymphoblastic Leukemia in Children. 2015. NEJM 373 (16): 1541–1552.
[2] C.H. Pui, J.J. Yang, S.P. Hunger, et al., Childhood Acute Lymphoblastic Leukemia: Progress Through Collaboration. 2015. J. Clinical Oncology 33 (27): 2938–2948.
[3] T. Terwilliger and M. Abdul-Hay, Acute lymphoblastic leukemia: a comprehensive review and 2017 update. 2017. Blood Cancer Journal 7: 1–12.
[4] M.M. Amin, S. Kermani, A. Talebi, and M.G. Oghli, Recognition of Acute Lymphoblastic Leukemia Cells in Microscopic Images Using K-Means Clustering and Support Vector Machine Classifier. 2015. Journal of Medical Signals and Sensors 5 (1): 49–58.
[5] F. Scotti, Automatic Morphological Analysis for Acute Leukemia Identification in Peripheral Blood Microscope Images. 2005. Conference on Computational Intelligence for Measurement Systems.