Multi-category classification of various chest conditions from chest x-rays

Using fast.ai and the NIH CXR Dataset to classify x-rays into various chest conditions

Daniel Ching
Towards Data Science


Image of Chest X-Rays depicting several conditions. Image by author.

A Brief Introduction

As the need for medical treatments grows exponentially, there has been an unprecedented demand for the quick diagnosis of illnesses. As evinced by the Covid-19 pandemic, testing, diagnostics, and recognition of symptoms are the first line of defence in preventing further spread of disease. Yet such diagnostic tools account for only 2% of healthcare expenditure globally, despite influencing over 70% of all clinical decisions.

According to a report released by the Royal College of Radiologists in 2018, the UK has just “2% of radiology departments [which are] able to fulfill their imaging reporting requirements within contracted hours.” In the US, it is predicted that there will be a shortage of nearly 122,000 physicians by 2032.

Photo by National Cancer Institute on Unsplash

That’s news from developed countries. The disparity between low- and high-income countries is huge: high-income countries have nearly 100 times more radiographers than low-income ones. In Africa, for example, infectious diseases account for over 227 million years of healthy life lost. The lack of quick, accurate diagnosis of such infectious diseases results in scores of lives lost unnecessarily.

What Will Be Covered:

  1. Scoping the problem — recognition of different conditions in medical imaging
  2. Utilising fast.ai to give us a multicategorical classifier
  3. Preparing the data
  4. Training the model
  5. Creating an interactive GUI to test the model against real-world data
  6. Limitations
  7. Learning Points

Recognition of different conditions in medical imaging 👨‍⚕

Damage to the lungs cannot be reversed. However, early detection and treatment of critical chest conditions is essential in preventing infectious diseases from spreading further. By identifying telltale signs of certain diseases (in chest x-rays, for example), medical professionals armed with quick, accurate predictions can triage patients and plan the next steps in a more targeted and efficient manner.

The urgency of diagnosing such conditions, coupled with a crippling shortage of manpower (especially in developing nations), means that numerous lives are lost, sometimes unnecessarily.

This is where artificial intelligence — deep learning in particular — can play a crucial role in preventing such lives from being lost.

Multicategory classifier of serious chest conditions — fast.ai

To partially alleviate the lack of diagnostic manpower, we can use a computer vision model to give doctors a preliminary triage of the patient before further medical help is sought.

Ideally, to minimise the frustration of operating yet another computer system, the end result would be an end-to-end application where doctors can get predictions quickly and accurately.

For this, we can utilise the fast.ai library to build a multi-category classifier, meaning that given an image showing multiple conditions, the model is able to predict the relative likelihood of each condition appearing.

The Data 🔂

We are using a smaller ‘sample’ version of the NIH Chest X-Ray dataset to train our model, downsized to 5,606 images from the original 112,000. However, this sample retains the same structure and relative distribution of images across the various categories as its parent dataset (for example, around 54% of x-rays in both datasets carry the label “No Finding”). As such, it also retains the biases of the original dataset.

To get the dataset from Kaggle, we use the Kaggle API.
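A minimal sketch of this download step in Colab, assuming the Kaggle CLI is available, the kaggle.json API token has been uploaded to the working directory, and the sample lives under the nih-chest-xrays/sample dataset slug (all assumptions, not confirmed from the original notebook):

# Authenticate the Kaggle CLI with the uploaded API token, then download the sample dataset
! mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download -d nih-chest-xrays/sample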

This will then download the zipped file containing the dataset into a Google Colab instance (the main platform we are using for this project).

Data Processing

Since the sample was first downloaded in a zip file, we have to unzip it:

! unzip sample.zip

To give a high-level overview of how the data is structured within the folder that was downloaded, we can take a closer look at what was inside the cloud storage:

File structure of images, nested under /sample/images, image by author.

As you can see, the images are nested under /sample/images, and there is also a “sample_labels.csv” where the labels for the different images are stored.

We can then create a dataframe as well as datasets from reading the csv file.

# Read the labels CSV into a dataframe, then build datasets with the DataBlock defined below
df = pd.read_csv(path/"sample_labels.csv")
dsets = dblock.datasets(df)

After doing this step, we have to index into the csv file to extract the labels, before matching them to the corresponding images.

We can do this through constructing helper functions which we can utilise in our datablock. The fast.ai documentation provides the list of arguments that the datablock can take:

Image by author
creating our datablock, image by author

Our blocks are an ImageBlock (basically telling the model that the independent variable consists of images) and a MultiCategoryBlock for the dependent variable, telling the model that a single image can belong to several categories at once. This prevents the model from identifying only one category when it is in fact supposed to detect multiple problems in the chest x-ray.

In this case, we have to define the labels of the independent and dependent variables — the independent variables being the images and the dependent variables being the “Finding Labels”. For the images, we have to get the “Image Index” from the .csv file. For the dependent variables, we have to extract the labels using simple regex.

This is encapsulated in the get_x and get_y functions shown below:
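The original helpers appear as an image in the notebook; a minimal sketch of what they might look like, assuming the sample_labels.csv columns “Image Index” and “Finding Labels”, labels separated by the “|” character (a plain split does the same job as the simple regex mentioned above), and the path object used earlier:

from fastai.vision.all import *

def get_x(row):
    # Build the full path to the image file, nested under sample/images
    return path/'images'/row['Image Index']

def get_y(row):
    # Split the label string (e.g. "Effusion|Infiltration") into a list of conditions
    return row['Finding Labels'].split('|')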

We then have to split the data into training and validation sets, using the function RandomSplitter().

We then have to transform all the images to the same size with aug_transforms.

We load the data using .dataloaders(df), where df is the dataframe which contains our values needed from the csv file.
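Putting the pieces together, a sketch of the DataBlock and DataLoaders might look like this (the resize and augmentation sizes are illustrative, not necessarily the exact values used):

dblock = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),          # images in, multiple labels out
    get_x=get_x,
    get_y=get_y,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # random train/validation split
    item_tfms=Resize(256),                            # make every image the same size
    batch_tfms=aug_transforms(size=224)               # standard augmentations on the GPU
)

dls = dblock.dataloaders(df)
dls.show_batch(max_n=6)  # sanity-check a few images and their labels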

Before training the model, we take one last look at the images to see that they are properly formatted:

Image by author

Training the model 🚄

Training the multi-categorical model. Image by author
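The training code itself appears as an image above; a minimal sketch follows, in which the resnet50 architecture, the epoch counts, and the 0.2 metric threshold are assumptions rather than the exact values used:

# accuracy_multi judges every label against a threshold, the appropriate metric for multi-label data
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))

# Train the new head for a few epochs with the pretrained body frozen, then unfreeze and fine-tune
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)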

Addressing the concept of thresholds:

After applying the sigmoid to our activations (you can read about what the sigmoid is used for here), we encounter a problem — we cannot simply take the class that has been predicted with the highest activation (since there are different categories that can be assigned to one photo). Instead, we have to decide which labels do not correspond to the images through picking a threshold. Each value above the threshold will be passed (i.e. that label will be attached to the image), whereas each value lower than the threshold will not.

Picking a threshold is essential. Pick too low a threshold and labels with only weak activations get attached to images, so many incorrect labels creep in; pick too high a threshold and the model only attaches labels it is extremely confident about, so genuine findings get missed. Either way, the overall accuracy of the model suffers.

We can use a handy function to pick a threshold by plotting the graph and checking the point which corresponds to the highest accuracy:

Image by author
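A sketch of that sweep, following the approach used in the fast.ai course (the range of candidate thresholds is illustrative):

import matplotlib.pyplot as plt

# get_preds applies the sigmoid for us, so accuracy_multi is told not to apply it again
preds, targs = learn.get_preds()
xs = torch.linspace(0.05, 0.95, 29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs, accs)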

We then choose a base learning rate (which can be further optimised using a learning rate finder) and a number of freeze_epochs. This is transfer learning in action: the later layers of the pretrained network are updated first while the earlier layers stay frozen, thereby cutting down on model training time.
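For reference, the learning rate finder mentioned above is a single call on the learner:

# Plot loss against learning rate to help pick a sensible base_lr
learn.lr_find()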

This model took approximately 19 to 20 minutes to train, and returned an accuracy of around 92.8%.

Ideally, we would want to simply call learn.predict() to give us the most likely diagnosis. As I was testing the model, however, I realised that it was very rare for a label to have a corresponding value of more than 0.5 (which appears to be the default threshold a label must exceed in order to appear in the prediction). Therefore, I wrote a couple more lines of code to compile all the predictions matched with their corresponding labels, before displaying them to the user.

This returns a list of the chest conditions, each matched with the probability of that condition being present.
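The exact helper isn’t reproduced here; a hypothetical sketch of what a list_preds function (used again below when testing the GUI) could look like, assuming the trained learn object from earlier:

def list_preds(img):
    # learn.predict returns (decoded labels, one-hot tensor, per-label probabilities)
    _, _, probs = learn.predict(img)
    # Pair every condition in the vocabulary with the model's probability for it
    pairs = [(label, float(p)) for label, p in zip(learn.dls.vocab, probs)]
    for label, p in pairs:
        print(f"{label}: {p:.3f}")
    return pairs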

Image by author

The user can then sort the predictions, and rank the conditions that the model predicted. He or she can also set an arbitrary value to see which conditions returned a probability more than the set value.
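As a rough illustration, assuming pairs is the list of (condition, probability) tuples returned by the helper above:

# Rank the conditions by the model's confidence, most likely first
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)

# Keep only the conditions above an arbitrary, user-chosen cut-off
cutoff = 0.1
flagged = [(condition, p) for condition, p in ranked if p > cutoff]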

Creating a GUI to test the model against real-world data ✍️

The next step is to test the model against real-world data, to see whether it can detect chest conditions in chest x-rays obtained from the internet.

Image by author

In this scenario, I uploaded an image which had been labelled with the condition “Infiltration”. The upload is handled by a simple button widget that lets the user select a file.

btn_upload = widgets.FileUpload()
btn_upload

We then call a prediction:

img = PILImage.create(btn_upload.data[-1])
list_preds(img)

Let’s see what the model returns:

Image by author

Since a practitioner would like to see the probabilities of the different conditions at a glance, I added one more line of code to sort the output by the model’s confidence.

Image by author

It returns “No Finding” with the largest probability, followed by “Infiltration” and “Effusion”. Infiltration was therefore listed as the second most likely condition, which suggests that this model could genuinely be used to help interpret such data, with the supervision of an overseeing doctor.

At the end of the day, this tool may be best suited to expedite the process with which radiologists or doctors process diagnostic imaging.

To view my original Google Colab notebook, feel free to check it out here!

Limitations ⛑

Nonetheless, there are still some limitations that I encountered after completing this short project:

  1. While the goal of creating a polished UI was not entirely achieved, a graphical user interface was still created where users can upload images of chest x-rays and obtain a prediction. A further step could be pushing this model to the web using Streamlit or Binder.
  2. Inherent bias in the data: more than half of the images are labelled “No Finding”. This distribution may well mirror what physicians receive in practice, but it also means the model is biased towards predicting “No Finding” for the majority of images it sees.
  3. The model currently cannot segment or identify which parts of the chest x-ray correspond to which conditions. Adding segmentation, or perhaps bounding boxes that highlight the findings for the doctor, would further reduce the time taken to diagnose critical chest conditions.

Learning Points 💡

Through this entire process, I gleaned several nuggets of wisdom.

Photo by Piron Guillaume on Unsplash

This process gave me a glimpse of the many applications of deep learning and computer vision in the field of medicine. The use of artificial intelligence in medicine has amazing potential. According to Mihaela van der Schaar, we are only “scratching the surface of what is possible” when we use artificial intelligence in medicine. We have yet to exploit the full possibilities of AI in making the healthcare ecosystem more robust and efficient, and I’m so hyped about the breakthroughs that will occur in this field. Here is some of the research at the intersection of healthcare and AI that I found particularly interesting:

  1. ConfidentCare: A Clinical Decision Support System for Personalised Breast Cancer Screening
  2. Improving Autonomous Mammogram Diagnosis
  3. Dermatologist-level classification of skin cancer with deep neural networks

For a more complete elaboration on how AI can be used to reshape the future of healthcare, check out this in-depth article by van der Schaar: https://www.vanderschaar-lab.com/revolutionizing-healthcare-an-invitation-to-clinical-professionals-everywhere/

There will undoubtedly be some resistance to the full-scale adoption of AI in healthcare (perhaps because of a bad history between doctors and computers), but I’ve come to believe that AI will not be able to fully replace doctors (in the near future, at least).

In my opinion, AI is more of a tool that we can utilise to make faster, more accurate predictions. In Sebastian Thrun’s own words: “The cognitive revolution will allow computers to amplify the capacity of the human mind… Just as machines made human muscles a thousand times stronger, machines will make the human brain a thousand times more powerful.” Therefore, AI will merely augment the profession of doctors, not completely replace it. (Here’s a great article on the extent to which AI will replace doctors.) Maxfield and van der Schaar offer similar sentiments: “There is no viable vision for healthcare that does not have humans i) at its heart and ii) in full control. It is not a case of choosing between one or the other. Humans and AI/ machine learning each have unique strengths that must be combined … rather than applied separately.”

Personally, making the data more readable and interpretable was something worth spending time on. The original prediction returned by the out-of-the-box methods in fast.ai wouldn’t have given the (theoretical) user any idea of what he or she was looking at. After exploring UI and UX a little this year, I put that into practice by augmenting the results so that the probabilities are now ranked, each displayed next to the corresponding predicted condition.

Conclusion

Through this project, I learnt so much, from launching my first multi-category classification project to exploring the intersection between medical imaging and artificial intelligence. Despite the initial struggle of getting to grips with Python again, I managed to figure it out!

Through gaining deeper insights on the nuances (and honestly, challenges) in utilising AI, I’ve come to realise that deploying an accurate model in the real-world is hardly straightforward — there are so many reasons as to why things can go horribly wrong (or right!). Knowing the use cases, and what the industry needs, still matter the most. This project is just the beginning; I’m so hyped to explore the positive impacts that AI can bring to the world.

Feel free to reach out to me here!!
