Pneumonia detection from chest radiographs using deep learning

Huseyn Gasimov
Towards Data Science
Nov 25, 2019


Pneumonia accounts for over 15% of all deaths of children under 5 years old internationally. In 2015, 920,000 children under the age of 5 died from the disease. While common, accurately diagnosing pneumonia is difficult. It requires review of a chest radiograph (CXR) by highly trained specialists and confirmation through clinical history, vital signs and laboratory exams.

Chest Radiograph Basics

When an image is taken, X-rays pass through the body and reach a detector on the other side. Tissues with sparse material, such as the lungs, which are full of air, absorb very few X-rays and appear black in the image. Dense tissues such as bones absorb X-rays strongly and appear white. In short:

* Black = Air

* White = Bone

* Grey = Tissue or Fluid
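The black/white/grey rule comes from how strongly each material attenuates the beam. Here is a minimal sketch using the Beer–Lambert law, I = I0·exp(−μx). The attenuation coefficients and the grey-level cut-offs below are made-up illustrative values, not measured ones. They were chosen only so that air < tissue < bone.

```python
import math

# Illustrative, made-up linear attenuation coefficients (per cm).
# Real values depend on the X-ray energy; only their ordering matters here.
HYPOTHETICAL_MU = {"air": 0.0002, "tissue": 0.2, "bone": 0.5}


def transmitted_fraction(mu: float, thickness_cm: float) -> float:
    """Beer-Lambert law: fraction of the beam that reaches the detector."""
    return math.exp(-mu * thickness_cm)


def appearance(fraction: float) -> str:
    """More X-rays reaching the detector -> darker pixel on the radiograph."""
    # Hypothetical cut-offs, for illustration only.
    if fraction > 0.9:
        return "black"
    if fraction > 0.2:
        return "grey"
    return "white"


for material, mu in HYPOTHETICAL_MU.items():
    frac = transmitted_fraction(mu, thickness_cm=5.0)
    print(f"{material:>6}: {frac:.3f} transmitted -> {appearance(frac)}")
```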

By convention, the left side of the subject appears on the right side of the screen. The small L in the top right corner of the image marks the subject's left side. In a normal image the lungs appear black, but several structures are projected over them, mainly the rib cage bones, the main airways, the blood vessels and the heart.

An example chest radiograph looks like this:

Pneumonia usually manifests as an area or areas of increased lung opacity on CXR.

However, diagnosing pneumonia on CXR is complicated by a number of other conditions in the lungs, such as fluid overload (pulmonary edema), bleeding, volume loss (atelectasis or collapse), lung cancer, or post-radiation or surgical changes. Outside the lungs, fluid in the pleural space (pleural effusion) also appears as increased opacity on CXR. When available, comparing CXRs of the patient taken at different time points and correlating them with clinical symptoms and history helps in making the diagnosis.

In addition, clinicians have to read a high volume of images every shift, and tired or distracted clinicians can miss important details in an image. This is where automated image analysis tools can help. For example, machine learning can automate the initial detection (imaging screening) of potential pneumonia cases so that their review can be prioritized and expedited. Hence, we decided to develop a model to detect pneumonia from chest radiographs.

Dataset

We used the dataset from the RSNA Pneumonia Detection Challenge on Kaggle. It consists of chest X-rays with annotations that show which part of the lung has signs of pneumonia.

Install machine learning tools

We will use Intelec AI to train a model to detect pneumonia. You can download and install it for free from https://www.intelec.ai.

Data preparation

We downloaded the training images (stage_2_train_images.zip) and annotations (stage_2_train_labels.csv) from Kaggle. The annotations file looked like this:

import pandas as pd

ann = pd.read_csv('stage_2_train_labels.csv')
ann.head()

The first row of the output above corresponds to the patient with ID ‘0004cfab-14fd-4e49-80ba-63a80b6bddd6’. Its ‘Target’ column is 0, which means this patient doesn’t have pneumonia. The patient in the last row, on the other hand, has pneumonia (Target = 1). The corresponding chest radiograph has an opacity in the area given by x = 264, y = 152, width = 213 and height = 379, where (x, y) is the top-left corner of the box.
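A patient can have several rows, one per bounding box. This means the per-patient label is the maximum of Target across that patient's rows, and the number of boxes is the count of non-empty coordinates. Below is a small sketch on a toy DataFrame. The first two rows reuse values quoted in this article. The third row's box is a hypothetical placeholder.

```python
import numpy as np
import pandas as pd

# Toy copy of stage_2_train_labels.csv. The first two rows use values quoted
# in the article; the third row's box coordinates are hypothetical.
ann = pd.DataFrame({
    "patientId": ["0004cfab-14fd-4e49-80ba-63a80b6bddd6",
                  "00436515-870c-4b36-a041-de91049b9ab4",
                  "00436515-870c-4b36-a041-de91049b9ab4"],
    "x": [np.nan, 264.0, 560.0],
    "y": [np.nan, 152.0, 150.0],
    "width": [np.nan, 213.0, 250.0],
    "height": [np.nan, 379.0, 450.0],
    "Target": [0, 1, 1],
})

# One row per bounding box, so collapse to one label per patient.
per_patient = ann.groupby("patientId")["Target"].max()
# Number of boxes per patient; count() skips the NaN rows of negative patients.
boxes_per_patient = ann.groupby("patientId")["x"].count()

print(per_patient)
print(boxes_per_patient)
```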

We decided to use an SSD (Single Shot MultiBox Detector) object detector. Intelec AI requires the annotation file to have the columns image_name, xmin, ymin, xmax, ymax and class_name, so we transformed our data into that format:

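import numpy as np

# One image file per patient: <patientId>.dcm
ann['image_name'] = ann.patientId + '.dcm'
# RSNA boxes are (x, y, width, height) with (x, y) the top-left corner
ann = ann.rename(columns={'x': 'xmin', 'y': 'ymin'})
ann['xmax'] = ann.xmin + ann.width
ann['ymax'] = ann.ymin + ann.height
# Rows without a box (Target == 0) keep an empty class name
ann['class_name'] = None
ann.loc[ann.xmin.notna(), 'class_name'] = 'pneumonia'
ann = ann[['image_name', 'xmin', 'ymin', 'xmax', 'ymax', 'class_name']]
ann.to_csv('annotations.csv', index=False)
ann.head(10)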

The first four images in the output above (rows 0–3) have no pneumonia annotations. Image ‘00436515-870c-4b36-a041-de91049b9ab4.dcm’, on the other hand, has two annotations (rows 4 and 5). We saved the result to an “annotations.csv” file.
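The same transformation can be wrapped in a function and checked on the box quoted earlier. The input (x = 264, y = 152, width = 213, height = 379) should become xmax = 477, ymax = 531. `to_intelec_format` is a hypothetical helper name of our own. The output columns simply mirror the ones listed above.

```python
import numpy as np
import pandas as pd


def to_intelec_format(labels: pd.DataFrame) -> pd.DataFrame:
    """Convert RSNA-style labels (x, y, width, height) to corner coordinates."""
    out = labels.rename(columns={"x": "xmin", "y": "ymin"})
    out["image_name"] = out["patientId"] + ".dcm"
    out["xmax"] = out["xmin"] + out["width"]
    out["ymax"] = out["ymin"] + out["height"]
    # Rows without a box (Target == 0) get an empty class name.
    out["class_name"] = np.where(out["xmin"].notna(), "pneumonia", None)
    return out[["image_name", "xmin", "ymin", "xmax", "ymax", "class_name"]]


labels = pd.DataFrame({
    "patientId": ["0004cfab-14fd-4e49-80ba-63a80b6bddd6",
                  "00436515-870c-4b36-a041-de91049b9ab4"],
    "x": [np.nan, 264.0],
    "y": [np.nan, 152.0],
    "width": [np.nan, 213.0],
    "height": [np.nan, 379.0],
    "Target": [0, 1],
})
print(to_intelec_format(labels))
```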

Then we created an “images” folder and extracted all images from stage_2_train_images.zip into it. All provided images are in DICOM format (.dcm).
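General-purpose image libraries usually cannot open DICOM files directly. Here is a minimal sketch of reading one file and scaling it to 8 bits for viewing. It assumes the third-party `pydicom` package, which the article does not mention; Intelec AI reads the .dcm files itself. `load_dicom_pixels` and `to_uint8` are hypothetical helper names.

```python
import numpy as np


def load_dicom_pixels(path: str) -> np.ndarray:
    """Read the pixel data of one DICOM file (assumes `pip install pydicom`)."""
    import pydicom  # imported lazily so to_uint8 works without pydicom

    return pydicom.dcmread(path).pixel_array


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Min-max scale an image to 0..255, e.g. for viewing or saving as PNG."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(pixels.shape, dtype=np.uint8)
    return np.round((pixels - lo) / (hi - lo) * 255).astype(np.uint8)


# Usage (illustrative path):
# img = to_uint8(load_dicom_pixels("images/0004cfab-14fd-4e49-80ba-63a80b6bddd6.dcm"))
```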

import os
images = os.listdir('images')
images[:5]
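Before training, it can be worth checking that every image referenced in annotations.csv actually exists in the images folder. This check is our own addition, not a step the article describes. `find_missing_images` is a hypothetical helper.

```python
import os

import pandas as pd


def find_missing_images(annotations: pd.DataFrame, image_dir: str) -> list[str]:
    """Return image names listed in the annotations but absent from image_dir."""
    on_disk = set(os.listdir(image_dir))
    referenced = set(annotations["image_name"].unique())
    return sorted(referenced - on_disk)


# Usage:
# ann = pd.read_csv("annotations.csv")
# missing = find_missing_images(ann, "images")
# print(f"{len(missing)} annotated images are missing")
```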

In the end, our dataset looked like this:

Then we created a training in Intelec AI to train our SSD object detector. It was as straightforward as this:

We started the training, and it ran for around a day. It stopped by itself when it could no longer improve the training accuracy.
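The article does not say exactly which stopping criterion Intelec AI uses. The following is only a generic sketch of patience-based early stopping: training halts once the monitored score has not improved for `patience` consecutive epochs. A held-out validation score is the more common thing to monitor than training accuracy. The function name and the default patience are hypothetical.

```python
def early_stopping_epoch(scores: list[float], patience: int = 3,
                         min_delta: float = 0.0) -> int:
    """Return the index of the epoch at which training would stop.

    `scores` holds one monitored score per epoch (higher is better).
    Training stops after `patience` epochs without an improvement of more
    than `min_delta` over the best score seen so far.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores):
        if score > best + min_delta:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(scores) - 1  # never triggered: all epochs were run
```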

Clicking on the training summary showed an mAP score of 0.2725. We deployed the model to check how well it performs. Testing the deployed model on a new chest radiograph gave the following result:
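mAP is built on intersection over union (IoU): a predicted box counts as a hit only if its IoU with a ground-truth box exceeds a threshold. Below is a minimal IoU sketch for boxes in the (xmin, ymin, xmax, ymax) format used above. The article does not state how Intelec AI aggregates IoU into its mAP score (thresholds, averaging), so only this building block is shown. The shifted prediction is a hypothetical box.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Ground-truth box quoted earlier, converted to corner format.
truth = (264, 152, 477, 531)
shifted = (300, 152, 513, 531)  # hypothetical prediction, shifted right by 36 px
print(round(iou(truth, shifted), 3))
```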

The prediction looks good, but we can’t really say how good it is, since we don’t have any clinicians on our team.

Suggestions for improvement

Detecting pneumonia from images is still a difficult task for deep learning. Our mAP score of 0.2725 is low; on many object detection tasks, scores above 0.5 are common. The problem is that object detectors are good at detecting objects with well-defined shapes and appearances, whereas opacities in the lungs have no fixed shape. This is what makes the problem so difficult. Below are some thoughts on how to improve the accuracy further.

Trick 1: Classification before detection

We observed that our model has a high false positive rate, i.e. it detects pneumonia in images where it shouldn’t. It could therefore considerably improve the detection accuracy to first classify a given chest radiograph as “has pneumonia or not”. The object detector would then locate pneumonia only on radiographs classified as positive.
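Here is a minimal sketch of such a two-stage pipeline. `classify` and `detect` stand for any image-level classifier and any box detector; both are hypothetical callables here. The 0.5 threshold is an arbitrary placeholder that would need tuning on validation data.

```python
from typing import Callable, Sequence

Box = tuple[float, float, float, float, float]  # xmin, ymin, xmax, ymax, score


def two_stage_predict(image,
                      classify: Callable[[object], float],
                      detect: Callable[[object], Sequence[Box]],
                      threshold: float = 0.5) -> list[Box]:
    """Run the detector only if the classifier thinks pneumonia is present."""
    p_pneumonia = classify(image)  # probability that the image has pneumonia
    if p_pneumonia < threshold:
        return []  # classified as negative: report no boxes at all
    return list(detect(image))
```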

Trick 2: Detect lungs first

Another problem is that opacity (‘white pixels’) also exists outside the lungs in chest radiographs. However, we only want to detect opacities inside the lungs, because pneumonia is a lung condition. We can either let our object detector learn this from the training data, or help it by detecting the lungs separately first and then detecting pneumonia within the lungs as a second step.
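The second step can be as simple as discarding pneumonia boxes that fall outside the detected lung regions. This sketch assumes that a separate, hypothetical lung detector has already produced lung boxes in the same (xmin, ymin, xmax, ymax) format. The rule of keeping a box when at least half its area lies inside one lung box is an arbitrary choice made for illustration.

```python
Box = tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def area(box: Box) -> float:
    """Box area; 0 for an empty (inverted) box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def keep_boxes_in_lungs(opacities: list[Box], lungs: list[Box],
                        min_fraction: float = 0.5) -> list[Box]:
    """Keep opacity boxes with at least `min_fraction` of their area in one lung box."""
    kept = []
    for box in opacities:
        box_area = area(box)
        if box_area == 0:
            continue
        # Largest overlap with any single lung box.
        inside = max((overlap_area(box, lung) for lung in lungs), default=0.0)
        if inside / box_area >= min_fraction:
            kept.append(box)
    return kept
```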

References

Intelec AI: https://www.intelec.ai

RSNA Pneumonia Detection Challenge: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/

What are lung opacities? https://www.kaggle.com/zahaviguy/what-are-lung-opacities

