Using machine learning to build a conversational radiology assistant for Google Home

Kevin Seals
Towards Data Science
6 min read · Jan 30, 2018


Introduction

I have been excited about conversational agents for some time; I previously built a Watson-powered iOS chatbot that simulates a human radiologist.

As a delightful weekend project, I sat down with my glorious corgi and lots of coffee and built a radiology assistant for Google Home. It is designed to assist healthcare providers with their radiology needs in a quick, conversational, hands-free way. Some video demonstrations:

Smart speakers are a revolutionary technology, and tools like Google Home and Amazon Echo have the potential to offer immense value in healthcare. This tech is powerful and interesting for a number of reasons:

  1. It facilitates looking at and connecting with our patients. Patients hate when their physician continually looks at a computer or tablet, and this behavior degrades the physician-patient relationship. Smart speakers facilitate charting data and accessing information while continuing to make eye contact with the patient.
  2. The need for sterility. Accessing information in the sterile operating room is cumbersome, and surgeons are essentially robbed of the ability to use standard desktop/smartphone technologies. This is solved beautifully by the smart speaker.
  3. Uniquely low-friction information retrieval. Healthcare providers face crushing workloads, and there is a premium on immediate information accessed seamlessly. Sure, I could take 3 minutes to look up the Fleischner criteria, but it is much better if I can simply “ask the room” and get an answer in 5 seconds. Furthermore, new Google Home apps are immediately available without a separate download or the management of 100 fragmented applications.
  4. The power of conversation. We are innately programmed to interact with the world through conversation, and well-crafted conversational software is exceptionally intuitive and easy to use. This may be particularly useful for older patients and physicians who dislike traditional GUIs.

You now have an understanding of why this platform is cool, but what exactly can our radiology assistant do? Examples of current functionality include:

  1. Incidental lesion follow-up recommendations (e.g. pulmonary nodule Fleischner criteria)
  2. Contrast allergy pretreatment protocol
  3. Size threshold information (e.g. thresholds for hypertrophic pyloric stenosis)
  4. Clinical score calculators (e.g. MELD score)

In this article, I will give you a detailed description of how exactly this Google Home tool was built.

Building a conversational radiology assistant

Step 1: Using DialogFlow to understand language

First, I submit to you an incredibly fancy diagram providing an overview of information flow within the application:

When a user queries the Google Home device, natural language processing is performed by DialogFlow to understand the user input and parse it into useful variables of interest. These variables are passed to a webhook for further processing. Following webhook processing, information is returned to DialogFlow and used to create the Google Home output message.

To build this sort of tool, the first thing you need is powerful natural language processing to understand the user input and break it into meaningful chunks. This can be accomplished using DialogFlow, a service provided by Google (via an acquisition of the popular API.AI).

In DialogFlow, modules of functionality are called Intents. Remember how the application provides Fleischner follow-up recommendations for pulmonary nodules? That is the pulmonary nodule Intent. Similarly, there is an intent for MELD score calculation, contrast allergy pretreatment, and the normal dimensions of the pylorus. Just look at all of these wonderful intents:

Intents!

But how can the program take a given user input and match it to the appropriate Intent? It can do this because training data makes it smart. That is, we provide a bunch of examples of things a user might say to trigger a particular Intent (the more we provide, the smarter the A.I.), and the program “learns” to correctly identify the Intent of interest. Here is an example of the sort of data that was used to train the “MELD score” Intent; it is just a bunch of ways a user might ask about the MELD score.
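(The phrases below are illustrative examples of my own, not the exact training set.)

```python
# A handful of illustrative training phrases for the "MELD score" Intent.
# These are examples of my own, not the exact phrases the app was trained on.
meld_training_phrases = [
    "calculate a MELD score",
    "what is the MELD score for this patient",
    "I need help with a MELD score",
    "can you figure out the MELD score",
    "MELD score please",
]
```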

So, we now understand how the program determines what exactly the user is asking about. The next important NLP task performed by DialogFlow is parsing the user input into meaningful chunks that can be understood and processed.

As an example, take the case of pulmonary nodule evaluation. In order to provide a Fleischner recommendation, we need four bits of information about the nodule: (1) nodule character (solid, part solid, ground glass), (2) nodule size, (3) patient risk level (low or high), and (4) nodule number (single or multiple). Bits of data like this in DialogFlow are called Entities, and we create a separate Entity for each of these four variables (technically we use a generic “unit length” entity for nodule size, but that can be ignored for this discussion). It ends up looking a little something like this:
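(Parameter names, entity values, and prompt wording below are my own shorthand, not the exact DialogFlow configuration.)

```python
# A rough outline of the four nodule parameters and their follow-up prompts.
# Names, values, and prompt wording are shorthand of my own, not the exact
# DialogFlow setup.
nodule_parameters = {
    "nodule_character": {
        "values": ["solid", "part solid", "ground glass"],
        "prompt": "Is the nodule solid, part solid, or ground glass?",
    },
    "nodule_size": {
        "values": "generic unit-length entity (e.g. 6 millimeters)",
        "prompt": "How large is the nodule?",
    },
    "risk_level": {
        "values": ["low", "high"],  # "smoker" can be mapped as a synonym of "high"
        "prompt": "Is the patient low risk or high risk?",
    },
    "nodule_number": {
        "values": ["single", "multiple"],
        "prompt": "Is there a single nodule or multiple nodules?",
    },
}
```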

The cool thing about DialogFlow is that it provides smart natural language processing that can, given appropriate training data, automatically extract Entities of interest from an arbitrary sentence. This means that when a user says “I have a 6 millimeter solid pulmonary nodule in a high risk patient,” DialogFlow naturally parses the data and stores nodule size (6 mm), nodule character (solid), and patient risk (high) as usable variables. In this case it recognizes that it still needs to know “nodule number” and will thus ask about this variable using the “prompt” listed above. If the user query contains all four necessary bits of information, it immediately moves on and gives a recommendation. Conversely, if the user simply says “lung nodule,” it asks about all four parameters.
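Under the hood, those extracted Entities become a simple set of key-value parameters. For the example sentence above, the parsed data looks roughly like this (names and structure are simplified, and the exact request JSON depends on the DialogFlow API version):

```python
# Roughly what DialogFlow extracts from "I have a 6 millimeter solid
# pulmonary nodule in a high risk patient". Names and structure are
# simplified; the exact request JSON depends on the DialogFlow API version.
extracted_parameters = {
    "nodule_size": {"amount": 6, "unit": "mm"},
    "nodule_character": "solid",
    "risk_level": "high",
    # "nodule_number" is still missing, so DialogFlow asks its prompt
    # before anything is handed off to the webhook.
}
```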

How is DialogFlow smart enough to know which parts of a sentence represent which Entity? Training data of course! You simply label the relevant part of the sentence as a particular Entity, and it learns to identify those Entities in new sentences. It looks like this:

Yellow label = nodule number; orange label = nodule size; pink label = nodule character; purple label = risk level. Note that you can specify that “smoker” is equivalent to saying “high risk.” Pretty cool!

Step 2: Using a webhook to process the understood language

Thus far, we have used fancy natural language processing to take a user input, understand what the user wants (the Intent), and break it into meaningful parameters of interest (the Entities).

Now that we have these parameters, we use a “webhook” to process them and provide the user with the information they want (MELD score, Fleischner recommendation, etc.). For our purposes you can think of a webhook as a bit of code that accepts data from DialogFlow, processes it, and returns a final value of interest.
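As a minimal sketch (an illustration, not my actual fulfillment code), the webhook can be a small Flask app that reads the matched Intent and its parameters out of the DialogFlow request and returns the text Google Home should speak. The exact request and response field names differ between the v1 (API.AI-era) and v2 formats, and the Intent name here is a placeholder:

```python
# Minimal Flask webhook sketch; an illustration, not my actual fulfillment code.
from flask import Flask, request, jsonify

app = Flask(__name__)


def fleischner_recommendation(params):
    # Placeholder; a fuller version of this helper is sketched further below.
    return "Recommend follow-up per the Fleischner guidelines."


@app.route("/webhook", methods=["POST"])
def webhook():
    req = request.get_json(force=True)

    # v2-style fields; the v1 format nests these under "result" instead.
    intent = req["queryResult"]["intent"]["displayName"]
    params = req["queryResult"]["parameters"]

    if intent == "pulmonary.nodule":
        answer = fleischner_recommendation(params)
    else:
        answer = "Sorry, I can't help with that yet."

    # v2 expects "fulfillmentText"; the v1 format used "speech"/"displayText".
    return jsonify({"fulfillmentText": answer})
```

The other Intents (MELD score, contrast pretreatment, pyloric measurements) follow the same pattern: read the parameters, compute or look up an answer, and return a sentence.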

In the pulmonary nodule case, the webhook takes the four nodule characteristics described above and runs them through a series of simple logic statements, essentially the branching table of the Fleischner guidelines, to identify and return the relevant recommendation.
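As a simplified sketch of what a fuller fleischner_recommendation helper might look like, here is a version covering only single solid nodules. The recommendation wording is paraphrased from the 2017 Fleischner guidelines rather than copied from the actual webhook, so verify it against the published tables before any clinical use:

```python
# Simplified sketch of the lookup for a SINGLE SOLID nodule only; the full
# webhook also branches on part solid / ground glass character and on
# multiple nodules. Recommendation wording is paraphrased from the 2017
# Fleischner Society guidelines; verify against the published tables before
# any clinical use.
def fleischner_recommendation(params):
    size_mm = params["nodule_size"]["amount"]  # hypothetical parameter shape
    risk = params["risk_level"]                # "low" or "high"

    if size_mm < 6:
        if risk == "low":
            return "No routine follow-up is required."
        return "An optional CT at 12 months can be considered."
    if size_mm <= 8:
        if risk == "low":
            return ("Recommend CT at 6 to 12 months, then consider CT "
                    "at 18 to 24 months.")
        return "Recommend CT at 6 to 12 months, then CT at 18 to 24 months."
    return "Consider CT at 3 months, PET/CT, or tissue sampling."
```

The part solid, ground glass, and multiple nodule branches work the same way, each with its own thresholds and recommendations.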

Following this webhook processing, an appropriate answer is returned to DialogFlow, which feeds that answer into the Google Home and, voila, you have a smart radiology assistant.

Conclusion

I hope you found this tutorial useful! Further, I hope you spend some time pondering how this fascinating new platform can offer value to physicians and their patients. Let me know if you have any questions, and please feel free to reach out on Twitter or LinkedIn if you would like to chat. Thanks for reading!


Physician/engineer in Los Angeles, focused on using technology to improve healthcare. Corgi dad.