Querying Hierarchical APIs with Recursion

A guide to writing a recursive function extracting relationships from the ICD API

Patrick Normile
Towards Data Science


Photo by Brannon Naito on Unsplash

Recursion is powerful, but I have always been intimidated by it. Recently, I needed to pull hierarchical data from an API and I realized recursion would make a lot of sense, so I had to face my fears. Hopefully, this guide will help if you also shy away from recursion and would like to see a practical example in action.

ICD — International Classification of Diseases

First, I’ll explain the ICD-10 diagnosis codes and how they are stored in the ICD API (icd.who.int/icdapi). A healthcare provider assigns an ICD diagnosis code when a patient seeks healthcare services, so that the payer can understand the reason for the visit and what the patient is being treated for. An ICD diagnosis code has three or more characters, starting with a letter followed by numbers. For example, a provider who diagnoses a patient with brain cancer in the frontal lobe could use the code C711, “Malignant neoplasm of frontal lobe” (though they may instead choose an unspecified code such as C719, “Malignant neoplasm of brain, unspecified”, for example if the tumor is found in multiple locations). If we would like to analyze patients with brain cancer, we may not care exactly where the tumor is located, so we need a way to group all the brain cancer codes.

In the ICD-10 hierarchy, the codes beginning with C71 are all the codes for “Malignant neoplasm of brain”, and the C71 codes fall into a broader category of C69-C72, which are “Malignant neoplasms of eye, brain and other parts of central nervous system”. We can go another level up to C00-D49, which represents the codes for all Neoplasms. Therein lies the hierarchical structure we wish to extract.

This structure is helpful because it allows us to analyze data at different levels of granularity. We may want to analyze by broad disease categories such as cancer, respiratory diseases, digestive tract diseases, infections, etc. Or we might want to dig deeper by analyzing more specific diseases, like skin cancers or shigellosis infections. It would be nice if we could simply take the first n characters of the ICD code to get different levels of granularity, but as we saw with the brain cancer example, some codes that start with C6 are grouped with codes that start with C7, and not all C6 codes are in the same group. In fact, several disease groups begin with C6 (C60-C63 for “Malignant neoplasms of male genital organs”, C64-C68 for “Malignant neoplasms of urinary tract”, and C69-C72 for “Malignant neoplasms of eye, brain and other parts of central nervous system”).
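To make the point concrete, here is a small sketch that compares grouping by prefix with grouping by block range. It only knows the three blocks named above; BLOCKS and block_for are illustrative names, not part of the ICD API.

```python
# The three blocks named in the text, as (first category, last category, title).
BLOCKS = [
    ("C60", "C63", "Malignant neoplasms of male genital organs"),
    ("C64", "C68", "Malignant neoplasms of urinary tract"),
    ("C69", "C72", "Malignant neoplasms of eye, brain and other parts "
                   "of central nervous system"),
]


def block_for(code):
    """Return the 'start-end' label of the block containing a code, or None."""
    category = code[:3]  # three-character category, e.g. 'C71' from 'C711'
    for start, end, _title in BLOCKS:
        # A category is one letter plus two digits, so plain string
        # comparison orders categories the same way the classification does.
        if start <= category <= end:
            return f"{start}-{end}"
    return None


for code in ["C61", "C64", "C69", "C711"]:
    print(code, "| two-character prefix:", code[:2], "| block:", block_for(code))
```

The two-character prefix puts C61, C64 and C69 together and separates C69 from C711, while the range lookup places C69 and C711 in the same block. A hand-written range table like this does not scale to the whole classification, which is why pulling the hierarchy from the API is attractive.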

The icd10data.com website helps visualize this hierarchy. Clicking through the links on this page, you can see how different disease categories are grouped and how the codes become more specific the deeper you go.

ICD API

The ICD API (icd.who.int/icdapi) contains this valuable hierarchy information. With a few simple Python requests, we can extract the information contained in each level of the hierarchy and determine which codes are on the next level. First, let’s get set up. You’ll need to create an account and get your own client ID and client secret. I store my credentials in the Windows Credential Manager and use the keyring package to access them. Once you have your credentials, you can generate an access token. A token is valid only for a limited time; more on that later.
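The setup might look roughly like the sketch below. It is not the code from the repository linked at the end of this article. The token endpoint, scope and grant type are the values I understand the WHO API documentation to give, so check them against the current docs. The keyring service name "icd_api" and the entry names "client_id" and "client_secret" are hypothetical; use whatever names you stored your credentials under.

```python
# Values as I understand them from the WHO ICD API documentation;
# verify against the current docs before relying on them.
TOKEN_ENDPOINT = "https://icdaccessmanagement.who.int/connect/token"
TOKEN_SCOPE = "icdapi_access"


def build_token_payload(client_id, client_secret):
    """Form fields for an OAuth2 client-credentials token request."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": TOKEN_SCOPE,
        "grant_type": "client_credentials",
    }


def load_credentials(service="icd_api"):
    """Read the client ID and secret from the OS credential store.

    The service and entry names are hypothetical placeholders.
    """
    import keyring  # third-party package, imported only where it is needed

    client_id = keyring.get_password(service, "client_id")
    client_secret = keyring.get_password(service, "client_secret")
    return client_id, client_secret


def request_token(client_id, client_secret):
    """POST the credentials and return the access token string."""
    import requests  # third-party package, imported only where it is needed

    response = requests.post(
        TOKEN_ENDPOINT,
        data=build_token_payload(client_id, client_secret),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]
```

With credentials stored, request_token(*load_credentials()) would return the token string used in the request headers.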

Now that we have this set up, we can write some functions to extract data from the API and return it as JSON. There will be an element that is returned from each request called “child” and this is where the recursion comes in. We will start at the top of the hierarchy, get all the children from the top level URI, find all the children of those children, and so on until we run into a URI with no children.
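A request helper along these lines would do the job. This is a minimal sketch rather than the original implementation: the article refers to the helper as both get and get_contents, and the sketch uses get_contents. The header names and the "v2" API version follow my reading of the WHO documentation and should be treated as assumptions; build_headers and child_uris are illustrative names.

```python
def build_headers(token, language="en"):
    """HTTP headers for an ICD API request (names per the WHO docs, as assumed)."""
    return {
        "Authorization": "Bearer " + token,
        "Accept": "application/json",
        "Accept-Language": language,
        "API-Version": "v2",
    }


def get_contents(uri, headers):
    """Request one URI and return the parsed JSON body as a dict."""
    import requests  # third-party package, imported only where it is needed

    response = requests.get(uri, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on 401 (expired token), 404, etc.
    return response.json()


def child_uris(contents):
    """Return the list of child URIs, or an empty list for a leaf node."""
    return contents.get("child", [])
```

Returning an empty list for a missing "child" key gives the recursion its natural stopping condition: a node with no children produces no further calls.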

Here is an example. We start with the URI https://id.who.int/icd/release/10 and find the latest version URI, which is http://id.who.int/icd/release/10/2019. We save that as the variable uri_latest. From there, we can use the function get_contents to find all child links from the latest version. After that, we can repeat the same process on each child to capture the hierarchical structure of the ICD diagnosis codes.

# Children links (corresponds to links on https://www.icd10data.com/ICD10CM/Codes)
In: get(uri_latest)
Out:
{'@context': 'http://id.who.int/icd/contexts/contextForTopLevel.json',
'@id': 'http://id.who.int/icd/release/10/2019',
'title': {'@language': 'en',
'@value': 'International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) Version for 2019'},
'releaseDate': '2020-02-01',
'child': ['http://id.who.int/icd/release/10/2019/I',
'http://id.who.int/icd/release/10/2019/II',
'http://id.who.int/icd/release/10/2019/III',
'http://id.who.int/icd/release/10/2019/IV',
'http://id.who.int/icd/release/10/2019/V',
'http://id.who.int/icd/release/10/2019/VI',
'http://id.who.int/icd/release/10/2019/VII',
'http://id.who.int/icd/release/10/2019/VIII',
'http://id.who.int/icd/release/10/2019/IX',
'http://id.who.int/icd/release/10/2019/X',
'http://id.who.int/icd/release/10/2019/XI',
'http://id.who.int/icd/release/10/2019/XII',
'http://id.who.int/icd/release/10/2019/XIII',
'http://id.who.int/icd/release/10/2019/XIV',
'http://id.who.int/icd/release/10/2019/XV',
'http://id.who.int/icd/release/10/2019/XVI',
'http://id.who.int/icd/release/10/2019/XVII',
'http://id.who.int/icd/release/10/2019/XVIII',
'http://id.who.int/icd/release/10/2019/XIX',
'http://id.who.int/icd/release/10/2019/XX',
'http://id.who.int/icd/release/10/2019/XXI',
'http://id.who.int/icd/release/10/2019/XXII'],
'browserUrl': 'http://apps.who.int/classifications/icd10/browse/2019/en'}

API time limit

When I kicked off this process, everything looked good and it ran fine. However, after about an hour the recursive process failed, and I realized it was because my access token had expired. I needed a way to keep track of how long a token had been in use and to replace it before it expired. To do this within a recursive function, I kept the token and header in variables in the global scope. Since the function calls itself over and over, local variables would have to be passed through every call, which would be a pain.

Two functions handle this: one updates the token and the other updates the header used in the ‘get’ function.
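As a rough illustration of that idea, and not the code from the linked repository, the sketch below keeps the token, its issue time and the headers in module-level globals. The 50-minute threshold is an assumption based on the failure after about an hour. fetch_token stands for any callable that returns a fresh token string, and refresh_if_needed is an extra convenience wrapper; all names here are illustrative.

```python
import time

# Assumption: refresh after 50 minutes, comfortably before the roughly
# one-hour expiry observed above.
TOKEN_MAX_AGE_SECONDS = 50 * 60

# Global state shared by every level of the recursion.
token = None
token_issued_at = None
headers = None


def update_token(fetch_token, now=time.monotonic):
    """Fetch a new token and remember when it was issued."""
    global token, token_issued_at
    token = fetch_token()
    token_issued_at = now()


def update_headers():
    """Rebuild the request headers from the current global token."""
    global headers
    headers = {
        "Authorization": "Bearer " + token,
        "Accept": "application/json",
        "Accept-Language": "en",
        "API-Version": "v2",
    }


def refresh_if_needed(fetch_token, now=time.monotonic):
    """Refresh token and headers if there is no token yet or it is too old."""
    expired = token is None or now() - token_issued_at >= TOKEN_MAX_AGE_SECONDS
    if expired:
        update_token(fetch_token, now)
        update_headers()
    return headers
```

Calling refresh_if_needed before each request keeps the check cheap: most calls only compare two timestamps, and a new token is requested only when the old one is close to expiring. The clock is passed in as a parameter so the behavior can be checked without waiting an hour.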

Recursion

Lastly, we need a place to store all the information we’ve queried. Again, we use a global variable ‘dfs’ which is a list of all the relationships we’ve extracted and will combine at the end. Each item in the list will contain the parent code and parent title, and a child code and child title, creating a tree structure.

Our recursive function will need to find all the relevant information of a URI using the get_contents function, store what we want in our global variable dfs, then call itself on each child of the URI to continue the process. While this is happening, it checks if the token and header need updating before making the next request.
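The steps just described can be sketched as follows. This is a simplified illustration under a few assumptions, not the exact function from the repository: each response is assumed to carry an optional "code", a "title" dict with an "@value" entry, and an optional "child" list; rows are stored as plain dicts; and get_contents is passed in as any callable that maps a URI to parsed JSON. The names extract_relationships and before_request are illustrative.

```python
# Global accumulator: one row per parent/child relationship.
dfs = []


def extract_relationships(uri, get_contents, parent_code=None,
                          parent_title=None, before_request=None):
    """Record the node at `uri`, then recurse into each of its children."""
    if before_request is not None:
        before_request()  # e.g. refresh the token and headers if needed

    contents = get_contents(uri)
    code = contents.get("code")  # the top-level release node has no code
    title = contents.get("title", {}).get("@value")

    dfs.append({
        "parent_code": parent_code,
        "parent_title": parent_title,
        "child_code": code,
        "child_title": title,
    })

    # Base case: a node without a "child" list yields an empty loop,
    # so the recursion stops at the leaves on its own.
    for child_uri in contents.get("child", []):
        extract_relationships(child_uri, get_contents, code, title,
                              before_request)

    return dfs  # returned for convenience; it is the same global list
```

Because each node is recorded before its children are visited, the rows come out in depth-first order: a chapter, then its first block, then that block's categories, and only then the next block. The list of dicts can be handed to pandas.DataFrame at the end. The hierarchy is only a handful of levels deep, so Python's default recursion limit should not be a concern here.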

Displaying the data frame, we start at the root node (no parent) and find all the relationships below it, starting with “Certain infectious and parasitic diseases”. Then, the recursion goes through all the sub-classifications of “Certain infectious and parasitic diseases”, starting with “Intestinal infectious diseases”. Next, it goes to the sub-classifications from there, starting with “Cholera”. After it gathers all the specific cholera diagnosis codes, it moves on to “Typhoid and paratyphoid fever”, and so on.

Image created by the author

After letting the function run for about 4 hours on my machine, I had captured all of the relationships between ICD codes and their hierarchy. Some cleanup was needed to organize this tree structure into a useful table, but once I had that I could change the granularity of my analyses by replacing the original diagnosis codes with higher-level codes from their hierarchy. This allows me to more easily compare patients with similar conditions.
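One way to do that replacement, sketched here under assumptions rather than taken from my cleanup code, is to turn the relationship rows into a child-to-parent lookup and walk upward from a code. The row keys parent_code and child_code and the function names are illustrative, and the sketch assumes the codes in your data are formatted the same way as the codes in the rows.

```python
def build_parent_map(rows):
    """Map each child code to its parent code, skipping rows with no parent."""
    return {
        row["child_code"]: row["parent_code"]
        for row in rows
        if row["parent_code"] is not None
    }


def ancestors(code, parent_of):
    """Return the chain of parents of `code`, nearest first."""
    chain = []
    while code in parent_of:
        code = parent_of[code]
        chain.append(code)
    return chain


def roll_up(code, parent_of, levels=1):
    """Replace `code` with its ancestor `levels` steps up the hierarchy.

    If the hierarchy is shallower than `levels`, the topmost ancestor is
    returned; a code with no recorded parent is returned unchanged.
    """
    chain = ancestors(code, parent_of)
    if not chain:
        return code
    return chain[min(levels, len(chain)) - 1]
```

Applying roll_up to a column of diagnosis codes with a fixed number of levels gives a coarser grouping. Since codes sit at different depths, mapping every code to the ancestor at a fixed depth from the root may be a better fit for some analyses.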

I hope this guide inspires you to use recursion where it makes sense. You can find the full code on my GitHub: https://github.com/patricknormile/examples/blob/main/icdAPI_recursion_example.py

Thank you for reading!
