How we visualize data at Amino

We’re all about bringing transparency to healthcare. To do this, we take a ton of complex information and make it simple and easy for anyone to understand. The goal of effective data visualization is no different, and that’s why we care about it so much at Amino.

While the intersection of healthcare and data science is an exciting frontier, there is a big problem: healthcare is already confusing. Add lots and lots of data, and it can be really difficult to understand what you’re looking at — much less glean any meaningful insight. The doctors and data scientists of the world have highly specialized jobs, and as a result of this extreme specialization, the jargon they use is often incomprehensible to the people who could benefit most from their expertise.

At Amino, we believe that data scientists need to know how to explain their work in clear terms that everyone can understand because these days, people are increasingly put in charge of their own healthcare choices and expenses. They need to become experts fast. That’s where data visualization comes in. Data visualization makes it easy for anyone to quickly understand key insights from a ton of complex information.

Over the last year, we’ve developed a process for creating effective data visualizations that help people make better healthcare decisions. Below is a behind-the-scenes look at that process — plus how one of our favorite projects came to life, from start to finish.

The data visualization process at Amino

Step 0: Analyze the data

Before you can visualize anything, you’ll need some data! We often get asked if our analyses inspire a visual or if an idea for a visual inspires an analysis. We’ve found the former to be true: the most compelling visuals are usually born from the amazing work our data scientists do every day.

Step 1: Craft a story

This is the most important, and most overlooked, part of the process. What are you trying to say? What is the question you are trying to answer? What is the insight in your data? What argument are you trying to make? What position are you supporting or refuting with your data? What should the reader take away from the visualization? Answering these types of questions up front can save you a ton of time later in the process by reducing the number of revisions and edits you have to make. If you don’t have much of a story, or the story is too hard to craft, you should take a step back and reevaluate what you’re trying to do with the data.

Step 2: Make a prototype

Plot your data. Don’t worry about it being pretty. Just plot it. I highly encourage sketching rough charts by hand before using software. If your initial plots start to tell the story outlined in Step 1, you’re headed in the right direction. If not, it’s worth thinking about reframing the story.

Step 3: Refine it

Ideally, every pixel on your visual supports the story you are trying to tell. Data visualization pioneer Edward Tufte calls this maximizing the “data-ink ratio.” Sometimes this means grouping individual data points into categories, or removing unnecessary categories altogether. Other times this means removing unnecessary legends or visual effects like drop-shadows. This is also where you need to start thinking about aesthetics: concise labels, ample spacing between text and data points, proper alignment of text for easier reading, and distinct colors for highlighting data, among others.

Step 4: Get feedback and refine some more

Once you have a draft ready, constructive feedback, especially from people who don’t work with data, can help make your visualization more approachable and easier to understand. I try to gather feedback by posing the following questions:

“What is this chart telling you?”
“What elements help you understand the data?”
“What elements are confusing?”

If the answers you get to the first question don’t match the story you set out to tell in Step 1, it’s worth digging into what is causing the misinterpretation. Sometimes, all that’s needed is a simple change to a title or label to reframe the whole idea for the reader.

Step 5: Publish, and reflect on your work

You’ve made it! Time to share your visualization with the world. In our experience, if we get really lucky, our visuals get published in the press and shared on social media. I’m especially interested in the conversations a visual generates. If we achieved our goal of making a complex idea clear and easy to understand, the conversations aren’t about the data itself, but about the insight derived from it.

Take this tweet, featuring Amino’s data on the cost of an IUD:

If you read the replies, the discussion tends to center on what people paid for an IUD in their own experience, as opposed to the data itself (what it is, how it was derived, etc.). We were pretty proud of this particular chart.

Amino’s guiding principles for data visualization

While going through the process, we developed a few guiding principles to keep us focused.

Know your audience. Use terms everyone can understand.

Wherever possible, we avoid using technical jargon and try to offer an interpretation of our data in simple terms everyone can understand. If we find that we have to rely on technical jargon to express a concept, chances are we haven’t simplified it enough. Too often, technical writing (like research papers) lacks simple and clear explanations of the author’s findings. As a result, those findings are often misinterpreted, sometimes purposefully with the intent to deceive! This is especially common in healthcare and politics. We want to be good stewards of our data and prevent its misuse.

Be transparent, truthful, and trustworthy.

We use no misleading guides or scales, and we avoid proprietary “black box” methods or measures. We strive to design every chart and graph so that it can live on its own, independent from a specific piece of content like a news article or blog post. You never know where your work could end up!

Maximize visual accessibility, especially for mobile screens.

We use a color scheme that is discernible to people with common types of colorblindness. We take a “mobile-first” approach to all of our content, since the majority of Americans consume their news via smartphones and tablets. As such, we’ve optimized our data visualizations so they can be read easily on a mobile screen. In practice, this means our visuals have thin margins, bold fonts, and lots of contrast.

How the data visualization process works: a real life example

In June, we published a data story on the various types of ultrasounds that women receive during pregnancy. This was a topic very closely aligned with our mission at Amino. We knew that during pregnancy, it’s extremely difficult to figure out what types of ultrasounds you need, how often you’ll need them, when you should expect to get them, or how much they’ll cost. While there’s no such thing as a “universal” pregnancy experience, our goal was to help pregnant women understand — and visualize — a baseline of what to expect.

However, in our initial data exploration, we found that there are nearly 20 different codes for various types of pregnancy ultrasounds. How could we represent this data clearly, simply, and effectively? And most importantly, how could we package it so that it’s actually useful? Below is a step-by-step look at how we went from initial analysis to final visualization: a single, easy-to-understand graphic that represents more than 2 million data points from 308,000 pregnancies.

Step 0: Analyze the data

Before a single data point was plotted, we did a ton of analysis. In our exploration of ultrasound claims related to pregnancy, we were able to follow the stages of pregnancy by observing when different types of ultrasounds appear in insurance claims. Below is an example of what one woman’s pregnancy ultrasound claims looked like.

Example of one patient’s pregnancy ultrasound claims. This patient likely had twins, which could explain why there are so many TVUs and follow-up scans.

Step 1: Craft a story

A typical question we heard from women while doing research on this topic was “Is my experience with ultrasounds normal?” We didn’t find any resources online that offered a bird’s-eye view of when each type of ultrasound tends to happen, so we figured we could make our own. If we have thousands of examples of the data above, we can generalize the frequency and timing of ultrasounds and offer women a sense of how similar or different their own experience is.

Step 2: Make a prototype

I initially thought about visualizing this data as a line chart. The x-axis would represent time, while the y-axis would represent number of ultrasounds. Each type of ultrasound would get its own line:

However, it quickly became apparent that keeping track of all the various types of codes with labels or a legend was too difficult. The overlapping lines were hard to read as well. Thus, we thought a heatmap might work better.

In a heatmap, the number of ultrasounds received at each stage of pregnancy would be represented via color — the darker the color, the more ultrasounds at that particular time. This allowed the types of ultrasounds to be clearly labeled as their own axis.

A first draft was plotted on the computer. The y-axis is time in days, and the columns are the codes for various types of ultrasounds.
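To make the heatmap idea concrete, here is a minimal Python sketch of how a draft like this could be assembled. It is an illustration only, not the code behind the chart (per the footnote below, the real visuals were made with ggplot2/ggmap and Sketch). The claim records, the placeholder codes `CODE_A` to `CODE_C`, the 280-day window, and the function names are all assumptions.

```python
# Hypothetical claim records: (ultrasound_code, day_of_pregnancy).
# The codes and days are placeholders, not real claims data.
claims = [
    ("CODE_A", 60), ("CODE_A", 62), ("CODE_B", 140),
    ("CODE_B", 141), ("CODE_B", 140), ("CODE_C", 45),
]


def count_matrix(claims, n_days):
    """Return (codes, matrix) where matrix[day][column] counts claims."""
    codes = sorted({code for code, _ in claims})
    column = {code: i for i, code in enumerate(codes)}
    matrix = [[0] * len(codes) for _ in range(n_days)]
    for code, day in claims:
        if 0 <= day < n_days:  # ignore days outside the plotted window
            matrix[day][column[code]] += 1
    return codes, matrix


def plot_heatmap(codes, matrix):
    """Draw the draft: rows are days, columns are codes, darker means more."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to draw

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, aspect="auto", cmap="Purples")
    ax.set_xticks(range(len(codes)))
    ax.set_xticklabels(codes)
    ax.set_ylabel("Day of pregnancy")
    fig.colorbar(image, ax=ax, label="Number of ultrasounds")
    return fig


codes, matrix = count_matrix(claims, n_days=280)
```

Calling `plot_heatmap(codes, matrix)` would render the grid. Keeping the counting separate from the drawing makes it easy to check the numbers before worrying about how the chart looks.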

Right away we saw some interesting patterns. Some codes appeared earlier in the pregnancy while others appeared later. However, the chart was still very complicated, and a lot could be done to simplify things. Namely, you’d have to be a medical coding expert to make sense of the codes across the top of the chart. Not ideal.

Step 3: Refine

Time to refine things. There are two main ways we simplified the above draft: grouping the codes into categories with clear labels, and showing time as weeks instead of days with the trimesters as the main demarcation.

Flipping the chart so it’s oriented horizontally made it easier to read as well. Now, all text is horizontal — no head tilting necessary.
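The two simplifications, plus the flip, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the code-to-category mapping and category labels are invented, weeks are numbered from 1 with days 0 to 6 counted as week 1, and the trimester boundaries (weeks 1 to 13, 14 to 27, 28 and up) follow one common convention rather than anything specified in this post.

```python
from collections import Counter

# Hypothetical mapping from billing codes to reader-friendly categories.
CODE_TO_CATEGORY = {
    "CODE_A": "Transvaginal",
    "CODE_B": "Transvaginal",
    "CODE_C": "Follow-up",
}


def week_of(day):
    """Convert a 0-based day of pregnancy to a 1-based week."""
    return day // 7 + 1


def trimester_of(week):
    """Assumed boundaries: weeks 1-13, 14-27, and 28 onward."""
    if week <= 13:
        return 1
    if week <= 27:
        return 2
    return 3


def weekly_counts(claims):
    """Count claims per (category, week); unmapped codes become 'Other'."""
    counts = Counter()
    for code, day in claims:
        category = CODE_TO_CATEGORY.get(code, "Other")
        counts[(category, week_of(day))] += 1
    return counts


def as_rows(counts, n_weeks=40):
    """Flipped layout: one row per category, one column per week."""
    categories = sorted({category for category, _ in counts})
    return {
        category: [counts[(category, week)] for week in range(1, n_weeks + 1)]
        for category in categories
    }


claims = [("CODE_A", 45), ("CODE_B", 46), ("CODE_C", 140), ("CODE_X", 200)]
counts = weekly_counts(claims)
rows = as_rows(counts)
```

With categories as rows and weeks as columns, the row labels read left to right and the trimester boundaries become simple vertical dividers.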

Step 4: Get feedback and refine some more

From the feedback we heard, it was still hard to see the “hot spots” for each type of ultrasound. We attributed this to too many colors. We reduced the number of steps between colors and raised the lower limit on the number of ultrasounds per week that were included in the visual.

These tweaks increased the contrast between steps and made the “hot spots” for each type of ultrasound more apparent.

Previously, every week with an observed ultrasound was included in the visual, but in the updated version, only weeks with at least 100 ultrasounds were shown.
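Here is a small Python sketch of that kind of cutoff and color binning. The cutoff of 100 comes from the description above; the remaining step edges, the number of steps, and the function name are assumptions chosen for illustration, not the exact values used in the chart.

```python
from bisect import bisect_right

MIN_COUNT = 100  # weeks with fewer ultrasounds than this are left blank

# Lower edge of each color step, lightest to darkest.
# Only the first edge is taken from the text; the rest are assumed.
STEP_EDGES = [100, 200, 1000, 10000]


def color_step(count):
    """Return None below the cutoff, else a 0-based color step index."""
    if count < MIN_COUNT:
        return None
    # bisect_right counts how many lower edges are <= count.
    return bisect_right(STEP_EDGES, count) - 1


weekly_totals = [12, 150, 640, 4200, 18000]
steps = [color_step(total) for total in weekly_totals]
```

Fewer, wider steps mean neighboring weeks are more likely to share a color, so the jumps that remain stand out as the “hot spots.”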

Step 5: Publish, and reflect on your work

The final step involved writing a clear title, descriptive subtitle, helpful legend, and additional pointers. We like our titles to be slightly editorialized and whenever possible answer the “so what?” behind the chart. The subtitles are always a more technical description of the data being presented.

One of the conscious decisions we made with the legend was not to include any numerical demarcations. We figured the exact number of ultrasounds performed in any given week didn’t much matter to the reader, since the differences in color intensity were sufficient to tell the story. In the blog post, we clarified that bands in gray represent weeks where only 100–200 patients were observed as receiving an ultrasound, while the darkest purple bands represent 10,000+ patient observations.

We were delighted to see that the visual, along with the rest of our analysis, got featured in an article by The Bump, a news site dedicated to empowering soon-to-be parents with information about pregnancy.

A few final thoughts and further reading

Making sense of increasingly vast quantities of information is a challenge that data scientists and non-data scientists alike face, and it goes beyond healthcare. We live in a world awash with data. So many decisions that affect our lives are made with huge amounts of information that are oftentimes beyond our comprehension: what shows up in our news feed, who we go on a date with, what kind of rate we get on a housing loan, or what exactly our health insurance will pay for. Effective data visualization is a small step towards making this sea of information easier to understand for everyone.

If you’re new to data visualization and looking for inspiration, guides, and tools, below are links to a few great resources. You can also check out the rest of our work here.

Inspiration

Guides

Tools

  • ggplot2 & ggmap² — data visualization and mapping for R
  • matplotlib — data visualization for Python
  • RStudio — a slick R environment that makes working with ggplot2 a breeze
  • Shiny — create interactive charts and graphs for the web with ggplot2 syntax
  • Gephi — powerful and extensible open source graph visualization software
  • D3.js — the web standard for information visualization
  • Mode — my current favorite data collaboration tool

¹ I actually prefer Envisioning Information over his better-known work The Visual Display of Quantitative Information, as I think the discussion and examples included in it are more relevant to modern media.

² At Amino, we use ggplot2/ggmap for R, in combination with Sketch to create our visuals.