
Minimizing Overlapping Labels in Interactive Visualizations

An extremely efficient, greedy algorithm for automatic label placement with surprisingly strong results.

One of the most challenging areas of user-controlled, real-time data visualization is label placement. In many of my visualizations, I design the layout so that labels cannot possibly overlap, avoiding this problem altogether, but in my most recent visualization this was not an option.

My 91-DIVOC visualization of the COVID-19 pandemic allows users to explore the most recent coronavirus data from Johns Hopkins University through an interactive visualization built with the d3.js library. Because the data is updated multiple times a day, and users can explore it to create over a billion distinct visualizations, everything had to be rendered programmatically.

[Figure: Visualization of new COVID-19 cases per day for every state in the United States, with "render or wiggle" label placement.]

Several constraints, common to many line-graph visualizations, combine to make readable labels challenging:

  • There may be hundreds of different lines, all ending at the same x-position.
  • Some lines are more important than others. In the 91-DIVOC visualization, the user can "highlight" one or more countries; each highlighted line is darkened and thickened, and its label is enlarged.
  • The majority of users are using mobile devices, limiting the computational resources available.

Without using any label placement, the visualization of the new COVID-19 cases by day with several states highlighted shows multiple unreadable highlighted labels and many more unreadable non-highlighted labels.

Common Approach: Force-directed Graphs

A common approach to label placement is to use d3.js’s "force" component, which implements a force-directed graph. A force-directed graph is a physics-based simulation in which every element exerts an "attraction" or "repulsion" force on other elements. For label placement, every label is given a slight repulsion force, so the simulation pushes labels away from each other where possible to keep them readable. Once the simulation reaches a stable final state the result is quite good, and it is a solid approach.

Unfortunately, force-directed graphs are slow. The running time of a force-directed graph is generally estimated to grow cubically with the number of elements, O(n³), and areas dense with elements require a large amount of computing power before the simulation converges. In early experiments with a force-directed label placement algorithm, more time was spent running the simulation than was spent processing and rendering the entire rest of the visualization. It was time for a new solution.
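To give a feel for where that cost comes from, here is a minimal sketch of a naive repulsion loop over label y-positions. It is not how d3.js implements its force component; the function `relaxLabels` and its parameters `minGap` and `maxIterations` are hypothetical names made up for this illustration. The point is only that every pass compares every pair of labels, and crowded inputs can need many passes before settling.

```javascript
// Illustrative only: a naive 1-D repulsion loop, NOT the d3.js implementation.
// `relaxLabels`, `minGap`, and `maxIterations` are hypothetical names.
function relaxLabels(ys, minGap, maxIterations) {
  const pos = ys.slice();
  let comparisons = 0;
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    const push = new Array(pos.length).fill(0);
    let anyOverlap = false;

    // Every label is compared with every other label: O(n^2) work per pass.
    for (let i = 0; i < pos.length; i++) {
      for (let j = i + 1; j < pos.length; j++) {
        comparisons++;
        const delta = pos[j] - pos[i];
        const missing = minGap - Math.abs(delta);
        if (missing > 0) {
          // Push the pair apart, each label moving half of the missing gap.
          const dir = delta >= 0 ? 1 : -1;
          push[i] -= (dir * missing) / 2;
          push[j] += (dir * missing) / 2;
          anyOverlap = true;
        }
      }
    }

    if (!anyOverlap) break; // stable state: no pair is closer than minGap
    for (let i = 0; i < pos.length; i++) pos[i] += push[i];
  }

  return { positions: pos, comparisons, iterations };
}

// Four crowded labels: each pass costs 6 pair comparisons (4 * 3 / 2).
const crowded = relaxLabels([100, 101, 102, 103], 12, 500);
console.log(crowded.iterations, crowded.comparisons);
```

With hundreds of labels ending at the same x-position, the number of pairs per pass alone is in the tens of thousands, before counting how many passes the simulation needs.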

A Fast Solution: "Render or Nudge"

With the frustratingly slow force-directed graph ruled out, it was critical that the running time of any replacement be minimal. To conserve as much processing time as possible, every label’s placement comes down to a single decision to "render or nudge":

  1. If no label is currently rendered in the area, render the label. This decision is immediate and the placement will never be changed.
  2. If a previously rendered label overlaps the new label, nudge the new label to try to find a better position.

"Nudging" attempts to place the label up to one label height above or below its intended location. If no suitable location is found, the label is rendered in its original position, creating overlapping labels. (Nudging the label further away from the intended location often causes a visual discontinuity between the data and its label.)
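The "render or nudge" decision described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the code used in 91-DIVOC: labels are plain objects with a centre `y` and a `height`, all labels share the same x-position so only vertical overlap is tested, and nudge positions are tried in fixed `step` increments, above before below. The names `placeLabels`, `overlaps`, and `step` are hypothetical.

```javascript
// A sketch of "render or nudge" under the assumptions stated above.
// `placeLabels`, `overlaps`, and `step` are hypothetical names.

// Two vertical extents overlap when each starts before the other ends.
function overlaps(a, b) {
  return a.top < b.bottom && b.top < a.bottom;
}

function placeLabels(labels, step = 2) {
  const placed = [];

  for (const label of labels) {
    const boxAt = (y) => ({ top: y - label.height / 2, bottom: y + label.height / 2 });
    // Linear scan over placed labels keeps the sketch short; it is not constant-time.
    const isFree = (y) => !placed.some((p) => overlaps(p, boxAt(y)));

    let y = label.y;
    let nudged = false;
    let overlapping = false;

    if (!isFree(y)) {
      // Nudge: try positions up to one label height above or below the target.
      // In screen coordinates a smaller y is "above".
      for (let offset = step; offset <= label.height && !nudged; offset += step) {
        if (isFree(label.y - offset)) {
          y = label.y - offset;
          nudged = true;
        } else if (isFree(label.y + offset)) {
          y = label.y + offset;
          nudged = true;
        }
      }
      // No free spot found: render at the original position and accept the overlap.
      overlapping = !nudged;
    }

    // Once placed, a label is never moved again (the greedy part).
    placed.push({ text: label.text, y, ...boxAt(y), nudged, overlapping });
  }

  return placed;
}

const demo = placeLabels([
  { text: "A", y: 100, height: 10 },
  { text: "B", y: 102, height: 10 },
  { text: "C", y: 300, height: 10 },
]);
console.log(demo.map((l) => `${l.text}@${l.y}`).join(" ")); // A@100 B@110 C@300
```

Because earlier placements are never revisited, the order of the input matters. One plausible choice, not stated in the article, is to pass highlighted labels first so they claim their intended positions.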

With each label considered only once and only a bounded number of nudge positions tried, this algorithm runs in linear time, O(n), provided each overlap check takes constant time. It is a "greedy algorithm" for label placement.
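The article does not say how overlap is detected, but the linear bound depends on that check not growing with the number of labels already rendered. One way this could work, assuming all labels sit in a single column at the right edge of a chart whose pixel height is known, is a per-row occupancy array. The class `RowOccupancy` below is a hypothetical sketch of that idea, not the author's implementation.

```javascript
// Hypothetical sketch: track which pixel rows of the label column are taken.
// Assumes one shared label column and a known chart height in pixels.
class RowOccupancy {
  constructor(chartHeight) {
    this.rows = new Uint8Array(chartHeight); // 0 = free, 1 = occupied
  }

  // Clamp a vertical extent to the chart and convert it to whole rows.
  range(top, bottom) {
    const start = Math.max(0, Math.floor(top));
    const end = Math.min(this.rows.length, Math.ceil(bottom));
    return [start, end];
  }

  // Cost depends on the label's height, not on how many labels were placed.
  isFree(top, bottom) {
    const [start, end] = this.range(top, bottom);
    for (let r = start; r < end; r++) {
      if (this.rows[r] === 1) return false;
    }
    return true;
  }

  occupy(top, bottom) {
    const [start, end] = this.range(top, bottom);
    this.rows.fill(1, start, end);
  }
}

const column = new RowOccupancy(400);
column.occupy(95, 105);                 // a rendered label covers rows 95..104
console.log(column.isFree(100, 110));   // false: rows 100..104 are taken
console.log(column.isFree(105, 115));   // true: starts right below the label
```

With a structure like this, each "render or nudge" decision touches at most a few label heights' worth of rows, so the total work stays proportional to the number of labels.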

Results

After implementing this algorithm, the results were stunning given the speed of the algorithm. Below, you can see that the highlighted labels are all readable, and many more of the 40+ non-highlighted labels are readable as well.

When considering label placement in a data visualization, think about computationally inexpensive solutions that do not require global positioning. Although this solution does not guarantee that no labels overlap, it vastly improves on naive label placement and greatly improves the readability of any visualization with dozens or hundreds of labels.

(You can see this label placement algorithm in action in my visualization 91-DIVOC #01: "An interactive visualization of the exponential spread of COVID-19".)

