
Plotly for Hierarchical Data Visualization: Treemaps and More

A Step-by-Step Tutorial to Visualize the U.S. Great Resignation Using Plotly and Treemaps

Image Source: Pixabay

Introduction

On December 8, 2021, the U.S. Bureau of Labor Statistics released its latest labor turnover data. From June to October, over 20 million Americans quit their jobs. The preliminary October figure shows that over 4 million Americans resigned that month, nearly 25% more than in the same month a year earlier.

The Bureau of Labor Statistics: Economic News Release

The labor turnover data is also broken down by industry, providing rich information at a more granular level. However, notice that the data is presented as a table with a hierarchical structure of industry sectors and sub-sectors. This makes it hard to spot patterns and to quickly identify the categories or sub-categories that need attention.

This is a typical scenario where data visualization comes to the rescue! How can we visualize this type of hierarchical data so that it delivers insights in an easy-to-read fashion while also making efficient use of space?

In this tutorial, we will use Plotly to create a treemap to visualize the resignation rates in October across different industries. A treemap displays hierarchical data using nested rectangles and uses the size and color of each rectangle to represent the different metrics we want to show. It is a perfect candidate for hierarchical data visualization.

As a bonus, at the end of the tutorial, we will also build a sunburst chart and icicle chart, which are very similar to treemaps and share the same input data format. The treemap we are going to create looks like this:

Image by Author

Plotly Express vs. Plotly Graph Objects

The Plotly Python library is an interactive, open-source graphing library that covers a wide range of chart types and data visualization use cases. It has a wrapper called Plotly Express which is a higher-level interface to Plotly.

Plotly Express is easy and quick to use as a starting point for creating the most common figures with simple syntax but lacks functionality and flexibility when it comes to more advanced chart types or customizations.

In contrast to Plotly Express, Plotly Graph Objects (the plotly.graph_objects module, usually imported as go) is a lower-level interface that generally requires more code but is much more customizable and flexible. In this tutorial, we’ll use Graph Objects to create the treemap shown above. You can also save the code as a template for creating similar charts in other use cases.


Download and Prepare the Data

You can get the data for this tutorial from the Bureau of Labor Statistics Economic News Release page. The data shows the quits levels (in thousands) and quits rates by industry, seasonally adjusted, from June to October 2021. I saved the data in an Excel file with the sheet name ‘By_Industry’ and renamed the columns directly in Excel to make them more intuitive.

Let’s first read the data into Python. Since we are only interested in the resignation rate in October, we’ll keep only the ‘Industry’, ‘Oct_2021_Quits’, and ‘Oct_2021_Quit_Rates’ columns in our data frame. Notice that the ‘Industry’ column in the data frame doesn’t reflect the hierarchical structure shown in the original table. We’ll need to add this missing piece of information to the data frame later in order to plot the treemap.
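The loading step can be sketched roughly as follows. This is a minimal sketch rather than the exact code behind the screenshot: the file name `quits_by_industry.xlsx` is a hypothetical placeholder, while the sheet name and column names are the ones described above.

```python
import pandas as pd

# Column names as renamed in Excel (see the description above).
COLUMNS = ["Industry", "Oct_2021_Quits", "Oct_2021_Quit_Rates"]


def load_october_quits(path, sheet_name="By_Industry"):
    """Read the workbook and keep only the three October columns."""
    df = pd.read_excel(path, sheet_name=sheet_name)
    return df[COLUMNS]


# Hypothetical file name; point this at wherever you saved the workbook.
# df = load_october_quits("quits_by_industry.xlsx")
# print(df.head())
```

Reading .xlsx files with pandas typically also requires the openpyxl package to be installed.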

Image by Author

Create a Treemap with go.Treemap

To make a treemap using Plotly graph_objects, we need to tell Plotly, at minimum, the following things:

  1. The label of each rectangle
  2. The hierarchy of the nested rectangles
  3. The size and color of each rectangle

In our data frame, the ‘Industry’ column shows the label for each rectangle. The ‘Oct_2021_Quits’ column can be used to represent the size of each rectangle – showing the total number of resignations in each industry sector. The ‘Oct_2021_Quit_Rates’ column can be used to represent the color of each rectangle – the darker the color, the higher the turnover rate for that sector.

The only piece of information that’s missing from the data frame is the hierarchy of the labels/rectangles. We need to create a new column (we can name it ‘parent’) to explicitly define the hierarchy structure of the labels. This is achieved by the following code:
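One way to build the ‘parent’ column is to map each label to the label one level above it. The sketch below assumes a small subset of the industry labels, and the numbers are made-up placeholders rather than BLS figures; the `parent_of` dictionary is a hypothetical helper, not necessarily how the original code did it.

```python
import pandas as pd

# Placeholder rows: values are illustrative only, NOT actual BLS figures.
df = pd.DataFrame({
    "Industry": ["Total", "Total private", "Government",
                 "Manufacturing", "Durable goods", "Nondurable goods"],
    "Oct_2021_Quits": [100, 90, 10, 30, 18, 12],
    "Oct_2021_Quit_Rates": [2.8, 3.1, 1.0, 2.5, 2.3, 2.9],
})

# Each label points to its parent; the root points to an empty string.
parent_of = {
    "Total": "",
    "Total private": "Total",
    "Government": "Total",
    "Manufacturing": "Total private",
    "Durable goods": "Manufacturing",
    "Nondurable goods": "Manufacturing",
}
df["parent"] = df["Industry"].map(parent_of)

# Any label missing from the mapping would show up here as NaN.
unmapped = df.loc[df["parent"].isna(), "Industry"].tolist()
print(unmapped)
```

An explicit dictionary is verbose, but it makes it easy to catch labels that were never assigned a parent.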

Image by Author

Now we are ready to plot the treemap! In the code below, we’ll create a graph_objects figure and add the treemap trace to it in one step with go.Figure(go.Treemap()). Within go.Treemap(), we define the hierarchy with the labels and parents attributes. The size (area) of each rectangle is defined by the values attribute, and the color is defined within the marker attribute. We can also customize the tooltip by editing the hovertemplate attribute, which accepts simple HTML tags such as <b> and <br>.

Image by Author

Fine-Tune the Treemap

The treemap we just created looks good! It seems pretty straightforward and convenient to use Plotly to create a treemap with just a few lines of code! Notice that in the treemap above, some industry sectors are squeezed into really small rectangles and the labels are illegible because of the size of the rectangles. Let’s fine-tune the treemap to address these issues.

In our visualization, since we are mainly interested in the resignation rates across industry sectors rather than the number of resignations, we can remove the values attribute from our previous code. With no metric assigned to the values attribute, the area of each parent category is divided equally among its subcategories.

We can also force all the text labels to share the same font size with the uniformtext layout parameter. The minsize attribute sets the minimum font size, and the mode attribute determines what happens to labels that cannot fit at that size: ‘hide’ removes them, while ‘show’ displays them anyway, even if they overflow.

Image by Author

Now the labels are much clearer to read, though there is still a small issue with the labels overflowing beyond the boundaries of the rectangles. Unfortunately, Plotly cannot wrap the text automatically so we’ll need to manually fix it.

In the code below, we use HTML to break the long-form text into multiple lines with the <br> element. The <br> HTML element produces a line break in text, and the text after the <br> begins again at the start of the next line of the text block. We can name this wrapped-text column ‘id2’.
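If typing the line breaks by hand gets tedious, the wrapping can also be approximated in code. The sketch below is an alternative to manual editing, not the approach shown in the original screenshot: `wrap_label` is a hypothetical helper, and the width of 15 characters is an arbitrary assumption.

```python
import textwrap

import pandas as pd


def wrap_label(label, width=15):
    """Join word-wrapped lines with <br> so Plotly renders line breaks."""
    return "<br>".join(textwrap.wrap(label, width=width))


df = pd.DataFrame({
    "Industry": ["Trade, transportation, and utilities", "Government"],
})
df["id2"] = df["Industry"].apply(wrap_label)
print(df["id2"].tolist())
```

Short labels pass through unchanged, while long ones are split at word boundaries.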

Now let’s complete our plot by assigning ‘id2’ to the labels attribute. We’ll also add a chart title, specify the font size of the title, and place it in the center located above the treemap.

Treemap (Image by Author)

Bonus: Sunburst Chart and Icicle Chart

A couple of other chart types are very similar to treemaps when it comes to visualizing hierarchical data, such as the sunburst chart and the icicle chart. The input data format for these two chart types is the same as for treemaps: the hierarchy is defined by the labels and parents attributes.

To plot the sunburst chart or the icicle chart, the only change needed in the code is to replace go.Treemap() with go.Sunburst() or go.Icicle(). The code below creates the sunburst chart. Sunburst charts visualize hierarchical data spanning outwards radially from root to leaves.

Sunburst Chart (Image by Author)

Icicle charts visualize hierarchical data using rectangular sectors that cascade from root to leaves in one of four directions: up, down, left, or right. Icicle charts have a tiling attribute with two parameters, orientation and flip. You can combine these two parameters to produce each of the four directions.

Icicle Chart (Image by Author)

Treemaps, sunburst charts, and icicle charts are less common than chart types like bar charts, line charts, or pie charts. However, when it comes to visualizing hierarchical data, they are well suited to getting the job done nicely and efficiently! Thanks for reading, and I hope you find this article helpful in brushing up your Plotly and data visualization skills!


Reference and Data Source:

  1. Plotly’s Official Documentation Page: https://plotly.com/python/treemaps/
  2. Data Source: Economic News Release from Bureau of Labor Statistics (https://www.bls.gov/news.release/jolts.t04.htm). This is an open dataset without a license.
