Hands-on Tutorials

Tutorial on Building a Professional Bar Graph in Plotly Python

A step-by-step guide on creating a high-quality bar chart

Tom Price

Published in

Towards Data Science

10 min read · Jul 5, 2021

Final graph iteration for this tutorial | Image by author

As a continuation of my previous article, “Tutorial on Building Professional Scatter Graphs in Plotly Python,” I’ve produced another Plotly tutorial in Python on how to build a visually stunning bar chart.

This article aims to take your Plotly graphs in Python to the next level, assuming you are already somewhat familiar with the language. I will apply a step-by-step iterative method to help simplify the process. Read on if you want to improve the quality of your visuals.

The dataset I will be using covers e-commerce sales in Pakistan. You can download the data from Kaggle or from my GitHub repository, along with the rest of the code for this tutorial. Note: rows with missing values were removed from the Kaggle version before uploading it to GitHub.

We will step through six different versions of the same graph, starting with the most basic chart that Plotly provides off the shelf and progressing to a fully customized and annotated diagram at the end. At each step, I will explain the fundamental changes.

Before we start coding

It’s worth mentioning how Plotly builds up its graph objects. An easy way to think about it is to picture the graph as being built in four layers.

Layer 1 — Create a blank graph object.
Layer 2 — Add in and customize your data points.
Layer 3 — Customize your visual.
Layer 4 — Annotate your visual.

When visualizing data, it’s always essential to understand what question we want to ask of the data (assuming we’re carrying out explanatory analysis rather than exploratory analysis).

The question we want to answer with the dataset is:

During Q3 2018, what category saw the largest growth year on year, and what was driving this?

Index

Step 1: Data manipulation
Step 2: Create a standard bar chart
Step 3: Styling changes
Step 4: Presenting multiple metrics
Step 5: Reducing the clutter
Step 6: Direct the reader’s attention
Step 7: Add commentary
Conclusion

Step 1: Data manipulation

We want to manipulate the dataset so that we can create our initial graph. We need to create several variables that we want to display and filter out unnecessary data. I’ve decided to drop a large number of NAs, which may not always be the best decision, but it is acceptable for this article.

I won’t explain the data manipulation process in detail because that is a tutorial in its own right, and this article focuses on creating a bar graph in Plotly. However, I’ve annotated each step in the code below.

Step 1: Data manipulation — code | Image by author
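Since the original code is shown as an image, here is a hedged pandas sketch of the kind of manipulation involved: filter to Q3 of 2017 and 2018, aggregate per category, and compute year-on-year variances. The column names (created_at, category, grand_total, qty_ordered) and the toy rows are assumptions for illustration, not necessarily the Kaggle file's exact schema, and "average RSP" is assumed here to mean sales divided by items ordered.

```python
import pandas as pd

# Toy rows standing in for the Kaggle data; column names are assumptions
raw = pd.DataFrame({
    "created_at": pd.to_datetime(["2017-07-15", "2017-08-02", "2018-07-20",
                                  "2018-09-10", "2017-09-01", "2018-08-05"]),
    "category": ["Mobiles & Tablets"] * 4 + ["Books"] * 2,
    "grand_total": [100.0, 100.0, 450.0, 450.0, 50.0, 40.0],
    "qty_ordered": [2, 2, 3, 3, 5, 4],
})
raw = raw.dropna()  # drop rows with missing values, as the article does

# Keep only Q3 (July to September) of 2017 and 2018
q3 = raw[raw["created_at"].dt.quarter == 3].copy()
q3["year"] = q3["created_at"].dt.year.astype(int)
q3 = q3[q3["year"].isin([2017, 2018])]

# Total sales and items ordered per category and year
agg = q3.groupby(["category", "year"])[["grand_total", "qty_ordered"]].sum()
wide = agg.unstack("year")  # columns become (metric, year) pairs

df = pd.DataFrame({
    "Sales Variance (Abs)": wide[("grand_total", 2018)] - wide[("grand_total", 2017)],
    "Sales Variance (%)": wide[("grand_total", 2018)] / wide[("grand_total", 2017)] - 1,
    "Items Ord Variance (%)": wide[("qty_ordered", 2018)] / wide[("qty_ordered", 2017)] - 1,
})

# Assumed definition: average RSP = sales / items ordered
rsp = wide["grand_total"] / wide["qty_ordered"]
df["Avg RSP Variance (%)"] = rsp[2018] / rsp[2017] - 1
df = df.reset_index()  # "category" becomes an ordinary column
```

Percentages are stored as fractions (2.0 means +200%), which pairs naturally with a ".0%" tick format later on.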

Step 2: Create a standard bar chart

For Layer 1, create a blank graph object with go.Figure().

For Layer 2, use add_trace(...) to add a layer of data points to your blank canvas. This is where all data customizations happen, such as color, size, and transparency.

To investigate growth, I’ve chosen to show the absolute variance in sales rather than percentages.

Step 2: Create a standard bar chart — code | Image by author

Step 2: Create a standard bar chart — graph | Image by author

In the above graph, you will notice that the graph title and axis titles are missing. Plotly Graph Objects doesn’t generate these by default, so they have to be defined explicitly.

Step 3: Styling changes

One of the most critical concepts for visualizing data is highlighting the data points you want to show and reducing the noise from less essential elements. Nonetheless, always clearly label all features in a graph, so the reader isn’t left guessing how to interpret the chart.

For this iteration, we will add Layer 3 to our plot. We will customize the frame of our graph using update_layout(...). This is where you update everything that isn’t a data point, such as titles, axes, tick format, and gridlines. We will push the less critical features, such as axis lines, axis tick labels, axis titles, and the plot background, into the background using color.

For this iteration, we will make the following changes:

Add a title
Add axis titles
Order the categories
Change the line color of the axes from white to light grey
Change the background color to white
Change the color of all text on the graph to light grey
Add currency prefix to x-axis ticks

Step 3: Styling changes — code | Image by author

Step 3: Styling changes — graph | Image by author

We’ve created a list of categories in the order we want, from largest positive variance to largest negative variance, and then passed it to the categoryarray argument. This may seem long-winded, but we will introduce other metrics on supplementary plots in the next step, and the category order will be maintained across each plot.

In the graph above, you can now clearly see which category saw the most significant year-on-year growth. Next, we will find out what was causing this increase.

Step 4: Presenting multiple metrics

It’s not always a good idea to squeeze as much information into the visualizations as possible. Only include extra detail if it supports the story you’re trying to tell. You should also consider the complexity of the topic and who the audience is; sometimes, it might be easier to split the story over several visuals.

On this occasion, we are going to display additional metrics on our graph via subplots, because the extra detail is needed to answer our original question and the message isn’t overly complicated. In the later steps, we will use techniques to focus our reader’s attention on what’s important.

In this step, we use make_subplots(...) to create our blank plot instead of go.Figure(). Since we’re dealing with subplots, we’ll create two columns in our data frame to hold each metric’s row and column position within the subplot grid. We then use the loop for metric in df2["metric"].unique(): to add a graph for each metric to the subplot at the given row and column coordinates.

When working with subplots in Plotly, we can either update individual x-axes with xaxis = dict(...), xaxis2 = dict(...), etc., or update all x-axes at once with fig.update_xaxes(...). We will use the former for axis-specific formatting and the latter for formatting common to all axes. The same applies to y-axes.

For this iteration, we are going to make the following changes:

Add subplots to the graph to show additional metrics
Add titles to each subplot
Move the y-axis to x = 0

Step 4: Presenting multiple metrics — code | Image by author

Step 4: Presenting multiple metrics — graph | Image by author

The introduction of the subplots has made the graph considerably busier by default. A couple of significant issues jump out.

Too much clutter

The above graph is far too heavy and contains duplicated information, such as the legend and the y-axes of the 2nd and 3rd subplots. To create a professional-looking chart, we want to be very selective about the elements we include: use the least amount of information that still lets the graph be understood and answer the question.

Disconnect between subplots

The addition of the ‘Items Ord Variance’ and ‘Avg RSP Variance’ subplots complicates the visual message because they are on entirely different scales from the ‘Sales Variance’ subplot.

Step 5: Reducing the clutter

Reducing the clutter will lower the reader’s cognitive load and make the information more easily digestible. To help reduce the untidiness, we will set shared_yaxes = True in the make_subplots(...) command.

We will address the issue of different scales between subplots by replacing the ‘Items Ord Variance’ and ‘Avg RSP Variance’ subplots with their percentage counterparts and adding a Sales Variance (%) metric.

We will also move the x-axes to the top of the charts so the subplot titles can double as axis titles. This makes more sense here because the focus is on the top category. Because the x-axes move to the top, we have to remove the built-in subplot titles so they don’t overlap and instead add them manually using fig.add_annotation(...).

To add a title for each subplot in the least amount of code, we’ll create a data frame df_subplot_titles to hold the titles, the subplot x-axis column, and the minimum x-axis range. We can then loop over each title in this data frame to add them to the subplot.

For this iteration, we are going to make the following changes:

Move the x-axes to the top of the charts
Use percentage change instead of absolute difference for the additional plots
Remove the legend
Remove duplicated y-axes
Manually add subplot titles

Step 5: Reducing the clutter — code | Image by author

Step 5: Reducing the clutter — graph | Image by author

We have reached a state where the graph answers our original question, “What category saw the largest growth year on year and what was driving this?”. The bar chart above shows that Mobiles & Tablets saw the largest growth, driven by an increase in average RSP. But there are still some significant flaws.

Disconnect between subplots

There is more consistency among the subplots in this graph than in the last step. However, there is still a disconnect between the three plots on the right and the absolute sales variance plot because they remain on different scales.

No focal point

Currently, there are no elements in the graph that really draw the reader’s attention. We need to highlight the message of the visual, so readers immediately know where to look.

Step 6: Direct the reader’s attention

As discussed in “Storytelling with Data: A Data Visualization Guide For Business Professionals,” we have approximately 3–8 seconds with our readers before they decide whether or not they should continue reading. This means the message from our visual needs to be as clear as possible.

Two simple ways to direct the reader’s attention are color and size. We will use these to overcome the two issues discussed at the end of the previous step. Color will highlight Mobiles & Tablets, the category with the largest absolute sales growth, while pushing all other categories into the background. Size will differentiate the plots on an absolute scale from those on a percentage scale by making the latter’s bars thinner.

Plotly Graph Objects can color individual bars by passing a list of colors to marker_color, but here we take a different route and create two separate data frames: one with the Mobiles & Tablets data, colored blue, and one with all the remaining data, colored grey. Inside the loop for metric in df3["metric"].unique():, we create two traces with fig.add_trace(go.Bar(...)). This essentially overlays two graphs on top of each other for each metric.

We will use the column_widths = [...] argument in make_subplots(...) to set the width of each plot, giving more prominence to the ‘Sales Variance (Abs)’ plot.

For this iteration, we are going to make the following changes:

Highlight the Mobiles & Tablets category blue and color all other categories grey
Widen the ‘Sales Variance (Abs)’ plot
Make the bars thinner on the percentage variance plots

Step 6: Direct the reader’s attention — code | Image by author

Step 6: Direct the reader’s attention — graph | Image by author

In this graph, we have successfully drawn the reader’s attention to the performance of the Mobiles & Tablets category through the use of color. The three subplots on the right are on a different scale from the far-left subplot. This is fine because their bars are a different width, signaling that they show different information.

There are several minor improvements we could make to this visual, but two clear enhancements will take the graph to the next level.

Commentary

We’ve used several techniques to direct the reader’s attention to the performance of the Mobiles & Tablets category. To make our message as clear as possible, we should also include commentary that describes precisely what our chart is showing. This also lets us provide additional insight into the average RSP subplot: a 356% increase looks suspicious and would almost certainly be questioned in a business setting.

Insightful title

The graph title is one of the most valuable pieces of real estate on a chart. Regardless of where a reader looks first, they will at some point read the title to make sense of the graph. If there is one standout clear message in the visual, then I tend to use the title to describe this.

Step 7: Add commentary

To make your visuals look professional, you should provide commentary and insights on your graphs. This is the addition of Layer 4 and can be achieved with add_annotation(...).

I prefer to overlay the commentary on the graph instead of placing it below or next to it. This leaves more space on the page for the visual and keeps the text considerably closer to the data.

We will use color in the commentary and the newly improved title to tie them to what they describe in the visual: any metric mentioned in the text is given the same color as in the graph, blue.

For this iteration, we are going to make the following changes:

Change title to be more insightful
Add a subtitle for the plot
Add commentary to the visual
Change color of text for subplot/x-axes titles
Change the color of the Mobiles & Tablets category label
Add data source and author
Fix alignment of all text
Change the size of axes titles and labels
Improve how the hover label looks

Step 7: Add commentary — code | Image by author

Step 7: Add commentary — graph | Image by author

In this graph, I’ve tried something new with the title: it is a piece of insight that also doubles as the start of the commentary, with an ellipsis (…) connecting the title and the commentary.

All of the text size and alignment changes in this iteration make the graph feel a lot cleaner. Adding commentary and making the title more insightful go a long way toward making this visual look professional.

To me, the graph feels complete: it conveys a clear message, and the commentary supports that message and explains the finding further.

Conclusion

The flexibility of Plotly Graph Objects is outstanding; so far, I’ve been able to customize every graph the way I wanted. When you’re trying to create stunning visuals, more flexibility is better, which makes Plotly an ideal package.

I find that customizing text with add_annotation(...) can be laborious, particularly when it comes to alignment and positioning, but this is a small price to pay.

Hopefully, you found this tutorial on building professional bar graphs with Plotly in Python beneficial.
