Plotly and Pandas: Combining Forces for Effective Data Visualization

A quick guide inspired by Storytelling with Data

Andreas Lukita
Towards Data Science


Photo by Luke Chesser on Unsplash

“My data visualization skill sucks. My audiences seemed unimpressed with my work, and worse, they are not convinced.”

Many of us have bumped into this issue at some point. Unless you are gifted or have taken a design course, it can be challenging and time-consuming to produce charts that are both visually pleasing and intuitive to the audience.

Here is what crossed my mind back then: I want to be more intentional in crafting my charts so that they convey information to my audience intuitively. By that, I mean not making them spend excessive brainpower and time just to make sense of what is happening.

I used to think that switching from Matplotlib to Seaborn and finally to Plotly would solve the aesthetics concern. I was wrong. Visualization is not simply about aesthetics. Below are two visualizations I replicated from Storytelling with Data¹ by Cole Nussbaumer Knaflic that really inspired me to change my approach to visualization. They look clean, elegant, and purposeful. We are going to replicate these charts in this article!

Image by Author

Here is the takeaway of this post. If you are looking for an in-depth explanation of the concepts behind great visualization, check out Storytelling with Data¹; every page is a gem worth your time. If you are looking for tool-specific, practical advice, you are in the right place. Cole mentions at the start of her book that the advice she presents is universal and tool-agnostic, although she admits that she created the examples in the book using Excel. Some people, myself included, are not fans of Excel and drag-and-drop tools for many reasons. Some prefer creating visualizations using Python, R, or other programming languages. If you are part of this segment and use Python as your primary tool, then this article is for you.

Chaining—Pandas Plot

If you are a seasoned user of Pandas for data wrangling, you may have encountered or even adopted the idea of “Chaining” yourself. In short, chaining makes your code more readable, easier to debug, and closer to production ready. Here is a simple example of what I am referring to. You don’t have to read it line by line; just take a quick look to get the idea behind “Chaining”. Every step is clear and easy to explain, and the code is well organized with no unnecessary intermediary variables.

(epl_10seasons
.rename(columns=lambda df_: df_.strip())
.rename(columns=lambda df_: re.sub(r'\W+|[!,*)@#%(&$_?.^]', '_', df_))  # raw string avoids an invalid-escape warning; requires `import re`
.pipe(lambda df_: df_.astype({column: 'int8' for column in (df_.select_dtypes("integer").columns.tolist())}))
.pipe(lambda df_: df_.astype({column: 'category' for column in (df_.select_dtypes("object").columns.tolist()[:-1])}))
.assign(match_date=lambda df_: pd.to_datetime(df_.match_date, infer_datetime_format=True))
.assign(home_team=lambda df_: np.where((df_.home_team == "Arsenal"), "The Gunners", df_.home_team),
away_team=lambda df_: np.where((df_.away_team == "Arsenal"), "The Gunners", df_.away_team),
month=lambda df_: df_.match_date.dt.month_name())
.query('home_team == "The Gunners"')
)
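
For readers who do not have the `epl_10seasons` data at hand, here is a smaller sketch of the same chaining pattern. The `raw` DataFrame and its column names are made up purely for illustration; only the pattern (rename, assign, query in one expression) mirrors the example above.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch.
raw = pd.DataFrame({
    " Home Team ": ["Arsenal", "Chelsea", "Arsenal"],
    "Match Date": ["2020-01-01", "2020-01-11", "2020-02-16"],
    "Goals": [2, 1, 4],
})

tidy = (raw
    # Normalize column names: trim, lowercase, spaces to underscores.
    .rename(columns=lambda col: col.strip().lower().replace(" ", "_"))
    # Parse the dates, then derive the month name from the parsed column.
    .assign(match_date=lambda df_: pd.to_datetime(df_.match_date),
            month=lambda df_: df_.match_date.dt.month_name())
    # Keep only the home games of one team.
    .query('home_team == "Arsenal"')
)

print(tidy)
```

Each step receives the result of the previous one, so there is no intermediate variable to keep track of.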

This is great, but did you know that you can continue the chaining process to create basic visualization charts as well? Pandas Plot, by default, uses the Matplotlib backend for this purpose. Let’s see how it works and reproduce some of the examples Cole created in her book.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
%matplotlib inline

pd.options.plotting.backend = 'matplotlib'  # the Pandas default; we switch to Plotly later

df = pd.DataFrame({"concerns": ["Engine power is less than expected",
"Tires make excessive noise while driving",
"Engine makes abnormal/excessive noise",
"Seat material concerns",
"Excessive wind noise",
"Hesitation or delay when shifting",
"Bluetooth system has poor sound quality",
"Steering system/wheel has too much play",
"Bluetooth system is difficult to use",
"Front seat audio/entertainment/navigation controls"
],
"concerns per 1,000": [12.9, 12.3, 11.6, 11.6, 11.0, 10.3, 10.0, 8.8, 8.6, 8.2],},
index=list(range(0,10,1)))

We have a DataFrame that looks like this.

Image by Author
(df
.plot
.barh()
)

This is the fastest way to generate a basic visualization chart. By chaining the .plot attribute and the .barh method directly from a DataFrame, we obtain the plot below.

Image by Author

Hold your reaction and judgment if you think the plot above does not pass the aesthetic check. Indeed, it looks ugly, to say the least. Let’s spice this up and do better. Here’s the trick: switch your Pandas plotting backend from Matplotlib to Plotly for the magic that is about to unfold.

pd.options.plotting.backend = 'plotly'

You might be questioning, “Why do I change it to Plotly? Isn’t Matplotlib capable of doing the same thing?” Well, here’s the difference.

If we use the Matplotlib backend in Pandas, .plot returns an Axes object; try verifying this yourself with the built-in type() function. This is great because the Axes object gives us access to methods that further modify our chart. Check out this documentation² for the methods available on an Axes object. Let’s pick one to illustrate quickly.

(df
.plot
.barh()
.set_xlabel("concerns per 1,000")
)
Image by Author

We successfully set the x-axis label to “concerns per 1,000”, but .set_xlabel returns a Text object, so the chain no longer hands back the Axes object that gives us access to further methods for modifying the chart. Too bad!
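
The return types can be checked directly. This is a minimal sketch on a made-up `toy` DataFrame; the `matplotlib.use("Agg")` line is only there so the snippet also runs outside a notebook and is not needed with `%matplotlib inline`.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering; an assumption for running outside a notebook

import pandas as pd
from matplotlib.axes import Axes
from matplotlib.text import Text

pd.options.plotting.backend = "matplotlib"

# Hypothetical toy data, invented for this sketch.
toy = pd.DataFrame({"score": [3, 1, 2]}, index=["a", "b", "c"])

ax = toy.plot.barh()            # an Axes: more Axes methods are available
label = ax.set_xlabel("score")  # a Text: the chain would end here

print(isinstance(ax, Axes), isinstance(label, Text))
```

Because `set_xlabel` does not return the Axes, anything chained after it would operate on the label text rather than on the chart.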

Here is an alternative that gets around the limitation above:

(df
.plot
.barh(xlabel="Concerns per 1,000", ylabel="Concerns", title="Top 10 design concerns")
)
Image by Author

However, we are still unable to make extensive modifications this way, because only the options that Pandas exposes as keyword arguments are available.

Plotly, on the other hand, does not return an Axes object. It returns a go.Figure object. The difference here is that the methods responsible for updating the charts also return you a go.Figure object, which allows you to continue chaining your method to further update your chart. Let’s try that out!

By the way, in case you are wondering how I get the combinations of methods and arguments below, they are all available on the official documentation here³.

Here are some important methods to get you started — .update_traces, .add_traces, .update_layout, .update_xaxes, .update_yaxes, .add_annotation, .update_annotations.

Horizontal Bar Chart

Let’s define a set of color palettes for our visualization below.

GRAY1, GRAY2, GRAY3 = '#231F20', '#414040', '#555655'
GRAY4, GRAY5, GRAY6 = '#646369', '#76787B', '#828282'
GRAY7, GRAY8, GRAY9, GRAY10 = '#929497', '#A6A6A5', '#BFBEBE', '#FFFFFF'
BLUE1, BLUE2, BLUE3, BLUE4, BLUE5 = '#25436C', '#174A7E', '#4A81BF', '#94B2D7', '#94AFC5'
BLUE6, BLUE7 = '#92CDDD', '#2E869D'
RED1, RED2, RED3 = '#B14D4A', '#C3514E', '#E6BAB7'
GREEN1, GREEN2 = '#0C8040', '#9ABB59'
ORANGE1, ORANGE2, ORANGE3 = '#F36721', '#F79747', '#FAC090'

gray_palette = [GRAY1, GRAY2, GRAY3, GRAY4, GRAY5, GRAY6, GRAY7, GRAY8, GRAY9, GRAY10]
blue_palette = [BLUE1, BLUE2, BLUE3, BLUE4, BLUE5, BLUE6, BLUE7]
red_palette = [RED1, RED2, RED3]
green_palette = [GREEN1, GREEN2]
orange_palette = [ORANGE1, ORANGE2, ORANGE3]

sns.set_style("darkgrid")
sns.set_palette(gray_palette)
sns.palplot(sns.color_palette())
Image by Author

Here, we want to highlight the concerns with 10 or more occurrences per 1,000 by giving them a separate color.

color = np.array(['rgb(255,255,255)']*df.shape[0])
color[df
.set_index("concerns", drop=True)
.iloc[::-1]
["concerns per 1,000"]>=10] = red_palette[0]
color[df
.set_index("concerns", drop=True)
.iloc[::-1]
["concerns per 1,000"]<10] = gray_palette[4]
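
The same kind of color list could also be built in one step with `np.where`. This is only an alternative sketch, not the code used for the charts in this article; the three values are a made-up subset and the two hex codes are `RED1` and `GRAY5` from the palette above.

```python
import numpy as np
import pandas as pd

RED1, GRAY5 = "#B14D4A", "#76787B"  # same hex codes as red_palette[0] and gray_palette[4]

# Hypothetical subset of the data, invented for this sketch.
values = pd.Series([12.9, 10.0, 8.8], name="concerns per 1,000")

# Reverse first so the order matches the reversed DataFrame used for plotting.
reversed_values = values.iloc[::-1]

# One color per bar: red at or above the threshold, gray below it.
bar_colors = np.where(reversed_values >= 10, RED1, GRAY5).tolist()
print(bar_colors)
```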

We then create the plot right from the DataFrame.

(df
.set_index("concerns", drop=True)
.iloc[::-1]
.plot
.barh()
.update_traces(marker=dict(color=color.tolist()))
)
Image by Author

Updating the layout yields the following. Here, we specify the template, add a title and margin to our plot, and specify the size of our figure object. Let’s comment out the annotations for the moment.

(df
.set_index("concerns", drop=True)
.iloc[::-1]
.plot
.barh()
.update_traces(marker=dict(color=color.tolist()))
.update_layout(template="plotly_white",
title=dict(text="<b>Top 10 design concerns</b> <br><sup><i>concerns per 1,000</i></sup>",
font_size=30,
font_color=gray_palette[4]),
margin=dict(l=50,
r=50,
b=50,
t=100,
pad=20),
width=1000,
height=800,
showlegend=False,
#annotations=annotations
)
)
Image by Author

Updating the x and y-axis properties yields the following.

(df
.set_index("concerns", drop=True)
.iloc[::-1]
.plot
.barh()
.update_traces(marker=dict(color=color.tolist()))
.update_layout(template="plotly_white",
title=dict(text="<b>Top 10 design concerns</b> <br><sup><i>concerns per 1,000</i></sup>",
font_size=30,
font_color=gray_palette[4]),
margin=dict(l=50,
r=50,
b=50,
t=100,
pad=20),
width=1000,
height=800,
showlegend=False,
#annotations=annotations
)
.update_xaxes(title_standoff=10,
showgrid=False,
visible=False,
tickfont=dict(
family='Arial',
size=16,
color=gray_palette[4],),
title="")
.update_yaxes(title_standoff=10,
tickfont=dict(
family='Arial',
size=16,
color=gray_palette[4],),
title="")
)
Image by Author

Last but not least, we will add some annotations to our chart. Here, we have two kinds of annotations: data labels on the horizontal bars and a footnote. Let’s do this together. First, we define the annotations in a separate cell.

annotations = []

y_s = np.round(df["concerns per 1,000"], decimals=2)

# Adding data labels
for yd, xd in zip(y_s, df.concerns):
    # label each bar with its value, placed just inside the end of the bar
    annotations.append(dict(xref='x1',
                            yref='y1',
                            y=xd, x=yd - 1,
                            text=str(yd) + '%',
                            font=dict(family='Arial', size=16,
                                      color=gray_palette[-1]),
                            showarrow=False))

# Adding Source Annotations
annotations.append(dict(xref='paper',
yref='paper',
x=-0.72,
y=-0.050,
text='Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco'
'<br>laboris nisi ut aliquip ex ea commodo consequat.',
font=dict(family='Arial', size=10, color=gray_palette[4]),
showarrow=False,
align='left'))
(df
.set_index("concerns", drop=True)
.iloc[::-1]
.plot
.barh()
.update_traces(marker=dict(color=color.tolist()))
.update_layout(template="plotly_white",
title=dict(text="<b>Top 10 design concerns</b> <br><sup><i>concerns per 1,000</i></sup>",
font_size=30,
font_color=gray_palette[4]),
margin=dict(l=50,
r=50,
b=50,
t=100,
pad=20),
width=1000,
height=800,
showlegend=False,
annotations=annotations
)
.update_xaxes(title_standoff=10,
showgrid=False,
visible=False,
tickfont=dict(
family='Arial',
size=16,
color=gray_palette[4],),
title="")
.update_yaxes(title_standoff=10,
tickfont=dict(
family='Arial',
size=16,
color=gray_palette[4],),
title="")
)
Image by Author

Isn’t this a much better chart relative to the initial default version? Let’s continue to explore another popular chart — The Line Chart.

Just a heads up that the example below is more complicated than the one above. Nevertheless, the idea remains the same.

Line Chart

Let’s take a quick look at the default Matplotlib plotting backend for Line Chart.

pd.options.plotting.backend = 'matplotlib'

df = pd.DataFrame({"Received": [160,184,241,149,180,161,132,202,160,139,149,177],
"Processed":[160,184,237,148,181,150,123,156,126,104,124,140]},
index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

(df
.plot
.line()
);
Image by Author

Let’s switch our plotting backend to Plotly!

pd.options.plotting.backend = 'plotly'

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
)

After switching the Pandas plotting backend to Plotly, the code above gives us the following. Here, we start by plotting only the Received Series.

Image by Author

Let’s update the line properties by further chaining onto the method above. Here, we modify the color and width, and place markers at the data points.

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
)
Image by Author

Let’s add the Processed Series to the chart!

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
.add_traces(go.Scatter(x=df.index, #Add Processed col
y=df.Processed,
mode="lines+markers+text",
line={"color": blue_palette[0], "width":4},
marker=dict(size=12)))
)
Image by Author

Let’s add a vertical line at index May in order to show the point at which the two lines start to diverge.

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
.add_traces(go.Scatter(x=df.index, #Add Processed col
y=df.Processed,
mode="lines+markers+text",
line={"color": blue_palette[0], "width":4},
marker=dict(size=12)))
.add_traces(go.Scatter(x=["May", "May"], #Add vline
y=[0,230],
fill="toself",
mode="lines",
line_width=0.5,
line_color= gray_palette[4]))
)
Image by Author

Next, let’s update the overall layout by changing the background to white and adding the title, margin, and some other elements. We will leave the annotations commented out for the moment.

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
.add_traces(go.Scatter(x=df.index, #Add Processed col
y=df.Processed,
mode="lines+markers+text",
line={"color": blue_palette[0], "width":4},
marker=dict(size=12)))
.add_traces(go.Scatter(x=["May", "May"], #Add vline
y=[0,230],
fill="toself",
mode="lines",
line_width=0.5,
line_color= gray_palette[4]))
.update_layout(template="plotly_white",
title=dict(text="<b>Please approve the hire of 2 FTEs</b> <br><sup>to backfill those who quit in the past year</sup> <br>Ticket volume over time <br><br><br>",
font_size=30,),
margin=dict(l=50,
r=50,
b=100,
t=200,),
width=900,
height=700,
yaxis_range=[0, 300],
showlegend=False,
#annotations=right_annotations,
)
)
Image by Author

Next, we will perform an update to both the x and y-axes.

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
.add_traces(go.Scatter(x=df.index, #Add Processed col
y=df.Processed,
mode="lines+markers+text",
line={"color": blue_palette[0], "width":4},
marker=dict(size=12)))
.add_traces(go.Scatter(x=["May", "May"], #Add vline
y=[0,230],
fill="toself",
mode="lines",
line_width=0.5,
line_color= gray_palette[4]))
.update_layout(template="plotly_white",
title=dict(text="<b>Please approve the hire of 2 FTEs</b> <br><sup>to backfill those who quit in the past year</sup> <br>Ticket volume over time <br><br><br>",
font_size=30,),
margin=dict(l=50,
r=50,
b=100,
t=200,),
width=900,
height=700,
yaxis_range=[0, 300],
showlegend=False,
#annotations=right_annotations,
)
.update_xaxes(dict(range=[0, 12],
showline=True,
showgrid=False,
linecolor=gray_palette[4],
linewidth=2,
ticks='',
tickfont=dict(
family='Arial',
size=13,
color=gray_palette[4],
),
))
.update_yaxes(dict(showline=True,
showticklabels=True,
showgrid=False,
ticks='outside',
linecolor=gray_palette[4],
linewidth=2,
tickfont=dict(
family='Arial',
size=13,
color=gray_palette[4],
),
title_text="Number of tickets"
))
)
Image by Author

Last but not least, we will add some annotations to our chart. Here, we have two kinds of annotations: labels for the lines (Received, Processed) and labels for the scatter points, which can be a little complicated. Let’s do this together. First, we define the annotations in a separate cell.

y_data = df.to_numpy()
colors = [gray_palette[3], blue_palette[0]]
labels = df.columns.to_list()

right_annotations = []

# Adding labels to line
for y_trace, label, color in zip(y_data[-1], labels, colors):
    right_annotations.append(dict(xref='paper',
                                  x=0.95,
                                  y=y_trace,
                                  xanchor='left',
                                  yanchor='middle',
                                  text=label,
                                  font=dict(family='Arial', size=16, color=color),
                                  showarrow=False))

# Adding labels to scatter point
scatter_annotations = []

y_received = [each for each in df.Received]
y_processed = [float(each) for each in df.Processed]
x_index = [each for each in df.index]
y_r = np.round(y_received)
y_p = np.rint(y_processed)

for ydn, yd, xd in zip(y_r[-5:], y_p[-5:], x_index[-5:]):
    # label above each Received point
    scatter_annotations.append(dict(xref='x2 domain',
                                    yref='y2 domain',
                                    y=ydn,
                                    x=xd,
                                    text='{:,}'.format(ydn),
                                    font=dict(family='Arial', size=16, color=gray_palette[4]),
                                    showarrow=False,
                                    xanchor='center',
                                    yanchor='bottom',
                                    ))

    # label below each Processed point
    scatter_annotations.append(dict(xref='x2 domain',
                                    yref='y2 domain',
                                    y=yd,
                                    x=xd,
                                    text='{:,}'.format(yd),
                                    font=dict(family='Arial', size=16, color=blue_palette[0]),
                                    showarrow=False,
                                    xanchor='center',
                                    yanchor='top',
                                    ))

After we define the annotations, we just need to pass the annotation variables to .update_layout inside the chain, like the following.

(df
.plot(x=df.index,
y=df.Received,
labels=dict(index="", value="Number of tickets"),)
.update_traces(go.Scatter(mode='lines+markers+text',
line={"color": gray_palette[4], "width":4},
marker=dict(size=12)),)
.add_traces(go.Scatter(x=df.index, #Add Processed col
y=df.Processed,
mode="lines+markers+text",
line={"color": blue_palette[0], "width":4},
marker=dict(size=12)))
.add_traces(go.Scatter(x=["May", "May"], #Add vline
y=[0,230],
fill="toself",
mode="lines",
line_width=0.5,
line_color= gray_palette[4]))
.update_layout(template="plotly_white",
title=dict(text="<b>Please approve the hire of 2 FTEs</b> <br><sup>to backfill those who quit in the past year</sup> <br>Ticket volume over time <br><br><br>",
font_size=30,),
margin=dict(l=50,
r=50,
b=100,
t=200,),
width=900,
height=700,
yaxis_range=[0, 300],
showlegend=False,
annotations=right_annotations,
)
.update_layout(annotations=scatter_annotations * 2)
.update_xaxes(dict(range=[0, 12],
showline=True,
showgrid=False,
linecolor=gray_palette[4],
linewidth=2,
ticks='',
tickfont=dict(
family='Arial',
size=13,
color=gray_palette[4],
),
))
.update_yaxes(dict(showline=True,
showticklabels=True,
showgrid=False,
ticks='outside',
linecolor=gray_palette[4],
linewidth=2,
tickfont=dict(
family='Arial',
size=13,
color=gray_palette[4],
),
title_text="Number of tickets"
))
.add_annotation(dict(text="<b>2 employees quit in May.</b> We nearly kept up with incoming volume <br>in the following two months, but fell behind with the increase in Aug <br>and haven't been able to catch up since.",
font_size=18,
align="left",
x=7.5,
y=265,
showarrow=False))
.add_annotation(dict(xref='paper',
yref='paper',
x=0.5,
y=-0.15,
text='Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco'
'<br>laboris nisi ut aliquip ex ea commodo consequat.',
font=dict(family='Arial',
size=10,
color='rgb(150,150,150)'),
showarrow=False,
align='left'))
.update_annotations(yshift=0)
.show()
)
Image by Author

Bonus: Number Chart

Congratulations on getting to this part of the article! Here is an additional plot for you to take away! Here we are creating a chart to aesthetically present a number by itself. In short, this is what I am referring to.

Image by Author

Since this chart does not come from a DataFrame, we can start by creating an empty go.Figure object from scratch, then gradually add the annotations. We finish by updating the layout accordingly.

(go
.Figure() # Create an empty figure
.add_annotation(
x=0.0,
y=1,
text='Pilot program was a success',
showarrow=False,
font={'size': 36, 'color': 'white'},
bgcolor=gray_palette[-3],
bordercolor='gray',
borderwidth=0,
xref='paper',
yref='paper',
xanchor='left',
yanchor='top',
align='left',
ax=0,
ay=-10
)
.add_annotation(
x=-1.0, # X-coordinate of the text position
y=3.0, # Y-coordinate of the text position
text="After the pilot program,", # The text content
showarrow=False, # Hide the arrow
font=dict(size=20,
color=blue_palette[1]), # Customize the font size
xanchor='left',
yanchor='top',
align='left',
)
.add_annotation(
x=-1.0, # X-coordinate of the text position
y=1.6, # Y-coordinate of the text position
text="<b>68%</b>", # The text content
showarrow=False, # Hide the arrow
font=dict(size=160,
color=blue_palette[1]), # Customize the font size
xanchor='left',
align='left',
)
.add_annotation(
x=-1.0, # X-coordinate of the text position
y=0.2, # Y-coordinate of the text position
text="<b>of kids expressed interest towards science,</b>", # The text content
showarrow=False, # Hide the arrow
font=dict(size=20,
color=blue_palette[1]), # Customize the font size
xanchor='left',
align='left',
)
.add_annotation(
x=-1.0, # X-coordinate of the text position
y=-0.2, # Y-coordinate of the text position
text="compared to 44% going into the program.", # The text content
showarrow=False, # Hide the arrow
font=dict(size=20,
color=gray_palette[-3]), # Customize the font size
xanchor='left',
align='left',
)
.add_annotation(
x=-1.0, # X-coordinate of the text position
y=-0.7, # Y-coordinate of the text position
text='Based on survey of 100 students conducted '
'before and after pilot program '
'(100% response rate on both surveys).', # The text content
showarrow=False, # Hide the arrow
font=dict(size=10.5,
color=gray_palette[-3]), # Customize the font size
xanchor='left',
align='left',
)
.update_layout(
xaxis=dict(visible=False), # Hide x-axis
yaxis=dict(visible=False), # Hide y-axis
margin=dict(l=0,
r=0,
b=0,
t=0,
pad=0),
font=dict(size=26,
color=gray_palette[-3]), # Customize the font size
paper_bgcolor='rgba(0,0,0,0)',
plot_bgcolor='rgba(0,0,0,0)'
)
.show()
)

Afterword

There you have it! The key is to update and refine your plot step by step until you achieve the result you want. Of course, every technique comes with its own limitations. If your chart is getting too complicated to produce this way, it might be better to use Plotly Express directly or even build everything from scratch using Plotly Graph Objects. It may seem difficult and unfamiliar to adopt this technique at first, but keep practicing and soon you will create beautiful visualizations that make sense!

If you pick up something useful from this article, do consider giving me a Follow on Medium. Easy, 1 article a week to keep yourself updated and stay ahead of the curve!

References

  1. Storytelling with Data by Cole Nussbaumer Knaflic. https://www.storytellingwithdata.com/books
  2. Matplotlib Axes API. https://matplotlib.org/stable/api/axes_api.html
  3. Plotly Graphing Libraries. https://plotly.com/python/reference/
