
Next level data visualization

Make charts that inspire / Part 01 : Towards tailor-made charts

The Complete Plotly Manual

In ancient Indian texts, explanations of philosophical concepts often begin by stating what the concept is not. They rely on the recurring phrase _neti neti_ (meaning neither this nor that), the idea being that saying what something is not is at least as important as explaining what it actually means. Following in the footsteps of these ancient philosophers, let me begin by enumerating what this article is not about:

  • This article is not about how to quickly make charts in plotly, often with just a single line of code as in the case of plotly express. If that is what you are interested in, please follow this amazing Medium article by Will Koehrsen.
  • This article is also not about listing all the different chart types that are available for data visualization. If that is what you are looking for, check out this extremely informative article by Samantha Lile. In fact, this article discusses only two chart types: line and scatter.

Introduction

Any data analysis project has two essential goals. The first is to curate data in a readily interpretable form, uncover hidden patterns, and identify key trends. The second, and perhaps more important, is to communicate these findings effectively to readers through thoughtful data visualization. This is an introductory article on how to begin thinking about customized visualizations that readily convey key data features to the viewer. We achieve this by moving beyond the one-line charts that have made plotly so popular among data analysts and focusing on individualised chart layouts & aesthetics.

All code used in this article is available on GitHub. All charts presented here are interactive and have been rendered using jovian, an incredible tool for sharing and managing Jupyter notebooks. This Medium article by Usha Rengaraju contains all the details on how to use this tool.


Plotly

Plotly is a natural library of choice for data visualization because it is easy to use, well documented, and allows for customization of charts. We begin by briefly summarizing the plotly architecture in this section before moving to visualizations in the subsequent sections.

While most people prefer using the high-level plotly.express module, in this article we will instead focus on using the plotly.graph_objects.Figure class to render charts. And while there is extensive documentation available on the plotly website, the material can be a bit overwhelming for those new to visualization. I therefore endeavour to provide a clear and concise explanation of the syntax.

The plotly graph_objects that we will make use of are composed of the following three high-level attributes, and plotting a chart essentially involves specifying these:

  • data attribute includes selecting the chart type from over 40 different types of traces like [scatter](https://plotly.com/Python/line-and-scatter/), [bar](https://plotly.com/python/bar-charts/), [pie](https://plotly.com/python/pie-charts/), [surface](https://plotly.com/python/3d-surface-plots/), [choropleth](https://plotly.com/python/choropleth-maps/) etc. and passing the data to these functions.
  • layout attribute controls all the non-data related aspects of the chart like text font, background color, axes & ticks, margins, title, legend etc. This is the attribute we will spend considerable time manipulating to make changes like adding an additional y-axis or plotting multiple charts in a figure when dealing with large datasets.
  • frames is used to specify the sequence of frames when making animated charts. Subsequent articles in this series will make use of this attribute extensively.

For most of the charts we make in this article, we will use the following three standard libraries:

For those new to Python, check out my earlier article on using pandas for data wrangling. Some of these tools will be used to extract and transform data for visualization.

A Cheatsheet for Data Wrangling using Pandas

The rest of this article is divided into the following sections:

  1. The Line Chart
  • The basic line chart
  • The custom-made line chart
  • When not to use a line chart

  2. The Scatter Plot
  • The basic scatter plot
  • Charts with a drop down menu
  • Scatter plot matrix

The Line Chart

The Basic Line Chart

What can be more routine than a line chart? It is one of the first things that comes to mind when thinking about data visualization. We begin by making use of the gapminder dataset to render a line chart with many data points. For those not familiar with this data visualization classic, check out their website and this animation based on gapminder data, which captures almost 200 years of world history in under 5 minutes.

First we make the line chart using a single line of plotly.express code, where plotting essentially involves specifying the x and y variables, the plot title and variable to be used to color code the data.

Now let’s make the same chart using the plotly graph_objects.

We first initialize an empty figure object using fig1 = go.Figure(). We then plot a line chart by adding one trace per country with fig1.add_trace, using the go.Scatter class from plotly.graph_objects. This class can be used to make both line and scatter charts by varying the mode argument, which can take any combination of "lines", "markers" & "text" joined by a +. The add_trace method supplies the data attribute of the go.Figure() object. In addition, we can also provide layout arguments, which we will do for subsequent charts, and frame arguments, which we will cover in subsequent articles.

This is the basic line chart. We have plotted it using both plotly.express, with just a single line of code, and graph_objects, with slightly longer code that essentially does the same thing.

Does this chart not look dreadful? Am I the only one who feels that way? Are you not sick of seeing such charts everywhere?

We made this chart using the plotly graph_objects class because it provides a great deal of flexibility when deciding the data and layout (& frame) attributes for customizing charts.

The Custom-made Line Chart

As far as data visualization aesthetics are concerned, I have been an admirer of Nathan Yau’s work in Flowing Data. Check out, for instance, this chart and then compare it with what we just made. To move away from the world of default settings and towards such tailor-made charts is the ultimate aim I have in mind for this and subsequent articles in this series.

Let’s start customising that godawful chart.

The Color Scheme. All data is either discrete or continuous. Discrete datasets are those where the individual data points are independent of each other. In the GDP dataset that we just visualized, the GDP of each country is independent of the GDP of the other countries, and we must therefore use colors from the px.colors.qualitative module to visualize it. Check out the plotly documentation for a complete list of all available color sets in this module. All colors in such a color set have different hues but similar saturation and value.

Continuous datasets have values that vary over a range like this map of unemployment rate. We use sequential color schemes to visualize such data where the magnitude of data is mapped to the intensity of color. Subsequent articles in this series will focus more on color and how to effectively use it for data visualization.

Back to customizing our line chart. We use the cycle tool from itertools to iterate over the colors in the Pastel color palette using palette = cycle(px.colors.qualitative.Pastel). Every time we add a trace, we select the next of these colors using marker_color=next(palette).

The Axes. Let’s start formatting the axes by getting rid of the rectangular grid and putting ticks of length 10 outside the axes (showgrid=False, ticks="outside", tickson="boundaries", ticklen=10). We also get rid of the y-axis line and draw a thick black x-axis (showline=True, linewidth=2.5, linecolor='black'). All these arguments are passed to update_xaxes and update_yaxes.

The Layout. Finally, we specify several parameters to define the overall layout of the chart, namely :

  • Text font style, size, and color. family="Courier New, monospace", size=18, color="black"
  • Overall chart dimensions. width=1000, height=500
  • Chart background and paper color as white. plot_bgcolor='#ffffff', paper_bgcolor='#ffffff'. Check out this online tool for converting any color of choice into the hex format for use in Python. '#ffffff' is the hex code for white.
  • Title and its horizontal position. title='GDP per capita of European Countries <br> 1952-2007', title_x=0.4.
  • x and y axis labels. xaxis=dict(title='Year'), yaxis=dict(title='GDP per capita in USD').

Putting all these together, this is what we get :

This chart might not be quite in the same league as some of Nathan Yau’s work, but it’s still a lot better than the default plotly charts. This chart code also represents a basic backbone for all other charts we will plot in this article. A few key variables in the chart above can be easily altered to change the overall aesthetics of the plot. These include the plot and paper background colors (plot_bgcolor='#ffffff' & paper_bgcolor='#ffffff'), the color set for rendering data (px.colors.qualitative.Pastel) and the axis layout (update_xaxes & update_yaxes). Do take time to play around with these.

When not to use a line chart

Let’s now make a chart similar to the one above but for a different dataset.

As the number of variables increases, line charts become difficult to interpret, as the colors corresponding to different variables can be hard to distinguish, or at least require some effort. One option is to use the stacked area plot. Just add stackgroup='one' to the go.Scatter trace passed to add_trace in the code above to get a stacked area plot for the same dataset.

For comparing different data points, area plots are preferable to line plots. In the chart above, we can not only see how much CO2 a country emits but also compare how it performs with respect to other countries. That China emits slightly more CO2 than the USA and EU-28 combined is a conclusion we can immediately draw from the stacked area chart, while the same would be difficult to read off a simple line chart.


The Scatter Chart

The Basic Scatter Plot

While the line chart renders the progress of a variable over time, scatter charts can be used to plot the variation between two or more variables that may or may not be related. We begin by plotting an openly available dataset of two indices (GCAG and GISTEMP) measuring the mean surface temperature change since 1880. We use the go.Scatter class of plotly.graph_objects, with mode='markers' this time. The overall backbone of the code is much the same as before.

Now let’s add another variable to it, the average sea level data for (roughly) the same duration. It turns out that while the two temperature indices vary from -0.5 to 1.5, the sea level varies from 0 to 10. If we were to plot them on the same axis, the temperature curves would essentially become flat. So we instead initialize a figure with two different y-axes using fig = make_subplots(specs=[[{"secondary_y": True}]]). Now, while adding each trace to this figure with fig.add_trace, we must specify which y-axis to plot it on by providing an additional argument, secondary_y.

Charts with drop down menus

Now what if there are many more variables we want to plot on the same chart? We could make use of drop down menus to select individual variables to be displayed one at a time. To do this we need to do the following:

  1. Add a new argument to the layout, updatemenus, which holds the list of all buttons to be added (buttons=list_updatemenus) and the location of the drop down menu (x=1.2, y=0.8): updatemenus=list([dict(buttons=list_updatemenus, x=1.2, y=0.8)])
  2. Specify the list of options for the drop down menu using list_updatemenus. Each option includes:
  • label: name of the variable that appears in the drop down menu
  • method: decides how the chart will be modified when a particular option is selected from the drop down menu. It can be one of the following: restyle (modifies data), relayout (modifies layout), update (modifies both data & layout), and animate (starts or pauses an animation).
  • args: this includes (1) visible, a list of booleans which specifies which traces will be displayed. The size of this list is the same as the number of traces added to the figure; 'visible': [True, False] means that only the first of the two go.Scatter traces will be displayed. (2) 'title', which is the title displayed at the top when that variable is plotted.

Putting all of this together we get the following :

Try using the drop down menu to select individual variables.

The Scatter Matrix

Using the drop down menu may not always be an ideal option, as we often want to visualize several variables at the same time to uncover possible relationships between them. We can do this using a scatter matrix plot, something routinely used as part of exploratory data analysis.

The weather dataset that we will be using records several variables like maximum & minimum temperature, wind direction & speed, amount of sunlight and relative humidity. For each data point we also know the outcome, i.e. whether it rained or not, and the amount of rain. The goal now is to visualize how the rain outcome and amount were influenced by all these variables. We first assign labels to the outcome variable (0 means no rain and 1 means rain). We then plot all these variables over a matrix using the go.Splom class. Again, the basic code backbone remains the same.

Notice that instead of specifying a color palette, we use a colorscale: colorscale='temps'. Check out the plotly documentation for a list of all available colorscales. The showupperhalf argument, when set to True, results in a complete matrix with essentially every pair of variables plotted twice, once below and once above the diagonal.

Conclusion

In this article we discussed how to plot a basic line chart and when to replace it with an area plot. We then discussed how to plot scatter charts and when to replace them with scatter matrix plots. In doing so, we discussed how to customize the plotly graph_objects to generate charts that are tailor-made to our requirements. And we achieved this customization by making minor changes to a basic code backbone.

Subsequent articles in this series will focus on plotting maps, on thoughtful use of color and of animations for advanced data visualization. Thanks for reading. Please do share your feedback.

