Plotly Express Yourself

A quick stroll through basic visualizations using Plotly Express, in Python.

Published in Towards Data Science, May 13, 2019

I’ve recently discovered Plotly Express, and I’m very excited both to add it to my toolbox and about its potential.

For those who are unfamiliar with Plotly (or who are confused about why there would be a need for an express version), let’s get you up to speed.

Plotly is a visualization library available on a number of platforms, including JavaScript, Python, and R (where my own, albeit limited, past experience with Plotly lies), as well as through the commercial products of Plotly, the organization. Plotly is awesome because you can create highly interactive visualizations, as opposed to the static visualizations you get from matplotlib or seaborn (matplotlib’s cooler sibling). For example:

https://mangrobang.shinyapps.io/Project_Draft_AV/

But the downside (at least for me) was that getting anything done in Plotly always felt like a lot of lines of code. For example, the code for the preceding plot looks like this (some data set up not included for brevity):

p <- plot_ly(draft
                 , x = ~Rnd
                 , y = ~First4AV
                 , type = "scatter"
                 , mode = "markers", color = ~draft$Position.Standard
                 , colors = brewer.pal(8, "Spectral")
                 , marker = list(size = 10, opacity = .25 )
    ) %>%
      # Plot the average for each draft round
      layout(title = "Accumulated AV over First 4 Years of Player Careers by Draft Round, Yrs 1994-2012") %>%
      layout(xaxis = xx, yaxis = yy) %>%
      add_trace( data=avg.4.AV, x = ~Rnd, y = ~avg.F4.AV, type = "scatter"
              , name = "Avg 4-Yr AV", color=I("lightpink"), mode="lines" ) %>%
      # Plot the predicted value
      add_trace(x = input.rnd, y = mod.draft.pred(), type = "scatter"
              , mode = "markers"
              , marker = list(symbol='x',size=15, color='black')) %>%
      layout(annotations = jj) %>%
      layout(annotations = pr)
p

Note that the above is written in R, not Python, because that’s where most of my past experience with Plotly has been. But what I’ve seen and used in Python seems similar in syntactical complexity.

Plotly Express is an even higher-level wrapper around the base Plotly code, so the syntax is simpler. To me, that means it is easier to learn and easier to build coding muscle memory with. Less copying and pasting code from old projects or Stack Overflow!

To put it through its paces, I thought it would be useful to run through some standard exploration plots. When we’re done, we’ll hopefully end up with a handy cheat sheet for Plotly Express!

Prep! (and prefacing thoughts)

1. For our test data, I found this fun dataset on Kaggle on superheroes (hey, I just saw Avengers: Endgame!):

Multiple Spider-Men and Captains America? Yes, the multiverse exists!

2. The code for getting and scrubbing the data, as well as the snippets below, can be found in this Jupyter notebook.

3. If you haven’t installed or imported it yet:

# Install
pip install plotly_express

# Import
import plotly_express as px

4. We’ll assume you have some conceptual familiarity with the plots shown below, so we won’t be deep-diving into the pros and cons of each. But we’ll also add some thoughts and references when available.

5. The look of the following plots varies wildly because I was curious about the look and feel of the various built-in templates (i.e., themes). So, apologies if that’s annoying, but like I mentioned, I’m trying to flex all of Plotly Express’ muscles.

Plotting Univariate Data

Let’s cover a couple of classic ways to perform univariate exploration of continuous variables: Histograms and Boxplots.

Histogram (Univariate):

px.histogram(data_frame=heroes_clean
     , x="Strength"
     , title="Strength Distribution : Count of Heroes"
     , template='plotly'
     )

You can read more about how cool histograms are here.

Boxplot (Univariate):

…And the boxplot. So elegant in its simplicity.

px.box(data_frame=heroes
    , y="Speed"
    , title="Distribution of Heroes' Speed Ratings"
    , template='presentation'
    )

More can be found on boxplots here, here, and here. But if you find boxplots to be a bit square, perhaps a violin plot will do?

Violin Plot

px.violin(data_frame=heroes
          , y="Speed"
          , box=True
          , title="Distribution of Heroes' Speed Ratings"
          , template='presentation'
         )

Violin plots are becoming increasingly popular. I like to think of them as boxplot’s cooler, better-looking sibling. Ouch.

But what if the variable or feature you want to explore is categorical, not continuous? In this case, you’ll probably want to start with a bar chart to get a feel for counts of values.

Bar Chart (Univariate)

px.bar(data_frame=heroes_publisher
       , x='publisher'
       , y='counts'
       , template='plotly_white'
       , title='Count of Heroes by Publisher'
      )

Here’s a quick primer on bar charts.

Univariate analysis is all well and good, but really, we usually want to compare variables to other variables to try to tease out interesting relationships, so we can build models. So let’s keep building our plotly-express superpowers on some examples of bivariate techniques.

Plotting Bivariate Data

Let’s start with comparing continuous variables versus continuous variables.

Scatter Plot

px.scatter(data_frame=heroes
           , x="Strength"
           , y="Intelligence"
           , trendline='ols'
           , title='Heroes Comparison: Strength vs Intelligence'
           , hover_name='Name'
           , template='plotly_dark'
          )

If a theoretical character has 0 Strength, they at least rate 57 in Intelligence. Hmm.

Scatter plots are the tried and true way of comparing two continuous (numeric) variables. It’s a great way to quickly assess whether a relationship exists between the two variables.

In the example above, we further give ourselves a helping hand at spotting a relationship by adding a trendline. It appears that there is a weak positive correlation between Strength and Intelligence.
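To make the trendline less of a black box: `trendline='ols'` draws an ordinary least squares line, and the caption above is simply reading off that line’s intercept. Here is a rough sketch of the same kind of fit done by hand with NumPy. The numbers are made up (and perfectly linear, unlike the real data), chosen only so that the intercept echoes the caption.

```python
import numpy as np

# Hypothetical ratings, constructed to be exactly linear for easy checking.
strength = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
intelligence = 57.0 + 0.25 * strength

# Degree-1 least-squares fit; polyfit returns (slope, intercept).
slope, intercept = np.polyfit(strength, intelligence, deg=1)

# Pearson correlation between the two variables.
r = np.corrcoef(strength, intelligence)[0, 1]

print(f"Intelligence = {intercept:.1f} + {slope:.2f} * Strength (r = {r:.2f})")
```

The intercept is the predicted Intelligence for a character with 0 Strength, and the sign of the slope tells you whether the relationship is positive or negative. On real, noisy data the correlation would of course be well below 1.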

Line Plot

px.line(data_frame=heroes_first_appear_year
        ,x='Year'
        ,y='Num_Heroes'
        ,template='ggplot2'
        ,title="Number of Heroes by Year of First Appearance"
        ,labels={"Num_Heroes":"Number of Heroes"}
       )

The early ’60s was a big turning point in comic superheroes.

A special case of continuous versus continuous comparison is the time series. The classic way to plot one is with a line plot. Almost always, the date/time variable goes along the x-axis while the other continuous variable is measured along the y-axis. And now you can see how it changed over time!

What if we want to compare categorical versus continuous variables? Well, it turns out that we can just use univariate techniques, but just “repeat” them! One of my favorite ways is using a stacked histogram. We can make a histogram for our continuous variable, for each value of a categorical variable, and then just stack them!

For example, let’s revisit our earlier histogram on Strength, but this time we’d like to see the data separated out by Gender.

Stacked Histogram

px.histogram(data_frame=heroes[~heroes.Gender.isna()]
             , x="Strength"
             , color='Gender'
             , labels={'count':'Count of Heroes'}
             , title="Strength Distribution : Count of Heroes"
             , template='plotly'
            )

I’m guessing the big bar for 10–19 is non-superpowered characters, like Batman. Nerd.

Maybe the stacks are confusing to you and you just want to see the bars grouped by bins:

Stacked Histogram (grouped bins)

px.histogram(data_frame=heroes[~heroes.Gender.isna()]
             , x="Strength"
             , color='Gender'
             , barmode = 'group'
             , labels={'count':'Count of Heroes'}
             , title="Strength Distribution : Count of Heroes"
             , template='plotly'
            )

…Or if either of those looks is too visually busy for you, then maybe you just want a separate plot for each category value. You’ll sometimes see this called faceting (or at least that’s what I’ve come to call it).

Faceted Histograms

px.histogram(data_frame=heroes[~heroes.Gender.isna()]
             , x="Strength"
             , color='Gender'
             , facet_row='Gender'
             , labels={'count':'Count of Heroes'}
             , title="Strength Distribution"
             , template='plotly'
             )

Wow, I’m histogrammed out. Let’s look at applying the same faceting/splitting concept to box plots.

Split Box Plot

px.box(data_frame=heroes[~heroes.Gender.isna()]
        , y="Speed"
        , color="Gender"
        , title="Distribution of Heroes' Speed Ratings"
        , template='presentation'
        )

And whatever box plots can do, violin plots can as well!

Split Violin Plot

px.violin(heroes[~heroes.Gender.isna()]
        , y="Speed"
        , color="Gender"
        , box=True
        , title="Distribution of Heroes' Speed Ratings"
        , template='presentation'
       )

‘Agender’ characters have higher median (and likely mean) Speed.

So what about if you want to just compare categorical versus categorical values? If that’s the case you usually want to look at relative counts. So stacked bars are a good way to go:

Stacked Bar Chart (Categorical vs Categorical)

px.histogram(data_frame=heroes
             ,x="Publisher"
             ,y="Name"
             ,color="Alignment"
             ,histfunc="count"
             ,title="Distribution of Heroes, by Publisher | Good-Bad-Neutral"
             ,labels={'Name':'Characters'}
             ,template='plotly_white'
            )

Marvel and DC Comics are pretty top heavy with ‘Good’ characters.

Digression: It turns out that stacked bar charts are way easier using .histogram since it gives access to histfunc, which allows you to apply a function to the histogram. This saves steps from having to aggregate first (which you may have noticed was done for the bar chart above).

Plotting Three or More Variables

We may be sensing a pattern here. We can turn any univariate visualization into a bivariate one (or more) by using another visual element, such as color; or by faceting/splitting along category values.

Let’s explore adding a third variable. A common technique is to add a categorical variable to a scatter plot using color.

Colored Scatter Plot

px.scatter(data_frame=heroes[~heroes.Gender.isna()]
           , x="Strength"
           , y="Intelligence"
           , color="Alignment"
           , trendline='ols'
           , title='Heroes Comparison: Strength vs Intelligence'
           , hover_name='Name'
           , opacity=0.5
           , template='plotly_dark'
           )

Similar relationships across Alignments.

Maybe this data is not that interesting with the added category, but categories really stand out when you find the right pattern, such as with the classic iris data set…like this:

But going back to our original scatter plot with color, what if we wanted to add on a third continuous variable? How about if we tied it to the size of our markers?

Scatter Plot, with Color and Size

Below we add the continuous Power variable as the size of the markers.

px.scatter(data_frame=heroes[~heroes.Gender.isna()]
           , x="Strength"
           , y="Intelligence"
           , color="Alignment"
           , size="Power"
           , trendline='ols'
           , title='Heroes Comparison: Strength vs Intelligence'
           , hover_name='Name'
           , opacity=0.5
           , template='plotly_dark'
          )

Wow, Galactus is tops in `Strength`, `Intelligence`, and `Power`!

One thing I noticed is that the plot doesn’t automatically add a legend entry for Size. That’s a little annoying. What can I say, Plotly Express has already spoiled me over the course of this post!

We’ve barely begun to scratch the surface of what’s possible, based on what I’ve seen in the documentation. We can go on and on, but let’s end our exploration on a couple more examples.

Scatter Matrix

Scatter matrices draw pair-wise scatter plots for a set of continuous variables, which you can then customize with colors, symbols, etc. to stand for categorical variables.

px.scatter_matrix(data_frame=heroes[~heroes['Gender'].isna()]
                  , dimensions=["Strength", "Speed", "Power"] 
                  , color="Alignment"
                  , symbol="Gender" 
                  , title='Heroes Attributes Comparison'
                  , hover_name='Name'
                  , template='seaborn'
                 )

Maybe a future release will have the option to toggle the diagonal plots into a histogram (or some other univariate plot).

Scatter with Marginal Plots

That was neat, but this next one I really like for its simplicity. The idea is you can add any of the univariate plots we’ve covered to the margins of a scatter plot.

px.scatter(data_frame=heroes[~heroes.Gender.isna()]
           , x="Strength"
           , y="Speed"
           , color="Alignment"
           , title='Strength vs Speed | by Alignment'
           , marginal_x='histogram'
           , marginal_y='box'
           , hover_name='Name'
           , opacity=0.2
           , template='seaborn'
          )

Whew! That…was a lot. But I think it’s a good start on creating a quick reference to the more common plotting techniques, all using Plotly Express. I’m really digging what I’ve seen so far (every plot in this post is technically a one-liner!) and look forward to future updates! Thanks for reading.

Please feel free to reach out! | LinkedIn | GitHub

Sources:

https://www.plotly.express/

plotly. Introducing Plotly Express, 20 Mar 2019, https://medium.com/@plotlygraphs/introducing-plotly-express-808df010143d. Accessed 11 May 2019.