
How to Create Interactive Visualizations with Plotly Express

A practical guide with many examples

Plotly Python (plotly.py) is an open-source plotting library built on top of Plotly JavaScript (plotly.js). Plotly Express is a high-level interface to plotly.py that lets us create many interactive and informative visualizations.

In this post, we will go through many examples while increasing the level of complexity step-by-step. We will explore the effect of each feature/structure added to the visualizations.

If you don’t have plotly.py installed in your working environment, you can install it using pip or conda:

$ pip install plotly==4.8.0
$ conda install -c plotly plotly=4.8.0

Let’s start by importing plotly express:

import plotly.express as px

For the examples, we will use two different datasets. One is the "telco customer churn" dataset available on Kaggle. The other is the gapminder dataset, which is available in the plotly library. These built-in datasets come in handy for practicing.

Churn prediction is a common use case in the machine learning domain. If you are not familiar with the term, churn means "leaving the company". It is critical for a business to have an idea about why and when customers are likely to churn. A robust and accurate churn prediction model helps businesses take action to prevent customers from leaving. We will explore the dataset to understand its underlying structure. The original dataset contains 20 features (independent variables) and 1 target (dependent) variable for 7043 customers. We will only use 7 features and the target variable in this post.

import pandas as pd

churn = pd.read_csv("Telco-Customer-Churn.csv")
# Keep the 7 features and the target variable (Churn) used in this post
churn = churn[['gender', 'Partner', 'tenure', 'PhoneService', 'InternetService', 'Contract', 'MonthlyCharges', 'Churn']]
churn.head()

We start with a basic box plot to check the distribution of monthly charges according to contract types:

fig = px.box(churn, x="Contract", y="MonthlyCharges")
fig.show()

The taller the box, the more spread out the values are. This plot tells us that the range of monthly charges is wider for long-term contracts. Hovering over a box shows its key values: min, first quartile, median, third quartile, and max.

We can use different colors for different groups with the color parameter and add an additional variable for comparison with the facet_col parameter.
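The values shown on hover can also be computed directly with pandas. The sketch below uses a small made-up dataframe (not rows from the churn dataset) and the describe method to get the five-number summary per contract type. The quartiles may differ slightly from the ones Plotly displays, since the result depends on the quartile method used.

```python
import pandas as pd

# Made-up sample rows that mimic two columns of the churn dataset (not real data)
sample = pd.DataFrame({
    "Contract": ["Month-to-month"] * 5 + ["Two year"] * 5,
    "MonthlyCharges": [20.0, 45.0, 70.0, 80.0, 95.0,
                       19.0, 25.0, 60.0, 90.0, 115.0],
})

# Five-number summary per contract type: the statistics a box plot is built from
summary = (
    sample.groupby("Contract")["MonthlyCharges"]
    .describe()[["min", "25%", "50%", "75%", "max"]]
)
print(summary)
```

With the real data, replacing sample with the churn dataframe should give the numbers to compare against the hover labels.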

fig = px.box(churn, x="Contract", y="MonthlyCharges", 
             color="Contract", facet_col='Partner')
fig.show()

It seems like having a partner does not dramatically change the distribution of monthly charges within each contract type.

Scatter plots are also commonly used to understand the relationship between variables. For a clearer demonstration, I will take the first 200 rows of the dataset.

churn_filtered = churn.iloc[:200,:]

We can check the relationship between tenure and monthly charges and how this relationship changes according to contract type and having a partner. The tenure variable is the number of months that a person has been a customer.

fig = px.scatter(churn_filtered, 
                 x="tenure", y="MonthlyCharges", 
                 color='Partner',
                 facet_col="Contract", facet_col_wrap=3)
fig.show()

facet_col creates subplots based on the specified variable. The facet_col_wrap parameter sets the maximum number of subplots per row.

What this plot tells us is that customers without partners tend to have month-to-month contracts. Also, customers with partners stay with the company for a longer period (high tenure). This is a subset of the original dataset but, according to these 200 rows, the company sells more month-to-month contracts than one-year or two-year contracts. Each point in the plot represents a customer, and we can see the data by hovering over the point.

We can also confirm our intuition by checking the averages with the groupby function:

churn_filtered[['Contract','Partner','tenure']].groupby(['Contract','Partner']).mean()

For each contract type, tenure is higher for customers with a partner. Also, customers without a partner are more numerous in the month-to-month contract segment.
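The mean alone does not show how many customers fall into each group. One way to get both numbers at once is a named aggregation, sketched here on a made-up dataframe (the values are illustrative, not from the churn dataset; the column names mean_tenure and customers are my own choice).

```python
import pandas as pd

# Made-up rows for illustration only (not taken from the churn dataset)
sample = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year"],
    "Partner": ["No", "No", "Yes", "No", "Yes", "Yes"],
    "tenure": [2, 4, 12, 20, 30, 60],
})

# Average tenure and number of customers for each (Contract, Partner) group
summary = (
    sample.groupby(["Contract", "Partner"])["tenure"]
    .agg(mean_tenure="mean", customers="count")
)
print(summary)
```

Running the same aggregation on churn_filtered should show both the average tenure and the group sizes discussed above.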

Let’s try to see the churn rate with respect to monthly charges, contract type, and tenure. We also add a title to the plot:

fig = px.scatter(churn_filtered, x="tenure", y="MonthlyCharges", 
                 color='Churn',
                 facet_col="Contract", facet_col_wrap=3,
                 title= "Churn Rate Analysis")
fig.show()

As we see in the plot above, it is highly unlikely that a customer with a long-term contract will churn (i.e. leave the company). If the company wants to retain its customers, the priority should be signing long-term contracts.

We can also add an indication of the distributions to the scatter plots using the marginal_x and marginal_y parameters. Let’s plot the entire dataset this time and check if our sample with 200 rows is actually a good representation of the whole:
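The visual impression can be backed up with a number. A minimal sketch, assuming the Churn column holds "Yes"/"No" strings as in the plots above, computes the churn rate per contract type. The rows are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from the churn dataset)
sample = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 4 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "No",
              "No", "No", "No", "Yes",
              "No", "No"],
})

# The mean of a boolean column is the share of True values,
# so this gives the fraction of churned customers per contract type
churn_rate = (
    sample.assign(churned=sample["Churn"].eq("Yes"))
    .groupby("Contract")["churned"]
    .mean()
)
print(churn_rate)
```

Swapping sample for churn_filtered (or the full churn dataframe) should give the rates behind the scatter plot.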

fig = px.scatter(churn, 
                 x="tenure", y="MonthlyCharges", 
                 color="Churn", 
                 marginal_y="rug", marginal_x="histogram")
fig.show()

Let’s first evaluate the x-axis. For tenures of less than 10 months, red points (churn=yes) dominate. As tenure increases, blue points (churn=no) become the dominant class. We also see that on the histogram above the scatter plot. It shows how the distribution of red and blue points changes according to the position on the x-axis. Most of the customers who churned have a tenure of fewer than 10 months.

The y-axis indicates monthly charges. The density of red points in the scatter plot increases as we go up the y-axis (i.e. increasing monthly charges). This can also be seen on the rug plot on the right side of the scatter plot, where the horizontal lines are denser in the upper part. The density of blue points is more uniform compared to red points, with the exception of the bottom part.


Plotly Express provides many datasets to practice with. We can easily load these datasets into a pandas dataframe. For instance, the gapminder dataset includes the gdp per capita of 142 countries for 12 years (not consecutive). The dataset also contains the life expectancy and the population of the countries in those years.

gap_df = px.data.gapminder()
gap_df.head()

Let’s plot the life expectancy and gdp per capita in 1952. We can filter the dataframe while creating the plot by using the query method of pandas:

fig = px.scatter(gap_df.query("year==1952"), 
                 x="gdpPercap", y="lifeExp",
                 hover_name="country",
                 color="continent", log_x=True,
                 title="GDP vs Life Expectancy in 1952")
fig.show()

By setting the hover_name parameter to "country", we are able to see the name of the country when hovering over a point.

In general, countries in Africa have a lower gdp per capita, and countries in Europe are in the top region in terms of gdp per capita. Kuwait is an outlier with a gdp per capita of more than 100K. You may have noticed the logarithmic scale on the x-axis. We get it by setting the log_x parameter to True, which makes the plot look much better. Without log_x, the axis is linear and the Kuwait outlier stretches it, so most of the other points are squeezed toward the left side of the plot.

We can also use the size parameter to represent one more variable in the plot. For instance, if we set size="pop", then the size of the points becomes proportional to the population of the countries.

fig = px.scatter(gap_df.query("year==1952"), 
                 x="gdpPercap", y="lifeExp",
                 hover_name="country",
                 color="continent", log_x=True,
                 size="pop", size_max=50,
                 title="GDP vs Life Expectancy in 1952")
fig.show()

We have covered some basic types of visualizations with Plotly Express. Of course, this is just a small part of what can be done with this amazing library. There are many other plot types that we can create with plotly, and its syntax is easy to understand. I will try to cover more complex plots in upcoming posts. You can also check the plotly documentation, which I think is well written and includes many different examples. As with any other topic, the best way to get familiar with plotly is to practice. Thus, I suggest creating lots of plots to sharpen your skills.

Thank you for reading. Please let me know if you have any feedback.

