
Learn Plotly for Advanced Python Visualization: A Use Case Approach

A Hands-on Guide to Create an Interactive Scatter Bubble Chart with Plotly Go

Image Source (Pixabay)

Introduction

I recently completed an interesting data science project that used unsupervised machine learning to segment (cluster) 162 neighborhoods in North Carolina based on several key housing market indicators. You can read my article Housing Market Cluster Analysis if you are interested in learning more about the project.

At the end of the project, I used Tableau to plot a scatter bubble chart to visualize the segments. A scatter bubble plot is an excellent visualization technique for use cases like visualizing segmentation results. It allows you to project your clusters onto ‘3 dimensions’: the x and y axes for the first two dimensions, and bubble size for the third. Below is a screenshot of the Tableau dashboard I created for the project.

Scatter Bubble Chart Plotted in Tableau (Image by Author)

Although Tableau is a fantastic tool for creating interactive visualizations, it is a bit of a hassle to toggle between two platforms: training models and creating segments in a Python environment, and then visualizing the results in another tool. Therefore, in this tutorial, I’d like to show you how to use Python’s Plotly Go (Graph Objects) to create the same interactive chart, with customized colors and tooltips that meet our needs in this use case.


Plotly Express vs. Plotly Go

The Plotly Python library is an interactive, open-source plotting library that covers a wide range of chart types and data visualization use cases. It has a wrapper called Plotly Express, which is a higher-level interface to Plotly.

Plotly Express is a quick and easy starting point for creating the most common figures with simple syntax, but it lacks functionality and flexibility when it comes to more advanced chart types or customizations.

In contrast to Plotly Express, Plotly Go (Graph Objects) is a lower-level graphing package that generally requires more coding but is much more customizable and flexible. In this tutorial, we’ll use Plotly Go to create the same interactive scatter bubble chart as the one we created in Tableau. You can also save the code as a template for creating similar charts in other use cases.


Read and Prepare the Data

At the end of the Housing Market Cluster Analysis project, we created a data frame that has the housing market metrics for 162 neighborhoods in NC as well as their assigned clusters (cluster_nbr) from the k-means algorithm. Let’s read this data and take a peek at what the data looks like:
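A minimal sketch of this step, assuming the data was saved as a CSV file. The file name is not given here, so an in-memory string stands in for the file, and the column names and values are made up for illustration:

```python
import io

import pandas as pd

# Stand-in for the real file: in practice you would pass the file path
# to pd.read_csv. Column names and values below are hypothetical.
csv_file = io.StringIO(
    "neighborhood,median_sales_price,cluster_nbr,PC1,PC2,PC3\n"
    "A,350000,0,-1.2,0.4,0.1\n"
    "B,420000,2,0.8,-0.3,0.5\n"
    "C,295000,1,0.1,1.1,-0.2\n"
)

df = pd.read_csv(csv_file)

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows
```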

Image by Author

We have 162 rows and 17 fields. The ‘cluster_nbr’ field is the cluster label we assigned to each neighborhood based on the k-means algorithm. The last three fields are principal components derived from PCA, which will allow us to project our clusters onto a scatter chart, with the first two principal components being x and y axes.

Let’s also rename a few columns to make them easier to understand. In addition, the cluster_nbr column is stored as an integer, which will not produce the desired output later in our code, so we need to change its data type to string.
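A minimal sketch of both changes. The original and new column names below are hypothetical, since the actual names are not listed here; only cluster_nbr comes from the data set described above:

```python
import pandas as pd

# Made-up rows with hypothetical short column names
df = pd.DataFrame({
    "med_dom": [21, 68, 40],
    "med_price_yoy": [0.05, -0.08, 0.02],
    "cluster_nbr": [1, 2, 0],
})

# Rename columns to something more descriptive (names are assumptions)
df = df.rename(columns={
    "med_dom": "median_days_on_market",
    "med_price_yoy": "price_change_yoy",
})

# Convert the integer cluster label to a string
df["cluster_nbr"] = df["cluster_nbr"].astype(str)

print(df.dtypes)
```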

Cluster Data (Image by Author)

Add Cluster Description

Now that we have read and prepared our cluster data, let’s do a quick analysis to show the summary statistics at the cluster level. This will help us understand each cluster’s distinct characteristics and add a meaningful description for each cluster (instead of generic cluster labels such as 0, 1, 2, 3, etc.).
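One way to get cluster-level statistics is a pandas groupby, sketched below. The metric columns and their values are made up, and the choice of the mean as the summary statistic is an assumption:

```python
import pandas as pd

# Made-up data: two neighborhoods per cluster
df = pd.DataFrame({
    "cluster_nbr": ["0", "0", "1", "1", "2", "2"],
    "median_days_on_market": [40, 44, 20, 24, 70, 74],
    "median_sales_price": [300000, 320000, 350000, 370000, 500000, 520000],
})

# Average each metric within a cluster and count the neighborhoods
summary = df.groupby("cluster_nbr").agg(
    neighborhoods=("median_sales_price", "count"),
    avg_days_on_market=("median_days_on_market", "mean"),
    avg_sales_price=("median_sales_price", "mean"),
)

print(summary)
```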

Cluster Summary Statistics (Image by Author)

Based on the summary statistics, we can describe the characteristics of each cluster. For example, cluster ‘1’ has the shortest median days-on-market and the second-highest year-over-year price increase, whereas cluster ‘2’ is a relatively pricey market with the highest median sales price but the largest price drop compared to last year. Houses in cluster ‘2’ also take the longest to sell. Based on our observations, we can add a cluster description column to our dataset:
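A minimal sketch of adding the description column with a dictionary lookup. The wording of the two descriptions paraphrases the observations above, the column name cluster_desc is an assumption, and the remaining clusters would be filled in the same way:

```python
import pandas as pd

# Made-up rows covering only the two clusters described above
df = pd.DataFrame({"cluster_nbr": ["1", "2", "1"]})

# Cluster label -> human-readable description (wording is illustrative)
descriptions = {
    "1": "Fastest-selling, second-highest YoY price increase",
    "2": "Highest-priced, largest YoY price drop, slowest-selling",
}

# Look up the description for every row
df["cluster_desc"] = df["cluster_nbr"].map(descriptions)

print(df)
```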

Image Provided by Author

Create a Basic Scatter Plot Using Plotly

To get familiar with Plotly, let’s first create a standard and simple scatter plot with just a couple of lines of code as shown below.

The basic scatter plot projects the 162 neighborhoods onto a 2-dimensional chart. The X-axis is the first principal component (PC1) which represents the selling speed/days-on-market metric. The Y-axis is the second principal component (PC2) which represents the supply/new listings YoY metric.

Basic Scatter Plot using Plotly (Image by Author)

When hovering over each data point, the plot shows a tooltip with x and y coordinates by default. With just a couple of lines of code, we have made a basic interactive chart which is quite convenient. Now we can add more features and customizations to the chart and make it more informative. Specifically, we will customize and refine the plot by doing the following things:

  • Show data points in customized colors that represent different clusters
  • Add bubble size that represents the ‘median sales price increase (YoY)’ for each data point
  • Customize hover-over tooltip to show additional information about each data point
  • Add chart title, x and y-axis labels, etc.

Add Customizations to the Basic Scatter Chart

In order to add customizations such as cluster colors, bubble sizes, and hover-over tips, we need to first add three new columns to our data frame that assign these ‘customization parameters’ to each data point.

The following code will add a new column called ‘color’ to the data frame. We first define a function called ‘color’ which assigns a unique color code (specified by us) to each cluster_nbr. Then we apply this function to each data point in the data frame so that every data point will have its own color code depending on which cluster it belongs to.
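A minimal sketch of this step. The hex color codes are arbitrary choices, and unknown labels fall back to gray as an added safeguard:

```python
import pandas as pd

# Made-up rows; cluster_nbr is already a string at this point
df = pd.DataFrame({"cluster_nbr": ["0", "1", "2", "1"]})

def color(cluster_nbr):
    """Return the color code assigned to a cluster label."""
    palette = {
        "0": "#1f77b4",  # blue
        "1": "#ff7f0e",  # orange
        "2": "#2ca02c",  # green
    }
    # Fall back to gray for any label missing from the palette
    return palette.get(cluster_nbr, "#7f7f7f")

# Apply the function to every row's cluster label
df["color"] = df["cluster_nbr"].apply(color)

print(df)
```

Note that the lookup keys in this sketch are strings, so it relies on cluster_nbr having been converted to string earlier.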

Add 'color' column to the data frame (Image by Author)

We’ll also add a new column called ‘size’ that sets the size of each bubble. We want the bubble size to represent the median sales price increase YoY: the bigger the bubble, the larger the increase in sales price compared to the same period last year.

Some data points have negative values for the median price change variable, and you will get an error if you try to use this variable directly for bubble sizes. Therefore, let’s use a min-max scaler to rescale this variable so that it falls between 0 and 1, with no negative values.
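Min-max scaling maps each value $x$ to $(x - \min) / (\max - \min)$. The sketch below applies that formula directly with pandas rather than a scaler class; scikit-learn's MinMaxScaler with its default range gives the same result. The column name price_change_yoy and the values are made up:

```python
import pandas as pd

# Made-up YoY price changes, including a negative value
df = pd.DataFrame({"price_change_yoy": [-0.08, 0.02, 0.05, 0.12]})

# Min-max scaling: smallest value -> 0, largest value -> 1
col = df["price_change_yoy"]
df["size"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

Because the smallest value maps to exactly 0, the neighborhood with the largest price drop would get a zero-size bubble unless a minimum marker size is set when plotting.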

Add 'size' column for bubble size (Image by Author)

Lastly, let’s add a ‘text’ column, which will enable us to show customized tooltips when hovering over each data point. This can be achieved with the for loop shown in the code below.
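A minimal sketch of such a loop. The column names and the fields included in the tooltip are assumptions; Plotly treats `<br>` in hover text as a line break:

```python
import pandas as pd

# Made-up rows with hypothetical column names
df = pd.DataFrame({
    "neighborhood": ["A", "B"],
    "cluster_desc": ["Fast-selling", "High-priced"],
    "median_days_on_market": [21, 68],
    "price_change_yoy": [0.0512345, -0.0798765],
})

# Build one tooltip string per row
text = []
for _, row in df.iterrows():
    text.append(
        f"Neighborhood: {row['neighborhood']}<br>"
        f"Cluster: {row['cluster_desc']}<br>"
        f"Median days on market: {row['median_days_on_market']}<br>"
        f"Median price change YoY: {row['price_change_yoy']}"
    )

df["text"] = text

print(df["text"].iloc[0])
```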

Add 'text' column for customized tooltips (Image by Author)

Put All the Customizations Together

Now that we have added our customization columns to the data frame, we can create a figure and add those customizations to it.

In the code below, we first create a dictionary that contains the data frame for each cluster. We then create a figure and add the first trace, which uses the first cluster’s data frame. Finally, we loop through the dictionary and add the rest of the traces (clusters) one at a time, so that all the clusters are plotted in the same figure.

Bubble chart with customized colors and tooltips (Image by Author)

Style and Format the Chart

We accomplished the goal of customizing the basic scatter chart with different colors by segment, customized tooltip text, and specified bubble sizes. However, some metrics in the tooltip are shown with many decimal places, which makes them hard to read, and some would be better shown as percentages. So let’s make those styling changes to make the tooltips easier to understand.
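Python format specifiers are one way to do this when building the tooltip text. A small sketch with made-up values and hypothetical field labels:

```python
# Made-up metric values, for illustration only
days_on_market = 21.3333
price_change_yoy = 0.0512345

# Unformatted: every decimal place is shown
raw = (
    f"Days on market: {days_on_market}<br>"
    f"Price change YoY: {price_change_yoy}"
)

# Formatted: no decimals for days, one-decimal percentage for the change
styled = (
    f"Days on market: {days_on_market:.0f}<br>"
    f"Price change YoY: {price_change_yoy:.1%}"
)

print(styled)  # Days on market: 21<br>Price change YoY: 5.1%
```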

We’ll also add a chart title and axis labels to the chart. Remember that our x and y axes are two principal components that represent selling speed and supply/new listings YoY, so let’s label the axes based on those meanings.

Interactive Scatter Bubble Chart (Image by Author)

Now we have created an interactive scatter bubble chart in Python with Plotly! Originally, I thought this would be a pretty easy chart to make. But as you can see, when you are working on a real-world use case, there are a lot of details and nuances to take care of in your code to make the chart look exactly like what you have in mind! Thanks for reading, and I hope you enjoyed this article. Happy Learning!

