Scatterplots are extremely useful for visualizing relationships between two sets of numerical variables. They become even more insightful when they are interactive, with zooming and brushing capabilities.

The scatter plot is perhaps the most well-known chart for visualizing numerical variables. Such basic charts remain very useful, especially when they are interactive with brushing and zooming capabilities. In this blog post, I will demonstrate how to create interactive scatter plots with varying colors, sizes, tooltips, and data points that slide between two or three sets of coordinates. The scatter chart is part of the D3Blocks library and can be created using Python. The back end is developed in D3js and the output is encapsulated into a single HTML file. This allows you to easily embed the chart into a larger framework, or you can share or publish it directly, for which only an internet browser is required.
If you found this article helpful, use my referral link to continue learning without limits and sign up for a Medium membership. Plus, follow me to stay up-to-date with my latest content!
Scatterplot is part of D3Blocks.
[D3Blocks](https://towardsdatascience.com/d3blocks-the-python-library-to-create-interactive-and-standalone-d3js-charts-3dda98ce97d4) is a library that contains various charts for which the visualization back-end is built on (d3) javascript but configurable using Python. In this manner, the D3Blocks library combines the advantages of d3-javascript such as speed, scalability, flexibility, and unlimited creativity together with Python for fast and easy access to a broad community such as the Data Science field. The output of each chart in D3Blocks, such as the Scatterplot chart, is encapsulated into a single HTML file. This makes it very easy to share or publish on websites or to embed it in dashboards. Moreover, it does not need any other technology than a browser to publish or share the graphs. More information about D3Blocks can be found in this blog [1].
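Because the output is a single HTML file, one simple way to embed it in a larger page is an iframe. The snippet below is a minimal sketch and not part of D3Blocks: the helper name make_wrapper_page and the page layout are my own assumptions, and scatter_demo.html is simply the file name used later in this post.

```python
from html import escape


def make_wrapper_page(chart_file, title="My dashboard", width=1024, height=768):
    """Return an HTML page that embeds a stand-alone chart file in an iframe.

    Hypothetical helper for illustration; it is not part of D3Blocks.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f'<iframe src="{escape(chart_file, quote=True)}" '
        f'width="{width}" height="{height}" style="border:none;"></iframe>\n'
        "</body>\n"
        "</html>\n"
    )


# Build a wrapper page around the chart file and print it
page = make_wrapper_page("scatter_demo.html", title="Scatter demo")
print(page)
```

Saving the returned string next to the chart file (for example as index.html) should be enough to view both together in a browser.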
Reasons to create a Scatterplot.
A Scatterplot is ideal for visualizing the relationship between two numerical variables and observing the nature of that relationship. As an example, scatter plots can help to show the strength of the linear relationship between two variables (such as for correlation), but they can also be used for quality control, for exploration purposes, or to get a better understanding of the underlying distribution.
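To make "strength of the linear relationship" concrete, the sketch below computes the Pearson correlation coefficient in plain Python. The function name pearson_r and the example values are made up for illustration; this is not something D3Blocks computes for you.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


# Made-up example values, only to illustrate the calculation
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))  # 0.775
```

A value close to 1 or -1 corresponds to points lying near a straight line in the scatter plot, while a value near 0 corresponds to a cloud without a clear linear trend.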
Label properties and movements of data points.
Although scatter plots can be insightful, it is not always straightforward to prevent overplotting due to the overlap of data points. This can make it challenging to correctly identify the relationship between the variables. To overcome some of the challenges, the scatter chart of D3Blocks contains various label properties with brushing and zooming capabilities. The label properties are size, color, tooltip information, opacity, and stroke color (Figure 1). In addition, a gradient can be applied in the case of highly overlapping data points.
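One possible way to use the opacity property against overplotting is to make points in crowded regions more transparent. The sketch below is only an illustration of that idea: the helper opacity_by_crowding, the grid cell size, and the opacity bounds are my own assumptions and not part of D3Blocks.

```python
from collections import Counter


def opacity_by_crowding(x, y, cell=1.0, max_opacity=0.9, min_opacity=0.2):
    """Give points in crowded grid cells a lower opacity.

    Hypothetical helper for illustration; not part of D3Blocks.
    """
    # Assign every point to a grid cell of width `cell`
    cells = [(round(a / cell), round(b / cell)) for a, b in zip(x, y)]
    counts = Counter(cells)
    # A cell holding n points gets max_opacity / n, but never below min_opacity
    return [max(min_opacity, max_opacity / counts[c]) for c in cells]


# Three points close together and one isolated point
x = [1.0, 1.1, 0.9, 5.0]
y = [1.0, 1.0, 1.1, 5.0]
opacity = opacity_by_crowding(x, y)
print(opacity)
```

The resulting list has one value per data point, so it has the same shape as the per-point opacity list used in the example further down.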

Another challenge is the comparison between two or three sets of variables. This generally leads to the creation of multiple scatter plots that are then visualized side-by-side. Although the global structure can be observed in such a manner, it isn’t easy to describe the local differences. To overcome this challenge, the scatter chart in D3Blocks can let the data points move between different coordinates. A use case is shown in this blog but for demonstration, I will create a small example:
First, install the D3Blocks library:
pip install d3blocks
In the code section below, we will create three data points and let them move between three sets of coordinates. In addition, we will add some label properties to the data points, such as color, size, and opacity. The coordinates for the first data point are: (x, y) = (1, 1), (x1, y1) = (1, 10), and (x2, y2) = (5, 5). See the code section below for the other coordinates.
from d3blocks import D3Blocks
# Initialize
d3 = D3Blocks()
# Define the three sets of coordinates
x=[1, 1, 1]
y=[1, 2, 3]
x1=[1, 1, 1]
y1=[10, 9, 5]
x2=[5, 6, 7]
y2=[5, 5, 5]
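To double-check which positions belong to which data point, the coordinate lists can be grouped per point. This is a small helper sketch in plain Python and independent of D3Blocks; the variable name positions is only used here.

```python
# Same coordinates as in the example above
x, y = [1, 1, 1], [1, 2, 3]
x1, y1 = [1, 1, 1], [10, 9, 5]
x2, y2 = [5, 6, 7], [5, 5, 5]

# Group the three positions of every data point
positions = list(zip(zip(x, y), zip(x1, y1), zip(x2, y2)))

for i, (p0, p1, p2) in enumerate(positions, start=1):
    print(f"point {i}: (x, y)={p0}  (x1, y1)={p1}  (x2, y2)={p2}")
```

The first data point indeed moves from (1, 1) to (1, 10) and then to (5, 5).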
Specify the label properties for each of the three data points:
size = [15, 20, 25]
color = ['#FF0000', '#0000FF', '#00FF00']
stroke = ['#FFFFFF', '#FFFFFF', '#FFFFFF']
opacity = [0.7, 0.8, 0.8]
tooltip = ['1st datapoint', '2nd datapoint', '3rd datapoint']
Now it is just a matter of passing the input parameters to the scatter function:
# Set all properties
d3.scatter(
    x,                    # x-coordinates
    y,                    # y-coordinates
    x1=x1,                # x1-coordinates
    y1=y1,                # y1-coordinates
    x2=x2,                # x2-coordinates
    y2=y2,                # y2-coordinates
    size=size,            # Size
    color=color,          # Hex-colors
    stroke=stroke,        # Edge color
    opacity=opacity,      # Opacity
    tooltip=tooltip,      # Tooltip
    scale=False,          # Scale the datapoints
    label_radio=['(x, y)', '(x1, y1)', '(x2, y2)'],
    figsize=[1024, 768],
    filepath='scatter_demo.html',
)
The final scatter chart is shown in Figure 2:

Build your own interactive scatter chart.
Let’s load the MNIST data set [6], a well-known handwritten digits dataset that is free to use and great for examining and showcasing the scatter functionalities. A mapping of the original data set to a low-dimensional space is readily performed using the clustimage library. Here we will load the pre-computed Principal Components and t-SNE coordinates, which represent the digit-to-digit similarity (see code section below). With the scatter chart we can now easily visualize and compare the two mappings by letting the data points move between them. The scale option is set to True to make sure that the two sets of coordinates are within the same range. In addition, label properties such as tooltip, size, opacity, and (stroke) color are also set. The input for color can be manually specified with hex colors but can also be string labels. In the example, I used the digit labels, which are automatically transformed into hex colors using the input colormap (cmap). Zooming and brushing capabilities are always available, as depicted in Figure 3. Some interactive examples are shown over here.
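To give a feeling for what "string labels transformed into hex colors" means, here is a minimal sketch in plain Python. It is an illustration only: the helper labels_to_hex and the hue-wheel scheme are my own assumptions, whereas D3Blocks derives the colors from the colormap given as cmap.

```python
import colorsys


def labels_to_hex(labels):
    """Map each distinct label to a hex color, evenly spaced around the hue wheel.

    Illustration only: D3Blocks uses the colormap passed as cmap, not this scheme.
    """
    unique = sorted(set(labels))
    palette = {}
    for i, label in enumerate(unique):
        # Spread the hues evenly; saturation and brightness are fixed
        r, g, b = colorsys.hsv_to_rgb(i / len(unique), 0.8, 0.9)
        palette[label] = "#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    # Return one color per input label, in the original order
    return [palette[label] for label in labels]


digits = ["3", "7", "3", "0", "7"]
colors = labels_to_hex(digits)
print(colors)
```

Data points with the same label receive the same color, which is what makes the digit clusters visible in the chart.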
# Load libraries
from d3blocks import D3Blocks
import numpy as np
# Initialize
d3 = D3Blocks()
# Load PC and tSNE coordinates
df = d3.import_example('mnist')
# Set random sizes, and opacity
size=np.random.randint(0, 8, df.shape[0])
opacity=np.random.randint(0, 8, df.shape[0])/10
# Tooltip are the digit labels
tooltip = df['y'].values.astype(str)
# Set all properties
d3.scatter(
    df['PC1'].values,                    # PC1 x-coordinates
    df['PC2'].values,                    # PC2 y-coordinates
    x1=df['tsne_1'].values,              # tSNE x-coordinates
    y1=df['tsne_2'].values,              # tSNE y-coordinates
    color=df['y'].values.astype(str),    # Hex-colors or classlabels
    tooltip=tooltip,                     # Tooltip
    size=size,                           # Node size
    opacity=opacity,                     # Opacity
    stroke='#000000',                    # Edge color
    cmap='tab20',                        # Colormap
    scale=True,                          # Scale the datapoints
    label_radio=['PCA', 'tSNE'],
    figsize=[1024, 768],
    filepath='scatter_demo.html',
)
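The scale option brings the two sets of coordinates into the same range. The sketch below shows what such a step could look like, assuming a simple min-max rescaling to the range 0 to 1; the helper min_max_scale and the example values are made up, and the exact method used inside D3Blocks may differ.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up coordinates on very different scales
pc1 = [-30.0, 0.0, 10.0, 50.0]
tsne_1 = [-2.0, -1.0, 0.0, 2.0]

print(min_max_scale(pc1))     # [0.0, 0.375, 0.5, 1.0]
print(min_max_scale(tsne_1))  # [0.0, 0.25, 0.5, 1.0]
```

After rescaling, both mappings share the same axis range, so the movement of a point between them reflects a change in relative position rather than a difference in units.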

If you want to change any of the label properties, you can edit the values in the data frame as shown in the code section below. After the edits, the scatter chart can be visualized again with the show() function.
# Make edits in the data frame
d3.edge_properties
# label x y x1 .. size stroke opacity tooltip
# 0 0 0.472107 0.871347 0.294228 .. 0 #000000 0.1 0
# 1 1 0.624696 0.116735 0.497958 .. 0 #000000 0.5 1
# 2 2 0.608419 0.305549 0.428529 .. 4 #000000 0.6 2
# 3 3 0.226929 0.532931 0.555316 .. 4 #000000 0.0 3
# 4 4 0.866292 0.553489 0.589746 .. 1 #000000 0.6 4
# ... ... ... ... ... .. ... ... ...
# 1792 9 0.262069 0.709428 0.693593 .. 5 #000000 0.5 9
# 1793 0 0.595571 0.837987 0.352114 .. 6 #000000 0.5 0
# 1794 8 0.668742 0.359209 0.520301 .. 6 #000000 0.4 8
# 1795 9 0.416983 0.694063 0.683949 .. 6 #000000 0.4 9
# 1796 8 0.489814 0.588109 0.529971 .. 1 #000000 0.4 8
# [1797 rows x 12 columns]
# Show the updated chart
d3.show(filepath='scatter_demo.html', label_radio=['PCA', 'tSNE'])
Final Words
I demonstrated how to create your own interactive and stand-alone Scatterplot using Python. Scatterplots are extremely useful for visualizing numerical variables and become even more insightful when they are interactive. The Scatter chart is one of the blocks in D3Blocks for which the use of D3js shows its strengths, such as speed and flexibility. If you like this, there are more interactive D3js blocks that are easy to use, such as D3graph [3], the Sankey chart [4], movingbubbles [5], and many more. Feel free to play around with the library!
Be safe. Stay frosty.
Cheers, E.
References
1. D3Blocks: The Python Library to Create Interactive and Standalone D3js Charts. Medium, September 2022
2. Quantitative comparisons between t-SNE, UMAP, PCA, and Other Mappings. Medium, May 2022
3. Creating beautiful stand-alone interactive D3 charts with Python, 2022
4. Hands-on Guide to Create beautiful Sankey Charts in d3js with Python. Medium, October 2022
5. How to Create Storytelling Moving Bubbles Charts in d3js with Python, September 2022
6. MNIST dataset, https://keras.io/api/datasets/mnist/ (CC BY-SA 3.0)