Data visualization is an integral part of Data Science. We create data visualizations to get a structured overview of the data at hand. They also serve as an efficient tool for delivering results.
Since it is such an important topic, there is a variety of software tools and packages for creating data visualizations.
One of them is Altair, a declarative statistical visualization library for Python.
In this article, we will learn how to create interactive scatter plots with Altair. Interactive plots can convey more information than static ones because the reader can zoom, filter, and explore the data. They also make a visualization more flexible.
Don’t forget to subscribe if you’d like to get an email whenever I publish a new article.
The first and foremost requirement is, of course, a dataset. We will be using the Airbnb listings dataset for Barcelona, Spain, dated 07 July 2021. It is shared under a Creative Commons license, so feel free to use and explore it.
Let’s start by importing the libraries and creating a Pandas data frame.
import pandas as pd
import altair as alt
col_list = ["accommodates", "instant_bookable", "room_type", "reviews_per_month", "price"]
df = pd.read_csv(
    "listings_2.csv",
    usecols=col_list,
    nrows=1000
)
df.dropna(inplace=True)
print(df.shape)
(965, 5)
df.head()

There are 17k rows and 74 columns in the original dataset. We have created a data frame that only contains 5 columns and 1000 rows by using the usecols and nrows parameters of the read_csv function.
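To see what these two parameters do in isolation, here is a minimal sketch that reads a tiny in-memory CSV instead of the real listings file. The column names and values in csv_text are made up for the illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the listings file: 4 columns, 3 data rows.
csv_text = (
    "id,room_type,price,host_name\n"
    "1,Private room,$40.00,Ana\n"
    "2,Entire home/apt,\"$1,250.00\",Marc\n"
    "3,Shared room,$25.00,Laia\n"
)

small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["room_type", "price"],  # keep only these two columns
    nrows=2,                         # read only the first 2 data rows
)
print(small.shape)  # (2, 2)
```

Only the requested columns are parsed and reading stops after the requested number of rows, which keeps memory usage low when the source file is large.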
It is important to note that Altair can be used with a maximum of 5000 observations (i.e. rows) with the default settings. This limitation can be disabled using the following command. Thanks to Natalia Goloskokova for the heads up!
alt.data_transformers.disable_max_rows()
We have also used the dropna function to drop the rows that have a missing value, which is why 965 of the 1000 rows remain.
There is one more data processing operation before we start creating scatter plots. The price column is stored with object data type but it needs to be converted to a numerical format.
We need to remove the "$" sign and "," used as the thousands separator. Then, we can convert it to a numerical data type.
# Drop the leading "$", remove the thousands separator, then convert to float
df["price"] = df["price"].str[1:].str.replace(",", "").astype("float")
df = df[df["price"] < 1000]
df.head()
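As a small, self-contained illustration of the cleaning step above, the same chain of string operations can be run on a tiny Series. The three price strings are made up for the example, and regex=False is added here only to make it explicit that the comma is treated as a literal character.

```python
import pandas as pd

raw = pd.Series(["$40.00", "$1,250.00", "$99.00"])  # made-up price strings

clean = (
    raw.str[1:]                            # drop the leading "$"
       .str.replace(",", "", regex=False)  # drop the thousands separator
       .astype("float")                    # convert to a numerical type
)
print(clean.tolist())  # [40.0, 1250.0, 99.0]

# Same filter as in the article: keep prices below 1000
filtered = clean[clean < 1000]
print(filtered.tolist())  # [40.0, 99.0]
```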

Scatter plot
Scatter plots are usually used for visualizing the relationship between two continuous variables. They provide an overview of the correlation between the variables.
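A scatter plot shows the shape of a relationship, and a correlation coefficient can serve as a quick numeric companion to it. The sketch below computes the Pearson correlation with Pandas on a toy data frame. The values are made up, so the resulting number says nothing about the Airbnb data.

```python
import pandas as pd

# Made-up values, only to illustrate the call
toy = pd.DataFrame({
    "accommodates": [1, 2, 2, 4, 6],
    "price": [30.0, 45.0, 50.0, 90.0, 150.0],
})

# Pearson correlation between the two columns (between -1 and 1)
r = toy["accommodates"].corr(toy["price"])
print(round(r, 2))
```

A value close to 1 indicates a strong positive linear relationship, while a value near 0 indicates little or no linear relationship.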
We can create a scatter plot to check the relationship between the accommodates and price columns. The accommodates column is an indication of the capacity of a place so we expect it to have a positive correlation with price.
alt.Chart(df).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("price"),
    alt.Color("room_type",
              legend=alt.Legend(
                  title="Room Type",
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
              )
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
)
The first step is to pass the data frame to the top-level Chart object and then we specify the type of visualization. The mark_circle function creates a scatter plot.
In the encode function, we write the column names to be plotted on the x and y-axis. The color parameter is used for distinguishing different categories which are shown on the legend. It is similar to the hue parameter in Seaborn.
Finally, the properties and configure_axis functions adjust the visual properties such as figure and label sizes.
This code snippet creates the following scatter plot:

This is a standard scatter plot. We will now see how to make it interactive in a few different ways.
The interactive function
The interactive function is the simplest way of making a plot interactive. It lets us zoom in and out on the plot and pan across it.
Let’s enhance the plot in the previous example by adding the size attribute. We will use the accommodates and reviews per month columns on the x-axis and y-axis, respectively.
The color will indicate the room type and the size of each point will be proportional to price.
By adding the interactive function at the end, we will be able to zoom in and out.
alt.Chart(df.sample(100)).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("reviews_per_month"),
    alt.Color("room_type",
              legend=alt.Legend(
                  title="Room Type",
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
              ),
    alt.Size("price",
             legend=alt.Legend(
                 title="Price",
                 orient='left',
                 titleFontSize=15,
                 labelFontSize=13))
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
).interactive()
I took a sample with 100 observations (i.e. rows) from our data frame to make the plot look more appealing. Here is our first interactive plot.

Interactive legend
Interactivity can also be used to make plots more informative and functional. For instance, we can use the legend as a filter by making it interactive.
We can do so by creating a selection object and binding it to the legend.
selection = alt.selection_multi(fields=['room_type'], bind='legend')
alt.Chart(df).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("price"),
    alt.Color("room_type",
              legend=alt.Legend(
                  title="Room Type",
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
              ),
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
).add_selection(
    selection
)
There are 3 changes compared to the previous scatter plot.
- A selection object on the room_type column. It is bound to the legend.
- Opacity parameter that changes the opacity of points according to the selected room types.
- The add_selection function that is used for adding the selection object to our plot.
Here is the result:

Interactive legend with multiple plots
Altair allows for connecting a legend to multiple subplots. Thus, we can see the effects of our selection on different relationships simultaneously.
selection = alt.selection_multi(fields=['room_type'], bind='legend')
chart = alt.Chart(df).mark_circle(size=50).encode(
    y='price',
    color='room_type',
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).properties(
    height=200, width=300
).add_selection(
    selection
)
chart.encode(x='reviews_per_month:Q') | chart.encode(x='accommodates:Q')
We first create a chart object without specifying the column for the x-axis. Creating and binding the selection is the same as before. In the last line, we create two versions of the chart with different x-axes and place them side by side with the "or" (|) operator, which Altair uses for horizontal concatenation.
I have removed the part used for formatting legend and axis titles to make the code look easier to understand.

Conclusion
We have covered how interactivity can be used for enhancing data visualizations. Altair offers many more interactive components than the ones shown here. Once you understand the concepts of interactivity such as selection, binding, and condition, you can create stunning data visualizations.
Thank you for reading. Please let me know if you have any feedback.