
Creating Interactive Scatter Plots with Python Altair

Enhance the informative power of your data visualizations

Photo by IIONA VIRGIN on Unsplash

Data visualization is an integral part of Data Science. We create data visualizations to get a structured overview of the data at hand. They also serve as an efficient tool for delivering results.

Since it is such an important topic, there is a variety of software tools and packages for creating data visualizations.

One of them is Altair, a declarative statistical visualization library for Python.

In this article, we will learn how to create interactive scatter plots with Altair. Interactive plots can convey more information than static ones, and they make a visualization more flexible because the reader can explore it.

Don’t forget to subscribe if you’d like to get an email whenever I publish a new article.

The first and foremost requirement is, of course, a dataset. We will use the Airbnb listings dataset for Barcelona, Spain, dated 07 July 2021. It is shared under a Creative Commons license, so feel free to use and explore it.


Let’s start by importing the libraries and creating a Pandas data frame.

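import pandas as pd
import altair as alt

# Columns needed for the plots in this article
col_list = ["accommodates", "instant_bookable", "room_type", "reviews_per_month", "price"]

# Read only the selected columns and the first 1000 rows
df = pd.read_csv(
    "listings_2.csv",
    usecols=col_list,
    nrows=1000
)

# Drop the rows that contain a missing value
df.dropna(inplace=True)

print(df.shape)
# (965, 5)

df.head()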
(image by author)

There are 17k rows and 74 columns in the original dataset. We have created a data frame that only contains 5 columns and 1000 rows by using the usecols and nrows parameters of the read_csv function.
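To make the effect of usecols and nrows concrete, here is a minimal sketch that reads from an in-memory string instead of the real file. The CSV text and its values are made up for illustration and are not taken from the Airbnb dataset.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real listings file
csv_text = """id,room_type,price,accommodates
1,Private room,$40.00,2
2,Entire home/apt,"$1,200.00",6
3,Shared room,$25.00,1
4,Private room,$55.00,2
"""

# usecols keeps only the named columns; nrows stops after the first N data rows
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["room_type", "price"],
    nrows=3,
)

print(sample.shape)          # (3, 2)
print(list(sample.columns))  # ['room_type', 'price']
```

Filtering at read time like this avoids loading columns and rows that are never used.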


It is important to note that, with the default settings, Altair raises an error when the data has more than 5000 observations (i.e. rows). This limit can be disabled using the following command. Thanks to Natalia Goloskokova for the heads up!

alt.data_transformers.disable_max_rows()

We have also used the dropna function to drop the rows that have a missing value.

There is one more data processing operation before we start creating scatter plots. The price column is stored with object data type but it needs to be converted to a numerical format.

We need to remove the "$" sign and "," used as the thousands separator. Then, we can convert it to a numerical data type.

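# Strip the leading "$", remove the thousands separators, and convert to float
df["price"] = df["price"].str[1:].str.replace(",", "", regex=False).astype("float")

# Keep only the listings priced below 1000
df = df[df["price"] < 1000]
df.head()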
(image by author)
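The cleaning steps above can be checked on a few values in isolation. This is a small sketch; the price strings are made up and only mimic the format of the price column.

```python
import pandas as pd

# Made-up price strings in the same "$1,234.00" format as the dataset
prices = pd.Series(["$40.00", "$1,200.00", "$25.00"])

cleaned = (
    prices
    .str[1:]                            # drop the leading "$"
    .str.replace(",", "", regex=False)  # drop the thousands separator
    .astype("float")                    # convert the strings to numbers
)

print(cleaned.tolist())  # [40.0, 1200.0, 25.0]
print(cleaned.dtype)     # float64
```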

Scatter plot

Scatter plots are usually used for visualizing the relationship between two continuous variables. They provide an overview of the correlation between the variables.

We can create a scatter plot to check the relationship between the accommodates and price columns. The accommodates column is an indication of the capacity of a place, so we expect it to have a positive correlation with price.

alt.Chart(df).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("price"),
    alt.Color("room_type", 
              legend=alt.Legend(
                  title="Room Type",  
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
             )
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
)

The first step is to pass the data frame to the top-level Chart object. Then we specify the type of visualization: the mark_circle function draws each row as a circle, which gives us a scatter plot.

In the encode function, we specify the columns to be plotted on the x- and y-axes. The color encoding distinguishes the categories, which are listed in the legend. It is similar to the hue parameter in Seaborn.

Finally, the properties and configure_axis functions adjust the visual properties such as figure and label sizes.

This code snippet creates the following scatter plot:

(image by author)

This is a standard scatter plot. We will now see how to make it interactive in a few different ways.

The interactive function

The interactive function is the simplest way of making a plot interactive. It lets us zoom in and out of the plot and pan across it.

Let’s enhance the plot in the previous example by adding the size attribute. We will use the accommodates and reviews_per_month columns on the x-axis and y-axis, respectively.

The color will indicate the room type and the size of each point will be proportional to price.

By adding the interactive function at the end, we will be able to zoom in and out.

alt.Chart(df.sample(100)).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("reviews_per_month"),
    alt.Color("room_type", 
              legend=alt.Legend(
                  title="Room Type",  
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
             ),
    alt.Size("price",
            legend=alt.Legend(
                  title="Price",  
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13))
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
).interactive()

I took a sample with 100 observations (i.e. rows) from our data frame to make the plot less cluttered. Since the sample is random, your plot will differ slightly from the one shown here. Here is our first interactive plot.

(GIF by author)

Interactive legend

Interactivity can also make plots more informative and functional. For instance, we can use the legend as a filter by making it interactive.

We can do so by creating a selection object and binding it to the legend.

selection = alt.selection_multi(fields=['room_type'], bind='legend')
alt.Chart(df).mark_circle(size=50).encode(
    alt.X("accommodates"),
    alt.Y("price"),
    alt.Color("room_type", 
              legend=alt.Legend(
                  title="Room Type",  
                  orient='left',
                  titleFontSize=15,
                  labelFontSize=13)
             ),
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).properties(
    height=350, width=500
).configure_axis(
    titleFontSize=20,
    labelFontSize=15
).add_selection(
    selection
)

There are three changes compared to the first scatter plot.

  • A selection object on the room_type column. It is bound to the legend.
  • An opacity encoding that changes the opacity of the points according to the selected room types.
  • The add_selection function, which adds the selection object to our plot.

Here is the result:

(GIF by author)

Interactive legend with multiple plots

Altair allows for connecting a legend to multiple subplots. Thus, we can see the effects of our selection on different relationships simultaneously.

selection = alt.selection_multi(fields=['room_type'], bind='legend')
chart = alt.Chart(df).mark_circle(size=50).encode(
    y='price',
    color='room_type',
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).properties(
    height=200, width=300
).add_selection(
    selection
)
chart.encode(x='reviews_per_month:Q') | chart.encode(x='accommodates:Q')

We first create a chart object without specifying the column for the x-axis. Creating and binding the selection part is the same. In the last line, we create two copies of the chart with different x-axes and place them side by side with the | operator, which Altair uses for horizontal concatenation.

I have removed the part used for formatting legend and axis titles to keep the code easier to read.

(GIF by author)

Conclusion

We have covered how interactivity can be used for enhancing data visualizations. Altair offers many more interactivity features than the ones shown here. Once you understand the core concepts of interactivity such as selection, binding, and condition, you can create stunning data visualizations.


You can become a Medium member to unlock full access to my writing, plus the rest of Medium. If you do so using the following link, I will receive a portion of your membership fee at no additional cost to you.

Join Medium with my referral link – Soner Yıldırım


Thank you for reading. Please let me know if you have any feedback.

