
Clash of Python Data Visualization Libraries

Seaborn, Altair, and Plotly

Photo by Santiago Lacarta on Unsplash

Data visualization is a fundamental ingredient of data science. It helps us understand the data better by revealing patterns and insights, and we also use it to communicate results and findings.

Python, being the predominant choice of programming language in the data science ecosystem, offers a rich selection of data visualization libraries. In this article, we will do a practical comparison of 3 popular ones.

The libraries we will cover are Seaborn, Altair, and Plotly. The examples will consist of 3 fundamental plot types: scatter plot, histogram, and line plot.

We will do the comparison by creating the same visualizations with all 3 libraries. We will be using the Melbourne housing dataset available on Kaggle for the examples.

Let’s start with importing the libraries and reading the dataset.

# import the libraries
import numpy as np
import pandas as pd
import seaborn as sns
import altair as alt
import plotly.express as px
# read the dataset
df = pd.read_csv("/content/melb_data.csv", parse_dates=['Date'])
df.shape
(13580, 21)
# remove outliers
df = df[(df.Price < 500000) & (df.Landsize < 5000)]

The dataset contains 21 features for 13,580 houses in Melbourne. I have also filtered out the rows with a price of 500,000 or more or a land size of 5,000 or more, so that outliers do not distort the appearance of the plots.
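As a minimal sketch of how this kind of filter works, the snippet below applies the same two conditions to a tiny made-up data frame. The toy values are assumptions for illustration and are not rows from the Melbourne dataset. Each comparison produces a boolean Series, and `&` keeps only the rows where both conditions hold.

```python
import pandas as pd

# Hypothetical miniature stand-in for the housing data (values are made up).
toy = pd.DataFrame({
    "Price": [350_000, 480_000, 1_200_000, 420_000],
    "Landsize": [200, 7_500, 300, 650],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than <.
mask = (toy["Price"] < 500_000) & (toy["Landsize"] < 5_000)
filtered = toy[mask]

print(mask.tolist())   # [True, False, False, True]
print(len(filtered))   # 2
```

The second row is dropped because of its land size and the third because of its price, so two rows remain.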


1. Scatter plot

A scatter plot is a relational plot. It is commonly used to visualize the values of two numerical variables. We can observe if there is a correlation between them.
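A scatter plot shows correlation visually. If a number is also needed, one option is the Pearson correlation coefficient. The sketch below uses made-up, perfectly linear values (an assumption for illustration, not data from the Melbourne dataset) and the Pandas `Series.corr` method.

```python
import pandas as pd

# Made-up values for illustration only (not taken from the Melbourne dataset).
toy = pd.DataFrame({
    "Price": [300_000, 350_000, 400_000, 450_000],
    "Landsize": [200, 300, 400, 500],
})

# Series.corr computes the Pearson correlation coefficient by default.
# The toy data is perfectly linear, so r is 1.0 up to floating-point error.
r = toy["Price"].corr(toy["Landsize"])
print(round(r, 3))
```

On real data the coefficient falls somewhere between -1 and 1, and the scatter plot helps to judge whether a single number describes the relationship well.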

Let’s generate a scatter plot of the price and land size columns to investigate the relationship between them.

# Seaborn
sns.relplot(data=df, x="Price", y="Landsize", kind="scatter",
            hue="Type", height = 5, aspect = 1.4)
(image by author)

After passing the data frame to the data parameter, the columns to be plotted are determined with the x and y parameters. The hue parameter adds one more piece of information to the plot. We get an overview of how the price and land size change based on different categories in the type column. Finally, the height and aspect parameters adjust the size of the plot.
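To make the sizing parameters more concrete, here is a small sketch on a made-up data frame (the values and the off-screen `Agg` backend are assumptions for illustration). In Seaborn, `height` is the height of each facet in inches, and `aspect * height` gives the width of each facet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Made-up values for illustration only.
toy = pd.DataFrame({
    "Price": [350_000, 420_000, 480_000, 390_000],
    "Landsize": [200, 650, 300, 500],
    "Type": ["h", "u", "h", "u"],
})

# relplot returns a FacetGrid that wraps the underlying matplotlib figure.
g = sns.relplot(data=toy, x="Price", y="Landsize", kind="scatter",
                hue="Type", height=5, aspect=1.4)

# The facet is 5 inches tall and 5 * 1.4 = 7 inches wide; the legend placed
# outside the axes can make the whole figure somewhat wider than that.
fig_width, fig_height = g.ax.figure.get_size_inches()
print(fig_width, fig_height)
```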

# Altair
alt.Chart(df).mark_circle(size=40).encode(
   x='Price', y='Landsize', color="Type"
).properties(height=300, width=500)
(image by author)

Altair syntax starts with a top-level chart object that accepts the data frame. The next step is to choose the type of the plot. We specify the columns to be plotted in the encode function.

The color parameter plays the same role as the hue parameter of Seaborn. Finally, we adjust the size of the plot using the properties function.

# plotly express
fig = px.scatter(df, x='Price', y='Landsize', color='Type',
                 height=450, width=700)
fig.show()
(image by author)

We have used Plotly Express, which is the high-level API of the plotly.py library. The syntax is quite similar to Seaborn syntax. The color parameter is used instead of the hue parameter. The height and width parameters adjust the size in pixels.


2. Histogram

Histograms are usually used to visualize the distribution of a continuous variable. The range of values of a continuous variable is divided into discrete bins, and the number of data points (or values) in each bin is represented with bars.
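The binning step can be sketched with NumPy before any plotting is involved. The prices, the number of bins, and the range below are made-up assumptions for illustration. The plotting libraries choose their own bins by default.

```python
import numpy as np

# Made-up prices, only for illustration.
prices = np.array([310_000, 325_000, 340_000, 410_000, 455_000, 490_000])

# Split the range 300,000 to 500,000 into 4 equal-width bins
# and count how many values fall into each bin.
counts, edges = np.histogram(prices, bins=4, range=(300_000, 500_000))

print(edges.tolist())   # [300000.0, 350000.0, 400000.0, 450000.0, 500000.0]
print(counts.tolist())  # [3, 0, 1, 2]
```

Each count becomes the height of one bar, and the edges mark where the bars start and end on the x axis.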

We can create a histogram of the price column to check the price distribution of the houses.

# Seaborn
sns.displot(data=df, x="Price", kind="hist",
            height=5, aspect=1.4)
(image by author)

We use the displot function which allows for creating different distribution plots. The kind parameter selects the plot type.

# Altair
alt.Chart(df).mark_bar().encode(
   alt.X('Price:Q', bin=True), y='count()'
).properties(height=300, width=500)
(image by author)

What we write in the encode function tells Altair to divide the values in the price column into bins and then count the number of data points (i.e. rows) in each bin.

The last histogram will be created with the plotly express library. The histogram function is used as follows.

# plotly express
fig = px.histogram(df, x='Price', height=450, width=600)
fig.show()
(image by author)

3. Line plot

Line plots visualize the relationship between two variables. One of them is usually time, so we can see how a variable changes over time.

We can generate a line plot to display the daily average of house prices. Let’s first calculate the daily averages and save them in another data frame.

avg_daily_price = df.groupby('Date', as_index=False).agg(
   avg_daily_price = ('Price', 'mean')
).round(1)
avg_daily_price.head()
(image by author)
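The same aggregation can be checked by hand on a tiny made-up data frame (the dates and prices below are assumptions for illustration). The keyword name in `agg` becomes the output column, and the tuple gives the source column and the aggregation function.

```python
import pandas as pd

# Made-up values for illustration only: two sales on one day, one on another.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-07", "2017-01-07", "2017-01-14"]),
    "Price": [400_000, 450_000, 380_000],
})

# Named aggregation: output column name = (source column, function).
daily = toy.groupby("Date", as_index=False).agg(
    avg_daily_price=("Price", "mean")
).round(1)

print(daily["avg_daily_price"].tolist())  # [425000.0, 380000.0]
```

With `as_index=False`, the date stays a regular column rather than becoming the index, which is convenient for passing the result to the plotting functions.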

We have used the Pandas groupby function to calculate the average house price for each day. We can now create a line plot of average daily price and date.

# Seaborn
sns.relplot(data=avg_daily_price, x="Date", y="avg_daily_price", kind="line", height=5, aspect=2)
(image by author)

Like the scatter plot, the line plot can be created using the relplot function of Seaborn.

# Altair
alt.Chart(avg_daily_price).mark_line().encode(
   alt.X("Date"),
   alt.Y("avg_daily_price", scale=alt.Scale(zero=False))
).properties(height=300, width=600)
(image by author)

Here we have used both the X and Y encoding objects of Altair, because they let us pass extra options such as a scale. By default, Altair includes zero on a quantitative axis instead of fitting the axis to the value range of the column. We need to explicitly tell Altair not to start from zero if most of the values are well above zero, which is the case in our plot.

Plotly express provides the line function for generating line plots. The syntax is almost the same as the previous examples. We only change the name of the function.

# plotly express
fig = px.line(avg_daily_price, x="Date", y="avg_daily_price",
              height=400, width=700)
fig.show()
(image by author)

Conclusion

We have covered 3 basic plot types with 3 commonly used Python data visualization libraries.

Which one to use comes down to a decision based on the syntax, user preference, and style. However, for the most part, we can use any of them to generate beautiful and informative visualizations.

The plots in this article can be considered as the most basic ones. These three libraries are capable of creating more complex ones. They also provide many features to customize the plots.

Thank you for reading. Please let me know if you have any feedback.

