
Many beginner courses focus on Matplotlib for visualization because of its underlying functionality and the ability to customize every plot detail. However, I found myself bogged down by all the documentation, community discussions, and the many ways of creating simple plots, and thank goodness I found Seaborn.
Seaborn is an interface built on top of Matplotlib that uses short lines of code to create and style statistical plots from pandas DataFrames. It uses Matplotlib under the hood, so it is best to have a basic understanding of the figure, axes, and axis objects.
We will use the vehicles dataset from Kaggle that is under the Open database license. The code below imports the required libraries, sets the style, and loads the dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set(font_scale=1.3)
cars = pd.read_csv('edited_cars.csv')
Before we continue, note that seaborn plots belong to one of two groups.
- Axes-level plots – These mimic Matplotlib plots and can be bundled into subplots using the ax parameter. They return an axes object and use normal Matplotlib functions to style.
- Figure-level plots – These provide a wrapper around axes plots and can only create meaningful and related subplots because they control the entire figure. They return either FacetGrid, PairGrid, or JointGrid objects and do not support the ax parameter. They use different styling and customization inputs.
For each plot, I will mention which group it falls in.
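To make the two groups concrete, here is a minimal sketch. The small `demo` DataFrame is made-up stand-in data (not the Kaggle values), and the non-interactive `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the cars data (an assumption, not the real dataset)
demo = pd.DataFrame({
    "engine_cc": [800, 1200, 1500, 2000, 2500, 3000],
    "mileage_kmpl": [25.0, 21.0, 18.5, 15.0, 12.5, 10.0],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol", "Diesel"],
})

# Axes-level: draws onto the Axes we pass in and returns that same Axes
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
returned = sns.scatterplot(x="engine_cc", y="mileage_kmpl", data=demo, ax=left)
sns.boxplot(x="fuel", y="mileage_kmpl", data=demo, ax=right)
left.set_title("Styled with plain Matplotlib")

# Figure-level: creates its own figure and returns a FacetGrid
grid = sns.relplot(x="engine_cc", y="mileage_kmpl", data=demo, col="fuel")
grid.set_axis_labels("Engine size in CC", "Fuel efficiency")
```

The axes-level calls slot into a layout we built ourselves, while `relplot` builds the layout for us and is styled through the grid object's own methods.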
Part one: Exploring relationships between numeric columns
Numeric features contain continuous data or numbers as values.
The first two plots will be matrix plots, where you pass the whole dataframe to visualize all the pairwise distributions in one plot.
1. Pair plot
A pair plot creates a grid of scatter plots to compare the distribution of pairs of numeric variables. It also features a histogram for each feature in the diagonal boxes.
Functions to use:
sns.pairplot() – figure-level plot
The kind parameter changes the type of bivariate plots created: kind='scatter' (default), 'kde', 'hist', or 'reg'.
Two columns per grid (Bivariate)
sns.pairplot(cars);
What to look out for:
- Scatter plots showing either positive linear relationships (if x increases, y increases) or negative (if x increases, y decreases).
- Histograms in the diagonal boxes that show the distribution of individual features.
In the pair plot below, the circled plots show an apparent linear relationship. The diagonal line points out the histograms for each feature, and the pair plot’s top triangle is a mirror image of the bottom.

Three columns (multivariate): two numeric and one categorical
We can add a third variable that segments the scatter plots by color using the parameter hue='cat_col'.
sns.pairplot(
data=cars,
aspect=.85,
hue='transmission');

What to look out for:
- Clusters of different colors in the scatter plots.
2. Heat map
A heat map is a color-coded graphical representation of values in a grid. It’s an ideal plot to follow a pair plot because the plotted values represent the correlation coefficients of the pairs that show the measure of the linear relationships.
In short, a pair plot shows the intuitive trends of the data, while a heat map plots the actual correlation values using color.
Functions to use:
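sns.heatmap() – axes-level plot
First, we run df.corr() to get a table with the correlation coefficients. This table is also known as a correlation matrix. On pandas 2.0 and later, pass numeric_only=True so that text columns such as fuel are skipped instead of raising an error.
cars.corr(numeric_only=True)

sns.heatmap() – Since the table above is not very intuitive, we’ll create a heatmap.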
sns.set(font_scale=1.15)
plt.figure(figsize=(8,4))
sns.heatmap(
cars.corr(numeric_only=True),
cmap='RdBu_r',
annot=True,
vmin=-1, vmax=1);
cmap='RdBu_r' sets the color scheme, annot=True draws the values inside the cells, and vmin and vmax pin the color scale to the range -1 to 1.

What to look out for:
- Highly correlated features. These are the dark-red and dark-blue cells. Values close to 1 mean a high positive linear relationship, while close to -1 show a high negative relationship.
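
If the heat map has many cells, the strongly correlated pairs can also be listed programmatically. This is a sketch on synthetic stand-in data; the 0.8 cut-off is an arbitrary assumption, not a rule from the article.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: three related columns and one unrelated column
rng = np.random.default_rng(1)
engine = rng.uniform(800, 3000, size=200)
demo = pd.DataFrame({
    "engine_cc": engine,
    "max_power_bhp": engine * 0.06 + rng.normal(0, 5, size=200),
    "mileage_kmpl": 30 - engine * 0.006 + rng.normal(0, 1, size=200),
    "seats": rng.integers(4, 9, size=200),
    "fuel": rng.choice(["Petrol", "Diesel"], size=200),  # text column, skipped below
})

corr = demo.corr(numeric_only=True)

# Keep the upper triangle only, so each pair appears once and the diagonal is dropped
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()  # Series indexed by (row label, column label)

# 0.8 is an arbitrary threshold chosen for this illustration
strong = pairs[pairs.abs() > 0.8].sort_values(key=np.abs, ascending=False)
print(strong)
```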

In the following plots, we will further explore these relationships.
3. Scatter plot
A scatter plot shows the relationship between two numeric features by using dots to visualize how these variables move together.
Functions to use:
sns.scatterplot() – axes-level plot
sns.relplot(kind='scatter') – figure-level plot
Functions with a regression line:
sns.regplot() – axes-level plot
sns.lmplot() – figure-level plot
Two numeric columns (bivariate)
[sns.scatterplot(x='num_col1', y='num_col2', data=df)](https://seaborn.pydata.org/generated/seaborn.scatterplot.html) – Let us visualize the engine size with the mileage (efficiency) of the vehicle.
sns.set(font_scale=1.3)
sns.scatterplot(
x='engine_cc',
y='mileage_kmpl',
data=cars)
plt.xlabel(
'Engine size in CC')
plt.ylabel(
'Fuel efficiency')

sns.regplot(x, y, data)
A reg plot draws a scatter plot with a regression line showing the trend of the data.
sns.regplot(
x='engine_cc',
y='mileage_kmpl',
data=cars)
plt.xlabel(
'Engine size in CC')
plt.ylabel(
'Fuel efficiency');
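
By default, the line a reg plot draws is an ordinary least-squares straight-line fit. The sketch below reproduces that kind of fit with `np.polyfit` on synthetic data (the generating slope of -0.006 is an assumption chosen for the example) to show what the slope of the line means.

```python
import numpy as np

# Synthetic data: mileage falls by about 0.006 kmpl per extra cc, plus noise
rng = np.random.default_rng(2)
engine_cc = rng.uniform(800, 3000, size=300)
mileage_kmpl = 30 - 0.006 * engine_cc + rng.normal(0, 1.0, size=300)

# Degree-1 polynomial fit = least-squares straight line
slope, intercept = np.polyfit(engine_cc, mileage_kmpl, deg=1)
print(f"slope: {slope:.4f} kmpl per cc, intercept: {intercept:.1f} kmpl")
```

A negative slope corresponds to the downward-sloping line in the plot: larger engines go with lower fuel efficiency.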

Three columns (multivariate): two numeric and one categorical.
sns.scatterplot(x, y, data, hue='cat_col') – We can further segment the scatter plot by a categorical variable using hue.
sns.scatterplot(
x='mileage_kmpl',
y='engine_cc',
data=cars,
palette='bright',
hue='fuel');

sns.relplot(x, y, data, kind='scatter', hue='cat_col')
A rel plot, or relational plot, is used to create a scatter plot using kind='scatter' (default), or a line plot using kind='line'.
In our plot below, we use kind='scatter' and hue='cat_col' to segment by color. Note how the image below has similar results to the one above.
sns.relplot(
x='mileage_kmpl',
y='engine_cc',
data=cars,
palette='bright',
kind='scatter',
hue='fuel');

sns.relplot(x, y, data, kind='scatter', col='cat_col') – We can also create subplots of the segments column-wise using col='cat_col' and/or row-wise using row='cat_col'. The plot below splits the data by the transmission categories into different plots.
sns.relplot(
x='year',
y='selling_price',
data=cars,
kind='scatter',
col='transmission');

Four columns: two numeric and two categorical.
sns.relplot(x, y, data, hue='cat_col1', col='cat_col2') – The col_wrap parameter wraps the column facets after the given number of columns so that the subplots span multiple rows.
sns.relplot(
x='year',
y='selling_price',
data=cars,
palette='bright',
height=3, aspect=1.3,
kind='scatter',
hue='transmission',
col='fuel',
col_wrap=2);
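
To see what col_wrap does to the layout, the sketch below facets synthetic data (made-up values with four fuel labels, an assumption for illustration) and reads the grid geometry back from Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data with four fuel categories
rng = np.random.default_rng(3)
n = 80
demo = pd.DataFrame({
    "year": rng.integers(2005, 2021, size=n),
    "selling_price": rng.uniform(1e5, 1e6, size=n),
    "fuel": np.tile(["Petrol", "Diesel", "CNG", "LPG"], n // 4),
})

grid = sns.relplot(
    x="year", y="selling_price", data=demo,
    kind="scatter", col="fuel", col_wrap=2, height=3)

# One subplot per fuel category, wrapped onto a new row after every 2 columns
n_subplots = len(grid.axes.flat)
first_ax = grid.axes.flat[0]
n_rows, n_cols = first_ax.get_subplotspec().get_gridspec().get_geometry()
print(n_subplots, n_rows, n_cols)
```

Without col_wrap, the same four facets would sit side by side in a single row.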

sns.lmplot(x, y, data, col='cat_col1', hue='cat_col2')
The lmplot is the figure-level version of a regplot that draws a scatter plot with a regression line onto a FacetGrid. It does not have a kind parameter.
sns.lmplot(
x="seats",
y="engine_cc",
data=cars,
palette='bright',
col="transmission",
hue="fuel");

4. Line plot
A line plot comprises dots connected by a line that shows the relationship between the x and y variables. The x-axis usually contains time intervals, while the y-axis holds a numeric variable whose changes we want to track over time.
Functions to use:
sns.lineplot() – axes-level plot
sns.relplot(kind='line') – figure-level plot
Two columns (bivariate): numeric and time series.
sns.lineplot(x='time', y='num_col', data=df)
sns.lineplot(
x="year",
y="selling_price",
data=cars)
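
When several rows share the same x value, as with many cars sold in the same year, `sns.lineplot` by default draws the mean of y at each x and shades a confidence band around it. The pandas-only sketch below computes those per-year means on a tiny made-up table, so the values are illustrative rather than taken from the dataset.

```python
import pandas as pd

# Tiny made-up table: several sales per year
demo = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2019, 2020],
    "selling_price": [400_000, 600_000, 500_000, 700_000, 900_000, 800_000],
})

# The points the default line passes through: mean selling price per year
line_points = demo.groupby("year")["selling_price"].mean()
print(line_points)
```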

Three columns (multivariate): time series, numeric, and categorical column.
sns.lineplot(x, y, data, hue='cat_col') – We can split the lines by a categorical variable using hue.
sns.lineplot(
x="year",
y="selling_price",
data=cars,
palette='bright',
hue='fuel');

The results above can be obtained using sns.relplot with kind='line' and the hue parameter.
sns.relplot(x, y, data, kind='line', col='cat_col') – As mentioned earlier, a rel plot’s kind='line' parameter plots a line graph. We will use col='transmission' to create column-wise subplots for the two transmission classes.
sns.relplot(
x="year",
y="selling_price",
data=cars,
color='blue', height=4,
kind='line',
col='transmission');

Four columns: time series, numeric, and two categorical columns.
sns.relplot(x, y, data, kind='line', col='cat_col1', hue='cat_col2')
sns.relplot(
x="year",
y="selling_price",
data=cars,
palette='bright',
height=4,
kind='line',
col='transmission',
hue="fuel");

5. Joint plot
A joint plot comprises three charts in one. The center contains the bivariate relationship between the x and y variables. The top and right-side plots show the univariate distribution of the x-axis and y-axis variables, respectively.
Functions to use:
sns.jointplot() – figure-level plot
Two columns (bivariate): two numeric
[sns.jointplot(x='num_col1', y='num_col2', data=df)](https://seaborn.pydata.org/generated/seaborn.jointplot.html) – By default, the center plot is a scatter plot (kind='scatter'), while the side plots are histograms.
sns.jointplot(
x='max_power_bhp',
y='selling_price',
data=cars);

The joint plots in the image below use different kind parameters ('kde', 'hist', 'hex', or 'reg'), as annotated in each figure.

Three columns (multivariate): two numeric, one categorical
sns.jointplot(x, y, data, hue='cat_col')
sns.jointplot(
x='selling_price',
y='max_power_bhp',
data=cars,
palette='bright',
hue='transmission');

Part two: Exploring the relationships between categorical and numeric columns
In the following charts, the x-axis will hold a categorical variable and the y-axis a numeric variable.
6. Bar plot
The bar chart uses bars of different heights to compare the distribution of a numeric variable between groups of a categorical variable.
By default, bar heights are estimated using the mean. The estimator parameter changes this aggregation function by using Python’s built-in functions such as estimator=max or len, or NumPy functions like np.max and np.median.
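The choice of estimator can change the picture considerably. The pandas-only sketch below computes, for a small made-up table, the numbers that different estimators would turn into bar heights; the prices are invented so that one expensive car skews the Petrol mean.

```python
import pandas as pd

# Made-up prices: one expensive petrol car pulls the Petrol mean upward
demo = pd.DataFrame({
    "fuel": ["Petrol", "Petrol", "Petrol", "Diesel", "Diesel", "Diesel"],
    "selling_price": [200_000, 300_000, 1_000_000, 400_000, 500_000, 600_000],
})

# Each column corresponds to one possible estimator (count plays the role of len)
summary = demo.groupby("fuel")["selling_price"].agg(["mean", "median", "max", "count"])
print(summary)
```

Here both fuel types have the same mean, so a default bar plot would show equal bars, while estimator=np.median would show Petrol well below Diesel.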
Functions to use:
sns.barplot() – axes-level plot
sns.catplot(kind='bar') – figure-level plot
Two columns (bivariate): numeric and categorical
sns.barplot(x='cat_col', y='num_col', data=df)
sns.barplot(
x='fuel',
y='selling_price',
data=cars,
color='blue',
# estimator=sum,
# estimator=np.median,
);

Three columns (multivariate): two categorical and one numeric.
sns.barplot(x, y, data, hue='cat_col2')
sns.barplot(
x='fuel',
y='selling_price',
data=cars,
palette='bright',
hue='transmission');

sns.catplot(x, y, data, kind='bar', hue='cat_col')
A catplot, or categorical plot, uses the kind parameter to specify which categorical plot to draw, with the options being 'strip' (default), 'swarm', 'box', 'violin', 'boxen', 'point', and 'bar'.
The plot below uses catplot to create a similar plot to the one above.
sns.catplot(
x='fuel',
y='selling_price',
data=cars,
palette='bright',
kind='bar',
hue='transmission');

Four columns: three categorical and one numeric
sns.catplot(x, y, data, kind='bar', hue='cat_col2', col='cat_col3') – Use the col_wrap parameter to wrap the column facets after the given number of columns so that the subplots span multiple rows.
g = sns.catplot(
x='fuel',
y='selling_price',
data=cars,
palette='bright',
height=3, aspect=1.3,
kind='bar',
hue='transmission',
col ='seller_type',
col_wrap=2)
g.set_titles(
'Seller: {col_name}');

7. Point plot
Instead of bars like in a bar plot, a point plot draws dots to represent the mean (or another estimate) of each category group. A line then joins the dots, making it easy to compare how the y variable’s central tendency changes for the groups.
Functions to use:
sns.pointplot() – axes-level plot
sns.catplot(kind='point') – figure-level plot
Two columns (bivariate): one categorical and one numeric
sns.pointplot(x='cat_col', y='num_col', data=df)
sns.pointplot(
x='seller_type',
y='mileage_kmpl',
data=cars);

Three columns (multivariate): two categorical and one numeric
When you add a third category using hue, a point plot is more informative than a bar plot because a line is drawn through each "hue" class, making it easy to compare how that class changes across the x variable’s groups.
sns.catplot(x, y, data, kind='point', hue='cat_col2') – Here, catplot is used with kind='point' and hue='cat_col2'. The same results can be obtained using sns.pointplot and the hue parameter.
sns.catplot(
x='transmission',
y='selling_price',
data=cars,
palette='bright',
kind='point',
hue='seller_type');

sns.catplot(x, y, data, kind='point', col='cat_col2', hue='cat_col2') – Here, we use the same categorical feature in the hue and col parameters.
sns.catplot(
x='fuel',
y='year',
data=cars,
ci=None,
height=5, #default
aspect=.8,
kind='point',
hue='owner',
col='owner',
col_wrap=3);

8. Box plot
A box plot visualizes how a numeric variable is distributed within each class of a categorical variable by displaying its quartiles.

From the plots, you can see the minimum value, median, maximum value, and outliers for every category class.
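
The pieces of a box plot can be computed by hand. This sketch assumes the common 1.5 × IQR rule for flagging outliers (the default `whis=1.5` setting in Matplotlib and Seaborn) and uses a small made-up sample.

```python
import numpy as np

# Made-up sample with one obvious outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers stop at the most extreme points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, median, q3, outliers, whisker_low, whisker_high)
```

Under this rule, the "minimum" and "maximum" read off a box plot are the whisker ends, which exclude any points drawn as outliers.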
Functions to use:
sns.boxplot() – axes-level plot
sns.catplot(kind='box') – figure-level plot
Two columns (bivariate): one categorical and one numeric
sns.boxplot(x='cat_col', y='num_col', data=df)
sns.boxplot(
x='owner',
y='engine_cc',
data=cars,
color='blue')
plt.xticks(rotation=45,
ha='right');

Three columns (multivariate): two categorical and one numeric
sns.boxplot(x, y, data, hue='cat_col2') – These results can also be recreated using sns.catplot with kind='box' and hue.
sns.boxplot(
x='fuel',
y='max_power_bhp',
data=cars,
palette='bright',
hue='transmission');

sns.catplot(x, y, data, kind='box', col='cat_col2') – Use the catplot function with kind='box' and provide the col parameter to create subplots.
sns.catplot(
x='fuel',
y='max_power_bhp',
data=cars,
palette='bright',
kind = 'box',
col='transmission');

Four columns: three categorical and one numeric
sns.catplot(x, y, data, kind='box', hue='cat_col2', col='cat_col3')
g = sns.catplot(
x='owner',
y='year',
data=cars,
palette='bright',
height=3, aspect=1.5,
kind='box',
hue='transmission',
col='fuel',
col_wrap=2)
g.set_titles(
'Fuel: {col_name}');
g.set_xticklabels(
rotation=45, ha='right')

9. Violin plot
In addition to the quartiles displayed by a box plot, a violin plot draws a kernel density estimate (KDE) curve that shows how densely the observations are concentrated at different values.
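
As a rough sketch of what a kernel density estimate is, the NumPy snippet below places a Gaussian bump on each observation and averages them. The helper `gaussian_kde_sketch`, the sample values, and the fixed bandwidth of 150 are all assumptions for illustration; Seaborn chooses the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (hypothetical helper)."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Made-up engine sizes: a cluster around 1200 cc and one large engine
samples = np.array([1000.0, 1200.0, 1250.0, 1500.0, 2500.0])
grid = np.linspace(0, 3500, 701)
density = gaussian_kde_sketch(samples, grid, bandwidth=150.0)

# A density integrates to about 1; its peak marks where values are most concentrated
area = density.sum() * (grid[1] - grid[0])
peak = grid[density.argmax()]
print(round(area, 3), peak)
```

The width of a violin at a given height is this density value, mirrored on both sides.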

Functions to use:
sns.violinplot() – axes-level plot
sns.catplot(kind='violin') – figure-level plot
Two columns (bivariate): numeric and categorical.
sns.violinplot(x='cat_col', y='num_col', data=df)
sns.violinplot(
x='transmission',
y='engine_cc',
data=cars,
color='blue');

Three columns (multivariate): two categorical and one numeric.
sns.catplot(x, y, data, kind='violin', hue='cat_col2') – Use the catplot function with kind='violin' and hue='cat_col2'. The same results below can be replicated using sns.violinplot with the hue parameter.
g = sns.catplot(
x='owner',
y='year',
data=cars,
palette='bright',
height=3,
aspect=2,
split=False,
# split=True
kind='violin',
hue='transmission')
g.set_xticklabels(
rotation=45,
ha='right')
The violin plot supports the split parameter, which draws half of the violin plot for each categorical class. Note that this works when the hue variable has only two classes.

Four columns: three categorical and one numeric
sns.catplot(x, y, data, kind='violin', hue='cat_col2', col='cat_col3') – Here, we filter the data for only the 'Diesel' and 'Petrol' fuel types.
my_df = cars[cars['fuel'].isin(['Diesel','Petrol'])]
g = sns.catplot(
x="owner",
y="engine_cc",
data=my_df,
palette='bright',
kind = 'violin',
hue="transmission",
col = 'fuel')
g.set_xticklabels(
rotation=90);

10. Strip plot
A strip plot uses dots to show how a numeric variable is distributed among classes of a categorical variable. Think of it as a scatter plot where one axis is a categorical feature.
Functions to use:
sns.stripplot() – axes-level plot
sns.catplot(kind='strip') – figure-level plot
Two variables (bivariate): one categorical and one numeric
sns.stripplot(x='cat_col', y='num_col', data=df)
plt.figure(
figsize=(12, 6))
sns.stripplot(
x='year',
y='km_driven',
data=cars,
linewidth=.5,
color='blue')
plt.xticks(rotation=90);

Three columns (multivariate): two categorical and one numeric
sns.catplot(x, y, data, kind='strip', hue='cat_col2') – Use the catplot function with kind='strip' (default) and provide the hue parameter. The argument dodge=True (default is dodge=False) can be used to separate the vertical dots by color.
sns.catplot(
x='seats',
y='km_driven',
data=cars,
palette='bright',
height=3,
aspect=2.5,
# dodge=True,
kind='strip',
hue='transmission');

Four columns: three categorical and one numeric
sns.catplot(x, y, data, kind='strip', hue='cat_col2', col='cat_col3')
g = sns.catplot(
x="seller_type",
y="year",
data=cars,
palette='bright',
height=3, aspect=1.6,
kind='strip',
hue='owner',
col='fuel',
col_wrap=2)
g.set_xticklabels(
rotation=45,
ha='right');

Combining strip plot with violin plot
A strip plot can be used together with a violin plot or box plot to show the position of gaps or outliers in the data.
g = sns.catplot(
x='seats',
y='mileage_kmpl',
data=cars,
palette='bright',
aspect=2,
inner=None,
kind='violin')
sns.stripplot(
x='seats',
y='mileage_kmpl',
data=cars,
color='k',
linewidth=0.2,
edgecolor='white',
ax=g.ax);

Additional remarks
- For categorical plots such as bar plots and box plots, the bar direction can be re-oriented to horizontal bars by switching up the x and y variables.
- The row and col parameters of the FacetGrid figure-level objects used together can add another dimension to the subplots. However, col_wrap cannot be used with the row parameter.
- The FacetGrid supports different parameters depending on the underlying plot. For example, sns.catplot(kind='violin') will support the split parameter while other kinds will not. More on the kind-specific options in this documentation.
- Figure-level functions also create bivariate plots. For example, sns.catplot(x='fuel', y='mileage_kmpl', data=cars, kind='bar') creates a basic bar plot.
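The first two remarks can be combined in one sketch: placing the numeric column on x and the categorical column on y gives horizontal boxes, and using row together with col gives a two-dimensional grid of subplots. The data is synthetic stand-in data whose category labels are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data in which every fuel/transmission/seller combination occurs
rng = np.random.default_rng(5)
n = 120
demo = pd.DataFrame({
    "fuel": np.tile(["Petrol", "Diesel"], n // 2),
    "transmission": np.tile(["Manual", "Manual", "Automatic", "Automatic"], n // 4),
    "seller_type": np.tile(
        ["Individual"] * 4 + ["Dealer"] * 4 + ["Trustmark Dealer"] * 4, n // 12),
    "mileage_kmpl": rng.uniform(10, 28, size=n),
})

# Numeric on x and categorical on y -> horizontal boxes;
# row + col -> one subplot per (transmission, seller_type) pair
grid = sns.catplot(
    x="mileage_kmpl", y="fuel", data=demo, kind="box",
    row="transmission", col="seller_type", height=2.5)

print(grid.axes.shape)  # (number of rows, number of columns)
```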
Conclusion
In this article, we performed bivariate and multivariate analyses on a dataset.
We first created matrix plots that visualized relationships in a grid to identify numeric variables with high correlations. We then used different axes-level and figure-level functions to create charts that explored the relationships between the numeric and categorical columns. Find the code here on GitHub.
I hope you enjoyed the article. Thank you for reading!