Data visualization is an integral part of data science. It is useful for exploring and understanding data, and in some cases visualizations convey information much better than plain numbers.
The relationships among variables, the distribution of variables, and underlying structure in data can easily be discovered using data visualization techniques.
In this article, we will go over 5 fundamental data visualization types that are commonly used in data analysis. We will be using the Altair library, which is a statistical visualization library for Python.
I previously wrote similar articles with Seaborn and ggplot2 if you prefer one of those libraries for data visualization tasks. I suggest going over all of them, because comparing different tools and frameworks on the same task will help you learn better.
Let’s first create a sample dataframe to be used for the examples.
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-10', periods=100, freq='D'),
    'cat': pd.Series(['A','B','C']).sample(n=100, replace=True),
    'val': (np.random.randn(100) + 10).round(2),
    'val2': (np.random.random(100) * 10).round(2),
    'val3': np.random.randint(20, 50, size=100)
})
df = df.reset_index(drop=True)
df.head()

The dataframe consists of 100 rows and 5 columns. It contains datetime, categorical, and numerical values.
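A quick way to confirm this description is to inspect the shape and the column types. The sketch below rebuilds an equivalent dataframe. The fixed seed and the use of np.random.choice for the "cat" column are my own assumptions, added for reproducibility, and are not part of the snippet above.

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # assumption: fixed seed so the sample is reproducible

df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-10', periods=100, freq='D'),
    # assumption: np.random.choice stands in for the Series.sample call
    'cat': np.random.choice(['A', 'B', 'C'], size=100),
    'val': (np.random.randn(100) + 10).round(2),
    'val2': (np.random.random(100) * 10).round(2),
    'val3': np.random.randint(20, 50, size=100),
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one datetime column, one categorical column, three numerical columns
```

Drawing the categories with np.random.choice returns a plain array, so no index reset is needed afterwards.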
1. Line plot
Line plots visualize the relationship between two variables. One of them is usually time, so we can see how a variable changes over time. Stock prices and daily temperatures are some use cases where line plots come in handy.
Here is how a simple line plot is created with Altair.
import altair as alt
alt.Chart(df).mark_line().encode(
    x='date', y='val'
)

Let’s elaborate on the syntax. We start by passing the data to a top-level Chart object. The next function, mark_line, specifies the type of plot. The encode function specifies which columns are used in the plot. Thus, anything we write in the encode function must be linked to the dataframe.
There are more functions and parameters that Altair provides to generate more informative or customized plots. We will see them in the following examples.
To make the above line plot look better, we can adjust the value range for the y-axis using the scale property.
alt.Chart(df).mark_line().encode(
    alt.X('date'),
    alt.Y('val', scale=alt.Scale(zero=False))
)

To use the scale property, we specify the column names with X and Y encodings (e.g. alt.X). The zero parameter is set to False so that the axis is not forced to start at zero.
2. Scatter plot
A scatter plot is also a relational plot. It is commonly used to visualize the values of two numerical variables, so we can observe whether there is a correlation between them.
We can create a scatter plot of the "val" and "val2" columns as below.
alt.Chart(df).mark_circle(size=40).encode(
    alt.X('val', scale=alt.Scale(zero=False)),
    alt.Y('val2'),
    alt.Color('cat')
)

We have used the color encoding to separate data points based on the "cat" column. The size parameter of the mark_circle function is used to adjust the size of the points in the scatter plot.
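The visual impression of correlation can be backed up with a number. The sketch below computes the Pearson correlation coefficient with pandas. The seed and the reduced two-column dataframe are assumptions made to keep the example short.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for reproducibility

# Assumption: only the two columns used in the scatter plot
df = pd.DataFrame({
    'val': (np.random.randn(100) + 10).round(2),
    'val2': (np.random.random(100) * 10).round(2),
})

# Pearson correlation coefficient, a value between -1 and 1
r = df['val'].corr(df['val2'])
print(round(r, 3))
```

Since both columns are independent random draws, the coefficient should be close to zero, which matches a scatter plot with no visible trend.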
3. Histogram
A histogram is used to visualize the distribution of a continuous variable. It divides the value range into discrete bins and counts the number of data points in each bin.
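The binning and counting step can be sketched with NumPy alone. The choice of 6 equal-width bins and the seed are assumptions for illustration. A plotting library may pick different bin boundaries.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for reproducibility
val3 = np.random.randint(20, 50, size=100)

# Split the range [20, 50] into 6 equal-width bins and count the values in each
counts, edges = np.histogram(val3, bins=6, range=(20, 50))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left:.0f}-{right:.0f}: {n}')
```

Each printed line corresponds to one bar of a histogram: the bin boundaries give the position on the x-axis and the count gives the bar height.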
Let’s create a histogram of the "val3" column. We use the mark_bar function, but we bin the x encoding and count the rows on the y encoding so that the result is a histogram.
alt.Chart(df).mark_bar().encode(
    alt.X('val3', bin=True),
    alt.Y('count()')
).properties(title='Histogram of val3', height=300, width=450)

We have also used the properties function to customize the size and add a title.
4. Box plot
A box plot provides an overview of the distribution of a variable. It shows how values are spread out by means of quartiles and outliers.
We can create a box plot using the mark_boxplot function of Altair as follows.
alt.Chart(df).mark_boxplot().encode(
    alt.X('cat'),
    alt.Y('val2', scale=alt.Scale(zero=False))
).properties(height=200, width=400)

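In the sample plotted here, the range of values in category A is smaller than in the other two categories. Since the data is generated randomly, your plot may look different. The white line inside each box indicates the median value.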
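The quartiles that a box plot summarizes can also be computed directly with pandas. This is a minimal sketch. The seed, the reduced dataframe, and the column names of the result ("q1", "median", "q3", "iqr") are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for reproducibility

# Assumption: only the two columns used in the box plot
df = pd.DataFrame({
    'cat': np.random.choice(['A', 'B', 'C'], size=100),
    'val2': (np.random.random(100) * 10).round(2),
})

# First quartile, median, and third quartile of val2 within each category
quartiles = df.groupby('cat')['val2'].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ['q1', 'median', 'q3']

# The interquartile range is the height of each box
quartiles['iqr'] = quartiles['q3'] - quartiles['q1']
print(quartiles)
```

The "q1" and "q3" values correspond to the bottom and top edges of each box, and "median" to the line inside it.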
5. Bar plot
A bar plot can be used to visualize a categorical variable. Each category is represented by a bar whose size is proportional to the value for that category.
For instance, we can use a bar plot to visualize the weekly totals of the "val3" column. Let’s first calculate the weekly totals with pandas.
df['week'] = df['date'].dt.isocalendar().week
weekly = df[['week','val3']].groupby('week', as_index=False).sum()
weekly.head()

The first line extracts the week number from the date column. The second line groups the "val3" column by week and calculates the sum.
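It can be worth checking an aggregation like this before plotting it. The sketch below repeats the two steps on a reduced dataframe (the seed and the two-column dataframe are assumptions) and verifies that the weekly totals add up to the overall total.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for reproducibility

# Assumption: only the columns needed for the weekly aggregation
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-10', periods=100, freq='D'),
    'val3': np.random.randint(20, 50, size=100),
})

df['week'] = df['date'].dt.isocalendar().week
weekly = df[['week', 'val3']].groupby('week', as_index=False).sum()

# The weekly totals should add up to the overall total
assert weekly['val3'].sum() == df['val3'].sum()

# The first date (2020-01-10) is in ISO week 2, the last (2020-04-18) in ISO week 16
print(weekly['week'].min(), weekly['week'].max(), len(weekly))
```

Note that ISO week numbers restart every year, so for data that spans more than one year it would be safer to group by both year and week.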
We can now create the bar plot.
alt.Chart(weekly).mark_bar().encode(
    x='val3:Q', y='week:O'
)

Conclusion
We have covered 5 basic yet very functional visualization types. They are all fundamental for exploring a dataset and unveiling the relationships between variables.
It is possible to create more complex, informative, and customized visualizations with Altair. It is also highly efficient and powerful in terms of data transformations and filtering.
If you’d like to learn and practice Altair in more detail, here is a series of 4 articles that I wrote previously.
- Part 1: Introduction to Altair
- Part 2: Filtering and transforming data
- Part 3: Interactive plots and dynamic filtering
- Part 4: Customizing the visualizations
Thank you for reading. Please let me know if you have any feedback.