
Data Visualization Cheat Sheet with Seaborn and Matplotlib

A visualization cheat sheet for anyone who has the memory of a goldfish!

Photo by Soraya Irving on Unsplash

Introduction

Exploratory Data Analysis (EDA) is an indispensable step in data mining. To interpret aspects of a data set such as its distribution or underlying structure, we need to visualize the data in different graphs. Fortunately, Python offers many libraries that make visualization more convenient than ever. Some of the most widely used today are Matplotlib, Seaborn, Plotly, and Bokeh.

Since my job concentrates on scrutinizing data from every angle, I have been exposed to many types of graphs. However, because there are so many functions and the code is not easy to remember, I sometimes forget the syntax and have to search for similar code on the Internet. This has wasted a lot of my time, which is my motivation for writing this article. Hopefully, it can be a small help to anyone who has the memory of a goldfish, like me.

Data Description

My dataset comes from a public Kaggle dataset. It is a grocery dataset, and you can easily get the data from the link below:

Groceries dataset

This grocery data consists of 3 columns, which are:

  • Member_number: customer ID number
  • Date: date of purchase
  • itemDescription: name of the purchased item

Now, let’s have a look at the data frame and its information:
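
A minimal sketch of this step. The file name `Groceries_dataset.csv` is an assumption, and the few rows below are made-up stand-ins so the snippet runs on its own; swap in the real file when you have it.

```python
import io
import pandas as pd

# With the downloaded file you would read it directly, e.g.:
# df = pd.read_csv("Groceries_dataset.csv")  # file name is an assumption

# Made-up stand-in rows with the same three columns
sample_csv = io.StringIO(
    "Member_number,Date,itemDescription\n"
    "1001,05-01-2014,whole milk\n"
    "1002,17-01-2014,rolls/buns\n"
    "1001,03-02-2014,whole milk\n"
    "1003,21-01-2015,soda\n"
    "1002,09-02-2015,yogurt\n"
)
df = pd.read_csv(sample_csv)

print(df.head())  # first rows of the data frame
df.info()         # column names, dtypes and non-null counts
```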

Import necessary packages

There are some packages that we should import first.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline in the notebook
%matplotlib inline

Visualize data

Line Chart

For this section, I will use a line graph to visualize the grocery store's sales over the two years 2014 and 2015.

First, I will transform the data frame a bit to get the items counted by month and year.
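
One way this transformation might look, as a sketch on made-up rows. It assumes the `Date` column holds day-month-year strings; the `year`, `month` and `items_sold` column names are my own choices.

```python
import pandas as pd

# Made-up rows in the same three-column layout as the grocery data
df = pd.DataFrame({
    "Member_number": [1001, 1002, 1001, 1003, 1002],
    "Date": ["05-01-2014", "17-01-2014", "03-02-2014", "21-01-2015", "09-02-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda", "yogurt"],
})

# Assumption: dates are stored as day-month-year strings
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month

# Count the items sold in each (year, month) pair
monthly_sales = (
    df.groupby(["year", "month"])["itemDescription"]
      .count()
      .reset_index(name="items_sold")
)
print(monthly_sales)
```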

After we have our data, let’s try to visualize it:

Bar Chart

A bar chart is used to show how values change over time or to compare figures across categories. Bar charts usually have two axes: one for the categories being analyzed, the other for the values measured for each category.

For this dataset, I will use a bar chart to visualize the 10 best-selling categories in 2014 and 2015. You can display it as either a horizontal or a vertical bar chart. Let’s see how it looks.

Data Transformation
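
A sketch of how the top categories could be counted with `value_counts`. The rows are made up, and with so few distinct items `head(10)` simply returns all of them.

```python
import pandas as pd

# Made-up purchases; only the item column matters for this step
df = pd.DataFrame({
    "itemDescription": ["whole milk", "soda", "whole milk",
                        "yogurt", "soda", "whole milk"],
})

# value_counts sorts in descending order, so head(10) keeps the best sellers
top10 = (
    df["itemDescription"]
      .value_counts()
      .head(10)
      .rename_axis("itemDescription")
      .reset_index(name="count")
)
print(top10)
```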

Horizontal Bar Chart
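
A minimal sketch with `sns.barplot`: putting the numeric column on `x` and the category on `y` gives horizontal bars. The counts are placeholders.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder counts (NOT the real figures)
top10 = pd.DataFrame({
    "itemDescription": ["whole milk", "soda", "yogurt"],
    "count": [3, 2, 1],
})

fig, ax = plt.subplots(figsize=(8, 5))
# Numeric x + categorical y -> horizontal bars
sns.barplot(data=top10, x="count", y="itemDescription",
            color="steelblue", ax=ax)
ax.set_xlabel("Items sold")
ax.set_ylabel("Category")
```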

If you prefer a vertical bar chart, try this:
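
For the vertical version, a sketch that swaps the two axes and rotates the category labels so they stay readable. Again the counts are placeholders.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder counts (NOT the real figures)
top10 = pd.DataFrame({
    "itemDescription": ["whole milk", "soda", "yogurt"],
    "count": [3, 2, 1],
})

fig, ax = plt.subplots(figsize=(8, 5))
# Categorical x + numeric y -> vertical bars
sns.barplot(data=top10, x="itemDescription", y="count",
            color="steelblue", ax=ax)
ax.tick_params(axis="x", rotation=45)  # tilt long category names
ax.set_xlabel("Category")
ax.set_ylabel("Items sold")
```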

Bar Chart with Hue Value

If you want to compare each category’s sales by year, what would your visualization look like? You can add a hue variable to the plot, which splits each bar by a second category (here, the year).
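
A sketch of the `hue` parameter of `sns.barplot` on made-up rows. The year is stored as a string here so that it is treated as a category rather than a number.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up purchases with the year kept as a string (categorical hue)
df = pd.DataFrame({
    "year": ["2014", "2014", "2014", "2015", "2015", "2015"],
    "itemDescription": ["whole milk", "whole milk", "soda",
                        "whole milk", "soda", "soda"],
})

# Items sold per category and year
counts = (
    df.groupby(["itemDescription", "year"])
      .size()
      .reset_index(name="count")
)

fig, ax = plt.subplots(figsize=(8, 5))
# hue="year" draws one bar per year inside each category
sns.barplot(data=counts, x="count", y="itemDescription", hue="year", ax=ax)
ax.set_xlabel("Items sold")
ax.set_ylabel("Category")
```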

Now, can you see it more clearly?

Histogram

Imagine that I want to find out how often customers buy whole milk, the best-selling category. I will use a histogram to obtain this information.

Looking at the visualization, we can see that customers rarely buy this item more than twice, and many customers stop buying it after their first purchase.

Pie chart

Actually, pie charts are quite poor at communicating the data. However, it does not hurt to learn this visualization technique.

For this data, I want to compare the sales of the top 10 categories with the rest in both 2014 and 2015. Now, let’s transform our data to get this information visualized.
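
One possible way to build the "top N versus the rest" table. The helper `top_vs_rest` is hypothetical, it assumes the top categories are ranked over both years combined, and the demo uses `n=2` only because the made-up sample is tiny.

```python
import pandas as pd

def top_vs_rest(df, n=10):
    """Count items per year, split into the n best sellers and everything else."""
    top_items = df["itemDescription"].value_counts().head(n).index
    group = (
        df["itemDescription"].isin(top_items)
          .map({True: f"Top {n}", False: "Rest"})
          .rename("group")
    )
    return df.groupby(["year", group]).size().unstack(fill_value=0)

# Made-up purchases
df = pd.DataFrame({
    "year": [2014] * 5 + [2015] * 5,
    "itemDescription": ["whole milk", "whole milk", "soda", "soda", "yogurt",
                        "whole milk", "whole milk", "soda", "yogurt", "beer"],
})

shares = top_vs_rest(df, n=2)  # use n=10 on the real data
# Convert counts to the percentage of each year's total
shares_pct = shares.div(shares.sum(axis=1), axis=0) * 100
print(shares_pct)
```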

Our data is now ready. Let’s see the pies!
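
A sketch of two side-by-side pies drawn with Matplotlib's `ax.pie`, one per year. The values are placeholders, not the real shares.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder shares (NOT the real figures), one row per year
shares = pd.DataFrame(
    {"Top 10": [60, 55], "Rest": [40, 45]},
    index=[2014, 2015],
)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, year in zip(axes, shares.index):
    # autopct prints each slice's percentage on the pie
    ax.pie(shares.loc[year], labels=shares.columns,
           autopct="%1.1f%%", startangle=90)
    ax.set_title(str(year))
```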

So, the top 10 categories made up a smaller share of purchases in 2015 than in 2014, by 5.5%.

Swarm Plot

Another way to review your data is a swarm plot. In a swarm plot, points are adjusted (only along the categorical axis) so that they do not overlap. It is a good complement to a box plot when you want to display all observations along with some representation of the underlying distribution.

As I want to see the number of items sold on each day of the week, I can use this type of chart to display the information. As usual, let’s first calculate the items sold and group them by category and day.
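
A sketch of this grouping on made-up rows, assuming day-month-year date strings. The `day_of_week` and `items_sold` column names are my own choices.

```python
import pandas as pd

# Made-up purchases
df = pd.DataFrame({
    "Date": ["05-01-2015", "12-01-2015", "06-01-2015", "06-01-2015"],
    "itemDescription": ["whole milk", "whole milk", "soda", "whole milk"],
})

# Assumption: dates are stored as day-month-year strings
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["day_of_week"] = df["Date"].dt.day_name()

# Items sold per category and day of the week
daily_sales = (
    df.groupby(["itemDescription", "day_of_week"])
      .size()
      .reset_index(name="items_sold")
)
print(daily_sales)
```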

After we obtain the data, let’s see what the graph looks like.
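
A minimal `sns.swarmplot` sketch. The values are random placeholders (five hypothetical categories per day), used only to show the shape of the call.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Random placeholder counts (NOT the real figures): 5 points per day
rng = np.random.default_rng(0)
daily_sales = pd.DataFrame({
    "day_of_week": np.repeat(days, 5),
    "items_sold": rng.integers(1, 30, size=len(days) * 5),
})

fig, ax = plt.subplots(figsize=(10, 5))
# order= keeps the days in calendar order instead of order of appearance
sns.swarmplot(data=daily_sales, x="day_of_week", y="items_sold",
              order=days, ax=ax)
ax.set_xlabel("Day of the week")
ax.set_ylabel("Items sold")
```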

Conclusion

In this article, I have shown you how to present your data with different types of visualizations. If you find it helpful, you can save it and review it anytime you want. It can save you tons of time down the road. 😀


