A step-by-step guide to Data Visualizations in Python

Create great-looking, professional visualizations in Python using Matplotlib, Seaborn, and many more packages

Nikhil Adithyan
CodeX


Image by Pixabay on Pexels

Data Visualization

Data visualization is the graphical representation of data. It involves producing visual elements such as charts, dashboards, graphs, and maps that give people an accessible way to understand trends, outliers, and patterns in data. How well a visualization reaches its audience depends on how creatively we present the data and on how clearly the chart communicates what the data means.

Python for Visualization

Python is a highly popular general-purpose programming language, and it is extremely useful for data scientists who want to create beautiful visualizations. It offers a wide range of packages for both data processing and visualization. In this article, we are going to use two of Python’s best-known visualization packages, Matplotlib and Seaborn.

Steps Involved in our Visualization

  1. Importing packages
  2. Importing and Cleaning Data
  3. Creating beautiful Visualizations (12 Types of Visuals)

Step-1: Importing Packages

As with every process in Python, data visualization starts with importing the required packages. Our primary packages are Pandas for data processing, Matplotlib for visuals, Seaborn for advanced visuals, and NumPy for scientific calculations. Let’s import!

Python Implementation:
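
A minimal sketch of this setup step. The aliases are the conventional ones, and the figure size of (20, 10) is an assumed value chosen for illustration, so adjust it to your screen.

```python
import numpy as np               # scientific calculations
import pandas as pd              # data processing
import matplotlib.pyplot as plt  # core plotting
import seaborn as sns            # advanced statistical visuals

# Apply the 'ggplot' look to every chart drawn afterwards.
plt.style.use("ggplot")

# Default chart size in inches (assumed value, not taken from the article).
plt.rcParams["figure.figsize"] = (20, 10)
```

Running `print(plt.style.available)` lists the other style names that your Matplotlib installation provides.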

In the above code, we imported all the primary packages and set our graph style to ‘ggplot’ (a style that emulates R’s ggplot2, whose name stands for grammar of graphics). Apart from ‘ggplot’, you can use many other styles that ship with Matplotlib (see the style sheets reference in the Matplotlib documentation). We will also use the ‘cyberpunk’ style for specific chart types later on. Finally, we set the default size of our charts.

Step-2: Importing and Cleaning Data

This is an important step, because clean data is essential for a good visualization. Throughout this article, we will be using a Kaggle dataset on immigration to Canada from 1980 to 2013. Follow the code to import and clean the data.

Python Implementation:
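
A rough sketch of what the cleaning might look like. The column names (‘OdName’, ‘AreaName’, ‘RegName’, ‘AREA’, ‘REG’, ‘DEV’, ‘Type’, ‘Coverage’) are assumptions about the raw file’s layout, `clean_immigration` is a hypothetical helper, and the tiny table uses made-up numbers in place of the real dataset.

```python
import pandas as pd

def clean_immigration(raw: pd.DataFrame) -> pd.DataFrame:
    """Tidy a raw immigration table (column names are assumptions)."""
    df = raw.copy()
    # Drop administrative columns when they are present.
    df = df.drop(columns=["AREA", "REG", "DEV", "Type", "Coverage"], errors="ignore")
    # Give the descriptive columns friendlier names.
    df = df.rename(columns={"OdName": "Country", "AreaName": "Continent", "RegName": "Region"})
    # Make every column label a string so years can be selected uniformly.
    df.columns = [str(c) for c in df.columns]
    df = df.set_index("Country")
    # Add a per-country total over the year columns only.
    years = [c for c in df.columns if c.isdigit()]
    df["Total"] = df[years].sum(axis=1)
    return df

# Tiny placeholder table (made-up numbers, not real immigration figures).
raw = pd.DataFrame({
    "Type": ["Immigrants", "Immigrants"],
    "OdName": ["Country A", "Country B"],
    "AreaName": ["Asia", "Europe"],
    1980: [10, 5],
    1981: [20, 7],
})
df_can = clean_immigration(raw)
print(df_can)
```

With the real file, you would first load it with a reader such as `pd.read_excel` or `pd.read_csv` (depending on the download format) and pass the result to the helper.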

We have successfully imported and cleaned our dataset. Now we are set to do our visualizations using our cleaned dataset.

Step-3: Creating Beautiful Visualizations

In this step, we are going to create 12 different types of visualizations, from basic charts to advanced ones. Let’s do it!

i) Line Chart

The line chart is the most common of all visualizations, and it is very useful for observing trends and for time series analysis. We will start with a basic single line plot in Python and then move on to a multiple line chart.

Single Line chart Python Implementation:
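
A minimal sketch of a single line chart, assuming a yearly series for one country. The values are placeholders typed in for illustration, not figures from the immigration dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder yearly totals for one hypothetical country (not real figures).
years = list(range(1980, 1990))
series = pd.Series([120, 135, 150, 170, 160, 180, 210, 230, 250, 270], index=years)

fig, ax = plt.subplots(figsize=(10, 5))
# One call to plot() draws one line; markers make each year visible.
ax.plot(series.index, series.values, color="tab:red", marker="o", linewidth=2)
ax.set_title("Immigration from a single country (placeholder data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
```

Call `plt.show()` afterwards to display the figure in a script, or let a notebook render it for you.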

Output:

Image by Author

Multiple Line chart Python Implementation:
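
A sketch of the multiple line version, assuming a dataframe with years as the index and one column per country. The country names and numbers are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: one column per hypothetical country, one row per year.
years = list(range(1980, 1986))
df = pd.DataFrame(
    {
        "Country A": [100, 120, 150, 170, 200, 240],
        "Country B": [80, 90, 85, 110, 130, 125],
        "Country C": [40, 45, 60, 58, 70, 90],
    },
    index=years,
)

fig, ax = plt.subplots(figsize=(10, 5))
# Draw one line per column and label it so the legend can tell them apart.
for country in df.columns:
    ax.plot(df.index, df[country], marker="o", label=country)
ax.set_title("Immigration by country (placeholder data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
ax.legend()
```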

Output:

Image by Author

All of the plots so far use the ‘ggplot’ style. Now let’s try a multiple line chart in the ‘cyberpunk’ style, which suits only specific chart types. To use the ‘cyberpunk’ style in Python, you need to install the ‘mplcyberpunk’ package. After installing it, follow the code to produce a neon-style plot.

Cyberpunk line chart Python Implementation:

Output:

Image by Author

ii) Bar Chart

A bar chart is a representation mainly used for ranking values. It can easily be produced in Python using Matplotlib. We will divide bar charts into vertical, horizontal, and grouped bar charts. There are many other types, but these three are the most widely used. Let’s do it in Python!

Vertical bar chart Python Implementation:
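
A sketch of a vertical bar chart of yearly totals. Building ‘df_tot’ by summing the year columns of the cleaned table is an assumption about how that dataframe is produced, and the small ‘df_can’ below holds placeholder numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder stand-in for the cleaned table: countries as rows, years as columns.
years = ["1980", "1981", "1982", "1983"]
df_can = pd.DataFrame(
    [[10, 12, 15, 18], [5, 6, 4, 7], [20, 25, 30, 28]],
    index=["Country A", "Country B", "Country C"],
    columns=years,
)

# Sum every year column across countries to get one total per year.
df_tot = df_can[years].sum().to_frame(name="total")

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(df_tot.index, df_tot["total"], color="tab:blue")
ax.set_title("Total immigration per year (placeholder data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
```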

Output:

Image by Author

Horizontal bar chart Python Implementation:
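
A sketch of a horizontal bar chart used for ranking. The country totals are placeholders; sorting in ascending order is a design choice that puts the largest bar at the top, because Matplotlib draws the first bar at the bottom.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder totals per hypothetical country (not real figures).
totals = pd.Series({"Country A": 55, "Country B": 22, "Country C": 103, "Country D": 71})

# Ascending sort: the last (largest) value ends up as the top bar.
ranked = totals.sort_values(ascending=True)

fig, ax = plt.subplots(figsize=(10, 5))
ax.barh(ranked.index, ranked.values, color="tab:green")
ax.set_title("Countries ranked by total immigration (placeholder data)")
ax.set_xlabel("Number of immigrants")
```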

Output:

Image by Author

Grouped bar chart Python Implementation:
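
A sketch of a grouped bar chart, assuming two series that share the same categories. The technique shown is to shift each series left or right of the tick position by half a bar width; names and numbers are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: two hypothetical countries compared across three years.
years = ["1980", "1990", "2000"]
df = pd.DataFrame({"Country A": [10, 20, 30], "Country B": [15, 18, 25]}, index=years)

x = np.arange(len(years))  # one slot per year
width = 0.35               # width of a single bar

fig, ax = plt.subplots(figsize=(10, 5))
# Offset the two series around each slot so the bars sit side by side.
ax.bar(x - width / 2, df["Country A"], width, label="Country A")
ax.bar(x + width / 2, df["Country B"], width, label="Country B")
ax.set_xticks(x)
ax.set_xticklabels(years)
ax.set_ylabel("Number of immigrants")
ax.legend()
```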

Output:

Image by Author

iii) Area Chart

Like line charts, area charts are extremely useful for time series analysis. An area chart looks much like a line chart; the only difference is that the space between the line and the axis is filled with colour. This type of representation is further divided into simple, stacked, and unstacked area charts. Let’s dive into the coding section of area charts!

Simple area chart Python Implementation:

For this, we are going to use the ‘df_tot’ dataframe that we created while producing the vertical bar chart.
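
A sketch of a simple area chart. The ‘df_tot’ below is a small placeholder with a single ‘total’ column (an assumed layout), and `fill_between` is one of several ways to shade the region under the line.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder yearly totals (assumed layout: year index, one 'total' column).
df_tot = pd.DataFrame(
    {"total": [35, 43, 49, 53, 60]},
    index=[1980, 1981, 1982, 1983, 1984],
)

fig, ax = plt.subplots(figsize=(10, 5))
# Shade the region between the x-axis and the totals, then outline it.
ax.fill_between(df_tot.index, df_tot["total"], color="tab:blue", alpha=0.4)
ax.plot(df_tot.index, df_tot["total"], color="tab:blue")
ax.set_title("Total immigration per year (placeholder data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
```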

Output:

Image by Author

We can also produce a simple area chart in the ‘cyberpunk’ plot style that we used earlier for the multiple line chart. Now let’s do it for the simple area chart.

Cyberpunk simple area chart Python Implementation:

Output:

Image by Author

Stacked area chart Python Implementation:
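
A sketch of a stacked area chart using Matplotlib’s `stackplot`, which piles each series on top of the previous one. Country names and values are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: one column per hypothetical country, one row per year.
df = pd.DataFrame(
    {
        "Country A": [10, 14, 18, 25],
        "Country B": [8, 9, 12, 11],
        "Country C": [3, 5, 4, 7],
    },
    index=[1980, 1981, 1982, 1983],
)

fig, ax = plt.subplots(figsize=(10, 5))
# stackplot expects one row per series, hence the transpose.
layers = ax.stackplot(df.index, df.T.values, labels=df.columns, alpha=0.7)
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")
ax.legend(loc="upper left")
```

For an unstacked variant, pandas offers `df.plot(kind="area", stacked=False)`, which overlays semi-transparent areas instead of piling them up.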

Output:

Image by Author

Unstacked area chart Python Implementation:

Output:

Image by Author

iv) Box Plot

A box plot is often used in Exploratory Data Analysis to get a statistical view of a given dataframe. It also helps us observe the skewness, distribution, and outliers of the data. We are going to see how to plot vertical and horizontal box plots in Python.

Vertical box plot Python Implementation:
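
A sketch of a vertical box plot for two placeholder samples. The first sample contains one deliberately extreme value (40) so that an outlier marker appears.

```python
import matplotlib.pyplot as plt

# Placeholder samples for two hypothetical countries (not real figures).
data = {
    "Country A": [10, 12, 13, 14, 15, 16, 40],  # 40 lies beyond the upper whisker
    "Country B": [20, 22, 23, 25, 26, 27, 29],
}

fig, ax = plt.subplots(figsize=(8, 6))
# boxplot places the boxes at x = 1, 2, ... and is vertical by default.
result = ax.boxplot(list(data.values()))
ax.set_xticks([1, 2])
ax.set_xticklabels(list(data.keys()))
ax.set_ylabel("Number of immigrants")
```

The returned dictionary exposes the drawn pieces (‘boxes’, ‘medians’, ‘whiskers’, ‘fliers’), which is handy when you want to restyle them. A horizontal plot uses the same call with `vert=False` (recent Matplotlib releases also accept `orientation="horizontal"`).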

Output:

Horizontal box plot Python Implementation:

Output:

Image by Author

v) Scatter Plot

A scatter plot displays the values of, typically, two variables against each other. It is very useful for observing the relationship between the variable on the X axis and the variable on the Y axis. Let’s produce a simple scatter plot using the ‘Iris’ dataset in Python!

Scatter plot Python Implementation:
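
A sketch of a scatter plot. To keep it self-contained, it uses a few hand-typed placeholder rows with Iris-style column names (‘sepal_length’, ‘sepal_width’ are assumed names); swap in the full Iris dataframe to reproduce the real chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows in the shape of the Iris data (column names are assumptions).
iris_like = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
        "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0, 3.0],
    }
)

fig, ax = plt.subplots(figsize=(8, 6))
# Each row becomes one point: x from the first column, y from the second.
points = ax.scatter(
    iris_like["sepal_length"], iris_like["sepal_width"],
    s=60, alpha=0.7, edgecolors="k",
)
ax.set_xlabel("Sepal length")
ax.set_ylabel("Sepal width")
```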

Output:

Image by Author

vi) Histogram

A histogram is a type of chart commonly used for observing the frequency distribution of a given variable. For this chart, we are going to use the same Iris dataset that we used before, along with Seaborn for better quality. Let’s make a histogram in Python!

Histogram Python Implementation:

Output:

Image by Author

vii) Bubble Plot

This type of chart is very similar to a scatter plot, but it represents three dimensions of data, with the third shown as the size of each bubble. For this chart, we are going to generate the values using NumPy’s ‘random’ module and draw the chart with Matplotlib. Let’s do it in Python!

Bubble plot Python Implementation:
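
A sketch of a bubble plot built from random values. The seed, the number of points, and the size scaling are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the chart is repeatable
n = 40

x = rng.random(n)
y = rng.random(n)
sizes = rng.random(n) * 1000  # third dimension, shown as bubble area
colors = rng.random(n)        # optional fourth dimension, shown as colour

fig, ax = plt.subplots(figsize=(8, 6))
bubbles = ax.scatter(x, y, s=sizes, c=colors, cmap="viridis", alpha=0.5)
fig.colorbar(bubbles, ax=ax, label="Colour value")
ax.set_xlabel("X")
ax.set_ylabel("Y")
```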

Output:

Image by Author

viii) Pie Chart

A pie chart is a circular statistical graphic divided into slices that represent numerical proportions of the given data. Using Matplotlib, we can produce beautiful custom pie charts. Let’s produce a pie chart in Python!

Pie chart Python Implementation:
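
A sketch of a pie chart. The three continent shares are placeholder values chosen to add up to 100, and the small `explode` offset on the first slice is purely decorative.

```python
import matplotlib.pyplot as plt

# Placeholder shares per continent (not real figures).
labels = ["Asia", "Europe", "Africa"]
values = [50, 30, 20]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    values,
    labels=labels,
    autopct="%1.1f%%",      # print each slice's percentage inside it
    startangle=90,          # start the first slice at 12 o'clock
    explode=(0.05, 0, 0),   # pull the first slice out slightly
)
ax.axis("equal")  # keep the pie circular
ax.set_title("Share of immigration by continent (placeholder data)")
```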

Output:

Image by Author

ix) Doughnut Chart

A doughnut chart is very similar to a pie chart, but it can plot more than one data series. For our visualization, however, we are going to use only one dataset, the Immigration dataset. Let’s do it in Python!

Doughnut chart Python Implementation:
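
A sketch of a doughnut chart. The technique used here is to give each pie wedge a width smaller than the radius through `wedgeprops`, which leaves a hole in the middle; drawing a white circle over a regular pie is another common approach. The values are placeholders.

```python
import matplotlib.pyplot as plt

# Placeholder shares per continent (not real figures).
labels = ["Asia", "Europe", "Africa"]
values = [50, 30, 20]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    values,
    labels=labels,
    startangle=90,
    # width < 1 turns each wedge into a ring segment, leaving a hole.
    wedgeprops={"width": 0.4, "edgecolor": "white"},
)
ax.axis("equal")  # keep the ring circular
ax.set_title("Share of immigration by continent (placeholder data)")
```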

Output:

Image by Author

x) Regression Plot

Regression plots help data scientists observe patterns in a dataset during Exploratory Data Analysis (EDA) by representing the linear relationship between two variables. They also illustrate the trend between the given ‘X’ and ‘Y’ variables. So, let’s do a strong trend and a weak trend regression plot using Seaborn in Python!

Strong trend regression Python Implementation:
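
A sketch of a regression plot with Seaborn’s `regplot`, assuming a dataframe with a numeric ‘year’ column and a ‘total’ column (both names and all values are placeholders). The correlation coefficient is an extra added here as one rough way to put a number on how strong the linear trend is.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder yearly totals that rise steadily (not real figures).
df_tot = pd.DataFrame(
    {
        "year": list(range(1980, 1990)),
        "total": [100, 112, 119, 135, 140, 155, 161, 178, 186, 200],
    }
)

fig, ax = plt.subplots(figsize=(10, 5))
# regplot draws the points, a fitted straight line, and a confidence band.
sns.regplot(x="year", y="total", data=df_tot, color="tab:blue", ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")

# Pearson correlation: values near +1 or -1 indicate a tight linear trend.
r = np.corrcoef(df_tot["year"], df_tot["total"])[0, 1]
print(f"correlation: {r:.3f}")
```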

Output:

Image by Author

We can observe that total immigration to Canada shows a strong trend, which means the numbers increased year after year. Now, let’s create a weak trend regression plot.

Weak trend regression Python Implementation:

Output:

Image by Author

It is clear that total immigration from Scandinavia (Denmark, Norway, and Sweden) to Canada declined year by year; hence, it followed a weak trend.

xi) Word Cloud

A word cloud is a visual representation of text data that highlights the keywords in it and helps people quickly grasp what the text is about. Unfortunately, Matplotlib doesn’t have a built-in function to create a word cloud, so we are going to use the ‘wordcloud’ package in Python. You will also need a text file of an article or essay to feed into it. Let’s do it!

Word cloud Python Implementation:

Output:

Image by Author

From this word cloud, we can see that the given text file is all about Blockchain and its components such as consensus, PoW (Proof-of-Work), hash, block, and so on. Awesome!

xii) Lollipop Chart

This type of chart is very similar to a bar chart. Lollipop charts help in ranking values and observing trends. Creating a lollipop chart is simple in Matplotlib, so let’s do it!

Lollipop chart Python Implementation:
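
A sketch of a horizontal lollipop chart, built from two pieces: thin lines drawn with `hlines` for the sticks and round markers drawn with `plot` for the heads. The country totals are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder totals per hypothetical country, sorted for ranking.
totals = pd.Series(
    {"Country A": 55, "Country B": 22, "Country C": 103, "Country D": 71}
).sort_values()

positions = list(range(len(totals)))  # one row per country

fig, ax = plt.subplots(figsize=(10, 5))
# Sticks: horizontal lines from zero to each value.
ax.hlines(y=positions, xmin=0, xmax=totals.values, color="tab:gray", linewidth=2)
# Heads: a round marker at the end of each stick.
ax.plot(totals.values, positions, "o", color="tab:red", markersize=10)
ax.set_yticks(positions)
ax.set_yticklabels(totals.index)
ax.set_xlabel("Number of immigrants")
```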

Output:

Image by Author

Final Thoughts!

Finally, we have come to the end, having learned how to create twelve different types of visualizations in Python using packages like Matplotlib and Seaborn. But this isn’t the end. We covered only some of the basic visuals in Python, and there are many more than you might think, such as geospatial visualizations, network graphs, Sankey diagrams, and the list goes on. You can find great resources on the internet, including many free online courses. Beyond learning, practical implementation is what proves your knowledge. So start learning and get your feet wet in the world of Data Science. If you missed any of the coding sections for any of the chart types, don’t worry: I’ve provided the full code for all of the visualizations.

Happy Visualizing!

Full code:


Nikhil Adithyan
CodeX

Founder @BacktestZone (https://www.backtestzone.com/), a no-code backtesting platform | Top Writer | Connect with me on LinkedIn: https://bit.ly/3yNuwCJ