
One of the reasons Python dominates data science is the rich selection of libraries it offers. The active Python community keeps maintaining and improving these libraries, which helps Python stay on top.
Some of the most commonly used Python libraries for data science are Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, and PyTorch. They can be considered the FAANG of the Python library ecosystem.
Just as there are many successful companies other than the FAANG, Python has other libraries that come in handy in particular cases. In this article, I will introduce three of them.
Altair
Altair is a statistical visualization library for Python. It is not as popular as Seaborn or Matplotlib but I suggest you give Altair a chance as well.
What I like best about Altair is its filtering and data transformation operations. It provides many options to manipulate data while creating a visualization. In this sense, Altair can be considered a more complete exploratory data analysis tool.
We can also create interactive visualizations with Altair. Furthermore, it is possible to add selection objects to the visualizations so that what you select on one chart updates another one. Cool feature! 🙂
The following interactive visualization was created with Altair. The one on the right is a histogram that shows the price distribution of the data points selected on the left plot.

I have written a few articles that explain how to use Altair. They constitute a practical Altair tutorial so I suggest you visit them if you’d like to learn more about Altair.
- Part 1: Introduction
- Part 2: Filtering and transforming data
- Part 3: Interactive plots and dynamic filtering
- Part 4: Customizing visualizations
- Part 5: Making interactive visualizations with Altair
Sidetable
Sidetable is an add-on to the Pandas library. It was created by Chris Moffitt.
Pandas has accessors that group certain types of methods. For instance, the methods for manipulating strings are accessed through the str accessor. I mention this because Sidetable can be used as an accessor on data frames, just like the str accessor on series.
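As a quick reminder of what an accessor looks like, here is a small sketch using the built-in str accessor. The `names` series is a made-up example.

```python
import pandas as pd

# Hypothetical series of car names.
names = pd.Series(["ford focus", "bmw x5", "audi a4"])

# String methods are reached through the .str accessor and can be chained:
# split on the space, take the first piece, then upper-case it.
brands = names.str.split(" ").str[0].str.upper()
```

Sidetable follows the same pattern: once it is imported, its functions become available on data frames through an accessor named `stb`.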
It can be installed from the terminal or in a Jupyter notebook.
#from terminal
$ python -m pip install -U sidetable
#jupyter notebook
!pip install sidetable
In order to have fun with Sidetable, we need to import it along with Pandas.
import pandas as pd
import sidetable
What Sidetable does is similar to the value_counts function of Pandas, but it provides much more insight.
When applied to a categorical variable, the value_counts function gives us the number of observations or the percent share for each category. Sidetable, on the other hand, not only gives the number of observations and the percent share together, it also provides cumulative values.
Let’s do a simple example to demonstrate the difference. Suppose we have the following data frame.

We can find out the number of cars for each brand as follows.

Sidetable returns a more informative table.

This is a sample data frame with only 25 rows. When you work with larger data frames, as you will in real life, Sidetable will be much more practical and functional.
In addition to the freq function, Sidetable has counts, missing, and subtotal functions, which are quite useful as well.
If you’d like to learn more about Sidetable, here are two articles with several examples.
Missingno
Missingno, as its name suggests, is a library that helps with handling the missing values in a data frame.
Pandas has functions to find the number of missing values or to replace them with appropriate values. What Missingno does is create visualizations that provide an overview of the distribution of the missing values.
It is definitely more informative than just knowing the number of missing values. It is an important insight for handling the missing values as well.
For instance, if most of the missing values are in the same rows, we can choose to drop those rows. However, if the missing values in different columns happen to be in different rows, we should probably find a better approach.
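The same reasoning can be checked numerically with plain Pandas. The sketch below uses a made-up `cars` data frame in which all missing values sit in the same two rows, so dropping those rows loses little.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1 and 3 are missing in every column.
cars = pd.DataFrame({
    "price": [100, np.nan, 300, np.nan, 500],
    "mileage": [10, np.nan, 30, np.nan, 50],
    "color": ["red", None, "blue", None, "black"],
})

# Number of missing values in each row.
missing_per_row = cars.isna().sum(axis=1)

# How many rows are affected versus how many values are missing overall.
rows_with_missing = int((missing_per_row > 0).sum())
total_missing = int(cars.isna().sum().sum())

# Few affected rows relative to the missing count suggests dropping is cheap.
cleaned = cars.dropna()
```

Here six missing values are concentrated in two rows, so `dropna` keeps three complete rows. If the same six values were spread over different rows, dropping would discard most of the data.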
Missingno makes it easier for us to explore the distribution of missing values.
Conclusion
Python is a prominent language in data science for a reason: there are numerous libraries that make your life easier. Many thanks to the great Python community for creating such outstanding libraries.
Thank you for reading. Please let me know if you have any feedback.