Pandas is a very powerful and versatile Python data analysis library that expedites the preprocessing steps of Data Science projects. It provides numerous functions and methods that are quite useful in data analysis.
Although the built-in functions of Pandas are capable of performing efficient data analysis, custom-made functions and libraries add further value to Pandas.
Sidetable is one of these add-ons. It makes it easier to create summaries of dataframes and can be thought of as a combination of the value_counts and crosstab functions.
In some cases, sidetable can stand in for the groupby function. It can also be combined with the groupby function to produce more informative results.
Sidetable was created by Chris Moffitt. It has been quite useful for me in my daily analyses. In this post, I will walk you through examples to show how to make the best use of sidetable.
Sidetable is used through the stb accessor on dataframes, just like the dt and str accessors. Installation is straightforward.
$ python -m pip install -U sidetable #from terminal
!pip install sidetable #jupyter notebook
We can import it along with Pandas and start using it.
import pandas as pd
import sidetable
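I will be using the direct marketing and US elections datasets for the examples. Both datasets are available on Kaggle. In the code below, they are assumed to be loaded into dataframes named marketing and elections.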


Sidetable provides functions that are used with the stb accessor. The functions we will cover are:
- Freq function
- Counts function
- Missing function
- Subtotal function
Freq function
Freq function returns a dataframe that conveys 3 pieces of information.
- The number of observations (i.e. rows) for each category (value_counts()).
- The percentage of each category in the entire column (value_counts(normalize=True)).
- The cumulative versions of the two above.
Here is an example.
marketing.stb.freq(['Age'])

The "Age" column has three categories (Middle, Young, Old). For each category, we see the number of rows and the percentage. The cumulative columns hold running totals of these values down to the current row. For instance, the second row of the cumulative columns shows the combined count and percentage of the Middle and Young categories.
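To make the four columns concrete, here is a minimal Pandas-only sketch that rebuilds a freq-style table by hand. The toy dataframe and its values are invented for illustration, and the column names are only my choice to mirror the table above, so treat it as an approximation rather than what sidetable does internally.

```python
import pandas as pd

# Hypothetical stand-in for the marketing data (values invented for illustration).
toy = pd.DataFrame({"Age": ["Middle", "Young", "Middle", "Old", "Young", "Middle"]})

# Rows per category, largest first, and the same thing as a percentage.
counts = toy["Age"].value_counts()
percent = toy["Age"].value_counts(normalize=True) * 100

# Running totals give the cumulative columns.
freq_like = pd.DataFrame({
    "count": counts,
    "percent": percent,
    "cumulative_count": counts.cumsum(),
    "cumulative_percent": percent.cumsum(),
}).rename_axis("Age").reset_index()

print(freq_like)
```

The last row of the cumulative columns always covers the whole column, which is a handy sanity check.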
The freq function counts the number of rows by default. If we pass another column using the value parameter, it will return the sum of values in that column. Let’s do an example.
marketing.stb.freq(['Age'], value='AmountSpent')

As you can see, the name of the column changed from "count" to the name of the column passed to the value parameter. What we see in the returned table is the sum of the "AmountSpent" column for each category. The other columns contain the data (percentage, cumulative) based on the values in the "AmountSpent" column.
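A rough Pandas-only equivalent of the value parameter is a grouped sum followed by the same percentage and cumulative steps. The toy data below is invented, and the derived column names are assumptions for illustration, not necessarily the names sidetable produces.

```python
import pandas as pd

# Hypothetical stand-in for the marketing data (values invented for illustration).
toy = pd.DataFrame({
    "Age": ["Middle", "Young", "Middle", "Old", "Young"],
    "AmountSpent": [500, 100, 300, 200, 150],
})

# Sum the value column per category and sort from largest to smallest.
totals = toy.groupby("Age")["AmountSpent"].sum().sort_values(ascending=False)

summary = totals.to_frame()
summary["percent"] = totals / totals.sum() * 100
summary["cumulative_AmountSpent"] = totals.cumsum()
summary["cumulative_percent"] = summary["percent"].cumsum()
summary = summary.reset_index()

print(summary)
```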
The freq function can also take multiple columns as an argument. This is similar to using the groupby function with the count method.
marketing.stb.freq(['Age','Gender'])

We have 6 categories, which are the combinations of the categories in the "Age" and "Gender" columns. Another useful feature of sidetable is that the values are sorted by default.
We can achieve the same result (except for the cumulative part) with the groupby function.
# Parentheses let the method chain span several lines
(marketing[['Age','Gender','Salary']]
 .groupby(['Age','Gender'], as_index=False)
 .count().sort_values(by='Salary', ascending=False)
 .rename(columns={'Salary':'count'}))

It is clear that sidetable provides a much simpler syntax.
One advantage of having cumulative values is that we can display only the larger categories.
Let’s do an example on the elections dataset. We want to see the total number of votes in the states that make up the first 40% of all votes.
elections.stb.freq(['state'], value='total_votes', thresh=40)

The states are sorted by the total number of votes. Once the cumulative percentage reaches 40, the remaining states are collapsed into a single row labelled "others". We can change the label by using the other_label parameter.
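The idea behind the threshold can be sketched with plain Pandas: sort, take the running percentage, and fold everything past the cutoff into one row. The data is invented, and the cutoff rule I use here (keep rows whose cumulative percentage is at or below the threshold) is my assumption; sidetable may treat the boundary differently.

```python
import pandas as pd

# Hypothetical stand-in for the elections data (values invented for illustration).
toy = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "total_votes": [400, 300, 150, 100, 50],
})
thresh = 60  # cumulative percentage cutoff

totals = toy.groupby("state")["total_votes"].sum().sort_values(ascending=False)
cum_pct = totals.cumsum() / totals.sum() * 100

# Assumed rule: keep rows up to the cutoff, sum the rest into "others".
keep = cum_pct <= thresh
others = pd.Series({"others": totals[~keep].sum()})
collapsed = (
    pd.concat([totals[keep], others])
    .rename_axis("state")
    .reset_index(name="total_votes")
)

print(collapsed)
```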
Counts function
Another highly useful function of sidetable is the counts function. It returns the number of unique values in each column along with some other measures.
- The number of non-missing values in each column
- The number of unique categories in each column
- The most and least frequent categories in each column
- The number of values that belong to the most and least frequent categories
Let’s apply it on the marketing dataframe.
marketing.stb.counts()

It is quite an informative table. We can see the number of unique values along with the most and least frequent categories.
As you can see, the table includes all the features. We can select a specific data type using the exclude or include parameters. For instance, the following syntax will exclude the numeric columns.
marketing.stb.counts(exclude='number')
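For comparison, the measures listed above can be collected with plain Pandas in a short loop. This is only a sketch on invented data, and the column names are my own labels for the measures, not a claim about sidetable's exact output.

```python
import pandas as pd

# Hypothetical stand-in for the marketing data (values invented for illustration).
toy = pd.DataFrame({
    "Age": ["Middle", "Young", "Middle", "Old", "Middle", "Young"],
    "Gender": ["F", "M", "F", "F", "M", "F"],
})

rows = {}
for col in toy.columns:
    vc = toy[col].value_counts()  # sorted descending, missing values excluded
    rows[col] = {
        "count": toy[col].count(),      # non-missing values
        "unique": toy[col].nunique(),   # distinct categories
        "most_freq": vc.index[0],
        "most_freq_count": vc.iloc[0],
        "least_freq": vc.index[-1],
        "least_freq_count": vc.iloc[-1],
    }

counts_like = pd.DataFrame.from_dict(rows, orient="index")
print(counts_like)
```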
Missing function
The missing function is pretty simple. It returns the count and percentage of missing values in each column.
marketing.stb.missing()

This dataframe does not have many missing values. However, it comes in handy when we work with dataframes that contain missing values in most columns.
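The same summary is easy to approximate with isna, which also shows what the function is measuring. The dataframe below is invented for illustration, and the column names are my own choice.

```python
import pandas as pd

# Hypothetical dataframe with a few missing values (invented for illustration).
toy = pd.DataFrame({
    "Age": ["Middle", None, "Old", "Young"],
    "Salary": [50000.0, None, None, 42000.0],
    "Gender": ["F", "M", "F", "M"],
})

missing = toy.isna().sum()  # missing values per column

missing_like = pd.DataFrame({
    "missing": missing,
    "total": len(toy),
    "percent": missing / len(toy) * 100,
}).sort_values("missing", ascending=False)

print(missing_like)
```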
Subtotal function
The subtotal function is best used with the groupby function of Pandas. It adds a subtotal for levels of the grouping.
Let’s first do a groupby example without the subtotal function of sidetable.
(marketing[['Age','OwnHome','AmountSpent']]
 .groupby(['Age','OwnHome']).sum())

We have 2 levels and 6 categories as the result of grouping. The levels are the "Age" and "OwnHome" columns. For each category, the sum of the "AmountSpent" column is shown. In some cases, it would be better to also see the subtotal for each level.
Adding subtotals for the levels is pretty simple with sidetable.
(marketing[['Age','OwnHome','AmountSpent']]
 .groupby(['Age','OwnHome']).sum()
 .stb.subtotal())

In addition to the subtotals, we also see the grand total for the aggregated columns.
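To see how much work this saves, here is a hand-rolled Pandas version of the same idea: one subtotal row per "Age" group plus a grand total at the end. The data is invented, and the row labels are placeholders I picked for the sketch, so do not read them as sidetable's exact output.

```python
import pandas as pd

# Hypothetical stand-in for the marketing data (values invented for illustration).
toy = pd.DataFrame({
    "Age": ["Middle", "Middle", "Old", "Old", "Young", "Young"],
    "OwnHome": ["Own", "Rent", "Own", "Rent", "Own", "Rent"],
    "AmountSpent": [500, 300, 400, 100, 200, 250],
})

grouped = toy.groupby(["Age", "OwnHome"])[["AmountSpent"]].sum()
names = ["Age", "OwnHome"]

def total_row(label_pair, value):
    # One-row frame carrying the same two index levels as the grouped table.
    index = pd.MultiIndex.from_tuples([label_pair], names=names)
    return pd.DataFrame({"AmountSpent": [value]}, index=index)

pieces = []
for age, block in grouped.groupby(level="Age"):
    pieces.append(block)
    pieces.append(total_row((age, f"{age} - subtotal"), block["AmountSpent"].sum()))
pieces.append(total_row(("grand_total", ""), grouped["AmountSpent"].sum()))

with_totals = pd.concat(pieces)
print(with_totals)
```

Compared with this, the single .stb.subtotal() call is clearly the more convenient option.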
If we have more than two levels, the subtotals will be added to each level except for the last one. However, this can be changed using the sub_level parameter.
Let’s assume we have 3 levels (Age, OwnHome, Gender) in the groupby function:
- sub_level = 1 : Subtotals for categories in Age column are shown
- sub_level = 2 : Subtotals for categories in OwnHome column are shown
- sub_level = [1,2] : All subtotals are shown.
Conclusion
Sidetable is a great tool for creating summary tables, which are quite useful in exploratory data analysis. We can also use them to deliver analysis results.
What sidetable offers can also be created using Pandas' own functions and methods. However, the syntax and simplicity of sidetable make it the first choice for me in many cases.
Thank you for reading. Please let me know if you have feedback.