
How to Group-By Pandas DataFrames to Compute the Mean

Computing the mean in pandas using Group By expressions

Photo by Diana Polekhina on Unsplash

Introduction

When working with pandas DataFrames, we often need to compute metrics for specific groups of rows. The typical way to achieve this is a group-by expression followed by the relevant calculation.

In today’s short tutorial, we will show how to perform group-by operations over pandas DataFrames in order to compute the mean (aka average) and median values per group.


First, let’s create an example DataFrame that we will use throughout this article to demonstrate a few concepts and the steps needed to derive the target result.

import pandas as pd 
df = pd.DataFrame(
    [ 
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE']
)
print(df)
   colA colB  colC  colD   colE
0     1    B   121  10.1   True
1     2    C   145   5.5  False
2     3    A   345   4.5  False
3     4    A   112   3.0   True
4     5    C   105   2.1  False
5     6    A   435   7.8   True
6     7    B   521   9.1   True
7     8    B   322   8.7   True
8     9    C   213   5.8   True
9    10    B   718   9.1  False
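To make the "group" part of a group-by concrete, here is a minimal sketch that iterates over the GroupBy object returned by df.groupby('colB'). Each iteration yields a group key and the sub-DataFrame of rows sharing that key. The groups dictionary is just an illustrative helper, not part of the pandas API.

```python
import pandas as pd

# The example DataFrame used throughout this article
df = pd.DataFrame(
    [
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE'],
)

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs;
# keys are sorted by default and rows keep their original order
groups = {}
for key, group in df.groupby('colB'):
    groups[key] = group['colC'].tolist()
    print(key, groups[key])
```

The colC values collected for each key are exactly what the aggregation methods in the following sections operate on.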

Using the mean() method

The first option is to group the DataFrame by the column of interest, then select the column over which we want to perform the calculation, and finally call the mean() method.

Now let’s suppose that for each value appearing in column colB, we want to compute the mean value for column colC. The following expression will do the trick for us.

>>> df.groupby('colB')['colC'].mean()
colB
A    297.333333
B    420.500000
C    154.333333
Name: colC, dtype: float64

The result is a pandas Series, indexed by the distinct values of colB, that holds the mean of colC for each group.
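If you would rather get a DataFrame with colB as a regular column (for example, to merge or export the result), the sketch below shows two common options: passing as_index=False to groupby(), or calling reset_index() on the resulting Series. The variable names means and means_alt are illustrative.

```python
import pandas as pd

# The example DataFrame used throughout this article
df = pd.DataFrame(
    [
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE'],
)

# Option 1: keep the group keys as a column instead of the index
means = df.groupby('colB', as_index=False)['colC'].mean()

# Option 2: move the index of the resulting Series back into a column
means_alt = df.groupby('colB')['colC'].mean().reset_index()

print(means)
print(means_alt)
```

Both options yield a DataFrame with the columns colB and colC and a default integer index.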


Using the agg() method

Alternatively, we can use the agg() method, which applies the specified aggregation (in our case, the mean) to each group.

>>> df.groupby('colB')['colC'].agg('mean')
colB
A    297.333333
B    420.500000
C    154.333333
Name: colC, dtype: float64

The result is exactly the same as with the previous approach.
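Where agg() becomes more useful than mean() is when you need more than one statistic. As a sketch, agg() also accepts a list of function names, producing one column per statistic. With named aggregation (keyword arguments of the form new_name=(column, function)), it can aggregate several columns at once. The output names colC_mean and colD_mean below are arbitrary, chosen only for illustration.

```python
import pandas as pd

# The example DataFrame used throughout this article
df = pd.DataFrame(
    [
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE'],
)

# A list of function names gives one column per statistic
stats = df.groupby('colB')['colC'].agg(['mean', 'median'])
print(stats)

# Named aggregation: new_column=(source_column, function)
summary = df.groupby('colB').agg(
    colC_mean=('colC', 'mean'),
    colD_mean=('colD', 'mean'),
)
print(summary)
```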


Computing the median

The same strategy can be used to compute other per-group metrics, such as the median, count or sum.

In the following example, we use the approach from the first part of this tutorial to compute the median of colC for each of the values appearing in column colB.

>>> df.groupby('colB')['colC'].median()
colB
A    345.0
B    421.5
C    145.0
Name: colC, dtype: float64
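Note that group B has an even number of rows (four), so its median, 421.5, is not one of its values: it is the average of the two middle values once they are sorted (322 and 521). The sketch below reproduces the per-group medians by hand; manual_median is a hypothetical helper written for illustration, not a pandas function.

```python
import pandas as pd

# The example DataFrame used throughout this article
df = pd.DataFrame(
    [
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE'],
)


def manual_median(values):
    """Hypothetical helper: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        # Even count: average the two middle values
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Odd count: take the middle value
    return ordered[mid]


manual = {key: manual_median(g['colC'].tolist()) for key, g in df.groupby('colB')}
print(manual)
print(df.groupby('colB')['colC'].median().to_dict())
```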

Similarly, count() and sum() return, respectively, the number of non-null values and the total of colC in each group.

The same applies to the second approach, which uses the agg() method:
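For example, the sketch below computes the count and the sum of colC per group, then checks that dividing one by the other reproduces the mean() result. This is a handy sanity check when you are unsure what an aggregation is doing.

```python
import pandas as pd

# The example DataFrame used throughout this article
df = pd.DataFrame(
    [
        (1, 'B', 121, 10.1, True),
        (2, 'C', 145, 5.5, False),
        (3, 'A', 345, 4.5, False),
        (4, 'A', 112, 3.0, True),
        (5, 'C', 105, 2.1, False),
        (6, 'A', 435, 7.8, True),
        (7, 'B', 521, 9.1, True),
        (8, 'B', 322, 8.7, True),
        (9, 'C', 213, 5.8, True),
        (10, 'B', 718, 9.1, False),
    ],
    columns=['colA', 'colB', 'colC', 'colD', 'colE'],
)

grouped = df.groupby('colB')['colC']

counts = grouped.count()  # number of non-null colC values per group
sums = grouped.sum()      # total of colC per group
print(counts)
print(sums)

# The per-group mean is the per-group sum divided by the per-group count
manual_mean = sums / counts
pd.testing.assert_series_equal(manual_mean, grouped.mean())
```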

>>> df.groupby('colB')['colC'].agg('median')
colB
A    345.0
B    421.5
C    145.0
Name: colC, dtype: float64

Final Thoughts

In today’s article we discussed one of the most commonly performed operations in pandas: grouping the rows of a DataFrame by the values of a column.

We then showed how to compute useful metrics such as the mean and median for each group. These are just two examples; the same approach can be used to compute counts, sums and other aggregations.
