3 Pandas Functions You Should Be Using More Often

Stop reinventing the wheel. Seriously.

There’s no denying that Pandas is a powerful library. If you are a daily Python user, it might just be your favorite library for data analysis and some quick visualizations.

Photo by Stan W. on Unsplash

With that being said, it’s very easy to get used to some core concepts of Pandas and to neglect further exploration of its capabilities. I mean, who hasn’t been guilty of reinventing the wheel? While it’s not necessarily a bad thing (because you know exactly what your code should do), often there are smarter ways to achieve the same.

Today I want to share 3 functions that I haven’t been using as much as I should, and to show you exactly what I mean by reinventing the wheel. A couple of weeks back I wrote a similar article about pure Python. If you find yourself reinventing the wheel a lot, don’t hesitate to check it out:

3 Advanced Python Functions for Data Scientists

Okay, enough talk, let’s dive into some Pandas.


The Dataset Used

As with many of my articles, I’ll be using the famous Titanic dataset. I’m sure you know how to import it, so here’s what you should have before proceeding:
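A minimal sketch of the setup, assuming the common Kaggle version of the dataset with Name, Sex, Pclass, Age and Survived columns. The file path in the comment and the six rows below are hypothetical stand-ins, not real passenger records:

```python
import pandas as pd

# With the real data you would load the CSV instead, for example:
# df = pd.read_csv("titanic.csv")  # hypothetical local path

# Tiny made-up stand-in so the example runs on its own
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C",
             "Passenger D", "Passenger E", "Passenger F"],
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Pclass": [1, 3, 3, 2, 3, 1],
    "Age": [38.0, 22.0, 4.0, 61.0, 27.0, 45.0],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Quick look at the first rows
print(df.head())
```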

Are we on the same page? Good, let’s proceed.


1. idxmin() and idxmax()

In a nutshell, those functions return the index label of the row that holds the minimum or maximum value (with the default integer index, that is the same as the row position). In the Titanic example, maybe you want to find the index of the youngest or oldest person. Let’s go even further and say you’re interested only in the name of that person.

A typical programmer approach would be to reinvent the wheel because you aren’t aware that idxmin() and idxmax() exist. Here’s one approach:
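A minimal sketch of that manual approach, assuming Name and Age columns and a tiny made-up stand-in for the data (the names and ages are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Age": [38.0, 22.0, 4.0, 61.0],
})

# Filter the rows whose Age equals the min/max, then pull out the name
youngest = df[df["Age"] == df["Age"].min()]["Name"].iloc[0]
oldest = df[df["Age"] == df["Age"].max()]["Name"].iloc[0]

print(youngest, oldest)
```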

You’re basically subsetting and returning the value for the desired attribute. Not the cleanest of code, but could be worse. Here’s how to achieve the same by utilizing previously mentioned functions:
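Here is a sketch of the shorter version, again on a hypothetical stand-in with Name and Age columns. idxmin() and idxmax() return the index label, which you can hand straight to .loc:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Age": [38.0, 22.0, 4.0, 61.0],
})

# Look up the Name at the index label of the smallest/largest Age
youngest = df.loc[df["Age"].idxmin(), "Name"]
oldest = df.loc[df["Age"].idxmax(), "Name"]

print(youngest, oldest)
```

Both functions skip missing values by default, which is handy because the real Age column contains some.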

Yes, the results are identical, obviously, but the code is a bit shorter and much cleaner. Let’s proceed to the next one.


2. cut()

In short, use cut() when you want to bin values into discrete intervals. For example, the Titanic dataset has an Age attribute, which is continuous. In your analysis, you may want to calculate the ratio of Survived/Died for an age group rather than for each individual age.

I won’t even try to reinvent the wheel with this function; let’s get straight to the point. The idea is to bin the Age column into 5 buckets:
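A minimal sketch, assuming an Age column like the one in the dataset (the six ages here are made up). Passing an integer to bins asks for that many equal-width intervals:

```python
import pandas as pd

# Hypothetical ages standing in for the Titanic Age column
age = pd.Series([38.0, 22.0, 4.0, 61.0, 27.0, 45.0], name="Age")

# Split the observed age range into 5 equal-width intervals
age_bins = pd.cut(age, bins=5)

print(age_bins)
```

The result is a categorical Series whose values are the intervals each age falls into.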

Not a fan of those stock labels? Not a problem:
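A sketch of custom labels on the same made-up ages. The label names are my own choice for illustration; you need exactly one label per bin:

```python
import pandas as pd

# Hypothetical ages standing in for the Titanic Age column
age = pd.Series([38.0, 22.0, 4.0, 61.0, 27.0, 45.0], name="Age")

# One label per bin, ordered from the lowest interval to the highest
labels = ["Child", "Young adult", "Adult", "Middle-aged", "Senior"]
age_groups = pd.cut(age, bins=5, labels=labels)

print(age_groups)
```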

Or you don’t care about the labels whatsoever and want a plain integer representation:
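A sketch of the integer version on the same made-up ages. With labels=False, cut() returns the zero-based number of the bin instead of an interval:

```python
import pandas as pd

# Hypothetical ages standing in for the Titanic Age column
age = pd.Series([38.0, 22.0, 4.0, 61.0, 27.0, 45.0], name="Age")

# labels=False returns the bin number (0 to 4) for each value
age_codes = pd.cut(age, bins=5, labels=False)

print(age_codes.tolist())  # [2, 1, 0, 4, 2, 3]
```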

Whichever you choose, you can assign the whole expression to a new column in the DataFrame and continue with your analysis.


3. pivot_table()

pivot_table() creates a spreadsheet-style pivot table as a DataFrame. If you’ve used Excel before, you’ve almost certainly used pivot tables. Needless to say, they can be implemented in Python without much effort.

Let’s say you want to find out the average survival rate among males in the third class. You could use plain Pandas and its filtering capabilities to obtain this information:
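A minimal sketch of the filtering approach, assuming Sex, Pclass and Survived columns and a made-up stand-in for the data (so the number it prints says nothing about the real Titanic):

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Pclass": [1, 3, 3, 2, 3, 1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Keep only third-class males, then average the 0/1 Survived flag
mask = (df["Sex"] == "male") & (df["Pclass"] == 3)
rate = df[mask]["Survived"].mean()

print(rate)  # 0.5 for this toy data
```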

But what if you want to find the average survival rate for both males and females? In all 3 passenger classes? Pivot tables to the rescue. You can simply set the index to Sex, columns to Pclass (passenger class) and values to Survived, and then use any aggregate function you want, but let’s stick to the mean:
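A sketch of that pivot table on the same kind of made-up stand-in, with the column names assumed as before:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Pclass": [1, 3, 3, 2, 3, 1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Rows: Sex, columns: Pclass, cell value: mean of Survived
table = df.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
)

print(table)
```

Combinations that have no rows (second-class males in this toy data) show up as NaN.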

Well, that was easy. One more thing to note: you can also add an extra row and column of grand totals:
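A sketch using the margins parameter, which I'm assuming is what produces the totals here. It adds an All row and an All column, computed with the same aggregate function (the data is again a made-up stand-in):

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Pclass": [1, 3, 3, 2, 3, 1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# margins=True appends an "All" row and column with the overall means
table = df.pivot_table(
    index="Sex", columns="Pclass", values="Survived",
    aggfunc="mean", margins=True,
)

print(table)
```

If you prefer a different name for the totals, pivot_table() also accepts a margins_name argument.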


Before you go

Those three functions should set you on the right track. The ultimate goal is to spend more time thinking about the best thing to do and less time on the implementation itself.

Whenever you feel your code is longer than it should be, use Google to see if there’s a shorter way. Most likely a function already exists for what you need, and you just weren’t aware of it.

Thanks for reading. Feel free to share your thoughts and comments below.


Join Medium with my referral link – Dario Radečić

