
5 Must-Know Pandas Functions for Data Science

Pandas is used in most data science projects

Every data science project starts with data analysis, and when it comes to data analysis, pandas is the most valuable player. Pandas is a Python library whose name is derived from “panel data”.

In this article, I will share some handy pandas functions that you must know. They cover common operations on a dataset: counting non-null values, finding the rows with extreme values, binning, pivoting, and ranking.


The dataset I will use in this article is Kaggle’s house price prediction data. You can download it from here.

Let’s have a look at our data first.

import pandas as pd
df = pd.read_csv("House data.csv")

This is what our data looks like. Since this is house price prediction data, it includes bedrooms, bathrooms, floors, and other features that can help us estimate the price of a house with given specifications.
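If you want to follow along without downloading the file, here is a minimal sketch that builds a small stand-in DataFrame and inspects it with head() and shape. The values are made up for illustration (they are not rows from the Kaggle data), and only a few of the columns are assumed:

```python
import pandas as pd

# Hypothetical stand-in for "House data.csv": made-up values, a subset of columns
df = pd.DataFrame({
    "price": [313000.0, 2384000.0, 342000.0, 420000.0, 0.0],
    "bedrooms": [3, 5, 3, 3, 3],
    "bathrooms": [1.5, 2.5, 2.0, 2.25, 2.0],
    "floors": [1.5, 2.0, 1.0, 1.0, 1.0],
    "city": ["Shoreline", "Seattle", "Kent", "Bellevue", "Federal Way"],
})

# First rows of the table, the usual first look after read_csv()
print(df.head())

# (number of rows, number of columns)
print(df.shape)
```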

Let’s now apply some pandas functions to this data.

1. count() Function

Let’s say you want to quickly check whether there are any null values in your table. The count() function returns, for each column, the number of cells that are not null, so any column whose count is lower than the number of rows contains missing values.

df.count()

Great news: every column has the same count, so there are no null values in our dataset. Let’s set one value to null and see how the result changes.

import numpy as np
df.at[0, 'price'] = np.nan  # make the first house's price missing

Now, if I check the count again, the price column shows one fewer non-null value than the other columns.
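To see this end to end, here is a small sketch on a made-up DataFrame (not the Kaggle data). It repeats the NaN assignment and compares count() with the number of rows; isna().sum() is shown as an equivalent way to count the missing cells directly:

```python
import numpy as np
import pandas as pd

# Small made-up table standing in for the house data
df = pd.DataFrame({
    "price": [313000.0, 2384000.0, 342000.0],
    "bedrooms": [3, 5, 3],
    "city": ["Shoreline", "Seattle", "Kent"],
})

print(df.count())  # every column has 3 non-null cells

# Set one price to NaN, as in the article
df.at[0, "price"] = np.nan

counts = df.count()  # non-null cells per column
print(counts)        # price now has 2, the other columns still 3

# Columns with missing values: their count is lower than the number of rows
missing = len(df) - counts
print(missing[missing > 0])  # only 'price' is listed

# Equivalent: count the missing cells per column directly
print(df.isna().sum())
```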

2. idxmin() and idxmax() Functions

These functions return the index label of the row that holds the minimum (idxmin()) or maximum (idxmax()) value of a column.

Let’s say you want to get the details of the house with the minimum price. You could do this with boolean subsetting, but the most concise way is to use these functions.

df.loc[df['price'].idxmin()]

Running the above code returns the full row of the house with the minimum price.

So, we get a three-bedroom house in the city of Federal Way with a price of zero. 😁

This is obviously a data error, since we are working with open-source sample data, but it illustrates the idea. 🙂 In the same way, we can use idxmax() to get the house with the maximum price.

What if you have more than one house with the minimum or maximum price? In that case, these functions return only the first occurrence. We will see how to tackle this case in a future article. 😉
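A small sketch on made-up data (not the Kaggle rows) illustrates the first-occurrence behavior and contrasts it with boolean subsetting, which returns every row tied at the minimum:

```python
import pandas as pd

# Made-up data with two houses tied at the minimum price
df = pd.DataFrame({
    "price": [0.0, 540000.0, 0.0, 7700000.0],
    "bedrooms": [3, 4, 2, 6],
    "city": ["Federal Way", "Seattle", "Kent", "Bellevue"],
})

# idxmin()/idxmax() return the index label of the FIRST occurrence only
print(df["price"].idxmin())           # 0, even though row 2 is also 0.0
print(df.loc[df["price"].idxmax()])   # full row of the most expensive house

# Boolean subsetting keeps every row that matches the minimum
cheapest = df[df["price"] == df["price"].min()]
print(cheapest)                       # rows 0 and 2
```

The idxmin() route is shorter when a single row is enough; the boolean mask is the safer choice when ties matter.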

3. cut() Function

Let’s say you have a variable with continuous values, but based on your business understanding, it should be treated as a categorical variable.

The cut() function bins a continuous variable into discrete intervals. When you pass an integer as the number of bins, it splits the range between the minimum and maximum values into that many equal-width intervals.

In this data, the price ranges from 0 to 26,590,000, so I want to group it into buckets. Working with a few price ranges instead of raw values can make decision-making a bit easier.

pd.cut(df["price"], 4)
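Here is a minimal sketch of the same call on a handful of made-up prices that span the range quoted above, with a count of how many values land in each of the four equal-width bins:

```python
import pandas as pd

# Made-up prices spanning 0 to 26,590,000 (the range quoted in the article)
prices = pd.Series([0, 250000, 540000, 1200000, 7700000, 26590000], name="price")

# Four equal-width bins between the minimum and maximum price
buckets = pd.cut(prices, 4)
print(buckets)

# Number of values in each bin, in bin order
print(buckets.value_counts(sort=False))
```

With skewed data like house prices, most values fall into the lowest bin, because equal-width bins follow the range, not the distribution.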

You can also assign a name to each bucket by passing a list to the labels parameter.

Looks good, right? We can either replace the price column with the result or store it in a new column.
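A short sketch of both ideas, assuming made-up prices and hypothetical label names ("low" through "very high"); cut() needs exactly one label per bin:

```python
import pandas as pd

df = pd.DataFrame({"price": [0, 250000, 540000, 1200000, 7700000, 26590000]})

# Hypothetical label names; one label per bin is required
labels = ["low", "medium", "high", "very high"]

# Keep the original price column and store the bucket in a new column
df["price_bucket"] = pd.cut(df["price"], bins=4, labels=labels)
print(df)

# Alternatively, overwrite the original column:
# df["price"] = pd.cut(df["price"], bins=4, labels=labels)
```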

4. pivot_table() Function

Anyone who works in Excel has probably used pivot tables. We can do the same with pandas.

Let’s say we want to find the average house price in each city for each number of bedrooms.

df.pivot_table(index="city", columns="bedrooms", values="price", aggfunc="mean")

You will see null values in the result because not every city has houses with every number of bedrooms; for example, a city may have no two-bedroom houses in the data. It depends on the data.
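A minimal sketch on made-up data shows where these nulls come from, and how the fill_value parameter of pivot_table() can replace them after aggregation if a placeholder suits your analysis:

```python
import pandas as pd

# Made-up data: Kent has no 2-bedroom house
df = pd.DataFrame({
    "city": ["Seattle", "Seattle", "Kent", "Kent", "Seattle"],
    "bedrooms": [2, 3, 3, 3, 3],
    "price": [400000.0, 600000.0, 350000.0, 450000.0, 800000.0],
})

table = df.pivot_table(index="city", columns="bedrooms", values="price", aggfunc="mean")
print(table)  # the (Kent, 2) cell is NaN because no such rows exist

# Optionally replace the empty cells with a placeholder value
filled = df.pivot_table(index="city", columns="bedrooms", values="price",
                        aggfunc="mean", fill_value=0)
print(filled)
```

Be careful with fill_value=0 for prices: a zero can be mistaken for a real value, so leaving NaN is often the clearer choice.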

5. nsmallest() and nlargest() Functions

We have seen how to use the idxmin() and idxmax() functions to get the minimum and maximum observations.

What if you want the data for the three most expensive houses? In that case, these functions can save you time.

df.nlargest(3, "price")[["city","price"]]

df.nsmallest(3, "price")[["city","price"]]

Here we go! We now have three houses, in three cities, with a price of zero. 🙂
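When several rows share the same value, nsmallest() and nlargest() also accept a keep parameter. A sketch on made-up data: the default keep="first" returns exactly n rows, while keep="all" also returns every row tied with the last one, so the result can be longer than n:

```python
import pandas as pd

# Made-up data: four houses tied at a price of zero
df = pd.DataFrame({
    "city": ["Federal Way", "Kent", "Seattle", "Bellevue", "Auburn"],
    "price": [0.0, 0.0, 0.0, 0.0, 350000.0],
})

# Default keep="first": exactly 3 rows, ties resolved by original order
top3 = df.nsmallest(3, "price")
print(top3[["city", "price"]])

# keep="all": every row tied with the 3rd value is kept (4 rows here)
all_ties = df.nsmallest(3, "price", keep="all")
print(all_ties[["city", "price"]])

# nlargest() works the same way in the other direction
print(df.nlargest(1, "price")[["city", "price"]])
```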


Conclusion

Well, those were some amazing pandas functions. They can be very handy in your day-to-day data science tasks.

I hope you liked the article. Stay tuned for more exciting articles!

Thanks for reading!


Here are some of my best picks:

https://betterprogramming.pub/10-python-tricks-that-will-wow-you-de450921d96a

https://towardsdatascience.com/7-amazing-python-one-liners-you-must-know-413ae021470f

https://towardsdatascience.com/5-data-science-projects-that-you-can-complete-over-the-weekend-34445b14707d


Before you go…

If you liked this article and want to stay tuned for more exciting articles on Python & Data Science, do consider becoming a Medium member by clicking here https://pranjalai.medium.com/membership.

Please do consider signing up using my referral link. That way, a portion of the membership fee goes to me, which motivates me to write more exciting stuff on Python and Data Science.

Also, feel free to subscribe to my free newsletter: Pranjal’s Newsletter.

