Pandas Apply for Power Users

Become a power user by learning Pandas’ built-in apply() function

Photo by Todd Trapani on Unsplash

Introduction

At some point in our data science careers, we are going to come across poor-quality data, whether that means partially completed records or incorrectly formatted attributes. Being able to manage poor-quality data has become a crucial skill for a successful data scientist.

Thankfully, numerous libraries, such as Pandas, have been developed that we can use to manipulate datasets efficiently. Today we are going to look at how to use Pandas’ built-in .apply() function to manipulate a dataset.

Pandas Apply Function

The Pandas apply function does as its name suggests: it allows you to apply a function along an axis of a DataFrame or to the values of a Series. This function is incredibly powerful when you need to manipulate a column or row, for example when reformatting or cleaning data.

The first parameter, func, allows you to pass in a function. This can be a function from an imported package, a function you’ve declared in your script, or even a lambda function. The Python snippet below demonstrates how to pass in the three different types of function on lines 21–23. Here we create three new columns with Pandas’ .apply() function by passing three different functions that yield the same result.
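As a minimal sketch of the idea described above (not the article's original snippet, so the line numbers mentioned in the text do not map onto it), the three kinds of function can be passed like this. The sample data, the column names and the square_root() helper are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column name "value" is an assumption.
df = pd.DataFrame({"value": [1, 4, 9, 16]})


def square_root(x):
    """Hypothetical user-defined function: square root of a single value."""
    return x ** 0.5


# 1. A function from an imported package
df["sqrt_numpy"] = df["value"].apply(np.sqrt)

# 2. A function declared in the script
df["sqrt_custom"] = df["value"].apply(square_root)

# 3. A lambda function
df["sqrt_lambda"] = df["value"].apply(lambda x: x ** 0.5)

print(df)
```

All three new columns should hold the same values, since each function computes a square root in a different way.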

Console output showing the results of executing the above snippet.

The second parameter, axis, defines the axis along which the function is applied. It accepts 0 or index, which passes each column of the DataFrame to the function, or 1 or columns, which passes each row. Personally, I prefer to use index and columns, as they improve the readability of your code for those who don’t understand the meaning behind 0 and 1. The third parameter, raw, determines whether each row or column is passed as a Series or as an n-dimensional array (ndarray) object. If raw=False is passed (the default), the row or column is passed to the function as a Series; if raw=True is passed, ndarray objects are passed to the function instead.
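The behaviour of axis and raw can be checked with a small sketch such as the one below. The DataFrame and its column names are hypothetical, and the lambdas only exist to show what the function receives in each case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis="index" (same as 0): the function receives each column in turn
column_totals = df.apply(lambda col: col.sum(), axis="index")

# axis="columns" (same as 1): the function receives each row in turn
row_totals = df.apply(lambda row: row.sum(), axis="columns")

# raw=False (the default): each row arrives as a pandas Series
got_series = df.apply(lambda row: isinstance(row, pd.Series), axis="columns")

# raw=True: each row arrives as a NumPy ndarray instead
got_ndarray = df.apply(
    lambda row: isinstance(row, np.ndarray), axis="columns", raw=True
)

print(column_totals)
print(row_totals)
```

Here column_totals has one entry per column and row_totals has one entry per row, which is a quick way to confirm which direction a given axis value works in.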

The result_type parameter alters how results are returned when the function is applied along the columns axis (axis=1) of a DataFrame. By default, result_type is set to None, but it can also accept expand, broadcast and reduce. If expand is passed, any list-like results that are returned will be expanded across the columns. If the length of the returned lists is greater than the number of columns in the DataFrame, additional columns will be created to expand across. During the expansion of the lists, the original column names of the DataFrame will be replaced by a range index. If the original shape of the DataFrame needs to be retained, broadcast can be passed, which also ensures that the original column names are retained. If you need to ensure that list-like results aren’t expanded, you can pass reduce to result_type.
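A short sketch of the result_type options described above follows. The DataFrame, the as_list() helper and the variable names are assumptions for illustration; the helper returns three items per row, one more than the DataFrame has columns.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def as_list(row):
    """Hypothetical helper: return a three-item list for each row."""
    return [row["a"], row["b"], row["a"] + row["b"]]


# Default (None): list results are kept as a Series of lists
default = df.apply(as_list, axis="columns")

# "expand": each list item becomes its own column, labelled 0, 1, 2
expanded = df.apply(as_list, axis="columns", result_type="expand")

# "reduce": list results are explicitly kept as a Series of lists
reduced = df.apply(as_list, axis="columns", result_type="reduce")

# "broadcast": the result keeps the original shape and column names,
# so the returned list must match the number of columns
broadcast = df.apply(
    lambda row: [row["a"] * 10, row["b"] * 10],
    axis="columns",
    result_type="broadcast",
)

print(expanded)
print(broadcast)
```

Note that the broadcast call uses a two-item list, because a result that does not fit the original shape cannot be broadcast.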

The final parameter covered here, args=(), allows you to pass additional values to the function that you are applying. In the example below, we pass 25 and 75 as the lower and higher parameters of the function between_range(). Because the function is applied to the column, each value in the series is assessed as to whether it falls inside or outside the value range, and a boolean is returned in the in_range column.
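The original snippet is not reproduced here, so the following is only a sketch of what between_range() might look like. The body of the function, the inclusive bounds, the sample data and the column name "value" are all assumptions; only the function name, the lower and higher parameters, the values 25 and 75, and the in_range column come from the text.

```python
import pandas as pd

# Hypothetical sample data; the column name "value" is an assumption.
df = pd.DataFrame({"value": [10, 25, 50, 75, 90]})


def between_range(value, lower, higher):
    """Return True when value lies between lower and higher.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return lower <= value <= higher


# Each element is passed as the first argument; args supplies the rest,
# so lower=25 and higher=75 for every call.
df["in_range"] = df["value"].apply(between_range, args=(25, 75))

print(df)
```

The values in args are passed positionally after the element itself, so their order has to match the order of the function's remaining parameters.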

Console outputs showing the result of calling between_range() within args=().

Summary

Pandas’ built-in apply() function is an incredibly powerful tool to know and understand when it comes to dealing with poor-quality data. It provides an efficient way to apply a function along an axis of a DataFrame to clean or manipulate your data. Combined with the flexibility of the parameters you can pass, apply() will allow you to tackle almost any data quality issue.

Thank you for taking the time to read our story. We hope you have found it valuable 🙂

