The world’s leading publication for data science, AI, and ML professionals.

Manipulating Values in Pandas DataFrames

Know when to use map(), apply(), and applymap()

Photo by Annie Spratt on Unsplash
Photo by Annie Spratt on Unsplash

One of the common tasks in data analytics is manipulating values in your dataframes. For those of you who are familiar with Pandas, you know that in general you have three functions that you can use – map(), Apply(), and applymap(). However, it is not often clear when you should use which, and sometimes it can be quite confusing how each function works.

This article attempts to make it clear to you how each function works, and when you should use what.

Quick Summary of the various functions

First, let’s go through a quick definition of what each function does:

  • map() – apply a function to a Series
  • apply() – apply a function to a Series or Dataframe (along one of the two axes)
  • Applymap() – apply a function to a Dataframe element-wise

The best way to remember how they work is to summarize them using the following table:

As you can see, the map() function only applies to a Series and the applymap() function only applies to a Dataframe, while you can use the apply() function on either a Series or Dataframe.

Rather than explain the use of each function, let’s approach this by stating the problem that you are trying to solve.

Modifying a single column of values

If you want to modify a single column of values in a Dataframe, it is easy to do that with the Map() function. Consider the following AAPL dataset (https://finance.yahoo.com/quote/AAPL/history/):

import pandas as pd
df = pd.read_csv('AAPL.csv')
df

Suppose you want to reformat the date so that they are now displayed in the following format:

Dec 24, 2018 (Mon)

An easy way would be to get the Date column as a Series and then use the map() function on it. The following code snippet converts the Date values to the format that we want:

from datetime import datetime
def format_date(x):
    return datetime.strptime(x, '%Y-%m-%d').strftime(
        '%b %d, %Y (%a)')
df = pd.read_csv('AAPL.csv')
df['Date'] = df['Date'].map(format_date)
df

In the format_date() function, the argument x is of type string (each value of the Date column. e.g. "2018–12–24"). To convert to the date format we want, I first use the strptime() to convert the given string to a datetime object, and then convert the datetime object to the desired string format using the strftime() function. The result of the map() function (which is a Series) is then used to update the Date column.

The updated dataframe now looks like this:

You can also use the apply() function for the same purpose, since it also works with series. The following code snippet shows how the above steps can be achieved using the apply() function:

df = pd.read_csv('AAPL.csv')
df['Date'] = df['Date'].apply(format_date)   
df

Modifying multiple columns of values in a dataframe

Suppose now you also need to round down the values in the Close column to 2 decimal places, in addition to changing the date format. In this case, it would be useful to use the apply() function, as it works on dataframes.

The following code snippet shows how you can modify the date format of values in the Date column as well as performing rounding on the Close column:

from datetime import datetime
def format_date2(x): 
    x['Date'] = datetime.strptime(
        x['Date'], '%Y-%m-%d').strftime('%b %d, %Y (%a)')
    x['Close'] = round(x['Close'],2)
    return x
df = pd.read_csv('AAPL.csv')
df = df.apply(format_date2, axis=1)   
df

This time, you use the apply() function on a dataframe. In the format_date2() function, the argument x will now be a Series (containing each row, since we specified axis=1).

In the function we first modify the Date values, then perform rounding on the Close values. The result will look like this:

When you use the apply() function on a Dataframe, you can specify the axis parameter (0 or 1). However, the axis parameter is not supported when you use the apply() function on a Series.

When you use the apply() function on a dataframe with the axis set to 1, you can think of it as going through each of the rows in your dataframe and then getting all the columns as a Series, as the following figure illustrates.

On the other hand, if you specify the axis as 0, it is now going through each of the columns in your dataframe and then getting all the rows as a Series, as the following figure illustrates:

Combining Columns in a Dataframe

Here is another dataset that I will use to illustrate the use of apply() function on a Dataframe. This time I am going to use the rainfall dataset from http://www.weather.gov.sg/climate-historical-daily/.

df = pd.read_csv('DAILYDATA_S24_202107.csv')
df

The dataframe looks like this:

This dataset contains the amount of rainfall in Singapore (plus other features like temperature and wind speed) for each of the 31 days in July, 2021.

Suppose I want to combine the three separate columns – Year, Month, and Day into a single column – Date. This is a good time to use the apply() function:

import datetime
df['Date'] = df[['Year','Month','Day']].apply(
    lambda x: datetime.date(                    # x is a series
        int(x['Year']),
        int(x['Month']),
        int(x['Day'])), 
    axis=1)
df

You first extract the three columns (Year, Month, and Day) that you want as a dataframe, and then use the apply() function on it. When you specify axis=1, each row of the dataframe (with the three columns – Year, Month, and Day) will be passed into the lambda function as a series (x). You can now use the values in the Series to create a datetime.date object. A new column (Date) is now created in the dataframe to store the newly created datetime.date object (the last column in the dataframe):

Aggregation of Columns or Rows Values

Sometimes it is necessary to aggregate the values in specific columns in a Dataframe. For example, using the rainfall dataset, you might want to calculate the mean daily rainfall for the month. In this case, you can use the apply() function with the axis parameter set to 0:

df[['Daily Rainfall Total (mm)']].apply(lambda x: x.mean(),
    axis=0)

When you specify the axis=0 (or omit it as this is the default value) for the apply() function, each column in the dataframe will be passed into the lambda function as a Series:

Say you want to get the average rainfall as well as the average temperature for the entire month. You can first extract the two columns for rainfall and temperature and then use the apply() function:

df = pd.read_csv('DAILYDATA_S24_202107.csv')
df[['Daily Rainfall Total (mm)','Mean Temperature (°C)']].apply(
    lambda x: x.mean(),
    axis=0)

The result will be a Series:

Daily Rainfall Total (mm)     6.316129
Mean Temperature (°C)        28.619355
dtype: float64

What if you want to calculate the average for the rainfall and median for the temperature? In this case, you can identify each column using the name property of the Series. The following example shows you how:

def aggregate_columns(x):
    if x.name =='Daily Rainfall Total (mm)':
        return x.mean()
    if x.name=='Mean Temperature (°C)':
        return x.median()

df[['Daily Rainfall Total (mm)','Mean Temperature (°C)']].apply(
    aggregate_columns,
    axis=0)

The result will be as follows:

Daily Rainfall Total (mm)     6.316129
Mean Temperature (°C)        29.200000
dtype: float64

Transforming elements in a dataframe

So far our discussions have been centered on performing row-wise or column-wise operations on our dataframe. However, there are times where you don’t really care about the order in which your values are computed, as long as they are computed. A good example that we can use to illustrate this is the AAPL dataset we used earlier. Say you need to round all the numbers in the Open, High, Low, Close, and Adj Close columns to 2 decimal places:

Does it matter whether we process them row-wise or column-wise? Nope. As long as they can be rounded to 2 decimal places we are happy. This is a perfect example of using the applymap() function. The applymap() function will go through each of the elements in the dataframe and perform the specified function. The following code snippet shows how this is done:

df = pd.read_csv('AAPL.csv')
df[['Open','High','Low','Close','Adj Close']] = 
    df[['Open','High','Low','Close','Adj Close']].applymap(
        lambda x: round(x,2))   
df

Each element in the selected dataframe will be passed as argument into the lambda function.

The values in the specified columns are now rounded to 2 decimal places:

Conclusions

I hope this article has made it clear for you to decide when you should use the map(), apply(), or applymap() functions. In summary:

  • If you want to modify a single column in a dataframe, use map()
  • If you want to modify several columns in a dataframe at once, use apply()
  • If you want to perform aggregate functions on columns or rows in a dataframe, use apply()
  • If you want to modify the values in a dataframe without worrying whether it is performed row-wise or column-wise, use applymap()

Related Articles