
Up to this point, I have talked about how to use Polars DataFrames and why Polars is a better DataFrame library than Pandas. Continuing our exploration of Polars, in this article I will show you how to manipulate your Polars DataFrame, specifically:
- How to change the values for each column/row
- How to sum up the values for each column/row
- How to add a new column/row to the existing dataframe
Ready? Let’s go!
Creating the Sample DataFrame
Let’s create a Polars DataFrame using a list of tuples:
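import polars as pl

matrix = [
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9),
    (10, 11, 12),
    (13, 14, 15),
    (16, 17, 18)
]
# Each tuple is one row. Note: newer Polars releases name this parameter schema= instead of columns=.
df = pl.DataFrame(matrix, columns=list('abc'))
df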
The dataframe looks like this:

Let’s examine some of the methods that you can call to manipulate the values in the dataframe.
Using the apply() Method
The apply() method can be used on:
- an individual column in a dataframe, or
- an entire dataframe
Applying on columns
For example, say you want to multiply all the values in the ‘a’ column by 2. You can do the following:
df.select(
    pl.col('a').apply(lambda x: x*2)
)
All the values in column ‘a’ are now multiplied by 2:

In the lambda function above, x takes on the individual values of column a. When applied to a column, the apply() method sends in the values of that column one by one. This gives you the opportunity to examine each value before deciding how to change it. For example, you can multiply only those values that are greater than or equal to 5:
df.select(
    pl.col('a').apply(lambda x: x*2 if x>=5 else x)
)
This will produce the following output:

In general, implementing logic using the apply() method is slower and more memory intensive than implementing it using expressions. This is because expressions can be parallelized and optimized, and their logic runs in Rust, which is faster than running Python code (a lambda function, for example). So, whenever possible, use expressions instead of the apply() method. As an example, the earlier apply() call can be rewritten as an expression:
pl.col('a').apply(lambda x: x*2)
# rewritten as an expression
pl.col('a') * 2
Notice that the result only contains a single column. If you want the rest of the columns to be in the result as well, use the select() and exclude() methods:
q = (
    df
    .lazy()
    .select(
        [
            pl.col('a').apply(lambda x: x*2),
            pl.exclude('a')
        ]
    )
)
q.collect()
Now the result contains all the columns:

If you want to multiply all columns by 2, select all columns using pl.col('*'):
q = (
    df
    .lazy()
    .select(
        pl.col('*').apply(lambda x: x*2)
    )
)
q.collect()
All the columns would now be multiplied by 2:

If you want to multiply column ‘a’ by 2 and then store the result as another column, use the alias() method:
q = (
    df
    .lazy()
    .select(
        [
            pl.col('*'),
            pl.col('a').apply(lambda x: x*2).alias('x*2'),
        ]
    )
)
q.collect()
The result would now have an additional column:

Using the map() method
Another method that is similar to apply() is map(). Unlike the apply() method, the map() method sends in the values of a column as a single Polars Series:
df.select(
    pl.col('a').map(lambda x: x*2)
)
In the lambda function above, x is a Polars Series containing the values of column a. The above statement produces the following output:

Applying on rows
Observe that so far the apply() method has been applied to columns in a dataframe. What if you want to apply a function to the rows of a dataframe? In this case, call the apply() method on the dataframe directly.
To understand how it works, I wrote a test function that prints the value it receives when apply() is called on the dataframe:
def test(x):
    print(x)
    return x

df.apply(test)
It prints the following:
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10, 11, 12)
(13, 14, 15)
(16, 17, 18)
This means that the apply() method, when called on a dataframe, sends the values of each row as a tuple to the receiving function. This is useful for some use cases. For example, say you need to perform an integer division by 2 on all the numbers in a row if their sum is greater than 10. You can write the lambda function as:
df.apply(lambda x: tuple([i // 2 for i in x]) if sum(x) > 10 else x)
And the result will look like this:

You can also use the apply() method to duplicate all the columns in the dataframe. Because each row arrives as a tuple, x*2 repeats the tuple instead of multiplying its values:
df.apply(lambda x: x*2)
The resulting dataframe has six columns:

Note that the apply() method cannot be called on a LazyFrame.
Summing up values in the DataFrame
Often, you need to sum up all the values in your dataframe either row-wise, or column-wise.
By column
The easiest way to sum up the values for each column is to use the sum() method on the dataframe:
df.sum()

To append the result above to the existing dataframe, use the pl.concat() function:
pl.concat([df, df.sum()])

By row
To sum up the values of all the columns for each row, use the sum() method with the axis parameter set to 1:
df.sum(axis=1)
The result is a Polars Series:

Think of a Polars Series as a single column in a dataframe.
You can also use the select() method to select the columns that you want to sum up:
df.select(pl.col('*')).sum(axis=1)
The following code snippet adds the series to the dataframe as a new column:
df['sum'] = df.select(pl.col('*')).sum(axis=1)
df

If you do not want to use square bracket indexing (which is not recommended in Polars), use the select() method instead:
df.select(
    [
        pl.col('*'),
        df.select(pl.col('*')).sum(axis=1).alias('sum')
    ]
)
I will be running a workshop on Polars in the upcoming ML Conference (22–24 Nov 2022) in Singapore. If you want a jumpstart on the Polars DataFrame, register for my workshop at https://mlconference.ai/machine-learning-advanced-development/using-polars-for-data-analytics-workshop/.

Summary
I hope this article added some ammunition to your arsenal for working with your Polars DataFrames. Here is a quick summary of when to use the apply() and map() methods:
- Call the apply() method on an expression to apply a function to the individual values of one or more columns in a dataframe.
- Call the map() method on an expression to apply a function to one or more columns, each passed in as a Series.
- Call the apply() method on a dataframe to apply a function to the rows of the dataframe.
Save this article and use it as a quick reference the next time you work with your Polars DataFrame!