
Pandas DataFrames are two-dimensional data structures with labeled rows and columns.

We sometimes need to do an element-wise comparison of two DataFrames. For example:
- Update values in a DataFrame using the values in another one.
- Compare values and pick the bigger or smaller value.

In this article, we’ll learn four Pandas functions that can be used for such tasks, and we’ll go through examples to better understand the differences and similarities among them.
If you’d like to learn more about Pandas, visit my course 500 Exercises to Master Python Pandas.
Let’s first create two DataFrames to be used in the examples.
```python
import numpy as np
import pandas as pd

# create DataFrames with random integers
df1 = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list("ABCD"))

# add a couple of missing values
df1.iloc[2, 3] = np.nan
df1.iloc[1, 2] = np.nan
```
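Because `df1` and `df2` are filled with random integers, the values you get will differ from the ones in the screenshots. If you'd like repeatable output, a minimal sketch is to draw the integers from a seeded NumPy generator. The seed `42` is arbitrary, and casting `df1` to float up front (so its columns can hold `NaN`) is a choice of this sketch, not part of the original setup.

```python
import numpy as np
import pandas as pd

# Seeded generator: every run produces the same DataFrames (seed 42 is arbitrary)
rng = np.random.default_rng(42)

# integers(0, 10) draws from 0..9, like np.random.randint(0, 10)
# df1 is cast to float so its columns can hold NaN without an implicit upcast
df1 = pd.DataFrame(rng.integers(0, 10, size=(4, 4)), columns=list("ABCD")).astype(float)
df2 = pd.DataFrame(rng.integers(0, 10, size=(4, 4)), columns=list("ABCD"))

# same missing-value positions as in the setup above
df1.iloc[2, 3] = np.nan
df1.iloc[1, 2] = np.nan

print(df1)
print(df2)
```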

1. combine
The `combine` function combines two DataFrames column by column using the function we pass to it. If that function works element-wise, like `np.maximum`, the result is an element-wise comparison: for instance, we can select the bigger of the two values at each position. It’ll be clearer with an example.
```python
combined_df = df1.combine(df2, np.maximum)
```

Take a look at the value in the first row and first column. The two DataFrames have 5 and 2 at this position, and the combined DataFrame keeps the bigger one, 5.
If one of the values is `NaN` (i.e. a missing value), the combined DataFrame has `NaN` at this position as well, because `np.maximum` propagates missing values: comparing a number with `NaN` returns `NaN`.
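To see both behaviors without relying on random data, here is a minimal sketch with two small hand-written DataFrames (the names `left` and `right` and their values are hypothetical). It also swaps in `np.minimum` to pick the smaller value instead of the bigger one.

```python
import numpy as np
import pandas as pd

# Hypothetical, hand-written data (not the article's random DataFrames)
left = pd.DataFrame({"A": [5.0, 1.0], "B": [np.nan, 7.0]})
right = pd.DataFrame({"A": [2.0, 3.0], "B": [4.0, 6.0]})

# combine hands each pair of same-named columns (two Series) to the function;
# np.maximum / np.minimum then compare them element by element
bigger = left.combine(right, np.maximum)
smaller = left.combine(right, np.minimum)

print(bigger)
print(smaller)
# The NaN in left["B"] survives in both results: np.maximum(NaN, 4.0) is NaN
```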
We can choose a constant value to be used in place of missing values with the `fill_value` parameter. Missing values in either DataFrame are replaced with this value before the function is applied.
```python
combined_df = df1.combine(df2, np.maximum, fill_value=0)
```

There are two `NaN` values in `df1`, which are filled with 0 and then compared to the values in the same positions of `df2`.
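The sketch below, again with hypothetical hand-written DataFrames, puts a missing value on each side to show that `fill_value` is applied to both DataFrames before `np.maximum` runs.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each DataFrame
left = pd.DataFrame({"A": [5.0, np.nan], "B": [-3.0, 7.0]})
right = pd.DataFrame({"A": [2.0, 3.0], "B": [np.nan, 6.0]})

# fill_value=0 replaces NaN on BOTH sides before np.maximum is called
combined = left.combine(right, np.maximum, fill_value=0)
print(combined)
```

Row 0 of column `B` shows the fill happening on the right-hand side: `-3` is compared with the filled `0`, so the result is `0` rather than `-3`.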
2. combine_first
The `combine_first` function fills the `NaN` values in the first DataFrame with the values in the same positions of the other DataFrame.
```python
combined_df = df1.combine_first(df2)
```

As we see in the screenshot above, `combined_df` has the same values as `df1` except for the `NaN` values, which are filled with values from `df2`.
It is important to note that the `combine_first` function does not modify `df1` or `df2`. It returns a new DataFrame, which is an updated version of the first one.
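A minimal sketch with hypothetical hand-written DataFrames makes both points checkable: only the `NaN` in `left` is taken from `right`, and `left` itself is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `left` has one missing value
left = pd.DataFrame({"A": [5.0, np.nan], "B": [1.0, 8.0]})
right = pd.DataFrame({"A": [2.0, 3.0], "B": [4.0, 6.0]})

# NaN in `left` is taken from `right`; every other value comes from `left`
filled = left.combine_first(right)
print(filled)

# `left` is unchanged and still contains its missing value
print(left["A"].isna().sum())
```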
3. update
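The `update` function modifies a DataFrame using the non-missing values in the same locations of another DataFrame. By default (`overwrite=True`), every value that is not missing in the other DataFrame replaces the value at that location, not just the missing ones; with `overwrite=False`, only the missing values are filled.
With `overwrite=False`, this sounds like the same thing the `combine_first` function does. However, there is an important difference.
The `update` function does not return anything (it returns `None`) but updates in place. Thus, the original DataFrame is modified. It’ll be clearer with an example.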
We have two DataFrames as shown below:

Let’s use the `update` function on `df1`.
```python
df1.update(df2)
```
This line of code does not return anything, but it updates `df1` in place. The updated version is:

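`df1` does not include missing values anymore; they have been updated using the values from `df2`. Keep in mind that, with the default `overwrite=True`, any other value in `df1` whose counterpart in `df2` is not missing has been replaced as well.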
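The difference between the default `overwrite=True` and `overwrite=False`, and how `update` relates to `combine_first`, can be sketched with two small hand-written DataFrames. The helper `make_frames` and the names `left` and `right` are hypothetical; each case starts from fresh DataFrames because `update` changes `left` in place.

```python
import numpy as np
import pandas as pd

def make_frames():
    # Hypothetical data: one missing value on each side
    left = pd.DataFrame({"A": [5.0, np.nan], "B": [1.0, 8.0]})
    right = pd.DataFrame({"A": [2.0, 3.0], "B": [np.nan, 6.0]})
    return left, right

# Default overwrite=True: every non-missing value in `right` replaces `left`'s value
left, right = make_frames()
returned = left.update(right)  # modifies `left` in place and returns None
overwritten = left

# overwrite=False: only the missing values in `left` are filled
left, right = make_frames()
left.update(right, overwrite=False)
filled_only = left

# combine_first yields the same values as overwrite=False, but as a new DataFrame
left, right = make_frames()
first = left.combine_first(right)

print(overwritten)
print(filled_only)
print(first)
```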
4. compare
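The `compare` function compares the values at the same locations and returns a DataFrame showing the differing values side by side, under `self` (the calling DataFrame) and `other` sub-columns. Rows and columns in which all values are equal are dropped by default, and both DataFrames must have the same row and column labels.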
```python
comparison = df1.compare(df2)
```

If the values at a particular location are the same, the comparison shows them as `NaN` (e.g. second row, first column). We can change this behavior with the `keep_equal` parameter.
```python
comparison = df1.compare(df2, keep_equal=True)
```
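Here is a minimal sketch with small hypothetical integer DataFrames that share the same labels. It shows the `self`/`other` column layout, the `NaN` placeholders for equal values, the dropped all-equal row, and what `keep_equal=True` changes.

```python
import pandas as pd

# Hypothetical data with identical row and column labels
left = pd.DataFrame({"A": [5, 1, 4], "B": [7, 2, 9]})
right = pd.DataFrame({"A": [2, 1, 4], "B": [7, 3, 9]})

# Default: equal values become NaN, and row 2 (all equal) is dropped
diff = left.compare(right)

# keep_equal=True shows the equal values instead of NaN;
# row 2 is still dropped because no value in it differs
diff_full = left.compare(right, keep_equal=True)

print(diff)
print(diff_full)
```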

Conclusion
We learned four Pandas functions that work on the values of two DataFrames element by element. They serve different purposes: some are used for combining or updating values, whereas others just compare them.
Each of them is the right tool in particular cases, so it’s worth getting to know all of them.
Thank you for reading. Please let me know if you have any feedback.