The world’s leading publication for data science, AI, and ML professionals.

4 Pandas Functions for Element-Wise Comparison of DataFrames

Explained with examples.

Photo by NordWood Themes on Unsplash

Pandas DataFrames are two-dimensional data structures with labeled rows and columns.

DataFrame with 3 rows and 3 columns (image by author)
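As a minimal sketch, here is a 3-by-3 DataFrame like the one pictured. The values and the row labels x, y, z are made up for illustration. Each value is addressed by a row label and a column label.

```python
import pandas as pd

# Hypothetical 3x3 DataFrame: row labels x, y, z and column labels A, B, C
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["x", "y", "z"],
    columns=["A", "B", "C"],
)

print(df.shape)          # (3, 3)
print(df.loc["y", "B"])  # 5: selected by row label "y" and column label "B"
```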

We sometimes need to do an element-wise comparison of two DataFrames. For example:

  • Update values in a DataFrame using the values in another one.
  • Compare values and pick the bigger or smaller value.

In this article, we’ll go through four Pandas functions that can be used for such tasks, with examples that highlight how they differ and what they have in common.


If you’d like to learn more about Pandas, visit my course 500 Exercises to Master Python Pandas.


Let’s first create two DataFrames to be used in the examples.

import numpy as np
import pandas as pd

# create DataFrames with random integers (your values will differ from the screenshots)
df1 = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list("ABCD"))

# add a couple of missing values
df1.iloc[2, 3] = np.nan
df1.iloc[1, 2] = np.nan
(image by author)

1. combine

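The combine function combines two DataFrames column by column using a given function: each pair of matching columns is passed to the function as two Series, and the function returns the combined column. With an element-wise function such as np.maximum, this amounts to an element-wise comparison.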

For instance, we can keep the larger of the two values at each position. An example makes this clearer.

combined_df = df1.combine(df2, np.maximum)
(image by author)

Take a look at the value in the first row and first column: the combined DataFrame holds the larger of 5 and 2.
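combine hands the function whole columns as pandas Series, not individual values; np.maximum then does the element-wise work. The sketch below records what combine passes in. The frames left and right and the wrapper larger are hypothetical, with fixed values instead of the article’s random ones.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs with fixed values (the article's frames are random)
left = pd.DataFrame({"A": [5, 1], "B": [0, 7]})
right = pd.DataFrame({"A": [2, 3], "B": [4, 6]})

calls = []  # records what combine passes to the function


def larger(s1: pd.Series, s2: pd.Series) -> pd.Series:
    # combine calls this once per column with the matching pair of columns
    calls.append((s1.name, type(s1).__name__))
    # np.maximum then compares the two columns element by element
    return np.maximum(s1, s2)


combined = left.combine(right, larger)
print(combined.values.tolist())  # [[5, 4], [3, 7]]
print(calls)  # [('A', 'Series'), ('B', 'Series')]
```

Passing np.maximum directly, as in the article, gives the same result without the bookkeeping.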

If one of the values is NaN (i.e. a missing value), the combined DataFrame also has NaN at that position, because np.maximum propagates NaN: comparing a number with a missing value returns NaN.
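The NaN comes from np.maximum itself rather than from combine. A small sketch with hypothetical frames df_a and df_b shows this. It also uses np.fmax, a NumPy function that returns the non-missing value when only one of the two inputs is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df_a is missing the value in row 0, column "B"
df_a = pd.DataFrame({"A": [5.0, 1.0], "B": [np.nan, 7.0]})
df_b = pd.DataFrame({"A": [2.0, 3.0], "B": [4.0, 6.0]})

# np.maximum propagates NaN, so that position stays missing
with_max = df_a.combine(df_b, np.maximum)
print(with_max.values.tolist())  # [[5.0, nan], [3.0, 7.0]]

# np.fmax ignores a single NaN and returns the other value instead
with_fmax = df_a.combine(df_b, np.fmax)
print(with_fmax.values.tolist())  # [[5.0, 4.0], [3.0, 7.0]]
```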

The fill_value parameter lets us choose a constant to use in place of missing values. Missing values are replaced with this constant before they are compared with the values in the other DataFrame.

combined_df = df1.combine(df2, np.maximum, fill_value=0)
(image by author)

There are two NaN values in df1, which are filled with 0 and then compared to the values at the same positions in df2.
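The order of operations matters: the missing value is replaced first and compared second. In the sketch below, the hypothetical df_b holds a negative number where df_a is missing, so the fill value wins the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df_a is missing row 1 of "B"; df_b has -2.0 there
df_a = pd.DataFrame({"A": [5.0, 1.0], "B": [3.0, np.nan]})
df_b = pd.DataFrame({"A": [2.0, 3.0], "B": [4.0, -2.0]})

# The NaN becomes 0 before np.maximum runs, so max(0, -2) gives 0
filled = df_a.combine(df_b, np.maximum, fill_value=0)
print(filled.values.tolist())  # [[5.0, 4.0], [3.0, 0.0]]
```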


2. combine_first

The combine_first function fills the NaN values in the first DataFrame with the values at the same positions in the other DataFrame.

combined_df = df1.combine_first(df2)
(image by author)

As we see in the screenshot above, combined_df has the same values as df1 except for the NaN values, which are filled with values from df2.
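Here is the same idea on a smaller scale, as a sketch with hypothetical frames df_a and df_b whose values are fixed so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df_a has one NaN in each column
df_a = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})
df_b = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, 40.0]})

filled = df_a.combine_first(df_b)
# The non-missing values of df_a are kept (1.0 and 4.0);
# only the NaNs are taken from df_b (20.0 and 30.0)
print(filled.values.tolist())  # [[1.0, 30.0], [20.0, 4.0]]
```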

It is important to note that combine_first does not modify df1 or df2. It returns a new DataFrame: an updated copy of the first one.
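A quick way to confirm this is to inspect both inputs after the call. In the sketch below (hypothetical frames df_a and df_b), the result is kept only by assigning it.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs
df_a = pd.DataFrame({"A": [1.0, np.nan]})
df_b = pd.DataFrame({"A": [10.0, 20.0]})

filled = df_a.combine_first(df_b)
print(filled["A"].tolist())         # [1.0, 20.0]
print(int(df_a["A"].isna().sum()))  # 1: df_a still has its missing value
print(df_b["A"].tolist())           # [10.0, 20.0]: df_b is untouched too

# To keep the result, assign it, for example back to the same name
df_a = df_a.combine_first(df_b)
print(int(df_a["A"].isna().sum()))  # 0
```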


3. update

The update function modifies a DataFrame using the non-missing values at the same locations in another DataFrame.

This sounds like what the combine_first function does. However, there is an important difference.

The update function does not return a new DataFrame (it returns None); it modifies the original DataFrame in place. An example makes this clearer.
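Because update returns None, writing df1 = df1.update(df2) would replace df1 with None. The sketch below uses hypothetical frames in which df_b only has values where df_a is missing, so it isolates the in-place behavior. NaN values in df_b are never copied over.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df_b only has values where df_a is missing
df_a = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})
df_b = pd.DataFrame({"A": [np.nan, 20.0], "B": [30.0, np.nan]})

returned = df_a.update(df_b)
print(returned)              # None: there is no new DataFrame to assign
print(df_a.values.tolist())  # [[1.0, 30.0], [20.0, 4.0]]: df_a itself changed
```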

We have two DataFrames as shown below:

(image by author)

Let’s use the update function on df1.

df1.update(df2)

This line of code returns nothing, but it updates df1 in place. The updated version is:

(image by author)

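df1 no longer contains missing values; they have been filled with the values from df2. Note, however, that update does more than fill missing values: with the default overwrite=True, every value of df1 whose counterpart in df2 is not missing is replaced, so the other positions of df1 now hold the values from df2 as well. To fill only the missing values, as combine_first does, call df1.update(df2, overwrite=False).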
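The overwrite parameter decides which values get replaced. The sketch below compares the default (overwrite=True) with overwrite=False on hypothetical frames; make_frames is a hypothetical helper that rebuilds fresh inputs for each run. In this example, overwrite=False gives the same values as combine_first, except that df_a is changed in place.

```python
import numpy as np
import pandas as pd


def make_frames():
    # Hypothetical inputs: df_a has one NaN, df_b has no NaN at all
    df_a = pd.DataFrame({"A": [1.0, np.nan], "B": [2.0, 3.0]})
    df_b = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, 40.0]})
    return df_a, df_b


# Default overwrite=True: every non-NA value in df_b replaces the one in df_a
df_a, df_b = make_frames()
df_a.update(df_b)
print(df_a.values.tolist())  # [[10.0, 30.0], [20.0, 40.0]]

# overwrite=False: only the positions that are NA in df_a are filled
df_a, df_b = make_frames()
expected = df_a.combine_first(df_b)  # new DataFrame; df_a is unchanged here
df_a.update(df_b, overwrite=False)
print(df_a.values.tolist())  # [[1.0, 2.0], [20.0, 3.0]]
print(df_a.values.tolist() == expected.values.tolist())  # True
```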


4. compare

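The compare function compares the values at the same locations and returns a DataFrame showing the differences side by side: under each column name, self holds the value from df1 and other holds the value from df2. By default, rows and columns without any differences are left out.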

comparison = df1.compare(df2)
(image by author)

If the values at a particular location are the same, the comparison shows them as NaN (e.g. second row, first column). We can change this behavior by using the keep_equal parameter.
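A minimal sketch with hypothetical frames df_a and df_b makes the layout explicit. compare (available since pandas 1.1.0) expects both DataFrames to have the same row and column labels; comparing differently labeled DataFrames raises a ValueError.

```python
import pandas as pd

# Hypothetical inputs: they differ only at (row 1, "A") and (row 2, "B")
df_a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df_b = pd.DataFrame({"A": [1, 9, 3], "B": [4, 5, 0]})

diff = df_a.compare(df_b)
print(diff.columns.tolist())
# [('A', 'self'), ('A', 'other'), ('B', 'self'), ('B', 'other')]
print(diff.index.tolist())  # [1, 2]: row 0 has no differences, so it is dropped
print(diff.loc[1, ("A", "self")], diff.loc[1, ("A", "other")])  # 2.0 9.0
print(diff.loc[1, ("B", "self")])  # nan: 5 == 5, so the equal values are hidden
```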

comparison = df1.compare(df2, keep_equal=True)
(image by author)
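The sketch below tries keep_equal on hypothetical frames. It also shows keep_shape, another compare parameter: setting it to True keeps every row and column, including those without differences.

```python
import pandas as pd

# Hypothetical inputs: they differ only at (row 1, "A") and (row 2, "B")
df_a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df_b = pd.DataFrame({"A": [1, 9, 3], "B": [4, 5, 0]})

# keep_equal=True shows equal values instead of NaN
with_equal = df_a.compare(df_b, keep_equal=True)
print(with_equal.loc[1, ("B", "self")], with_equal.loc[1, ("B", "other")])  # 5 5

# keep_shape=True keeps all rows and columns, even those without differences
full = df_a.compare(df_b, keep_shape=True)
print(full.shape)                # (3, 4)
print(full.loc[0].isna().all())  # True: row 0 has no differences
```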

Conclusion

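We learned four Pandas functions that work element-wise on two DataFrames, each with a different purpose: combine applies a function to each pair of columns, combine_first returns a copy with missing values filled from the other DataFrame, update modifies a DataFrame in place, and compare shows where two DataFrames differ.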

Each of them is the right tool in particular situations, so it’s worth getting to know all four.


Thank you for reading. Please let me know if you have any feedback.

