The world’s leading publication for data science, AI, and ML professionals.

A Simple Guide to Inplace Operations in Pandas

Introduction to inplace operations in Pandas, exploring commonly supported methods and a common misconceptions

Photo by Sigmund on Unsplash
Photo by Sigmund on Unsplash

Inplace assignment operations are widely popular in transforming Pandas DataFrames. As the name suggests, the core idea behind inplace assignment is to avoid creating a new DataFrame object with each successive modification but instead making changes to the original DataFrame itself.

Inplace and Standard Assignment Operation (Image by author)
Inplace and Standard Assignment Operation (Image by author)

Inplace assignment operations are especially useful in applications with extreme memory constraints. This is because modifications are made to an existing DataFrame (or the source DataFrame) without creating any intermediate DataFrames.

This post is an introduction to inplace operations, specifically on Pandas DataFrames. We will discuss how inplace operations differ from the standard assignment operations. Towards the end, I will present some of the most common methods that support inplace assignments in Pandas.

The highlight of the article is as follows:

Introduction to Inplace Assignment Common Misconception about Inplace Assignment Popular Functions that Support Inplace Assignment Run-time Comparison Conclusion

Let’s begin 🚀 !


Introduction to Inplace Assignment

Once we load a DataFrame into the Python environment, we typically perform a wide range of transformations on the DataFrame, don’t we? These include adding new columns, renaming headers, deleting columns, altering cell values, replacing NaN values, and many more.

These can be typically performed in two ways, as depicted in the image below:

Categorization of DataFrame Manipulation in Pandas (Image by author).
Categorization of DataFrame Manipulation in Pandas (Image by author).

Standard Assignment

If the applied transformation returns a new copy of the DataFrame, it is called a "non-in-place" or "Standard Assignment" operation in Pandas.

By default, Pandas always resorts to standard assignment and returns the modified copy of the DataFrame, leaving the original DataFrame untouched.

As you may have guessed, this will practically take up additional space in the memory, which can eventually lead to memory constraints. The standard assignment operation is demonstrated in the image below and implemented in the code block following the image:

Creating a new DataFrame from a given DataFrame (Image by author).
Creating a new DataFrame from a given DataFrame (Image by author).

As a result of the standard assignment operation of filling NaN values shown above, two distinct Pandas objects will exist in the scope of the program – df and df_copy.

Inplace Assignment

In contrast to the standard/traditional assignment, if the intermediate DataFrame generated (such as df_copy above) is of no use to you, performing an inplace assignment is the ideal way to proceed. This is demonstrated below:

Performing in-place operation (Gif by author)
Performing in-place operation (Gif by author)

On a side note, you should know that inplace assignment exists in numerous functions outside Pandas as well. For instance, when you append an element to a Python list using the append() method, it is an inplace operation because the element is appended to the source list. Similarly, the sort() method on a Python list also performs inplace sorting.

Inplace operations in Pandas require the inplace argument to be passed as True to the method being invoked. For instance, we can replace the NaN values with 0 in the above DataFrame inplace as follows:

Common Misconception about Inplace Assignment

Here, one may argue that we can optimize the memory by assigning the new DataFrame back to the name of the original DataFrame as follows:

Although there’s nothing wrong with the above assignment operation syntactically, you should understand that taking this route inevitably creates a new DataFrame during the assignment operation, leading to an increase in memory utilization temporarily.

We can verify the creation of a new DataFrame object using the id() method in Python as follows:

The ID (or address) of the source DataFrame and that of the new DataFrame are different, implying the creation of a new instance of a Pandas DataFrame.

On the flip side, if you perform an inplace operation with the inplace=True argument, it does not create a new DataFrame. The modifications are made to the original DataFrame itself. This can be verified as follows:

Popular Functions that Support Inplace Assignment

There are a handful of methods in Pandas that are inherently compatible with inplace operations. A few of the most commonly used functions are demonstrated below:

1 Renaming DataFrame headers

Every column has a specific column header that defines the column’s name. You can change the name of the some/all columns as shown below:

Renaming headers of a DataFrame (Image by author)
Renaming headers of a DataFrame (Image by author)

To rename column(s) of a DataFrame, you can use the [rename()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html) method in Pandas as shown below:

The standard assignment returns a new DataFrame. Therefore, you must assign it to a variable. However, as the inplace assignment modifies the source DataFrame itself, it does not return anything.

2 Deleting column(s)

Say you want to remove unwanted column(s) from an existing DataFrame, as illustrated in the figure below:

Deleting columns from a DataFrame (Image by author)
Deleting columns from a DataFrame (Image by author)

To delete column(s) of a DataFrame, you can use the [drop()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) method in Pandas as shown below:

You should pass the columns you want to drop as a list of column names to the columns argument of the drop() method.

3 Sorting a DataFrame

The sorting operation is another widely used method in Pandas. As the name suggests, the idea here is to sort a DataFrame based on the values in one or more columns, as shown below:

Sorting a DataFrame (Image by author)
Sorting a DataFrame (Image by author)

You can use the [sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) method to sort a DataFrame as shown below:

4 Replacing NaN values

The presence of missing values is inevitable in real-world datasets. With Pandas, you have a range of methods to handle missing data and replace them with definite values, as shown below:

Replacing NaN values in a DataFrame (Image by author)
Replacing NaN values in a DataFrame (Image by author)

To replace NaN values, use the [fillna()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html) method as demonstrated below:

5 Reset Index, and many more.

If you want to reset the index values of the DataFrame, you can use the [reset_index()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html) method in Pandas. The image below demonstrates this operation:

Resetting Index of a DataFrame (Image by author)
Resetting Index of a DataFrame (Image by author)

Run-time Comparison

Next, we shall experimentally compare the run-time performance of the standard and the inplace assignment operations on the five Pandas functions discussed above – drop(), rename(), fillna(), reset_index() and sort_values().

The following code-block demonstrates my implementation to measure the run-time:

To give you a gist, the above implementation finds the average run-time of the standard assignment and the inplace assignment by iteratively performing these operations on a dummy DataFrame df.

To find the run-time performance of other methods, we can replace the reset_index() method with the method of interest (drop(), rename(), fillna() and sort_values()).

The bar-plot below depicts the results obtained.

Run-time comparison of Standard and Inplace Assignment (Image by author)
Run-time comparison of Standard and Inplace Assignment (Image by author)

In the above bar plot, I have considered the standard assignment as a reference point and scaled the performance of the inplace assignment accordingly. In other words, if the standard assignment took 100 seconds to execute, each yellow bar above represents the scaled run-time corresponding to the inplace assignment method.

Overall, the Inplace assignment operation always performed better than the standard operation.


Conclusion

To conclude, in this post, we discussed the core idea behind inplace operations in Pandas and how they differ from the standard assignment operations. Furthermore, I presented some of the most common methods that support inplace assignments in Pandas.

Although inplace operations are widely popular in Pandas, there are both pros and cons associated with them that you should know. In the interest of time, I have not discussed them in this post and will release another blog soon!

Thanks for reading!


🧑‍💻 Become a Data Science PRO! Get the FREE Data Science Mastery Toolkit with 450+ Pandas, NumPy, and SQL questions.

✉️ Sign-up to my Email list to never miss another article on data science guides, tricks and tips, Machine Learning, SQL, Python, and more. Medium will deliver my next articles right to your inbox.


Related Articles