The world’s leading publication for data science, AI, and ML professionals.

How To Delete Rows From Pandas DataFrames Based on Column Values

Discussing how to delete specific rows from pandas DataFrames based on column values

Photo by Sam Pak on Unsplash
Photo by Sam Pak on Unsplash

Introduction

Deleting rows from Pandas DataFrames based on specific conditions relevant to column values is among the most commonly performed tasks. In today’s short guide we are going to explore how to perform row deletion when

  • a row contains (i.e. is equal to) specific column value(s)
  • a particular column value of a row is not equal to another value
  • a row has null value(s) in a specific column
  • a row has non-null column values
  • multiple conditions (combination of the above) need to be met

First, let’s create an example DataFrame that we’ll reference across this article in order to demonstrate a few concepts that will help us understand how to delete rows from pandas DataFrames.

import pandas as pd
df = pd.DataFrame({
    'colA': [1, 2, 3, 4, None],
    'colB': [True, True, False, False, True],
    'colC': ['a', None, 'c', None, 'e'],
    'colD': [0.1, None, None, None, 0.5],
})
print(df)
   colA   colB  colC  colD
0   1.0   True     a   0.1
1   2.0   True  None   NaN
2   3.0  False     c   NaN
3   4.0  False  None   NaN
4   NaN   True     e   0.5

Delete row(s) containing specific column value(s)

If you want to delete rows based on the values of a specific column, you can do so by slicing the original DataFrame. For instance, in order to drop all the rows where the colA is equal to 1.0, you can do so as shown below:

df = df.drop(df.index[df['colA'] == 1.0])
print(df)
   colA   colB  colC  colD
1   2.0   True  None   NaN
2   3.0  False     c   NaN
3   4.0  False  None   NaN
4   NaN   True     e   0.5

Another approach is to reverse the condition so that you can keep all the rows where colA is not equal to 1.0. For example,

df = df[df.colA != 1.0]

Alternatively, if you want to drop rows where a column value is equal with any other value appearing in -say- a list of numbers, then you can make use of the isin() method. For instance, in order to drop all the rows where the column value of colA is 1.0, 2.0 or 3.0 then the following will do the trick:

df = df.drop(df.index[df['colA'].isin([1.0, 2.0, 3.0])])
print(df)
   colA   colB  colC  colD
3   4.0  False  None   NaN
4   NaN   True     e   0.5

In the same way, you can revert the condition in order to keep only the records that don’t satisfy the condition. For instance,

df = df[~df.colA.isin([1.0, 2.0, 3.0])]

which is equivalent to the previous expression.


Delete row whose column value is not equal to another value

In the same way, you can simply drop rows whose column value is not equal to a specific value. For instance, the following expression will drop all records not having the colA equal to 1.0:

df = df.drop(df.index[df['colA']!=1.0])
print(df)
   colA  colB colC  colD
0   1.0  True    a   0.1

Delete rows with null values in a specific column

Now if you want to drop rows having null values in a specific column you can make use of the isnull() method. For instance, in order to drop all the rows with null values in column colC you can do the following:

df = df.drop(df.index[df['colC'].isnull()])
print(df)
   colA   colB colC  colD
0   1.0   True    a   0.1
2   3.0  False    c   NaN
4   NaN   True    e   0.5

Alternatively, you can reverse the condition and keep all non-null values:

df = df[df.colC.notnull()]
print(df)
   colA   colB colC  colD
0   1.0   True    a   0.1
2   3.0  False    c   NaN
4   NaN   True    e   0.5

Delete rows with non-null values in a specific column

On the other hand, if you want to drop rows having non-null values in a specific column, you can make use of the notnull() method. For instance, to drop all rows with non-null values in column colC you need to run

df = df.drop(df.index[df['colC'].notnull()])
print(df)
   colA   colB  colC  colD
1   2.0   True  None   NaN
3   4.0  False  None   NaN

Once again, you can reverse the condition and keep only null values by using isnull method as shown below:

df = df[df.colC.isnull()]

Delete rows by combining multiple conditions

Now let’s assume that you want to drop rows by combining multiple conditions. For example, say you want to drop all the rows where both colC and colD are null. You can combine multiple expressions using the & operator as shown below.

df = df.drop(df[df['colC'].isnull() & df['colD'].isnull()].index)
print(df)
   colA   colB colC  colD
0   1.0   True    a   0.1
2   3.0  False    c   NaN
4   NaN   True    e   0.5

If you want either of the conditions to be met, then you can use the | operator. In this case, all the rows that have null values in either colC or colD will be dropped from the returned DataFrame.

df = df.drop(df[df['colC'].isnull() | df['colD'].isnull()].index)
print(df)
   colA  colB colC  colD
0   1.0  True    a   0.1
4   NaN  True    e   0.5

Final Thoughts

In today’s guide, we explored how to delete rows from pandas DataFrames based on specific conditions. Specifically, we discussed how to perform row deletion when specific column value(s) of rows are equal (or not equal) to other value(s). Additionally, we discussed how to drop rows containing null or non-null values in specific columns. Finally, we explored how to combine multiple conditions when deleting rows from DataFrames.


Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read.


You may also like

How To Select Rows From Pandas DataFrame Based on Column Values


How to Refine Your Google Search and Get Better Results


Dynamic Typing in Python


Related Articles

Some areas of this page may shift around if you resize the browser window. Be sure to check heading and document order.