Introduction
Deleting rows from Pandas DataFrames based on specific conditions relevant to column values is among the most commonly performed tasks. In today’s short guide we are going to explore how to perform row deletion when
- a row contains (i.e. is equal to) specific column value(s)
- a particular column value of a row is not equal to another value
- a row has null value(s) in a specific column
- a row has non-null column values
- multiple conditions (combination of the above) need to be met
First, let’s create an example DataFrame that we’ll reference across this article in order to demonstrate a few concepts that will help us understand how to delete rows from pandas DataFrames.
import pandas as pd
df = pd.DataFrame({
'colA': [1, 2, 3, 4, None],
'colB': [True, True, False, False, True],
'colC': ['a', None, 'c', None, 'e'],
'colD': [0.1, None, None, None, 0.5],
})
print(df)
colA colB colC colD
0 1.0 True a 0.1
1 2.0 True None NaN
2 3.0 False c NaN
3 4.0 False None NaN
4 NaN True e 0.5
Delete row(s) containing specific column value(s)
If you want to delete rows based on the values of a specific column, you can do so by slicing the original DataFrame. For instance, in order to drop all the rows where the colA
is equal to 1.0
, you can do so as shown below:
df = df.drop(df.index[df['colA'] == 1.0])
print(df)
colA colB colC colD
1 2.0 True None NaN
2 3.0 False c NaN
3 4.0 False None NaN
4 NaN True e 0.5
Another approach is to reverse the condition so that you can keep all the rows where colA
is not equal to 1.0
. For example,
df = df[df.colA != 1.0]
Alternatively, if you want to drop rows where a column value is equal with any other value appearing in -say- a list of numbers, then you can make use of the isin()
method. For instance, in order to drop all the rows where the column value of colA
is 1.0
, 2.0
or 3.0
then the following will do the trick:
df = df.drop(df.index[df['colA'].isin([1.0, 2.0, 3.0])])
print(df)
colA colB colC colD
3 4.0 False None NaN
4 NaN True e 0.5
In the same way, you can revert the condition in order to keep only the records that don’t satisfy the condition. For instance,
df = df[~df.colA.isin([1.0, 2.0, 3.0])]
which is equivalent to the previous expression.
Delete row whose column value is not equal to another value
In the same way, you can simply drop rows whose column value is not equal to a specific value. For instance, the following expression will drop all records not having the colA
equal to 1.0
:
df = df.drop(df.index[df['colA']!=1.0])
print(df)
colA colB colC colD
0 1.0 True a 0.1
Delete rows with null values in a specific column
Now if you want to drop rows having null values in a specific column you can make use of the isnull()
method. For instance, in order to drop all the rows with null values in column colC
you can do the following:
df = df.drop(df.index[df['colC'].isnull()])
print(df)
colA colB colC colD
0 1.0 True a 0.1
2 3.0 False c NaN
4 NaN True e 0.5
Alternatively, you can reverse the condition and keep all non-null values:
df = df[df.colC.notnull()]
print(df)
colA colB colC colD
0 1.0 True a 0.1
2 3.0 False c NaN
4 NaN True e 0.5
Delete rows with non-null values in a specific column
On the other hand, if you want to drop rows having non-null values in a specific column, you can make use of the notnull()
method. For instance, to drop all rows with non-null values in column colC
you need to run
df = df.drop(df.index[df['colC'].notnull()])
print(df)
colA colB colC colD
1 2.0 True None NaN
3 4.0 False None NaN
Once again, you can reverse the condition and keep only null values by using isnull
method as shown below:
df = df[df.colC.isnull()]
Delete rows by combining multiple conditions
Now let’s assume that you want to drop rows by combining multiple conditions. For example, say you want to drop all the rows where both colC
and colD
are null. You can combine multiple expressions using the &
operator as shown below.
df = df.drop(df[df['colC'].isnull() & df['colD'].isnull()].index)
print(df)
colA colB colC colD
0 1.0 True a 0.1
2 3.0 False c NaN
4 NaN True e 0.5
If you want either of the conditions to be met, then you can use the |
operator. In this case, all the rows that have null values in either colC
or colD
will be dropped from the returned DataFrame.
df = df.drop(df[df['colC'].isnull() | df['colD'].isnull()].index)
print(df)
colA colB colC colD
0 1.0 True a 0.1
4 NaN True e 0.5
Final Thoughts
In today’s guide, we explored how to delete rows from pandas DataFrames based on specific conditions. Specifically, we discussed how to perform row deletion when specific column value(s) of rows are equal (or not equal) to other value(s). Additionally, we discussed how to drop rows containing null or non-null values in specific columns. Finally, we explored how to combine multiple conditions when deleting rows from DataFrames.
Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read.
You may also like
How To Select Rows From Pandas DataFrame Based on Column Values