The world’s leading publication for data science, AI, and ML professionals.

5 Ways to Apply If-Else Conditional Statements in Pandas

Revisiting Pandas Basics and Honing Your Data Wrangling Skills

Image by muxin alkayis from Pixabay
Image by muxin alkayis from Pixabay

Creating a new column or modifying an existing column in a Pandas data frame – based on a set of if-else conditions – is probably one of the most frequently encountered problems among all different types of data wrangling tasks. In this post, I’d like to share with you my notepad which summarizes the 5 popular ways of applying if-else conditional statements in Pandas dataframes with handy and practical code snippets. For simplicity, I created a small sample dataset and will use it for demonstration purposes throughout the tutorial.

Let’s say we have a pandas dataframe as shown below. The column ‘visits_30days’ shows how many times a customer visited a website in the last 30 days. We want to create a new column that either categorizes these customers into ‘non-visitor’ or ‘visitor’ (a binary categorization) or categorizes them into multiple buckets such as ‘0 visits’, ‘1–5 visits’, ‘6-10 visits’, etc. We’ll name this new column ‘visits_category’.

Image by Author
Image by Author

Method 1: Use the numpy.where() function

The numpy.where() function is an elegant and efficient Python function that you can use to add a new column based on ‘true’ or ‘false’ binary conditions. The syntax looks like this:

np.where(condition, value if condition is true, value if condition is false)

Applying the syntax to our dataframe, our code would look like this. The new column ‘visits_category’ has the value of either ‘Yes’ or ‘No’ depending on the condition of whether the value of the ‘visits_30days’ column is equal to 0 or not.

import pandas as pd
import numpy as np
df['visits_category'] = np.where(df['visits_30days']== 0, 'YES', 'NO')
Image by Author
Image by Author

Method 2: Use the lambda function

Like np.where() , the lambda function is another superb choice when you need to add a column based on a simple binary if-else condition. The generic structure of the code using lambda function is:

df['new column name'] = df['column name'].apply(lambda x: 'value if condition is true' if x condition else 'value if condition is false')

For our specific example, the code would look like this:

df['visits_category'] = df['visits_30days'].apply(lambda x: 'YES' if x == 0 else 'NO')

Method 3: Use the numpy.select() function

Now let’s move on to a more complicated scenario. Let’s say we want to create multiple categories for ‘visits_category’ involving multiple conditions. You won’t be able to achieve it by using np.where(). Instead, you can use np.select() to apply more complicated conditions.

First, create a python list that specifies your conditions in the order of which one needs to be executed first, second, etc.

#Create a python list to specify all the conditions
conditions = [
    (df['visits_30days'] == 0),
    (df['visits_30days'] > 0) &amp; (df['visits_30days'] <= 5),
    (df['visits_30days'] > 5) &amp; (df['visits_30days'] <= 10),
    (df['visits_30days'] > 10)
    ]

Then, create a python list of values that we want to assign to each condition. Make sure each value corresponds to the condition created in the first step.

#create a python list of values to be assigned to the conditions
values = ['0 visits', '1-5 visits', '6-10 visits', '>10 visits']

Finally, create the new column using np.select() by using the two lists you created before as the arguments.

# use np.select() to create a new column
df['visits_category'] = np.select(conditions, values)

Putting the three pieces of code together you get the following:

df['visits_category'] = np.select([
    (df['visits_30days'] == 0),
    (df['visits_30days'] > 0) &amp; (df['visits_30days'] <= 5),
    (df['visits_30days'] > 5) &amp; (df['visits_30days'] <= 10),
    (df['visits_30days'] > 10)
    ], ['0 visits', '1-5 visits', '6-10 visits', '>10 visits'])
Image by Author
Image by Author

Method 4: Use the Pandas apply() function

With this method, we can first define a function that specifies the conditions. We then apply that function along columns (axis=1).

def conditions(row):
    if row['visits_30days'] == 0:
        val = '0 visits'
    elif row['visits_30days'] >0 &amp; row['visits_30days']<=5:
        val = '1-5 visits'
    elif row['visits_30days'] >5 &amp; row['visits_30days']<=10:
        val = '5-10 visits'  
    elif row['visits_30days'] >10:
        val = '>10 visits'  
    else:
        val = 'NA'
    return val

#Apply the function to each data point in the data frame
df['visits_category']= df.apply(conditions, axis=1)

Method 5: Use DataFrame.loc()

Pandas DataFrame.loc() selects rows and columns by label(s) in a given DataFrame. For example, in the code below, the first line of code selects the rows in the dataframe where the value of ‘visits_30days’ is equal to zero and assigns ‘0 visits’ to the new column ‘visits_category’ for only those rows that meet this specific condition. You then do the same thing for other conditions as well.

df.loc[(df['visits_30days'] == 0), 'visits_category'] = '0 visits' 
df.loc[(df['visits_30days'] > 0) &amp; (df['visits_30days'] <= 5), 'visits_category'] = '1-5 visits' 
df.loc[(df['visits_30days'] > 5) &amp; (df['visits_30days'] <= 10), 'visits_category'] = '5-10 visits'  
df.loc[(df['visits_30days'] > 10) , 'visits_category'] = '>10 visits'  

Learning and summarizing the most common pandas’ data wrangling techniques has always been a fun and helpful exercise in my daily data scientist life. Data wrangling is an important element in a data scientist’s day-to-day work and accounts for nearly 80% of the time spent on a data analytics project. Having a notepad nearby with all your favorite code snippets sorted and summarized is a great and efficient way to improve your productivity. I hope you enjoyed this tutorial and thanks for reading.


Data Source: The sample dataset used in this tutorial was created by the author for demonstration purposes.

You can unlock full access to my writing and the rest of Medium by signing up for Medium membership ($5 per month) through this referral link. By signing up through this link, I will receive a portion of your membership fee at no additional cost to you. Thank you!


Related Articles