
Lambda Functions 101: From Zero to Hero

Learn how to use lambda functions with Pandas – code along


Image by Vlada Karpovich. Source: Pexels

When I started learning Python, I remember how lost I felt when I got to the section on lambda functions. I read articles and watched videos about them, but it took me a while to learn how to write complex lambda functions for my data science projects. One of the biggest problems with articles on the internet is that they explain how lambda functions work using a simple example that is easy to understand, but in real life, it’s never as easy as they show us. For this reason, in this article, I will explain how to use lambda functions in your data science projects, and how to build and read them.

What is a lambda function?

A lambda function in Python is a function that has no name, also called an anonymous function. It’s usually a small function restricted to one line, and just like any other function, it can take multiple arguments, but its body is a single expression. Are you still confused? Fear no more. Let’s apply it to a project, and it will be easier to understand how it works. I will use the famous Titanic dataset. We will do some data cleaning and feature engineering using lambda. You can find the full notebook here.
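Before moving to the dataset, here is a minimal sketch of that definition. The names add_def, add_lambda, and names are made up for illustration; the point is only to show the same logic written with def and with lambda.

```python
# A regular, named function
def add_def(a, b):
    return a + b

# The same logic as a lambda: arguments before the colon, one expression after it
add_lambda = lambda a, b: a + b

print(add_def(2, 3))     # 5
print(add_lambda(2, 3))  # 5

# Lambdas are most useful when passed directly to another function
names = ["Rose", "Jack", "Cal"]
print(sorted(names, key=lambda name: len(name)))  # ['Cal', 'Rose', 'Jack']
```

In practice you rarely assign a lambda to a variable like this; it usually goes straight into a call such as sorted() or apply().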

First, let’s import all the packages we will need. I will use pyforest to import pandas, seaborn, and matplotlib with one line of code. I talked about pyforest in this blog. Then, I’m importing the dataset we will be using. You can find the dataset here. This project’s main objective is to create a machine learning model that predicts whether a passenger survived the Titanic or not. I will not be focusing on the models in this blog, but I will work on the dataset with this objective in mind.

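# In a Jupyter notebook, pyforest makes pd, sns, and plt available lazily.
# In a plain script, the star import below is what exposes those names.
from pyforest import *
train = pd.read_csv('train.csv')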

Let’s check the dataset that we have and see what we can do. I used train.head(2) to check the top 2 rows.

It seems that we have some categorical and numerical values in our dataset. Let’s double-check that using train.dtypes so that we can see the data types.
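If you don’t have the dataset at hand, here is a small sketch of these two checks. The three rows below are made up for illustration (they are not real Titanic records) and only mimic a few of the columns.

```python
import pandas as pd

# Made-up rows for illustration only; the real train.csv has more columns
toy = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Sex": ["male", "female", "female"],
    "Fare": [7.25, 71.28, 7.92],
})

# First two rows of the table
print(toy.head(2))

# One data type per column; text columns such as Sex are reported as
# object (or as a string dtype, depending on your pandas version)
print(toy.dtypes)
```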

We can see that the Sex column is an object. For machine learning, we should use numerical values, and this is our first opportunity to use lambda. I will assign 1 to males and 2 to females. That’s a straightforward task, and here is how we can do that.

train['Sex'] = train['Sex'].apply(lambda x: 1 if x == 'male' else 2)

OK, what is going on here? Let’s take it step by step. Instead of creating a new column, I will just assign the numerical values to the existing Sex column. For this reason, I’m using train['Sex'] =. If I were going to create a new column, I would use a column name that isn’t in the dataset yet. Then, I’m writing the name of the column that holds the values I’m interested in. In this case, it’s the same train['Sex'] column. So far, we have train['Sex'] = train['Sex']. The next step is using apply(), which lets us pass a function and apply it to every single value of the pandas Series.

Inside apply() we will write our lambda function, and here is where the fun lives. Let’s take it step by step. First, let’s check the lambda x: part. Here, we are starting the lambda and naming the variable that will stand for each cell in the chosen column. x represents every value in the Sex column. You could write any letter or word; it’s just a convention to use x. Now, let’s check the 1 if x == 'male' else 2 part. Here is what this part means: return 1 if x is equal to the word ‘male’; if it’s not, return 2. Can you see what is happening here? Let’s break this down one more time. Here is how each part should be read.

# Store the result in this column
train['Sex'] =
# This is the column that holds the values I'm looking for
train['Sex'].apply()
# Start the lambda function; x stands for each value in the column
lambda x: 
# Return 1 if the value is equal to 'male'
1 if x == 'male' 
# If the value is not equal to 'male', return 2
else 2
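To see the pieces working together without loading the full dataset, here is a minimal sketch on a made-up Series (the four values are my own, for illustration). It also shows map() with a dictionary, which is a common alternative for this kind of one-to-one replacement.

```python
import pandas as pd

# Made-up values for illustration
sex = pd.Series(["male", "female", "female", "male"])

# apply() calls the lambda once per value; x is the current value
encoded = sex.apply(lambda x: 1 if x == "male" else 2)
print(encoded.tolist())  # [1, 2, 2, 1]

# The same encoding with a dictionary lookup via map()
encoded_map = sex.map({"male": 1, "female": 2})
print(encoded_map.tolist())  # [1, 2, 2, 1]
```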

I could also spell out both conditions. Note that a conditional expression in Python must end with an else, so we need a fallback for any other value (here, None):

train['Sex'] = train['Sex'].apply(lambda x: 1 if x == 'male' else 2 if x == 'female' else None)

You can read the example above as: return 1 if x is equal to ‘male’; otherwise, return 2 if x is equal to ‘female’; otherwise, return None. Does it make sense now? Let’s look at a more complex example to make sure you understand.
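Here is a small sketch of how a two-way and a three-way version differ on values that are neither ‘male’ nor ‘female’. The names two_way and three_way and the input strings are made up for illustration, and using None as the fallback is my own choice.

```python
# Anything that is not 'male' becomes 2, including typos and empty strings
two_way = lambda x: 1 if x == "male" else 2

# A conditional expression must end with an else; None is the fallback here
three_way = lambda x: 1 if x == "male" else 2 if x == "female" else None

for value in ["male", "female", "Male", ""]:
    print(repr(value), two_way(value), three_way(value))
# 'male' 1 1
# 'female' 2 2
# 'Male' 2 None
# '' 2 None
```

The three-way version makes unexpected values visible instead of silently labeling them as 2.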

Let’s now turn our attention to the Fare column. I plotted it so that we can see the fare distribution. There might be an opportunity for some feature engineering here.

Image created by the author

We can clearly see that most of the tickets cost less than £100 (I am assuming the ticket prices were in pounds). Most of them cost less than £50, and very few cost more than £100. A tiny number of tickets cost more than £200, and some even £500. We can create new values using these numbers. I will create categories for these tickets: the first one for those who paid £200 or more, another one for those who paid between £100 and £200, and so on. I’m assuming that the staff did not pay anything, so I will create a category for those who paid £0. Here’s my lambda function:

train['fare_category'] = train['Fare'].apply(
                        lambda x: 1 if x >= 200 
                        else 2 if x >= 100 and x < 200
                        else 3 if x >= 50 and x < 100
                        else 4 if x < 50 and x > 0
                        else 5)

That’s a lot of information, but don’t worry, let’s take it step by step. I’m creating a new feature called fare_category using the values in the Fare column. Then, I started the function just like I did in the first example. Here’s how you can read everything:

# Return 1 if x is greater than or equal to 200
lambda x: 1 if x >= 200 
# Otherwise, return 2 if x is greater than or equal to 100 and less than 200
else 2 if x >= 100 and x < 200 
# Otherwise, return 3 if x is greater than or equal to 50 and less than 100
else 3 if x >= 50 and x < 100 
# Otherwise, return 4 if x is less than 50 and greater than 0
else 4 if x < 50 and x > 0 
# Otherwise, return 5 for all the other cases
else 5
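To check the logic, here is a minimal sketch that runs the same categories on a handful of made-up fares chosen to hit every branch. Because each else is only reached when the earlier conditions have failed, the upper bounds are already implied, so this version leaves them out. That is a simplification of mine; the results should match the longer form.

```python
import pandas as pd

# Made-up fares chosen to hit every branch, including the boundaries
fares = pd.Series([512.33, 200.0, 150.0, 100.0, 75.5, 50.0, 7.25, 0.0])

categories = fares.apply(
    lambda x: 1 if x >= 200
    else 2 if x >= 100   # only reached when x < 200
    else 3 if x >= 50    # only reached when x < 100
    else 4 if x > 0      # only reached when x < 50
    else 5
)
print(categories.tolist())  # [1, 1, 2, 2, 3, 3, 4, 5]
```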

I know, that’s a lot, but I hope this makes sense. This is as difficult as it can get. Lambda functions were not made to be this long, and sometimes it’s just better to go with a regular function. However, I feel that lambda functions can make your code cleaner and easier to read.
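For comparison, here is a sketch of the same fare categories written as a regular function. The function name fare_category and the sample fares are my own, for illustration.

```python
def fare_category(fare):
    """Return a category from 1 (most expensive) to 5 (everything else)."""
    if fare >= 200:
        return 1
    if fare >= 100:
        return 2
    if fare >= 50:
        return 3
    if fare > 0:
        return 4
    return 5

print([fare_category(f) for f in [250.0, 120.0, 60.0, 8.05, 0.0]])  # [1, 2, 3, 4, 5]

# With pandas, pass the function by name instead of a lambda:
# train['fare_category'] = train['Fare'].apply(fare_category)
```

A named function like this takes more lines, but it can carry a docstring and be tested on its own, which helps once the logic grows beyond two or three conditions.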

Final thoughts

If you got to this point, congratulations! You just took one step forward in becoming a great programmer. I really hope I was able to explain this clearly. The best way to learn is by practicing. For this reason, I recommend that you play around with the examples I gave you, and maybe create a few more of your own. Let me know if I was able to help you.

