Let’s be honest: your feature engineering skills probably aren’t at the level you want them to be, and you don’t know what to do about it. For a long time (longer than I’d like to admit) I used all kinds of Pandas operations spread across multiple lines just to perform simple tasks, like extracting part of one column into another, when in reality most of this can be done in one line of code.

List comprehensions are one way of doing so. If I were trying to sell you something, now would be the time to tell you how simple and intuitive they are. And they are (after you spend a couple of days playing around with them), but to a beginner they might seem intimidating.
This article will try to explain list comprehensions in the simplest language possible and show their use cases in Data Science. There’s not much point in me rambling on, so let’s dive straight in.
Introducing the Titanic Dataset
I’m certain you’re aware of the famous Titanic dataset and have probably used it before. Even if you haven’t, there’s nothing to worry about: just download it from this URL and load it into Python. Here’s what you should have:

Awesome.
Let’s start with something simple.
Use case 1: Convert ‘Sex’ to 0 and 1
This is perhaps the simplest use case for list comprehensions. The idea is that you want to do the following conversion:
- male -> 0
- female -> 1
Now try to do this without list comprehensions. It isn’t that difficult, I know, but take a look at this nifty one-liner:
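In case the snippet is not visible, here is a minimal sketch of what such a one-liner might look like. It runs on a plain list so that it works on its own; the `sex` list is a stand-in for the column, and `df` in the comment is assumed to be the name of the loaded Titanic DataFrame.

```python
# Stand-in for the 'Sex' column; the values follow the Titanic dataset.
sex = ["male", "female", "female", "male"]

# A conditional expression inside a list comprehension:
# 'male' becomes 0, anything else becomes 1.
sex_encoded = [0 if value == "male" else 1 for value in sex]
print(sex_encoded)  # [0, 1, 1, 0]

# On a DataFrame (assuming it is named df) the same idea would read:
# df["Sex"] = [0 if value == "male" else 1 for value in df["Sex"]]
```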

Now that was clean. But this most likely isn’t the reason you’re reading this article – you want to see something more advanced.
Use case 2: Extract Titles from the ‘Name’ column
Now, this is slightly more advanced. Do you see the Name column? Notice how the title (like Mr, Miss, etc.) starts right after the first comma and ends at the dot?
That’s simple enough to extract. First, you split the string at the comma, and grab everything to the right, then you split on the dot, and grab everything to the left:
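The two splits described above can be sketched roughly like this. The `names` list is a stand-in holding a few values in the format of the Name column, and the extra `strip()` is my own addition to drop the space left after the comma.

```python
# Stand-in for the 'Name' column, in the "Surname, Title. First names" format.
names = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]

# Split at the comma and keep the right side, then split at the dot
# and keep the left side; strip() removes the leading space.
titles = [name.split(",")[1].split(".")[0].strip() for name in names]
print(titles)  # ['Mr', 'Mrs', 'Miss']

# On a DataFrame (assuming it is named df) the same idea would read:
# df["Title"] = [n.split(",")[1].split(".")[0].strip() for n in df["Name"]]
```

The column name `Title` in the last comment is only an illustration; pick whatever name suits your dataset.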

Cool, I know, but you can take this a step further. Let’s see how.
Use case 3: Separate ‘Age’ into 3 Groups
Until recently I didn’t know you could chain multiple if/else conditions inside one list comprehension. Turns out you can, but the deeper you go, the messier your code looks in the end.
The idea is to divide the Age column into 3 groups:
- below_20 -> Age < 20
- 20_to_50 -> 20 <= Age < 50
- above_50 -> Age >= 50
And this can be done in one line of code. Let’s see how:
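Here is a minimal sketch of how the chained conditions might look. The label names are illustrative, and sending an age of exactly 20 or 50 to the higher group is an assumption on my part. The expression is wrapped over several lines only for readability; it is still a single list comprehension.

```python
# Stand-in for the 'Age' column.
ages = [22, 38, 4, 54, 20, 50]

# The first condition splits off everyone under 20; the second
# condition divides the remaining ages at 50.
age_groups = [
    "below_20" if age < 20 else "20_to_50" if age < 50 else "above_50"
    for age in ages
]
print(age_groups)
# ['20_to_50', '20_to_50', 'below_20', 'above_50', '20_to_50', 'above_50']
```

One thing to watch for: a missing age (NaN) fails both comparisons and would land in the last group, so it is worth filling or dropping missing values first.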

First, you split the ages into two groups (below_20 and everything else), and then divide the second group in two (20_to_50 and above_50). This particular example isn’t complex, but it would get messy if you tried to add another if statement to this list comprehension.
Before you leave
List comprehensions are awesome, that’s not up for debate. As a data scientist, you should use them whenever possible to get more done in fewer lines of code.
Take some of your own datasets and try to work list comprehensions into the cleaning and preparation process. You’ll never look back.
I want to hear from you: how complex was the logic you managed to fit into a single line of code? Let me know.