
Better Pythoning 2: List Comprehensions (including NLP examples with spaCy)

How to shorten your code and make it more readable in a few simple steps

If you’re a beginner in Data Science, then you’ve likely come across List Comprehensions (cool one-line for loops). Have you wondered what these are, why they are so ubiquitous, and when they should be used?

In this article, I’m going to briefly explain List Comprehensions with examples to make them digestible. I’ll specifically highlight how they differ from for loops and whether or not they are more efficient.

This is the second of a four-part series on using one-line statements to shorten code. The previous article in this series looked at Ternary Operators (cool one-line if statements). This article will build upon the previous one to show you how, together, List Comprehensions and Ternary Operators can make your code much shorter and more readable.

I hope you enjoy reading the article, learn something new and improve your Coding skills.


List Comprehensions – fancy for-loops, kind of?

In Python, a for loop allows you to loop through an iterable to perform repetitive tasks.

The classic example is printing the numbers 1 to 10:
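A minimal version of that loop, assuming num is set to 10, might look like this:

```python
num = 10

# range(num) yields 0, 1, ..., num - 1, so add 1 to print 1 through 10
for i in range(num):
    print(i + 1)
```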

Here, range(num) is an iterable that produces the numbers from 0 to num - 1 (so a total of 10 numbers in this example). We can recreate this using a list comprehension:

The above statement reads: print(i+1) for each i in range(num). We’ve essentially repeated the same task using one fewer line, which doesn’t appear to save much space. So are these even useful?
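A sketch of the equivalent one-liner, again assuming num is set to 10:

```python
num = 10

# The whole loop is written as a single expression inside square brackets
[print(i + 1) for i in range(num)]
```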

Of course.

If we are sharp-eyed, we notice that we’ve actually created a list. We can equally assign it to a variable, for example:

numbers = [i + 1 for i in range(num)]

This is quite easy to read: create an element in the list with value i+1 for every i in range(num). If we print numbers, we’ll get:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You may be wondering what the print example created. Since we ran print 10 times, we created a list of None values (since print returns None):

[None, None, None, None, None, None, None, None, None, None]

This makes List Comprehensions quite powerful for setting variables, since we are effectively creating values and storing them in a list as we iterate. In contrast, a for loop doesn’t produce a value on its own; anything it builds has to be created before the loop and updated inside it.

Compare this with how many lines are required to create the same list using a for loop:
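As a rough side-by-side, assuming num is 10 (the name numbers_loop is only used here to keep the two results apart):

```python
num = 10

# for-loop version: create an empty list first, then append inside the loop
numbers_loop = []
for i in range(num):
    numbers_loop.append(i + 1)

# list comprehension version: one line
numbers = [i + 1 for i in range(num)]

print(numbers_loop == numbers)  # True
```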

Aside from the fact that list comprehensions are shorter, it’s worth noting that they are much more readable.

It’s also worth noting very briefly that the same syntax can be used for creating sets and dicts (these are called set and dict comprehensions). For example:
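A short sketch of both forms; the variable names remainders and squares are made up for illustration:

```python
num = 10

# Set comprehension: curly braces, duplicate values are dropped
remainders = {i % 3 for i in range(num)}

# Dict comprehension: curly braces with key: value pairs
squares = {i: i ** 2 for i in range(num)}

print(remainders)
print(squares)
```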


List Comprehensions and Ternary Operators

List Comprehensions can unlock a new level of readability when you use them alongside Ternary Operators.

Example 1: Suppose you are a teacher with a dictionary of student names and their exam scores, and you want to extract the names of the students who are not doing so well, so that you can provide them with more support.

You can do this with for loops, but using List Comprehensions is much easier (and more readable):

The list comprehension reads: save student for every student and score in student_scores.items() where score is less than or equal to the failing_threshold. Strictly speaking, an if at the end of a comprehension is a filter clause rather than a full ternary expression, because it has no else branch.
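A sketch of what this could look like. The names and scores are invented, and the threshold of 50 is an assumption:

```python
# Hypothetical data: names and scores are made up for illustration
student_scores = {"Alice": 82, "Bob": 45, "Chloe": 50, "Dev": 91}
failing_threshold = 50  # assumed pass mark

# Keep only the students whose score is at or below the threshold
failing_students = [
    student
    for student, score in student_scores.items()
    if score <= failing_threshold
]

print(failing_students)  # ['Bob', 'Chloe']
```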

It’s worth noting that List Comprehensions in this context are only useful for reducing data. A single comprehension produces a single list, so it could not, for example, store the failing students and the passing ones in two different lists in one pass. For that you’d use a for loop.

Ternary Operators are also useful if you want to treat the elements of the list differently subject to a condition.

Example 2: Suppose you had the same dataset with students and their scores, but you wanted to create a target variable, ‘fail’, that gives a 1 if the student has failed and a 0 if the student has passed.
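A sketch of that for loop, with made-up names and scores and an assumed threshold of 50:

```python
# Hypothetical data: names and scores are made up for illustration
student_scores = {"Alice": 82, "Bob": 45, "Chloe": 50, "Dev": 91}
failing_threshold = 50  # assumed pass mark

failing_students, passing_students = [], []

# One pass over the data, routing each student to one of two lists
for student, score in student_scores.items():
    if score <= failing_threshold:
        failing_students.append(student)
    else:
        passing_students.append(student)

print(failing_students)  # ['Bob', 'Chloe']
print(passing_students)  # ['Alice', 'Dev']
```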

You can do it like this:
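One possible version, using invented scores and an assumed threshold of 50:

```python
# Hypothetical data: names and scores are made up for illustration
student_scores = {"Alice": 82, "Bob": 45, "Chloe": 50, "Dev": 91}
failing_threshold = 50  # assumed pass mark

# The ternary (1 if ... else 0) decides the value stored for each student
fail = [
    1 if score <= failing_threshold else 0
    for score in student_scores.values()
]

print(fail)  # [0, 1, 1, 0]
```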

Note, this is actually a trivial example, since the if statements and ternary operators can all be replaced by int(score <= failing_threshold).

However, we notice that List Comprehensions can become fairly complicated (and long) when we use them with ternary operators. This is something to keep in mind when trying to make your code more readable.


Two dimensional List Comprehensions?

List Comprehensions can be chained (that is, given more than one for clause), which allows them to go beyond 1 dimension. Let me first explain an example where you might need this.

I recently ran into a problem where I needed to flatten a list of lists, so for example:

list_of_lists = [[1,2], [3,4], [5,6]]

And I wanted to convert that into list_of_lists = [1,2,3,4,5,6]

My first instinct was to use the unpack operator, *, but this is not supported in list comprehensions. My logic was:

Instead, I found that I needed to essentially chain two for clauses inside one List Comprehension to be able to flatten it:

The confusing bit is that the logic doesn’t follow what you might expect: my intuition was num for num in list_ for list_ in list_of_lists, but Python requires the for clauses in the opposite order, outermost first, exactly as you would write them as nested for loops.
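A sketch of the flattening step, with the equivalent nested for loops shown for comparison (the names flat and flat_loop are only for illustration):

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]

# Outer loop first, inner loop second: the same order as nested for loops
flat = [num for list_ in list_of_lists for num in list_]
print(flat)  # [1, 2, 3, 4, 5, 6]

# Equivalent nested for loops
flat_loop = []
for list_ in list_of_lists:
    for num in list_:
        flat_loop.append(num)
```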

It’s also possible to add ternary operators here, acting on each List Comprehension separately, but as you can imagine that makes the code quite hard to read, so I personally avoid getting to a point like this. I’ve just added it for completeness’ sake.
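To give a feel for how quickly this gets hard to read, here is one hypothetical combination. The two conditions are arbitrary choices for illustration: a filter on the outer loop and a ternary on each number.

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]

# Outer filter: skip any sub-list that starts with 3
# Ternary: keep even numbers, replace odd numbers with 0
result = [
    num if num % 2 == 0 else 0
    for list_ in list_of_lists
    if list_[0] != 3
    for num in list_
]

print(result)  # [0, 2, 0, 6]
```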


Applied real life example with spaCy

I’m going to give you a real life Data Science example of how useful List Comprehensions can be, in particular with NLP.

I used spaCy to create Doc objects from text that I had. This means that each token is actually an object with attributes that describe what that token is (i.e. whether it’s a noun, a verb, a number, etc.).

It’s great that spaCy does all of these annotations, because the cleaning process becomes quite easy on my end (and also very readable):
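A minimal sketch of that kind of cleaning step. The function name clean_tokens, the sample sentence and the choice of filters (stop words, punctuation, number-like tokens) are assumptions for illustration, not the original code. It uses spacy.blank("en"), which only tokenizes; part-of-speech tags and lemmas would need a trained pipeline.

```python
def clean_tokens(doc):
    """Lowercase the tokens that are not stop words, punctuation, or number-like."""
    return [
        token.text.lower()
        for token in doc
        if not (token.is_stop or token.is_punct or token.like_num)
    ]


def demo():
    # Imported here so that clean_tokens itself has no hard dependency on spaCy
    import spacy

    nlp = spacy.blank("en")  # tokenizer only, no model download needed
    doc = nlp("The 3 students were reading two books about comprehensions!")
    print(clean_tokens(doc))
```

Calling demo() prints the cleaned tokens for the sample sentence. The filter reads almost like plain English, which is the main appeal of writing it as a comprehension.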


Conclusion

List Comprehensions are essentially a way of creating lists from iterables. They provide a simple way of summarising a lot of code in 1 line, making code very readable.

Key takeaways:

  • A List Comprehension is a one-line for loop that is used to create new lists from iterables. Unlike a for loop, it is an expression, so it evaluates to a value (the new list)
  • They are very useful for creating new lists from already existing ones subject to conditions, using either a trailing if filter or a Ternary Operator
  • In my experiment they were roughly 30% faster than the equivalent for loop (see Appendix)
  • They can be chained to work along higher dimensions, but this loses readability
  • They can be used for dicts, sets and lists
  • As always, more complex cases are best reserved for for loops and if-statements

Overall, I find these Comprehensions very elegant, and try to use them for the use cases described above. As you saw, List Comprehensions really shine when used with Ternary Operators. However, they can also be combined with Lambda functions (cool one-line functions), which can make your code much more compact. This will be the subject of the next article in the series. A brief sneak peek:
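As a rough idea of what that combination could look like (an illustration only, with a made-up add_one helper, not taken from the next article):

```python
# Hypothetical helper written as a lambda (a one-line function)
add_one = lambda x: x + 1

# The lambda is applied to every element inside the comprehension
numbers = [add_one(i) for i in range(10)]

print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```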


Appendix:

As in my previous article, I wanted to experiment to see if List Comprehensions are faster than for loops. The experiment for Ternary Operators showed that they are marginally less efficient than regular if statements.

For this, I ran the student names and scores experiment on a dict containing 1000 student names, and repeated it 10000 times. It’s worth noting that in this context, you must use an if statement in the for loop case, and a conditional inside the comprehension in the list comprehension case.

We can see here that List Comprehensions are noticeably faster, by roughly 30%, which gives us even more reason to use them.
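A sketch of how such a comparison could be set up with the standard timeit module. The generated names and scores, the threshold of 50 and the function names are all assumptions, and timings will vary by machine and Python version:

```python
import timeit

# Hypothetical data: 1000 made-up students with scores cycling through 0-99
student_scores = {f"student_{i}": i % 100 for i in range(1000)}
failing_threshold = 50  # assumed pass mark
repeats = 10_000


def with_for_loop():
    failing = []
    for student, score in student_scores.items():
        if score <= failing_threshold:
            failing.append(student)
    return failing


def with_comprehension():
    return [
        student
        for student, score in student_scores.items()
        if score <= failing_threshold
    ]


# Total time in seconds for `repeats` calls of each version
loop_time = timeit.timeit(with_for_loop, number=repeats)
comp_time = timeit.timeit(with_comprehension, number=repeats)

print(f"for loop:           {loop_time:.3f} s")
print(f"list comprehension: {comp_time:.3f} s")
```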

