There are lots of great Python libraries, but most of them don’t come close to what the built-in itertools module and the third-party more-itertools library provide. These two libraries are really the whole kitchen sink when it comes to processing or iterating over data in Python. At first glance, however, the functions in those libraries might not seem that useful, so let’s take a little tour of (in my opinion) the most interesting ones, including examples of how to get the most out of them!
Compress
You have quite a few options when it comes to filtering sequences. One of them is compress, which takes an iterable and a boolean selector and outputs the items of the iterable whose corresponding element in the selector is True.
We can use this to apply the result of filtering one sequence to another, for example to create a list of dates where the corresponding count is greater than 3.
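A minimal sketch of that dates-and-counts filter could look like this. The dates and counts are made up for illustration.

```python
from itertools import compress

# Made-up sample data: one count per date
dates = [
    "2020-01-01", "2020-02-04", "2020-02-01", "2020-01-24",
    "2020-01-08", "2020-02-10", "2020-02-15", "2020-02-11",
]
counts = [1, 4, 3, 8, 0, 7, 9, 2]

# Build the boolean selector from one sequence...
selectors = [count > 3 for count in counts]

# ...and apply it to the other
busy_dates = list(compress(dates, selectors))
print(busy_dates)
# ['2020-02-04', '2020-01-24', '2020-02-10', '2020-02-15']
```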
Accumulate
As the name suggests, we use this function to accumulate the results of some (binary) function. Examples of this are a running maximum or a factorial:
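Here is a short sketch of both ideas, using made-up input data:

```python
from itertools import accumulate
import operator

data = [3, 4, 1, 3, 5, 6, 9, 0, 1]

# Running maximum: each output is the largest value seen so far
running_max = list(accumulate(data, max))
print(running_max)
# [3, 4, 4, 4, 5, 6, 9, 9, 9]

# Running product of 1..10, i.e. 1!, 2!, ..., 10!
factorials = list(accumulate(range(1, 11), operator.mul))
print(factorials)
# [1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
```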
If you don’t care about intermediate results, you could use functools.reduce (called fold in other languages), which keeps only the final value and so needs less memory than collecting every intermediate result.
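For comparison, a sketch of the same factorial computed with functools.reduce, which returns a single value instead of a sequence:

```python
from functools import reduce
import operator

# Only the final product is returned; no intermediate values are kept
factorial_10 = reduce(operator.mul, range(1, 11))
print(factorial_10)
# 3628800
```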
Cycle
This function takes an iterable and creates an infinite cycle from it. This can be useful, for example, in a game where players take turns. Another cool thing you can do with cycle is to create a simple infinite spinner:
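The sketch below shows both uses. The player names are made up, and islice cuts the infinite cycle short so that the example terminates. A real spinner would instead print each frame in a loop, typically with a carriage return and a short sleep.

```python
from itertools import cycle, islice

# Players taking turns, round after round
players = ["John", "Ben", "Martin", "Peter"]
turns = cycle(players)
first_six_turns = list(islice(turns, 6))
print(first_six_turns)
# ['John', 'Ben', 'Martin', 'Peter', 'John', 'Ben']

# Spinner frames: the same four characters repeated forever
spinner = cycle("-\\|/")
frames = list(islice(spinner, 5))
print(frames)
# ['-', '\\', '|', '/', '-']
```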
Tee
The final one from the itertools module is tee. This function creates multiple independent iterators from a single one, which lets us look at a value again after one of the iterators has already moved past it. An example of that is the pairwise function from the itertools recipes (also available in more_itertools), which returns pairs of values from the input iterable (the previous value and the current one):
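A sketch of pairwise along the lines of the itertools recipe is shown below. On Python 3.10 and later, itertools.pairwise is available directly, so this is mainly to illustrate how tee is used.

```python
from itertools import tee

def pairwise(iterable):
    """Return overlapping pairs: (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)   # two independent iterators over the same data
    next(b, None)          # advance the second one by a single step
    return zip(a, b)

pairs = list(pairwise([1, 2, 3, 4, 5]))
print(pairs)
# [(1, 2), (2, 3), (3, 4), (4, 5)]
```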
This function is handy every time you need multiple separate pointers to the same stream of data. Be careful when using it though, as it can be pretty costly when it comes to memory: tee has to buffer every item that one iterator has consumed and another has not. It is also important to note that you should not use the original iterable after you call tee on it, as that advances it without the new tee objects knowing about it, so they will silently skip items.
more_itertools
Now, let’s have a closer look at what the more_itertools library offers, as there are many interesting functions that you might not have heard about.
Divide
First up from more_itertools is divide. As the name suggests, it divides an iterable into a given number of sub-iterables. The sub-iterables might not all have the same length, as that depends on the number of elements being divided and the number of sub-iterables.
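As a small sketch (assuming more_itertools is installed), here are seven made-up items divided into three parts. Note that the number of parts comes first in the call.

```python
from more_itertools import divide

data = [1, 2, 3, 4, 5, 6, 7]

# divide(n, iterable) returns n sub-iterables; earlier ones get the extra items
parts = [list(part) for part in divide(3, data)]
print(parts)
# [[1, 2, 3], [4, 5], [6, 7]]
```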
Partition
With this function we will also be dividing our iterable, this time, however, using a predicate:
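The two sketches below use made-up dates, file names, a cutoff date and a list of allowed extensions. One thing worth knowing is that partition returns the items for which the predicate is false first, and the items for which it is true second.

```python
from datetime import datetime
from os.path import splitext
from more_itertools import partition

# Example 1: split dates into old and recent ones (cutoff is an assumption)
dates = [
    datetime(2015, 1, 15), datetime(2020, 1, 16), datetime(2020, 1, 17),
    datetime(2019, 2, 1), datetime(2020, 2, 2), datetime(2018, 2, 4),
]
cutoff = datetime(2020, 1, 1)
old, recent = partition(lambda date: date >= cutoff, dates)
old, recent = list(old), list(recent)
print([d.year for d in old])      # [2015, 2019, 2018]
print([d.year for d in recent])   # [2020, 2020, 2020]

# Example 2: split file names by whether their extension is allowed
files = ["foo.jpg", "bar.exe", "baz.gif", "text.txt", "data.bin"]
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".bmp", ".png")
is_allowed = lambda name: splitext(name)[1].lower() in ALLOWED_EXTENSIONS
forbidden, allowed = partition(is_allowed, files)
forbidden, allowed = list(forbidden), list(allowed)
print(allowed)     # ['foo.jpg', 'baz.gif']
print(forbidden)   # ['bar.exe', 'text.txt', 'data.bin']
```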
In the first example above, we are splitting a list of dates into recent ones and old ones, using a simple lambda function. In the second example we are partitioning files based on their extension, again using a lambda function, which splits the file name into name and extension and checks whether the extension is in the list of allowed ones.
Consecutive_groups
If you need to find runs of consecutive numbers, dates, letters, booleans or any other orderable objects, then you might find consecutive_groups handy:
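A sketch with made-up dates could look like this. Each group is consumed right away inside the list comprehension, because the groups share the underlying iterator.

```python
from datetime import datetime
from more_itertools import consecutive_groups

dates = [
    datetime(2020, 1, 15), datetime(2020, 1, 16), datetime(2020, 1, 17),
    datetime(2020, 2, 1), datetime(2020, 2, 2),
    datetime(2020, 2, 4),
]

# Convert to ordinal day numbers so that "consecutive" means "next day"
ordinal_dates = [date.toordinal() for date in dates]

# Group the ordinals, then convert each group back to datetime objects
groups = [
    list(map(datetime.fromordinal, group))
    for group in consecutive_groups(ordinal_dates)
]
print([len(group) for group in groups])
# [3, 2, 1]
```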
In this example, we have a list of dates, some of which are consecutive. To be able to pass these dates to the consecutive_groups function, we first have to convert them to ordinal numbers. Then, using a list comprehension, we iterate over the groups of consecutive ordinal dates created by consecutive_groups and convert them back to datetime.datetime using the map and fromordinal functions.
Side_effect
Let’s say you need to cause a side effect when iterating over a list of items. This side effect could be, for example, writing logs, writing to a file or, as in the example below, counting the number of events that occurred:
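Here is a minimal sketch of such a counter. The events list is a made-up placeholder, and consume (also from more_itertools) is used only to exhaust the iterator.

```python
from more_itertools import consume, side_effect

num_events = 0

def increment_num_events(_):
    """Count one event; the event itself is ignored."""
    global num_events
    num_events += 1

# Hypothetical stream of events
events = ["click", "scroll", "click", "keypress"]

# Nothing happens yet: side_effect is lazy
event_iterator = side_effect(increment_num_events, events)

# The function is called once per item as the iterator is consumed
consume(event_iterator)
print(num_events)
# 4
```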
We declare a simple function that increments a counter every time it’s invoked. This function is then passed to side_effect along with a non-specific iterable called events. Later, when the event iterator is consumed, it calls increment_num_events for each item, giving us the final event count.
Collapse
This is a more powerful version of another more_itertools function called flatten, which removes only one level of nesting. collapse allows you to flatten multiple levels of nesting. It also allows you to specify a base type that should not be flattened, so that you can stop flattening with one layer of lists/tuples remaining. One use case for this function would be flattening of a Pandas DataFrame. Here are a few more general-purpose examples:
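The sketch below covers three cases: collapsing the output of os.walk, collapsing a nested-list tree, and stopping at tuples with base_type. The temporary directory layout and the tree values are made up so that the example is self-contained.

```python
import os
import tempfile
from more_itertools import collapse

# Example 1: os.walk yields (dirpath, dirnames, filenames) tuples;
# collapse turns them into one flat stream of strings
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for relative_path in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(root, relative_path), "w").close()
    entries = list(collapse(os.walk(root)))
print(len(entries))
# 5  (two directory paths, one directory name, two file names)

# Example 2: a tree stored as nested lists, flattened to a list of nodes
tree = [40, [25, [10, 3, 17], [32, 30, 38]], [78, 50, 93]]
nodes = list(collapse(tree))
print(nodes)
# [40, 25, 10, 3, 17, 32, 30, 38, 78, 50, 93]

# Example 3: base_type=tuple keeps the tuples intact
points = list(collapse([[(1, 2), (3, 4)], [(5, 6)]], base_type=tuple))
print(points)
# [(1, 2), (3, 4), (5, 6)]
```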
The first one generates a list of file and directory paths by collapsing the iterables returned by os.walk. In the second one we take a tree data structure in the form of nested lists and collapse it to get a flat list of all nodes of said tree.
Split_at
Back to splitting data. The split_at function splits an iterable into lists based on a predicate. This works like the basic split for strings, but here we have an iterable instead of a string and a predicate function instead of a delimiter:
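A sketch with a made-up "text file" represented as a list of lines is shown below. By default the delimiter lines themselves are dropped from the result.

```python
from more_itertools import split_at

DELIMITER = "-------------"

# A simulated text file: sections separated by delimiter lines
lines = [
    "first section, line 1",
    "first section, line 2",
    DELIMITER,
    "second section, line 1",
    DELIMITER,
    "third section, line 1",
    "third section, line 2",
]

sections = list(split_at(lines, lambda line: line == DELIMITER))
print(sections)
# [['first section, line 1', 'first section, line 2'],
#  ['second section, line 1'],
#  ['third section, line 1', 'third section, line 2']]
```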
Above, we simulate a text file using a list of lines. This "text file" contains lines with -------------, which is used as a delimiter. So, that’s what we use as our predicate for splitting these lines into separate lists.
Bucket
If you need to split your iterable into multiple buckets based on some condition, then bucket is what you are looking for. It creates child iterables by splitting the input iterable using a key function:
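In the sketch below, the shape classes are hypothetical placeholders and the key function simply returns the type of each item.

```python
from more_itertools import bucket

# Hypothetical shape types, defined only for this illustration
class Cube:
    pass

class Circle:
    pass

class Triangle:
    pass

shapes = [Circle(), Cube(), Circle(), Circle(), Cube(), Triangle(), Triangle()]

# Key function: the bucket an item belongs to is its type
buckets = bucket(shapes, key=lambda shape: type(shape))

# Lookup works like a dict; each bucket is lazy, so wrap it in list()
cubes = list(buckets[Cube])
circles = list(buckets[Circle])
print(len(cubes), len(circles))
# 2 3
```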
Here we show how to bucket an iterable based on the type of its items. We first declare a few types of shapes and create a list of them. When we call bucket on this list with the above key function, we create a bucket object. This object supports lookup like the built-in Python dict. Also, as you can see, each bucket in the bucket object is a generator, so we need to call list on it to actually get the values out of it.
Map_reduce
Probably the most interesting function in this library for all the Data Science people out there is map_reduce. I’m not going to go into detail on how MapReduce works, as that is not the purpose of this article and there are lots of articles about it already. What I’m going to show you, though, is how to use it:
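The sketch below groups a made-up list of words by length, first with only a key function, then with a value function added, and finally with a reduce function as well.

```python
from more_itertools import map_reduce

words = ["apple", "fig", "kiwi", "plum", "pear", "date", "lime", "mango"]

# Key function only: categorize words by their length
by_length = map_reduce(words, keyfunc=len)
print(dict(by_length))
# {5: ['apple', 'mango'], 3: ['fig'], 4: ['kiwi', 'plum', 'pear', 'date', 'lime']}

# Key + value function: transform each word before it is stored
upper_by_length = map_reduce(words, keyfunc=len, valuefunc=str.upper)
print(upper_by_length[3])
# ['FIG']

# Key + value + reduce function: count the words in each category
counts = map_reduce(words, keyfunc=len, valuefunc=lambda _: 1, reducefunc=sum)
print(dict(counts))
# {5: 2, 3: 1, 4: 5}
```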
This MapReduce implementation allows us to specify 3 functions: a key function (for categorizing), a value function (for transforming) and finally a reduce function (for reducing). Some of these functions can be omitted to produce the intermediate steps of the MapReduce process, as shown above.
Sort_together
If you work with spreadsheets of data, chances are that you need to sort them by some column. This is a simple task for sort_together. It allows us to specify by which column(s) to sort the data:
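Here is a sketch with a tiny made-up table stored column by column. Two rows share a date of birth so that the second sort key has a visible effect.

```python
from more_itertools import sort_together

# Made-up "table", one tuple per column
cols = [
    ("John", "Ben", "Andy", "Mary"),                              # Name
    ("1994-02-06", "1985-04-01", "1994-02-06", "1998-03-14"),     # Date of Birth
    ("2020-01-06", "2019-03-07", "2018-08-15", "2020-01-08"),     # Updated At
]

# Sort by column 1 (Date of Birth) first, then by column 2 (Updated At)
sorted_cols = sort_together(cols, key_list=(1, 2))
print(sorted_cols[0])
# ('Ben', 'Andy', 'John', 'Mary')
```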
The input to the function is a list of iterables (columns) and key_list, which tells sort_together which of the iterables to use for sorting and with what priority. In the example above, we first sort the "table" by the Date of Birth column and then by the Updated At column.
Seekable
We all love iterators, but you should always be careful with them in Python, as one of their features is that they consume the supplied iterable. They don’t have to though, thanks to seekable:
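A minimal sketch, using a made-up sentence as the data: the iterator is exhausted first, and then seek rewinds it to an earlier position.

```python
from more_itertools import seekable

data = "This is example sentence for seeking back and forth".split()
it = seekable(data)

# Consume the whole iterator
for word in it:
    pass

# A further next() call now raises StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
# True

# Seek back to index 3 and keep reading from there
it.seek(3)
word = next(it)
print(word)
# sentence
```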
seekable is a function that wraps an iterable in an object that makes it possible to go back and forth through an iterator, even after some elements were consumed. In the example you can see that we get a StopIteration exception after going through the whole iterator, but we can seek back and keep working with it.
Filter_except
Let’s look at the following scenario: you received mixed data that contains both text and numbers, all of it in string form. You, however, want to work only with the numbers (floats/ints):
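A sketch of that scenario with made-up input strings follows. Note that filter_except only filters: the surviving items are still the original strings, so they are converted in a separate step here.

```python
from more_itertools import filter_except

data = ["1.5", "6", "not-important", "11", "1.23E-7", "remove-me", "25", "trash"]

# Keep only the items for which float() does not raise one of the listed errors
numeric_strings = list(filter_except(float, data, TypeError, ValueError))
print(numeric_strings)
# ['1.5', '6', '11', '1.23E-7', '25']

# Convert the survivors to actual numbers
numbers = list(map(float, numeric_strings))
print(numbers)
# [1.5, 6.0, 11.0, 1.23e-07, 25.0]
```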
filter_except filters the items of the input iterable by passing each element to the provided function (float) and checking whether it raises one of the given errors (TypeError, ValueError) or not, keeping only the elements that passed the check.
Unique_to_each
unique_to_each is one of the more obscure functions in the more_itertools library. It takes a bunch of iterables and returns, for each of them, the elements that aren’t in any of the others. It’s better to look at an example:
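The sketch below uses a small made-up undirected graph, stored as a dict that maps each node to the set of its neighbours.

```python
from more_itertools import unique_to_each

# Made-up graph: A-B, A-E, B-C, D-E
graph = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"A", "D"},
}

# For each node: the neighbours that no other node has as a neighbour
unique_neighbours = unique_to_each(*graph.values())
print(dict(zip(graph, unique_neighbours)))
# {'A': [], 'B': ['C'], 'C': [], 'D': [], 'E': ['D']}
```

Here, removing B would isolate C, and removing E would isolate D.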

Here, we define a graph data structure using an adjacency list (actually a dict). We then pass the neighbours of each node as a set to unique_to_each. What it outputs is a list of the nodes that would become isolated if the respective node were removed.
Numeric_range
This one has a lot of use cases, as it’s quite common that you would need to iterate over a range of some non-integer values:
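A sketch of two such ranges is shown below. It uses Decimal rather than float for the first one, which is an assumption made here to keep the printed values free of floating-point rounding noise.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from more_itertools import numeric_range

# Decimals from 1.7 up to (but not including) 3.5, in steps of 0.3
decimals = list(numeric_range(Decimal("1.7"), Decimal("3.5"), Decimal("0.3")))
print([str(d) for d in decimals])
# ['1.7', '2.0', '2.3', '2.6', '2.9', '3.2']

# Dates from 2020/2/10 up to (but not including) 2020/2/15, every 2 days
dates = list(numeric_range(datetime(2020, 2, 10), datetime(2020, 2, 15), timedelta(days=2)))
print([d.day for d in dates])
# [10, 12, 14]
```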
What is nice about numeric_range is that it behaves the same way as the basic range. You can specify start, stop and step arguments as in the examples above, where we first use decimals between 1.7 and 3.5 with a step of 0.3 and then dates between 2020/2/10 and 2020/2/15 with a step of 2 days.
Make_decorator
Last but not least, make_decorator enables us to use other itertools as decorators and therefore modify the outputs of other functions, producing iterators:
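Here is a sketch of such a decorator built from map_except. The read_file function is a hypothetical stand-in that returns hard-coded strings instead of touching a real file.

```python
from more_itertools import make_decorator, map_except

# result_index=1: the decorated function's result becomes the second
# positional argument of map_except, i.e. map_except(float, result, ...)
mapper_except = make_decorator(map_except, result_index=1)

@mapper_except(float, ValueError, TypeError)
def read_file(path):
    """Hypothetical reader: pretends to return the lines of a file."""
    return ["1.5", "6", "not-important", "11", "1.23E-7", "remove-me", "25", "trash"]

values = list(read_file("file.txt"))
print(values)
# [1.5, 6.0, 11.0, 1.23e-07, 25.0]
```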
This example takes the map_except function and creates a decorator out of it. The decorator passes the result of the decorated function to map_except as its second argument (result_index=1). In our case, the decorated function is read_file, which simulates reading data from some file and outputs a list of strings that might or might not be floats. The output, however, is first passed to the decorator, which maps the items to floats and filters out all the undesirable ones, leaving us with only floats.
Conclusion
I hope that you learned something new in this article, as itertools and more_itertools can make your life a whole lot easier if you process lots of data frequently. Using these libraries and functions efficiently, however, requires some practice. So, if you think that you can make use of some of the things shown in this article, then go ahead and check out the itertools recipes, or just force yourself to use these functions as much as possible to get comfortable with them. 😉
This article was originally posted at martinheinz.dev