
itertools and functools: Two Python Lone Soldiers

One cool thing about Python is its support for functional programming, a paradigm in which processing is expressed through the evaluation and composition of functions.

Photo by Markus Spiske on Unsplash

Luckily, Python comes with full-fledged modules that reinforce its multi-paradigm winning card, some of which (if not all) are originally implemented in C. You can actually read the implementations in the Lib and Modules folders of the CPython repository on GitHub.

In this article, I will talk about two functional programming modules that Python offers us for a variety of function-based computations: itertools and functools.

I will not be reproducing the same patterns you find in the official documentation. I just want to mention some of the functions that you do not come across very often on the Internet, but that somehow managed to be there to save me when I most needed them.

With all the gratitude I feel towards them, let us begin.

1. Itertools

Simply put, itertools allows for efficient looping.

The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" making it possible to construct specialized tools succinctly and efficiently in pure Python.

1.1. cycle

I remember seeing something similar in Haskell, and there was evidently a need for a similar building block in Python. Whenever you want an infinite cyclic loop that stops only on an exception or a specified condition, this is your way to go. Simple and effective:

>>> from itertools import cycle
>>> for el in cycle(range(10)): print(el, end="")
...
0123456789012345...

If you never interrupt the loop, it will simply run forever. Note that cycle saves a copy of the iterable's elements, so for a finite iterable the memory footprint stays bounded; it is the loop itself that never ends on its own.
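If you prefer stopping on a specified condition rather than an exception, one simple option is to cap the number of items drawn from the cycle. A small sketch using itertools.islice from the same module:

from itertools import cycle, islice

# Draw only the first 15 items from the otherwise infinite cycle.
for el in islice(cycle(range(10)), 15):
    print(el, end="")
# prints: 012345678901234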

1.2. accumulate

In case you want to design some sort of accumulator with as few lines of code as possible:

>>> from itertools import accumulate
>>> list(accumulate(range(10)))
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]

Basically, every element of the result equals the sum of all preceding elements, itself included.

Personally, it would take me at least six lines to implement this with pure built-in functions. And since a running accumulator is needed in almost every dataframe aggregation, you may like how this plays out.
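Note that accumulate is not limited to sums: it accepts an optional binary function as its second argument. A small sketch of running products and running maxima, using the standard operator module and the built-in max:

from itertools import accumulate
import operator

# Running product: each element is the product of everything so far.
list(accumulate([1, 2, 3, 4, 5], operator.mul))   # [1, 2, 6, 24, 120]

# Running maximum: the largest value seen so far.
list(accumulate([3, 1, 4, 1, 5, 9, 2], max))      # [3, 3, 4, 4, 5, 9, 9]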

1.3. groupby

I actually had a lot of fun using this one in a few of my projects. Here's an example:

>>> from itertools import groupby
>>> data = 'AAAABBBCCDAABBB'
>>> for k, v in groupby(data):
...   print(k, len(list(v)))
... 
A 4
B 3
C 2
D 1
A 2
B 3

It operates roughly like the numpy.unique function, but with more capabilities: every time the value of the key function changes, it creates a break, i.e. a new group. This is in contrast to SQL's GROUP BY, which groups similar data regardless of order. So it is important to sort the data before passing it to itertools.groupby.
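To emulate SQL-style grouping, sort the data first so that equal keys end up adjacent. A minimal sketch on the same string as above:

from itertools import groupby

data = 'AAAABBBCCDAABBB'
# Sorting brings equal letters together, so each key now appears once.
for k, v in groupby(sorted(data)):
    print(k, len(list(v)))
# A 6
# B 6
# C 2
# D 1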

1.4. pairwise

When you loop over an iterable, you are rarely interested in each element on its own. Python 3.10 comes with a new feature for these more interesting use cases. Here's an example with pairwise:

>>> from itertools import pairwise
>>> list(pairwise('ABCDEFG'))
[('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), ('E', 'F'), ('F', 'G')]

Having this extra way of running your iteration is always a privilege and a luxury. Walking a sequence as successive overlapping pairs lends itself to incremental (or decremental) processing, such as computing the change between consecutive values, as sketched below.
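A small sketch of exactly that (note that pairwise requires Python 3.10 or later; the sample series is made up for illustration):

from itertools import pairwise

prices = [100, 103, 101, 106, 110]
# Difference between each element and the one before it.
deltas = [b - a for a, b in pairwise(prices)]
print(deltas)  # [3, -2, 5, 4]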

1.5. starmap

If you have ever used map on an iterable, you can think of starmap as a map that unpacks each element into the function's arguments. Let us work through a small example:

>>> from itertools import starmap
>>> v = starmap(sum, [[range(5)], [range(5, 10)], [range(10, 15)]])
>>> list(v)
[10, 35, 60]

In this example, sum is applied to each group of arguments and produces one result per group. It is important to see each inner list as a tuple of arguments rather than a plain input: that is why each range was wrapped in brackets. Without them, starmap would unpack range(5) itself and call sum(0, 1, 2, 3, 4), which raises an error.

Any time you want a function to take its arguments from separate groups of data, starmap can be very efficient and quite expressive.
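The more common case is a function of several arguments, with one tuple per call. A quick sketch with the built-in pow:

from itertools import starmap

# Each tuple is unpacked into pow(base, exp).
list(starmap(pow, [(2, 5), (3, 2), (10, 3)]))  # [32, 9, 1000]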

1.6. zip_longest

zip also exists in Haskell. It is quite useful for parallel iteration; however, it stops at the end of its shortest iterable argument. If you zip two iterables of different lengths, zip chops off the exceeding part of the longer one. zip_longest instead pads the shorter iterable up to the length of the longer one:

>>> from itertools import zip_longest
>>> list(zip_longest('ABCD', 'xy', fillvalue='-'))
[('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]

That's practical when you start out with a length mismatch: you can then comfortably loop over your two or more iterables, sure that the pairing will hold at runtime.
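It works with any number of iterables, and fillvalue defaults to None. A small sketch with three lists of different lengths (the sample data is made up for illustration):

from itertools import zip_longest

names = ['Ada', 'Grace', 'Alan']
ids = [1, 2]
tags = ['x']
# Missing positions are padded with None by default.
for row in zip_longest(names, ids, tags):
    print(row)
# ('Ada', 1, 'x')
# ('Grace', 2, None)
# ('Alan', None, None)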

1.7. product

Say you have a matrix waiting to be updated and you must loop through the rows first and then the columns (or vice versa). You would do :

for i in range(ncolumns):
    for j in range(nrows):
        ...

There is a cooler way to compound both loops with a feature in itertools called product:

>>> from itertools import product
>>> for i,j in product([0,1], [10,11]):
...   print(i,j)
... 
0 10
0 11
1 10
1 11

product computes the Cartesian product of its iterables, so in the matrix case you may replace the nested loops like so:

>>> for i, j in product(range(ncolumns), range(nrows)):
...     ...

One small downside: this removes the possibility of initializing intermediate or temporary variables between the two loops. At the end of the day, it depends on what you want to achieve.
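As an end-to-end sketch, here is what a small matrix update might look like with product standing in for the nested loops (the matrix and the update rule are made up for illustration):

from itertools import product

matrix = [[0] * 3 for _ in range(2)]  # 2 rows x 3 columns
# One flat loop over all (row, column) index pairs.
for i, j in product(range(len(matrix)), range(len(matrix[0]))):
    matrix[i][j] = i + j
print(matrix)  # [[0, 1, 2], [1, 2, 3]]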


This is about every piece of itertools I can think of that makes my day easier and cooler. I won't elaborate further, as the library never runs out of surprises, especially the itertools recipes: the authors cooked them up to get the best out of the functions above, which act as vectorized, highly optimized building blocks. The recipes show just how far these functions can go in creating more powerful tools with the same high performance as the underlying toolset. Something quite interesting is going on in there; you'd better check it out right away…

2. functools

In the simplest terms, functools is a module for higher-order functions: functions that take other functions as arguments or return them as results. Decorators, properties, and caches are examples of higher-order functions that are placed above function definitions with the @ syntax. Let us see some other examples.

2.1. partial

So you need to implement a function that will be treated as a first-class citizen in your program. The problem is, it takes too many arguments, and calls that do not supply them all raise an exception. partial is a feature of functools that lets you freeze a portion of a function's arguments by assigning fixed values to at least one of them.

Let us consider the next set of arrays and weights:

>>> import numpy as np
>>> x1 = np.array([1, 2, 1])
>>> w1 = np.array([.2, .3, .2])
>>> n1 = 3
>>>
>>> x2 = np.array([2, 1, 2])
>>> w2 = np.array([.1, .1, .05])
>>> n2 = 3

Let us then consider this function which computes the weighted means of those arrays :

>>> def weighted_means(x1, w1, n1, x2, w2, n2):
...     return np.dot(x1, w1) / n1, np.dot(x2, w2) / n2

We put the function into action :

>>> weighted_means(x1, w1, n1, x2, w2, n2)
(0.3333333333333333, 0.13333333333333333)

Suppose you want to reduce the number of free arguments by freezing x2, w2, and n2, like so:

>>> from functools import partial
>>> reduced_weighted_means = partial(weighted_means, x2=x2, w2=w2, n2=n2)

You then call the new, reduced function with only the remaining arguments:

>>> reduced_weighted_means(x1, w1, n1)
(0.3333333333333333, 0.13333333333333333)

Notice it is the same result as before, indicating that the function works as if the frozen arguments were hard-coded in. This is a handy workaround when the function must be used under constraints on the number of arguments it can receive.

2.2. partialmethod

partialmethod is the counterpart of partial for use inside a class definition. Let us illustrate this with an example (a very silly one, I admit):
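The snippet was originally embedded separately and did not survive here; a minimal sketch consistent with the demo below could look like this (the private setter name _set_algorithm is my choice, not necessarily the original's):

from functools import partialmethod

class ModelName:
    def __init__(self, algorithm):
        self._algorithm = algorithm

    # Read-only property: note that no algorithm.setter is defined.
    @property
    def algorithm(self):
        return self._algorithm

    # Hypothetical private setter that both partialmethods delegate to.
    def _set_algorithm(self, value):
        self._algorithm = value

    # Each partialmethod freezes the value argument of _set_algorithm.
    set_cls = partialmethod(_set_algorithm, 'classification')
    set_regr = partialmethod(_set_algorithm, 'regression')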

Let us test it:

>>> mn = ModelName('clustering')
>>> mn.set_cls()
>>> mn.algorithm
'classification'
>>>
>>> mn.set_regr()
>>> mn.algorithm
'regression'

The class stores a string and delegates the setter role to the two partialmethods. set_cls and set_regr work as if they were properly defined methods, each setting algorithm to a different value. A small heads-up: there should be no algorithm.setter after defining the algorithm property; the partialmethods go through the private setter instead.

2.3. singledispatch

Suppose you define a simple function that executes some instructions, and then decide to make it more generic:
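The embedded snippet is missing here; judging from the demo outputs below, the generic function would be along these lines (a hedged reconstruction, not the article's verbatim code):

from functools import singledispatch

@singledispatch
def zen_of_python(arg):
    # Default implementation, used when no registered type matches.
    print('Beautiful is better than ugly.')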

Then, you overload the function with additional implementations, using the register() attribute of zen_of_python as a decorator.

You provide a different implementation for each type of argument:
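Again reconstructed from the outputs below, the registered implementations would look something like:

@zen_of_python.register(int)
def _(arg):
    print('There should be one-- and preferably only one --obvious way to do it.')

@zen_of_python.register(float)
def _(arg):
    print('Readability counts.')

@zen_of_python.register(list)
def _(arg):
    print("Namespaces are one honking great idea -- let's do more of those!")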

Notice how the generic function behaves when fed different types of arguments:

>>> zen_of_python('hello')
Beautiful is better than ugly.
>>> zen_of_python(1)
There should be one-- and preferably only one --obvious way to do it.
>>> zen_of_python(1.0)
Readability counts.
>>> zen_of_python([1, 2, "a"])
Namespaces are one honking great idea -- let's do more of those!

The results are in line with our implementations. However, since the generic function does not yet know an adequate implementation for the dict type, it falls back to the default one:

>>> zen_of_python(dict())
Beautiful is better than ugly.

This stuff is cool. I know.

2.4. singledispatchmethod

Our last guest: singledispatchmethod, which is usable inside a class definition. An example, maybe? Suppose you have a class that outputs a string based on the type of the argument you feed it:
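The embedded class is missing here as well; a minimal sketch consistent with the demo below (assuming Python 3.8+, where register can infer the dispatch type from the annotation):

from functools import singledispatchmethod

class Generic:
    @singledispatchmethod
    def generic_method(self, arg):
        # Default: reject any unregistered type.
        raise NotImplementedError('Never heard of this type ..')

    @generic_method.register
    def _(self, arg: int):
        return 'First case'

    @generic_method.register
    def _(self, arg: float):
        return 'Second case'

    @generic_method.register
    def _(self, arg: list):
        return 'Third case'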

Off with a small demo:

>>> gen = Generic()
>>> gen.generic_method(1)
'First case'
>>> gen.generic_method(1.0)
'Second case'
>>> gen.generic_method([1, 2, 3])
'Third case'

Let us try a dict type:

>>> gen.generic_method(dict(a=1, b=2))
Traceback (most recent call last):
  ...
NotImplementedError: Never heard of this type ..

Should you pass a type other than those registered on the Generic class, you will hit this error. If you want it gone, add a little and gentle handler that takes care of that type too, as sketched below.
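For instance, a hedged sketch of registering a dict handler after the fact (the 'Fourth case' string is made up for illustration; this assumes Python 3.8+, where the register attribute is reachable through the class):

# Register an extra implementation on the existing class.
@Generic.generic_method.register
def _(self, arg: dict):
    return 'Fourth case'

gen = Generic()
gen.generic_method({'a': 1})  # 'Fourth case'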


References

Official Docs – itertools and functools

