Python Generators

A tutorial on developing Python generator functions using the yield keyword

Anuradha Wickramarachchi
Towards Data Science


Image by Alexander Droeger from Pixabay

In simple terms, Python generators are functions that keep their state between calls. This enables incremental computation and iteration. Generators can also be used in place of lists to save memory, because a generator does not store its values; it stores the computation logic together with the state of the function, much like a paused function call that resumes whenever the next value is requested.
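To make the idea of a persistent state concrete, here is a minimal sketch using a hypothetical countdown() generator (not part of the original article). The local variable remaining keeps its value between calls, and each call to next() resumes execution right after the yield that paused it.

```python
def countdown(start):
    # 'remaining' is a local variable whose value survives between next() calls
    remaining = start
    while remaining > 0:
        yield remaining   # pause here and hand the value to the caller
        remaining -= 1    # execution resumes from this line on the next call


counter = countdown(3)
print(next(counter))  # 3
print(next(counter))  # 2
print(next(counter))  # 1
```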

Generator Expressions

Generator expressions can be used in place of list comprehensions. Unlike a list, a generator does not build all of its values up front; it produces each value on demand, at runtime.

>>> import sys
>>> a = [x for x in range(1000000)]
>>> b = (x for x in range(1000000))
>>> sys.getsizeof(a)
8697472
>>> sys.getsizeof(b)
128
>>> a
[0, 1, ... 999999]
>>> b
<generator object <genexpr> at 0x1020de6d0>

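In this scenario, we save quite a lot of memory by using a generator in place of a list: the generator object stays small no matter how many values it will eventually produce. The exact sizes reported by sys.getsizeof() depend on the Python version and platform.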
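Two practical consequences of this lazy behaviour are shown in the short sketch below: a generator expression can feed a function such as sum() without ever materializing a list, and a generator can be iterated only once, after which it is exhausted.

```python
# Summing a generator expression never builds the full list in memory
total = sum(x for x in range(1000000))
print(total)  # 499999500000

# A generator can be consumed only once
squares = (x * x for x in range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [] (already exhausted)
```

If you need to iterate over the same values several times, either store them in a list or create a fresh generator for each pass.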

Functions with yield in place of return

Let us consider a simple example where you want to generate an arbitrary number of prime numbers. The following code defines a function that checks whether a number is prime and a generator that yields infinitely many primes.

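def isPrime(n):
    # Reject numbers below 2 and non-integers
    if n < 2 or n % 1 > 0:
        return False
    elif n == 2 or n == 3:
        return True
    # Trial division up to the square root of n
    for x in range(2, int(n**0.5) + 1):
        if n % x == 0:
            return False
    return True

def getPrimes():
    value = 0
    while True:
        if isPrime(value):
            yield value  # pause here and hand the prime to the caller
        value += 1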

In the second function, we loop forever and yield each number that is prime. Let's see how we can use this generator.

>>> primes = getPrimes()
>>> next(primes)
2
>>> next(primes)
3
>>> next(primes)
5

First, we call the function to get a generator instance. Although this can emulate an infinite list, no element has been computed yet. If you call list(primes), Python tries to collect every value the generator produces; since this generator never stops, the call never returns and could eventually crash your program with a MemoryError. For prime numbers this is unlikely in practice: primes become sparse and each primality test gets slower, so the list grows too slowly to reach the memory limit in a reasonable time, and the program simply appears to hang. Moreover, a generator does not know its length in advance, because its values are only produced at runtime. If you call len(primes), you get the following error.

----------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-33-a6773446b45c> in <module>
----> 1 len(primes)

TypeError: object of type 'generator' has no len()
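A safe way to consume only part of an infinite generator is to let the standard library's itertools module do the stopping. The sketch below repeats the article's isPrime() and getPrimes() so it runs on its own, then uses itertools.islice() to take a fixed number of primes and itertools.takewhile() to take primes while a condition holds. The limits 10 and 50 are arbitrary choices for illustration.

```python
from itertools import islice, takewhile


def isPrime(n):
    # Same primality test as in the article: trial division up to sqrt(n)
    if n < 2 or n % 1 > 0:
        return False
    elif n == 2 or n == 3:
        return True
    for x in range(2, int(n ** 0.5) + 1):
        if n % x == 0:
            return False
    return True


def getPrimes():
    value = 0
    while True:
        if isPrime(value):
            yield value
        value += 1


# islice stops requesting values after the first 10 primes
first_ten = list(islice(getPrimes(), 10))
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# takewhile stops at the first prime that fails the condition (53)
below_50 = list(takewhile(lambda p: p < 50, getPrimes()))
print(below_50)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Both calls terminate because the iterator returned by islice() or takewhile() stops asking the generator for values; the generator itself is never "finished".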

Generators with a Finite Number of Iterations

Although our prime number example has an infinite iteration space, most day-to-day scenarios involve finite computations. Therefore, let's look at an example that reads a file in which each line of text is followed by a line containing that sentence's sentiment score.

Why do we need to use yield?

Imagine the file is 1 TB and the vocabulary contains 500,000 words. It would not fit in memory. A simple solution is to read two lines at a time, compute a dictionary of the words in the sentence line, and return it together with the sentiment score on the following line. The file would look like this:

The product is well packed
5
Edges of the packaging was damaged and print was faded.
3
Avoid this product. Never going to buy anything from ShopX.
1
Shipping took a very long time
2

Clearly, we do not need to load the whole file at once. Furthermore, the lines must be vectorized and possibly saved to another file that can be parsed directly to train a machine learning model. The option that gives us the cleanest code is a generator that reads two lines at a time and yields the sentence and its sentiment score as a tuple.

Implementing a File Parsing Generator

Assume that we have the above text document in a file named test.txt. We will be using the following generator function to read the file.

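def readData(path):
    with open(path) as f:
        sentiment = ""
        line = ""
        for n, d in enumerate(f):
            if n % 2 == 0:
                # Even-numbered lines (0, 2, ...) hold the sentence
                line = d.strip()
            else:
                # Odd-numbered lines hold the score of the preceding sentence
                sentiment = int(d.strip())
                yield line, sentiment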

We can use the above function in a for loop as follows.

>>> data = readData("test.txt")
>>> for l, s in data: print(l, s)
The product is well packed 5
Edges of the packaging was damaged and print was faded. 3
Avoid this product. Never going to buy anything from ShopX. 1
Shipping took a very long time 2
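The word dictionaries mentioned earlier can be produced by chaining a second generator onto the reader, so that each stage handles one record at a time and the whole file is never held in memory. The sketch below is an assumption about how that step might look, not the author's code: pairs() is a hypothetical variant of readData() that accepts any iterable of lines (here an io.StringIO object stands in for test.txt) and pairs consecutive lines with the zip(it, it) idiom, and vectorize() is a hypothetical stage that turns each sentence into a collections.Counter of lower-cased words.

```python
import io
from collections import Counter


def pairs(lines):
    # zip() pulls from the same iterator twice, so each tuple is (sentence, score)
    it = iter(lines)
    for sentence, score in zip(it, it):
        yield sentence.strip(), int(score.strip())


def vectorize(records):
    # Turn each sentence into a word -> count dictionary, one record at a time
    for sentence, score in records:
        words = sentence.lower().replace(".", "").split()
        yield Counter(words), score


# Hypothetical in-memory stand-in for test.txt
text = io.StringIO("The product is well packed\n5\nShipping took a very long time\n2\n")
for counts, score in vectorize(pairs(text)):
    print(dict(counts), score)
```

Because both stages are generators, reading, word counting and whatever consumes the results (for example, writing vectors to another file) proceed in lockstep. Note that zip() silently drops a trailing sentence that has no score line.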

How Does the Generator Exit?

In a normal for loop, iteration stops when the generator has no more values to yield. Behind the scenes, the generator signals this by raising StopIteration, which the for loop catches silently. We can observe it by calling next() manually on the generator instance: calling next() after the last value has been produced raises the following exception.

----------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-41-cddec6aa1599> in <module>
---> 28 print(next(data))
StopIteration:
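Outside a for loop, there are two common ways to deal with the end of a generator, sketched below with a hypothetical two_lines() generator: pass a default value to next() so that it returns the default instead of raising StopIteration, or catch StopIteration yourself. A return statement inside a generator stores its value in the value attribute of the resulting StopIteration.

```python
def two_lines():
    yield "first"
    yield "second"
    return "done"  # becomes StopIteration.value when the generator finishes


gen = two_lines()
print(next(gen))        # first
print(next(gen))        # second
print(next(gen, None))  # None, because the default replaces StopIteration

gen = two_lines()
next(gen)
next(gen)
try:
    next(gen)
except StopIteration as stop:
    final_value = stop.value  # the generator's return value
print(final_value)  # done
```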

Using send, throw and close

send function

Let's recall our prime numbers example. Imagine we want to reset the generator's counter to 100 so that it starts yielding primes above 100. We can use the send() method of a generator instance to push a value into the generator, as below.

>>> primes = getPrimes()
>>> next(primes)
2
>>> primes.send(10)
11
>>> primes.send(100)
101

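Note that we must call next() at least once before calling send(). A freshly created generator has not yet reached a yield expression that could receive the value, so sending it anything other than None raises a TypeError. We must also modify our function so that it knows what to do with the received value: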

def getPrimes():
    value = 0
    while True:
        if isPrime(value):
            # i receives the argument of send(); it is None after next()
            i = yield value
            if i is not None:
                value = i
        value += 1

We store the value received from send() in the variable i. If it is not None, we assign it to the value variable. The None check is essential because when the generator is resumed with next() (for example, by a for loop) instead of send(), the yield expression evaluates to None.
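The same send() pattern works for generators whose main job is to receive values rather than produce them. The sketch below uses a hypothetical running_average() generator: it is primed with next(), then each send() pushes a number in and gets the updated average back. It includes the same kind of None check as getPrimes(), and it also shows the TypeError raised when a value is sent to a generator that has not been primed.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # The argument of send() becomes the result of this yield expression
        value = yield average
        if value is None:
            continue  # resumed with next() instead of send(): nothing to add
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # prime the generator: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0

# Sending a value before priming fails: no yield is waiting to receive it
fresh = running_average()
try:
    fresh.send(5)
    send_failed = False
except TypeError as err:
    send_failed = True
    print("TypeError:", err)
```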

throw function

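Imagine you want to end the iteration once a value exceeds 10, for example to avoid overflow or timeouts (hypothetically). The throw() method raises an exception inside the generator, at the yield where it is paused; if the generator does not handle it, the exception propagates back to the caller and halts the iteration.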

primes = getPrimes()
for x in primes:
    if x > 10:
        primes.throw(ValueError, "Too large")
    print(x)

This technique is useful for validating values. The validation logic lives in the code that uses the generator. This results in the following output.

2
3
5
7
----------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-113-37adca265503> in <module>
12 for x in primes:
13 if x > 10:
---> 14 primes.throw(ValueError, "Too large")
15 print(x)

<ipython-input-113-37adca265503> in getPrimes()
3 while True:
4 if isPrime(value):
----> 5 i = yield value
6 if i is not None:
7 value = i

ValueError: Too large
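In the example above, getPrimes() does not handle the exception, so it propagates back to the caller and the generator is finished. A generator can instead catch the thrown exception at its yield and carry on; in that case throw() returns the next value the generator yields. The sketch below uses a hypothetical numbers() generator that restarts from zero when a ValueError is thrown into it. It passes an exception instance to throw(), because the separate type and value arguments used above are deprecated as of Python 3.12.

```python
def numbers():
    value = 0
    while True:
        try:
            yield value
        except ValueError:
            # The thrown exception surfaces at the paused yield; reset and continue
            value = 0
            continue
        value += 1


gen = numbers()
print(next(gen))                         # 0
print(next(gen))                         # 1
print(next(gen))                         # 2
print(gen.throw(ValueError("restart")))  # 0, the value yielded after handling
print(next(gen))                         # 1
```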

close function

It is often more elegant to stop the generator without an exception in hand. In such scenarios, the close() method can be used to close the generator cleanly: once it is closed, the for loop ends on its next iteration instead of raising an error.

primes = getPrimes()
for x in primes:
    if x > 10:
        primes.close()
    print(x)

This will give us the following output.

2
3
5
7
11

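Note that the output includes 11, the first value greater than 10: the loop body finishes printing it before the closed generator ends the iteration. This resembles the behaviour of a do-while loop in C/C++, where the body runs before the exit condition takes effect.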
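Under the hood, close() raises GeneratorExit inside the generator at the yield where it is paused. That makes it a natural place to release resources such as open files: a try/finally block around the generator's loop runs its cleanup when the generator is closed. The hypothetical tracked_numbers() generator below records a message in a list (the events name is also illustrative) when that happens.

```python
events = []


def tracked_numbers():
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        # Runs when close() raises GeneratorExit at the paused yield
        events.append("cleaned up")


gen = tracked_numbers()
print(next(gen))  # 0
print(next(gen))  # 1
gen.close()
print(events)     # ['cleaned up']
print(next(gen, "exhausted"))  # exhausted, since a closed generator yields nothing more
```

The article's readData() already benefits from this: its with block closes the file when the generator finishes or is closed.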

I hope this article helps you write better software and research programs in the future. Thanks for reading.

Cheers 😃
