In the past, more than today, programming meant always keeping memory restrictions in mind. While having 32 MB of RAM in 1990 was a fortune, nowadays even 32 GB on a home computer is sometimes not enough. But while hard disks, RAM, and GPU memory grew in size, so did the amount of available data. Therefore, it is still relevant to have a repertoire of memory-efficient algorithms.
Two Small Examples
A classic example is an Internet switch that monitors different IPs sending packets to each other. A common task for a switch is to find the heavy hitters, i.e. pairs of IP addresses (IP₁, IP₂) where IP₁ communicates with IP₂ extremely often compared to the other pairs. This is interesting since it might be an indicator of a Denial-of-Service attack.

This sounds like an easy problem, right? Just implement some mapping from IP pairs (IP₁, IP₂) to the number of communications originating from IP₁ to IP₂. In Python, this could be a dictionary, an instance of the Counter class, or an adjacency matrix. Then you can search for the k highest counts in the data structure and output the corresponding IP address pairs.
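For illustration, here is a minimal sketch of this naive counting approach with a Counter (the IP addresses and the choice of k are made up for this example):

from collections import Counter

pair_counts = Counter()  # maps (source IP, destination IP) -> number of packets

def process_packet(source_ip, destination_ip):
    pair_counts[(source_ip, destination_ip)] += 1

# A few made-up packets
process_packet("10.0.0.1", "10.0.0.2")
process_packet("10.0.0.1", "10.0.0.2")
process_packet("10.0.0.3", "10.0.0.4")

k = 2
print(pair_counts.most_common(k))  # the k heaviest pairs seen so far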

But now think about the size of these data structures. A bigger switch can receive requests from millions of IPs and route them to as many other IPs.
This means that we might end up with counts for millions times millions of IP pairs.
This, paired with the usually small memory of a switch, is fuel for a lot of trouble. It’s just impossible with this approach: we need an algorithm that uses less memory than storing everything. One way is using a Count-min Sketch. You can also search for "heavy hitters" on YouTube and find some nice explanations and examples if you are interested.
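To give you a rough idea of how such a sketch trades memory for accuracy, here is a minimal Count-min Sketch. The width, depth, and hash construction are simplified choices of mine, not a tuned implementation:

import hashlib

class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]  # depth x width counters

    def _index(self, item, row):
        digest = hashlib.md5(f"{row}-{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def update(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Collisions can only inflate counts, so take the minimum over all rows.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

This way, the memory stays fixed at width · depth counters, no matter how many distinct IP pairs show up; the price is that estimates can only overestimate the true counts.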
Now I hear some of you say:
Why would I care about network stuff? I’m a Machine Learning guy, duh! – you
Well, there is another famous example from the Machine Learning world: Gradient Descent!
If we deal with a small enough data set, it can fit into the (GPU) RAM completely, and we can use Batch Gradient Descent, i.e. put the complete data into memory at once and process it. However, most of the time our working memory is too small, making it necessary to use Stochastic Gradient Descent or Mini-Batch Gradient Descent, which are examples of so-called Streaming Algorithms. Another example is the Hoeffding Tree Algorithm, which I described here.
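As a rough sketch of why this is a Streaming Algorithm (the linear model, loss, and simulated batch source are my own illustrative choices): each update only ever needs the current mini-batch in memory.

import numpy as np

def batch_stream(n_batches=1000, batch_size=32, seed=0):
    # Simulated data source, made up for illustration
    rng = np.random.default_rng(seed)
    true_w = np.array([1.0, -2.0])
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        y = X @ true_w + 0.1 * rng.normal(size=batch_size)
        yield X, y

def minibatch_sgd(batches, n_features, lr=0.01):
    # Linear regression: only the current batch is ever held in memory.
    w = np.zeros(n_features)
    for X, y in batches:
        gradient = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * gradient
    return w

print(minibatch_sgd(batch_stream(), n_features=2))  # close to [1.0, -2.0]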
In this article, I want to show you a few examples of Streaming Algorithms, including Python implementations that you can use, besides just making you aware of the problem, which I have already done. 😉
Intuition
By Streaming Algorithms, I mean algorithms that can process an extremely large, maybe even unbounded, data set and compute some desired output using only a constant amount of RAM.
If the data set is unbounded, we call it a data stream. In this case, if we stop processing the data stream at some position n, we expect our algorithm to have a solution corresponding to the data seen up to this point.
In the following, just imagine that we either have an enormous data set on our hard disk that we want to process without loading it into our RAM at once (because we can’t) or that there is a source that outputs a data stream, for example, incoming tweets on Twitter. Both cases are handled the same way.
I will phrase the upcoming examples in the language of large data sets since then we know that they are finite, and I don’t have to mention all the time that we stop reading a data stream.
We further assume that we can pass over the data exactly once.
A Gentle Start
Let us get familiar with how we can design Streaming Algorithms using two simple examples.
Finding the Minimum
Imagine that there is an extremely large list of numbers, too large for your RAM. You want to find the minimum of this list. In Python, you would classically solve it like this:
print(min(my_list))
But this assumes that my_list is already in RAM. So, how can we approach this differently? Maybe you have found a solution already:
Just read the data set number by number and update the minimum whenever you find a smaller number.
To be more precise: You read the first element and declare it the minimum. Then you read the second element, and if it is smaller than the current minimum (first element), declare it the minimum. Else, do nothing.
Then you read the third element, and if it is smaller than the current minimum, declare it the minimum. Else, do nothing.
Then you read the fourth element, and if it is smaller than the current minimum, declare it the minimum. Else, do nothing.
Ok, I’ll stop, you know where this is going. This basically works because

min(a₁, …, aₙ₊₁) = min(min(a₁, …, aₙ), aₙ₊₁).

Using this formula, you can easily show via induction that the algorithm is correct. After reading the first element, the result is correct since a₁ < ∞ and hence min(∞, a₁) = a₁. The induction step is exactly the formula (think about it!). But enough of this, let us get back on track.
In Python, we can solve it using the following very simple class:
from math import inf

class StreamingMinimum:
    def __init__(self):
        self.result = inf  # Immediately replaced by the 1st element

    def update(self, element):
        self.result = min(self.result, element)
You can use it via
import numpy as np

stream = iter(np.random.randn(10000))  # Simulate a stream

s = StreamingMinimum()
for element in stream:
    s.update(element)

print(s.result)
Easy, right? Let’s increase the difficulty a bit.
Finding the Mean
The same setting: a big data set, but now we want to find the mean instead of the minimum.

So, our problem now is the following:
How can we compute the mean of n+1 elements when we already have the mean of the first n elements?
An easy solution is to use the following identity, which you would probably have come up with after thinking about it for a bit:

x̄ₙ₊₁ = (n · x̄ₙ + aₙ₊₁) / (n + 1), where x̄ₙ denotes the mean of the first n elements a₁, …, aₙ.
We can see that we don’t only have to store the old mean, but we also have to keep track of the number of elements n, since this is needed in the formula. In the case of computing the minimum, this was not necessary.
Here is the class:
class StreamingMean:
    def __init__(self):
        self.result = 0
        self.n = 0

    def update(self, element):
        self.result = (self.result * self.n + element) / (self.n + 1)
        self.n += 1
You can test this code again as before. Just use the StreamingMean class instead of StreamingMinimum.
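For concreteness, that looks like this:

import numpy as np

stream = iter(np.random.randn(10000))  # Simulate a stream

s = StreamingMean()
for element in stream:
    s.update(element)

print(s.result)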
Sanity check: the result is around 0, which is what we expect for standard normally distributed random variables. And again: checking the correctness of this algorithm is an easy induction exercise.
Since you know how it works now, let’s get to a more interesting algorithm.
Reservoir Sampling
Imagine that you have a large data set and you want to uniformly sample an object. How could you do this? Well, if you know the size n of the data set, you can uniformly draw a random number k between 1 and n, scan the data set, and take the k-th element.
But now imagine that you have a data stream and you have no clue in advance how many elements come in or when you want to stop reading the stream. Now it is getting difficult since you don’t know which range to draw your random index from. But this problem also has an easy solution, called Reservoir Sampling.
The idea is the following: You have a single box (the reservoir) for elements. When scanning the data stream, replace the content of the box with the current element with a certain probability.

If I left you alone with this idea, you could probably figure out the probability after some time. The goal is that after passing over n elements, each element is in the box with probability 1/n.
In the easiest case, start with some constant probability p. But, for example, the probability of the first element still being in the box after n-1 more elements is only (1-p)ⁿ⁻¹, which is exponentially small for any fixed p > 0 and not what we are looking for.
One idea to fix this: We have to decrease the probability of a swap the longer we scan the sequence. What is a simple decay rate? How about 1/n? Let’s try.
At first, the box is empty. We scan the first element and fill the box with it (with probability 1/1 = 1). So far so good.
Now we get to the second element. We replace the content of the box with a probability of 1/2. After this step, the first element is in the box with a probability of 1/1 * 1/2 = 1/2, and the second element is in the box with a probability of 1/2. Perfect! Let’s do another one.
The third element is reached, and it replaces the element within the box with probability 1/3. What is the probability that the first element is still in the box? It is the probability that it survived the first (ok, this is easy), second, and third swap opportunities: 1/1 * 1/2 * 2/3 = 1/3. Looks good! And the second element? It had to survive the second and third swap opportunities, which happens with a probability of 1/2 * 2/3 = 1/3. It seems to work! And indeed, it does, as one can see with… induction!
In general, element i enters the box with probability 1/i and survives each later step k with probability (k-1)/k, so after n elements it is still in the box with probability 1/i · i/(i+1) · (i+1)/(i+2) · … · (n-1)/n = 1/n.
Here is the code:
from random import random

class ReservoirSampler:
    def __init__(self):
        self.result = None
        self.n = 0

    def update(self, element):
        self.n += 1
        if random() < 1 / self.n:  # Satisfied with probability 1/n.
            self.result = element
We can do a quick check to see if it works. Let us repeatedly sample from a data set of size 20. We expect to draw each element in about 5% of all cases.
results = []
for _ in range(1000000):
    r = ReservoirSampler()
    for s in range(20):
        r.update(s)
    results.append(r.result)
Visualizing results gives the following:
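One way to produce such a plot (the plotting details are my own choice, assuming matplotlib is available):

from collections import Counter
import matplotlib.pyplot as plt

counts = Counter(results)
plt.bar(list(counts.keys()), [c / len(results) for c in counts.values()])
plt.axhline(1 / 20, color="red", linestyle="--")  # the expected 5% per element
plt.xlabel("Element")
plt.ylabel("Relative frequency")
plt.show()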

We can see that each element got sampled in around 5% of all trials. Perfect!
Conclusion
We have seen that memory-efficient algorithms are still necessary nowadays. Clever tricks to process extremely large data sets are still relevant, and luckily, smart people have put a lot of effort into this field.
In this article, I presented three quite simple example algorithms that should teach you how to approach problems with extremely constrained memory.
Next time your data does not fit into your RAM, think about whether there might be a way to process it in a streaming fashion!
I hope that you learned something new, interesting, and useful today. Thanks for reading!
As the last point, if you
- want to support me in writing more about machine learning and
- plan to get a Medium subscription anyway,
why not do it via this link? This would help me a lot! 😊
To be transparent, the price for you does not change, but about half of the subscription fees go directly to me.
Thanks a lot, if you consider supporting me!
If you have any questions, write me on LinkedIn!