Counting Distinct Events in Streams

Big data statistics in distributed settings

Arun Jagota
Towards Data Science
11 min read · Sep 2, 2022

Photo by Acton Crawford on Unsplash

Imagine an infinite stream of incoming symbols. At any point in time, we'd like to know how many distinct symbols have been received so far.

This problem has a number of uses. One is tracking the number of distinct visitors to a heavily visited website over a given time period, say the past month.
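
To make the problem statement concrete, here is a minimal sketch of the exact, brute-force approach: remember every symbol seen so far in a set and report the set's size after each arrival. The function name `running_distinct_counts` and the sample visitor IDs are hypothetical, chosen only for illustration.

```python
from typing import Hashable, Iterable, Iterator


def running_distinct_counts(stream: Iterable[Hashable]) -> Iterator[int]:
    """Yield the number of distinct symbols seen so far after each arrival.

    Illustrative baseline only: it stores every distinct symbol,
    so its memory use grows with the number of distinct values.
    """
    seen = set()
    for symbol in stream:
        seen.add(symbol)   # adding a symbol already in the set has no effect
        yield len(seen)    # current distinct count


# Hypothetical visitor IDs arriving one at a time.
visitors = ["alice", "bob", "alice", "carol", "bob"]
print(list(running_distinct_counts(visitors)))  # [1, 2, 2, 3, 3]
```

This gives exact answers, but the set needs memory proportional to the number of distinct symbols, which becomes a real cost on a stream that never ends.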


PhD, Computer Science, neural nets. 14+ years in industry: data science algos developer. 24+ patents issued. 50 academic pubs. Blogs on ML/data science topics.