Fish schools as ensemble learning algorithms

The accuracy is in the aggregate

Matt Sosna
Towards Data Science
8 min read · Jun 3, 2021


Photo by Quang Nguyen Vinh from Pexels

Animal groups are greater than the sum of their parts. The individual termite wanders cluelessly while the colony builds a sturdy and well-ventilated mound. The lone stork loses its way while the flock successfully migrates. Across the spectrum of cognitive complexity, we regularly see the emergence of behaviors at the group level that the members alone aren’t capable of. How is this possible?

I spent my Ph.D. puzzling over how golden shiners − generally hopeless, not especially intelligent fish − form schools capable of elegantly evading predators. I read dozens of articles and textbooks, conducted experiments, analyzed data, and worked with theorists to try to make sense of how, when it comes to fish, 1 + 1 = 3, not 2.

All the knowledge I gained seemed destined to become a pile of dusty facts in some corner of my brain when I left academia for data science. But as I started my data science education, I was surprised to find a curious parallel between decision-making in the fish I’d studied and decision-making in ensemble learning algorithms.

This post will show you how ensembles of weak learners − whether they’re fish or decision trees − can together form an accurate information processor.

The machine

Let’s first cover the machine learning side, since you’re probably more familiar with algorithms than animals! Ensemble learning methods use a set of models to generate a prediction, rather than a single model. The idea is that the models’ individual errors cancel out, yielding more accurate predictions overall.

In the schematic below, our ensemble is the set of gray boxes, each of which is a model. To generate a prediction, the input is fed to each model, and each model produces its own prediction. These individual predictions are then reduced to one aggregate prediction: the average (for regression) or the majority vote (for classification).

Image by author
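To make the idea concrete, here’s a minimal sketch of both aggregation rules. The three “models” are just hypothetical hard-coded prediction lists (not trained learners), chosen so that each one errs on a different input − which is exactly the situation where the errors cancel:

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate classification predictions by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

def mean_aggregate(predictions):
    """Aggregate regression predictions by averaging."""
    return sum(predictions) / len(predictions)

# True labels for four inputs: [1, 0, 1, 1].
# Each weak classifier is wrong on exactly one (different) input.
model_a = [1, 0, 1, 0]  # errs on input 4
model_b = [1, 1, 1, 1]  # errs on input 2
model_c = [0, 0, 1, 1]  # errs on input 1

# Vote across the three models for each input.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # → [1, 0, 1, 1]: every individual error is outvoted
```

Each model is only 75% accurate, yet the ensemble is perfect on these inputs − the key assumption being that the models make their mistakes on *different* examples. If all three erred on the same input, voting would not help.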
