Machine Learning Does Not Only Predict the Future, It Actively Creates It

A primer on position bias (and why it matters)

Samuel Flender
Towards Data Science
4 min read · Jan 11, 2023

Image generated with Stable Diffusion

Standard Machine Learning curricula teach that ML models learn from patterns that exist in the past in order to make predictions about the future.

This is a neat simplification, but things change dramatically once the predictions from these models are used in production, where they create feedback loops: the model's predictions themselves now affect the world that the model is trying to learn from. Our models no longer just predict the future; they actively create it.

One such feedback loop is position bias, a phenomenon that's been observed across the industry in ranking models: the models that power search engines, recommender systems, social media feeds, and ads rankers.

What is position bias?

Position bias means that the highest-ranked items (videos on Netflix, pages on Google, products on Amazon, posts on Facebook, or Tweets on Twitter) are the ones which create the most engagement not because they’re actually the best content for the user, but instead simply because they’re ranked highest.
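To make this concrete, here is a minimal simulation sketch. It assumes a simple position-based click model, in which a user clicks an item only if they examine its slot and find the item relevant, so that P(click) = P(examine rank) × relevance. This model, the function name `simulate_clicks`, and all the numbers below are illustrative assumptions, not measurements from a real system.

```python
import random

def simulate_clicks(relevance, examine_prob, n_sessions, seed=0):
    """Count clicks per rank, assuming P(click) = P(examine rank) * relevance."""
    rng = random.Random(seed)
    clicks = [0] * len(relevance)
    for _ in range(n_sessions):
        for rank, (rel, exam) in enumerate(zip(relevance, examine_prob)):
            # A click requires both examining the slot and liking the item.
            if rng.random() < exam * rel:
                clicks[rank] += 1
    return clicks

# Hypothetical setup: five items that are all equally good for the user.
relevance = [0.5] * 5
# Assumed chance that a user even looks at each rank (made-up values).
examine_prob = [1.0, 0.6, 0.4, 0.25, 0.15]

n_sessions = 20_000
clicks = simulate_clicks(relevance, examine_prob, n_sessions)
ctr = [c / n_sessions for c in clicks]

for rank, rate in enumerate(ctr, start=1):
    print(f"rank {rank}: CTR = {rate:.3f}")
```

Even though every item is equally relevant in this toy setup, the click-through rate falls with rank, roughly in proportion to the assumed examination probabilities. A model trained naively on these clicks would conclude that the top item is the best one.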

This bias manifests because the ranking model is so good that users start blindly trusting the top-ranked…



Responses (4)

One way is result randomization: for a small subset of the serving population, simply re-rank the top N items randomly, and then measure the change in engagements as a function of rank ...
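The randomization idea quoted above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the article's implementation: the simulated user (`simulated_click`, `TRUE_RELEVANCE`, `TRUE_EXAMINE`) and the helper `randomized_ctr_by_rank` are hypothetical, and the user again follows a simple position-based click model. Because items are assigned to ranks at random, the average relevance at every rank is the same, so differences in click-through rate between ranks can be attributed to position alone.

```python
import random

# Hypothetical ground truth used only to simulate users (not known in practice).
TRUE_RELEVANCE = {"a": 0.8, "b": 0.6, "c": 0.4, "d": 0.2}
TRUE_EXAMINE = [1.0, 0.6, 0.4, 0.25]

def simulated_click(item, rank, rng):
    """Simulated user: clicks if they examine the slot and like the item."""
    return rng.random() < TRUE_EXAMINE[rank] * TRUE_RELEVANCE[item]

def randomized_ctr_by_rank(items, click_fn, top_n, n_sessions, seed=0):
    """Shuffle the top-N items in each session and measure CTR per rank."""
    rng = random.Random(seed)
    shown = [0] * top_n
    clicked = [0] * top_n
    for _ in range(n_sessions):
        top = items[:top_n]   # copy of the model's top-N results
        rng.shuffle(top)      # random re-ranking for this session
        for rank, item in enumerate(top):
            shown[rank] += 1
            if click_fn(item, rank, rng):
                clicked[rank] += 1
    return [c / s for c, s in zip(clicked, shown)]

ranked_items = ["a", "b", "c", "d"]  # order produced by the ranking model
ctr_by_rank = randomized_ctr_by_rank(
    ranked_items, simulated_click, top_n=4, n_sessions=50_000
)

# Normalize by rank 1 to get the estimated position effect relative to the top slot.
position_effect = [rate / ctr_by_rank[0] for rate in ctr_by_rank]
for rank, effect in enumerate(position_effect, start=1):
    print(f"rank {rank}: relative position effect = {effect:.2f}")
```

In this simulation the estimated relative position effects should land close to the assumed `TRUE_EXAMINE` values. In a real system only the randomized slice of traffic would be logged this way, which is why the quoted text restricts it to a small subset of the serving population.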

When using true randomization, I agree that it might hurt the user experience. However, you can minimize this damage by using a multi-armed bandit algorithm (see e.g…

--

As is often the case, awareness is the essential first step. Then you can utilize methods like result randomization, intervention harvesting, and counterfactual analysis. This will improve the accuracy and diversity of the ranking algorithms and…

--

Great stuff

--