Uncertainty Sampling Cheatsheet

Published in

Towards Data Science

4 min readJul 29, 2019

When a Supervised Machine Learning model makes a prediction, it often gives a confidence in that prediction. If the model is uncertain (low confidence), then human feedback can help. Getting human feedback when a model is uncertain is a type of Active Learning known as Uncertainty Sampling.

The four types of Uncertainty Sampling covered in the cheatsheet are:

Least Confidence: difference between the most confident prediction and 100% confidence
Margin of Confidence: difference between the top two most confident predictions
Ratio of Confidence: ratio between the top two most confident predictions
Entropy: difference between all predictions, as defined by information theory

This article shares a cheatsheet for these four common ways to calculate uncertainty, with examples, equations and python code. Use it as a reference the next time you need to decide how to calculate your model’s confidence!

Data Scientists often use Uncertainty Sampling to sample items for human review. For example, imagine that you are responsible for a Machine Learning model to help a self-driving car understand traffic. You might have millions of unlabeled images taken from cameras on the front of cars, but you only have the time or budget to label 1,000. If you sample randomly, you might get images that are mostly from high-way driving, where the self-driving car is already confident and doesn’t need additional training. So, you will used Uncertainty Sampling to find the 1,000 most “uncertain” images, where your model is the most confused.

When you update your model with the newly labeled examples, it should get smarter, faster.

This cheatsheet is excerpted from my book, Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning

See the book for more details on each method and for more sophisticated problems than labeling, like predicting sequences of text and semantic segmentation for images. The principles of uncertainty are the same, but the calculations of uncertainty will be different.

The book also covers other Active Learning strategies, like Diversity Sampling, and the best ways to interpret your models probability distributions (hint: you probably can’t trust the confidence of your model).

Download Code and Cheatsheet:

You can download the code here:

rmunro/uncertainty_sampling_numpy

NumPy implementations of common Active Learning strategies for Uncertainty Sampling The four types of Uncertainty…

github.com

You can download a PDF version of the cheatsheet here:

I’ll also be releasing the code in PyTorch at a later date, along with a larger variety of algorithms.

EDIT: the PyTorch version of this cheat sheet is now available: http://robertmunro.com/Uncertainty_Sampling_Cheatsheet_PyTorch.pdf

Implementing Uncertainty Sampling

All the equations in the cheatsheet return uncertainty scores in a [0,1] range, where 1 is the most uncertain. Many academic papers don’t normalize the scores, so you might see different equations in the papers above. However, I recommend that you implement the normalized [0–1] equations for any code that you will use for non-research purposes: a [0–1] range makes it easier for downstream processing, for unit tests, and to spot-check the results.

The code shared in this article and in the GitHub repository is open source, so please use it to get started!

Robert Munro

July, 2019

Uncertainty Sampling Cheatsheet

Download Code and Cheatsheet:

rmunro/uncertainty_sampling_numpy

NumPy implementations of common Active Learning strategies for Uncertainty Sampling The four types of Uncertainty…

Further Reading

Knowledge Quadrant for Machine Learning

Transfer Learning, Uncertainty Sampling, and Diversity Sampling to improve your Machine Learning models.

Implementing Uncertainty Sampling

Written by Robert (Munro) Monarch