An Introduction to Conformal Prediction

A method you should always consider when quantifying AI uncertainty

Shuo Li
Towards Data Science


In decision-making, Artificial Intelligence (AI) systems need to not only make predictions but also quantify the uncertainty of those predictions. For example, consider an automated stock trading system in which an AI model predicts the stock price. A point prediction from the model might differ dramatically from the real value because of the high stochasticity of the stock market. If, on the other hand, the model could estimate a range that is guaranteed to cover the true value with high probability, the trading system could compute the best- and worst-case rewards and make more sensible decisions.

Conformal prediction is a technique for quantifying such uncertainty in AI systems. In particular, given an input, conformal prediction estimates a prediction interval in regression problems and a set of classes in classification problems. Both the prediction interval and the prediction set are guaranteed to cover the true value with high probability.

In A Tutorial on Conformal Prediction, Glenn Shafer and Vladimir Vovk provide the following definition:

“Conformal prediction uses past experience to determine precise levels of confidence in new predictions.”

In this article, we apply conformal prediction to Independent and Identically Distributed (i.i.d.) datasets, i.e.,

{(X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)} drawn i.i.d. from a distribution P,

where the X’s are input features, the Y’s are labels, and n is the number of data points. We also assume that a machine learning model f: X -> Y has been trained. This model could be a classical machine learning model such as linear regression or a Support Vector Machine (SVM), or a deep learning model such as a fully connected or convolutional neural network. The goal is to estimate prediction sets for the model’s outputs. To illustrate this idea, we describe conformal prediction’s steps, guarantees, and examples.

Image by the author

I. Steps

a. Identify score function to quantify inconsistency

Identify an appropriate score function s(X, Y) ∈ ℝ that measures the discrepancy between model outputs ŷ and labels y. This score function is critical because it determines what prediction sets we obtain. For instance, in regression problems we could take |ŷ − y| as the score function; the resulting prediction set then contains the values within an L1-norm ball (an interval) around the prediction ŷ. In classification problems we could take 1 − ŷ_i as the score function, where ŷ_i is the predicted probability (softmax score) of the true class; we then obtain a prediction set of classes whose predicted probabilities are greater than some threshold.
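As a concrete illustration, here is a minimal sketch of the two score functions described above in Python. The function names and NumPy usage are my own; the article does not prescribe a particular implementation.

```python
import numpy as np

# Regression score: absolute residual between prediction and label.
def regression_score(y_hat, y):
    return np.abs(y_hat - y)

# Classification score: 1 minus the predicted probability of the
# true class. `probs` has shape (n, n_classes); `y` holds integer labels.
def classification_score(probs, y):
    return 1.0 - probs[np.arange(len(y)), y]
```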

b. Compute the (1 − α) quantile of the scores

Compute ê as the (1 − α) quantile of the scores {s_1, …, s_n}, where s_i = s(X_i, Y_i) for i = 1, …, n.

In the full conformal prediction method, we need to train m models to compute the scores and construct prediction sets, where m is the number of possible values Y_{n+1} could take. This is computationally expensive. To lower the computational complexity, Inductive (split) Conformal Prediction can be used. Briefly, this method splits the whole training set into a proper training set and a calibration set. The model is then trained only on the proper training set, and the scores are computed only on the calibration set. This way, we only need to train the model once.
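Under the split conformal setup, computing ê is essentially a one-liner. A minimal sketch, assuming NumPy ≥ 1.22 (for the method keyword) and that the calibration scores have already been computed:

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration
    scores: the ceil((n + 1) * (1 - alpha)) / n empirical quantile."""
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    # With very small n, the corrected level can exceed 1; cap it.
    return np.quantile(scores, min(level, 1.0), method="higher")
```

The "higher" option makes the quantile conservative, which is what the finite-sample correction (discussed in the Guarantee section below) requires.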

c. Construct prediction sets using model predictions and the (1 − α) quantile

Use the quantile to form the prediction set for a new example:

τ(X_{n+1}) = { y : s(X_{n+1}, y) ≤ ê }.

In words: given the (1 − α) quantile ê, we construct the prediction set for an input X_{n+1} by including every value y whose score s(X_{n+1}, y) is less than or equal to ê.
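Continuing the sketch from the previous steps, the prediction sets follow directly from the score functions above (again, illustrative code rather than a canonical implementation):

```python
import numpy as np

# Classification: keep every class whose score 1 - p is at most q_hat,
# i.e. every class with predicted probability at least 1 - q_hat.
def classification_set(probs_new, q_hat):
    return np.where(1.0 - probs_new <= q_hat)[0]

# Regression with the |y_hat - y| score: the set {y : |y_hat - y| <= q_hat}
# is simply the interval [y_hat - q_hat, y_hat + q_hat].
def regression_interval(y_hat_new, q_hat):
    return y_hat_new - q_hat, y_hat_new + q_hat
```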

II. Guarantee

Conformal prediction provides mathematically rigorous guarantees. Let Y_{n+1} be the true value; Y could be a class label in a classification problem or a real value in a regression problem. Let τ(X_{n+1}) be a prediction set (or interval). We say that τ(X_{n+1}) covers Y_{n+1} if Y_{n+1} is in τ(X_{n+1}), i.e.,

Y_{n+1} ∈ τ(X_{n+1}).

Then, given a set of Independent and Identically Distributed (i.i.d.) samples {(X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)}, the conformal prediction set satisfies the following coverage guarantee:

P( Y_{n+1} ∈ τ(X_{n+1}) ) ≥ 1 − α.

A proof of the coverage guarantee based on the exchangeability (i.i.d.) assumption can be found in the appendix of A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. Note that, in this proof, the (1 − α) quantile level is adjusted to ⌈(n + 1)(1 − α)⌉/n to account for the finite-sample case.
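The guarantee is easy to check empirically. The following self-contained simulation is my own toy example (synthetic linear data with Gaussian noise, with the identity map standing in for a trained model), not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1

# Toy data: y = x + Gaussian noise; the "model" predicts y_hat = x.
def sample(n):
    x = rng.uniform(-1.0, 1.0, n)
    return x, x + rng.normal(0.0, 0.3, n)

# Calibration: scores are absolute residuals.
x_cal, y_cal = sample(1000)
scores = np.abs(x_cal - y_cal)
n = len(scores)
level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, level, method="higher")

# Test coverage: is the true y inside [y_hat - q_hat, y_hat + q_hat]?
x_test, y_test = sample(10_000)
covered = np.abs(x_test - y_test) <= q_hat
print(f"empirical coverage: {covered.mean():.3f} (target: >= {1 - alpha})")
```

Running this should print an empirical coverage close to, and at least, the 1 − α = 0.9 target.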

III. Examples

In this section, we aim to give some intuition on what conformal prediction sets look like.

a. Classification prediction set

We show three classification prediction sets in Figure 1. In this task, a neural network was trained to classify input images. Conformal prediction was used to construct class sets (in curly brackets) that are guaranteed to include the true label of the input images with high probability.

Figure 1 Classification Prediction Sets (From Sangdon Park et al.: PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction)

b. Regression prediction interval

In Figure 2, we illustrate a regression prediction interval. The system’s goal was to predict Y given the input X. Conformal prediction constructs prediction intervals (the regions between the blue lines) that are guaranteed to cover the true value Y with high probability.

Figure 2 Regression Prediction Interval (From Anastasios N. Angelopoulos and Stephen Bates: A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification)

IV. Conclusion

This article gives a brief introduction to conformal prediction, including its steps, guarantees, and examples. We can apply conformal prediction to both classification and regression problems; in both cases, it quantifies a machine learning model’s uncertainty. Conformal prediction is easy to implement and provides rigorous guarantees. We believe conformal prediction is a good stepping stone toward safe AI, where uncertainty quantification is a must!
