An Introduction to Conformal Prediction
A method you should always consider when quantifying AI uncertainty
In decision-making, Artificial Intelligence (AI) systems need to not only make predictions but also quantify how certain those predictions are. Consider, for example, an automated stock-trading system in which an AI model predicts stock prices. A point prediction may differ dramatically from the real value because of the high stochasticity of the stock market. If, on the other hand, the AI system could estimate a range that is guaranteed to cover the true value with high probability, the trading system could compute the best- and worst-case rewards and make more sensible decisions.
Conformal prediction is a technique for quantifying such uncertainties for AI systems. In particular, given an input, conformal prediction estimates a prediction interval in regression problems and a set of classes in classification problems. Both the prediction interval and sets are guaranteed to cover the true value with high probability.
In A Tutorial on Conformal Prediction, Glenn Shafer and Vladimir Vovk provide the following definition:
“Conformal prediction uses past experience to determine precise levels of confidence in new predictions.”
In this article, we apply conformal prediction to independent and identically distributed (i.i.d.) datasets, i.e.,

{(X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)} drawn i.i.d. from some distribution,

where the X’s are input features, the Y’s are labels, and n is the number of data points. We also assume that a machine learning model f: X → Y has been trained. This model could be a classical machine learning model such as linear regression or a Support Vector Machine (SVM), or a deep learning model such as a fully connected or convolutional network. The goal is to estimate prediction sets for model outputs. To illustrate this idea, we will describe conformal prediction's steps, guarantees, and examples.
I. Steps
a. Identify score function to quantify inconsistency
Identify an appropriate score function s(X, Y) ∈ ℝ that measures the discrepancy between model outputs ŷ and labels y. This score function is critical because it determines what kind of prediction sets we obtain. For instance, in regression problems, we could take |ŷ − y| as the score function; the resulting prediction sets are then L1-norm balls around the prediction ŷ. In classification problems, we could take 1 − ŷ_i as the score function, where ŷ_i is the predicted softmax score of the true class; we then obtain a prediction set of classes whose predicted scores are greater than some threshold.
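As a minimal sketch of the two score functions described above (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def regression_score(y_hat, y):
    """Regression score s(x, y) = |y_hat - y|: the absolute residual."""
    return np.abs(y_hat - y)

def classification_score(probs, true_class):
    """Classification score s(x, y) = 1 - (softmax score of the true class).

    probs: array of predicted softmax scores over all classes.
    true_class: integer index of the true class.
    """
    return 1.0 - probs[true_class]

# A confident, correct prediction yields a small score in both cases.
print(regression_score(3.2, 3.0))                              # 0.2
print(classification_score(np.array([0.7, 0.2, 0.1]), 0))      # 0.3
```

Low scores mean the label "conforms" well to the model's prediction; high scores mean it does not.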
b. Compute (1- ɑ) quantile of scores
Compute ê as the (1 − ɑ) quantile of the scores {s_1, …, s_n}, where

s_i = s(X_i, Y_i), for i = 1, …, n.
In the full conformal prediction method, we need to train m models to compute the scores and construct prediction sets, where m is the number of possible values Y_{n+1} could take. This is undoubtedly computationally expensive. To lower the computational complexity, Inductive (split) Conformal Prediction can be used. Briefly, this method splits the whole training set into a proper training set and a calibration set. The model is then trained only on the proper training set, and the scores are computed only on the calibration set. This way, we only need to train the model once.
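The quantile step in the split conformal setting can be sketched as follows, using the finite-sample-adjusted level ⌈(n + 1)(1 − ɑ)⌉/n discussed in the Guarantee section (the function name and the synthetic scores are for illustration only):

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """(1 - alpha) conformal quantile of calibration scores,
    computed at the finite-sample level ceil((n + 1)(1 - alpha)) / n."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    # method="higher" picks an actual score, never interpolates downward.
    return np.quantile(scores, level, method="higher")

# Illustrative calibration scores drawn uniformly from [0, 1].
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
e_hat = conformal_quantile(scores, alpha=0.1)
print(e_hat)  # roughly 0.9 for uniform scores
```

The `min(..., 1.0)` clip keeps the level valid when n is very small relative to 1/ɑ.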
c. Construct prediction sets using model predictions and (1- ɑ) quantile
Use the quantile to form the prediction set for a new example,

τ(X_{n+1}) = { y : s(X_{n+1}, y) ≤ ê }.

That is, with the (1 − ɑ) quantile ê, we construct the prediction set for an input X_{n+1} by including every value y whose score s(X_{n+1}, y) is less than or equal to ê.
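Under the two score functions from Step a, the set-construction step above reduces to a threshold test; a hedged sketch (illustrative function names, example values chosen for clarity):

```python
import numpy as np

def classification_prediction_set(probs, e_hat):
    """Include every class y whose score 1 - probs[y] is <= e_hat,
    i.e., every class whose softmax score is >= 1 - e_hat."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= e_hat]

def regression_prediction_interval(y_hat, e_hat):
    """With score |y_hat - y|, the set {y : |y_hat - y| <= e_hat}
    is the interval [y_hat - e_hat, y_hat + e_hat]."""
    return (y_hat - e_hat, y_hat + e_hat)

probs = np.array([0.55, 0.30, 0.10, 0.05])
print(classification_prediction_set(probs, e_hat=0.75))   # [0, 1]
print(regression_prediction_interval(y_hat=2.0, e_hat=0.5))  # (1.5, 2.5)
```

Note how a larger ê (i.e., a smaller ɑ, demanding higher coverage) produces larger prediction sets: uncertainty quantification trades set size for confidence.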
II. Guarantee
Conformal prediction provides mathematically rigorous guarantees. Let Y_{n+1} be the true value; Y could be a class label in a classification problem or a real value in a regression problem. Let τ(X_{n+1}) be a prediction set (or interval). We say that τ(X_{n+1}) covers Y_{n+1} if Y_{n+1} is in τ(X_{n+1}), i.e.,

Y_{n+1} ∈ τ(X_{n+1}).
Then, given a set of i.i.d. samples {(X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)}, the conformal prediction set satisfies the following coverage guarantee:

P( Y_{n+1} ∈ τ(X_{n+1}) ) ≥ 1 − ɑ.
A proof of the coverage guarantee based on the exchangeability (i.i.d.) assumption can be found in the appendix of A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. Note that, in this proof, the (1 − ɑ) quantile level is adjusted to ⌈(n + 1)(1 − ɑ)⌉/n to account for the finite-sample case.
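The coverage guarantee can be checked empirically. The following sketch calibrates on synthetic regression data (a hypothetical model f(x) = 2x with Gaussian noise, assumed here purely for illustration) and measures coverage on fresh test points:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.1

# Synthetic i.i.d. data: Y = 2X + Gaussian noise; "trained model" f(x) = 2x.
def f(x):
    return 2.0 * x

n_cal, n_test = 2000, 5000
x_cal = rng.uniform(size=n_cal)
y_cal = f(x_cal) + rng.normal(scale=0.3, size=n_cal)

# Calibration scores |f(x) - y| and the finite-sample conformal quantile.
scores = np.abs(f(x_cal) - y_cal)
level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
e_hat = np.quantile(scores, level, method="higher")

# Empirical coverage of the interval [f(x) - e_hat, f(x) + e_hat] on test data.
x_test = rng.uniform(size=n_test)
y_test = f(x_test) + rng.normal(scale=0.3, size=n_test)
covered = np.abs(f(x_test) - y_test) <= e_hat
print(covered.mean())  # should be close to 1 - alpha = 0.9
```

The empirical coverage hovers around 1 − ɑ, as the guarantee predicts; the guarantee is marginal (on average over draws of the data), not conditional on each input.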
III. Examples
In this section, we aim to give some intuition on what conformal prediction sets look like.
a. Classification prediction set
We show three classification prediction sets in Figure 1. In this task, a neural network was trained to classify input images. Conformal prediction was used to construct class sets (in curly brackets) that are guaranteed to include the true label for the input images with high probability.
b. Regression prediction interval
In Figure 2, we illustrate a regression prediction interval. The system's goal was to predict Ŷ given input X. Conformal prediction constructs prediction intervals (between the blue lines) that are guaranteed to cover the true value Y with high probability.
IV. Conclusion
This article gives a brief introduction to conformal prediction, including its steps, guarantees, and examples. We can apply conformal prediction to both classification and regression problems. In both cases, conformal prediction aims at quantifying machine learning models’ uncertainty. Conformal prediction is easy to implement and can provide rigorous guarantees. We believe conformal prediction is a good stepping stone toward safe AI where uncertainty quantification is a must!