Statistics 101: Credible vs Confidence Interval

Grasp the idea behind credible and confidence intervals in 5 minutes

Federico Comotto
Towards Data Science


Photo by agus prianto on Unsplash

Under Bayes’ theorem, no theory is perfect. Rather, it is a work in progress, always subject to further refinement and testing. — Nate Silver

In the era of Artificial Intelligence, Bayesian statistics has certainly become a hot topic. A simple but good explanation is the ability of the Bayesian framework to quantify the uncertainty around machine learning models, which supports more reliable predictions. Today, my aim is to dig into one core concept in Bayesian statistics: the credible interval. Although the notion of the credible interval is straightforward, it is often confused with its better-known cousin: the confidence interval.

Bayesian and Frequentist frameworks

Before diving into the main topic of this article, let me recap the key ideas behind the Bayesian and frequentist frameworks. A basic understanding of the two frameworks makes it easier to define credible and confidence intervals.

In the frequentist framework, the probability of an event is equal to the long-term frequency of the event occurring when the same process is repeated multiple times. For example, the probability of having a specific disease under a frequentist philosophy is interpreted as the long-term frequency of having that specific disease. For many probabilities of events, this makes sense, but it gets more difficult to comprehend when events have no long-term frequency of occurrence.

Things to keep in mind: frequentist methods regard the population value as a fixed, unvarying (but unknown) quantity, without a probability distribution.
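The long-run frequency idea can be illustrated with a small simulation. This is only a sketch: the true probability of 0.3, the trial counts, and the function name `observed_frequency` are all made up for illustration. The point is that the observed frequency settles near a fixed value as the number of repetitions grows.

```python
import random


def observed_frequency(p_true, n_trials, seed=0):
    """Repeat the same random process n_trials times and
    return the fraction of trials in which the event occurred."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p_true for _ in range(n_trials))
    return hits / n_trials


# Hypothetical event with a fixed (normally unknown) probability of 0.3.
for n in (10, 100, 10_000, 100_000):
    print(n, observed_frequency(0.3, n))
```

With few repetitions the frequency can be far from 0.3; with many, it gets close. In the frequentist reading, the probability is exactly that limiting frequency.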

In contrast, in the Bayesian framework, probability simply expresses a degree of belief in an event and can be described through a distribution. The probability distribution for a population proportion expresses our prior belief about it before we add the knowledge that comes from the data; once the data are in, the prior is updated into a posterior distribution. For example, your initial belief is “this article is useless”, but having read the first paragraph (you see some data), you decide to continue reading. Let me know what you think at the end!

This way of thinking comes from the diachronic interpretation of Bayes’s theorem. “Diachronic” means that something is happening over time; in this case, the probability of the hypotheses (our belief) changes, over time, as we see new data.

Things to keep in mind: Bayesian methods are based on the idea that unknown quantities, such as population means and proportions, have probability distributions.
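To make the "belief changes as we see new data" idea concrete, here is a minimal sketch using a Beta prior for a population proportion. It assumes the standard Beta-Binomial conjugate update (a Beta(a, b) prior plus s successes and f failures gives a Beta(a + s, b + f) posterior). The flat prior and the data batches are hypothetical, and `update_beta` and `beta_mean` are illustrative helper names.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Flat prior: every proportion between 0 and 1 is equally plausible.
alpha, beta = 1.0, 1.0

# Hypothetical batches of data arriving over time: (successes, failures).
batches = [(3, 7), (12, 28), (30, 70)]

for successes, failures in batches:
    alpha, beta = update_beta(alpha, beta, successes, failures)
    print(f"Beta({alpha:.0f}, {beta:.0f}), mean = {beta_mean(alpha, beta):.3f}")
```

Each posterior becomes the prior for the next batch, which is the diachronic reading of Bayes's theorem in action.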

In summary, the frequentist and the Bayesian frameworks differ in what they return:

  • If we treat a statistical problem with the frequentist approach, we get fixed point estimates as a result.
  • If we treat a statistical problem with the Bayesian framework, we get distributions as a result.
Image by the author — Bayesian vs frequentist framework: expected outcome

Confidence Interval

The frequentist confidence interval rests on the following long-run frequency idea: random samples drawn from the same target population, with the same sample size, would yield CIs that contain the true (unknown) parameter value at a frequency (percentage) set by the confidence level. In practice, it is rarely feasible to draw several random samples from the same population; rather, we gather data from a single sample of the population of interest and calculate the CI for that sample. The interpretation of this particular CI would be: we can be XX% (90%, 95%, 99%) confident that the true (unknown) parameter value lies within the lower and upper limits of the CI, where "confident" refers to hypothetical repeats of the experiment.

Having said that, an XX% confidence level does not mean that for a given realized interval there is an XX% probability that the population parameter lies within the interval!
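The repeated-sampling interpretation can be checked with a simulation. The sketch below makes several simplifying assumptions that are not in the text above: the data are normally distributed, the population standard deviation is known (so a simple z-interval applies), and the true mean, sample size, and number of repeats are invented for illustration.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean, assuming a known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return center - half_width, center + half_width


def coverage(true_mean=10.0, sigma=2.0, n=30, repeats=2000,
             level=0.95, seed=42):
    """Fraction of repeated experiments whose CI contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        hits += low <= true_mean <= high
    return hits / repeats


print(coverage())  # should be close to the chosen level, 0.95
```

Note what is random here: the true mean never moves, while each experiment produces a different interval. Any single interval either contains the true mean or it does not.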

Credible Interval

Like confidence intervals, credible intervals describe and summarise the uncertainty related to the unknown parameters you are trying to estimate, but they do so using a probability distribution. While the goal of confidence and credible intervals is similar, their statistical definition and meaning are very different. Indeed, whereas confidence intervals are calculated using more elaborate techniques based on assumptions and approximations, credible intervals are quite simple to calculate and interpret once the posterior distribution is available.

Since Bayesian inference returns a distribution of possible effect values (the posterior), the credible interval is just the range containing a particular percentage of probable values. For instance, the 95% credible interval (in its common equal-tailed form) is simply the central portion of the posterior distribution that contains 95% of the probability mass.
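As a rough sketch of how this works in practice, the snippet below draws samples from a posterior and reads off the central 95%. The posterior is a hypothetical Beta(46, 106), which is what a flat Beta(1, 1) prior would give after 45 successes in 150 trials; the helper name `credible_interval` and the number of draws are likewise illustrative choices.

```python
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws:
    cut off (1 - level) / 2 of the draws in each tail."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    low_idx = round(tail * len(ordered))
    high_idx = round((1 - tail) * len(ordered)) - 1
    return ordered[low_idx], ordered[high_idx]


rng = random.Random(1)

# Hypothetical posterior for a proportion: Beta(46, 106).
draws = [rng.betavariate(46, 106) for _ in range(20_000)]

low, high = credible_interval(draws)
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

Given the model and the data, the statement "the proportion lies between `low` and `high` with 95% probability" is a direct reading of the posterior, with no appeal to repeated experiments.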

When compared to the frequentist interval, the Bayesian interval is perhaps simpler to digest. Indeed, the Bayesian framework allows us to say “given the observed data, the effect has XX% probability of falling within this range”, whereas the more complicated frequentist alternative would be “there is an XX% probability that when computing a confidence interval from data of this sort, the effect falls within this range”.

Conclusions

A frequentist XX% confidence interval means that with a large number of repeated samples, XX% of such calculated confidence intervals would include the true value of the parameter. In frequentist terms, the parameter is fixed (cannot be considered to have a distribution of possible values) and the confidence interval is random (as it depends on the random sample). On the other hand, Bayesian credible intervals are based on the idea that the estimated parameters are random variables with a distribution. Therefore, a credible interval is simply an interval, in the domain of the posterior distribution, within which an unobserved parameter value falls with a particular probability.

In general, it’s easy to get confused, but in my opinion, the concept of the credible interval is much more intuitive than the concept of the confidence interval. For the credible interval, a basic understanding of probability and distributions can be enough to grasp the idea.

With this article, I wanted to give you a quick introduction to these two important notions in statistics, without being too technical or mathematical. I hope that after reading it you will have an additional tool for distinguishing between credible and confidence intervals.

Data Scientist @Laife Reply, fond of Health and Nutrition Data, specialized in delivering Big Data and ML solutions for the Healthcare and the Pharma Industry