Confidence Interval: Are You Interpreting Correctly?

A detailed explanation

Vivekananda Das
Towards Data Science

Photo by Camylla Battani on Unsplash

In my previous article, I discussed why you should prefer reporting a 95% confidence interval over a p-value, especially when you are explaining the findings of your study to non-statistician readers or audiences. In this article, I continue that discussion and try to provide a bit more clarity.

Regarding the “correct” interpretation of the confidence interval, one of my readers asked a great question:

As a student, I had the same question for a long time. Indeed, for many of us, this concept is unintuitive.

Many of us have the impression that statistics is about explaining data through cool graphs, estimating magical models, etc. All of that is correct. However, after four years of education, my understanding is that, at its core, statistics is a systematic process for reducing the complexity of a world full of unknowns, with the end goal of making sophisticated guesses about what we don’t know based on what we know.

Unfortunately, most humans prefer certainty over uncertainty because the latter makes us uncomfortable. My hypothesis is that our desire for “certainty” is one of the key reasons why many of us struggle to grasp the fundamentals of statistics.

Okay! Let’s focus on my reader’s concern now. Why do these two statements imply different things? 🤔

Statement 1: We are 95% confident that *the bowl* contains/captures the apple

Statement 2: We are 95% confident that *the apple* is in/lies in the bowl

These two statements can refer to two different games of uncertainty.

Image by the author. Some of the clip art used in the image was downloaded from https://publicdomainvectors.org/, which offers copyright-free vector images in popular .eps, .svg, .ai, and .cdr formats.

Game #1

There is an apple sitting still on a table. The apple is the target and it exists at a fixed location. There is no uncertainty regarding the apple. It exists where it exists.

As a player, you get a bowl to throw at the apple. The goal is for the bowl to land upside down over the apple and capture it.

You are quite good at the game.

If you throw the bowl 100 times, the bowl captures the apple 95 times.

Therefore, if you throw the bowl once, the probability that the bowl captures the apple is 95/100 = 0.95 = 95%.

To describe your excellence as a player, we say “We are 95% confident that the bowl captures the apple (if you throw the bowl once)”.
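
If it helps to see Game #1 in numbers, here is a minimal Python sketch. It rests on assumptions made purely for illustration and is not part of the game as described: the table is a one-dimensional line, the apple sits at position 0, the bowl's landing spot follows a normal distribution centered on the apple, and the bowl's half-width is chosen so that roughly 95% of throws succeed. The names `APPLE_POSITION`, `BOWL_HALF_WIDTH`, and `THROW_SPREAD` are hypothetical.

```python
import random

APPLE_POSITION = 0.0    # the fixed target; it never moves between throws
THROW_SPREAD = 1.0      # assumed standard deviation of the bowl's landing spot
BOWL_HALF_WIDTH = 1.96  # assumed reach of the bowl on either side of its center


def bowl_captures_apple(rng):
    """Throw the bowl once and report whether it lands over the apple."""
    bowl_center = rng.gauss(APPLE_POSITION, THROW_SPREAD)  # only the bowl is random
    return abs(bowl_center - APPLE_POSITION) <= BOWL_HALF_WIDTH


def capture_rate(n_throws, seed=0):
    """Fraction of n_throws in which the bowl captures the apple."""
    rng = random.Random(seed)
    hits = sum(bowl_captures_apple(rng) for _ in range(n_throws))
    return hits / n_throws


if __name__ == "__main__":
    print(f"Capture rate over 100,000 throws: {capture_rate(100_000):.3f}")
```

Notice that the apple never moves in the code. Only the bowl's landing spot is random, so the 95% describes the throws, not the apple.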

Game #2

There is a bowl sitting still on a table. The bowl is the target and it exists at a fixed location. There is no uncertainty regarding the bowl. It exists where it exists.

As a player, you get an apple to throw at the bowl. The goal is for the apple to land in the bowl.

You are quite good at the game.

If you throw the apple 100 times, the apple is captured in the bowl 95 times.

Therefore, if you throw the apple once, the probability that the apple is captured in the bowl is 95/100 = 0.95 = 95%.

To describe your excellence as a player, we say “We are 95% confident that the apple is in the bowl (if you throw the apple once)”.

Moral of the story

(1) The target parameter has no uncertainty. It exists at a fixed place (at least we assume it does) at any given point in time.

(2) The thing we are throwing at the target parameter is uncertain (it may or may not capture the fixed target 🎯). Therefore, the probability statement should be about whether the thrown thing captures the fixed target.

Confidence Interval Example

At a hypothetical university, students’ average weight (body mass to be more precise) is 130 lbs. It is a fixed number (at any given time) but unknown to you. As a researcher, you are trying to estimate the population’s average weight.

There are 10,000 students at the university, and it is impractical for you to weigh every one of them. You get a list of all the students and randomly select 100.

They all agree to let you measure their weight, and you collect the information. The sample average weight is 125 lbs.

You realize that this is just one sample from the target population, out of an astronomically large number of samples that could have been drawn. If you draw another sample from the same population, the sample average weight could be 132 lbs, 120 lbs, 142 lbs, or some other value. You want to account for this uncertainty by constructing an interval of plausible values.

Let’s assume that, for the one sample you selected, the 95% confidence interval is [115 lbs, 135 lbs].
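
For readers wondering where such an interval comes from, one common recipe is: sample mean ± 1.96 × (sample standard deviation / √n). Below is a small Python sketch of that recipe. It assumes the normal approximation (a z-value rather than a t-value), which is usually reasonable for a sample of 100. The function name `mean_confidence_interval` and the simulated weights are hypothetical; they are not the data behind the [115 lbs, 135 lbs] interval above.

```python
import math
import random
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(values)
    sample_mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z is about 1.96 when confidence is 0.95
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


if __name__ == "__main__":
    # Hypothetical sample of 100 student weights (in lbs), simulated for illustration.
    rng = random.Random(1)
    weights = [rng.gauss(130, 25) for _ in range(100)]
    low, high = mean_confidence_interval(weights)
    print(f"95% confidence interval: [{low:.1f} lbs, {high:.1f} lbs]")
```

The interval is built entirely from the sample, so a different sample would give a different interval. That is the sense in which the interval, not the population average, is the thing being "thrown".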

You say, “I am 95% confident that *this interval* captures the true population average weight”.

This is not the same as saying “I am 95% confident that *the true population average weight* lies (is captured) in this interval.”

Focusing on the subtlety one more time! 😬

The true population average is your target parameter. It exists somewhere in the universe, and you assume it is a fixed number at any given point in time. Because it is not changing, the statement about uncertainty is not about it.

In contrast, the confidence intervals you can construct are uncertain. If you draw 100 samples of the same size from the same population → you get 100 different confidence intervals → about 95 of them are expected to capture the true population parameter. It is as if you are throwing confidence intervals at the fixed population parameter.

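The above narrative implies that, before you draw your one sample, the probability that the resulting confidence interval will capture the true population parameter (i.e., the fixed target) is 95%. Once a specific interval has been computed, it either captures the target or it does not; the 95% describes how often the procedure succeeds.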
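
The repeated-sampling idea can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: the population of 10,000 weights is simulated from a normal distribution (mean 130 lbs, standard deviation 25 lbs, both made up), the interval uses the normal approximation, and the names `build_population`, `interval_from_sample`, and `coverage` are hypothetical.

```python
import math
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def build_population(size=10_000, seed=0):
    """Simulated student weights in lbs (assumed normal; purely illustrative)."""
    rng = random.Random(seed)
    return [rng.gauss(130, 25) for _ in range(size)]


def interval_from_sample(sample):
    """Normal-approximation 95% confidence interval for the mean."""
    mean = statistics.fmean(sample)
    margin = Z_95 * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin


def coverage(population, n_samples=2_000, sample_size=100, seed=1):
    """Fraction of intervals, one per random sample, that capture the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)  # the fixed target; computed once
    captured = 0
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # drawn without replacement
        low, high = interval_from_sample(sample)
        captured += low <= true_mean <= high
    return captured / n_samples


if __name__ == "__main__":
    population = build_population()
    print(f"Share of intervals capturing the true mean: {coverage(population):.3f}")
```

In the loop, `true_mean` stays put while `low` and `high` change with every sample. The share printed at the end should land close to 0.95, which is the long-run capture rate of the interval-throwing procedure.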

Without delving too deep into philosophy, in “colloquial” English,

The bowl contains the apple == The apple is in the bowl

The two statements imply the same thing because we are in the land of certainty. When we believe that we know something for sure, we do not make any probabilistic statements. (Please don’t ask: do we know anything for sure? 🙏)

However, when we are in the land of “some” uncertainty, in “statistical” English,

We are 95% confident that the bowl contains the apple =/= We are 95% confident that the apple is in the bowl

I hope this helps. Again, I appreciate your great comments on my articles. I hope we continue this discussion and help each other make sense of the complex world with a bit more clarity.

Sharing synthesized ideas on Data Analysis in R, Data Literacy, Causal Inference, and Well-being | Assistant Prof @ UUtah | More: https://vivekanandadas.com