
Introduction
Confidence intervals are an essential concept to understand in Statistics and thus Data Science. In this article, I will simply and concisely explain what confidence intervals are and how to calculate them.
Intuition
Put simply, a confidence interval expresses the uncertainty in a parameter estimated from a sample of a given population.
The interval is a range of values for a given parameter, typically the mean, with an attached ‘confidence’ that measures how certain you are that the true population parameter lies within the range computed from that random sample.
The confidence level describes how often intervals constructed this way contain the true population parameter when you draw many random samples. The most common confidence levels are 95% and 99%. At the 95% level, about 95% of the intervals computed from repeated random samples will contain the true parameter. **This does not mean that the interval from one particular sample has a 95% chance of containing the true parameter:** once computed, a given interval either contains it or it does not. This is a small nuance but an important one.
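To make the "many random samples" idea concrete, here is a minimal simulation sketch. It assumes a normally distributed population with a made-up mean of 75 and standard deviation of 10; the function names `normal_ci` and `coverage` are illustrative, not from any library. It draws repeated samples, builds a 95% normal interval from each, and counts how often the interval contains the true mean.

```python
import random
import statistics
from math import sqrt


def normal_ci(sample, z=1.96):
    """Return (lower, upper) for the normal confidence interval of the mean."""
    mean = statistics.mean(sample)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    margin = z * statistics.stdev(sample) / sqrt(len(sample))
    return mean - margin, mean + margin


def coverage(true_mean=75.0, true_sd=10.0, n=50, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean.

    The population parameters here are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = normal_ci(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


observed = coverage()
print(f"Share of intervals containing the true mean: {observed:.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the true mean or misses it; the 95% describes the procedure over many samples.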
Note: Confidence intervals are most commonly computed using the normal confidence interval formula. What about non-normal data? By the Central Limit Theorem, the mean of a sufficiently large sample is approximately normally distributed for most populations, so we can compute normal confidence intervals for the mean of most non-normal data.
Mathematics
The formula for the (normal) confidence interval, CI, is:
CI = x̄ ± z × s / √n
Where x̄ is the sample mean, s is the sample standard deviation, n is the sample size and z is the number of standard deviations from the mean.
A value of z = 1.96 corresponds to 95% confidence, and z = 2.576 corresponds to 99% confidence. These values come from the proportion of the normal distribution that lies within that many standard deviations of the mean.
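As a quick check on where these z values come from, the sketch below derives them from the standard normal distribution using Python's standard library. The helper name `z_for_confidence` is my own, chosen for illustration.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    # Split the leftover probability equally between the two tails.
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```

The same helper can be used for any other confidence level, for example 0.90.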
One can see that as n tends to infinity, the width of the interval shrinks in proportion to 1/√n and approaches zero, although it never reaches zero for a finite sample. This means that with a large enough sample, the sample mean becomes a very precise estimate of the true mean.
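The effect of the sample size can be sketched numerically. The snippet below holds the standard deviation fixed at an illustrative value of 6.3 (an assumption made only for this demonstration) and prints the margin z × s / √n for growing n.

```python
from math import sqrt


def margin_of_error(s, n, z=1.96):
    """Half-width of the normal confidence interval: z * s / sqrt(n)."""
    return z * s / sqrt(n)


s = 6.3  # illustrative sample standard deviation, assumed constant
margins = {n: margin_of_error(s, n) for n in (5, 50, 500, 5000)}

for n, margin in margins.items():
    print(f"n = {n:>4}: margin = {margin:.2f}")
```

Multiplying the sample size by 100 cuts the margin by a factor of 10, so the interval narrows steadily but stays above zero for any finite n.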
Example
Let’s say we have 10 exam results from students: 70, 80, 85, 75, 71, 65, 90, 96, 95, 60. The mean of this population is 78.7.
Let’s say we sample the first 5 results: 70, 80, 85, 75, 71 and want 95% confidence:
- x̄ = 76.2
- s ≈ 6.3
- n = 5
- z = 1.96
Therefore, we find our confidence interval to be:
CI = 76.2 ± 1.96 × 6.3 / √5 ≈ 76.2 ± 5.5, or roughly (70.7, 81.7)
Indeed, the population mean lies within our sampled mean’s confidence interval in this case.
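The worked example can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are my own, and `statistics.stdev` is used because it computes the sample standard deviation (n − 1 denominator), which matches s ≈ 6.3 above.

```python
import statistics
from math import sqrt

population = [70, 80, 85, 75, 71, 65, 90, 96, 95, 60]
sample = population[:5]  # the first 5 exam results

x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1)
n = len(sample)
z = 1.96                          # 95% confidence

margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
population_mean = statistics.mean(population)

print(f"Sample mean: {x_bar:.1f}, sample sd: {s:.1f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
print(f"Contains population mean {population_mean:.1f}: "
      f"{lower <= population_mean <= upper}")
```

Changing `z` to 2.576 gives the corresponding 99% interval, which is wider.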
Conclusion
Hope you enjoyed this short and sweet article on confidence intervals!