Introduction
In almost any role in a business, you’ll have to make estimations:
- In marketing, you have to estimate ROIs for future campaigns.
- In supply chain management, you have to forecast (estimate) the amount of inventory you’ll need.
- In product development, confidence intervals are important for determining reliable specifications of a product.
Now it’s one thing to make an estimation, but it’s another to make an estimation and provide a confidence level.
Consider the following…
- Statement A: I estimated that we’ll do $500,000 in sales next year.
- Statement B: I estimated, with 95% confidence, that our sales next year will fall between $450,000 and $550,000.
What’s the difference between the two?
Statement B provides us with more information because it not only gives us an estimate, but also tells us how "confident" we can be in that estimate. If the idea of confidence and confidence levels doesn’t make sense to you, don’t worry, keep reading.
Objective
In this article, we’re going to go through what confidence intervals are, why they’re important, and how you can calculate them.
BUT before we dive into confidence intervals, we first need to talk about an extremely important concept: the Central Limit Theorem.
Central Limit Theorem
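The central limit theorem is very powerful. It states that, for sufficiently large samples, the distribution of sample means approximates a normal distribution, regardless of the shape of the population’s distribution (as long as that distribution has a finite variance).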

To give an example, imagine that you took a sample from a data set and calculated the mean of that sample. If you repeated this many times and plotted all your sample means and their frequencies on a graph, the result would approximate a normal distribution no matter what the initial distribution looked like, provided each sample is large enough.
This is REALLY important because this means that we can take advantage of statistical techniques that assume a normal distribution, like confidence intervals.
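The claim above can be checked with a short simulation. This is a minimal Python sketch. The population is an exponential distribution, a strongly right-skewed distribution chosen only for illustration. The sample size of 50 and the 10,000 repetitions are arbitrary choices.

If the central limit theorem holds, the sample means should:
- center on the population mean;
- have a spread close to σ/√n;
- fall within 1.96 of those standard deviations of the mean about 95% of the time.

```python
import random
import statistics

# Assumption: an exponential population (rate = 1) is used purely for
# illustration; its mean and standard deviation are both 1.
random.seed(42)

POP_MEAN = 1.0        # mean of Exponential(rate=1)
POP_SD = 1.0          # standard deviation of Exponential(rate=1)
SAMPLE_SIZE = 50      # n, the size of each sample
NUM_SAMPLES = 10_000  # how many times we repeat the sampling

def sample_mean(n: int) -> float:
    """Draw n values from the skewed population and return their mean."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The CLT predicts the sample means center on the population mean,
# with a spread (standard error) of sigma / sqrt(n).
se = POP_SD / SAMPLE_SIZE ** 0.5
print(f"mean of sample means:    {statistics.fmean(sample_means):.3f}")
print(f"sd of sample means:      {statistics.stdev(sample_means):.3f}")
print(f"predicted sigma/sqrt(n): {se:.3f}")

# Rough normality check: about 95% of sample means should fall
# within 1.96 standard errors of the population mean.
share = sum(abs(m - POP_MEAN) <= 1.96 * se for m in sample_means) / NUM_SAMPLES
print(f"share within 1.96 SE:    {share:.3f}")
```

Even though individual values from this population are heavily skewed, the histogram of `sample_means` comes out close to a bell curve.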
So What are Confidence Intervals?

A confidence interval is simply a range of values that, at a stated confidence level, is likely to contain the population parameter of interest (for example, the population mean).
For example, using the image above, we can say that the sample mean is 100, OR we can say that we are 95% confident that the population mean lies between 90 and 110.
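To make "95% confident" concrete, here is a minimal simulation sketch in Python. It works as follows:
- It assumes a hypothetical normal population with a known mean (MU = 100) and standard deviation (SIGMA = 20).
- It draws many random samples of size 40.
- From each sample, it builds the interval x̅ ± 1.96 × σ/√n (the formula developed in the Confidence Intervals For Means section).
- It counts how often that interval contains the true mean.

The population, sample size, and seed are illustrative choices, not real data.

```python
import random
import statistics

# Hypothetical setup for illustration: a normal population whose mean
# and standard deviation are known, so each interval can be checked.
random.seed(7)
MU, SIGMA = 100.0, 20.0
N = 40            # sample size
TRIALS = 5_000    # number of repeated samples
Z_95 = 1.96       # z-score for 95% confidence

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    margin_of_error = Z_95 * SIGMA / N ** 0.5   # sigma treated as known here
    if x_bar - margin_of_error <= MU <= x_bar + margin_of_error:
        hits += 1

coverage = hits / TRIALS
print(f"share of intervals containing mu: {coverage:.3f}")  # roughly 0.95
```

The confidence level describes the procedure: roughly 95% of intervals built this way capture μ, even though any single interval either does or does not contain it.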
A point estimate, on the other hand, is a sample statistic that provides a single-value estimate of a population parameter (e.g., the sample mean as an estimate of the population mean).
The margin of error is the distance between the point estimate and the end of a confidence interval.
As mentioned earlier, the benefit of using a confidence interval rather than a point estimate (e.g., the mean) is that it provides us with more information.
Generally speaking, for a given sample, the wider the confidence interval is, the more confident you can be that the true parameter is inside it. In other words, asking for a higher confidence level produces a wider interval.
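Here is a small Python sketch that makes the point estimate, margin of error, and width trade-off concrete. The point estimate of 100 and standard error of 5 are hypothetical numbers chosen for illustration. The z-scores are the commonly tabulated two-sided values, rounded to three decimals. The function name `confidence_interval` is also hypothetical.

```python
# Hypothetical numbers for illustration only: a point estimate of 100 and a
# standard error of 5 (these do not come from real data).
point_estimate = 100.0
standard_error = 5.0

# Commonly tabulated two-sided z-scores, rounded to three decimals.
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def confidence_interval(confidence: float) -> tuple[float, float]:
    """Return (lower, upper) = point estimate -/+ margin of error."""
    margin_of_error = Z_SCORES[confidence] * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

for level in sorted(Z_SCORES):
    lower, upper = confidence_interval(level)
    print(f"{level:.0%}: ({lower:.2f}, {upper:.2f}), width {upper - lower:.2f}")
```

With these numbers, the 95% interval is (90.20, 109.80), close to the 90 to 110 range in the example. The 99% interval is wider and the 90% interval narrower, using the same data.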
Confidence Intervals For Means
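SINCE we know, thanks to the central limit theorem, that the sampling distribution of the sample mean is approximately normal, we know the following:
- the mean of the sampling distribution is equal to the population mean, μ
- the sample mean (our point estimate) is denoted x̅
- the standard deviation of the sampling distribution (the standard error) is equal to σ/√n, where σ is the population standard deviation and n is the sample size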

And so, if we wanted to find a 95% confidence interval, the empirical rule tells us that our margin of error would be two times the standard deviation of the sampling distribution. This is because approximately 95% of the values in a normal distribution lie within 2 standard deviations of the mean:
x̅ ± 2 × σ/√n
We can then take a step back and generalize this as an equation, x̅ ± z × σ/√n, where the z-score is determined by how "confident" we want our confidence interval to be. For 95% confidence, z is about 1.96, which the empirical rule rounds to 2.
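Where do these z-scores come from? This minimal sketch uses `statistics.NormalDist` from Python's standard library (available since Python 3.8). It first checks the empirical rule, then computes the two-sided z-score for a few confidence levels. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Empirical rule check: share of a normal distribution within 2 SDs.
within_two_sd = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"P(-2 < Z < 2) = {within_two_sd:.4f}")

def z_critical(confidence: float) -> float:
    """Two-sided z-score that leaves (1 - confidence) / 2 in each tail."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")
```

The share within 2 standard deviations is about 95.45%, slightly more than 95%. That is why the exact 95% z-score (about 1.96) is a little smaller than 2.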
Now you’re probably wondering, "how are we supposed to know σ prior to all of this?" and that’s a good question. Generally, if you don’t know μ, you’re probably not going to know σ either. That being said, when the sample is large enough, the sample standard deviation (s) is a close estimate of σ, so you can use it as a substitute.
And so, the equation will look like this:
x̅ ± z × s/√n
Since we’re relying on the central limit theorem for this to hold, a couple of conditions also need to be met. First, the sample must be random, and second, the sample size (n) should be greater than or equal to 30 (a common rule of thumb).
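Putting the pieces together, here is a minimal Python sketch of the large-sample interval x̅ ± z × s/√n. A few things to note:
- The function name `mean_confidence_interval` is hypothetical.
- Raising an error when n < 30 is a design choice based on the rule of thumb above. For smaller samples, a t-based interval is the usual alternative, which this sketch does not implement.
- The usage data is simulated, not real.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample interval for a population mean: x_bar -/+ z * s / sqrt(n).

    Uses the sample standard deviation s in place of the unknown sigma.
    Assumes `data` is a random sample.
    """
    n = len(data)
    if n < 30:
        # Rule-of-thumb check from the text; smaller samples need another method.
        raise ValueError(f"need n >= 30 for this approximation, got n = {n}")
    x_bar = statistics.fmean(data)   # point estimate of mu
    s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * s / math.sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error

# Usage with simulated data (hypothetical, not from the article).
random.seed(0)
sample = [random.gauss(50, 10) for _ in range(40)]
print(mean_confidence_interval(sample))
```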
Confidence Intervals for Proportions
Now what if, instead, you wanted to estimate the population proportion, that is, the ratio of the number of people/things in a population with a certain characteristic to the total size of the population?
In this case, the equation changes a little bit:
- Instead of estimating the mean (μ), you’re now estimating the population proportion (p)
- The sample proportion is denoted as p̂
- The standard deviation of the sampling distribution is now sqrt [ p̂(1 − p̂) / n ]
This results in the following equation:
p̂ ± z × sqrt [ p̂(1 − p̂) / n ]
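The proportion version translates into code in the same way. This is a minimal Python sketch of the normal-approximation interval above. The function name `proportion_confidence_interval` and the survey counts (120 out of 400) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes: int, n: int,
                                   confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation interval: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    if not 0 <= successes <= n or n == 0:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n                                # sample proportion
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # sd of the sampling distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical survey: 120 of 400 respondents have the characteristic.
lower, upper = proportion_confidence_interval(120, 400)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

Here p̂ = 0.3 and the margin of error is about 0.045, giving an interval of roughly 0.255 to 0.345.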
Thanks for Reading!
If you made it to the end, great! You should now know what a confidence interval is, why it’s useful, and how you can calculate one. Confidence intervals are extremely useful when it comes to making estimates, and I highly encourage you to use confidence intervals in addition to making point estimates.
If you like articles like this, be sure to give me a follow on Medium. And as always, I wish you the best of luck in your endeavors!
Terence Shin