A Standard Error

Ludi Rehak
Towards Data Science
3 min read · Jun 25, 2017

I’m currently reading a best-selling book on probability as applied to everyday life, The Drunkard’s Walk: How Randomness Rules Our Lives, by Leonard Mlodinow. It eloquently traces the history of measuring uncertainty, computing the probabilities of random events along the way. A choice quote:

The outline of our lives, like the candle’s flame, is continuously coaxed in new directions by a variety of random events that, along with our responses to them, determine our fate.

The calculations arrive at insights that surprise, and they illuminate just how off-base human intuition about randomness can be. For example, if you play the California state lottery, you are about as likely to die en route to your local lottery ticket vendor as you are to win. Not as enticing a game to play when you consider that outcome.

In this passage on the mathematician Bernoulli and the distribution that bears his name, I found an error:

Suppose 60 percent of the voters in Basel support the mayor. How many people must you poll for the chances to be 99.9 percent that you will find the mayor’s support to be between 58 percent and 62 percent — that is, for the result to be accurate within plus or minus 2 percent? (Assume, in order to be consistent with Bernoulli, that the people polled are chosen at random, but with replacement. In other words, it is possible that you poll a person more than once.) The answer is 25,550, which in Bernoulli’s time was roughly the entire population of Basel. That this number was impractical wasn’t lost on Bernoulli.

By my calculation, you’d only need to poll 6,446 people, not 25,550. I used the CDF of the binomial to find the smallest n that gives at least 0.999 probability that the estimate is between .58 and .62. Here’s the one-liner in R with p=0.6 and n=6446:

> p <- 0.6; n <- 6446
> pbinom(ceiling(n*.62),n,p)-pbinom(floor(n*.58),n,p)
[1] 0.9990067
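The search for the smallest n can be sketched as a scan over candidate sample sizes, reusing the same expression as the one-liner above. This is a rough sketch rather than the exact procedure I ran: the helper name `coverage` and the search range of 1 to 10,000 are illustrative assumptions.

```r
# Probability that the poll estimate lands in [lo, hi], computed with the
# same binomial-CDF expression as the one-liner above.
coverage <- function(n, p = 0.6, lo = 0.58, hi = 0.62) {
  pbinom(ceiling(n * hi), n, p) - pbinom(floor(n * lo), n, p)
}

# Candidate sample sizes to scan (the upper bound is an arbitrary assumption).
candidates <- 1:10000

# pbinom is vectorized, so this evaluates every candidate in one call.
probs <- coverage(candidates)

# First candidate whose coverage reaches the 99.9% target.
smallest_n <- candidates[which(probs >= 0.999)[1]]
smallest_n
```

Because the binomial is discrete, the coverage does not rise smoothly with n, so the scan looks for the first n that reaches the target instead of assuming the probability increases monotonically.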

A more direct way to solve for n is to use the formula for the standard error of a binomial proportion. The formula relies on the normal approximation to the binomial, under which the 1 - 𝛼 interval around p is

p ± z_{𝛼/2} \sqrt{p(1 - p)/n}

We want to find n such that this interval is equal to [0.58, 0.62], that is, such that the half-width z_{𝛼/2} \sqrt{p(1 - p)/n} equals 0.02.

z_{𝛼/2} is the 1 - 𝛼/2 quantile of a standard normal distribution, where 𝛼 is the error rate, 1 - 0.999 = 0.001 in this case. Solving for n gives n = z_{𝛼/2}^2 p(1 - p)/0.02^2 ≈ 6496.5, close to the value calculated earlier.
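For anyone who wants to reproduce that number, here is a minimal sketch of the normal-approximation calculation in R. It sets the half-width z_{𝛼/2} * sqrt(p(1 - p)/n) equal to 0.02 and solves for n; the variable names are my own choices for illustration.

```r
p      <- 0.6          # assumed true support for the mayor
margin <- 0.02         # desired half-width of the interval
alpha  <- 1 - 0.999    # error rate

# 1 - alpha/2 quantile of the standard normal distribution
z <- qnorm(1 - alpha / 2)

# Solve z * sqrt(p * (1 - p) / n) = margin for n
n_normal <- z^2 * p * (1 - p) / margin^2
n_normal
```

Rounded to one decimal place this gives the 6496.5 quoted above. It differs slightly from 6,446 because the normal curve is only an approximation to the discrete binomial.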

If the author’s 25,550 people are polled, then the 99.9% interval is p±0.01, not ±0.02. In other words, 99.9% of polls of 25,550 people would show the estimate to be between 59% and 61%, rather than the stated 58% and 62%.
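The ±0.01 figure can be checked by running the same normal-approximation formula in the other direction, plugging in the book's sample size and computing the half-width. Again a small sketch, with variable names chosen for illustration:

```r
p      <- 0.6
n_book <- 25550               # sample size stated in the book
z      <- qnorm(1 - 0.001 / 2)  # 99.9% two-sided critical value

# Half-width of the 99.9% interval at the book's sample size
half_width <- z * sqrt(p * (1 - p) / n_book)
half_width
```

The result is very close to 0.01, about half the ±0.02 the passage asks for.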

A minor error in a delightful book.
