
"Proof by example" and "slothfull induction" can "prove" to be very dangerous.
Hasty generalization has plagued the sciences for centuries. When the stakes are high, a rational justification becomes a crucial requirement to minimize the potential loss a fallacy could cause, and these sloppy shortcuts show up very frequently in the field of sampling.
"A minimum of 30 observations is sufficient to conduct significant statistics."
This is open to many interpretations, the most fallible of which is that a sample size of 30 is enough to trust your confidence interval. Sampling is a crucial aspect of experimental analysis, and it is fair to say that inference about the entire population rests heavily on the probed sample set, especially when the population parameters are unknown. In this light, the choice of sampling scheme and sample size should be strongly corroborated. There is no rule of thumb for this, and there probably never will be, considering the chaotic nature of our universe. There is, however, a certain art, an element of thinking, to such questions of sampling. For example, if the population is non-seasonal and clustered, then systematic sampling combined with cluster or stratified sampling may prove helpful. Beyond the scheme, the size of the sample is another crucial question for experimentalists and statisticians: experiments are often expensive, and one needs to settle on the optimum number of observations for reasonably significant statistical analysis. Coming to our popular belief about the number 30, it is important first to understand why it is 30; then we can appreciate that it is not a rule of thumb and can produce fallacious conclusions.
P.S. There are many levels at which this could be discussed, and I will try to take a deep dive that might also help with other concepts of statistics. So bear with me, it’s gonna be fruitful in the end.
Background
The story follows from the most celebrated result in probability theory, the Central Limit Theorem (CLT), which says that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the distribution of the original population from which the samples are drawn. At first sight this might look like a haunting definition to non-statisticians, but it is the sovereign of probability theory, and eventually one realizes it has the power to give a general shape to chaos. So what exactly is it trying to convey?
In layman’s terms: take a population D with an unknown distribution, and build another population D′ by recording the mean of every possible sample of some fixed size drawn from D. This population of means D′ then follows an approximately normal distribution.
Mathematically, if X₁, X₂, X₃, … are random samples, each of size n, drawn from a population with mean μ and standard deviation σ, then the transformation Z = (X̄ − μ)/(σ/√n) approaches a Normal distribution with mean 0 and variance 1 as n increases. Notice that the quantity σ/√n is the standard deviation of the distribution of X̄, often called the standard error.
# R code to simulate the Central Limit Theorem
Universe <- sample(seq(1, 100, length.out = 1000),
                   size = 10^6, replace = TRUE)  # let's create a chaotic toy universe

par(bg = "black")
plot(Universe[1:10000],
     xlab = "Observation", ylab = "Observed Value",
     cex = 0.05, cex.main = 2, cex.lab = 1.5,
     xaxt = "n", yaxt = "n", pch = 8,
     col = rainbow(10), col.lab = "white",
     main = "Chaotic universe", col.main = "white")  # do you wanna see?

Fig 1 is our toy universe. From it we will now record the mean of 30 randomly sampled observations (with replacement), repeated 10,000 times, which gives us another population of 10,000 means. The CLT then states that this population of means behaves approximately normally.
draw <- rep(NA, 10000)  # 10,000 samples of some size (30 in our example) drawn from the universe
for (i in seq_along(draw)) {
  # Each draw value is the mean of 30 randomly sampled
  # observations from our Toy Universe
  draw[i] <- mean(sample(Universe,
                         size = 30,      # sample size 30
                         replace = TRUE))
}
Z <- (draw - mean(Universe)) / sqrt(var(Universe) / 30)  # scaling the distribution of means

Fig 2A is the scatter plot of the means of the samples drawn from the universe. It can also be read as the top view of a normal curve with its base on the Y-axis, the blue line indicating its mean. The red curve in Fig 2B is the density of the normalized sample means (the Z transformation). Notice that it behaves like the standard normal curve (white), but not perfectly: the mean of the distribution of means still deflects noticeably from 0. This deflection can be reduced by employing more and larger samples. Don’t trust me? Fig 3 is the same density, but this time with samples of size 200. See how confident it looks.
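For completeness, here is a minimal sketch of how a Fig 2B-style plot could be reproduced from the objects defined above. It assumes the Universe and Z vectors from the previous chunks; the styling of the original figures may well differ.
# Sketch of a Fig 2B-style plot: density of the scaled sample means (red)
# against the standard normal curve (white). Assumes Z from the code above.
par(bg = "black")
plot(density(Z), col = "red", lwd = 2, axes = FALSE,
     xlab = "Z", ylab = "Density", col.lab = "white",
     main = "Scaled sample means vs standard normal (n = 30)",
     col.main = "white")
curve(dnorm(x), from = -4, to = 4, add = TRUE, col = "white", lwd = 2)
Re-running the sampling loop with size = 200 (and 200 in the standard-error term) gives the Fig 3 version.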

More importantly, notice that in Fig 2B the tails of the density curve are very narrow relative to the standard normal distribution. A minor skew in the tails can cause a large change in the confidence interval and, consequently, in the testing results, so you want to be extra cautious about putting your faith in your samples of size 30.
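A quick way to see the practical consequence, again assuming the simulated Z vector from above, is to check how often the scaled means actually fall outside the nominal 95% band of ±1.96:
# Empirical tail coverage of the simulated scaled means.
# Under a perfect standard normal this should be close to 0.05.
mean(abs(Z) > 1.96)
quantile(Z, probs = c(0.025, 0.975))  # compare with -1.96 and 1.96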
So, why 30, you may ask?
I hope the above section was useful. Following it, you must agree that the example is rather unrealistic: in real life, when inferring from sample means, there is no certain knowledge of the population’s standard deviation σ, even if the population mean μ is known from past experience, a previous census, et cetera. Although the CLT assures normal behavior of the distribution of means, one still requires σ in order to test hypotheses using the Z score. To overcome this difficulty, William Sealy Gosset (publishing as "Student"), with later contributions from R. A. Fisher, came up with a new distribution, the t-distribution, for inference on the sample mean.
f(t) = Γ((ν + 1)/2) / (√(νπ) · Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2)
where ν is the degrees of freedom, n − 1. Delving deeper into this distribution would require a whole other article, but in gist, the t-distribution is a continuous distribution governed by its degrees of freedom. Similar to the Z standardization, if we draw an independent random sample of size n from a normally distributed population with mean μ and standard deviation σ, then the transformation t = (X̄ − μ)/(S/√n) follows a t-distribution with n − 1 degrees of freedom. Notice that instead of σ, we plug in S, the sample standard deviation. As the sample size increases, the t-distribution gets closer and closer to the normal distribution.
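As a rough illustration (not part of the original analysis), the t transformation for a single sample of size 30 from the toy universe could look like this, treating mean(Universe) as the known μ and using the sample standard deviation S in place of σ:
# Sketch of the t statistic for one sample of size 30 from the toy universe.
x <- sample(Universe, size = 30, replace = TRUE)
t_stat <- (mean(x) - mean(Universe)) / (sd(x) / sqrt(30))  # S replaces sigma
t_stat
# R's built-in one-sample test computes the same statistic:
t.test(x, mu = mean(Universe))$statistic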

But there is no free lunch: the t-distribution only mimics the normal distribution, and it has greater variability in the tails than a standard normal variable.
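One way to see those heavier tails, as a small illustrative check rather than anything from the original figures, is to compare tail probabilities directly:
# Probability of falling more than 2.5 standard deviations out in the tails:
2 * pt(-2.5, df = 10)   # roughly 0.03 for a t with 10 degrees of freedom
2 * pnorm(-2.5)         # roughly 0.01 for the standard normal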

For instance, at a 0.05 level of significance (two-tailed), the critical value of a standard normal distribution is 1.96. This is not true for a t-distribution with 30 degrees of freedom: its critical value is about 2.04 (and about 1.70 against 1.645 for a one-tailed test), which results in a wider confidence interval, even though the animation above shows the Z and t distributions resonating largely at a sample size of 30.
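These critical values are easy to verify in R (a small check, not part of the original analysis):
# Two-tailed critical values at the 0.05 level of significance:
qnorm(0.975)         # 1.96 for the standard normal
qt(0.975, df = 30)   # about 2.04 for a t with 30 degrees of freedom
qt(0.95, df = 30)    # about 1.70 one-tailed, against qnorm(0.95) = 1.645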
Theoretically, to bring the t critical value down to 1.96 at a 0.05 level of significance, one needs an infinitely large sample size, i.e. infinite degrees of freedom. In essence, more data is always useful: the larger the sample, the more precise the approximation. Considering the constraints, it is always advisable to incorporate as much data as one can to reduce the chances of a Type I error, which is rejecting the null hypothesis when in reality it is true or, more loosely, labeling a sample as an outlier when it really belongs inside the true confidence interval.
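That convergence is slow, as a quick illustrative check suggests:
# The t critical value approaches 1.96 only as the degrees of freedom grow:
qt(0.975, df = c(30, 100, 1000, 10^6))
# approximately 2.042, 1.984, 1.962, 1.960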
Some recommended readings to get started with statistics:
On the Student’s t-distribution and the t-statistic
Montgomery, Douglas C.; Runger, George C. (2014). Applied Statistics and Probability for Engineers (6th ed.). Wiley. p. 241. ISBN 9781118539712.
An Introduction to the t Distribution (Includes some mathematical details)
Insights in Hypothesis Testing and Making Decisions in Biomedical Research