Limit Theorems and the Central Limit Theorem
Learning objectives
- State the Central Limit Theorem precisely, including the standardisation
- Distinguish the CLT from the Law of Large Numbers
- Apply the CLT to compute approximate probabilities for sample means
- Build normal-approximation confidence intervals
- Recognise when the CLT is unsafe (heavy tails, dependence, small samples)
The Central Limit Theorem is arguably the most consequential single result in probability and statistics. It explains why a bell curve appears so widely: in heights, in measurement error, in the diffusion of pollutants, in the long-run behaviour of nearly every estimator we use. It also tells us the precise rate, $1/\sqrt{n}$, at which sample averages concentrate around their true mean, providing the inferential machinery (confidence intervals, hypothesis tests, p-values) that underpins much of empirical science. If you understand the CLT, you understand why "n large" is the magic word.
The Law of Large Numbers (first)
Before the CLT, the Law of Large Numbers says: if $X_1, X_2, \dots$ are i.i.d. with finite mean $\mu$, then the sample average $\bar{X}_n = (X_1 + \dots + X_n)/n \to \mu$ as $n \to \infty$. This is a FIRST-order statement: the sample average converges to the true mean.
But it leaves the next question wide open: HOW FAST does $\bar{X}_n$ approach $\mu$? And what does the fluctuation around $\mu$ look like at finite $n$? Those are the questions the CLT answers.
Statement of the CLT
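Theorem (Lindeberg-Lévy CLT). Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$ and FINITE variance $\sigma^2 > 0$. Write $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ for the sample mean and define the standardised sample mean
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}.$$
Then $Z_n$ converges in distribution to a standard normal: $Z_n \xrightarrow{d} N(0, 1)$. Equivalently, for any real $z$,
$$P(Z_n \le z) \to \Phi(z) \quad \text{as } n \to \infty,$$
where $\Phi$ is the standard-normal CDF.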
Reading this in plain language: for large $n$, the sample mean $\bar{X}_n$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$. The remarkable feature is that this is true REGARDLESS of the distribution of the $X_i$: uniform, exponential, Bernoulli, Poisson, any finite-variance distribution. The bell curve emerges from the sum, not from the summands.
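To see the theorem at work, here is a minimal simulation sketch. It assumes Exponential(1) summands (a choice made for this illustration only, picked because the distribution is strongly skewed), standardises the sample mean exactly as in the theorem, and compares the empirical CDF of the standardised mean with the standard-normal CDF at a few points. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def standardised_mean(n, rng):
    """Draw n Exponential(1) values and return (mean - mu) / (sigma / sqrt(n))."""
    mu, sigma = 1.0, 1.0  # Exponential(rate=1) has mean 1 and standard deviation 1
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

def simulate_z(n, reps=5_000, seed=0):
    """Simulate `reps` independent copies of the standardised sample mean."""
    rng = random.Random(seed)
    return [standardised_mean(n, rng) for _ in range(reps)]

def ecdf(values, z):
    """Empirical CDF: fraction of simulated values that are <= z."""
    return sum(v <= z for v in values) / len(values)

if __name__ == "__main__":
    phi = NormalDist().cdf
    points = (-1.0, 0.0, 1.0)
    for n in (1, 5, 30, 200):
        zs = simulate_z(n)
        row = "  ".join(f"F({z:+.0f})={ecdf(zs, z):.3f}" for z in points)
        print(f"n={n:>3}  {row}")
    print("limit  " + "  ".join(f"F({z:+.0f})={phi(z):.3f}" for z in points))
```

At n = 1 the empirical CDF reflects the skewed exponential shape; as n grows the rows should drift towards the limiting row, up to Monte Carlo noise.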
A worked example: 100 dice
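Roll 100 fair dice and average the faces. The mean of one die is $\mu = 3.5$; the variance is $\sigma^2 = 35/12 \approx 2.92$, so $\sigma \approx 1.71$. By the CLT:
$$\bar{X}_{100} \approx N\!\left(3.5,\ \frac{35/12}{100}\right),$$
so the standard error of the sample mean is $\sigma/\sqrt{100} \approx 0.171$. The probability that the sample mean exceeds 3.7 is approximately
$$P(\bar{X}_{100} > 3.7) \approx 1 - \Phi\!\left(\frac{3.7 - 3.5}{0.171}\right) = 1 - \Phi(1.17) \approx 0.12.$$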
The original dice distribution is uniform-discrete on $\{1, 2, \dots, 6\}$, nothing bell-curve-like about it. But the AVERAGE of 100 of them is, to a good approximation, a normal random variable.
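The dice example is small enough to check directly. The sketch below computes the CLT approximation to the probability that the average of 100 dice exceeds 3.7 and, for comparison, the exact probability obtained by convolving the die distribution with itself 100 times. The helper name `dice_sum_pmf` is hypothetical; "mean > 3.7" is tested as "total > 370" to stay in integer arithmetic.

```python
import math
from statistics import NormalDist

def dice_sum_pmf(n_dice, faces=6):
    """Exact pmf of the total of n_dice fair dice, as {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, faces + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / faces
        pmf = nxt
    return pmf

n = 100
mu = 3.5                      # mean of one fair die
sigma = math.sqrt(35 / 12)    # standard deviation of one fair die
se = sigma / math.sqrt(n)     # standard error of the sample mean

# CLT approximation: treat the sample mean as N(mu, se^2).
clt_tail = 1 - NormalDist(mu, se).cdf(3.7)

# Exact answer: sample mean > 3.7 is the same event as total > 370.
exact_tail = sum(p for total, p in dice_sum_pmf(n).items() if total > 370)

print(f"standard error      = {se:.4f}")
print(f"CLT approximation   = {clt_tail:.4f}")
print(f"exact P(mean > 3.7) = {exact_tail:.4f}")
```

The two tail probabilities should agree closely but not perfectly: the total of the dice lives on the integers, so a small discreteness gap remains at n = 100.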
Plot the standard-normal density and notice the iconic bell shape with mass concentrated within $\pm 3$. Memorise the 68-95-99.7 rule: probabilities of about 0.68, 0.95 and 0.997 lie within 1, 2 and 3 standard deviations of the mean.
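The 68-95-99.7 rule can be recovered from the standard-normal CDF in a few lines. This is a small check using Python's standard library; `mass_within` is a name chosen for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def mass_within(k):
    """P(-k <= Z <= k) for a standard normal Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {mass_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```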
Confidence intervals
The CLT gives the standard normal-approximation confidence interval for an unknown mean $\mu$:
$$\bar{X}_n \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$
For 95% confidence, $z_{0.025} \approx 1.96$. The half-width $z_{\alpha/2}\,\sigma/\sqrt{n}$ is the margin of error; it shrinks like $1/\sqrt{n}$, which is the source of the slogan "four times the data buys you half the error."
When $\sigma$ is unknown (always, in practice), it is replaced by the sample standard deviation $s$, and for small $n$ the normal quantile is replaced by a Student-$t$ quantile with $n - 1$ degrees of freedom to account for the extra uncertainty in estimating $\sigma$.
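The interval above translates into a short function. This is a sketch under stated assumptions: it plugs the sample standard deviation in for the unknown population value and always uses the normal quantile, so it is appropriate only when the sample is reasonably large; for small samples a Student-t quantile would replace `z` (the standard library does not provide one). The function name and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def normal_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the spread")
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * s / math.sqrt(n)  # the margin of error
    return xbar - half_width, xbar + half_width

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 50 measurements with true mean 10 and sd 2.
    data = [rng.gauss(10.0, 2.0) for _ in range(50)]
    lo, hi = normal_ci(data)
    print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```

Because the half-width carries a factor of one over the square root of the sample size, calling the function on four times as much data from the same source should give an interval roughly half as wide.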
- Statistical inference everywhere: Most t-tests, confidence intervals, p-values, and standard errors reported in empirical science rely on the CLT, either directly or as the large-sample justification. The Normal distribution's near-monopoly on inference is its consequence.
- Polling and survey research: A poll of roughly 1,000 voters reports "margin of error 3 percentage points"; that is $1.96\sqrt{0.25/n} \approx 0.031$ at $n = 1000$, the worst-case ($p = 1/2$) CLT bound for a Bernoulli sample mean.
- Quality control, control charts: Manufacturing processes plot the subgroup mean $\bar{X}$ over time and trigger alarms when it crosses control limits (typically $\mu \pm 3\sigma/\sqrt{n}$). Pure CLT.
- Insurance and risk: The capital requirement of an insurer holding $n$ independent policies scales as $\sqrt{n}$ (one CLT-standard-deviation of loss), not as $n$, the way the total expected loss does. This is why pooling makes insurance feasible.
- Brownian motion and diffusion: Take the CLT limit of a random walk over fine time-steps and you get continuous-time Brownian motion, the foundation of the Black-Scholes model and most stochastic differential equations in physics.
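The polling figure in the list above is easy to reproduce. The sketch below computes the 95% margin of error for a Bernoulli sample mean, defaulting to the worst case p = 1/2; the function name and the sample sizes in the loop are illustrative choices, not values taken from any particular poll.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96

def poll_margin(n, p=0.5, z=Z95):
    """Normal-approximation margin of error for a proportion from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (250, 1000, 4000):
    print(f"n={n:>4}: margin = {100 * poll_margin(n):.1f} percentage points")
```

Each fourfold increase in the number of respondents halves the margin, which is the square-root scaling that also drives the insurance-pooling argument.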
Pause and think: The CLT requires FINITE variance. Cauchy random variables have undefined mean and variance, and their sample mean does NOT converge to a constant; in fact, $\bar{X}_n$ has the same Cauchy distribution as a single observation. Why does the CLT machinery fail in this case? (Hint: where does $\sigma/\sqrt{n}$ even live when $\sigma^2 = \infty$?)
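A quick experiment makes the Cauchy failure tangible. The sketch below measures the spread of the sample mean by its interquartile range (the variance is useless here) for standard Cauchy and standard normal draws at several sample sizes. The helper names are hypothetical, and the Cauchy draws use the inverse-CDF formula tan(pi(U - 1/2)).

```python
import math
import random
from statistics import quantiles

def draw_cauchy(rng):
    """One standard Cauchy draw via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_normal(rng):
    """One standard normal draw."""
    return rng.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, reps=2_000, seed=0):
    """Interquartile range of `reps` simulated sample means of size n."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = quantiles(means, n=4)
    return q3 - q1

if __name__ == "__main__":
    for n in (1, 10, 100):
        c = iqr_of_sample_means(draw_cauchy, n)
        g = iqr_of_sample_means(draw_normal, n)
        print(f"n={n:>3}  Cauchy IQR={c:.2f}  normal IQR={g:.2f}")
```

The normal column should shrink by roughly a factor of three for each tenfold increase in n, while the Cauchy column should hover near 2 (the interquartile range of a single standard Cauchy draw) no matter how much data is averaged.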
Try it
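- A factory produces resistors with mean resistance $\mu$ and standard deviation $\sigma$. You sample 25 resistors. What is the approximate probability that the sample mean exceeds a threshold $c$? (Answer: $1 - \Phi\big((c - \mu)/(\sigma/5)\big)$.)
- Compute the 95% confidence interval for the mean voter preference in a Bernoulli poll of $n$ voters where the observed proportion is $\hat{p}$. (Hint: $\mathrm{SE} = \sqrt{\hat{p}(1 - \hat{p})/n}$; interval $\hat{p} \pm 1.96\,\mathrm{SE}$.)
- You need a margin of error of at most $E$ at 95% confidence for a Bernoulli poll. What is the smallest $n$ guaranteed to suffice, regardless of $p$? (Use worst-case $p = 1/2$.)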
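- Roll a fair die 1000 times. Approximate the probability that the total $S$ exceeds 3600. (Compute $E[S] = 3500$, $\mathrm{sd}(S) = \sqrt{1000 \cdot 35/12} \approx 54.0$; $P(S > 3600) \approx 1 - \Phi(1.85) \approx 0.032$.)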
- Distinguish the LLN (the sample mean converges to $\mu$) from the CLT (the standardised sample mean converges in distribution to $N(0, 1)$). Which one would you cite when justifying that simulation estimates eventually become exact, vs. when computing a confidence interval?
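Two of the exercises above reduce to one-line formulas, sketched here so you can check your own arithmetic. The function names are hypothetical, and the margins passed in the demo (0.03 and 0.05) are example values chosen for illustration, not values prescribed by the exercises.

```python
import math
from statistics import NormalDist

def worst_case_poll_size(margin, confidence=0.95):
    """Smallest n with z * sqrt(0.25 / n) <= margin, i.e. the p = 1/2 worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / (2 * margin)) ** 2)

def dice_total_tail(n_rolls, threshold):
    """CLT approximation to P(total of n_rolls fair dice > threshold)."""
    mean_total = 3.5 * n_rolls
    sd_total = math.sqrt(n_rolls * 35 / 12)
    return 1 - NormalDist(mean_total, sd_total).cdf(threshold)

if __name__ == "__main__":
    print(worst_case_poll_size(0.03))                # 1068
    print(worst_case_poll_size(0.05))                # 385
    print(f"{dice_total_tail(1000, 3600):.3f}")      # 0.032
```

Note that the required sample size grows like one over the margin squared: tightening the margin from 5 to 3 points nearly triples the number of respondents.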
A trap to watch for
The CLT is a LIMIT theorem: it says nothing about small $n$, and even at moderate $n$ it can be a poor approximation for HEAVY-TAILED distributions (large variance contribution from rare events) or HIGHLY SKEWED distributions. A common heuristic is "n at least 30", but this is folklore, not a theorem; for Bernoulli with small $p$, you need both $np$ AND $n(1 - p)$ to be comfortably large (rules of thumb commonly ask for at least 5 or 10 each) for the normal approximation to be safe. Also: the CLT assumes INDEPENDENCE. For positively correlated data (time series, network data, repeated measurements on the same subject), the effective sample size is smaller than $n$ and the naive CLT interval is too narrow, understating the uncertainty. Time-series statisticians spend their careers correcting for this.
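The dependence trap can be demonstrated with a small simulation. The sketch below assumes an AR(1) process (each value equals a fraction rho of the previous value plus fresh standard-normal noise), which is one convenient model of positively correlated data and is an assumption of this illustration only. It compares the naive standard error, computed as if the observations were independent, with the actual standard deviation of the sample mean across many simulated series.

```python
import math
import random
from statistics import mean, stdev

def ar1_series(n, rho, rng):
    """AR(1) series x_t = rho * x_{t-1} + noise, started from its stationary law."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return out

def naive_vs_actual(n=200, rho=0.7, reps=1_000, seed=0):
    """Return (average naive SE, actual sd of the sample mean) over `reps` series."""
    rng = random.Random(seed)
    sample_means, naive_ses = [], []
    for _ in range(reps):
        xs = ar1_series(n, rho, rng)
        sample_means.append(mean(xs))
        naive_ses.append(stdev(xs) / math.sqrt(n))  # pretends the data are i.i.d.
    return mean(naive_ses), stdev(sample_means)

if __name__ == "__main__":
    for rho in (0.0, 0.7):
        naive, actual = naive_vs_actual(rho=rho)
        print(f"rho={rho}: naive SE={naive:.3f}  actual sd of mean={actual:.3f}")
```

With rho = 0 the two numbers should roughly agree; with rho = 0.7 the actual spread should be more than twice the naive figure. For an AR(1) process the large-sample effective sample size is approximately n(1 - rho)/(1 + rho), which is why the naive interval is too narrow.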
What you now know
You can apply the CLT to compute approximate probabilities for sample means, build normal-approximation confidence intervals, recognise that the convergence rate is $1/\sqrt{n}$, and spot the failure modes (heavy tails, dependence, small $n$). You now have the toolkit to read and write empirical claims with quantitative uncertainty, the working language of every scientific discipline downstream. The next chapter pivots from probability to algorithms, where the rules of the game change from "what happens by chance" to "how fast can a procedure run."