The law of large numbers

Probability from zero

Learning objectives

  • State the strong and weak LLN
  • Recognise the convergence rate: for finite-variance distributions, deviations from μ shrink like 1/√n
  • Identify when LLN fails: distributions with infinite or undefined mean (certain Pareto, Cauchy)
  • Apply LLN to justify sample means as estimators
  • Distinguish almost-sure (strong) from in-probability (weak) convergence

The Law of Large Numbers is statistics' first big idea: the SAMPLE AVERAGE of i.i.d. observations converges to the POPULATION MEAN as the sample size grows. It is what makes statistical inference possible — sample-based estimators tell us something about populations.

Two forms: weak and strong

Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu = E[X_i]$, and write $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ for the sample mean.

  • Weak LLN: $\bar{X}_n \xrightarrow{P} \mu$ (convergence in probability). For every $\varepsilon > 0$, $P(|\bar{X}_n - \mu| > \varepsilon) \to 0$.
  • Strong LLN: $\bar{X}_n \xrightarrow{a.s.} \mu$ (convergence almost surely). With probability 1, the path $\bar{X}_n(\omega) \to \mu$.

SLLN ⇒ WLLN; not vice versa. SLLN gives stronger guarantees: ALMOST EVERY sample path eventually settles near μ. WLLN allows wild paths so long as the probability of being far drops with n.
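The weak-law statement can be checked numerically. The sketch below is only an illustration under assumed settings (Uniform(0, 1) draws, ε = 0.1, 2000 repeated samples, a fixed seed; the function name `exceed_probability` is hypothetical): it estimates $P(|\bar{X}_n - \mu| > \varepsilon)$ by Monte Carlo for a few values of n.

```python
import random

def exceed_probability(n, eps=0.1, trials=2000, seed=0):
    """Monte Carlo estimate of P(|mean of n Uniform(0, 1) draws - 0.5| > eps)."""
    rng = random.Random(seed)
    mu = 0.5  # population mean of Uniform(0, 1)
    far = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            far += 1
    return far / trials

# The estimated probability of being more than eps away should fall as n grows.
for n in (10, 100, 1000):
    print(n, exceed_probability(n))
```

This only illustrates convergence in probability: each n uses fresh samples, so it says nothing about what a single path does over time, which is the strong law's territory.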

Convergence rate: the 1/√n rule

For finite-variance distributions: $\mathrm{SD}(\bar{X}_n) = \sigma / \sqrt{n}$. So the sample mean concentrates at rate 1/√n. To halve the error, you need 4× the data. To shrink by 10×, you need 100× the data.

This is the WHY behind: large studies (clinical trials, surveys) need large n; precision is expensive.
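A quick simulation makes the σ/√n rule concrete. This is a minimal sketch under assumed settings (Normal(0, 1) data, 2000 repeated samples per n, a fixed seed; `sd_of_sample_mean` is a hypothetical helper name): it measures the standard deviation of the sample mean across repeated samples and prints it next to the theoretical value 1/√n.

```python
import math
import random
import statistics

def sd_of_sample_mean(n, trials=2000, seed=1):
    """Empirical SD of the mean of n Normal(0, 1) draws, over repeated samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        means.append(sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n)
    return statistics.stdev(means)

# Each 4x increase in n should roughly halve the SD of the sample mean.
for n in (25, 100, 400):
    print(n, round(sd_of_sample_mean(n), 4), round(1 / math.sqrt(n), 4))
```

The two printed columns should agree to within simulation noise, and quadrupling n should roughly halve both.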

When LLN fails: pathological distributions

The LLN requires finite mean. For distributions with INFINITE OR UNDEFINED mean:

  • Cauchy(0, 1): the PDF $f(x) = 1/(\pi(1 + x^2))$ has heavy tails that make $E[|X|]$ diverge, so the mean is undefined. Sample means do NOT converge: for every n, $\bar{X}_n$ has the SAME distribution as a single sample.
  • Pareto with shape α ≤ 1: the tails are heavy enough that the mean is infinite, so sample means grow without bound instead of settling.

These aren't just curiosities: real heavy-tailed phenomena (financial returns, network packet sizes, city sizes) can have distributions for which the empirical mean is unstable. Robust statistics (§4.5) becomes essential.
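The Cauchy failure, and the contrast with a robust estimator, can be seen in a short simulation. The sketch below rests on assumptions chosen for illustration (standard Cauchy draws generated by the inverse-CDF formula tan(π(U − 1/2)), 1000 repeated samples, a fixed seed, and the interquartile range as the spread measure, since a standard deviation does not exist here; all function names are hypothetical).

```python
import math
import random
import statistics

def cauchy(rng):
    """One standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range, a spread measure that is defined even for Cauchy data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_estimator(estimator, n, trials=1000, seed=2):
    """IQR of an estimator's value across repeated Cauchy samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([cauchy(rng) for _ in range(n)]) for _ in range(trials)]
    return iqr(estimates)

# Columns: n, spread of the sample mean, spread of the sample median.
for n in (10, 100, 1000):
    mean_spread = spread_of_estimator(statistics.fmean, n)
    median_spread = spread_of_estimator(statistics.median, n)
    print(n, round(mean_spread, 2), round(median_spread, 2))
```

The spread of the sample mean should stay near 2, the interquartile range of a single standard Cauchy draw, at every n, while the spread of the sample median shrinks as n grows.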

Figure: LLN demo (interactive).

Try it

  • Default Normal(0, 1). Three independent traces (different seeds) ALL approach the red μ = 0 line as n grows. By n = 1000 the spread across paths is ≈ 0.06, the same order as the standard error σ/√n = 1/√1000 ≈ 0.032 (about two standard errors).
  • Switch to Exponential(1). Mean = 1. The distribution is skewed, but σ = 1, so the standard error σ/√n is the same as for Normal(0, 1); skewness affects the shape of small-n fluctuations, not the 1/√n rate. By n = 1000, paths still bracket μ = 1 within ±0.05.
  • Switch to Cauchy(0, 1). Spread across paths does NOT shrink with n: sometimes the running mean jumps wildly even at n = 5000. This is LLN's failure case. The Cauchy is heavy-tailed enough that a single extreme sample can shift the running mean by a large amount at any n.
  • Switch to Bernoulli(0.3). Discrete, with only two possible values, 0 and 1. Sample mean converges to 0.3 cleanly. At n = 100 paths are typically within ±0.05 of 0.3; at n = 1000, within ±0.015 (about one standard error, √(0.3 × 0.7/n), in each case).
  • Slide n_max from 100 to 5000 with Normal. Verify the convergence visually: at n = 100 paths still drift; at n = 5000 they hug μ tightly. The standard error shrinks by a factor of 1/√(5000/100) = 1/√50 ≈ 0.14, so 50× the data gives only ~7× better precision.
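The figures quoted in these experiments can be checked against the theoretical standard error σ/√n. The sketch below is a small calculator, not the demo's own code; the dictionary `SIGMA` and the helper `standard_error` are names introduced here for illustration, and Cauchy is left out because it has no finite σ.

```python
import math

# Population standard deviations of the demo's finite-variance distributions.
SIGMA = {
    "Normal(0, 1)": 1.0,
    "Exponential(1)": 1.0,
    "Bernoulli(0.3)": math.sqrt(0.3 * 0.7),
}

def standard_error(sigma, n):
    """Standard deviation of the mean of n i.i.d. draws with SD sigma."""
    return sigma / math.sqrt(n)

# One row per distribution: standard error at n = 100, 1000, 5000.
for name, sigma in SIGMA.items():
    row = [round(standard_error(sigma, n), 4) for n in (100, 1000, 5000)]
    print(name, row)

# Improvement factor when n goes from 100 to 5000 (equals 1 / sqrt(50)).
print(round(standard_error(1.0, 5000) / standard_error(1.0, 100), 4))
```

Paths in the demo are random, so individual traces will sit somewhat inside or outside these values; the table only gives the typical scale of the deviation at each n.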

A pollster reports a survey of 1000 voters with margin of error ±3%. A follow-up survey doubles n to 2000. What new margin of error should they report (and why is it > ±1.5%)?

What you now know

Sample means converge to population means whenever the mean is finite, and with finite variance the typical error shrinks like 1/√n. Pathological heavy-tailed distributions (Cauchy) break the law entirely. Robust estimators (median, trimmed means) can recover stable convergence to a centre of the distribution even when the tails are heavy. §0.7 takes the next step: the CLT tells us that the sample mean is approximately bell-shaped around μ, with width σ/√n.

References

  • Wasserman, L. (2004). All of Statistics. Springer. (Chapter 5 — convergence of random variables.)
  • Billingsley, P. (1995). Probability and Measure, 3rd ed. Wiley. (Chapter 6 — strong LLN.)
  • Etemadi, N. (1981). "An elementary proof of the strong law of large numbers." Zeitschrift für Wahrscheinlichkeitstheorie 55, 119-122.
  • Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley. (Chapter 10.)
  • Tukey, J.W. (1960). "A survey of sampling from contaminated distributions." (Robust statistics motivation when LLN works but slowly under contamination.)
