Bootstrap, jackknife, and resampling first principles
Learning objectives
- State the plug-in principle: estimate any functional T(F) of an unknown CDF F by T(F̂ₙ), where F̂ₙ is the empirical CDF
- Write down the empirical CDF and explain why it is a sufficient summary of the sample for nonparametric purposes
- State the nonparametric bootstrap algorithm in three lines: (i) original sample, (ii) for b = 1..B resample with replacement and recompute θ̂*_b, (iii) summarise the empirical distribution of {θ̂*_b}
- Compute the bootstrap-based SE as SÊ_boot(θ̂) = SD({θ̂*_b}) and recognise it as an empirical Monte Carlo SE that requires no closed-form formula
- Distinguish percentile, basic/pivotal, and BCa bootstrap confidence intervals at the conceptual level (Part 3 §3.2 covers the mechanics)
- Identify the regularity conditions under which the bootstrap works — smooth functional of F, moderately large n — and the textbook failures: sample max from a bounded distribution, extreme tail quantiles, and very small n
- Define the jackknife: leave-one-out estimates θ̂_(−i), the jackknife mean θ̄_(·), and the SE formula
- Recognise that the jackknife is a deterministic, n-replicate, smooth-statistic special case of the bootstrap, and that it is INCONSISTENT for non-smooth statistics such as the sample median
- List the three named extensions — parametric bootstrap, wild bootstrap (regression residuals, Wu 1986), and block bootstrap (dependent data, Künsch 1989) — and the gap each one closes
- Articulate what the bootstrap CANNOT fix: sampling bias, underidentification, fundamentally broken estimators — the bootstrap is plug-in, not magic
§1.6 established the sampling distribution and the standard error, and ended with a hint: every SE in elementary statistics can be estimated by Monte Carlo (draw fresh samples from a known population, compute the estimator on each, take the SD). §1.7 turns that hint into a workhorse. In real research you do not have a known population; you have one sample of size $n$. The bootstrap (Efron 1979) substitutes the empirical CDF for the unknown population CDF and resamples from it. That single move makes SE estimation, confidence intervals, and many other inferential summaries available for ANY estimator with no parametric assumptions, within regularity conditions that this section is careful to spell out.
The §1.7 arc is: empirical CDF as the natural plug-in for the population CDF; the plug-in principle for general functionals; the nonparametric bootstrap algorithm; bootstrap SE and a preview of bootstrap confidence intervals; when the bootstrap works and when it fails; the jackknife as a deterministic precursor; and the three named extensions (parametric, wild, block) that adapt the basic engine to different problem structures. Part 3 §3.2 builds bootstrap CIs (percentile, basic, BCa) on top; Part 8 §8.1 deepens the theory.
The plug-in principle and the empirical CDF
Start with a single number you wish to estimate. The population mean is $\mu = \int x \, dF(x)$. The population median is the solution $m$ of $F(m) = 1/2$. The population variance is $\sigma^2 = \int (x - \mu)^2 \, dF(x)$. Each is a functional of the unknown CDF $F$: a rule $T$ that, given $F$, returns a number $\theta = T(F)$.
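The empirical CDF based on the sample $X_1, \dots, X_n$ is
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}.$$
This is a step function: zero below the smallest observation $X_{(1)}$, climbing by $1/n$ at each observed value, plateauing at $1$ above the largest observation $X_{(n)}$. It IS the sample, repackaged as a distribution. The Glivenko-Cantelli theorem (Part 0 §0.7) guarantees that $\sup_x |\hat F_n(x) - F(x)| \to 0$ almost surely: the empirical CDF gets uniformly close to the population CDF as $n$ grows. It is the natural data-driven substitute for $F$.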
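A minimal sketch of $\hat F_n$ in Python (standard library only; the name `ecdf` is hypothetical): sort the sample once, then count the observations at or below $x$ with a binary search. The value 2.0 appears twice, so the step there has height $2/n$.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a step function x -> F_hat(x)."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(x):
        # Fraction of observations <= x: jumps by 1/n at each observed value,
        # and by k/n at a value observed k times.
        return bisect_right(xs, x) / n

    return F_hat


F_hat = ecdf([2.0, 1.0, 3.0, 2.0])
print(F_hat(0.5), F_hat(1.0), F_hat(2.0), F_hat(10.0))  # 0.0 0.25 0.75 1.0
```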
The plug-in principle is one line: estimate $\theta = T(F)$ by $\hat\theta = T(\hat F_n)$. The plug-in mean is the sample mean $\bar X$. The plug-in median is the sample median. The plug-in variance is the (biased, denominator $n$) sample variance $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2$. Every classical "sample-X" estimator can be derived as a plug-in.
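To make the plug-in principle concrete, here is a short Python sketch (the function names are illustrative, not from any library) that evaluates the mean and variance functionals at $\hat F_n$, which puts mass $1/n$ on each observation. The plug-in variance coincides with `statistics.pvariance`; `statistics.variance` is the Bessel-corrected version with denominator $n - 1$.

```python
import statistics


def plug_in_mean(sample):
    # T(F) = integral of x dF(x), evaluated at F_hat_n (mass 1/n per observation).
    return sum(sample) / len(sample)


def plug_in_variance(sample):
    # T(F) = integral of (x - mu)^2 dF(x) at F_hat_n: denominator n, not n - 1.
    m = plug_in_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(plug_in_mean(data))         # 5.0
print(plug_in_variance(data))     # 4.0, same as statistics.pvariance(data)
print(statistics.variance(data))  # about 4.571 (= 32/7), Bessel-corrected
```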
§1.6 used plug-ins narrowly: $\widehat{\mathrm{SE}}(\bar X) = s/\sqrt{n}$ plugs the sample SD $s$ into the formula $\mathrm{SE}(\bar X) = \sigma/\sqrt{n}$. The bootstrap goes one level deeper: it plugs in the ENTIRE empirical CDF, then asks what sampling distribution that produces. The answer is the bootstrap distribution.
The nonparametric bootstrap algorithm
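Fix a sample $X_1, \dots, X_n$ and an estimator $\hat\theta = \hat\theta(X_1, \dots, X_n)$. The nonparametric bootstrap of $\hat\theta$ is:
- Resample. For $b = 1, \dots, B$, draw $X^*_{b,1}, \dots, X^*_{b,n}$ INDEPENDENTLY WITH REPLACEMENT from the original sample (equivalently, IID from $\hat F_n$). Each $X^*_{b,i}$ is equal to one of the original observations; the same observation may appear multiple times in one bootstrap sample.
- Recompute. Compute $\hat\theta^*_b = \hat\theta(X^*_{b,1}, \dots, X^*_{b,n})$, the estimator on the $b$-th bootstrap sample.
- Summarise. The empirical distribution of $\hat\theta^*_1, \dots, \hat\theta^*_B$ approximates the sampling distribution of $\hat\theta$.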
Three lines. There is no parametric model, no closed-form SE, no normality assumption. Typical $B$: 1000-2000 for an SE estimate, 5000-10000 for tail-quantile confidence intervals, where the precision of the empirical quantiles depends directly on $B$.
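The three lines translate almost literally into code. This is a minimal sketch that uses Python's standard `random` module as the resampler (`bootstrap_replicates` is a hypothetical helper name). `random.Random.choices` draws with replacement, which is exactly IID sampling from $\hat F_n$. The Exponential(1) data are simulated stand-ins for a real sample.

```python
import random
import statistics


def bootstrap_replicates(sample, estimator, B=2000, seed=None):
    """Nonparametric bootstrap: return the B replicates theta*_1, ..., theta*_B."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Resample: n IID draws from F_hat_n, i.e. with replacement from the data.
        boot_sample = rng.choices(sample, k=n)
        # Recompute: the same estimator on the bootstrap sample.
        replicates.append(estimator(boot_sample))
    # Summarise: left to the caller (SD, quantiles, histogram, ...).
    return replicates


gen = random.Random(0)
data = [gen.expovariate(1.0) for _ in range(30)]  # one observed sample
reps = bootstrap_replicates(data, statistics.median, B=2000, seed=1)
print(len(reps), min(reps), max(reps))
```

Fixing `seed` makes a run reproducible. Without it, two runs differ by Monte Carlo noise only.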
The bootstrap principle, in one sentence: do to the sample what you wish you could do to the population. You wish you could draw 10000 independent samples of size $n$ from the population and watch the estimator vary. You cannot. So you draw 10000 bootstrap samples of size $n$ from the empirical CDF instead, and watch the estimator vary there. The Glivenko-Cantelli theorem makes the substitution legitimate; the moderate-$n$ regularity conditions below say WHEN the legitimacy translates into good approximation.
Bootstrap SE
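The bootstrap standard error is the empirical SD of the replicates:
$$\widehat{\mathrm{SE}}_{\text{boot}}(\hat\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat\theta^*_b - \bar\theta^*\right)^2},$$
where $\bar\theta^* = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^*_b$. That is the entire formula. It works for several cases:
- the sample mean, where up to Monte Carlo noise it equals the plug-in $\hat\sigma/\sqrt{n}$, which differs from $s/\sqrt{n}$ only by the factor $\sqrt{(n-1)/n}$;
- the sample median, where the asymptotic formula $1/(2 f(m) \sqrt{n})$ requires the unknown density $f$ at the median $m$;
- a ratio of two means, where no clean closed-form SE exists;
- the maximum eigenvalue of a sample covariance matrix, where any closed form would be heroic.

Within the regularity conditions discussed below, the bootstrap is a universal SE engine.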
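Here is a sketch of $\widehat{\mathrm{SE}}_{\text{boot}}$ under the same assumptions: standard library only, a hypothetical helper name, and simulated Exponential(1) data standing in for a real sample. For the mean, the result can be checked against $\hat\sigma/\sqrt{n}$, where $\hat\sigma$ is the plug-in SD with denominator $n$. For the median, the identical call needs no formula at all.

```python
import math
import random
import statistics


def bootstrap_se(sample, estimator, B=2000, seed=None):
    """Bootstrap SE: the SD (denominator B - 1) of B bootstrap replicates."""
    rng = random.Random(seed)
    n = len(sample)
    reps = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(reps)  # stdev uses the B - 1 denominator


gen = random.Random(0)
data = [gen.expovariate(1.0) for _ in range(30)]
n = len(data)

se_mean_boot = bootstrap_se(data, statistics.fmean, B=5000, seed=1)
se_mean_plugin = statistics.pstdev(data) / math.sqrt(n)  # sigma_hat / sqrt(n)
print(se_mean_boot, se_mean_plugin)  # agree up to Monte Carlo noise

# The same call works where no convenient closed form exists, e.g. the median.
se_median = bootstrap_se(data, statistics.median, B=5000, seed=2)
print(se_median)
```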
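A related bootstrap output is the bias estimate:
$$\widehat{\mathrm{bias}}_{\text{boot}}(\hat\theta) = \bar\theta^* - \hat\theta.$$
That is, the average of the bootstrap replicates minus the original estimate. If it is positive, the estimator tends to OVERSHOOT the truth by about that amount on average, and $\hat\theta - \widehat{\mathrm{bias}}_{\text{boot}}$ is a bias-corrected estimate; if it is negative, the reverse holds.

For the sample mean, the ideal (infinite-$B$) bootstrap bias is exactly zero by linearity, so the Monte Carlo estimate is zero up to noise. For the (biased) plug-in variance estimator, it recovers the textbook bias $-\sigma^2/n$ (in the form $-\hat\sigma^2/n$) to within Monte Carlo noise. The bootstrap rediscovers Bessel's correction (almost).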
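The Bessel claim can be checked numerically. Write $\hat\sigma^2$ for the plug-in variance (denominator $n$) and $s^2 = n\hat\sigma^2/(n-1)$ for the Bessel-corrected one. Under resampling, the exact expectation of the plug-in variance is $(n-1)\hat\sigma^2/n$, so the ideal bootstrap bias is $-\hat\sigma^2/n$. In this sketch (hypothetical helper names, simulated Normal data), the Monte Carlo estimate should land near that value. The bias-corrected estimate $(1 + 1/n)\hat\sigma^2$ should land near, but not exactly on, $s^2$.

```python
import random
import statistics


def plug_in_variance(xs):
    # Denominator n: the plug-in (biased) variance estimator.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def bootstrap_bias(sample, estimator, B=20000, seed=None):
    """Bootstrap bias estimate: mean of the replicates minus the original estimate."""
    rng = random.Random(seed)
    n = len(sample)
    reps = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.fmean(reps) - estimator(sample)


gen = random.Random(0)
data = [gen.gauss(0.0, 2.0) for _ in range(20)]
n = len(data)

theta_hat = plug_in_variance(data)
bias = bootstrap_bias(data, plug_in_variance, seed=1)
print(bias, -theta_hat / n)                         # Monte Carlo vs. exact -sigma_hat^2 / n
print(theta_hat - bias, statistics.variance(data))  # bias-corrected vs. Bessel: close, not equal
```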
The first widget below makes the bootstrap distribution visible by drawing both the bootstrap-from-data histogram AND the true sampling distribution as an oracle overlay (the latter computed separately from many fresh samples from the known population — something you would never have in practice). When the bootstrap works, the two histograms hug each other.
Things to verify:
- Sample mean, Exponential(1), n = 30: blue and orange histograms overlap heavily; ratio bootstrap/true SE close to 1. The textbook regular case.
- Sample median, Normal(0, 1), n = 30: bootstrap SE matches the oracle SE within ±10%. The smooth-functional bootstrap working as advertised.
- Sample variance, Lognormal(0, 1), n = 30: the true sampling distribution is highly right-skewed (Lognormal has a huge 4th moment). The bootstrap captures the same skew. Ratio close to 1.
- Sample max, Uniform(0, 1), n = 30: blue and orange diverge sharply. The bootstrap pins to a few discrete values; the oracle is continuous. The verdict box reports the regularity failure — this is the textbook Bickel-Freedman 1981 counterexample.
- Sample mean, Cauchy(0, 1), n = 30: both histograms have Cauchy-shaped tails. The bootstrap is not "wrong" here in the sense that matters: it mirrors the fact that the estimator itself is broken (the sample mean of Cauchy data has no finite SE). The CLT failure is in the mean, not in the bootstrap.
- Slide B from 50 to 5000: the bootstrap distribution gets smoother but its center and spread are stable. B controls Monte Carlo error of the bootstrap estimate; it does NOT change the asymptotic behaviour of the bootstrap itself. Increase n to shrink the bootstrap-vs-truth gap; increase B to shrink the Monte Carlo error within the bootstrap.
Bootstrap confidence intervals — a sneak preview
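§1.6 introduced the Wald CI $\hat\theta \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}$, which is honest when the sampling distribution of $\hat\theta$ is approximately Normal. The bootstrap distribution gives you a fuller picture of that sampling distribution, and three classical bootstrap CIs read off different summaries:
- Percentile CI: the $\alpha/2$ and $1-\alpha/2$ empirical quantiles of $\hat\theta^*_1, \dots, \hat\theta^*_B$. Simple, and not forced to be symmetric around $\hat\theta$. For a 95% CI: the 2.5th and 97.5th percentiles of the bootstrap replicates. No SE involved.
- Basic (pivotal) CI: $[\,2\hat\theta - q^*_{1-\alpha/2},\; 2\hat\theta - q^*_{\alpha/2}\,]$, where $q^*_p$ is the $p$-quantile of the bootstrap replicates. It reflects the bootstrap quantiles around the point estimate, using the distribution of $\hat\theta^* - \hat\theta$ as a stand-in for that of $\hat\theta - \theta$. A long right tail in the replicates therefore stretches the interval to the LEFT of $\hat\theta$.
- BCa (bias-corrected and accelerated): the default Efron (1987) advocates. It adjusts the percentile endpoints for bias (using the fraction of replicates below $\hat\theta$) and for skewness (using a jackknife-derived "acceleration" constant). It is typically the most accurate of the three, especially for skewed sampling distributions.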
The mechanics of each — including the BCa formula and how to choose — live in Part 3 §3.2. §1.7's point is to establish that once you have a bootstrap distribution, a confidence interval is one extra step. The bootstrap is not "just an SE machine"; it is the empirical sampling distribution, and any summary of that distribution is on the table.
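Given the replicates, the percentile and basic intervals take a couple of lines each. This is a minimal sketch with hypothetical names. It assumes a crude order-statistic quantile rule, whereas libraries use various interpolation schemes. BCa is left to Part 3 §3.2. By construction, the basic interval is the percentile interval reflected around $\hat\theta$: it has the same width in a mirrored position.

```python
import random
import statistics


def order_stat_quantile(sorted_values, p):
    # A simple order-statistic quantile; real libraries interpolate in various ways.
    k = min(len(sorted_values) - 1, max(0, int(p * len(sorted_values))))
    return sorted_values[k]


def percentile_and_basic_ci(sample, estimator, alpha=0.05, B=5000, seed=None):
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(estimator(rng.choices(sample, k=n)) for _ in range(B))
    theta_hat = estimator(sample)
    q_lo = order_stat_quantile(reps, alpha / 2)
    q_hi = order_stat_quantile(reps, 1 - alpha / 2)
    percentile = (q_lo, q_hi)
    basic = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)  # reflect around theta_hat
    return theta_hat, percentile, basic


gen = random.Random(0)
data = [gen.expovariate(1.0) for _ in range(30)]
theta_hat, percentile, basic = percentile_and_basic_ci(data, statistics.fmean, seed=1)
print(theta_hat, percentile, basic)
```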
When the bootstrap works — and when it does not
The bootstrap is most reliable for estimators that are smooth functionals of $F$ (informally, statistics that change continuously when you perturb the empirical distribution). The sample mean, the sample median (continuous distributions with positive density at the median), the sample variance, smooth M-estimators, correlation, regression coefficients with non-degenerate design matrices: all of these are bootstrap-friendly at moderate $n$.
Bickel and Freedman (1981) and Efron and Tibshirani (1993, Chapter 7) catalogue the failure modes. Three to remember:
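- Bounded estimators on bounded support. For $X_i \sim \mathrm{Uniform}(0, \theta)$, the MLE $\hat\theta = X_{(n)}$ has a non-Gaussian extreme-value sampling distribution (§1.6). The bootstrap does worse than non-Gaussian: it puts mass on at most $n$ distinct values, because any resample's max is the max of a random subset of the original values, of which there are at most $n$. The original sample max appears in a fraction $1 - (1 - 1/n)^n \to 1 - e^{-1} \approx 0.632$ of bootstrap samples, so about 63.2% of bootstrap samples have $\hat\theta^* = \hat\theta$ exactly. The bootstrap distribution therefore has a large atom at $\hat\theta$ while the true sampling distribution is continuous, and SEs and intervals read off it are unreliable. The widget makes this concrete: pick (max, Uniform, n = 30) and watch the failure.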
- Extreme quantiles. Consider estimating the 99th percentile from around $n = 100$ observations. Only about one observation is "near" the 99th percentile, and bootstrap resamples can never contain a value beyond the observed maximum. The bootstrap quantile is biased downward and underestimates uncertainty. The root cause is the same as in the sample-max problem: the bootstrap can only redistribute observed values, not generate new tail extremes.
- Very small samples. The number of distinct nonparametric bootstrap samples of size $n$ from $n$ distinct data points is $\binom{2n-1}{n}$. For $n = 5$ this is 126; for $n = 10$ it is 92378; for $n = 20$ it is approximately $6.9 \times 10^{10}$. At single-digit $n$ the bootstrap distribution is lumpy and the asymptotic guarantees do not bite. Rule of thumb: avoid leaning on the bootstrap at very small $n$, and demand larger $n$ for skewed populations.
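Two of these failure modes can be checked by brute force. This sketch (standard library only, simulated data) first measures the atom of the bootstrap-of-max distribution at $\hat\theta$ for Uniform(0, 1) with $n = 30$. It then counts the distinct bootstrap samples $\binom{2n-1}{n}$ with `math.comb`, which requires Python 3.8 or later.

```python
import math
import random

gen = random.Random(0)
n = 30
data = [gen.random() for _ in range(n)]  # one Uniform(0, 1) sample
theta_hat = max(data)                    # MLE of the upper endpoint

rng = random.Random(1)
B = 10000
boot_max = [max(rng.choices(data, k=n)) for _ in range(B)]

# Fraction of bootstrap maxima that equal the original max exactly.
atom = sum(m == theta_hat for m in boot_max) / B
print(atom, 1 - (1 - 1 / n) ** n)  # both near 0.64 for n = 30
print(len(set(boot_max)))          # few distinct values, never more than n

# Number of distinct bootstrap samples of size n from n distinct points: C(2n - 1, n).
counts = {k: math.comb(2 * k - 1, k) for k in (5, 10, 20)}
print(counts)  # {5: 126, 10: 92378, 20: 68923264410}
```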
The unifying diagnosis: the bootstrap inherits the regularity of $T$. If $T$ is a suitably smooth (roughly, differentiable) functional of $F$, the bootstrap distribution is a consistent estimator of the sampling distribution of $T(\hat F_n)$ (Bickel-Freedman 1981). Now take a $T$ that is not smooth as a function of the CDF, such as the upper endpoint of the support, $T(F) = \sup\{x : F(x) < 1\}$. It is estimated by the sample max and depends only on the extreme upper tail. For such a $T$ the bootstrap can fail, and the sample max shows how badly.
The jackknife — bootstrap's deterministic ancestor
The jackknife predates the bootstrap by two decades. Quenouille (1949) introduced delete-one estimates for bias reduction; Tukey (1958) extended the idea to SE estimation, coining the name "jackknife" by analogy to a Boy Scout pocket tool. The algorithm:
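- For $i = 1, \dots, n$, compute the leave-one-out estimator $\hat\theta_{(-i)}$ on the sample with $X_i$ removed. Exactly $n$ replicates.
- Jackknife mean: $\bar\theta_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat\theta_{(-i)}$.
- Jackknife SE: $\widehat{\mathrm{SE}}_{\text{jack}} = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left(\hat\theta_{(-i)} - \bar\theta_{(\cdot)}\right)^2}$.
- Jackknife bias estimate (Quenouille): $\widehat{\mathrm{bias}}_{\text{jack}} = (n-1)\left(\bar\theta_{(\cdot)} - \hat\theta\right)$.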
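The factor $(n-1)/n$ in the SE formula is there for a reason. The jackknife replicates are NEARLY IDENTICAL to each other, since each is the estimator on $n-1$ of the same $n$ observations. Their raw SD, $\sqrt{\frac{1}{n}\sum_i (\hat\theta_{(-i)} - \bar\theta_{(\cdot)})^2}$, therefore massively underestimates the SE of $\hat\theta$. Relative to that raw SD, the jackknife formula multiplies by $\sqrt{n-1}$, which is large for large $n$; the $(n-1)/n$ in front of the SUM of squares is the same correction in disguise.

The derivation for the sample mean:
- $\hat\theta_{(-i)} = (n\bar X - X_i)/(n-1)$ and $\bar\theta_{(\cdot)} = \bar X$, so $\hat\theta_{(-i)} - \bar\theta_{(\cdot)} = (\bar X - X_i)/(n-1)$.
- Hence $\sum_i (\hat\theta_{(-i)} - \bar\theta_{(\cdot)})^2 = s^2/(n-1)$.
- Multiplying by $(n-1)/n$ gives $s^2/n$, whose square root is exactly $s/\sqrt{n}$.

The jackknife recovers $s/\sqrt{n}$ for the mean, but only with the $(n-1)/n$ scaling.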
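Here is a sketch of the four jackknife formulas (the helper name `jackknife` is hypothetical; standard library only). For the sample mean, the SE matches $s/\sqrt{n}$ to floating-point precision, as the derivation predicts. Quenouille's bias correction also turns the plug-in variance (denominator $n$) into the Bessel-corrected $s^2$ exactly, which is a known algebraic identity.

```python
import math
import statistics


def jackknife(sample, estimator):
    """Jackknife SE and Quenouille bias estimate from the n leave-one-out replicates."""
    sample = list(sample)
    n = len(sample)
    theta_hat = estimator(sample)
    # theta_(-i): the estimator recomputed with observation i deleted.
    loo = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_dot = sum(loo) / n  # jackknife mean
    se = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in loo))
    bias = (n - 1) * (theta_dot - theta_hat)
    return se, bias


data = [3.1, 0.4, 2.2, 5.9, 1.7, 4.4, 2.8, 0.9]
se_jack, bias_jack = jackknife(data, statistics.fmean)
print(se_jack, statistics.stdev(data) / math.sqrt(len(data)))  # equal up to rounding
print(bias_jack)                                               # about 0: the mean is unbiased

# Quenouille's correction applied to the plug-in variance gives s^2 exactly.
_, bias_var = jackknife(data, statistics.pvariance)
corrected_var = statistics.pvariance(data) - bias_var
print(corrected_var, statistics.variance(data))  # equal up to rounding
```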
Compared to the bootstrap, the jackknife is:
- Cheaper (when $n < B$). Exactly $n$ replicates, not $B$. For expensive $\hat\theta$ (a 10-minute MCMC fit, a large linear regression), the difference matters.
- Deterministic. Two analysts running the jackknife on the same sample get exactly the same SE. Two analysts running the bootstrap get answers that differ by Monte Carlo noise.
- Less general. The jackknife is consistent for smooth estimators (mean, variance, smooth M-estimators) but INCONSISTENT for non-smooth statistics like the sample median. For a sample of even size $n = 2m$, every delete-one median equals one of the two middle order statistics, $X_{(m)}$ or $X_{(m+1)}$; for odd $n$ there are three possible values. The jackknife SE therefore depends only on the gap between these order statistics. That gap fluctuates wildly from sample to sample, and the jackknife variance does not converge to the true variance of the median. The bootstrap handles the median correctly. (Efron 1979 §6 demonstrates this; Efron and Tibshirani 1993 Chapter 11 has the full analysis.)
- Equivalent in the smooth limit. For smooth statistics, the jackknife is a linear approximation to the bootstrap. Efron and Tibshirani (1993) show that the two SE estimates agree to leading order as $n \to \infty$.
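The median failure is easy to see directly. This sketch uses simulated Normal data with even $n = 20$ and the standard library only. The 20 leave-one-out medians take exactly two values, the two middle order statistics, so the jackknife SE reduces to $\sqrt{n-1}$ times half their gap. The bootstrap SE is printed alongside for comparison.

```python
import math
import random
import statistics

gen = random.Random(0)
data = [gen.gauss(0.0, 1.0) for _ in range(20)]  # even n = 20
n = len(data)

# Leave-one-out medians: each deletion leaves 19 points, whose median is one
# of the two middle order statistics of the full sample.
loo = [statistics.median(data[:i] + data[i + 1:]) for i in range(n)]
print(sorted(set(loo)))  # exactly two values

theta_dot = statistics.fmean(loo)
se_jack = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in loo))

rng = random.Random(1)
reps = [statistics.median(rng.choices(data, k=n)) for _ in range(5000)]
se_boot = statistics.stdev(reps)
print(se_jack, se_boot)
```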
The second widget runs both schemes on the same small sample so you can compare directly.
Things to verify:
- Sample mean, any population, n = 20: jackknife SE ≈ bootstrap SE within ±10%. The smooth-statistic equivalence.
- Sample median, Normal, n = 20: the jackknife SE is unreliable, and often much too small. The jackknife dots sit on just two values (the two middle order statistics), while the bootstrap median histogram is much wider. The jackknife is the wrong tool here (Efron 1979 §6).
- 10% trimmed mean, Lognormal, n = 30: both SEs agree, and both are noticeably smaller than the raw mean's SE on the same data. Robust estimators get robust SE estimates from both schemes.
- Sample variance, Mixture (90% N(0,1) + 10% N(0,9)), n = 50: the mixture has a fat 4th moment; both schemes report a much larger SE than the pure-Normal formula would predict. Resampling adapts to the data; closed forms do not.
Three named extensions: parametric, wild, block
The basic nonparametric bootstrap above assumes the observations are IID from an unknown $F$. Three classical extensions adapt it to other structures.
Parametric bootstrap. Fit a parametric model $F_\eta$ to the data (MLE $\hat\eta$ for the parameter $\eta$), then resample from $F_{\hat\eta}$ instead of $\hat F_n$. For each bootstrap sample, refit the model and recompute the statistic. This is useful when a parametric model is plausible and you want to test fit, propagate parameter uncertainty through a downstream prediction, or compute SEs for likelihood-based estimators. The parametric bootstrap is the version of $F$-plug-in where $\hat F = F_{\hat\eta}$ is the fitted parametric CDF rather than the empirical CDF.
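Here is a minimal parametric-bootstrap sketch. It assumes an Exponential model (an illustrative choice, not one the text prescribes) and uses the rate MLE $\hat\lambda = 1/\bar X$ as the statistic. The only change from the nonparametric loop is where the bootstrap samples come from: `expovariate(rate_hat)` draws from the fitted model instead of resampling the data. The result can be compared with the large-sample SE $\hat\lambda/\sqrt{n}$.

```python
import math
import random
import statistics


def parametric_bootstrap_exponential(sample, statistic, B=4000, seed=None):
    """Parametric bootstrap under an ASSUMED Exponential(rate) model."""
    rng = random.Random(seed)
    n = len(sample)
    rate_hat = 1.0 / statistics.fmean(sample)  # MLE of the rate
    reps = []
    for _ in range(B):
        # Draw from the fitted model, not from the empirical CDF.
        boot = [rng.expovariate(rate_hat) for _ in range(n)]
        reps.append(statistic(boot))  # refit / recompute on each draw
    return rate_hat, reps


def rate_mle(xs):
    return 1.0 / statistics.fmean(xs)


gen = random.Random(0)
data = [gen.expovariate(2.0) for _ in range(50)]
rate_hat, reps = parametric_bootstrap_exponential(data, rate_mle, seed=1)
se_param = statistics.stdev(reps)
print(rate_hat, se_param, rate_hat / math.sqrt(len(data)))  # SE vs. asymptotic rate/sqrt(n)
```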
Wild bootstrap. Wu (1986) for regression. Fix the covariates $x_i$ and the fitted residuals $\hat e_i = y_i - x_i^\top \hat\beta$. For each bootstrap sample, multiply each residual by an INDEPENDENT zero-mean, unit-variance random variable $v_i$, and construct $y^*_i = x_i^\top \hat\beta + v_i \hat e_i$. Typical choices for $v_i$ are $\pm 1$ with equal probability or the Mammen 1993 two-point distribution. Because each residual stays attached to its own covariate row, this preserves the heteroscedasticity pattern of the original residuals. The random multiplier makes each bootstrap error mean-zero given the covariates. The wild bootstrap is the standard resampling tool for heteroscedastic linear regression and a building block in modern econometrics (MacKinnon and Webb 2017 surveys recent developments).
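Here is a minimal wild-bootstrap sketch for a simple linear regression, using the standard library only. The data-generating process, with noise SD growing in $x$, is an assumption for illustration. The sketch uses Rademacher multipliers $v_i = \pm 1$. Because the OLS slope is linear in $y$, the ideal Rademacher wild-bootstrap variance of the slope is exactly the HC0 (White) sandwich variance $\sum_i (x_i - \bar x)^2 \hat e_i^2 / S_{xx}^2$, with $S_{xx} = \sum_i (x_i - \bar x)^2$. That gives a built-in check.

```python
import math
import random
import statistics


def ols_simple(x, y):
    """Least-squares intercept and slope for y ~ a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def wild_bootstrap_slope(x, y, B=4000, seed=None):
    """Wild bootstrap of the OLS slope with Rademacher (+/-1) multipliers."""
    rng = random.Random(seed)
    a_hat, b_hat = ols_simple(x, y)
    fitted = [a_hat + b_hat * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(B):
        # Each residual stays with its own x_i; only its sign is randomised.
        y_star = [fi + rng.choice((-1.0, 1.0)) * ei for fi, ei in zip(fitted, resid)]
        slopes.append(ols_simple(x, y_star)[1])
    return b_hat, resid, slopes


gen = random.Random(0)
x = [i / 10 for i in range(60)]
# Heteroscedastic errors: noise SD grows with x (an illustrative assumption).
y = [1.0 + 2.0 * xi + gen.gauss(0.0, 0.2 + 0.5 * xi) for xi in x]

b_hat, resid, slopes = wild_bootstrap_slope(x, y, seed=1)
se_wild = statistics.stdev(slopes)

# HC0 sandwich SE: what the Rademacher wild bootstrap targets for the OLS slope.
mx = statistics.fmean(x)
sxx = sum((xi - mx) ** 2 for xi in x)
se_hc0 = math.sqrt(sum((xi - mx) ** 2 * e ** 2 for xi, e in zip(x, resid))) / sxx
print(b_hat, se_wild, se_hc0)
```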
Block bootstrap. Künsch (1989) for stationary time series. The naive nonparametric bootstrap destroys serial dependence by sampling individual time points independently. The block bootstrap fixes this by resampling overlapping or non-overlapping blocks of consecutive observations of length $\ell$, preserving local dependence within each block. The block length $\ell$ is a tuning parameter: too short and you still break long-range structure; too long and you have too few independent blocks to resample. Politis and Romano (1994) introduced the stationary block bootstrap, which randomises $\ell$ to preserve stationarity exactly. Block bootstraps are the standard for SE and CI estimation on stationary economic, climatic, and seismic time series.
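Here is a sketch of the moving (overlapping) block bootstrap for the mean of a simulated AR(1) series. The coefficient 0.7, $n = 500$, and block length 10 are all illustrative assumptions, not recommendations. Setting the block length to 1 recovers the naive IID bootstrap, which ignores the positive autocorrelation and so understates the SE of the mean. This sketch uses fixed-length blocks; the Politis-Romano variant would instead draw random (geometric) block lengths.

```python
import random
import statistics


def moving_block_bootstrap(series, block_len, estimator, B=1000, seed=None):
    """Moving-block bootstrap: resample overlapping blocks of length block_len."""
    rng = random.Random(seed)
    n = len(series)
    starts = range(n - block_len + 1)  # start index of every overlapping block
    reps = []
    for _ in range(B):
        boot = []
        while len(boot) < n:
            s = rng.choice(starts)
            boot.extend(series[s:s + block_len])  # consecutive points stay together
        reps.append(estimator(boot[:n]))  # trim the last block to length n
    return reps


# Illustrative AR(1) series x_t = 0.7 x_{t-1} + e_t (an assumption for the demo).
gen = random.Random(0)
phi, n = 0.7, 500
series = [gen.gauss(0.0, 1.0) / (1 - phi ** 2) ** 0.5]  # stationary start
for _ in range(n - 1):
    series.append(phi * series[-1] + gen.gauss(0.0, 1.0))

# block_len = 1 is the naive IID bootstrap; it ignores serial dependence.
se_iid = statistics.stdev(moving_block_bootstrap(series, 1, statistics.fmean, seed=1))
se_block = statistics.stdev(moving_block_bootstrap(series, 10, statistics.fmean, seed=2))
print(se_iid, se_block)  # the block SE is noticeably larger here
```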
There are other variants (smoothed bootstrap, double bootstrap for refined CIs, m-out-of-n bootstrap for non-regular cases like the sample max), but the parametric, wild, and block forms are the three you should know by name. Part 8 §8.1 covers them with practical examples.
What the bootstrap CANNOT fix
The bootstrap is the most useful general-purpose inferential tool in modern statistics. It is also not magic. Four things it does not do:
- It does not fix sampling bias. If your sample is biased (§1.3 Berkson, Heckman, survivorship), the empirical CDF reflects the BIASED population, not the target population. The bootstrap then estimates the sampling distribution of $\hat\theta$ under the biased sampling scheme, which is precisely the wrong quantity. Pre-bootstrap fixes (reweighting, propensity scores, MAR-based imputation) are needed BEFORE the bootstrap can do its job.
- It does not solve underidentification. If your regression has collinear predictors, the OLS coefficient is ill-defined; bootstrap replicates inherit the same ill-definition and produce SEs that may look reasonable but reflect a degenerate problem. The bootstrap respects the estimator's definition exactly — it does not improve on a fundamentally broken estimator.
- It does not fix non-regular statistics. Sample max from Uniform, extreme quantiles, MLE of for very small samples near zero: the bootstrap inherits the irregularity. Diagnose the estimator first; choose a resampling scheme that matches (or switch to a different estimator).
- It does not give you n. The bootstrap distribution's spread depends on the original sample size $n$. If $n$ is too small to identify the parameter, no amount of $B$ will help. The bootstrap is plug-in, not data fabrication.
What the bootstrap DOES do (robustly, at moderate $n$, for smooth statistics) is replace closed-form SE machinery with an empirical Monte Carlo that adapts to whatever shape the sampling distribution actually has. That is the win, and §1.7's point is to make both the win and its limits sharp.
Where §1.7 fits in Part 1
The bootstrap connects to every earlier section of Part 1:
- §1.1 defined bias, variance, and consistency abstractly; the bootstrap gives an empirical bias estimate ($\bar\theta^* - \hat\theta$) and an empirical variance estimate (the bootstrap variance) for ANY estimator, no closed form required.
- §1.2 (MoM) and §1.3 (MLE) produced estimators; the bootstrap gives their SEs.
- §1.4's CRLB gives a lower bound on the variance of unbiased estimators, attained asymptotically by the MLE. The bootstrap gives a FINITE-SAMPLE estimate that you can compare to the CRLB at any given $n$. The CRLB is a benchmark; the bootstrap is the measurement.
- §1.5's bias-variance trade-off becomes operational once you can estimate both bias and variance from data. The bootstrap is the engine.
- §1.6's plug-in SE $s/\sqrt{n}$ is a special case of the parametric bootstrap. For the mean, any fitted model with variance $s^2$ (the trivial model that fixes one moment) yields bootstrap SE $s/\sqrt{n}$ up to Monte Carlo noise.
- §1.8 (robust estimators) and §1.9 (delta method) build on top of bootstrap-SE machinery. The bootstrap handles non-Gaussian and finite-sample cases (and some non-smooth ones, such as the median) that the analytical tools cannot.
Part 3 §3.2 builds bootstrap CIs (percentile, basic, BCa) on top of this section's machinery. Part 8 §8.1 deepens the theory with parametric vs nonparametric variants and the m-out-of-n bootstrap. The widget pair here is the minimum viable foundation; everything downstream is on top of it.
Try it
- In the bootstrap simulator, pick (Exponential, mean, n = 30). Press "Re-run bootstrap" five times. Watch the bootstrap SE wobble by a few percent around the same value, and the ratio bootstrap/true SE stay near 1. This is Monte Carlo noise inside a working bootstrap.
- Same widget, switch to (Uniform, max, n = 30). The blue (bootstrap) histogram is now pinned to a few discrete bars on the right edge; the orange (oracle) is smooth. The ratio panel reports the failure; the verdict box explains why. This is Bickel-Freedman 1981 made visible.
- (Lognormal, variance, n = 30): both histograms are right-skewed. Ratio close to 1. Bootstrap correctly captures Lognormal's fat 4th moment without ever knowing the population.
- (Cauchy, mean, n = 50): both histograms have heavy tails. The bootstrap correctly mirrors the CLT failure. The estimator is broken; the bootstrap honestly reports it.
- Slide B from 100 to 5000 on a working case (Normal, median, n = 50). The bootstrap distribution smooths out; the SE estimate stabilises. B controls Monte Carlo error within the bootstrap; it does not change the bootstrap's asymptotic behaviour. To shrink the bootstrap-vs-truth gap you need to grow n, not B.
- In the jackknife vs. bootstrap widget, pick (Normal, mean, n = 20). Jackknife and bootstrap SE agree within 10%. The two schemes are doing the same job for a smooth statistic.
- Switch the statistic to median, same n = 20. The jackknife dots collapse onto two values, and the jackknife SE is typically well off the bootstrap value, often much smaller. This is the inconsistency Efron 1979 §6 flags. Use the bootstrap for quantile-based statistics.
- Pick the mixture population (90% N(0,1) + 10% N(0,9)) and the variance. Heavy-tail contamination shows up in both SE estimates, but is invisible in the naive Gaussian formula $\widehat{\mathrm{SE}}(s^2) = s^2\sqrt{2/(n-1)}$. Resampling adapts; closed forms do not.
- Pen-and-paper: derive that for the sample mean $\hat\theta = \bar X$, the jackknife SE formula reduces to exactly $s/\sqrt{n}$ (using $s^2 = \frac{1}{n-1}\sum_i (X_i - \bar X)^2$). Three lines, using $\hat\theta_{(-i)} = (n\bar X - X_i)/(n-1)$.
- Pen-and-paper: the probability that observation $i$ is OMITTED from a single nonparametric bootstrap sample of size $n$ is $(1 - 1/n)^n \to e^{-1} \approx 0.368$. So each bootstrap sample contains, on average, about $n(1 - e^{-1}) \approx 0.632\,n$ DISTINCT original observations. This is why the bootstrap-of-max fails: the original sample max appears in about 63.2% of bootstrap samples. Whenever it appears it IS the bootstrap max (nothing exceeds it), so about 63.2% of the bootstrap maxima equal $\hat\theta$ exactly.
Pause and reflect: the bootstrap treats the empirical CDF AS IF it were the population. Glivenko-Cantelli says $\hat F_n \to F$ uniformly. So why is the bootstrap-of-max from Uniform(0, 1) wrong? Where does the uniform convergence run out, and what does that tell you about which functionals are "bootstrap-safe"?
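The $1 - e^{-1}$ arithmetic can be checked by simulation. This sketch (standard library only) resamples the indices $0, \dots, n-1$ with replacement and compares the average number of distinct indices per bootstrap sample with $n\,(1 - (1 - 1/n)^n)$. For $n = 30$ the exact fraction is about 0.638, approaching $1 - e^{-1} \approx 0.632$ as $n$ grows.

```python
import random

gen = random.Random(0)
n, B = 30, 20000

# Average number of distinct original observations in one bootstrap sample.
avg_distinct = sum(len(set(gen.choices(range(n), k=n))) for _ in range(B)) / B

# Each index is included with probability 1 - (1 - 1/n)^n (linearity of expectation).
exact = n * (1 - (1 - 1 / n) ** n)
print(avg_distinct, exact, exact / n)
```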
What you now know
The empirical CDF $\hat F_n$ is the natural data-driven substitute for the unknown population CDF $F$. The plug-in principle estimates any functional $T(F)$ by $T(\hat F_n)$. The nonparametric bootstrap is the plug-in principle applied to the sampling distribution: resample with replacement from $\hat F_n$, recompute the estimator, summarise. Bootstrap SE is the SD of the resampled replicates; bootstrap bias is the difference between the mean of the replicates and the original estimate; bootstrap CIs (percentile, basic, BCa) are summaries of the bootstrap distribution's shape.
The bootstrap works for smooth functionals at moderate $n$: sample means, medians, variances, smooth M-estimators, regression coefficients. It fails for bounded estimators on bounded support (sample max), for extreme quantiles relative to $n$, and for very small samples. The jackknife is the deterministic, n-replicate, smooth-statistic special case; it agrees with the bootstrap for smooth estimators and is inconsistent for non-smooth ones (median, quantiles). Parametric, wild, and block bootstraps adapt the basic engine to fitted models, regression residuals, and time-series structure respectively. The bootstrap is not a fix for sampling bias, underidentification, or fundamentally broken estimators: it is plug-in, and inherits whatever virtues and defects the data and the estimator already carry.
Part 3 §3.2 will turn this engine into honest confidence intervals. Part 8 §8.1 will deepen the theory. §1.8 next replaces the mean with a robust estimator and discusses heavy-tailed populations where the bootstrap continues to work but the underlying statistic needs rethinking.
References
- Efron, B. (1979). "Bootstrap methods: another look at the jackknife." Annals of Statistics 7(1), 1-26. (The seminal bootstrap paper. Section 6 contains the median-jackknife inconsistency example.)
- Efron, B., Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall. (The standard textbook; Chapters 6-11 cover the basic bootstrap, jackknife, and the equivalence of the two for smooth statistics. Chapters 13-14 cover percentile and BCa intervals.)
- Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press. (The other standard textbook; particularly thorough on bootstrap CIs and on the block bootstrap.)
- Tukey, J.W. (1958). "Bias and confidence in not quite large samples" (abstract). Annals of Mathematical Statistics 29, 614. (Tukey's naming of the jackknife and extension to SE estimation.)
- Quenouille, M.H. (1949). "Approximate tests of correlation in time series." Journal of the Royal Statistical Society, Series B 11(1), 68-84. (The original delete-one bias-reduction idea, predating Tukey 1958.)
- Bickel, P.J., Freedman, D.A. (1981). "Some asymptotic theory for the bootstrap." Annals of Statistics 9(6), 1196-1217. (Establishes bootstrap consistency for smooth functionals and exhibits the sample-max counterexample.)
- Wu, C.F.J. (1986). "Jackknife, bootstrap and other resampling methods in regression analysis" (with discussion). Annals of Statistics 14(4), 1261-1295. (Introduces the wild bootstrap for heteroscedastic regression.)
- Künsch, H.R. (1989). "The jackknife and the bootstrap for general stationary observations." Annals of Statistics 17(3), 1217-1241. (The block bootstrap for stationary time series.)
- Politis, D.N., Romano, J.P. (1994). "The stationary bootstrap." Journal of the American Statistical Association 89(428), 1303-1313. (The randomised-block-length variant that preserves stationarity.)
- Mammen, E. (1993). "Bootstrap and wild bootstrap for high dimensional linear models." Annals of Statistics 21(1), 255-285. (The two-point wild-bootstrap distribution and high-dimensional theory.)
- Efron, B. (1987). "Better bootstrap confidence intervals" (with discussion). Journal of the American Statistical Association 82(397), 171-200. (The BCa confidence-interval methodology.)
- Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer. (Chapter 8 covers the bootstrap operationally; cleaner derivations than the original papers.)
- Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. (Chapter 7 ties the bootstrap to model-assessment and prediction-error estimation.)
- MacKinnon, J.G., Webb, M.D. (2017). "Wild bootstrap inference for wildly different cluster sizes." Journal of Applied Econometrics 32(2), 233-254. (Modern wild-bootstrap practice in econometrics.)