Bootstrap CIs (percentile, BCa, basic)

Part 3 — Confidence intervals and uncertainty

Learning objectives

Recap from §1.7: the nonparametric bootstrap (Efron 1979) generates B replicates θ̂*_b by resampling WITH REPLACEMENT from the observed sample X_1, …, X_n, treating the empirical CDF F̂_n as a plug-in for the unknown F. The bootstrap distribution {θ̂*_b}_{b=1}^B is the empirical estimate of the sampling distribution of θ̂
Motivate bootstrap CIs as an answer to a Wald-CI limitation: Wald requires a closed-form SE and an asymptotic-Normal pivot. For functionals like medians, ratios, IQRs, correlation coefficients on bivariate data, and any statistic without a tidy parametric SE, the bootstrap gives an SE and CI BY RESAMPLING — no algebra required
Define the PERCENTILE bootstrap CI (Efron 1979): C^pct = (θ̂*_{α/2}, θ̂*_{1−α/2}), the empirical α/2 and 1−α/2 quantiles of the bootstrap distribution. State the regularity condition: percentile works when the bootstrap distribution of θ̂* − θ̂ is approximately TRANSLATION-INVARIANT about θ̂, i.e., symmetric and unbiased
Define the BASIC / PIVOTAL bootstrap CI: C^basic = (2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}), the reflection of the percentile CI about θ̂ (each endpoint x maps to 2θ̂ − x). Justify it as the inversion of the pivot W* = θ̂* − θ̂, treating W* as a draw from W = θ̂ − θ. Note: basic is better than percentile under bias, because the pivot is centred
Define the BCa (Bias-Corrected and accelerated) bootstrap CI (Efron 1987, JASA): adjusts the quantile probabilities α/2, 1−α/2 to α₁, α₂ via two corrections — (i) bias correction z̄_0 = Φ⁻¹(#{b : θ̂*_b ≤ θ̂} / B), measuring how off-centre the bootstrap distribution sits at θ̂; (ii) acceleration a, a jackknife-based estimate of the skewness of the sampling distribution. State BCa as the SECOND-ORDER ACCURATE default
State the BCa formulae: α₁ = Φ(z̄_0 + (z̄_0 + z_{α/2}) / (1 − a(z̄_0 + z_{α/2}))), α₂ = Φ(z̄_0 + (z̄_0 + z_{1−α/2}) / (1 − a(z̄_0 + z_{1−α/2}))). The CI is (θ̂*_{α₁}, θ̂*_{α₂}). When z̄_0 = 0 and a = 0, BCa reduces to percentile
State the STUDENTISED / bootstrap-t CI (Efron 1979, Hall 1988): pivot on (θ̂ − θ)/SÊ(θ̂). For each bootstrap replicate compute t*_b = (θ̂*_b − θ̂)/SÊ*_b, where SÊ*_b is a nested-bootstrap or jackknife SE of the b-th replicate. CI = (θ̂ − t*_{1−α/2}·SÊ, θ̂ − t*_{α/2}·SÊ). Most accurate; nested bootstrap is expensive
State the coverage-accuracy ladder (Hall 1988, 1992; DiCiccio-Efron 1996): percentile and basic CIs are FIRST-ORDER ACCURATE — coverage error is O(n⁻¹ᐟ²) in general (sometimes O(n⁻¹) for symmetric pivots). BCa and bootstrap-t are SECOND-ORDER ACCURATE — coverage error is O(n⁻¹), uniformly improving for skewed sampling distributions
Explain WHEN the four methods agree: symmetric, unbiased sampling distribution of θ̂ → percentile = basic = BCa to within Monte-Carlo noise. WHEN they differ: skewed sampling distribution + bias → BCa wins; severe skew + large bias → bootstrap-t wins, BCa is close second
State the practical B recommendation (Efron-Tibshirani 1993, §13; Davison-Hinkley 1997, §5.4): B ≥ 200 for bootstrap SE estimation, B ≥ 1000 to 2000 for percentile/basic CIs, B ≥ 10000 for BCa CIs (the tail quantiles need accurate empirical-CDF estimates), and bootstrap-t needs an inner M ≥ 50 (nested) on top of B
List the failure modes (Bickel-Freedman 1981, Efron-Tibshirani 1993): bootstrap CIs FAIL for non-smooth functionals — sample max/min, extreme quantiles, and parameters at the boundary of their space. The bootstrap distribution of max(X) is biased downward (cannot exceed observed max) and supported on at most n distinct values. Use OTHER methods (extreme-value theory, subsampling, m-out-of-n bootstrap) for those settings
Articulate the practical recommendation (Efron-Tibshirani 1993; DiCiccio-Efron 1996; Davison-Hinkley 1997): BCa as default — generally most accurate, automatic. Percentile as quick-and-dirty sanity check. Bootstrap-t when extra accuracy matters and computation is cheap. Basic / pivotal as the bias-corrected sibling of percentile when BCa is unavailable

Section §3.1 set out the CI framework: a procedure-level guarantee, the Wald/Wilson/Clopper-Pearson trio for the binomial, the Garwood gamma and the Student-t for the Poisson and Normal mean, the likelihood-ratio CI as the general-purpose finite-sample tool. Every CI in §3.1 leans on a known distributional form — the asymptotic Normality of the estimator (Wald), or the exact discrete sample-space sum (Clopper-Pearson), or the t-distribution (Student-t). When the estimator is a tidy MLE for a regular model, all of these tools are available. When the estimator is a SAMPLE MEDIAN, a RATIO OF MEANS, an IQR, a CORRELATION COEFFICIENT, a TRIMMED MEAN, or anything else without a closed-form SE, the §3.1 menu shrinks.

The BOOTSTRAP is the procedural sledgehammer for that gap. Efron (1979, Annals of Statistics 7(1), 1–26) introduced it as a way to estimate the sampling distribution of ANY statistic from the EMPIRICAL CDF F̂_n alone — no parametric model, no closed-form variance. Section §1.7 set up the machinery: B bootstrap replicates θ̂*_b are computed by resampling n observations WITH REPLACEMENT from the original sample and applying the statistic. The empirical distribution of {θ̂*_b} is the bootstrap distribution. Its standard deviation is the bootstrap SE; its mean minus θ̂ is the bootstrap bias estimate.

§3.2 turns those B replicates into CONFIDENCE INTERVALS. There are four standard recipes:

Percentile — empirical quantiles of the bootstrap distribution.
Basic / pivotal — reflection of percentile about θ̂.
BCa (Bias-Corrected and accelerated) — adjusts the quantile probabilities for bias and skewness.
Studentised bootstrap-t — pivots on (θ̂ − θ)/SÊ(θ̂); needs nested bootstrap for the per-replicate SE.

The arc has eleven stops. First, the bootstrap setup recap and what changes when we use the replicates for CIs instead of just an SE. Second, the percentile CI and its translation-invariance assumption. Third, the basic / pivotal CI and how it corrects for bias. Fourth, the BCa machinery in detail — z̄_0, the acceleration a from the jackknife, and the α-adjustment formulae. Fifth, the studentised bootstrap-t. Sixth, the coverage-accuracy ladder (Hall 1988, 1992): first-order vs second-order accurate. Seventh, the bootstrap-ci-methods widget: one sample, four CIs side by side. Eighth, B recommendations: the tail-quantile precision argument. Ninth, the bootstrap-coverage-comparison widget: empirical coverage by Monte-Carlo simulation. Tenth, the failure modes: non-smooth functionals, boundary parameters, extreme quantiles. Eleventh, the practical recommendation.

The bootstrap setup, in one paragraph

The nonparametric bootstrap, in operation, is unfussy. The reader has an observed sample $X = (X_1, \ldots, X_n)$ and a statistic $T(\cdot)$ ; the estimator is $\hat\theta = T(X)$ . The bootstrap procedure with $B$ replicates is:

For $b = 1, \ldots, B$: draw $n$ indices $i_1, \ldots, i_n$ uniformly with replacement from $\{1, \ldots, n\}$. Set $X^*_b = (X_{i_1}, \ldots, X_{i_n})$ and compute $\hat\theta^*_b = T(X^*_b)$.
The empirical distribution of $\{\hat\theta^*_b\}_{b=1}^B$ is the BOOTSTRAP DISTRIBUTION, the empirical estimate of the sampling distribution of $\hat\theta$.
Confidence intervals are quantile-or-pivot operations on $\{\hat\theta^*_b\}$.
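The three steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the widget's implementation: the function name `bootstrap_replicates`, the seeds, and the Exponential(1) data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, B=2000, seed=0):
    """Return B bootstrap replicates of statistic(sample)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(sample)
    # Each replicate: n draws with replacement, then recompute the statistic.
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]

# Hypothetical data: 30 draws from Exponential(1).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(30)]

theta_hat = statistics.median(sample)
reps = bootstrap_replicates(sample, statistics.median, B=2000)

boot_se = statistics.stdev(reps)                # bootstrap SE
boot_bias = statistics.fmean(reps) - theta_hat  # bootstrap bias estimate
print(f"theta_hat={theta_hat:.3f}  SE={boot_se:.3f}  bias={boot_bias:.3f}")
```

The list `reps` is the bootstrap distribution; every CI recipe in this section is a different way of reading an interval off it.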

What changed since §1.7? In §1.7 the bootstrap delivered an SE and a bias estimate. §3.2 uses those same $B$ replicates to deliver a CI. The PROCEDURE is the same — resample with replacement, recompute the statistic — but four different RECIPES turn the bootstrap distribution into an interval, and they disagree under skew. The choice of recipe matters for coverage.

What is the conceptual content? The bootstrap is a PLUG-IN: it replaces the unknown $F$ with $\hat F_n$ , then computes the sampling distribution of $T$ under $\hat F_n$ . By the Glivenko–Cantelli theorem (§0.5) $\hat F_n \to F$ uniformly as $n \to \infty$ , so the bootstrap sampling distribution converges to the true sampling distribution under regularity (Bickel–Freedman 1981). For finite $n$ the bootstrap is a finite-sample approximation, and the different CI recipes correspond to different ways of using that approximation.

The percentile CI — Efron 1979

The first and simplest bootstrap CI is the PERCENTILE CI. Sort the bootstrap replicates $\hat\theta^*_1, \ldots, \hat\theta^*_B$ and pick off the empirical $\alpha/2$ and $1 - \alpha/2$ quantiles:

\boxed{\;C^{\mathrm{pct}}_{1 - \alpha} \;=\; \left(\hat\theta^*_{\alpha/2},\;\hat\theta^*_{1 - \alpha/2}\right)\;}

For $B = 2000$ and 95% nominal: the lower endpoint is the 50th value in the sorted bootstrap replicates and the upper is the 1950th. For continuous-ish $\hat F_n$ the empirical CDF interpolation is the standard choice (we use the linear-interpolation type-7 form in the widgets).
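A minimal sketch of the percentile recipe with type-7 (linear-interpolation) quantiles follows. The helper names `quantile_type7` and `percentile_ci` and the Normal data are assumptions for illustration; the widgets' own code is not shown in this section.

```python
import math
import random
import statistics

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def percentile_ci(reps, alpha=0.05):
    """Percentile bootstrap CI: the alpha/2 and 1 - alpha/2 quantiles of reps."""
    s = sorted(reps)
    return quantile_type7(s, alpha / 2), quantile_type7(s, 1 - alpha / 2)

# Hypothetical data and replicates, only to exercise the function.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
reps = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(2000)]

lower, upper = percentile_ci(reps, alpha=0.05)
print(f"95% percentile CI for the mean: ({lower:.3f}, {upper:.3f})")
```

With $B = 2000$ the type-7 rule places the lower endpoint at position $1999 \times 0.025 = 49.975$ (zero-based), i.e. between the 50th and 51st sorted replicates, which is why the "50th value" description is a close approximation.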

What makes percentile WORK? Suppose the bootstrap distribution of $\hat\theta^* - \hat\theta$ matches the sampling distribution of $\hat\theta - \theta$ in shape. If both distributions are SYMMETRIC and have the SAME MEAN (zero, i.e. unbiased), then the percentile recipe converts boundary quantiles of one into boundary quantiles of the other in a self-consistent way. Formally: percentile is exact when the bootstrap world is "translation-invariant" about $\hat\theta$, i.e. when there exists a monotone function $g$ such that the distribution of $g(\hat\theta)$ is symmetric about $g(\theta)$. This is the percentile interval's implicit regularity condition.

What does percentile FAIL on? Any case where the bootstrap distribution is NOT translation-invariant about $\hat\theta$ . Two examples in textbook style:

Biased estimator. If $E[\hat\theta] \ne \theta$ , the bootstrap distribution centers on $\hat\theta$ (which is biased), not on $\theta$ . The percentile CI inherits that bias: the empirical quantiles capture replicate variability but not the systematic offset. The famous example is the sample correlation $\hat\rho$ from a bivariate Normal — biased toward 0, with bias on the order of $1/n$ . Percentile CIs for $\rho$ shift away from the truth in a way that BCa corrects (Efron 1987, §6).
Skewed sampling distribution. For variance estimators on skewed populations (lognormal, exponential), the sampling distribution of $s^2$ is right-skewed. The percentile CI uses symmetric $\alpha/2$ quantiles, which UNDER-allocate probability to the right tail and OVER-allocate to the left. The interval drifts left, missing the truth on the right side more often than the nominal $\alpha/2$ . Empirical coverage drops below nominal.

Efron (1979) acknowledged percentile as the simplest bootstrap CI in his original paper and as a first-cut answer for symmetric problems. The bigger projects of the 1980s — the BCa CI (Efron 1987), the studentised bootstrap (Hall 1988) — were direct responses to percentile's under-coverage under skew. Wasserman (2004, All of Statistics §8.3) and Efron-Tibshirani (1993, Chapters 13–14) walk through worked examples of each failure mode.

The basic / pivotal CI

The BASIC bootstrap CI fixes percentile's bias problem by pivoting on $W = \hat\theta - \theta$ . Treat the bootstrap replicates $\hat\theta^* - \hat\theta$ as draws from $W$ , and read off quantiles in those:

P\!\left(\hat\theta^*_{\alpha/2} - \hat\theta \;\le\; \hat\theta - \theta \;\le\; \hat\theta^*_{1-\alpha/2} - \hat\theta\right) \;\approx\; 1 - \alpha.

Solve for $\theta$ :

\boxed{\;C^{\mathrm{basic}}_{1 - \alpha} \;=\; \left(2\hat\theta - \hat\theta^*_{1 - \alpha/2},\;2\hat\theta - \hat\theta^*_{\alpha/2}\right)\;}

The basic CI is the REFLECTION of the percentile CI about $\hat\theta$: each percentile endpoint $x$ maps to $2\hat\theta - x$, so the two endpoints swap roles. If the percentile CI is $(\hat\theta^*_{\alpha/2},\, \hat\theta^*_{1-\alpha/2})$, the basic CI is $(2\hat\theta - \hat\theta^*_{1-\alpha/2},\, 2\hat\theta - \hat\theta^*_{\alpha/2})$. In symmetric, unbiased regimes the two coincide (because the bootstrap quantiles around $\hat\theta$ are symmetric). Under bias and skew they diverge: if the bootstrap distribution is RIGHT-SKEWED, percentile sits further right (its upper bound is from the long tail), while basic flips that pattern, sitting further LEFT.
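The reflection can be checked numerically. The sketch below computes both intervals from the same replicates; the function name, the seeds, and the Lognormal(0,1) variance example are assumptions chosen to make the skew visible.

```python
import math
import random
import statistics

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def percentile_and_basic_ci(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Return (theta_hat, percentile CI, basic CI) from one set of replicates."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    q_lo = quantile_type7(reps, alpha / 2)
    q_hi = quantile_type7(reps, 1 - alpha / 2)
    percentile = (q_lo, q_hi)
    # Basic CI: map each percentile endpoint x to 2*theta_hat - x and swap.
    basic = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
    return theta_hat, percentile, basic

# Hypothetical skewed example: sample variance of Lognormal(0, 1) data.
data_rng = random.Random(7)
sample = [data_rng.lognormvariate(0.0, 1.0) for _ in range(30)]
theta_hat, pct, basic = percentile_and_basic_ci(sample, statistics.variance)
print("theta_hat :", round(theta_hat, 3))
print("percentile:", tuple(round(x, 3) for x in pct))
print("basic     :", tuple(round(x, 3) for x in basic))
```

The two intervals always have the same width; only their position relative to $\hat\theta$ differs.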

The two intervals cannot both be right. Which one to use? Davison-Hinkley (1997, Bootstrap Methods and Their Application §5.2) recommend BASIC over percentile for biased estimators because the pivot $\hat\theta^* - \hat\theta$ corrects for the bias direction. Hall (1992) shows basic is no worse than percentile asymptotically and often a touch better under mild skew. The honest reading: basic is "percentile with a sign flip", neither obviously better nor worse, and in practice the recommendation is to use BCa instead of either one when it is available. The basic CI is in this section because the four-method comparison widget (and many software packages) report it as a sanity check; the conceptual content is that the pivot reformulation matters when there is a non-zero bias.

BCa: bias-corrected and accelerated (Efron 1987)

Efron (1987, JASA 82(397), 171–185) introduced BCa as the second-generation bootstrap CI: it corrects BOTH the centring problem and the skewness problem of the bootstrap distribution. The construction is the percentile CI with ADJUSTED quantile probabilities $\alpha_1, \alpha_2$ replacing $\alpha/2, 1 - \alpha/2$ :

C^{\mathrm{BCa}}_{1 - \alpha} \;=\; \left(\hat\theta^*_{\alpha_1},\;\hat\theta^*_{\alpha_2}\right)

The adjustment uses two estimated quantities:

Bias correction $\bar z_0$ — measures how off-centre the bootstrap distribution is at $\hat\theta$. Defined as $\bar z_0 = \Phi^{-1}\!\bigl(\#\{b : \hat\theta^*_b \le \hat\theta\}/B\bigr)$, the standard-Normal quantile of the empirical CDF of the bootstrap distribution evaluated at $\hat\theta$. If exactly half of the bootstrap replicates fall below $\hat\theta$ (the bootstrap distribution is centred), then $\bar z_0 = 0$. If the bootstrap distribution sits above $\hat\theta$ (most replicates lie to the right of $\hat\theta$), the empirical CDF at $\hat\theta$ is less than 0.5 and $\bar z_0 < 0$. The bias correction then shifts both quantile probabilities in the direction of the sign of $\bar z_0$: $\alpha_1$ and $\alpha_2$ both move down when $\bar z_0 < 0$ and both move up when $\bar z_0 > 0$.
Acceleration $a$ — measures the SKEWNESS of the sampling distribution of $\hat\theta$ . Estimated via the jackknife: with $\hat\theta_{(i)}$ the leave-one-out replicate and $\bar\theta = \tfrac{1}{n}\sum_i \hat\theta_{(i)}$ ,

a = \dfrac{\sum_{i=1}^n (\bar\theta - \hat\theta_{(i)})^3}{6\,\bigl(\sum_{i=1}^n (\bar\theta - \hat\theta_{(i)})^2\bigr)^{3/2}}.

A symmetric sampling distribution gives $a \approx 0$ ; right-skewed gives $a > 0$ ; left-skewed $a < 0$ . The denominator is the $3/2$ -power of the jackknife variance, and the factor 6 calibrates the formula against the third-central-moment / Edgeworth expansion.

The adjusted probabilities are:

\alpha_1 \;=\; \Phi\!\left(\bar z_0 + \dfrac{\bar z_0 + z_{\alpha/2}}{1 - a\,(\bar z_0 + z_{\alpha/2})}\right), \qquad \alpha_2 \;=\; \Phi\!\left(\bar z_0 + \dfrac{\bar z_0 + z_{1 - \alpha/2}}{1 - a\,(\bar z_0 + z_{1 - \alpha/2})}\right).

The BCa CI is the $\alpha_1$ and $\alpha_2$ empirical quantiles of the bootstrap distribution.
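The whole BCa construction fits in a short standard-library sketch. The names `bca_probabilities` and `bca_ci`, the ten-point data set, and the clamp that keeps the empirical CDF value strictly inside (0, 1) are assumptions added for illustration; production code such as scipy.stats.bootstrap handles more edge cases.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf is Phi, inv_cdf is Phi^{-1}

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def bca_probabilities(z0, a, alpha=0.05):
    """Adjusted quantile probabilities (alpha_1, alpha_2) for given z0 and a."""
    def adjust(z):
        return STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    return (adjust(STD_NORMAL.inv_cdf(alpha / 2)),
            adjust(STD_NORMAL.inv_cdf(1 - alpha / 2)))

def bca_ci(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Return ((lower, upper), z0, a) for the BCa interval."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))

    # Bias correction z0. The clamp is an added guard (not in the text):
    # inv_cdf needs a probability strictly inside (0, 1).
    frac = sum(r <= theta_hat for r in reps) / B
    z0 = STD_NORMAL.inv_cdf(min(max(frac, 1 / (B + 1)), B / (B + 1)))

    # Acceleration a from the n leave-one-out (jackknife) estimates.
    jack = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    cubes = sum((jack_mean - j) ** 3 for j in jack)
    squares = sum((jack_mean - j) ** 2 for j in jack)
    a = cubes / (6 * squares ** 1.5) if squares > 0 else 0.0

    alpha_1, alpha_2 = bca_probabilities(z0, a, alpha)
    return (quantile_type7(reps, alpha_1), quantile_type7(reps, alpha_2)), z0, a

# Hypothetical right-skewed data set.
sample = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.3, 1.9, 4.0]
(ci_lo, ci_hi), z0, a = bca_ci(sample, statistics.fmean)
print(f"z0={z0:.3f}  a={a:.3f}  BCa 95% CI=({ci_lo:.3f}, {ci_hi:.3f})")
```

For the mean, the jackknife deviations are proportional to $x_i - \bar x$, so the sign of $a$ is the sign of the sample skewness; the right-skewed data here give $a > 0$.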

Three sanity checks make the formula plausible. (1) When $\bar z_0 = 0$ and $a = 0$ , the $\alpha_1, \alpha_2$ formulae reduce to $\Phi(z_{\alpha/2}) = \alpha/2$ and $\Phi(z_{1-\alpha/2}) = 1 - \alpha/2$ — BCa collapses to percentile. (2) For symmetric $\Phi$ and small $\bar z_0, a$ , the corrections are linear: $\alpha_1 \approx \alpha/2 + 2\bar z_0 \phi(z_{\alpha/2}) + \ldots$ . The bias correction shifts both endpoints together; the acceleration shifts them by different amounts. (3) BCa is TRANSFORMATION-INVARIANT under monotone reparametrisations $g(\theta)$ : if you apply the bootstrap to $g(\hat\theta)$ and back-transform, you get the same CI. Percentile shares this property but lacks the bias and skew corrections; BCa is the canonical transformation-invariant, second-order-accurate bootstrap CI.

DiCiccio and Efron (1996, Statistical Science 11(3), 189–228) give the definitive account of BCa's theoretical justification. They show that BCa achieves O(n⁻¹) coverage error (second-order accurate) when the underlying model satisfies certain Edgeworth expansion conditions. Efron and Tibshirani (1993, An Introduction to the Bootstrap, Chapter 14) walk through worked examples. Modern software implements it directly: Python's scipy.stats.bootstrap() uses method='BCa' by default, and R's boot::boot.ci() returns it with type = "bca" (its default, type = "all", reports BCa alongside the other intervals).

Studentised bootstrap-t

The bootstrap-t (Efron 1979 §6; Hall 1988 Annals of Statistics) pivots on the studentised statistic $T = (\hat\theta - \theta)/\widehat{\mathrm{SE}}(\hat\theta)$ , the same quantity that drives the §3.1 Wald CI. For each bootstrap replicate $b$ :

Compute $\hat\theta^*_b$ from the $b$ -th outer resample.
Compute $\widehat{\mathrm{SE}}^*_b$, the SE of $\hat\theta^*_b$. If a closed-form SE exists, plug in. Otherwise nest a second bootstrap with $M$ inner resamples of the $b$-th outer resample and use the SD of those inner replicates as $\widehat{\mathrm{SE}}^*_b$.
Form $t^*_b = (\hat\theta^*_b - \hat\theta)/\widehat{\mathrm{SE}}^*_b$.

The empirical distribution of $\{t^*_b\}$ is the bootstrap estimate of the sampling distribution of $T$. The CI inverts the pivot:

\boxed{\;C^{\mathrm{boot-t}}_{1 - \alpha} \;=\; \left(\hat\theta - t^*_{1 - \alpha/2} \cdot \widehat{\mathrm{SE}},\;\hat\theta - t^*_{\alpha/2} \cdot \widehat{\mathrm{SE}}\right)\;}

where $\widehat{\mathrm{SE}}$ is the bootstrap SE of $\hat\theta$ (the SD of the outer $\hat\theta^*_b$'s). The bootstrap-t replaces the standard-Normal quantile $z_{1 - \alpha/2}$ from the §3.1 Wald CI with the empirical $t^*_{1 - \alpha/2}$ quantile.
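A nested-bootstrap version of the recipe might look like the sketch below. The function names, the data, and the small defaults `B=500`, `M=25` are assumptions chosen so the example runs quickly; they are below the replicate counts this section recommends for real use.

```python
import math
import random
import statistics

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def inner_se(resample, statistic, M, rng):
    """SE of the statistic on one outer resample, from M inner resamples."""
    n = len(resample)
    inner = [statistic(rng.choices(resample, k=n)) for _ in range(M)]
    return statistics.stdev(inner)

def bootstrap_t_ci(sample, statistic, B=500, M=25, alpha=0.05, seed=0):
    """Studentised bootstrap CI with a nested bootstrap for each SE*_b."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    reps, t_stats = [], []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        theta_b = statistic(resample)
        se_b = inner_se(resample, statistic, M, rng)
        reps.append(theta_b)
        if se_b > 0:  # skip degenerate resamples (all inner replicates equal)
            t_stats.append((theta_b - theta_hat) / se_b)
    se_hat = statistics.stdev(reps)  # SD of the outer replicates
    t_sorted = sorted(t_stats)
    t_lo = quantile_type7(t_sorted, alpha / 2)
    t_hi = quantile_type7(t_sorted, 1 - alpha / 2)
    # Invert the pivot: the upper t quantile sets the LOWER endpoint.
    return theta_hat - t_hi * se_hat, theta_hat - t_lo * se_hat

# Hypothetical data: 30 draws from Exponential(1), statistic = mean.
data_rng = random.Random(5)
sample = [data_rng.expovariate(1.0) for _ in range(30)]
lo, hi = bootstrap_t_ci(sample, statistics.fmean)
print(f"95% bootstrap-t CI for the mean: ({lo:.3f}, {hi:.3f})")
```

The inner loop is where the $B \cdot M$ cost comes from: every outer replicate triggers `M` further evaluations of the statistic.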

Why is bootstrap-t so accurate? Hall (1988) showed that the studentised pivot is SECOND-order accurate when the inner SE estimate is consistent: the bootstrap distribution of $t^*$ reproduces the $n^{-1/2}$ skewness term in the Edgeworth expansion of $T$, so one-sided coverage error is $O(n^{-1})$, the same order as BCa and a factor of $\sqrt n$ better than percentile. Empirically, on the bias-corrected examples of Efron-Tibshirani (1993, Table 14.4), bootstrap-t is closest to nominal coverage for skewed problems, with BCa a close second.

The catch is COMPUTATIONAL COST. Each outer $b$ requires an inner bootstrap of size $M$ . The total replicate count is $B \cdot M$ . For BCa at $B = 10000$ : 10000 statistic evaluations. For bootstrap-t at $B = 2000, M = 50$ : 100000. For complex statistics (model fits, eigenvalues of covariance matrices) the bootstrap-t is the bottleneck. The §3.2 widget offers it as opt-in for this reason; the default is BCa.

The coverage-accuracy ladder

The four bootstrap CIs sit on a coverage-accuracy LADDER, where each rung represents how fast the coverage error shrinks as $n \to \infty$ . The hierarchy (Hall 1988, 1992; DiCiccio-Efron 1996):

Method	Coverage error	Comments
Percentile	O(n⁻¹ᐟ²)	First-order accurate; symmetric cases are O(n⁻¹).
Basic / pivotal	O(n⁻¹ᐟ²)	First-order accurate; corrects bias direction.
BCa	O(n⁻¹)	Second-order accurate; transformation-invariant.
Bootstrap-t	O(n⁻¹)	Second-order; can be third-order with care.

The ladder is the analytic justification for the practical "BCa as default" recommendation. The first-order CIs (percentile, basic) lose a factor of $\sqrt n$ in coverage error compared to the second-order CIs (BCa, bootstrap-t). For $n = 100$ this is a factor of 10 in expected miscoverage; for $n = 25$ a factor of 5.

Two caveats. (1) The big-O bounds are asymptotic; small- $n$ behaviour can differ. Efron-Tibshirani (1993, §13) report simulation studies where the ranking holds at moderate $n$ but percentile is "good enough" for highly symmetric problems even at $n = 20$ . (2) The bounds depend on regularity — smooth statistics, Edgeworth expansions, finite moments. Non-smooth statistics (sample max, extreme quantiles) break the regularity and ALL four methods can fail. The failure-mode section below revisits this.

The first widget puts the four bootstrap CIs on a single number line so the reader can see how dramatically they can DISAGREE on the same data. Pick a population (Normal, Exponential, Lognormal, Chi-squared), a statistic (mean, median, variance, trimmed mean), the sample size $n$ , the bootstrap replicate count $B$ , and the confidence level. The widget draws ONE original sample, performs $B$ outer bootstrap replicates, and computes percentile, basic, BCa CIs. Bootstrap-t is opt-in (it triggers a nested $M = 50$ inner bootstrap and slows things down a few seconds).

Things to verify in the widget:

Start at Normal(0,1), mean, n = 50, B = 2000. Look at the bars. Percentile, basic, and BCa should agree visibly: widths within 1-2% of each other, centres at the same place. This is the symmetric, unbiased regime where the methods are interchangeable. Re-run a few times; the inter-method spread stays small. Empirical confirmation: percentile is fine when there is nothing to correct.
Switch to Lognormal(0,1) variance, n = 30. The variance estimator $s^2$ on a lognormal sample is HEAVY-TAILED and biased. Look at the bars now. Percentile and basic land in noticeably different places: percentile uses symmetric quantiles around the right-skewed bootstrap distribution, basic flips that bias. BCa lands BETWEEN them, leaning toward the truth. Re-roll the sample; the BCa CI is the most stable across re-rolls.
Switch to Exponential(1), median, n = 20. The median of an exponential has a known skewed sampling distribution. The four methods diverge but not as dramatically as variance-on-lognormal. Re-roll; the bootstrap distribution of the median is often LUMPY at small $n$ because the bootstrap can only return values present in the original sample. Increase $n$ to 100; the lumps smooth out and BCa converges closer to percentile.
Toggle ON bootstrap-t. Wait a few seconds. The bootstrap-t bar appears, typically tracking BCa closely on skewed problems and visibly wider than percentile on biased ones. Bootstrap-t is the most accurate but the slowest; for production work BCa is usually close enough.
Drop $B$ to 200. Re-roll the bootstrap a few times (same data). The CIs visibly WANDER — the empirical quantile estimates are Monte-Carlo noisy at low $B$ . BCa is the noisiest (its tails come from far-out quantiles). Crank $B$ to 10000; the wander tightens. This is the small- $B$ Monte-Carlo error you pay; the "use B ≥ 2000" rule of thumb is to push it below the procedure-level uncertainty.

How large does B need to be?

The bootstrap is a Monte-Carlo procedure. The CIs are estimated empirical quantiles, and quantile estimates have Monte-Carlo error of order $B^{-1/2}$ for fixed $\alpha$ . The exact recommendation depends on what you ask for:

Bootstrap SE only. $B \ge 200$ is usually enough — the SE is a SD over the full $B$ replicates, and SDs are stable estimators (Efron & Tibshirani 1993, §6).
Percentile or basic CI at 95%. $B \ge 1000$ for moderate accuracy; $B \ge 2000$ for low Monte-Carlo error. The $\alpha/2$ quantile estimate has SE $\sqrt{p(1-p)/B}\,/\,f(q)$ with $p = \alpha/2$, where $f$ is the density of the bootstrap distribution at the quantile $q$; for a roughly Normal bootstrap distribution with SD $\sigma$ and $\alpha = 0.05$, this is about $2.7\sigma/\sqrt B$. With $B = 2000$, the Monte-Carlo SE of each endpoint is around $0.06\sigma$, small next to an interval half-width of about $2\sigma$.
BCa CI at 95%. $B \ge 10000$ . The BCa adjusts the quantile probabilities, often pushing them further into the tail than $\alpha/2 = 0.025$ . Tail quantiles are noisier per replicate than central quantiles. Efron-Tibshirani (1993, §14) recommend at least 10000; DiCiccio-Efron (1996) routinely use 50000 for theoretical studies.
Bootstrap-t at 95%. $B \ge 2000$ outer, $M \ge 50$ inner. Same per-quantile reasoning as percentile, plus the nested SE.
Confidence at 99% or above. Increase $B$ further. The 0.005 and 0.995 quantiles need around $5\times$ as many replicates as the 0.025 quantiles to keep the same Monte-Carlo SE.
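The $B^{-1/2}$ behaviour can be seen directly by re-running the bootstrap on the SAME data with different seeds and watching one endpoint wander. The sketch below is illustrative only: the function names, the data, and the choice of 20 runs at $B = 200$ versus $B = 3200$ are assumptions.

```python
import math
import random
import statistics

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def lower_endpoint(sample, B, seed, alpha=0.05):
    """Lower percentile endpoint for the mean from one bootstrap run."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(B))
    return quantile_type7(reps, alpha / 2)

# Hypothetical data, held fixed: only the bootstrap seed changes between runs.
data_rng = random.Random(3)
sample = [data_rng.expovariate(1.0) for _ in range(30)]

def endpoint_wander(B, runs=20):
    """SD of the lower endpoint across independent bootstrap runs."""
    return statistics.stdev([lower_endpoint(sample, B, seed) for seed in range(runs)])

wander_small = endpoint_wander(200)
wander_large = endpoint_wander(3200)
print(f"B=200: {wander_small:.4f}   B=3200: {wander_large:.4f}")
```

Since $3200 / 200 = 16$, the $B^{-1/2}$ rate predicts roughly a fourfold reduction in wander, up to the noise of estimating an SD from 20 runs.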

The widget defaults $B$ to 2000; the upper limit on the slider is 10000. If you need more replicates than 10000, you should be running the bootstrap in a separate compute job, not in-browser.

The second widget moves from "one sample, four CIs" to "many samples, one CI per method, how often does each cover the truth?". This is the EMPIRICAL COVERAGE of a CI procedure — the right diagnostic for "is my 95% CI actually 95%?".

The simulation: for the chosen scenario (population, statistic, n, B, 1 − α), draw $R$ independent datasets. For each, perform $B$ bootstrap replicates, compute percentile, basic, BCa CIs, and check whether the TRUE parameter $\theta$ falls inside each. The empirical coverage of each method is the fraction of $r \in \{1, \ldots, R\}$ in which $\theta \in C_{\mathrm{method}}(X_{(r)})$. The Monte-Carlo SE of the coverage estimate is $\sqrt{p(1-p)/R}$ where $p$ is the observed coverage; at 95% nominal and $R = 500$ this is about $\pm 1\%$.
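A stripped-down version of this simulation, covering only the percentile and basic intervals for the mean of Exponential(1) data, can be sketched as follows. The function name `coverage` and the small defaults (`n=20`, `B=300`, `R=200`) are assumptions chosen for speed; the widget's own settings and code may differ.

```python
import math
import random
import statistics

def quantile_type7(sorted_values, p):
    """Type-7 (linear-interpolation) quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def coverage(n=20, B=300, R=200, alpha=0.05, seed=0):
    """Empirical coverage of percentile and basic CIs for an Exponential(1) mean."""
    rng = random.Random(seed)
    truth = 1.0  # true mean of Exponential(1)
    hits = {"percentile": 0, "basic": 0}
    for _ in range(R):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        theta_hat = statistics.fmean(sample)
        reps = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(B))
        q_lo = quantile_type7(reps, alpha / 2)
        q_hi = quantile_type7(reps, 1 - alpha / 2)
        hits["percentile"] += q_lo <= truth <= q_hi
        hits["basic"] += 2 * theta_hat - q_hi <= truth <= 2 * theta_hat - q_lo
    return {method: count / R for method, count in hits.items()}

R = 200
result = coverage(R=R)
for method, p in result.items():
    mc_se = math.sqrt(p * (1 - p) / R)  # Monte-Carlo SE of the coverage estimate
    print(f"{method:10s} coverage={p:.3f}  (MC SE {mc_se:.3f})")
```

At `R = 200` the Monte-Carlo SE is roughly 2 percentage points, so differences between methods smaller than that should not be over-read.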

Things to verify in the widget:

Start at Normal(0,1), mean, n = 20, B = 1000, R = 500, 95% nominal. Click Run. All three methods should land within $\pm 2 \text{ MC SE}$ of 95% — say, 93-97%. The methods AGREE because there is no skew or bias to correct. Read off the verdict column: all three should show "≈ nominal".
Switch to Lognormal(0,1), variance, n = 20, B = 2000, R = 500. Re-run. Percentile typically lands at 88-91% — UNDER-covers by 4-7 percentage points. Basic at 89-92%. BCa at 92-95%, often within MC SE of nominal. The verdict column flags percentile as "under-covers" and BCa as "≈ nominal". The widget visibly shows the BCa advantage.
Same widget, Exponential(1), mean, n = 20. Re-run. All three should be close to nominal — the exponential mean has a tight sampling distribution and percentile holds up. BCa might over-cover by 1-2% (the bias correction kicks in even when there is no actual bias, because the empirical CDF count is rarely exactly $B/2$ ).
Slide n down to 10 for the Lognormal-variance scenario. Re-run. The coverage gap between percentile and BCa WIDENS — percentile drops further, BCa drops a bit. The bootstrap distribution at $n = 10$ has only $\binom{19}{10} \approx 92000$ distinct possible bootstrap samples (the multiset count); at small $n$ the bootstrap distribution is lumpy and ALL methods degrade.
Slide n up to 100 for the same scenario. Re-run. All three methods converge toward nominal; the BCa advantage shrinks because the asymptotic ladder predicts both methods perform well at large $n$ . The ladder is large- $n$ asymptotic; for moderate $n$ the practical gap is what the widget shows.
Compare R = 100 vs R = 1000 on the same scenario. At R = 100 the empirical coverage estimates have SE ≈ 2-3%; at R = 1000 ≈ 0.7%. The Monte-Carlo noise is comparable to the inter-method coverage differences at low R, so re-running gives different rank orders. At R = 1000 the rank order stabilises and BCa's advantage becomes statistically convincing.
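The multiset count quoted for $n = 10$ is easy to reproduce. The helper name below is hypothetical, and the count assumes all $n$ observed values are distinct.

```python
import math

def distinct_bootstrap_samples(n):
    """Number of distinct multisets of size n drawn from n distinct values."""
    return math.comb(2 * n - 1, n)

for n in (5, 10, 20):
    print(n, distinct_bootstrap_samples(n))
```

The count grows quickly with $n$, which is one way to see why the lumpiness of the bootstrap distribution fades as the sample size increases.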

Where bootstrap CIs FAIL

The bootstrap is a powerful default but not a universal one. Four classical failure modes (Bickel-Freedman 1981, Efron-Tibshirani 1993, Hall 1992):

Sample max (and other extreme order statistics). If $X_i \sim \text{Uniform}(0, \theta)$, the sample max $\hat\theta = X_{(n)}$ has a known sampling distribution ($\theta \cdot \text{Beta}(n, 1)$). The bootstrap replicate $X^*_{(n)}$ from one observed sample $X$ CANNOT exceed $X_{(n)}$ (bootstrap resamples are subsets of observed values), so the bootstrap upper bound is forever pinned to the observed max. The bootstrap CI under-covers severely. Bickel-Freedman (1981, Annals of Statistics) gave this as the canonical bootstrap failure. The fix is the $m$-out-of-$n$ bootstrap (resample $m \ll n$ observations) or subsampling.
Tail / extreme quantiles. For the empirical 95th percentile of a sample of size $n = 30$ , the bootstrap distribution is concentrated on the largest 1-2 order statistics — a few discrete points. The percentile, basic, and BCa CIs are all biased and unstable. Use distribution-free methods (Beran-Hall 1993) or asymptotic extreme-value theory for tail quantiles.
Parameter at a boundary. If $\theta$ sits at the boundary of its space (e.g., variance $\sigma^2 = 0$ , proportion $p = 0$ ), the sampling distribution is degenerate or one-sided, and the bootstrap distribution inherits that degeneracy. The percentile CI can collapse to a single point. BCa's acceleration estimate becomes unstable. Use exact CIs (Clopper-Pearson, Garwood) at boundaries; the §3.1 toolkit is the right place.
Heavy-tailed populations without finite moments. If $X_i \sim \text{Cauchy}(0, 1)$ , the sample mean has no SE — its sampling distribution stays Cauchy at every $n$ (the Cauchy is its own stable law). The bootstrap correctly reproduces this — the bootstrap CI is a Cauchy-tail interval — but the interval is wide and not "shrinkable" by increasing $n$ . The estimator itself is the problem, not the bootstrap.
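The sample-max failure is simple to reproduce. In the sketch below (names, seeds, and the choice $n = 50$ are assumptions), the share of bootstrap replicates that land exactly on the observed max is compared with the probability that a resample contains the largest observation, $1 - (1 - 1/n)^n$.

```python
import random

def max_replicates(sample, B=5000, seed=0):
    """Bootstrap replicates of the sample maximum."""
    rng = random.Random(seed)
    n = len(sample)
    return [max(rng.choices(sample, k=n)) for _ in range(B)]

n = 50
data_rng = random.Random(11)
sample = [data_rng.uniform(0.0, 1.0) for _ in range(n)]  # Uniform(0, theta), theta = 1
observed_max = max(sample)

reps = max_replicates(sample)
share_at_max = sum(r == observed_max for r in reps) / len(reps)
expected_share = 1 - (1 - 1 / n) ** n  # P(resample contains the observed max)

print(f"observed max          : {observed_max:.4f} (true theta = 1)")
print(f"largest replicate     : {max(reps):.4f}")
print(f"share equal to the max: {share_at_max:.3f} (theory {expected_share:.3f})")
```

A large atom of the bootstrap distribution sits exactly on the observed max and nothing sits above it, so no quantile-based interval can reach the true $\theta$.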

The §1.7 bootstrap-simulator widget already made the max-on-uniform case interactively visible. The §3.2 widgets do not include the failure modes by default — they are textbook examples, not the typical regime — but the reader should know the bootstrap is not magic. When the regularity conditions hold (smooth functional, interior parameter, finite moments), BCa is a reliable second-order-accurate default. When they fail, the bootstrap fails and the §3.1 + §3.3 + §8 toolkit is the alternative.

Practical recommendation

The applied research workflow for a bootstrap CI:

Check for a closed-form alternative first. If the estimator is a Binomial proportion, Poisson rate, or Normal mean, use the §3.1 exact / score / Student-t CI. The bootstrap is needed only when the closed form is unavailable or known to be inaccurate.
BCa is the default. Efron-Tibshirani (1993), DiCiccio-Efron (1996), and Davison-Hinkley (1997) all agree: when the bootstrap is the right tool, BCa is the right BOOTSTRAP CI. Modern software supports it directly: Python's scipy.stats.bootstrap() defaults to method='BCa', and R's boot::boot.ci() reports it via type = "bca".
Use B ≥ 10000 for BCa, ≥ 2000 for percentile. The tail-quantile precision argument.
Bootstrap-t when extra accuracy matters. Second-order accurate like BCa, and often the closest to nominal coverage in simulations on skewed problems. Practically slower, with nested SE estimation as the bottleneck. Worth it for high-stakes scientific work.
Report bootstrap CIs honestly. State the method (BCa vs percentile vs basic), $B$ , and the sampling distribution check (e.g., bootstrap-distribution histogram or jackknife-vs-bootstrap SE ratio from §1.7). Davison-Hinkley (1997, §5.7) walk through the reporting standards.
Sanity-check against §3.1 CIs. When a closed-form CI exists (binomial, Poisson, Normal-mean), compute BOTH and compare. Large disagreement is a red flag — either the closed form's assumptions are violated or the bootstrap is in a failure mode.

Try it

In the bootstrap-ci-methods widget, set Lognormal(0,1), variance, n = 30, B = 2000, 95% confidence. Re-roll the original sample five times. For each, note the three CIs (percentile / basic / BCa) and whether each covers the true variance $\sigma^2 = e\cdot(e-1) \approx 4.67$. Tally the misses across the five re-rolls per method. Which method has the most "no" entries in the Covers-truth column? Connect this empirical count to the coverage-accuracy ladder.
Same widget. Switch to Normal(0,1), mean, n = 50. Re-roll a few times. The three CIs should be within 1-2% width of each other on every re-roll. Argue: in the absence of bias and skew, BCa is mathematically identical to percentile up to the $\bar z_0, a$ estimation noise. There is nothing for BCa to fix.
Same widget. Switch to Exponential(1), median, n = 20. Toggle ON bootstrap-t. Look at the four bars. The bootstrap-t and BCa typically agree closely; both lean slightly to the right of the percentile CI because the bootstrap median has a right-skewed sampling distribution. Argue: bootstrap-t and BCa both adjust for skewness — bootstrap-t by Studentising the pivot, BCa by the acceleration term. Either is a defensible answer here; percentile under-covers slightly.
Same widget. Pick Chi-squared(df = 3), mean, n = 20. Toggle bootstrap-t. With B = 5000 and the M = 50 inner replicates, the bootstrap-t computation takes a few seconds in-browser. Time it. Argue: bootstrap-t requires $B \cdot M = 250000$ statistic evaluations. For a more expensive statistic (an SVD, an MLE optimisation), that becomes prohibitive — which is why BCa is the practical default.
In the bootstrap-coverage-comparison, set Lognormal(0,1), variance, n = 20, B = 2000, R = 500, 95% confidence. Click Run. Record the empirical coverages. Re-run twice (the simSeed re-randomises each click). The verdict column should consistently flag percentile as under-covering by 3-7 percentage points and BCa as ≈ nominal. Argue: this is the canonical second-order-accuracy advantage of BCa over percentile under skew.
Same widget. Same Lognormal-variance scenario. Drop R to 100. Run. The MC SE is now about $\sqrt{0.95 \cdot 0.05/100} \approx 2.2\%$; the 2-MC-SE band is $\pm 4.4\%$. Note that with R = 100 you cannot statistically distinguish percentile from BCa: the coverage estimates overlap inside the noise band. Argue: empirical-coverage studies need R ≥ 500 to detect a 3-5 percentage-point coverage gap with confidence.
Same widget. Crank n to 100 for the same scenario. Run. The BCa advantage should SHRINK: all three methods now hit close to nominal. Argue: the second-order vs first-order distinction in the coverage-accuracy ladder is a statement about RATES. As $n$ grows, the percentile/basic error of order $n^{-1/2}$ and the BCa error of order $n^{-1}$ both head to zero, so the practical gap narrows even though BCa's error remains the smaller order.
Pen-and-paper. The BCa adjustment for $\bar z_0 = 0$ and $a = 0.05$ at 95% nominal. Compute $\alpha_1 = \Phi(0 + (0 + z_{0.025})/(1 - 0.05 \cdot (0 + z_{0.025})))$ with $z_{0.025} = -1.96$. Numerically: denominator $= 1 + 0.0980 = 1.0980$; ratio $= -1.96/1.0980 \approx -1.785$; $\alpha_1 = \Phi(-1.785) \approx 0.0371$. So the BCa lower probability is 0.0371 vs the percentile 0.025: BCa moves the lower endpoint to the right, toward the centre of the bootstrap distribution. Repeat on the upper side: $z_{0.975} = +1.96$, denominator $= 1 - 0.0980 = 0.9020$, ratio $= +1.96/0.9020 \approx +2.173$, $\alpha_2 = \Phi(2.173) \approx 0.985$. So the upper probability is 0.985 vs the percentile 0.975: BCa pushes the upper endpoint further into the right tail. With positive $a$ both endpoints shift right.
Pen-and-paper. Suppose the bootstrap distribution $\{\hat\theta^*_b\}$ has 1500 replicates ≤ $\hat\theta = 3.5$ out of $B = 2000$. Compute $\bar z_0$. Answer: $\bar z_0 = \Phi^{-1}(1500/2000) = \Phi^{-1}(0.75) \approx 0.674$. Argue: the bootstrap distribution sits substantially BELOW $\hat\theta$ (only 25% of replicates fall above $\hat\theta$), so the replicates $\hat\theta^*$ tend to underestimate $\hat\theta$. By the plug-in analogy, $\hat\theta$ tends to underestimate $\theta$: this estimator is biased downward, and BCa pulls $\alpha_1, \alpha_2$ both to the RIGHT to correct.
Pen-and-paper. State why the basic CI $(2\hat\theta - \hat\theta^*_{1-\alpha/2},\ 2\hat\theta - \hat\theta^*_{\alpha/2})$ can have an upper endpoint that is BELOW the lower endpoint of the percentile CI (or vice versa), if the bootstrap distribution is sufficiently asymmetric. Hint: if $\hat\theta^*_{\alpha/2}$ and $\hat\theta^*_{1-\alpha/2}$ are not symmetric about $\hat\theta$, the reflection about $2\hat\theta$ can land on either side. Argue why this means "percentile and basic disagree" is itself a diagnostic for non-symmetric bootstrap distributions.
Pen-and-paper. The §1.7 bootstrap-simulator widget showed that the bootstrap distribution of the sample max of Uniform(0,1) data is pinned to a few discrete values. Argue: for this estimator, percentile, basic, AND BCa CIs all under-cover because the bootstrap distribution's upper bound is the observed max, and the bootstrap quantiles are biased. Then read the §3.2 honest caveats list and identify which item describes this failure.
Pen-and-paper. State the coverage-accuracy ladder and the $O(n^{-1/2})$ vs $O(n^{-1})$ distinction. For $n = 100$ , what factor of accuracy improvement does BCa offer over percentile? (Answer: roughly $\sqrt{100} = 10\times$ .) For $n = 25$ ? (Answer: roughly $\sqrt{25} = 5\times$ .) Argue why this is asymptotic and may not perfectly describe finite- $n$ behaviour.
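The arithmetic in the pen-and-paper items can be checked mechanically. The sketch below is only a calculator for the BCa level formulae and the Monte-Carlo SE used above; the function names `bca_levels` and `coverage_mc_se` are made up for this example, and it uses the exact Normal quantile rather than the rounded 1.96, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard Normal: cdf is Phi, inv_cdf is Phi^{-1}


def bca_levels(z0, a, alpha=0.05):
    """Adjusted quantile levels (alpha_1, alpha_2) from the BCa formulae."""
    def adjust(z):
        return _norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    return adjust(_norm.inv_cdf(alpha / 2)), adjust(_norm.inv_cdf(1 - alpha / 2))


def coverage_mc_se(R, p=0.95):
    """Monte-Carlo SE of an empirical coverage estimate from R datasets."""
    return sqrt(p * (1 - p) / R)


# Exercise with z0 = 0, a = 0.05: hand answers are about 0.0371 and 0.985.
a1, a2 = bca_levels(z0=0.0, a=0.05)
print(f"alpha_1 = {a1:.4f}, alpha_2 = {a2:.4f}")

# Exercise with 1500 of 2000 replicates <= theta_hat: hand answer about 0.674.
z0 = _norm.inv_cdf(1500 / 2000)
b1, b2 = bca_levels(z0=z0, a=0.0)
print(f"z0 = {z0:.3f}; levels with a = 0: ({b1:.4f}, {b2:.4f})")

# MC SE at R = 100 and R = 500, and the sqrt(n) ladder factor for n = 25, 100.
print(f"MC SE: R=100 -> {coverage_mc_se(100):.4f}, R=500 -> {coverage_mc_se(500):.4f}")
print(f"ladder factor: n=25 -> {sqrt(25):.0f}x, n=100 -> {sqrt(100):.0f}x")
```

Setting `z0=0.0, a=0.0` returns the percentile levels (0.025, 0.975), which is a quick way to confirm that BCa reduces to percentile when both corrections vanish.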

Pause and reflect: §3.2 has set out the four bootstrap CIs. PERCENTILE is the simplest and works under symmetry. BASIC corrects the bias direction via reflection about $2\hat\theta$ . BCa applies both a bias correction $\bar z_0$ and a jackknife-based acceleration $a$ and is the practical default. BOOTSTRAP-T pivots on the studentised statistic and is the most accurate, at the cost of nested computation. The COVERAGE-ACCURACY LADDER (Hall 1988, 1992) puts percentile/basic at first-order and BCa/bootstrap-t at second-order. B should be at least 2000 for percentile and 10000 for BCa. Bootstrap CIs FAIL on non-smooth functionals (max, extreme quantiles, boundary parameters); use exact §3.1 CIs or the §8.1 m-out-of-n bootstrap there. §3.3 picks up with PROFILE-LIKELIHOOD and LRT-based CIs — the parametric, likelihood-based alternative to the bootstrap.

What you now know

You can recap the bootstrap setup from §1.7: $B$ resamples with replacement from $X_1, \ldots, X_n$, statistic recomputed on each, empirical distribution $\{\hat\theta^*_b\}$ as the plug-in for the sampling distribution. You can articulate why the bootstrap is the right tool when the estimator has no closed-form SE: sample medians, ratios, IQRs, correlation coefficients, trimmed means.

You can state the four bootstrap CIs: PERCENTILE $(\hat\theta^*_{\alpha/2},\ \hat\theta^*_{1-\alpha/2})$; BASIC $(2\hat\theta - \hat\theta^*_{1-\alpha/2},\ 2\hat\theta - \hat\theta^*_{\alpha/2})$; BCa with bias correction $\bar z_0 = \Phi^{-1}(\#\{b : \hat\theta^*_b \le \hat\theta\}/B)$ and acceleration $a = \sum (\bar\theta - \hat\theta_{(i)})^3 / [6 (\sum (\bar\theta - \hat\theta_{(i)})^2)^{3/2}]$; and the studentised BOOTSTRAP-T with the pivot $(\hat\theta - \theta)/\widehat{\mathrm{SE}}$ and a nested SE estimate. You can derive the percentile recipe as the simplest and the basic recipe as the bias-corrected sibling.
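The percentile, basic, and BCa recipes can be put side by side in one short function. This is a sketch under stated assumptions, not the widget's implementation: the names `bootstrap_cis`, `quantile`, and `sample_variance` are invented for the example, the quantile rule is plain linear interpolation, $\bar\theta$ is taken to be the mean of the leave-one-out estimates $\hat\theta_{(i)}$, and bootstrap-t is left out because it needs a nested resampling loop.

```python
import random
from statistics import NormalDist, fmean


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already-sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def sample_variance(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / (len(v) - 1)


def bootstrap_cis(x, stat, B=2000, alpha=0.05, rng=None):
    """Percentile, basic, and BCa intervals for stat(x); x must be a list."""
    rng = rng or random.Random()
    nd = NormalDist()
    n = len(x)
    theta_hat = stat(x)
    boot = sorted(stat(rng.choices(x, k=n)) for _ in range(B))
    q_lo, q_hi = quantile(boot, alpha / 2), quantile(boot, 1 - alpha / 2)

    # Bias correction z0: how off-centre the bootstrap distribution sits at theta_hat.
    frac = sum(b <= theta_hat for b in boot) / B
    frac = min(max(frac, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(frac)

    # Acceleration a from the leave-one-out (jackknife) estimates.
    jack = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    den = 6 * sum((jbar - t) ** 2 for t in jack) ** 1.5
    a = sum((jbar - t) ** 3 for t in jack) / den if den > 0 else 0.0

    def level(p):
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return {
        "theta_hat": theta_hat, "z0": z0, "a": a,
        "percentile": (q_lo, q_hi),
        "basic": (2 * theta_hat - q_hi, 2 * theta_hat - q_lo),
        "bca": (quantile(boot, level(alpha / 2)), quantile(boot, level(1 - alpha / 2))),
    }


rng = random.Random(7)  # fixed seed, illustrative only
x = [rng.lognormvariate(0.0, 1.0) for _ in range(30)]  # hypothetical skewed sample
ci = bootstrap_cis(x, sample_variance, B=2000, rng=rng)
for name in ("percentile", "basic", "bca"):
    print(f"{name:>10}: ({ci[name][0]:.3f}, {ci[name][1]:.3f})")
```

On a skewed statistic such as the Lognormal variance the three intervals will generally differ; on a symmetric problem (Normal mean) `z0` and `a` should both be close to zero and the BCa interval should sit close to the percentile one.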

You can state the COVERAGE-ACCURACY LADDER: percentile and basic are first-order accurate (coverage error $O(n^{-1/2})$ ), BCa and bootstrap-t are second-order accurate ( $O(n^{-1})$ ). You can quote Hall (1988, 1992) and DiCiccio-Efron (1996) as the theoretical basis. You know that BCa is transformation-invariant and is the default in modern software.

You can use the bootstrap-ci-methods widget to see four bootstrap CIs side by side on one sample, agreeing on symmetric problems and diverging under skew/bias. You can use the bootstrap-coverage-comparison widget to see empirical coverage by Monte-Carlo simulation across $R$ datasets, observing the BCa advantage under skew.
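The idea behind an empirical-coverage study can be sketched as follows. This is not the bootstrap-coverage-comparison widget's code: it estimates coverage for the percentile CI only, uses deliberately small B and R so that it runs in about a second, and the seed and helper names are assumptions made for the example.

```python
import math
import random


def sample_variance(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / (len(v) - 1)


def percentile_ci(x, stat, B, alpha, rng):
    """Percentile CI from order statistics of B bootstrap replicates.

    The lower index is rounded down and the upper index rounded up, so the
    interval is never narrower than the interpolated-quantile version.
    """
    boot = sorted(stat(rng.choices(x, k=len(x))) for _ in range(B))
    lo = boot[math.floor((alpha / 2) * (B - 1))]
    hi = boot[math.ceil((1 - alpha / 2) * (B - 1))]
    return lo, hi


rng = random.Random(2024)          # fixed seed, illustrative only
n, B, R, alpha = 20, 300, 200, 0.05
true_var = math.e * (math.e - 1)   # variance of Lognormal(0, 1), about 4.67

hits = 0
for _ in range(R):
    # One simulated dataset, one CI, one check against the known truth.
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    lo, hi = percentile_ci(x, sample_variance, B, alpha, rng)
    hits += lo <= true_var <= hi

coverage = hits / R
mc_se = math.sqrt(0.95 * 0.05 / R)  # MC SE evaluated at the nominal level
print(f"percentile coverage = {coverage:.3f} (nominal 0.95, MC SE about {mc_se:.3f})")
```

Swapping in a BCa interval for `percentile_ci` and raising R is the natural extension; with R this small, differences of a few percentage points between methods are inside the Monte-Carlo noise.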

You can list the failure modes — sample max and extreme quantiles, parameters at boundaries, distributions without finite moments — and know to fall back on the §3.1 exact CIs or the §8.1 m-out-of-n bootstrap when the standard bootstrap breaks. You can state the practical recommendation: BCa as default, percentile as quick check, bootstrap-t when extra accuracy matters.
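The sample-max failure mode is easy to see numerically. The sketch below (seed, n, and B are arbitrary choices for the illustration) resamples a Uniform(0,1) sample and counts how often the bootstrap maximum equals the observed maximum. A resample of size n contains the observed max with probability $1 - (1 - 1/n)^n$, which tends to $1 - e^{-1} \approx 0.632$, so the bootstrap distribution has a large atom at $\hat\theta$ and no mass above it.

```python
import random

rng = random.Random(1)  # fixed seed, illustrative only
n, B = 50, 5000
x = [rng.random() for _ in range(n)]  # Uniform(0, 1) sample
obs_max = max(x)

# Bootstrap distribution of the sample maximum.
boot_max = [max(rng.choices(x, k=n)) for _ in range(B)]

frac_at_max = sum(m == obs_max for m in boot_max) / B
theory = 1 - (1 - 1 / n) ** n          # P(resample contains the observed max)
distinct = len(set(boot_max))          # how few values the replicates take

print(f"share of replicates equal to observed max: {frac_at_max:.3f}")
print(f"theoretical share 1 - (1 - 1/n)^n        : {theory:.3f}")
print(f"distinct bootstrap values                : {distinct} out of B = {B}")
print(f"largest replicate <= observed max        : {max(boot_max) <= obs_max}")
```

Because no replicate can exceed the observed maximum, while the true upper bound of the support lies above it, none of the quantile-based intervals can repair this by re-weighting the same replicates.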

Where this lands in the rest of Part 3. §3.3 picks up with PROFILE-LIKELIHOOD and likelihood-ratio CIs — the parametric, likelihood-based alternative to the bootstrap; the multi-parameter generalisation of the §3.1 LRT CI. §3.4 distinguishes prediction intervals (uncertainty about a future observation) from confidence intervals (uncertainty about a parameter). §3.5 takes calibration seriously: when does a 95% bootstrap CI actually mean 95%, and how do we diagnose miscalibration via simulation? §3.6 closes Part 3 on the communication side: reporting uncertainty without lying.

References

Efron, B. (1979). "Bootstrap methods: Another look at the jackknife." Annals of Statistics 7(1), 1–26. (The original bootstrap paper. Introduces the percentile CI and the studentised bootstrap-t as the two principled bootstrap CI methods.)
Efron, B. (1987). "Better bootstrap confidence intervals." Journal of the American Statistical Association 82(397), 171–185. (The BCa paper. Introduces the bias correction $\bar z_0$ and the jackknife-based acceleration $a$ ; shows second-order accuracy.)
Efron, B., Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall. (The standard textbook. Chapters 12-14 develop the four bootstrap CIs and the coverage-accuracy ladder with extensive simulation results.)
Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press. (The other standard textbook; complementary to Efron-Tibshirani. Chapter 5 on the four CIs; §5.2 on basic vs percentile; §5.4 on BCa; §5.7 on reporting standards.)
DiCiccio, T.J., Efron, B. (1996). "Bootstrap confidence intervals." Statistical Science 11(3), 189–228. (The definitive review. Theoretical justification of BCa's second-order accuracy via Edgeworth expansion; comparison with bootstrap-t and ABC variants.)
Hall, P. (1988). "Theoretical comparison of bootstrap confidence intervals." Annals of Statistics 16(3), 927–953. (The coverage-accuracy ladder. First-order vs second-order accuracy of each bootstrap CI, with rigorous Edgeworth-expansion proofs.)
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer. (Book-length treatment of bootstrap accuracy. The reference for the formal coverage-error bounds.)
Bickel, P.J., Freedman, D.A. (1981). "Some asymptotic theory for the bootstrap." Annals of Statistics 9(6), 1196–1217. (The canonical bootstrap-failure paper. Shows the bootstrap fails for the sample maximum of a continuous distribution with bounded support and characterises the conditions under which the bootstrap is consistent.)
Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer. (Chapter 8 on the bootstrap, including the percentile and pivotal CIs as default reporting forms.)
Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. (§8.2 on the bootstrap in modelling contexts. Practical recommendation for B and the BCa default.)