Sensitivity analysis: bounding what unobserved confounding could do
Learning objectives
- State the central question of sensitivity analysis: how strong an UNOBSERVED confounder is needed to nullify a finding
- Compute and interpret the ROBUSTNESS VALUE for an OLS coefficient (Cinelli & Hazlett 2020)
- Compute and interpret the E-VALUE for a risk-ratio estimate (VanderWeele & Ding 2017)
- Apply ROSENBAUM BOUNDS (Γ) in matched-pair observational studies
- Distinguish ASSUMPTION-FREE bounds (Manski) from ASSUMPTION-PARAMETRIC bounds (Rosenbaum, E-value)
- Recognise sensitivity analysis as the INVERSE PROBLEM to confounding-bias regression
Every observational design in §§6.3–6.7 (DAGs, propensity scores, IV, RDD, DiD) leans on an UNTESTABLE identifying assumption: no unobserved confounding; instrument exogeneity; no manipulation at the cutoff; parallel trends. The analyst can argue these assumptions are credible, but cannot prove them from data alone. SENSITIVITY ANALYSIS confronts this honestly by asking the inverse question: how strong an unmeasured confounder would have to be to overturn the conclusion? A single number — the robustness value or the E-value — lets the reader assess whether that strength is plausible given their substantive knowledge of the domain.
The inverse problem
Forward problem (standard regression): given a confounder of known strength, what bias does it create? Inverse problem (sensitivity analysis): given an observed estimate, what unmeasured confounder strength would explain it away? The forward problem requires knowing the confounder; the inverse problem does not. Sensitivity analysis transmutes an unknowable thing (how much hidden confounding is there really?) into a quantifiable one (how much would there have to be to matter?).
The robustness value (Cinelli & Hazlett 2020)
Consider the X-adjusted regression of Y on T, giving coefficient β_obs. Suppose an unmeasured confounder U has equal-strength coupling γ to both T and Y. In standardized units, the omitted-variable-bias formula gives
β_adj(γ) = β_obs − γ²
for a positive β_obs, where the bias γ² is the product of the two legs (γ · γ).
The ROBUSTNESS VALUE (RV) is the smallest equal-strength γ that drives the adjusted coefficient to zero:
RV = √|β_obs|
Interpretation: RV is the minimum coupling strength on BOTH legs (U → T and U → Y, in standardized scale) for an unmeasured confounder to fully explain the observed effect. Compare RV to the strength of observed covariates. If RV is much larger than the strongest measured confounder, the finding is robust; an unmeasured confounder would need to be stronger than anything we already control for — an implausibly large effect.
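The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the stylised standardized model behind this section's worked numbers: a hypothetical U with equal strength γ on both legs shifts the coefficient by γ², so the RV is the square root of the absolute observed coefficient. The function names are hypothetical and are not part of sensemakr, which parameterises confounder strength by partial R² rather than by this simplified coupling.

```python
import math

def adjusted_coefficient(beta_obs: float, gamma: float) -> float:
    """Coefficient left after accounting for a hypothetical U of equal
    strength gamma on both legs (stylised assumption: bias = gamma**2,
    acting in the direction of the observed estimate)."""
    sign = 1.0 if beta_obs >= 0 else -1.0
    return beta_obs - sign * gamma ** 2

def robustness_value(beta_obs: float) -> float:
    """Smallest equal-strength gamma that drives the adjusted coefficient to zero."""
    return math.sqrt(abs(beta_obs))

beta_obs = 1.5  # the value used in this section's worked numbers
rv = robustness_value(beta_obs)
print(f"RV = {rv:.2f}")
for gamma in (0.5, 1.0, 1.5):
    adj = adjusted_coefficient(beta_obs, gamma)
    print(f"gamma = {gamma:.1f} -> adjusted coefficient = {adj:+.2f}")
```

The adjusted coefficient stays positive for γ below the RV and changes sign once γ passes it, which is the crossing point the robustness curve visualises.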
Cinelli & Hazlett package this as the "extreme robustness value" XRV and a 2D contour plot over the two confounder-strength axes (U → T and U → Y). The R package sensemakr (on CRAN) and the Python package PySensemakr implement both. This approach is now regarded as modern best practice in applied econometrics.
The E-value (VanderWeele & Ding 2017)
For RISK-RATIO estimates (e.g., epidemiological studies), VanderWeele & Ding (2017) introduced the analogous notion. Given an observed risk ratio RR > 1,
E-value = RR + √(RR × (RR − 1))
(for RR < 1, first replace RR by 1/RR).
The E-value is the minimum strength of association on the risk-ratio scale (on BOTH legs U → T and U → Y) that would suffice to explain away the observed RR. It is now routinely reported alongside epidemiological RR estimates. Quick examples: an observed RR of 1.5 has E-value 1.5 + √0.75 ≈ 2.37, a strength that is quite plausible for a strong unmeasured confounder. An observed RR of 5.0 has E-value 5 + √20 ≈ 9.47; few real-world unmeasured confounders are that strong. The E-value is also reported for the CI bound closest to the null (the lower bound when RR > 1); if even that E-value is comfortably large, the finding is robust.
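A short Python sketch of the E-value calculation follows. It uses the VanderWeele & Ding (2017) formula E = RR + √(RR × (RR − 1)), inverting the ratio first when RR < 1. The function names are illustrative and are not taken from any particular package; the convention of returning 1 when the interval covers the null follows the usual reporting practice for the confidence-limit E-value.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio (VanderWeele & Ding 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1.0:
        rr = 1.0 / rr  # protective effects: work with the inverse ratio
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_for_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null.
    Returns 1.0 when the interval already contains RR = 1."""
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if lower > 1.0 else upper)

for rr in (1.5, 5.0):
    print(f"RR = {rr}: E-value = {e_value(rr):.2f}")
```

For the two risk ratios discussed above this gives roughly 2.37 and 9.47.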
Connection to RV: the E-value is the multiplicative-scale analogue of the additive-scale RV. Same intellectual content, different scale.
Rosenbaum bounds for matched studies (Rosenbaum 1987, 2002)
The classical sensitivity framework for matched-pair observational designs. Within each matched pair, parameterize a "hidden bias" Γ as the maximum odds ratio by which two units in the pair could differ in their treatment probability due to UNOBSERVED covariates. Γ = 1 means treatment is essentially randomized within pairs; Γ > 1 leaves room for hidden bias of bounded strength.
The Rosenbaum bound: report the worst-case p-value or treatment effect attainable under EACH plausible value of Γ. The smallest Γ at which the worst-case p-value crosses your significance threshold is the BREAKEVEN Γ. If the breakeven Γ exceeds 2 (meaning hidden bias would have to more than double the within-pair treatment odds to invalidate the finding), the finding is robust to "moderate" hidden bias.
Example: the association between cigarette smoking and lung cancer is robust to Γ up to ~6, meaning an unmeasured confounder would have to multiply within-pair smoking odds 6× to explain the association. That is implausibly large, which makes this one of the strongest sensitivity arguments in epidemiology. The R package sensitivitymv and the Stata command rbounds implement Rosenbaum bounds.
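To make the breakeven Γ concrete, here is a minimal Python sketch for the simplest case, the sign test on matched pairs without ties. Under hidden bias of at most Γ and no treatment effect, the chance that the treated unit has the higher outcome in a pair is at most Γ/(1 + Γ), so the worst-case one-sided p-value is an upper binomial tail at that probability. The function names, the grid search, and the pair counts in the example are all hypothetical illustrations; packages such as sensitivitymv use more refined test statistics.

```python
from math import comb

def worst_case_p(n_pairs: int, n_treated_higher: int, gamma: float) -> float:
    """Upper bound on the one-sided sign-test p-value under hidden bias gamma."""
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    p_plus = gamma / (1.0 + gamma)  # worst-case P(treated unit has the higher outcome)
    return sum(
        comb(n_pairs, k) * p_plus ** k * (1.0 - p_plus) ** (n_pairs - k)
        for k in range(n_treated_higher, n_pairs + 1)
    )

def breakeven_gamma(n_pairs, n_treated_higher, alpha=0.05, step=0.01, gamma_max=20.0):
    """Smallest gamma on a grid at which the worst-case p-value exceeds alpha.
    Returns None if the finding survives up to gamma_max."""
    n_steps = int(round((gamma_max - 1.0) / step))
    for i in range(n_steps + 1):
        gamma = 1.0 + i * step
        if worst_case_p(n_pairs, n_treated_higher, gamma) > alpha:
            return round(gamma, 2)
    return None

# Hypothetical study: 100 matched pairs, treated unit has the higher outcome in 70.
for gamma in (1.0, 1.5, 2.0):
    print(f"gamma = {gamma}: worst-case p = {worst_case_p(100, 70, gamma):.4f}")
print("breakeven gamma:", breakeven_gamma(100, 70))
```

At Γ = 1 the bound reduces to the ordinary sign-test p-value; as Γ grows the worst-case p-value rises monotonically until it crosses the threshold.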
Manski bounds: assumption-free, sometimes uninformative
Manski (1990, 2003) developed bounds that require no unmeasured-confounding assumption at all. The bounds come from the logical constraints of potential outcomes alone: each unit's missing counterfactual is only known to lie within the outcome's range. Often these bounds are wide and potentially uninformative for policy. The strength of Manski bounds: they make NO assumption about the missing counterfactual. The weakness: they typically don't pin down a sign; for a binary outcome, the no-assumption bounds on the average treatment effect always have width 1 and therefore always contain zero.
Combining Manski bounds with a weak additional assumption (e.g., monotone treatment selection) often gives bounds tight enough to be informative. This is useful when the analyst is unwilling to make strong identifying assumptions and prefers a defensible range.
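The no-assumption bounds are simple enough to compute by hand; the Python sketch below does so for a bounded outcome. Each missing counterfactual mean is replaced by the lowest and highest values the outcome can take. The second function adds monotone treatment selection, read here as the assumption that treated units would have outcomes at least as high as untreated units under either treatment, which caps the effect at the naive difference in means. Function names and the example inputs are hypothetical.

```python
def manski_ate_bounds(p_treated, mean_y_treated, mean_y_control, y_min=0.0, y_max=1.0):
    """No-assumption bounds on the average treatment effect for Y in [y_min, y_max]."""
    p1 = p_treated
    p0 = 1.0 - p_treated
    # E[Y(1)]: observed for the treated, unknown (y_min..y_max) for the controls
    ey1_lo = mean_y_treated * p1 + y_min * p0
    ey1_hi = mean_y_treated * p1 + y_max * p0
    # E[Y(0)]: observed for the controls, unknown for the treated
    ey0_lo = mean_y_control * p0 + y_min * p1
    ey0_hi = mean_y_control * p0 + y_max * p1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

def mts_ate_bounds(p_treated, mean_y_treated, mean_y_control, y_min=0.0, y_max=1.0):
    """Bounds under monotone treatment selection (assumption: treated units have
    weakly higher potential outcomes): the upper bound becomes the naive difference."""
    lower, _ = manski_ate_bounds(p_treated, mean_y_treated, mean_y_control, y_min, y_max)
    return lower, mean_y_treated - mean_y_control

# Hypothetical binary-outcome study: 40% treated, outcome rates 0.7 vs 0.5.
lo, hi = manski_ate_bounds(0.4, 0.7, 0.5)
print(f"no-assumption bounds: [{lo:.2f}, {hi:.2f}]  width = {hi - lo:.2f}")
lo_mts, hi_mts = mts_ate_bounds(0.4, 0.7, 0.5)
print(f"with MTS:             [{lo_mts:.2f}, {hi_mts:.2f}]")
```

The no-assumption interval has width y_max − y_min regardless of the data, which is why it always straddles zero for a binary outcome; the added assumption shrinks the interval from one side only.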
Comparing the frameworks
- RV / Cinelli-Hazlett (2020): for OLS coefficients. Linear-regression-friendly. Modern default for econometrics.
- E-value (VanderWeele-Ding 2017): for risk ratios. Multiplicative-scale interpretation. Modern default for epidemiology.
- Rosenbaum bounds: for matched-pair designs. Established tool in observational epidemiology and program evaluation.
- Manski bounds: assumption-free, partial-identification. Useful when the analyst wants a defensible-without-strong-assumptions range.
All four are answering the same question (what would it take to overturn the finding?) but each suits a different inferential regime.
How to report sensitivity
- Compute the RV (or E-value, or breakeven Γ) for the point estimate AND for the lower bound of the 95% CI.
- Benchmark RV against the strength of the strongest OBSERVED confounder in the model. If RV exceeds even the strongest observed confounder's coupling strength, the finding is robust.
- Discuss what unmeasured confounders might plausibly have such strength. Provide substantive examples.
- Be explicit if the design is fragile — do not bury the analysis.
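The reporting checklist above can be sketched as a small helper. This is an illustration only: it assumes the stylised rule RV = √|coefficient| used in this section's worked numbers, takes the benchmark as the coupling strength of the strongest observed confounder, and uses hypothetical names and hypothetical input values throughout.

```python
import math

def stylised_rv(coef: float) -> float:
    """RV under the section's stylised model (assumption: RV = sqrt(|coef|))."""
    return math.sqrt(abs(coef))

def sensitivity_report(beta_obs, ci_lower, ci_upper, benchmark_strength):
    """RV for the point estimate and for the CI bound closest to zero,
    each compared with the strongest observed confounder's strength."""
    if ci_lower <= 0.0 <= ci_upper:
        nearest_bound = 0.0  # interval already covers zero: nothing to explain away
    else:
        nearest_bound = ci_lower if abs(ci_lower) < abs(ci_upper) else ci_upper
    rv_point = stylised_rv(beta_obs)
    rv_ci = stylised_rv(nearest_bound)
    return {
        "rv_point": rv_point,
        "rv_ci": rv_ci,
        "point_exceeds_benchmark": rv_point > benchmark_strength,
        "ci_exceeds_benchmark": rv_ci > benchmark_strength,
    }

# Hypothetical inputs, chosen only to exercise the function.
report = sensitivity_report(beta_obs=0.36, ci_lower=0.09, ci_upper=0.63,
                            benchmark_strength=0.25)
for key, value in report.items():
    print(f"{key}: {value}")
```

Reporting both rows keeps the statistical uncertainty visible: a finding can clear the benchmark at the point estimate and still fail it at the confidence bound.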
Try it
- Start with τ = 1.5, X coupling 0.7, γ_true = 0.0. The X-adjusted estimate β_obs ≈ 1.5 (no hidden confounding biases it). The RV is √1.5 ≈ 1.22 — a hypothetical U would need standardized strength > 1.22 on BOTH legs to nullify. The blue robustness curve crosses zero at γ ≈ 1.22.
- Crank γ_true to 0.5. Now the X-adjusted β_obs is biased up to roughly 1.5 + 0.5² ≈ 1.75. The RV computed from β_obs is √1.75 ≈ 1.32. The green γ_true line still sits well left of the red RV line — finding remains "robust" against this hidden bias. The analyst, looking only at β_obs, would say "an unobserved U would need γ > 1.32 to nullify" without realising one is silently present at γ = 0.5.
- Crank γ_true to 1.3, just above the original RV of √1.5 ≈ 1.22. Now β_obs is roughly 1.5 + 1.69 = 3.19, and the RV from this inflated β_obs is √3.19 ≈ 1.79. The γ_true line at 1.3 is LEFT of the RV at 1.79, so the analyst would still say "robust", even though more than half of β_obs (1.69 of 3.19) is confounder bias. The point: RV cannot detect confounding that is ALREADY THERE; it can only quantify how strong an ADDITIONAL or HYPOTHETICAL confounder would need to be relative to the OBSERVED estimate.
- Set τ = 0 with γ_true = 0.8. The analyst's β_obs ≈ 0 + 0.64 = 0.64 — a spurious positive effect. The RV ≈ √0.64 = 0.80 = γ_true exactly. The reader (who knows γ_true) sees the green and red lines coincide: this is the exact threshold case where a confounder of strength γ_true could indeed have generated all of β_obs.
- The takeaway: sensitivity analysis is most credible when (a) RV is large compared to observed-covariate effect sizes, AND (b) the analyst has done DAG-level reasoning to enumerate which U variables could plausibly exist. Without (b), RV is a clean number with no interpretive grip.
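The four scenarios above can be reproduced without the interactive widget. The sketch below assumes the same stylised relations the bullets use, β_obs = τ + γ_true² and RV = √|β_obs|; it tabulates the arithmetic and does not simulate data. The function name is hypothetical.

```python
import math

def walk_through(tau: float, gamma_true: float):
    """Return (beta_obs, rv, bias_share) under the stylised model (assumption):
    beta_obs = tau + gamma_true**2 and RV = sqrt(|beta_obs|)."""
    bias = gamma_true ** 2
    beta_obs = tau + bias
    rv = math.sqrt(abs(beta_obs))
    bias_share = bias / beta_obs if beta_obs != 0 else 0.0
    return beta_obs, rv, bias_share

# (tau, gamma_true) pairs taken from the bullets above
scenarios = [(1.5, 0.0), (1.5, 0.5), (1.5, 1.3), (0.0, 0.8)]

print(f"{'tau':>5} {'gamma_true':>11} {'beta_obs':>9} {'RV':>6} {'bias share':>11}")
for tau, gamma_true in scenarios:
    beta_obs, rv, share = walk_through(tau, gamma_true)
    print(f"{tau:>5.2f} {gamma_true:>11.2f} {beta_obs:>9.2f} {rv:>6.2f} {share:>11.0%}")
```

The last column shows how much of β_obs is confounder bias in each scenario, a quantity the analyst never observes and the RV alone does not reveal.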
A study reports an OLS coefficient β_obs = 0.16 with 95% CI (0.04, 0.28). The robustness value for the point estimate is RV = 0.40 and the RV for the lower CI bound is 0.20. What practical conclusion follows?
What you now know
Sensitivity analysis answers the inverse problem: how strong would unmeasured confounding have to be to overturn the finding? The robustness value (Cinelli-Hazlett 2020) gives the answer on the OLS-coefficient scale; the E-value (VanderWeele-Ding 2017) on the risk-ratio scale; Rosenbaum bounds (1987, 2002) on the matched-pair odds-ratio scale; Manski bounds (1990) make NO assumption at all and give an interval. Each framework answers the same question in the inferential regime where the underlying study lives. Compute, report, and BENCHMARK against observed-covariate strength. This concludes Part 6's tour of causal inference. Parts 7 (Bayesian methods), 8 (resampling), and 9 (ML for researchers) extend the analyst's toolkit beyond the classical causal toolbox.
References
- Rosenbaum, P.R. (2002). Observational Studies (2nd ed.). Springer. (Definitive treatment of Γ bounds.)
- Cinelli, C., Hazlett, C. (2020). "Making sense of sensitivity: Extending omitted variable bias." JRSS-B 82(1), 39–67. (Robustness value & sensemakr.)
- VanderWeele, T.J., Ding, P. (2017). "Sensitivity analysis in observational research: Introducing the E-value." Annals of Internal Medicine 167(4), 268–274.
- Manski, C.F. (2003). Partial Identification of Probability Distributions. Springer. (Assumption-free bounds.)
- Imbens, G.W. (2003). "Sensitivity to exogeneity assumptions in program evaluation." AER P&P 93(2), 126–132. (Imbens-style benchmark plots, precursor to Cinelli-Hazlett.)
- Cornfield, J., Haenszel, W., Hammond, E.C., et al. (1959). "Smoking and lung cancer: Recent evidence and a discussion of some questions." JNCI 22(1), 173–203. (The original implicit sensitivity argument.)