Capstone 2 — an observational study with confounding

Part 10 — Real-research capstones

Learning objectives

Specify a DAG distinguishing the causal pathway X → Y from the confounding pathway X ← Z → Y
Demonstrate that NAIVE OLS conflates β with γ·E[Z|X], producing biased estimates
Apply OLS adjustment, inverse-probability weighting (IPW), and propensity-score matching to recover β when Z is observed
Apply ROSENBAUM SENSITIVITY analysis (Γ) to bound how strong an unmeasured confounder would need to be to overturn the result
Recognise when IV / RDD / DiD identification strategies are required because key confounders are unobserved (IV: a valid instrument exists; RDD: treatment is assigned by a threshold; DiD: pre/post data from a natural experiment)

Most real-world questions are answered with OBSERVATIONAL data, not RCTs. The challenge: in observational data the treatment X is not randomly assigned, so naive comparisons of treated-vs-untreated outcomes conflate the causal effect (X → Y) with the confounding effect (X ← Z → Y). This capstone walks through the modern causal-inference toolkit: DAG specification, adjustment estimators, propensity scores, and sensitivity analysis.

Step 1 — DAG specification

Drawing the DAG (directed acyclic graph) makes the causal assumptions explicit. For our hypothetical study (training program X, income Y, confounded by prior education level Z):

Z \to X \quad \text{(selection: educated workers enrol more)}

Z \to Y \quad \text{(confounding: education raises income directly)}

X \to Y \quad \text{(the causal effect we want, magnitude } \beta\text{)}

The DAG encodes our assumptions about what could plausibly cause what. WITH a correctly specified DAG, we can identify the causal effect; without it, we are guessing.
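The three arrows above can be written down as data and queried. The sketch below is a minimal illustration, not a full back-door criterion check: it only looks for common causes, meaning nodes that cause X and also reach Y along a directed path that does not pass through X. The function names `ancestors` and `common_causes` are hypothetical helpers introduced here.

```python
# Edges of the Step 1 DAG, stored as parent -> list of children.
dag = {"Z": ["X", "Y"], "X": ["Y"], "Y": []}

def ancestors(graph, node):
    """Return every node that has a directed path into `node`."""
    parents = {p for p, children in graph.items() if node in children}
    found = set(parents)
    for p in parents:
        found |= ancestors(graph, p)
    return found

def common_causes(graph, treatment, outcome):
    """Nodes that cause the treatment and also reach the outcome
    by a directed path that does not go through the treatment."""
    # Cut the arrows leaving the treatment so paths via X do not count.
    cut = {n: ([] if n == treatment else list(c)) for n, c in graph.items()}
    return ancestors(graph, treatment) & ancestors(cut, outcome)

print(common_causes(dag, "X", "Y"))
```

For the DAG above this flags Z as the variable to adjust for. Cutting the arrows out of X matters: a node that affects Y only through X (an instrument, as in Step 5) is not reported as a confounder.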

Step 2 — The naive estimator and its bias

Naive OLS regresses Y on X, ignoring Z:

\hat{\beta}_{\text{naive}} = \frac{\text{Cov}(X, Y)}{\text{Var}(X)} = \beta + \gamma \cdot \frac{\text{Cov}(X, Z)}{\text{Var}(X)}.

This decomposition follows from the linear model $Y = \beta X + \gamma Z + \varepsilon$ with ε uncorrelated with X. The bias term $\gamma \cdot \text{Cov}(X,Z) / \text{Var}(X)$ is non-zero whenever Z affects Y (γ ≠ 0) and Z is correlated with X (Cov(X, Z) ≠ 0). The widget shows this bias dramatically: with γ = 1.5 and selection gap 0.5, the naive estimate is ~1.5 when the true β = 1.0 (50% inflation).
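The bias formula can be checked numerically. The following is a minimal sketch under an assumed data-generating process (standard-normal Z, logistic selection into treatment, unit-variance noise); these choices are illustrative and are not the widget's settings, so the numbers will differ from the ones quoted above.

```python
import numpy as np

def simulate(n=5000, beta=1.0, gamma=1.5, seed=0):
    """Draw (X, Y, Z) from an assumed linear model with confounding."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # confounder (e.g. education)
    p_enrol = 1.0 / (1.0 + np.exp(-z))          # assumed selection: higher Z enrols more
    x = rng.binomial(1, p_enrol).astype(float)  # treatment indicator
    y = beta * x + gamma * z + rng.normal(size=n)
    return x, y, z

def slope(a, b):
    """OLS slope from regressing b on a: Cov(a, b) / Var(a)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

x, y, z = simulate()
naive = slope(x, y)                   # regress Y on X, ignoring Z
predicted = 1.0 + 1.5 * slope(x, z)   # beta + gamma * Cov(X, Z) / Var(X)
print(f"naive = {naive:.3f}, formula = {predicted:.3f}")
```

The two printed values should agree up to sampling noise, and both should sit well above the true β = 1.0.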

Step 3 — Adjustment estimators

If Z is observed, three estimators recover β (under the DAG assumption):

OLS adjustment: regress $Y \sim X + Z$. The coefficient on X is the conditional-on-Z effect of X, which equals β.
Inverse-probability weighting (IPW): estimate the propensity $e(Z) = P(X=1 \mid Z)$, then estimate the ATE as $\hat{\mu}_1 - \hat{\mu}_0 = \frac{1}{N}\sum_i \frac{X_i Y_i}{e(Z_i)} - \frac{1}{N}\sum_i \frac{(1-X_i)Y_i}{1 - e(Z_i)}.$
Propensity-score matching: for each treated unit, find a control unit with similar e(Z) and difference their outcomes. Average to get the average treatment effect on the treated (ATT).

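All three converge to β under: (a) a correct DAG, (b) Z observed, (c) positivity (e(Z) bounded away from 0 and 1), and (d) a correctly specified outcome or propensity model. Because the effect β is assumed constant across units, the ATE and the ATT coincide.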
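The three estimators can be compared side by side on simulated data. This is a sketch under simplifying assumptions that are not from the text: Z is binary, so the propensity e(Z) is estimated by the treated fraction within each Z stratum, and propensity-score matching reduces to matching each treated unit to all controls that share its score. The function name `three_estimators` is hypothetical.

```python
import numpy as np

def three_estimators(n=20000, beta=1.0, gamma=1.5, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)                   # binary confounder (assumed)
    x = rng.binomial(1, np.where(z == 1, 0.75, 0.25))  # selection depends on Z
    y = beta * x + gamma * z + rng.normal(size=n)

    # 1. OLS adjustment: coefficient on X in Y ~ 1 + X + Z.
    design = np.column_stack([np.ones(n), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ols = coef[1]

    # 2. IPW: propensity estimated as the treated share in each Z stratum.
    e_hat = np.where(z == 1, x[z == 1].mean(), x[z == 0].mean())
    ipw = np.mean(x * y / e_hat) - np.mean((1 - x) * y / (1 - e_hat))

    # 3. Matching (ATT): within each propensity stratum, compare treated
    #    outcomes with the mean of their matched controls, then weight
    #    each stratum by its number of treated units.
    total, n_treated = 0.0, 0
    for level in (0, 1):
        treated = y[(z == level) & (x == 1)]
        control = y[(z == level) & (x == 0)]
        total += len(treated) * (treated.mean() - control.mean())
        n_treated += len(treated)
    att = total / n_treated

    return {"ols": ols, "ipw": ipw, "matching_att": att}

print(three_estimators())
```

With a continuous Z the propensity would typically come from a fitted model such as logistic regression, and matching would need a caliper or nearest-neighbour rule; the stratum shortcut here only works because Z takes two values.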

Step 4 — Rosenbaum sensitivity (Γ)

After adjusting for the observed Z, a residual concern remains: are there UNMEASURED confounders? Rosenbaum (1987) introduced the sensitivity parameter Γ: by how much would the odds of treatment for two units with the same observed Z need to differ (because of an unmeasured confounder Z_u) to overturn the conclusion?

For a binary outcome with a matched-pair design: if Γ = 1, treatment assignment is effectively random within observed-Z strata (no unmeasured confounding). As Γ grows, larger departures from random assignment are allowed and the worst-case p-value rises. The smallest Γ at which significance is lost is the SENSITIVITY BOUND.

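Reported as: "Γ = 2.5: the conclusion remains significant unless an unmeasured confounder changes the odds of treatment by more than a factor of 2.5 between two units with the same observed Z". This bound is conservative because it lets the confounder be an arbitrarily strong predictor of the outcome. Higher Γ = more robust to hidden confounding.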
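For matched pairs with a binary outcome, the worst-case p-value has a simple form. Among the d discordant pairs (exactly one unit has the outcome), the chance that the treated unit is the one with the outcome is at most Γ/(1+Γ), so the upper bound on the one-sided sign-test (McNemar) p-value is a binomial tail. The sketch below implements that bound; the function names and the example counts (70 of 100 discordant pairs) are hypothetical and chosen only for illustration.

```python
from math import comb

def worst_case_pvalue(t, d, gamma):
    """Upper bound on the one-sided p-value when hidden bias is at most gamma.

    d: number of discordant pairs; t: pairs where the treated unit has the outcome.
    """
    p = gamma / (1.0 + gamma)
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k) for k in range(t, d + 1))

def sensitivity_bound(t, d, alpha=0.05, step=0.05, max_gamma=10.0):
    """Smallest gamma on a grid at which the worst-case p-value exceeds alpha."""
    i = 0
    while 1.0 + step * i <= max_gamma:
        gamma = 1.0 + step * i
        if worst_case_pvalue(t, d, gamma) > alpha:
            return gamma
        i += 1
    return None  # still significant at max_gamma

# Hypothetical study: 100 discordant pairs, 70 favour the treatment.
for g in (1.0, 1.25, 1.5, 2.0):
    print(f"Gamma = {g:.2f}: worst-case p = {worst_case_pvalue(70, 100, g):.4f}")
print("sensitivity bound:", sensitivity_bound(70, 100))
```

At Γ = 1 the bound reduces to the ordinary sign-test p-value, and it increases monotonically with Γ. The grid step controls how finely the sensitivity bound is located.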

Step 5 — Alternative identification strategies

When adjustment cannot defensibly handle confounding (e.g., key confounders are unmeasured), alternative identification strategies are needed:

Instrumental variables (IV): find a variable W that affects X, has no direct effect on Y (W → X → Y, no W → Y arrow), and shares no unmeasured common cause with Y. Two-stage least squares (2SLS) estimates β through the IV.
Regression discontinuity (RDD): when treatment is assigned based on a threshold, compare just-above and just-below the threshold (quasi-randomisation).
Difference-in-differences (DiD): with pre/post data and treatment/control groups, the DiD estimator removes time-invariant additive confounding, provided the parallel-trends assumption holds (cf. SDS §6.7).
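As an illustration of the first strategy, here is a minimal 2SLS sketch on simulated data. Everything about the data-generating process is an assumption made for the example: a continuous treatment X, an unmeasured confounder U that the analyst never sees, and an instrument W that is independent of U and affects Y only through X. The function name `iv_demo` is hypothetical.

```python
import numpy as np

def iv_demo(n=20000, beta=1.0, gamma=1.5, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                 # unmeasured confounder (not used below)
    w = rng.normal(size=n)                 # instrument: independent of U
    x = w + u + rng.normal(size=n)         # treatment depends on W and U
    y = beta * x + gamma * u + rng.normal(size=n)

    # Naive OLS of Y on X is biased because U is omitted.
    naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    # Stage 1: regress X on W and keep the fitted values.
    slope1, intercept1 = np.polyfit(w, x, 1)
    x_hat = intercept1 + slope1 * w

    # Stage 2: regress Y on the fitted values; the slope is the 2SLS estimate.
    # (Standard errors from this second regression would not be valid as-is.)
    tsls = np.polyfit(x_hat, y, 1)[0]
    return naive, tsls

naive, tsls = iv_demo()
print(f"naive = {naive:.3f}, 2SLS = {tsls:.3f}")
```

With a single instrument the 2SLS slope equals the ratio Cov(W, Y) / Cov(W, X), which is why it does not need U to be observed. A weak instrument (small Cov(W, X)) would make this ratio unstable.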

Try it

Defaults: β = 1.0, γ = 1.5, selection gap = 0.5. The DAG shows three arrows; the histogram shows three sampling distributions across 200 simulated datasets. The naive estimator (red) is centred well above 1.0 (biased); the adjusted (green) and IPW (blue) are centred near 1.0 (unbiased).
Increase γ to 3.0. The confounding strength doubles, and because the bias term γ·Cov(X, Z)/Var(X) is linear in γ, the naive estimator's bias doubles with it (extreme bias). The adjusted and IPW remain near 1.0. This is the classic "the wrong analysis confidently reports the wrong answer".
Reduce selection gap to 0.0. X and Z become independent (no selection). The naive and adjusted estimators converge: without selection there is no confounding bias even in naive OLS. This is what random treatment assignment, or a "natural experiment", delivers.
Increase N from 300 to 1000. All three sampling distributions tighten (lower variance); naive's bias does NOT shrink with N (it persists asymptotically). Adjustment is consistent for β; naive is inconsistent.
Set β = 0.0. The true effect is zero. Naive may still show a "positive effect" entirely due to confounding. Adjusted correctly returns ~0. This is the canonical "spurious correlation" warning.
Click Resample (new seed). The 200 simulated datasets change; the qualitative finding holds — naive is biased, adjusted and IPW are not (when Z is observed).
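The sampling-distribution experiment can be approximated offline. The sketch below repeats the simulation 200 times at two sample sizes and summarises the naive and adjusted estimators. The data-generating process (binary Z, with `gap` defined as the difference in enrolment probability between Z = 1 and Z = 0) is an assumption for illustration and may not match how the widget defines its selection gap, so the exact centres can differ from the widget's.

```python
import numpy as np

def one_dataset(rng, n, beta=1.0, gamma=1.5, gap=0.5):
    """Return (naive, adjusted) estimates of beta from one simulated dataset."""
    z = rng.binomial(1, 0.5, size=n)
    x = rng.binomial(1, 0.5 + gap * (z - 0.5))   # P(X=1|Z=1) - P(X=1|Z=0) = gap
    y = beta * x + gamma * z + rng.normal(size=n)
    naive = np.polyfit(x, y, 1)[0]               # slope of Y ~ X
    design = np.column_stack([np.ones(n), x, z])
    adjusted = np.linalg.lstsq(design, y, rcond=None)[0][1]  # Y ~ 1 + X + Z
    return naive, adjusted

def summarise(n, reps=200, seed=0, **kwargs):
    """Mean and standard deviation of both estimators over `reps` datasets."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_dataset(rng, n, **kwargs) for _ in range(reps)])
    return draws.mean(axis=0), draws.std(axis=0, ddof=1)

for n in (300, 1000):
    means, sds = summarise(n)
    print(f"N={n}: naive mean={means[0]:.2f} (sd {sds[0]:.2f}), "
          f"adjusted mean={means[1]:.2f} (sd {sds[1]:.2f})")
```

Raising N should narrow both spreads while leaving the naive mean away from β, and calling `summarise(300, gap=0.0)` should bring the naive and adjusted means together.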

A study reports that "people who eat dark chocolate have 30% lower cardiovascular disease". The authors adjusted for age, income, and exercise. What is the MOST LIKELY remaining confounder, and how would you report sensitivity analysis to address it?

What you now know

Observational data require careful causal-inference machinery. The DAG specifies the causal assumptions; adjustment estimators (OLS with confounders, IPW, propensity-score matching) recover the causal effect when the DAG is correct and Z is observed; Rosenbaum sensitivity analysis bounds the impact of unmeasured confounders. When adjustment cannot defensibly handle confounding, IV / RDD / DiD provide alternative identification. Communicate ALL assumptions explicitly — observational inference is only as good as the DAG you defend.

References

Hernán, M.A., Robins, J.M. (2020). Causal Inference: What If. Chapman & Hall/CRC. (Best modern textbook.)
Pearl, J. (2009). Causality, 2nd ed. Cambridge. (DAG framework.)
Imbens, G.W., Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge.
Rosenbaum, P.R., Rubin, D.B. (1983). "The central role of the propensity score in observational studies for causal effects." Biometrika 70(1), 41–55. (Foundational propensity-score paper.)
Rosenbaum, P.R. (1987). "Sensitivity analysis for certain permutation inferences in matched observational studies." Biometrika 74(1), 13–26. (Γ sensitivity analysis.)