Potential outcomes and the fundamental problem

Part 6 — Causal inference for researchers

Learning objectives

State Rubin's potential-outcomes framework: each unit has Y(0) AND Y(1)
Identify the FUNDAMENTAL PROBLEM of causal inference: we observe only one potential outcome per unit
Define ATE, ATT, and CATE as expectations over potential outcomes
Articulate SUTVA (no interference + no hidden treatment variations) and recognise its empirical importance
Distinguish observed association E[Y|T=1] − E[Y|T=0] from the causal effect E[Y(1)] − E[Y(0)] formally
Recognise random assignment as the identifying device that turns association into causation

§4.8 closed Part 4 with the warning: regression is association, not causation. Part 6 takes that warning seriously and builds the language and identification strategies of modern causal inference. §6.1 starts with the foundational vocabulary, Rubin's potential-outcomes framework, and with the fundamental problem that makes causal inference structurally HARDER than estimation: we can never see both potential outcomes for the same unit.

The potential-outcomes framework (Rubin 1974)

For each unit $i$ and treatment indicator $T_i \in \{0, 1\}$, define TWO potential outcomes:

$Y_i(1)$ — the outcome unit $i$ would experience if TREATED.
$Y_i(0)$ — the outcome unit $i$ would experience if NOT treated (control).

Both quantities exist conceptually for every unit at every moment. Only one is ever observed. The INDIVIDUAL CAUSAL EFFECT is the difference:

$$\tau_i = Y_i(1) - Y_i(0).$$

The OBSERVED outcome is

$$Y_i = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0).$$

If $T_i = 1$, we see $Y_i(1)$ but $Y_i(0)$ is missing. If $T_i = 0$, we see $Y_i(0)$ but $Y_i(1)$ is missing. The missing potential outcome is the COUNTERFACTUAL: what would have happened in the world that didn't occur.
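The bookkeeping above can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the sample size, the normal distributions, and the seed are arbitrary choices, not part of the framework. It builds both potential outcomes for each unit, applies the observed-outcome equation, and then masks the counterfactual the way real data would.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 6                           # tiny sample so the arrays are easy to inspect

# Hypothetical potential outcomes (the distributions are assumptions).
y0 = rng.normal(loc=10.0, scale=2.0, size=n)   # Y_i(0): outcome if not treated
tau = rng.normal(loc=1.0, scale=1.0, size=n)   # tau_i: individual causal effect
y1 = y0 + tau                                  # Y_i(1): outcome if treated

t = rng.integers(0, 2, size=n)                 # T_i in {0, 1}

# Observed outcome: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0).
y_obs = t * y1 + (1 - t) * y0

# What an analyst actually sees: the counterfactual is missing (NaN).
y1_seen = np.where(t == 1, y1, np.nan)
y0_seen = np.where(t == 0, y0, np.nan)

# The individual effect needs both columns, so it is missing for every unit.
tau_seen = y1_seen - y0_seen

print(np.column_stack([t, y0_seen, y1_seen, tau_seen]))
```

Every row of the printed table has exactly one observed potential outcome, and the last column is missing throughout, whatever the assignment turned out to be.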

The fundamental problem of causal inference

Holland (1986) named this the fundamental problem of causal inference:

For any single unit, exactly one of its two potential outcomes is observed. The other is forever missing. The individual causal effect $\tau_i = Y_i(1) - Y_i(0)$ is therefore unobservable.

This is not a small data-collection problem. It is structural. You cannot give a patient a drug AND simultaneously give them a placebo. You cannot have a country adopt a policy AND simultaneously NOT adopt it. The counterfactual world is, by definition, the world that didn't happen.

The whole of causal inference is the science of recovering AVERAGE effects across populations under assumptions about HOW the missing data is missing.

Three population-level causal estimands

Since individual effects are out of reach, we work with averages:

ATE (Average Treatment Effect): $\tau_{\text{ATE}} = E[Y_i(1) - Y_i(0)] = E[\tau_i]$. The expected effect across the WHOLE population.
ATT (Average Treatment effect on the Treated): $\tau_{\text{ATT}} = E[Y_i(1) - Y_i(0) \mid T_i = 1]$. The effect for those who actually receive treatment. Often different from the ATE when treatment is selectively assigned.
CATE (Conditional Average Treatment Effect): $\tau(\mathbf{x}) = E[Y_i(1) - Y_i(0) \mid X_i = \mathbf{x}]$. The effect within a subgroup defined by covariates. The target of HETEROGENEOUS treatment-effect estimation (Part 9 §9.6).

Different research questions point to different estimands. "Should the policy be rolled out to everyone?" — ATE. "Is the policy effective for those currently using it?" — ATT. "Which subgroups benefit most?" — CATE.
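To make the three estimands concrete, here is a minimal simulation in which both potential outcomes are known, which is never the case with real data. Everything numeric is an assumption chosen for illustration: a binary covariate `x`, an individual effect that averages 1 when `x = 0` and 3 when `x = 1`, and treatment uptake of 20% versus 80% in the two groups.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 100_000

# Hypothetical binary covariate defining two subgroups.
x = rng.integers(0, 2, size=n)

# Potential outcomes; the effect is larger in the x = 1 subgroup (assumption).
y0 = rng.normal(0.0, 1.0, size=n)
tau = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)
y1 = y0 + tau

# Selective uptake: the x = 1 subgroup is treated more often (assumption).
p_treat = np.where(x == 1, 0.8, 0.2)
t = rng.random(n) < p_treat

ate = tau.mean()                                 # E[Y(1) - Y(0)]
att = tau[t].mean()                              # E[Y(1) - Y(0) | T = 1]
cate = {v: tau[x == v].mean() for v in (0, 1)}   # E[Y(1) - Y(0) | X = v]

print(f"ATE  = {ate:.2f}")
print(f"ATT  = {att:.2f}")
print(f"CATE = {cate[0]:.2f} (x=0), {cate[1]:.2f} (x=1)")
```

Under these assumed numbers the population ATE is 2, while the ATT is 2.6 because 80% of the treated come from the high-effect subgroup. The sample averages should land close to those values, which shows how selective assignment pulls the ATT away from the ATE.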

SUTVA: the Stable Unit Treatment Value Assumption

For the potential-outcomes framework to be well-defined, we need TWO assumptions bundled under SUTVA (Rubin 1980):

No interference between units. Unit $i$'s potential outcomes depend only on $T_i$, not on the treatments assigned to others. Violated by network effects (vaccination protecting unvaccinated neighbours), market equilibrium (a job-training program flooding the labour market), or contagion.
No hidden variations in treatment. There is ONE version of $T = 1$. $Y_i(1)$ is a single value, not a distribution over different possible treatments. Violated when "treatment" is ambiguous (a vague policy intervention administered differently across sites).

SUTVA is rarely perfectly satisfied. The honest response: pre-specify what counts as a treatment, design the study to minimise spillover, and report sensitivity analyses for the assumption.
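The no-interference condition can be illustrated with a toy outcome model. The function `outcome` and its `spillover` parameter are hypothetical, invented here purely for illustration: a unit gains 2 from its own treatment, plus `spillover` times the share of other units that are treated.

```python
import numpy as np

def outcome(i, assignment, spillover):
    """Hypothetical outcome for unit i under a full assignment vector.

    Own treatment adds 2; the treated share among the OTHER units adds
    `spillover` times that share. spillover = 0 means no interference.
    """
    assignment = np.asarray(assignment, dtype=float)
    others = np.delete(assignment, i)
    return float(2.0 * assignment[i] + spillover * others.mean())

only_me = [1, 0, 0, 0]     # unit 0 treated, nobody else
everyone = [1, 1, 1, 1]    # unit 0 treated, everybody else too

# No interference: unit 0's treated outcome is one number.
no_spill = (outcome(0, only_me, 0.0), outcome(0, everyone, 0.0))      # (2.0, 2.0)

# Interference: "Y_0(1)" depends on what the others received.
with_spill = (outcome(0, only_me, 1.5), outcome(0, everyone, 1.5))    # (2.0, 3.5)

print(no_spill, with_spill)
```

With spillover switched on, unit 0 is treated in both scenarios yet has two different outcomes, so writing a single $Y_0(1)$ is no longer meaningful. That is the sense in which SUTVA is needed for the notation itself.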

Association ≠ causation, formally

What we OBSERVE from a study:

$$E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0] = E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0].$$

What we WANT (for example, the ATE):

$$\tau_{\text{ATE}} = E[Y_i(1)] - E[Y_i(0)].$$

The two expressions average the same potential outcomes over different groups. The ATE averages $Y(1)$ and $Y(0)$ over the whole population; the observed contrast averages $Y(1)$ over the TREATED only and $Y(0)$ over the CONTROLS only. The two coincide whenever $E[Y(1) \mid T=1] = E[Y(1)]$ and $E[Y(0) \mid T=0] = E[Y(0)]$, which is guaranteed if $T$ is independent of $(Y(0), Y(1))$. When that independence holds, observed association equals the ATE. When it fails, the two generally differ, and the gap is the bias due to CONFOUNDING.
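One way to see where the gap comes from is to add and subtract $E[Y_i(0) \mid T_i = 1]$, the mean untreated outcome of the treated, which is never observed. This is a standard decomposition, written here in the notation of this section:

$$
\begin{aligned}
E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]
&= E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0] \\
&= \underbrace{E[Y_i(1) - Y_i(0) \mid T_i = 1]}_{\tau_{\text{ATT}}}
 + \underbrace{E[Y_i(0) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]}_{\text{selection bias}}.
\end{aligned}
$$

The second term is zero when treated and control units would have had the same average outcome without treatment. If the treated would have done better even untreated, the observed contrast overstates the ATT; if they would have done worse, it understates it.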

Random assignment as the identifying device

If a coin flip decides $T_i$, then $T_i$ is INDEPENDENT of the unit's potential outcomes $(Y_i(0), Y_i(1))$. The coin doesn't know what those values are. Under independence,

$$E[Y_i(1) \mid T_i = 1] = E[Y_i(1)] \quad \text{and} \quad E[Y_i(0) \mid T_i = 0] = E[Y_i(0)],$$

so the observed difference IS the ATE. This is why randomised controlled trials are the gold standard. §6.2 develops RCT design and analysis. §§6.3-6.8 develop strategies for the much harder observational-data case where $T$ is decided by something other than a fair coin.
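The identification argument can be checked numerically. The sketch below is a simulation under assumed settings, not a description of any particular study: potential outcomes are drawn from normal distributions with a true ATE of 1, and the self-selection rule (a logistic function of $Y(0)$ with a hypothetical `selection_strength` parameter) is just one convenient way to make treatment depend on potential outcomes.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n = 200_000
true_ate = 1.0

# Hypothetical potential outcomes with heterogeneous individual effects.
y0 = rng.normal(0.0, 1.0, size=n)
y1 = y0 + true_ate + rng.normal(0.0, 0.5, size=n)

def diff_in_means(t):
    """Observed association: mean(Y | T=1) - mean(Y | T=0)."""
    y = np.where(t, y1, y0)          # only one potential outcome is revealed
    return y[t].mean() - y[~t].mean()

# Regime 1: a fair coin decides treatment, independent of (Y(0), Y(1)).
t_random = rng.random(n) < 0.5

# Regime 2: self-selection. Units with a higher Y(0) are more likely to
# take the treatment (assumed logistic rule).
selection_strength = 1.0
p_select = 1.0 / (1.0 + np.exp(-selection_strength * y0))
t_self = rng.random(n) < p_select

est_random = diff_in_means(t_random)
est_self = diff_in_means(t_self)

print(f"true ATE            : {true_ate:.3f}")
print(f"random assignment   : {est_random:.3f}")
print(f"self-selected       : {est_self:.3f}")
```

Under the coin flip the difference in means should sit close to the true ATE. Under self-selection it should be clearly larger, because the treated group would have had higher outcomes even without treatment.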

See it for yourself

The widget below shows the fundamental problem directly. Each unit has two potential outcomes; the widget reveals both in "cheat mode" but in practice only one is observable per unit. Compare the two treatment-assignment regimes — random vs self-selected — and watch the observed-association statistic diverge from the true ATE under self-selection.

Try it

Start in random assignment mode with N = 30. Toggle "show counterfactuals" on. Verify visually that some treated units would have been BETTER off without treatment (their Y(0) sits ABOVE their Y(1)): heterogeneous treatment effects are real. The observed association tracks the true ATE set by the slider, up to sampling noise.
Set true ATE to 0. Hit "re-sample units" several times. Note that observed association is approximately 0 but NOT exactly 0 — sampling variability around the truth, which is the right kind of noise.
Switch to self-selected mode with selection strength 0.3. Re-sample. The observed association now SYSTEMATICALLY OVERSTATES the ATE. The bias is consistent across re-samples — not noise, but confounding.
Crank selection strength up to 1.5. Watch the bias balloon. The observed association can be 2-3× the true ATE, and in some seeds it reports a positive effect when the true ATE is zero or negative.
Slide N from 30 to 200. Note that larger N tightens sampling variability in BOTH regimes but does NOT remove the bias under self-selection. Big data + confounding is still confounding; sample size cannot fix structural problems.
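The last step of the exercise can also be reproduced offline. The sketch below does not reproduce the widget's internals; the `simulate` function, its logistic selection rule, and its `strength` parameter are assumptions made for illustration. It sets the true ATE to zero, re-samples many times at several sample sizes, and records the mean and spread of the observed association in each regime.

```python
import numpy as np

def simulate(n, self_selected, rng, true_ate=0.0, strength=1.0):
    """One re-sample: return the observed difference in means.

    Assumes a homogeneous effect and a logistic self-selection rule;
    both are simplifications chosen for this sketch.
    """
    y0 = rng.normal(0.0, 1.0, size=n)
    y1 = y0 + true_ate
    if self_selected:
        p = 1.0 / (1.0 + np.exp(-strength * y0))   # higher Y(0) -> more uptake
    else:
        p = np.full(n, 0.5)                        # fair coin
    t = rng.random(n) < p
    if t.all() or (~t).all():                      # one arm empty: undefined
        return np.nan
    y = np.where(t, y1, y0)
    return y[t].mean() - y[~t].mean()

rng = np.random.default_rng(7)  # arbitrary seed
summary = {}
for n in (30, 200, 5000):
    for self_selected in (False, True):
        draws = np.array([simulate(n, self_selected, rng) for _ in range(400)])
        summary[(n, self_selected)] = (np.nanmean(draws), np.nanstd(draws))

for (n, self_selected), (mean, sd) in summary.items():
    regime = "self-selected" if self_selected else "random"
    print(f"N={n:5d}  {regime:13s}  mean={mean:+.3f}  sd={sd:.3f}")
```

The spread across re-samples shrinks as N grows in both regimes. The mean stays near zero under random assignment but stays well above zero under self-selection, however large N becomes.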

If a study compares mortality between hospitalised patients who were given an experimental drug and those who were not, and finds that the drug group had higher mortality, can you conclude the drug is harmful? Why or why not?

What you now know

Causal effects live in the space of potential outcomes Y(1) and Y(0); only one of these is ever observed per unit. The ATE, ATT, and CATE are different population averages of unit-level causal effects. SUTVA is the implicit assumption that lets us write the framework down. Random assignment makes treatment INDEPENDENT of potential outcomes, which is precisely what we need for observed association to equal the causal effect. The next seven sections of Part 6 take this foundation and build the practical machinery: RCTs (§6.2), DAG-based confounder adjustment (§6.3), propensity scores (§6.4), instrumental variables (§6.5), RDD (§6.6), DiD (§6.7), and sensitivity analysis (§6.8).

References

Rubin, D.B. (1974). "Estimating causal effects of treatments in randomized and nonrandomized studies." J. Educational Psychology 66(5), 688–701. (The foundational potential-outcomes paper for observational + randomised studies.)
Neyman, J. (1923, reprinted 1990). "On the application of probability theory to agricultural experiments." Statistical Science 5(4), 465–472. (Neyman's original potential-outcomes framework, in a Polish agricultural-experiments paper that anticipated Rubin's formalisation by 50 years.)
Holland, P.W. (1986). "Statistics and causal inference." JASA 81(396), 945–960. (The classic essay coining "fundamental problem of causal inference" and surveying the field's philosophical foundations.)
Imbens, G.W., Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press. (The canonical modern textbook treatment of potential outcomes, SUTVA, and causal estimands.)
Hernán, M.A., Robins, J.M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. (Modern applied treatment, free PDF widely available, used in epidemiology / biostatistics graduate programs worldwide.)