Randomised controlled trials, designed right
Learning objectives
- Apply random assignment to break the dependence between T and (Y(0), Y(1))
- Use blocked (stratified) randomisation to guarantee balance on the blocked covariates
- Estimate the ATE as a simple difference in means and report its SE
- Apply Lin (2013) regression adjustment for variance reduction while remaining consistent for the ATE under model mis-specification
- Pre-register the primary analysis to prevent specification searches
- Identify common RCT pathologies: non-compliance, attrition, contamination, Hawthorne effect, outcome switching
The randomised controlled trial is the gold standard for causal inference. Random assignment severs the dependence between treatment and unmeasured confounders — the property that turns observed association into the ATE. But the gold standard is not the easy standard. Real RCTs face design choices, finite-sample randomness, and a roster of pathologies that can wreck a trial that gets the randomisation right. §6.2 covers the design and analysis honestly.
Why randomisation identifies the ATE
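If treatment is assigned by a coin flip (or any fair mechanism independent of unit characteristics), then $T$ is statistically independent of the potential outcomes $(Y(0), Y(1))$. Under independence:
$$E[Y \mid T=1] - E[Y \mid T=0] = E[Y(1) \mid T=1] - E[Y(0) \mid T=0] = E[Y(1)] - E[Y(0)] = \text{ATE}$$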
The observed difference in means equals the ATE up to sampling noise. No modelling assumption about confounders. No back-door criterion to satisfy. The coin flip does the work.
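The identification argument can be checked with a small simulation. The sketch below is illustrative only: the data-generating process, the confounder `u`, and the helper `diff_in_means` are assumptions made for this example, not part of the section. It compares a coin-flip assignment with a self-selected one on the same potential outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_ate = 5.0

# Hypothetical data-generating process: u is an unmeasured confounder
# that shifts the baseline outcome.
u = rng.normal(size=n)
y0 = 10.0 + 3.0 * u + rng.normal(size=n)  # potential outcome under control
y1 = y0 + true_ate                         # potential outcome under treatment


def diff_in_means(t, y0, y1):
    """Observed difference in means; only Y(T) is seen for each unit."""
    y = np.where(t == 1, y1, y0)
    return y[t == 1].mean() - y[t == 0].mean()


# Coin-flip assignment: T is independent of (Y(0), Y(1)).
t_random = rng.integers(0, 2, size=n)

# Self-selection: units with high u are more likely to take treatment,
# so T depends on the potential outcomes through u.
t_selected = (u + rng.normal(size=n) > 0).astype(int)

est_random = diff_in_means(t_random, y0, y1)
est_selected = diff_in_means(t_selected, y0, y1)
print(f"randomised:    {est_random:.2f}")
print(f"self-selected: {est_selected:.2f}")
```

Under the coin flip the estimate should sit close to the true ATE of 5, while under self-selection it is pulled upward because treated units have systematically higher baselines.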
The simple ATE estimator
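For $n_1$ treated units and $n_0$ controls:
$$\hat{\tau} = \bar{Y}_1 - \bar{Y}_0 = \frac{1}{n_1}\sum_{i:\,T_i=1} Y_i - \frac{1}{n_0}\sum_{i:\,T_i=0} Y_i$$
This is an unbiased estimator of the ATE. Its standard error comes from the two-sample variance formula, $\widehat{SE}(\hat{\tau}) = \sqrt{s_1^2/n_1 + s_0^2/n_0}$ with $s_1^2$ and $s_0^2$ the sample variances in each arm, or from its bootstrap analogue. For a single binary outcome, this reduces to the proportion test from §2.3 applied to the two arms.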
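A minimal sketch of the estimator and its two-sample standard error follows. The function name `ate_diff_in_means` and the eight data points are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def ate_diff_in_means(y, t):
    """Difference in means and its two-sample (unpooled) standard error."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    y1, y0 = y[t == 1], y[t == 0]
    tau_hat = y1.mean() - y0.mean()
    # ddof=1 gives the sample variance within each arm.
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return tau_hat, se


# Hypothetical toy data: four treated units, four controls.
y = [12.0, 15.0, 11.0, 14.0, 8.0, 9.0, 10.0, 7.0]
t = [1, 1, 1, 1, 0, 0, 0, 0]

tau_hat, se = ate_diff_in_means(y, t)
# Normal-approximation interval; crude at this sample size.
ci = (tau_hat - 1.96 * se, tau_hat + 1.96 * se)
print(f"ATE estimate {tau_hat:.2f}, SE {se:.3f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

With so few units a t-based or randomisation-based interval would usually be preferred; the normal interval is kept here only for brevity.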
Block (stratified) randomisation
Pure simple randomisation can give imbalanced groups by chance — especially with small N. If 60% of the treated arm but 40% of the control arm happens to be female, the male/female imbalance adds noise (and creates suspicion of "manipulated randomisation" even when it's honest).
BLOCKED randomisation: stratify on key covariates (sex, age band, baseline severity) before randomising, then randomise WITHIN each stratum. This guarantees balance on the blocked variables. Stratification is standard in modern RCTs; high-quality trials rarely rely on unstratified randomisation when strong prognostic covariates are known in advance.
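One way to implement within-stratum assignment is sketched below. The helper `block_randomise` is a hypothetical name, and sending the leftover unit of an odd-sized stratum to control is a simplifying assumption; a production scheme would typically randomise that leftover as well.

```python
import numpy as np


def block_randomise(strata, rng):
    """Assign half of each stratum to treatment (1), the rest to control (0)."""
    strata = np.asarray(strata)
    t = np.zeros(strata.size, dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        n_treat = idx.size // 2  # odd stratum: the extra unit goes to control
        treated = rng.choice(idx, size=n_treat, replace=False)
        t[treated] = 1
    return t


rng = np.random.default_rng(42)
sex = np.array(["F"] * 20 + ["M"] * 20)  # hypothetical blocking variable
t = block_randomise(sex, rng)

for s in ("F", "M"):
    in_s = sex == s
    print(s, "treated:", t[in_s].sum(), "control:", (1 - t[in_s]).sum())
```

Whatever the seed, each stratum contributes exactly 10 treated and 10 control units, which is the balance guarantee described above.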
Lin's (2013) regression adjustment
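For variance reduction, regress the outcome on the treatment AND pre-treatment covariates, including treatment × covariate interactions, with the covariates centred at their sample means:
$$Y_i = \alpha + \tau T_i + \beta^\top (X_i - \bar{X}) + \gamma^\top T_i (X_i - \bar{X}) + \varepsilon_i$$
The coefficient $\hat{\tau}$ on $T_i$ is the regression-adjusted ATE estimator. Under randomisation it is CONSISTENT for the ATE regardless of how good the covariate model is; any finite-sample bias is of order 1/N and vanishes as the sample grows. Its asymptotic variance is at-or-better-than that of the simple difference in means. This robustness to model mis-specification is Lin's key contribution (in response to Freedman's 2008 critique of mis-specified regression in RCTs).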
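The fully interacted regression can be sketched with plain least squares. The function `lin_estimator` and the simulated data are assumptions for illustration. The sketch returns only the point estimate; Lin recommends pairing it with heteroskedasticity-robust (sandwich) standard errors, which are omitted here.

```python
import numpy as np


def lin_estimator(y, t, x):
    """OLS of y on [1, t, xc, t*xc] with xc = x - mean(x); returns the t coefficient."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    xc = x - x.mean(axis=0)  # centring makes the t coefficient the ATE estimate
    design = np.column_stack([np.ones_like(t), t, xc, t[:, None] * xc])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


# Hypothetical simulated trial: one prognostic covariate, true ATE = 5.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
t = rng.permutation(np.repeat([0, 1], n // 2))
y = 2.0 + 5.0 * t + 3.0 * x + 1.0 * t * x + rng.normal(size=n)

simple = y[t == 1].mean() - y[t == 0].mean()
adjusted = lin_estimator(y, t, x)
print(f"difference in means: {simple:.2f}")
print(f"Lin-adjusted:        {adjusted:.2f}")
```

Centring matters: without it, the coefficient on `t` would be the treatment effect at `x = 0` rather than the average effect over the sample.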
Pre-registration
The biggest threat to RCT credibility is post-hoc analysis: deciding the primary outcome AFTER seeing the data. Pre-registration locks the analysis plan BEFORE looking at results: primary outcome, statistical test, multiple-comparison correction, subgroup analyses, and missing-data handling. Many major journals and funding bodies now require it.
Common RCT pathologies
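- Non-compliance: treated subjects don't take the treatment; controls cross over and obtain it. Intent-to-treat (ITT) analysis, which analyses units according to ASSIGNED treatment regardless of compliance, preserves randomisation's benefits but estimates the effect of assignment, which is typically diluted relative to the effect of actually receiving treatment. A per-protocol comparison of those who complied is no longer protected by randomisation. To recover the effect of treatment among compliers, use the IV machinery of §6.5 with randomised assignment as the instrument.
- Attrition: outcomes missing for some units. If outcomes are missing completely at random (missingness independent of the potential outcomes and of assignment), the complete-case estimate is unbiased, though less precise. If missingness is non-random (dropout related to the outcome, or differing between arms), the estimate is biased.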
- Contamination: control units inadvertently receive treatment via spillover. Use cluster RCTs to isolate.
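- Hawthorne effect: knowing you're in a study changes behaviour. Blinding (single-blind: subjects don't know their arm; double-blind: subjects AND researchers don't know) does not remove the effect, but it mitigates the damage by keeping the behaviour change from differing between arms.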
- Outcome switching: changing the primary outcome after seeing data. Prevented by pre-registration.
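For the non-compliance case, the ITT estimate and the simplest instrumental-variable (Wald) ratio can be sketched as follows. Everything here is an illustrative assumption: the function `itt_and_wald`, the 60% compliance rate, and a one-sided setup with no control crossover. The ratio is interpretable as the effect among compliers only under the usual IV conditions (assignment affects the outcome only through treatment received, and there are no defiers).

```python
import numpy as np


def itt_and_wald(y, z, d):
    """z = assigned arm, d = treatment actually received, y = outcome."""
    y, z, d = (np.asarray(a, dtype=float) for a in (y, z, d))
    itt_y = y[z == 1].mean() - y[z == 0].mean()  # effect of assignment on outcome
    itt_d = d[z == 1].mean() - d[z == 0].mean()  # effect of assignment on uptake
    return itt_y, itt_d, itt_y / itt_d           # Wald ratio: complier effect


# Hypothetical trial: 60% compliers, the rest never take treatment.
rng = np.random.default_rng(7)
n = 20_000
z = rng.integers(0, 2, size=n)
complier = rng.random(n) < 0.6
d = z * complier                      # treatment received only if assigned AND complier
y = 10.0 + 5.0 * d + rng.normal(size=n)

itt_y, itt_d, cace = itt_and_wald(y, z, d)
print(f"ITT effect on outcome: {itt_y:.2f}")
print(f"compliance difference: {itt_d:.2f}")
print(f"Wald / complier effect: {cace:.2f}")
```

The ITT estimate is diluted toward zero by the non-compliers, and dividing by the compliance difference rescales it to the units who actually responded to assignment.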
Try it
- Start with N = 40, simple randomisation, true ATE = 5. Hit re-randomise several times. Note that stratum-A fractions in the two arms DRIFT — some seeds give 0.65 / 0.35 imbalance, others give 0.45 / 0.55. The imbalance ranges widely in small N.
- Switch to block randomisation. Re-randomise several times. Stratum-A fractions are now EXACTLY 0.5 / 0.5 in both arms (within rounding). Blocking eliminates chance imbalance on the blocked variable entirely.
- With block randomisation and N = 40, the ATE estimate typically lands closer to the true ATE than under simple randomisation: both designs are unbiased, but the imbalance noise was costing precision. Cycle through re-randomisations and confirm.
- Crank N to 200 under simple randomisation. The imbalance shrinks (law of large numbers). The benefit of blocking is greatest in small samples.
- Set true ATE = 0. Under simple randomisation at N = 20, observed effect can be ±3 just from imbalance. Under blocked: closer to 0. Without blocking and with small N, you risk reporting a spurious effect — exactly why blocking matters for high-stakes trials.
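The comparison in these steps can also be reproduced offline. The sketch below is a rough stand-in for the interactive exercise, not its actual implementation: the outcome model, the stratum effect of 6, and the number of repetitions are assumptions, and "simple" randomisation is implemented as a random 20/20 split that ignores the stratum.

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_ate, reps = 40, 5.0, 2000

stratum = np.repeat([0, 1], n // 2)              # 1 = stratum A, 0 = stratum B
y0 = 10.0 + 6.0 * stratum + rng.normal(size=n)   # stratum shifts the baseline
y1 = y0 + true_ate                               # constant treatment effect


def simple_assign():
    """Random 20/20 split that ignores the stratum."""
    return rng.permutation(np.repeat([0, 1], n // 2))


def block_assign():
    """Random half of each stratum goes to treatment."""
    t = np.zeros(n, dtype=int)
    for s in (0, 1):
        idx = np.flatnonzero(stratum == s)
        t[rng.choice(idx, size=idx.size // 2, replace=False)] = 1
    return t


def estimate(t):
    """Difference in means on the outcomes revealed by assignment t."""
    y = np.where(t == 1, y1, y0)
    return y[t == 1].mean() - y[t == 0].mean()


simple = np.array([estimate(simple_assign()) for _ in range(reps)])
blocked = np.array([estimate(block_assign()) for _ in range(reps)])
print(f"simple : mean {simple.mean():.2f}, sd {simple.std(ddof=1):.2f}")
print(f"blocked: mean {blocked.mean():.2f}, sd {blocked.std(ddof=1):.2f}")
```

Both designs should centre on the true ATE; the difference shows up in the spread across re-randomisations, which is where blocking earns its keep.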
A drug trial randomises 40 patients but gets unlucky: 14 of the 20 treated are male, only 6 of 20 controls are male. Sex affects baseline outcome heavily. Would Lin (2013) regression adjustment INCLUDING sex as a covariate fix this, or do you need to re-randomise?
What you now know
The RCT is the canonical causal-inference design. Randomisation breaks the link between treatment and confounders; the simple difference-in-means is an unbiased ATE estimator. Blocked randomisation guarantees balance on the blocked covariates and reduces variance, especially in small samples. Lin (2013) regression adjustment further reduces variance while remaining consistent for the ATE even when the covariate model is mis-specified. Pre-registration is the standard guard against specification searches. Non-compliance, attrition, contamination, the Hawthorne effect, and outcome switching are the practical pathologies, and each has a known mitigation. §6.3 turns to OBSERVATIONAL data, where the coin flip isn't available and identification requires assumptions about confounders.
References
- Fisher, R.A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. (The foundational RCT-analysis text; introduced randomisation as the basis of inference.)
- Lin, W. (2013). "Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique." Annals of Applied Statistics 7(1), 295–318. (The modern justification for covariate-adjusted RCT analysis.)
- Athey, S., Imbens, G.W. (2017). "The econometrics of randomized experiments." In Handbook of Economic Field Experiments Vol. 1, Elsevier. (Comprehensive modern survey of RCT design and analysis.)
- Imbens, G.W., Rubin, D.B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press. (Chapters 6–7 develop the RCT analysis machinery with full Neyman-Rubin formalism.)
- Cox, D.R. (1958). Planning of Experiments. New York: Wiley. (Classic treatise on experimental design including blocking, randomisation schemes, factorial designs.)