Causal forests and double ML
Learning objectives
- State the DOUBLE/DEBIASED ML framework (Chernozhukov et al. 2017)
- Recognise cross-fitting as the essential ingredient that combines ML flexibility with valid inference
- Apply DOUBLE ML to partially linear models: estimate τ by fitting the nuisance functions with ML, then regressing outcome residuals on treatment residuals
- Describe CAUSAL FORESTS (Wager-Athey 2018; Athey-Wager 2019) for heterogeneous treatment effects
- Recognise modern applied econometrics tools: doubleml, EconML, grf
The chapter has covered ML for prediction (§§9.1–9.5). The natural question: can ML also help with causal inference? In high-dimensional settings (10+ confounders), classical OLS adjustment requires strong functional-form assumptions about how X affects Y. Modern hybrid methods — DOUBLE ML (Chernozhukov et al. 2017) and CAUSAL FORESTS (Athey-Wager 2019) — bring ML's flexibility to causal inference WITHOUT sacrificing valid inference.
The setup
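Consider the partially linear model:
$$Y = \tau\,T + g(X) + \varepsilon, \qquad E[\varepsilon \mid T, X] = 0,$$
$$T = m(X) + \nu, \qquad E[\nu \mid X] = 0,$$
where $T$ is the treatment, $\tau$ the (homogeneous) causal effect, $X$ a (possibly high-dimensional) vector of confounders, and $g$ and $m$ are unknown smooth functions. Classical approaches use parametric specifications of $g$ (e.g., linear in X). When X is high-dimensional or the relationships are nonlinear, this can fail: a misspecified $g$ leaves part of the confounding in the error term and biases the estimate of $\tau$.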
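A small simulation makes the failure concrete. The sketch below assumes illustrative functional forms that are not from the text: $g(X) = 2X_0^2 + X_1$ and $m(X) = X_0^2 + 0.5X_1$ (indices follow the code's zero-based columns), five standard Normal confounders, standard Normal noise and $\tau = 1$. A regression of Y on T alone absorbs $g$ into the slope; for this design the population slope is $1 + \mathrm{Cov}(T, g)/\mathrm{Var}(T) = 1 + 4.5/3.25 \approx 2.38$.

```python
import numpy as np

def simulate_plm(n, tau=1.0, p=5, seed=0):
    """Draw (Y, T, X) from a partially linear model with nonlinear g and m.

    Illustrative (assumed) choices: g(X) = 2*X0**2 + X1, m(X) = X0**2 + 0.5*X1.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    m = X[:, 0] ** 2 + 0.5 * X[:, 1]        # m(X) = E[T | X]
    g = 2.0 * X[:, 0] ** 2 + X[:, 1]        # g(X): direct effect of X on Y
    T = m + rng.normal(size=n)              # T = m(X) + nu
    Y = tau * T + g + rng.normal(size=n)    # Y = tau*T + g(X) + eps
    return Y, T, X

Y, T, X = simulate_plm(5000)
# Naive regression of Y on T (with intercept) ignores g(X) entirely
naive_slope = np.polyfit(T, Y, 1)[0]
print(f"naive slope: {naive_slope:.2f}  (true tau = 1.0)")
```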
The double ML idea
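The key observation (Robinson 1988, made modern by Chernozhukov et al. 2017): if we know the conditional means $\ell(X) = E[Y \mid X]$ and $m(X) = E[T \mid X]$, we can FRISCH-WAUGH the system. Since $\ell(X) = \tau\,m(X) + g(X)$, subtracting it from $Y$ eliminates $g$:
$$Y - \ell(X) = \tau\,\bigl(T - m(X)\bigr) + \varepsilon.$$
Equivalently: take residuals of Y on X and of T on X, then regress the Y-residuals on the T-residuals; the slope IS $\tau$. The trick: ESTIMATE $\hat\ell$ and $\hat m$ via ANY ML method (random forest, gradient boosting, neural net, lasso), compute the residuals, then run the linear regression.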
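The residual-on-residual identity can be checked numerically when the conditional means are known exactly. The sketch below uses an assumed design (not from the text) with $g(X) = 2X_0^2 + X_1$, $m(X) = X_0^2 + 0.5X_1$ and $\tau = 1$, and computes $\ell(X) = E[Y \mid X] = \tau\,m(X) + g(X)$ directly from the true functions, an oracle step that is only possible in simulation. The no-intercept slope of the Y-residuals on the T-residuals then lands close to $\tau$, with sampling error of order $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 5000, 1.0
X = rng.normal(size=(n, 5))
m = X[:, 0] ** 2 + 0.5 * X[:, 1]             # m(X) = E[T | X]
g = 2.0 * X[:, 0] ** 2 + X[:, 1]             # g(X)
T = m + rng.normal(size=n)
Y = tau * T + g + rng.normal(size=n)

ell = tau * m + g                            # ell(X) = E[Y | X], oracle (simulation only)
y_res = Y - ell                              # Y-residual: tau * t_res + eps
t_res = T - m                                # T-residual: nu
tau_hat = (t_res @ y_res) / (t_res @ t_res)  # slope of residual-on-residual regression
print(f"residual-on-residual slope: {tau_hat:.3f}")
```

In practice the two oracle lines are replaced by ML predictions, and that substitution is what makes the next step necessary.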
Cross-fitting: the essential ingredient
Naively plugging ML estimates $\hat\ell$ and $\hat m$ into the residuals introduces BIAS: the ML fits are trained on the same data used in the residual regression, so they partly fit the very noise the residuals are supposed to keep. Chernozhukov et al. solve this via CROSS-FITTING:
- Split data into K folds.
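- For each fold $k$, train $\hat\ell \approx E[Y \mid X]$ and $\hat m \approx E[T \mid X]$ on the OTHER K-1 folds.
- Use these fits to compute residuals $\tilde Y = Y - \hat\ell(X)$ and $\tilde T = T - \hat m(X)$ on the held-out fold $k$.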
- Pool all out-of-fold residuals, do the final linear regression.
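The four steps translate almost line by line into code. This is a minimal numpy sketch, not the doubleml package: the nuisance learner is a hypothetical ridge regression on an intercept, linear and squared terms (`basis` and `ridge_fit_predict` are illustrative names), chosen so the example needs nothing beyond numpy; any regressor with fit/predict behaviour could replace it. The simulated design ($g(X) = 2X_0^2 + X_1$, $m(X) = X_0^2 + 0.5X_1$, $\tau = 1$) is also an assumption.

```python
import numpy as np

def basis(X):
    """Hypothetical feature map for the nuisance learner: intercept, X and X**2."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression on basis(X); a stand-in for any ML learner (forest, boosting, ...)."""
    A = basis(X_train)
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0                      # leave the intercept unpenalised
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y_train)
    return basis(X_test) @ beta

def dml_plm(Y, T, X, K=5, seed=0):
    """Cross-fitted partialling-out estimate of tau in Y = tau*T + g(X) + eps."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % K   # balanced random fold labels
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(K):
        held_out, train = folds == k, folds != k
        # Nuisances fit on the other K-1 folds; residuals computed on fold k only
        y_res[held_out] = Y[held_out] - ridge_fit_predict(X[train], Y[train], X[held_out])
        t_res[held_out] = T[held_out] - ridge_fit_predict(X[train], T[train], X[held_out])
    # Pool all out-of-fold residuals and run one residual-on-residual regression
    return (t_res @ y_res) / (t_res @ t_res)

# Usage on simulated data (assumed DGP)
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
T = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
Y = 1.0 * T + 2.0 * X[:, 0] ** 2 + X[:, 1] + rng.normal(size=n)
tau_hat = dml_plm(Y, T, X)
print(f"cross-fitted DML estimate of tau: {tau_hat:.3f}")
```

Here the squared terms make the ridge basis rich enough to contain the true $E[Y \mid X]$ and $E[T \mid X]$; with a poorer learner the nuisance errors are larger, and orthogonality only protects $\hat\tau$ while those errors still shrink fast enough.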
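Cross-fitting separates the data used for nuisance estimation from the data used for the final regression, which removes the overfitting bias that arises when a flexible learner is evaluated on its own training points. NEYMAN ORTHOGONALITY of the moment condition handles the remaining regularisation bias: the score is insensitive to first order to small errors in $\hat\ell$ and $\hat m$, so those errors do not distort the asymptotic distribution of $\hat\tau$. Result: VALID CIs from the usual formula, with no extra correction for nuisance estimation, even when the nuisances are estimated by black-box ML.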
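One way to see NEYMAN ORTHOGONALITY for the partially linear model is to differentiate the moment condition in the direction of a nuisance error. Write $\ell(X) = E[Y \mid X]$, $m(X) = E[T \mid X]$, $\nu = T - m(X)$ and $\varepsilon$ for the structural error, and perturb the true nuisances $(\ell_0, m_0)$ along arbitrary functions $\delta_\ell, \delta_m$ (notation introduced here for illustration):

```latex
\psi(W;\tau,\ell,m) = \bigl(Y - \ell(X) - \tau\,(T - m(X))\bigr)\,\bigl(T - m(X)\bigr),
\qquad E\bigl[\psi(W;\tau_0,\ell_0,m_0)\bigr] = 0,

\frac{\partial}{\partial r}\,
E\bigl[\psi(W;\tau_0,\ \ell_0 + r\,\delta_\ell,\ m_0 + r\,\delta_m)\bigr]\Big|_{r=0}
= E\bigl[(\tau_0\,\delta_m(X) - \delta_\ell(X))\,\nu\bigr] - E\bigl[\varepsilon\,\delta_m(X)\bigr] = 0.
```

Both terms vanish because $E[\nu \mid X] = 0$ and $E[\varepsilon \mid X] = 0$, so nuisance errors enter $\hat\tau$ only through second-order (product) terms.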
The big result
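Theorem (Chernozhukov et al. 2017, simplified): if the nuisance estimates $\hat\ell$ and $\hat m$ converge at rate $o_P(n^{-1/4})$ (slower than the parametric $n^{-1/2}$; strictly, the product of the two errors must be $o_P(n^{-1/2})$), then $\hat\tau$ is $\sqrt{n}$-consistent and asymptotically Normal, so the standard CI $\hat\tau \pm 1.96\,\widehat{\mathrm{se}}$ is asymptotically valid. ML's flexibility + classical inference's rigor.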
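The standard-formula CI can be sketched in a few lines. Assuming you already have cross-fitted residuals $\tilde Y = Y - \hat\ell(X)$ and $\tilde T = T - \hat m(X)$, where $\hat\ell$ and $\hat m$ estimate $E[Y \mid X]$ and $E[T \mid X]$, the asymptotic variance of the partialling-out estimator is $\sigma^2 = E[\tilde T^2 \varepsilon^2] / (E[\tilde T^2])^2$, estimated here by sample means with $\hat\varepsilon = \tilde Y - \hat\tau \tilde T$. The function name `dml_ci` and the synthetic residuals in the usage lines are illustrative.

```python
import numpy as np

def dml_ci(y_res, t_res, z=1.96):
    """Point estimate, standard error and CI from cross-fitted residuals."""
    n = len(y_res)
    tau_hat = (t_res @ y_res) / (t_res @ t_res)
    eps_hat = y_res - tau_hat * t_res                     # estimated structural error
    J = np.mean(t_res ** 2)                               # derivative of the score in tau
    sigma2 = np.mean(t_res ** 2 * eps_hat ** 2) / J ** 2  # sandwich, heteroskedasticity-robust
    se = np.sqrt(sigma2 / n)
    return tau_hat, se, (tau_hat - z * se, tau_hat + z * se)

# Usage with synthetic residuals whose true slope is 0.5
rng = np.random.default_rng(0)
n = 4000
t_res = rng.normal(size=n)
y_res = 0.5 * t_res + rng.normal(size=n)
tau_hat, se, (lo, hi) = dml_ci(y_res, t_res)
print(f"tau_hat = {tau_hat:.3f}, se = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With unit-variance residuals and noise, $\sigma^2 = 1$, so the standard error is close to $1/\sqrt{4000} \approx 0.016$.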
Causal forests (Athey-Wager 2019)
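For HETEROGENEOUS treatment effects that vary across individuals, a single estimate of the average τ is insufficient; the target becomes the conditional average treatment effect $\tau(x) = E[Y(1) - Y(0) \mid X = x]$. CAUSAL FORESTS extend random forests:
- Build trees whose splits maximise treatment-effect heterogeneity between the child nodes (rather than minimising prediction error for Y).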
- Each tree estimates a local treatment effect within its leaves.
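- Average across the forest to estimate $\hat\tau(x)$.
- Trees are HONEST: one subsample chooses the splits and a disjoint subsample estimates the leaf effects. Honesty plus subsampling yields asymptotically Normal estimates (Wager-Athey 2018); variance is estimated by the infinitesimal jackknife or, in grf, a bootstrap of little bags, so pointwise CIs are available.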
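The split-on-heterogeneity and honesty ideas can be illustrated with a toy forest. This numpy sketch is NOT the grf algorithm: it assumes a randomised binary treatment (so a difference in means is a valid leaf effect), uses depth-1 trees ("stumps") instead of deep trees, and simply averages leaf effects rather than using forest weights. All function names and the data-generating process, with $\tau(x) = 1$ for $x_0 \le 0$ and $3$ for $x_0 > 0$, are hypothetical.

```python
import numpy as np

def diff_in_means(y, t):
    """Leaf effect under randomised treatment: mean(Y | T=1) - mean(Y | T=0)."""
    return y[t == 1].mean() - y[t == 0].mean()

def fit_honest_stump(X, Y, T, rng):
    """One honest depth-1 causal tree: split chosen on half A, leaf effects estimated on half B."""
    idx = rng.choice(len(Y), size=len(Y) // 2, replace=False)  # subsample without replacement
    A, B = idx[: len(idx) // 2], idx[len(idx) // 2 :]           # disjoint halves (honesty)
    best = None
    for j in range(X.shape[1]):
        for thr in np.quantile(X[A, j], np.linspace(0.1, 0.9, 9)):
            left = X[A, j] <= thr
            tau_l = diff_in_means(Y[A][left], T[A][left])
            tau_r = diff_in_means(Y[A][~left], T[A][~left])
            # Heterogeneity criterion: large when the two children have different effects
            score = left.sum() * (~left).sum() * (tau_l - tau_r) ** 2
            if best is None or score > best[0]:
                best = (score, j, thr)
    _, j, thr = best
    left_B = X[B, j] <= thr                                     # re-estimate on unseen half B
    return (j, thr,
            diff_in_means(Y[B][left_B], T[B][left_B]),
            diff_in_means(Y[B][~left_B], T[B][~left_B]))

def fit_forest(X, Y, T, n_trees=200, seed=0):
    rng = np.random.default_rng(seed)
    return [fit_honest_stump(X, Y, T, rng) for _ in range(n_trees)]

def predict_tau(forest, x):
    """tau_hat(x): average over trees of the effect in the leaf that x falls into."""
    return float(np.mean([tl if x[j] <= thr else tr for j, thr, tl, tr in forest]))

# Illustrative randomised experiment
rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)                        # P(T=1) = 0.5, independent of X
Y = X[:, 1] + (1.0 + 2.0 * (X[:, 0] > 0)) * T + rng.normal(size=n)
forest = fit_forest(X, Y, T)
tau_pos = predict_tau(forest, np.array([1.0, 0.0, 0.0]))
tau_neg = predict_tau(forest, np.array([-1.0, 0.0, 0.0]))
print(f"tau_hat(x0=+1) = {tau_pos:.2f}, tau_hat(x0=-1) = {tau_neg:.2f}")
```

With observational data the leaf estimator would have to adjust for confounding (for example by residualising Y and T first), which is what production implementations do.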
Causal forests are now a standard tool for heterogeneous treatment effects in applied econometrics. Software: the R package grf (Tibshirani, Athey, Wager and collaborators); the Python libraries EconML and causalml.
Other modern causal ML methods
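- R-Learner (Nie-Wager 2021): extends Robinson-style residualisation to heterogeneous effects τ(x) by minimising a squared loss on cross-fitted residuals; with a constant τ it reduces to DML, and it attains quasi-oracle error rates under flexible nuisance models.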
- X-Learner (Künzel et al. 2019): designed for highly imbalanced treatment groups.
- TMLE (Targeted Maximum Likelihood Estimation; van der Laan): semi-parametric efficient estimation with ML nuisances.
- BART (Bayesian additive regression trees): tree-based Bayesian nonparametrics that naturally handle causal inference (e.g., the bartCause R package).
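The R-Learner is easiest to see as a loss function. Nie-Wager (2021) write it with the outcome mean and the propensity score; the sketch below restates it with $\ell(x) = E[Y \mid X = x]$ and $m(x) = E[T \mid X = x]$, where $\hat\ell^{(-k(i))}$ and $\hat m^{(-k(i))}$ are cross-fitted predictions that exclude observation $i$'s fold and $\Lambda_n$ is a regulariser on the function class:

```latex
\hat\tau(\cdot) = \arg\min_{\tau(\cdot)} \left\{ \frac{1}{n} \sum_{i=1}^{n}
  \Bigl[ \bigl(Y_i - \hat\ell^{(-k(i))}(X_i)\bigr)
       - \bigl(T_i - \hat m^{(-k(i))}(X_i)\bigr)\,\tau(X_i) \Bigr]^2
  + \Lambda_n\bigl(\tau(\cdot)\bigr) \right\}
```

Restricting $\tau(x)$ to a constant and dropping $\Lambda_n$ gives back the DML residual-on-residual regression, which is the sense in which the R-Learner generalises DML.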
What this DOESN'T solve
Modern causal ML still requires the IDENTIFICATION assumptions of §6:
- No unobserved confounding: X must include ALL common causes. Double ML cannot fix the absence of an unmeasured confounder.
- Positivity: P(T=1|X) bounded away from 0 and 1.
- SUTVA (stable unit treatment value assumption): no spillovers between units and no hidden versions of the treatment.
What modern ML CAN do: relax the FUNCTIONAL FORM assumption. You no longer need to commit to a particular parametric form for g(X). What ML CANNOT do: substitute for unconfoundedness.
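A short simulation shows why. In the sketch below (an assumed design, not from the text) an unobserved $U$ drives both T and Y and is independent of the observed X, with $T = X_0 + U + \nu$ and $Y = \tau T + X_0 + U + \varepsilon$. Even with the exact conditional means $E[T \mid X]$ and $E[Y \mid X]$ plugged in, so that no ML error is involved at all, the residual-on-residual slope converges to $1 + \mathrm{Cov}(U + \nu, U)/\mathrm{Var}(U + \nu) = 1.5$ instead of $\tau = 1$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, tau = 20000, 1.0
X = rng.normal(size=(n, 3))
U = rng.normal(size=n)                    # unobserved confounder, NOT in X
T = X[:, 0] + U + rng.normal(size=n)
Y = tau * T + X[:, 0] + U + rng.normal(size=n)

# Best possible nuisances given only X (exact, because U is independent of X)
m_X = X[:, 0]                             # E[T | X]
ell_X = tau * X[:, 0] + X[:, 0]           # E[Y | X]
t_res, y_res = T - m_X, Y - ell_X
tau_hat = (t_res @ y_res) / (t_res @ t_res)
print(f"partialling-out estimate with oracle nuisances: {tau_hat:.2f} (true tau = {tau})")
```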
Try it
- Default: τ = 1.0, N = 400. The naive OLS (Y ~ T only) is heavily biased because confounding by X dominates. Full OLS (Y ~ T + X) works in this LINEAR setting; it's the gold standard when the functional form is known. DML recovers τ about as well, with a valid CI.
- Drag N up to 2000. Both OLS and DML converge to the true τ with shrinking CIs. The DML CI width shrinks like 1/√N, the parametric rate, which Neyman orthogonality and cross-fitting preserve despite the ML nuisances.
- Re-sample many times. The naive OLS estimate is consistently biased; the DML 95% CIs cover the true τ about 95% of the time (valid frequentist coverage). The CI, not any single point estimate, is the inferential statement.
- Set true τ = 0. The DML estimate lands near zero and its CI usually includes 0, correctly failing to detect a non-existent effect. Naive OLS still reports a non-zero slope, which is pure confounding bias.
- The widget uses ridge as the nuisance estimator (a simple ML model). In real applications, use random forest, gradient boosting, or neural networks for richer functional forms.
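The widget's exact data-generating process is not shown here, so the following numpy sketch assumes one: five standard Normal confounders entering T and Y linearly with made-up coefficients, τ = 1, N = 400, 5-fold cross-fitting and ridge with penalty 1 as the nuisance learner. It repeats the experiment 300 times and reports the average naive OLS, full OLS and DML estimates plus the empirical coverage of the DML 95% CI. For this design the naive slope should settle near $1 + 2.5/2.25 \approx 2.11$, and the other two near 1.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Linear ridge with an unpenalised intercept (assumed stand-in for the widget's learner)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y_train)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def one_draw(rng, n=400, tau=1.0, K=5):
    """Simulate one linear dataset; return naive OLS, full OLS, DML estimate, CI coverage."""
    X = rng.normal(size=(n, 5))
    T = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)
    Y = tau * T + X @ np.array([2.0, 1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)
    naive = np.polyfit(T, Y, 1)[0]                                            # Y ~ T
    full = np.linalg.lstsq(np.column_stack([np.ones(n), T, X]), Y, rcond=None)[0][1]  # Y ~ T + X
    folds = rng.permutation(n) % K
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(K):
        held_out, train = folds == k, folds != k
        y_res[held_out] = Y[held_out] - ridge_predict(X[train], Y[train], X[held_out])
        t_res[held_out] = T[held_out] - ridge_predict(X[train], T[train], X[held_out])
    tau_hat = (t_res @ y_res) / (t_res @ t_res)
    eps = y_res - tau_hat * t_res
    se = np.sqrt(np.mean(t_res ** 2 * eps ** 2) / np.mean(t_res ** 2) ** 2 / n)
    return naive, full, tau_hat, abs(tau_hat - tau) <= 1.96 * se

rng = np.random.default_rng(5)
naive, full, dml, covered = np.array([one_draw(rng) for _ in range(300)]).T
print(f"mean naive OLS {naive.mean():.2f} | mean full OLS {full.mean():.2f} | "
      f"mean DML {dml.mean():.2f} | DML 95% CI coverage {covered.mean():.2f}")
```

Swapping `ridge_predict` for a random forest or boosting learner leaves the cross-fitting loop unchanged; only the nuisance fits differ.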
An economist has 50 confounders X, suspects nonlinear and interacted effects of X on Y, and wants to estimate the average treatment effect of T on Y. What modern approach is appropriate, and why isn't classical OLS sufficient?
What you now know
Double ML uses cross-fitted ML estimates of nuisance functions to compute debiased treatment effects with valid CIs. Causal forests extend random forests to heterogeneous treatment effects. Modern tools: grf, doubleml, EconML, BART. All still require identification assumptions (unconfoundedness, positivity, SUTVA). §9.7 next: reporting an ML result so the reader can trust it.
References
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2017). "Double/debiased machine learning for treatment and structural parameters." The Econometrics Journal 21(1), C1–C68.
- Wager, S., Athey, S. (2018). "Estimation and inference of heterogeneous treatment effects using random forests." JASA 113(523), 1228–1242.
- Athey, S., Wager, S. (2019). "Estimating treatment effects with causal forests: An application." Observational Studies 5, 37–51.
- Robinson, P.M. (1988). "Root-N-consistent semiparametric regression." Econometrica 56(4), 931–954.
- Nie, X., Wager, S. (2021). "Quasi-oracle estimation of heterogeneous treatment effects." Biometrika 108(2), 299–319.