Regression discontinuity
Learning objectives
- Identify a RUNNING VARIABLE with a SHARP cutoff that deterministically assigns treatment
- Estimate the LATE at the cutoff via LOCAL-LINEAR regression on each side
- Choose BANDWIDTH via Imbens-Kalyanaraman or Calonico-Cattaneo-Titiunik methods
- Diagnose MANIPULATION of the running variable via the McCrary (2008) density test
- Distinguish SHARP from FUZZY RDD
Some treatments are assigned by a DETERMINISTIC RULE: merit scholarships above an SAT-score cutoff, antibiotics above a fever threshold, aid awards above a need-index cutoff. RDD exploits this: comparing units JUST ABOVE and JUST BELOW the cutoff is close to comparing randomly assigned units, because on average they should differ only in treatment status.
Sharp regression discontinuity
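Assume a continuous running variable X and a cutoff c such that
$$T = \mathbb{1}\{X \ge c\}.$$
Treatment is a DETERMINISTIC function of X. The LATE at the cutoff is
$$\tau = \lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x].$$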
The two limits are the conditional means of Y just above and just below the cutoff. If E[Y | X] would be continuous at c in the absence of treatment, so that the only jump at c is the one treatment induces, the difference IS the causal effect of T at the boundary.
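The continuity argument can be written out in potential-outcomes notation. The symbols Y(0) and Y(1) (outcome without and with treatment) are introduced here for illustration and are not used elsewhere in this section; the sketch assumes treatment switches on at X ≥ c and that E[Y(0) | X = x] and E[Y(1) | X = x] are both continuous at c.

```latex
\begin{align*}
Y &= Y(0) + T\,\bigl(Y(1) - Y(0)\bigr), \qquad T = \mathbb{1}\{X \ge c\} \\
\lim_{x \downarrow c} E[Y \mid X = x] &= \lim_{x \downarrow c} E[Y(1) \mid X = x] = E[Y(1) \mid X = c] \\
\lim_{x \uparrow c} E[Y \mid X = x] &= \lim_{x \uparrow c} E[Y(0) \mid X = x] = E[Y(0) \mid X = c] \\
\text{difference of the two limits} &= E[Y(1) - Y(0) \mid X = c]
\end{align*}
```

Above the cutoff only Y(1) is observed and below it only Y(0), so continuity is what lets each one-sided limit stand in for the unobserved potential outcome at c.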
Local-linear regression at the cutoff
To estimate the limits, fit a LOCAL-LINEAR regression on each side of the cutoff within a BANDWIDTH h, i.e. using only observations with |X - c| ≤ h. The intercepts of these fits at X = c are the estimated limits; their difference is the LATE.
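- Local-linear (not higher-order polynomials): local-linear fits behave well at boundary points, which is where the cutoff sits (Hahn, Todd and Van der Klaauw 2001). High-order polynomials tend to over-fit and give erratic estimates at the cutoff (Gelman and Imbens 2019).
- Triangular or rectangular kernel: the triangular kernel down-weights points linearly with their distance from c; the rectangular (uniform) kernel weights all points within the bandwidth equally. Triangular kernel is the modern default.
- Bandwidth h: small h reduces bias (the linear approximation only has to hold close to c) but leaves fewer observations, so variance rises. The Imbens-Kalyanaraman and Calonico-Cattaneo-Titiunik bandwidth-selection methods choose h to balance the two.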
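The three choices above (local-linear fit, triangular kernel, bandwidth h) can be sketched with plain NumPy. This is a minimal illustration rather than a production estimator: the function names, the simulated data-generating process and the fixed h = 1.0 are all assumptions made for the example, and no standard errors are computed.

```python
import numpy as np

def local_linear_intercept(x, y, c, h, side):
    """Kernel-weighted least squares of y on (x - c) using one side of the cutoff.
    Returns the intercept, i.e. the fitted value at x = c."""
    d = x - c
    if side == "right":
        mask = (d >= 0) & (d <= h)
    else:
        mask = (d < 0) & (d >= -h)
    d, yy = d[mask], y[mask]
    w = np.maximum(0.0, 1.0 - np.abs(d) / h)     # triangular kernel weights
    sw = np.sqrt(w)                              # WLS via rescaled OLS
    X = np.column_stack([np.ones_like(d), d]) * sw[:, None]
    beta, *_ = np.linalg.lstsq(X, yy * sw, rcond=None)
    return beta[0]

def sharp_rd_estimate(x, y, c, h):
    """Difference of the right and left intercepts at the cutoff."""
    return (local_linear_intercept(x, y, c, h, "right")
            - local_linear_intercept(x, y, c, h, "left"))

# Hypothetical simulated data: smooth f(x) plus a jump of tau_true at c = 0.
rng = np.random.default_rng(0)
n, tau_true = 4000, 1.5
x = rng.uniform(-3, 3, n)
t = (x >= 0).astype(float)
y = 0.5 * x + 0.2 * x**2 + tau_true * t + rng.normal(0, 1, n)

tau_hat = sharp_rd_estimate(x, y, c=0.0, h=1.0)
print(f"estimated LATE at the cutoff: {tau_hat:.3f} (truth {tau_true})")
```

Fitting the two sides separately lets the slope differ across the cutoff, which a single pooled regression with a treatment dummy would not allow.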
Bandwidth selection
Two modern approaches:
- Imbens-Kalyanaraman (2012): MSE-optimal bandwidth, balances bias and variance.
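- Calonico-Cattaneo-Titiunik (2014): ROBUST bias-corrected inference. At the MSE-optimal bandwidth the smoothing bias is not negligible, so conventional CIs under-cover; CCT estimate and remove that bias and adjust the standard errors to account for the correction.
R: the rdrobust package implements MSE-optimal bandwidth selection and the robust bias-corrected CIs.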
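The trade-off these selectors target can be seen in a small Monte Carlo. The sketch below is NOT the Imbens-Kalyanaraman or Calonico-Cattaneo-Titiunik procedure; it only re-estimates the jump at three hand-picked bandwidths over repeated simulated samples and reports bias and spread. The cubic term in the simulated f(x), the sample size and the bandwidth grid are assumptions chosen to make the curvature bias visible.

```python
import numpy as np

def rd_estimate(x, y, c, h):
    """Sharp-RD estimate: difference of local-linear intercepts at c (triangular kernel)."""
    def intercept(mask):
        d = x[mask] - c
        sw = np.sqrt(np.maximum(0.0, 1.0 - np.abs(d) / h))   # sqrt of kernel weights
        X = np.column_stack([np.ones_like(d), d]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(X, y[mask] * sw, rcond=None)
        return beta[0]
    right = (x >= c) & (x <= c + h)
    left = (x < c) & (x >= c - h)
    return intercept(right) - intercept(left)

rng = np.random.default_rng(1)
tau_true, n, reps = 1.5, 1000, 200
bandwidths = [0.3, 1.0, 2.5]
draws = {h: [] for h in bandwidths}

for _ in range(reps):
    x = rng.uniform(-3, 3, n)
    # Curved f(x): a linear fit is only a good approximation close to the cutoff.
    y = 0.5 * x + 0.2 * x**3 + tau_true * (x >= 0) + rng.normal(0, 1, n)
    for h in bandwidths:
        draws[h].append(rd_estimate(x, y, 0.0, h))

bias = {h: float(np.mean(draws[h]) - tau_true) for h in bandwidths}
sd = {h: float(np.std(draws[h])) for h in bandwidths}
for h in bandwidths:
    print(f"h={h}: bias={bias[h]:+.3f}, sd={sd[h]:.3f}")
```

In this setup the narrow bandwidth should show the largest spread and the wide bandwidth the largest bias; a data-driven selector aims for the h that minimises the sum of squared bias and variance.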
Fuzzy RDD
When the cutoff only PROBABILISTICALLY assigns treatment (e.g., scholarship offers don't all become acceptances), the discontinuity at c shifts the TREATMENT PROBABILITY rather than determining treatment. Use IV machinery (§6.5): the cutoff is an instrument for T. Identifies LATE for the COMPLIERS at the cutoff.
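With the cutoff as the instrument, the fuzzy estimate is a ratio: the jump in the outcome at c divided by the jump in the treatment probability at c. The sketch below illustrates that ratio on simulated data. The take-up probabilities (0.2 below the cutoff, 0.8 above), the constant treatment effect and the helper name `jump_at_cutoff` are assumptions for the example; with a constant effect the complier LATE equals that effect.

```python
import numpy as np

def jump_at_cutoff(x, v, c, h):
    """Discontinuity in E[v | x] at c from local-linear fits on each side (triangular kernel)."""
    def intercept(mask):
        d = x[mask] - c
        sw = np.sqrt(np.maximum(0.0, 1.0 - np.abs(d) / h))
        X = np.column_stack([np.ones_like(d), d]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(X, v[mask] * sw, rcond=None)
        return beta[0]
    right = (x >= c) & (x <= c + h)
    left = (x < c) & (x >= c - h)
    return intercept(right) - intercept(left)

rng = np.random.default_rng(2)
n, tau_true = 20000, 1.5
x = rng.uniform(-3, 3, n)

# Crossing the cutoff raises the probability of treatment; it does not determine it.
p_treat = np.where(x >= 0, 0.8, 0.2)
t = (rng.random(n) < p_treat).astype(float)
y = 0.5 * x + tau_true * t + rng.normal(0, 1, n)

jump_y = jump_at_cutoff(x, y, 0.0, 1.0)   # reduced form: jump in the outcome
jump_t = jump_at_cutoff(x, t, 0.0, 1.0)   # first stage: jump in treatment probability
tau_fuzzy = jump_y / jump_t               # Wald-type ratio

print(f"jump in Y: {jump_y:.3f}, jump in P(T=1): {jump_t:.3f}, fuzzy estimate: {tau_fuzzy:.3f}")
```

The sharp design is the special case where the first-stage jump equals 1, so the ratio reduces to the jump in Y.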
The McCrary (2008) density test for manipulation
RDD assumes units just-above and just-below the cutoff are COMPARABLE. If units can MANIPULATE their X to be on the favourable side (bribing the SAT proctor, fudging the income statement), the density of X has a JUMP at c. Sorted units differ from non-sorted in unobserved ways — comparability fails.
McCrary (2008) provides a density-continuity test: estimate the density of X locally on each side of c; test for equality. Standard diagnostic in every RDD paper.
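A rough feel for what the test looks for can be had from a much cruder check. The sketch below is NOT McCrary's estimator, which smooths a fine histogram with local-linear fits on each side and tests the log difference in density heights. It only counts observations in a symmetric window around c and asks whether the split is plausibly 50/50, which is reasonable only when the density is roughly flat near the cutoff. The window width, the 70% manipulation rate and the "reflect across the cutoff" sorting rule are assumptions for the example.

```python
import math
import numpy as np

def count_jump_z(x, c, h):
    """Crude density check: z-statistic for the share of points in [c, c+h]
    among all points in [c-h, c+h], against a null share of 1/2."""
    n_left = int(np.sum((x >= c - h) & (x < c)))
    n_right = int(np.sum((x >= c) & (x <= c + h)))
    n = n_left + n_right
    z = (n_right - n / 2) / math.sqrt(n / 4)   # normal approximation to Binomial(n, 1/2)
    return n_left, n_right, z

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(-3, 3, n)

# No manipulation: counts on the two sides should be similar.
left_clean, right_clean, z_clean = count_jump_z(x, c=0.0, h=0.3)

# Manipulation: 70% of units just below the cutoff report a value just above it.
movers = (x > -0.3) & (x < 0) & (rng.random(n) < 0.7)
x_manip = np.where(movers, -x, x)
left_manip, right_manip, z_manip = count_jump_z(x_manip, c=0.0, h=0.3)

print(f"clean:       left={left_clean}, right={right_clean}, z={z_clean:.2f}")
print(f"manipulated: left={left_manip}, right={right_manip}, z={z_manip:.2f}")
```

A large positive z means far more mass just right of c than just left, the pattern sorting would produce.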
RDD's identification logic
The key thought experiment: a unit with X = c - ε is essentially identical to a unit with X = c + ε. Yet one gets treated, the other doesn't. Comparing their outcomes ISOLATES the treatment effect because everything else is comparable. The RDD "natural experiment" is local: provided units cannot precisely control which side of c they land on, treatment at the cutoff is as-if-random.
Caveat: the LATE refers to units AT THE CUTOFF, not the broader population. RDD gives the effect for those near the threshold of treatment, which may differ from the population-average ATE. For policy: useful if you're considering moving the cutoff slightly; less useful for population-wide rollout questions.
Famous examples
- Thistlethwaite & Campbell (1960): National Merit Scholarship qualifying score as RDD → effect on college outcomes (the foundational paper).
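- Hahn, Todd & Van der Klaauw (1999): firm-size threshold for coverage by a federal antidiscrimination law as RDD → minority employment.
- Lee (2008): 50% vote share threshold in close U.S. House elections → incumbent party's re-election probability (incumbency advantage).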
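- Carrell-Sacerdote-West (2013): peer-group assignment at the U.S. Air Force Academy → effects of peer composition. (The study relies on assigned peer groups rather than a cutoff, so it is a contrast case rather than an RDD.)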
Try it
- Start with true τ = 1.5, bandwidth = 1.0, no manipulation. The estimated LATE recovers approximately +1.5. The vertical blue bar at c shows the gap clearly.
- Drop bandwidth to 0.3. Fewer points; the estimate becomes NOISIER (re-sample to see). Smaller bandwidth = lower bias but higher variance.
- Crank bandwidth to 2.5. More points but the linearity assumption stretches; the local-linear fit may MISS curvature in the underlying f(x), biasing the LATE estimate.
- Set manipulation rate to 70%. Watch the McCrary diagnostic alarm: density just-right is much higher than just-left. The LATE estimate becomes biased because the sorted units differ from non-sorted on unobservables.
- Set τ = 0 with no manipulation. The estimated LATE is approximately zero (recovers the truth). Now add manipulation: a spurious positive LATE appears even when truth is zero — manipulation as an alternative explanation for any RDD finding.
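The last experiment can also be reproduced offline. The sketch below sets the true effect to zero and lets units with a favourable unobserved trait sort to just above the cutoff. The sorting rule (70% of units within 0.5 below c and with u > 0 reflect their score across c), the trait u and all parameter values are assumptions for illustration, not the simulator's actual mechanism.

```python
import numpy as np

def rd_estimate(x, y, c, h):
    """Sharp-RD estimate: difference of local-linear intercepts at c (triangular kernel)."""
    def intercept(mask):
        d = x[mask] - c
        sw = np.sqrt(np.maximum(0.0, 1.0 - np.abs(d) / h))
        X = np.column_stack([np.ones_like(d), d]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(X, y[mask] * sw, rcond=None)
        return beta[0]
    right = (x >= c) & (x <= c + h)
    left = (x < c) & (x >= c - h)
    return intercept(right) - intercept(left)

rng = np.random.default_rng(3)
n = 8000
x = rng.uniform(-3, 3, n)                  # true running variable
u = rng.normal(0, 1, n)                    # unobserved trait that raises Y
y = 0.5 * x + u + rng.normal(0, 0.5, n)    # true tau = 0: no treatment term at all

tau_clean = rd_estimate(x, y, 0.0, 1.0)

# Sorting: high-u units just below the cutoff report a score just above it.
# Outcomes are unchanged; only the recorded running variable moves.
movers = (x > -0.5) & (x < 0) & (u > 0) & (rng.random(n) < 0.7)
x_reported = np.where(movers, -x, x)
tau_manip = rd_estimate(x_reported, y, 0.0, 1.0)

print(f"no manipulation:   estimated LATE = {tau_clean:+.3f}")
print(f"with manipulation: estimated LATE = {tau_manip:+.3f}")
```

The spurious jump arises purely from composition: units just right of c now have higher u on average than units just left of it.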
A study uses a per-capita-income cutoff to identify the effect of social-security eligibility on retirement decisions. Income is self-reported. Why is the McCrary density test critical here?
What you now know
RDD exploits a sharp cutoff in treatment assignment. The LATE at the cutoff is identified by comparing units just-above to just-below via local-linear regression. Bandwidth selection (Imbens-Kalyanaraman, Calonico-Cattaneo-Titiunik) navigates the bias-variance trade-off. McCrary's density test catches manipulation, the chief threat to identification. Fuzzy RDD generalises to probabilistic assignment via IV. §6.7 turns to DIFFERENCE-IN-DIFFERENCES, the canonical observational design for policy interventions introduced at a specific time to a specific group.
References
- Thistlethwaite, D.L., Campbell, D.T. (1960). "Regression-discontinuity analysis: An alternative to the ex post facto experiment." J. Educational Psychology 51(6), 309–317. (The foundational paper.)
- Imbens, G.W., Lemieux, T. (2008). "Regression discontinuity designs: A guide to practice." J. Econometrics 142(2), 615–635. (Comprehensive applied guide.)
- Calonico, S., Cattaneo, M.D., Titiunik, R. (2014). "Robust nonparametric confidence intervals for regression-discontinuity designs." Econometrica 82(6), 2295–2326.
- McCrary, J. (2008). "Manipulation of the running variable in the regression discontinuity design: A density test." J. Econometrics 142(2), 698–714.
- Lee, D.S., Lemieux, T. (2010). "Regression discontinuity designs in economics." J. Economic Literature 48(2), 281–355.