Difference-in-differences and synthetic controls

Part 6 — Causal inference for researchers

Learning objectives

State the difference-in-differences estimator as a 2×2 cell-mean calculation
Articulate the PARALLEL-TRENDS assumption and what it does/does not require
Recognise CARD-KRUEGER (1994) as the canonical applied DiD design
Express DiD as TWO-WAY FIXED-EFFECTS regression with unit and time FEs
Recognise WHY staggered treatment timing breaks vanilla two-way FE (Callaway-Sant'Anna 2021; Goodman-Bacon 2021)
Recognise SYNTHETIC CONTROL (Abadie et al. 2010) as the right tool when only ONE unit is treated

RDD (§6.6) handles assignment by a sharp cutoff. Many real policy interventions don't come with a clean cutoff: a minimum-wage hike applies state-wide to everyone at once; a smoking ban applies to a whole jurisdiction starting on a date. We have a TREATED group and a CONTROL group, observed BEFORE and AFTER. DIFFERENCE-IN-DIFFERENCES (DiD) is the canonical estimator for this 2×2 structure, and one of the most widely used quasi-experimental tools in applied economics. Its power comes from a clean trick; its weakness is one big assumption that has to be defended every time.

The DiD estimator as a 2×2 table

Index units by group $G \in \{0, 1\}$ (control vs treated) and time $t \in \{0, 1\}$ (before vs after). Let $\bar{Y}_{G,t}$ be the average outcome in cell $(G, t)$. The DiD estimator is

\hat{\tau}_{\text{DiD}} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0}).

The treated's before-after change MINUS the control's before-after change. The first difference removes any time-invariant level gap between groups (different baselines for treated vs control); the second difference removes any common time trend (everyone trended together for unrelated reasons). What's left is the part of the treated's change that exceeds the control's change — the causal effect, IF the assumption below holds.
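The cell-mean calculation can be sketched in a few lines of Python. The outcome values below are made up for illustration, and the function name `did_2x2` is hypothetical; the point is only that the estimator needs nothing beyond the four cell means.

```python
import numpy as np

# Hypothetical illustrative outcomes (not data from any study).
# Keys are (G, t): G = 1 treated / 0 control, t = 1 after / 0 before.
cells = {
    (1, 0): np.array([10.0, 11.0, 12.0]),   # treated, before
    (1, 1): np.array([14.0, 15.0, 16.0]),   # treated, after
    (0, 0): np.array([7.0, 8.0, 9.0]),      # control, before
    (0, 1): np.array([9.0, 10.0, 11.0]),    # control, after
}

def did_2x2(cells):
    """Return (treated change, control change, DiD) from the four cell means."""
    ybar = {k: v.mean() for k, v in cells.items()}
    treated_change = ybar[(1, 1)] - ybar[(1, 0)]
    control_change = ybar[(0, 1)] - ybar[(0, 0)]
    return treated_change, control_change, treated_change - control_change

treated_change, control_change, tau_hat = did_2x2(cells)
print(treated_change, control_change, tau_hat)
```

Here the treated group rises by 4 and the control group by 2, so the DiD estimate is 2. The level gap of 3 between the groups in the before period never enters the result.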

The parallel-trends assumption

DiD identifies the average treatment effect on the treated (ATT) under PARALLEL TRENDS:

E[Y(0)_{1,1} - Y(0)_{1,0}] = E[Y(0)_{0,1} - Y(0)_{0,0}].

In words: had the treated unit not been treated, its outcome would have evolved in the same way as the control's. The control's pre-to-post change serves as the counterfactual for what the treated's pre-to-post change would have been ABSENT treatment. Crucially, this is a statement about COUNTERFACTUAL Y(0) trajectories — it cannot be tested directly. We test pre-period parallelism as a proxy: if treated and control moved together before treatment, hopefully they would have continued to do so absent treatment.

The assumption tolerates a CONSTANT level gap between groups (fine, because the first difference removes it). It does NOT tolerate a SLOPE difference: if the two groups were already diverging in the pre-period and that divergence would have continued, it is projected forward as a spurious treatment effect.
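A short derivation, sketched in the notation above, shows where a slope difference ends up. Let $Y_{G,t}$ denote the observed outcome in cell $(G, t)$. Assume (as is standard, though not stated above) that pre-period outcomes are unaffected by treatment, so the treated group's observed pre-period outcome is an untreated outcome. Write $\tau_{\text{ATT}} = E[Y(1)_{1,1} - Y(0)_{1,1}]$ as shorthand introduced here. Adding and subtracting $E[Y(0)_{1,1}]$ in the population version of the DiD contrast gives

E[Y_{1,1} - Y_{1,0}] - E[Y_{0,1} - Y_{0,0}] = \tau_{\text{ATT}} + \big( E[Y(0)_{1,1} - Y(0)_{1,0}] - E[Y(0)_{0,1} - Y(0)_{0,0}] \big).

The bracketed term is the difference in counterfactual trends. Parallel trends sets it to zero. A constant level gap cancels inside each difference, while any slope difference passes straight into the estimate.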

Two-way fixed-effects regression

The 2×2 DiD is algebraically equivalent to the OLS regression

Y_{i,t} = \alpha_i + \delta_t + \tau \cdot D_{i,t} + \varepsilon_{i,t},

where $\alpha_i$ is a unit fixed effect, $\delta_t$ is a time fixed effect, and $D_{i,t} = 1$ if unit i is treated at time t. The unit FE absorbs any time-invariant unit-level heterogeneity; the time FE absorbs any common shock affecting all units. The coefficient $\tau$ on $D_{i,t}$ is the DiD estimate. Standard errors should be CLUSTERED at the level at which treatment is assigned, which in this regression is the unit (Bertrand, Duflo, Mullainathan 2004): repeated observations of the same unit are serially correlated, not independent. Failing to cluster typically yields anti-conservative standard errors and inflated rejection rates.
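The equivalence between the regression and the cell-mean calculation can be checked numerically. The sketch below simulates a balanced two-period panel (all parameter values are illustrative assumptions), fits the regression by ordinary least squares with explicit dummy columns, and compares the coefficient on $D_{i,t}$ with the 2×2 estimate. It covers the point estimate only, not the clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-period panel (all numbers are illustrative assumptions).
n_units, tau_true = 40, 1.5
treated = np.arange(n_units) < n_units // 2        # first half of units treated
unit_level = rng.normal(0.0, 2.0, n_units)         # alpha_i
time_shock = np.array([0.0, 0.7])                  # delta_t

unit = np.repeat(np.arange(n_units), 2)
time = np.tile(np.arange(2), n_units)
D = (treated[unit] & (time == 1)).astype(float)
y = (unit_level[unit] + time_shock[time] + tau_true * D
     + rng.normal(0.0, 1.0, unit.size))

# Design matrix: all unit dummies, one time dummy (period 0 is the baseline), and D.
unit_dummies = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
post_dummy = (time == 1).astype(float)[:, None]
X = np.hstack([unit_dummies, post_dummy, D[:, None]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tau_twfe = beta[-1]

# The same number from the four cell means.
def cell_mean(g, t):
    return y[(treated[unit] == g) & (time == t)].mean()

tau_2x2 = ((cell_mean(True, 1) - cell_mean(True, 0))
           - (cell_mean(False, 1) - cell_mean(False, 0)))
print(tau_twfe, tau_2x2)
```

The two printed numbers agree up to floating-point error. In practice a regression package with a cluster-robust covariance option would replace the hand-built dummies.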

Card & Krueger (1994): the canonical applied example

New Jersey raised its minimum wage from \$4.25 to \$5.05 in April 1992. Standard economic theory predicted employment in low-wage industries would fall. Card & Krueger surveyed fast-food restaurants in NJ and in eastern Pennsylvania (the control, no minimum-wage change) before and after the policy. Their finding: employment in NJ did NOT fall relative to PA. The DiD estimate was approximately ZERO and, in many specifications, slightly POSITIVE. This sparked a 25-year empirical and theoretical reassessment of the minimum-wage literature. The paper's power came not from sophisticated econometrics but from the clean DiD design and a careful argument that PA's fast-food sector was a credible counterfactual for NJ's.

Diagnostics: pre-trend plots

Best practice: plot the treated and control series across MULTIPLE pre-periods (not just the one immediately before treatment). If pre-period trends are visibly parallel, the parallel-trends assumption is at least empirically credible for the pre-period. If treated and control diverge BEFORE treatment, the assumption fails — DiD will return a biased answer that includes the pre-existing divergence projected forward.

Formal tests: event-study regressions estimate a separate treated-vs-control gap at each period relative to treatment (-3, -2, -1 before; +1, +2, ... after), typically with one pre-treatment period omitted as the reference. Pre-treatment coefficients should be statistically indistinguishable from zero. Significant pre-trend coefficients are a red flag. Caveat: tests of pre-trend parallelism have LOW power; absence of evidence is not evidence of absence. Roth (2022) shows that conditioning the DiD analysis on having passed pre-tests can introduce bias of its own.
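An event-study regression of this kind can be sketched with plain least squares. The simulated panel below is an assumption made for illustration: one treatment date, parallel trends by construction, and event time k = 0 marking the first treated period, with k = -1 as the omitted reference. Only point estimates are computed; a formal test would also need clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated panel: 60 units, 8 periods, treatment starts at period 5 for treated units.
# All parameter values are illustrative assumptions.
n_units, n_periods, t_star, tau_true = 60, 8, 5, 1.5
treated = np.arange(n_units) < n_units // 2
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

y = (rng.normal(0.0, 2.0, n_units)[unit]              # unit effects
     + 0.3 * time                                      # common trend
     + tau_true * (treated[unit] & (time >= t_star))   # treatment effect
     + rng.normal(0.0, 0.5, unit.size))                # noise

# Event time relative to treatment start; -1 is the omitted reference period.
event_times = [k for k in range(-t_star, n_periods - t_star) if k != -1]

unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time[:, None] == np.arange(1, n_periods)).astype(float)  # period 0 omitted
leads_lags = np.column_stack([
    (treated[unit] & (time - t_star == k)).astype(float) for k in event_times
])
X = np.hstack([unit_dummies, time_dummies, leads_lags])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
coefs = dict(zip(event_times, beta[-len(event_times):]))

for k in event_times:
    label = "pre " if k < 0 else "post"
    print(f"{label} k={k:+d}: {coefs[k]:+.2f}")
```

Because the simulation imposes parallel trends, the pre-period coefficients scatter around zero and the post-period coefficients scatter around the true effect of 1.5.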

Staggered treatment and the modern reckoning

Many applications have multiple treated groups treated at DIFFERENT TIMES (rolling state policy adoption, hospital programs phased in across years). The naive two-way-FE regression of Y on unit FE, time FE, and a "treated × post" indicator was long the workhorse. Work published around 2020–2021 showed it can be PATHOLOGICALLY BIASED when treatment effects vary across cohorts or over time since treatment. The reason: under staggered adoption, units treated EARLIER serve as CONTROL for units treated LATER. The implicit comparison includes already-treated-vs-newly-treated contrasts, so some treated observations enter with negative weights. The bias can be large enough to flip the sign of the estimated average treatment effect.

Goodman-Bacon (2021) decomposed the two-way-FE estimator into all its 2×2 building blocks, exposing the negative-weight problem. Callaway & Sant'Anna (2021), Sun & Abraham (2021), and de Chaisemartin & D'Haultfœuille (2020) proposed modern estimators that explicitly average treatment effects across cohorts and periods without negative weights. Best practice now: under staggered timing, do NOT use vanilla two-way FE; use a cohort-aware estimator.
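A deliberately tiny, noiseless example makes the problem concrete. The panel below is a toy assumption (two units, three periods, untreated outcomes all zero), not a reproduction of any published decomposition code. The early unit's effect grows from 1 to 4 while the late unit's effect is 1, so every true effect is positive.

```python
import numpy as np

# Toy noiseless panel (illustrative assumption): two units, three periods.
# "early" is treated from t=1, "late" from t=2. Untreated outcomes are all zero,
# so each observed outcome equals that cell's true treatment effect.
effect_early_t1, effect_early_t2, effect_late_t2 = 1.0, 4.0, 1.0
y_early = np.array([0.0, effect_early_t1, effect_early_t2])
y_late = np.array([0.0, 0.0, effect_late_t2])
d_early = np.array([0.0, 1.0, 1.0])
d_late = np.array([0.0, 0.0, 1.0])

y = np.concatenate([y_early, y_late])
d = np.concatenate([d_early, d_late])
unit = np.repeat([0, 1], 3)
time = np.tile([0, 1, 2], 2)

# Two-way FE: unit dummies, time dummies (period 0 omitted), treatment indicator.
X = np.column_stack([
    unit == 0, unit == 1, time == 1, time == 2, d
]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tau_twfe = beta[-1]

# The two 2x2 comparisons hiding inside that coefficient.
clean_2x2 = (y_early[1] - y_early[0]) - (y_late[1] - y_late[0])      # late not yet treated
forbidden_2x2 = (y_late[2] - y_late[1]) - (y_early[2] - y_early[1])  # early already treated

print(tau_twfe, clean_2x2, forbidden_2x2)
```

The clean comparison returns 1, the comparison that uses the already-treated early unit as a control returns -2, and in this toy panel the two-way FE coefficient is their equal-weighted average, about -0.5. A negative estimate emerges even though no unit was ever harmed. Setting all three effects equal makes the two comparisons agree and the problem disappears.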

Synthetic control: the one-treated-unit case

What if only ONE unit is treated (a single state passes a policy, a single firm receives a merger approval)? There is no longer a group of treated units to average over; we have N=1, and no single untreated unit is likely to be a convincing control on its own. Abadie, Diamond & Hainmueller (2010) proposed SYNTHETIC CONTROL: construct a weighted average of untreated units (the "donor pool") whose PRE-TREATMENT outcome trajectory matches the treated unit's. The weights are constrained to be non-negative and to sum to one, and are chosen by quadratic optimisation to minimise pre-treatment mismatch. The post-treatment gap between the treated and the synthetic control is the estimated effect.
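The weight-fitting step can be sketched as a small constrained least-squares problem. This is a simplified illustration, not the published procedure: it matches on pre-treatment outcomes only (no covariates or predictor weighting), uses projected gradient descent rather than a dedicated quadratic-programming solver, and runs on simulated data in which the treated unit is an exact mix of two donors. The function names `project_to_simplex` and `synth_weights` are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synth_weights(y_treated_pre, Y_donors_pre, n_iter=5000):
    """Minimise ||y_treated_pre - Y_donors_pre @ w||^2 over the simplex
    by projected gradient descent."""
    n_donors = Y_donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L = 2 * (largest singular value)^2 bounds the gradient's slope.
    step = 1.0 / (2.0 * np.linalg.norm(Y_donors_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * Y_donors_pre.T @ (Y_donors_pre @ w - y_treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(2)
n_pre, n_post, true_effect = 12, 4, -2.0
w_true = np.array([0.6, 0.4, 0.0, 0.0])          # hypothetical "true" donor mix
donors_pre = rng.normal(0.0, 1.0, (n_pre, 4))
donors_post = rng.normal(0.0, 1.0, (n_post, 4))
treated_pre = donors_pre @ w_true
treated_post = donors_post @ w_true + true_effect

w_hat = synth_weights(treated_pre, donors_pre)
gap_post = treated_post - donors_post @ w_hat    # estimated effect in each post period
print(np.round(w_hat, 3), np.round(gap_post, 3))
```

Because the treated unit here is constructed as an exact convex combination of donors, the fitted weights recover the assumed mix and the post-period gap recovers the assumed effect of -2. With real data the pre-treatment fit is only approximate, and its quality is the main thing to inspect.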

Famous applications: California Proposition 99 tobacco control (Abadie, Diamond, Hainmueller 2010); German reunification on GDP (Abadie, Diamond, Hainmueller 2015); Brexit referendum on UK GDP (Born et al. 2019). The method is now standard whenever DiD's "find me a comparable control" problem is severe.

Try it

Start with true τ = 1.5, pre-trend break = 0 (parallel). The DiD estimate recovers approximately +1.5; the pre-trend gap is near zero and the diagnostic shows ✓ parallel. The two coloured lines move together in the pre-period, then the green (treated) jumps at the treatment line.
Read the 2×2 cell table carefully. The "Diff" column shows the treated's and control's before-to-after changes. The bottom-right cell is the difference of those two — that single number IS the DiD estimate.
Compare the two naive estimators printed below the table. Treated-only before-after picks up the common time trend (the control's trend tells you this); post-only between-groups picks up the level gap (the pre-period gap tells you this). DiD cancels both.
Crank pre-trend break to +0.20. The treated line is already rising faster than the control before treatment. The pre-trend gap diagnostic alarms. The DiD estimate is now BIASED — it picks up the pre-existing divergence and credits it to treatment.
Set τ = 0 and pre-trend break = +0.20. The DiD estimate is a substantial positive number even though the true effect is zero. Pre-existing trend differences alone create a spurious DiD finding.
Reset and crank noise to 2.0 with τ = 1.5. The DiD estimate remains roughly unbiased but visibly noisier across re-samples. Noise affects PRECISION; pre-trend violations affect BIAS — fundamentally different problems.
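The contrast between precision and bias in these steps can also be reproduced offline with a small Monte Carlo. The data-generating process below is an assumption made for this sketch (five pre-periods, five post-periods, a slope difference that continues after treatment); it is not the interactive widget's own generator, and the function names are hypothetical.

```python
import numpy as np

def simulate_did(tau, pretrend_break, noise_sd, rng, n_pre=5, n_post=5):
    """One draw of group-mean series and the resulting DiD estimate.
    The data-generating process here is an assumption, not the widget's own."""
    t = np.arange(n_pre + n_post)
    post = t >= n_pre
    control = 2.0 + 0.3 * t + rng.normal(0.0, noise_sd, t.size)
    treated = (3.0 + (0.3 + pretrend_break) * t + tau * post
               + rng.normal(0.0, noise_sd, t.size))
    return ((treated[post].mean() - treated[~post].mean())
            - (control[post].mean() - control[~post].mean()))

def monte_carlo(tau, pretrend_break, noise_sd, n_reps=2000, seed=0):
    """Mean and standard deviation of the DiD estimate across re-samples."""
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_did(tau, pretrend_break, noise_sd, rng)
                      for _ in range(n_reps)])
    return draws.mean(), draws.std()

scenarios = {
    "parallel, low noise": dict(tau=1.5, pretrend_break=0.0, noise_sd=0.5),
    "parallel, high noise": dict(tau=1.5, pretrend_break=0.0, noise_sd=2.0),
    "diverging, true tau=0": dict(tau=0.0, pretrend_break=0.2, noise_sd=0.5),
}
for label, kwargs in scenarios.items():
    mean, sd = monte_carlo(**kwargs)
    print(f"{label:24s} mean={mean:+.2f} sd={sd:.2f}")
```

Under these assumptions, raising the noise widens the spread of estimates but leaves their average near 1.5. A slope difference of 0.2 per period shifts the average to roughly +1.0 even though the true effect is zero, because the pre and post windows are centred five periods apart.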

A policy analyst estimates a state-level tax credit's effect on small-business employment via DiD: treated states (states adopting the credit) vs control states (states that didn't). She reports the DiD = +0.8% and concludes the credit boosted employment. What three questions would you want answered before believing this?

What you now know

DiD compares the treated's before-to-after change to the control's before-to-after change. The parallel-trends assumption — that the treated would have followed the control's trend absent treatment — is the identifying assumption. Card & Krueger (1994) gave the design its reputation; clustered standard errors (Bertrand-Duflo-Mullainathan 2004) gave it its reliability. Under STAGGERED treatment timing across many cohorts, vanilla two-way fixed effects break (Goodman-Bacon 2021), and modern cohort-aware estimators (Callaway-Sant'Anna 2021) are required. Synthetic control (Abadie-Diamond-Hainmueller 2010) extends DiD logic to N=1 settings. §6.8 turns to SENSITIVITY ANALYSIS — bounding observational findings by quantifying how strong unmeasured confounders would have to be to nullify them.

References

Card, D., Krueger, A.B. (1994). "Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania." American Economic Review 84(4), 772–793. (The canonical applied DiD.)
Bertrand, M., Duflo, E., Mullainathan, S. (2004). "How much should we trust differences-in-differences estimates?" QJE 119(1), 249–275. (The clustered-SE paper.)
Abadie, A., Diamond, A., Hainmueller, J. (2010). "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco-control program." JASA 105(490), 493–505.
Goodman-Bacon, A. (2021). "Difference-in-differences with variation in treatment timing." J. Econometrics 225(2), 254–277. (The decomposition paper.)
Callaway, B., Sant'Anna, P.H.C. (2021). "Difference-in-differences with multiple time periods." J. Econometrics 225(2), 200–230. (Modern cohort-aware estimator.)
Roth, J. (2022). "Pre-test with caution: Event-study estimates after testing for parallel trends." AER: Insights 4(3), 305–322.