Mixed-effects and hierarchical models intro
Learning objectives
- Recognise clustered data (repeated measures, multi-level groups) as requiring hierarchical models
- Add random intercepts to allow group-specific deviations from the population mean
- Distinguish FIXED effects (population-level coefficients of interest) from RANDOM effects (group-specific adjustments)
- Use lme4 / nlme / statsmodels.MixedLM for fitting
- Recognise when ignoring clustering inflates Type-I errors
So far Part 4 + §5.1-5.4 treated all observations as independent. Real data often violates this: students within schools, patients within hospitals, repeated measurements per subject, plants within plots. The grouped structure creates within-group correlation that ordinary GLMs ignore — inflating Type-I errors and giving incorrect SEs.
The random intercept model
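For data grouped into J clusters with n_j observations in cluster j (observation i in cluster j):
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)$$
Here u_j is a cluster-specific random effect: each cluster gets its own intercept deviation from the population mean β_0. The variance components σ_u² (between-cluster) and σ_e² (within-cluster) are estimated, with u_j and ε_ij assumed independent.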
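Because all observations in cluster j share the same u_j, two observations from the same cluster have covariance σ_u² and each has variance σ_u² + σ_e². Their correlation, the intraclass correlation (ICC), is therefore ρ = σ_u² / (σ_u² + σ_e²). The sketch below checks this by simulation. It is a minimal pure-Python illustration that assumes a covariate-free version of the model (y_ij = β_0 + u_j + ε_ij) and equal cluster sizes; the function names are hypothetical, not from any library.

```python
import random
import statistics


def icc(sigma_u, sigma_e):
    """Theoretical intraclass correlation: share of total variance that is between-cluster."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


def simulate_random_intercept(J, n, beta0, sigma_u, sigma_e, seed=0):
    """Draw y_ij = beta0 + u_j + e_ij for J clusters of n observations each.

    Assumption: covariate-free model with equal cluster sizes.
    Returns a list of J clusters, each a list of n responses."""
    rng = random.Random(seed)
    data = []
    for _ in range(J):
        u_j = rng.gauss(0.0, sigma_u)  # one random intercept shared by the whole cluster
        data.append([beta0 + u_j + rng.gauss(0.0, sigma_e) for _ in range(n)])
    return data


def empirical_pair_correlation(data):
    """Pearson correlation between the first two observations of each cluster (needs n >= 2)."""
    xs = [cluster[0] for cluster in data]
    ys = [cluster[1] for cluster in data]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


data = simulate_random_intercept(J=5000, n=2, beta0=10.0, sigma_u=1.5, sigma_e=2.0)
print(round(icc(1.5, 2.0), 2))           # 0.36
print(empirical_pair_correlation(data))  # should land close to 0.36
```

The ICC is the quantity an independence-assuming analysis implicitly sets to zero.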
Why this matters
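Without the random effect, observations within a cluster are treated as independent. They're NOT: students within the same school share teachers, neighborhood, etc. The "effective sample size" can therefore be much smaller than the raw n; with equal cluster sizes n̄ and intraclass correlation ρ = σ_u² / (σ_u² + σ_e²), it is roughly n / [1 + (n̄ − 1)ρ]. Ignoring clustering ⇒ SEs underestimated (especially for cluster-level covariates) ⇒ Type-I error inflated (sometimes dramatically).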
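To see the inflation concretely, the sketch below (pure Python; function names are hypothetical) assumes a "treatment" that has no effect and is assigned at the cluster level, J = 20 clusters of n = 10, and σ_u = σ_e = 1, so ρ = 0.5. It runs a naive two-sample z-test that treats all 200 observations as independent. A correct 5% test would reject in about 5% of replicates; the naive test rejects far more often. The Kish design effect 1 + (n − 1)ρ, here 5.5, measures how many times the naive variance understates the true one.

```python
import math
import random
import statistics


def design_effect(n_per_cluster, icc):
    """Kish design effect for equal cluster sizes: variance inflation due to clustering."""
    return 1 + (n_per_cluster - 1) * icc


def naive_type1_rate(J=20, n=10, sigma_u=1.0, sigma_e=1.0, reps=500, seed=1):
    """Monte Carlo false-positive rate of a z-test that ignores clustering.

    Clusters alternate between two arms, but the 'treatment' has no effect,
    so every rejection at the nominal 5% level is a Type-I error."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        arms = ([], [])
        for j in range(J):
            u_j = rng.gauss(0.0, sigma_u)  # shared by all n members of cluster j
            arms[j % 2].extend(u_j + rng.gauss(0.0, sigma_e) for _ in range(n))
        a, b = arms
        # Naive SE: treats all J * n observations as independent.
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


rho = 1.0**2 / (1.0**2 + 1.0**2)  # 0.5
print(design_effect(10, rho))     # 5.5
print(naive_type1_rate())         # far above the nominal 0.05
```

Because the naive SE is too small by roughly a factor of √5.5 ≈ 2.3, |z| exceeds 1.96 much more often than 5% of the time.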
Random slopes
Beyond random intercepts: random slopes let the effect of a covariate vary by cluster. Example: in a study of teaching methods across schools, the EFFECT of the method might differ across schools. Model:
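$$y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij}$$
where (u_{0j}, u_{1j}) are joint Normal with mean zero and covariance matrix Σ, and ε_ij ~ N(0, σ_e²) independently of them.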
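The joint-Normal assumption lets a cluster's intercept and slope deviations be correlated (for instance, schools with higher baselines might gain less from a method). A minimal pure-Python sketch, assuming a 2×2 Σ with hypothetical values (intercept SD 1.5, slope SD 0.5, correlation −0.4), draws (u_0j, u_1j) pairs through a hand-written Cholesky factor and checks their sample covariance.

```python
import math
import random


def cholesky_2x2(s00, s01, s11):
    """Lower-triangular L with L @ L.T = [[s00, s01], [s01, s11]]; Sigma must be positive definite."""
    l00 = math.sqrt(s00)
    l10 = s01 / l00
    l11 = math.sqrt(s11 - l10**2)
    return l00, l10, l11


def draw_random_effects(J, s00, s01, s11, seed=0):
    """Draw J pairs (u0_j, u1_j) ~ N(0, Sigma) by mapping independent z to L z."""
    rng = random.Random(seed)
    l00, l10, l11 = cholesky_2x2(s00, s01, s11)
    pairs = []
    for _ in range(J):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((l00 * z0, l10 * z0 + l11 * z1))
    return pairs


def sample_cov(pairs):
    """Sample covariance entries (s00, s01, s11) of the drawn pairs."""
    J = len(pairs)
    m0 = sum(p[0] for p in pairs) / J
    m1 = sum(p[1] for p in pairs) / J
    s00 = sum((p[0] - m0) ** 2 for p in pairs) / (J - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in pairs) / (J - 1)
    s11 = sum((p[1] - m1) ** 2 for p in pairs) / (J - 1)
    return s00, s01, s11


# Hypothetical Sigma: intercept SD 1.5, slope SD 0.5, correlation -0.4.
s00, s11 = 1.5**2, 0.5**2
s01 = -0.4 * 1.5 * 0.5
print(sample_cov(draw_random_effects(20000, s00, s01, s11)))  # close to (2.25, -0.3, 0.25)
```

Each drawn pair would then give cluster j its own intercept β_0 + u_0j and slope β_1 + u_1j.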
Fitting
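Fitting uses maximum likelihood (ML) or, for linear mixed models, restricted maximum likelihood (REML). R: lme4::lmer for linear mixed models; lme4::glmer for generalized linear mixed models (GLMMs). Python: statsmodels.MixedLM for linear mixed models; for non-Normal outcomes, Bayesian fitting with PyMC.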
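As a sketch of the Python route, the example below simulates students-in-schools data (all numbers hypothetical) and fits a random-intercept model with the statsmodels formula interface, `smf.mixedlm(..., groups=...)`. It assumes numpy, pandas and statsmodels are installed. The roughly equivalent lme4 call would be `lmer(y ~ x + (1 | school), data = df)`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
J, n = 40, 25                        # 40 hypothetical schools, 25 students each
beta0, beta1 = 50.0, 0.5             # true fixed effects
sigma_u, sigma_e = 1.5, 2.0          # true between- and within-school SDs

school = np.repeat(np.arange(J), n)
u = rng.normal(0.0, sigma_u, size=J)  # one random intercept per school
x = rng.normal(0.0, 1.0, size=J * n)  # student-level covariate
y = beta0 + beta1 * x + u[school] + rng.normal(0.0, sigma_e, size=J * n)
df = pd.DataFrame({"y": y, "x": x, "school": school})

# Random-intercept model; fit() uses REML by default.
model = smf.mixedlm("y ~ x", data=df, groups=df["school"])
result = model.fit()

print(result.fe_params)               # estimates of beta0 (Intercept) and beta1 (x)
print(result.cov_re.iloc[0, 0])       # estimated sigma_u^2 (truth: 2.25)
print(result.scale)                   # estimated sigma_e^2 (truth: 4.0)
```

Passing `re_formula="~x"` to `smf.mixedlm` adds a random slope for x, analogous to `(1 + x | school)` in lme4.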
When to switch from fixed-effects-only to mixed-effects
- Clear hierarchical structure (students-in-schools, patients-in-hospitals).
- Repeated measurements per subject.
- Suspected within-cluster correlation that would bias the SEs of an analysis without random effects.
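One quick diagnostic for that last point is to estimate the intraclass correlation from the data. The sketch below uses the classical one-way ANOVA (method-of-moments) estimator for balanced clusters, ρ̂ = (MSB − MSW) / (MSB + (n − 1)·MSW), in pure Python. It is one common estimator among several, and the function name is hypothetical. Even a modest ICC matters when clusters are large, because the design effect 1 + (n − 1)ρ grows with n.

```python
import random
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the intraclass correlation for equal-size clusters.

    clusters: list of J lists, each holding n responses (n >= 2)."""
    J, n = len(clusters), len(clusters[0])
    grand = statistics.mean(y for c in clusters for y in c)
    means = [statistics.mean(c) for c in clusters]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msb = ss_between / (J - 1)        # E[MSB] = sigma_e^2 + n * sigma_u^2
    msw = ss_within / (J * (n - 1))   # E[MSW] = sigma_e^2
    return (msb - msw) / (msb + (n - 1) * msw)


# Simulated check: sigma_u = 1.5, sigma_e = 2.0 gives a true ICC of 0.36.
rng = random.Random(7)
clusters = []
for _ in range(200):
    u_j = rng.gauss(0.0, 1.5)
    clusters.append([u_j + rng.gauss(0.0, 2.0) for _ in range(8)])
print(round(anova_icc(clusters), 2))  # should land near 0.36
```

The estimate can come out slightly negative when the true ICC is near zero; that is a known property of this estimator, not a bug.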
Shrinkage — the key idea, visualised
Partial pooling (the mixed-effects fit) sits between two extremes: COMPLETE POOLING (use one number for all clusters, ignoring the structure) and NO POOLING (estimate each cluster separately, which gives noisy estimates for small clusters). The mixed-effects BLUP estimate for cluster j is:
$$\hat{\mu}_j = \alpha_j \bar{y}_j + (1 - \alpha_j)\,\hat{\beta}_0, \qquad \alpha_j = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2 / n_j}$$
where ȳ_j is the sample mean of cluster j and β̂_0 is the estimated grand mean.
Small clusters (small n_j) get small α_j and are therefore shrunk strongly TOWARD the grand mean. Large clusters get α_j near 1 and are barely shrunk. The widget shows this graphically; toggle between complete / no-pool / partial-pool views and watch the cluster-specific estimates contract.
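The shrinkage weights are easy to compute directly. A minimal pure-Python sketch, assuming σ_u and σ_e are known (in practice ML/REML estimates are plugged in) and using the precision-weighted mean of the cluster means as β̂_0; the function names and example inputs are hypothetical.

```python
def shrinkage_weight(n_j, sigma_u, sigma_e):
    """alpha_j = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_j): trust placed in cluster j's own mean."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2 / n_j)


def partial_pool(cluster_means, sizes, sigma_u, sigma_e):
    """Partial-pooling estimates of each cluster mean, treating the variance components as known."""
    alphas = [shrinkage_weight(n, sigma_u, sigma_e) for n in sizes]
    # Grand mean: each cluster mean weighted by its precision 1 / (sigma_u^2 + sigma_e^2 / n_j).
    weights = [1.0 / (sigma_u**2 + sigma_e**2 / n) for n in sizes]
    grand = sum(w * m for w, m in zip(weights, cluster_means)) / sum(weights)
    estimates = [a * m + (1 - a) * grand for a, m in zip(alphas, cluster_means)]
    return grand, alphas, estimates


# Widget defaults: sigma_u = 1.5, sigma_e = 2.0.
for n_j in (1, 8, 100):
    print(n_j, round(shrinkage_weight(n_j, 1.5, 2.0), 3))
# 1 0.36
# 8 0.818
# 100 0.983
```

Note that α_j never falls below σ_u² / (σ_u² + σ_e²) (the n_j = 1 case), so how far "near 0" goes depends on the variance ratio.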
Try it
- Default settings (J = 16, n̄ = 8, σ_u = 1.5, σ_e = 2.0). Toggle between the three views. Notice in PARTIAL pooling: the small-n_j clusters move strongly toward the grand mean line (dashed); large-n_j ones barely budge. The dashed lines in the partial view trace exactly this contraction.
- Crank σ_u down to 0.2 (low between-cluster variance — i.e., the cluster means are nearly identical). Watch ALL partial-pool estimates collapse onto the grand mean. Complete pooling and partial pooling become indistinguishable.
- Now crank σ_u up to 5 (very different cluster means). Partial pool now barely shrinks at all — it's nearly identical to no-pool. The mixed-effects fit "knows" the clusters are genuinely different and trusts each within-cluster mean.
- Set σ_u = 1.5, σ_e = 5 (high within-cluster noise). Partial pool now shrinks aggressively — the within-cluster means are too noisy to trust without borrowing strength.
- Set n̄ = 3 (very small clusters). All α_j drop low, even with moderate σ_u. Compare MSEs: partial pool typically beats both extremes — sometimes by 2-3×.
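The MSE comparison in the last bullet can also be checked outside the widget with a small simulation. This sketch assumes a covariate-free random-intercept model with β_0 = 0, cluster sizes drawn uniformly from 1 to 2n̄ − 1, and known σ_u and σ_e; the widget's own internals are not documented here, so its numbers will not match exactly. With known variance components, partial pooling should come out with the lowest average squared error of the three.

```python
import random


def pooling_mse(J=16, n_bar=8, sigma_u=1.5, sigma_e=2.0, reps=200, seed=0):
    """Average squared error of complete, no, and partial pooling vs. the true cluster means.

    Assumptions: beta0 = 0, sizes uniform on 1..2*n_bar-1, variance components known."""
    rng = random.Random(seed)
    totals = {"complete": 0.0, "none": 0.0, "partial": 0.0}
    for _ in range(reps):
        sizes = [rng.randint(1, 2 * n_bar - 1) for _ in range(J)]
        true_means = [rng.gauss(0.0, sigma_u) for _ in range(J)]
        ybar = [
            sum(mu + rng.gauss(0.0, sigma_e) for _ in range(n)) / n
            for mu, n in zip(true_means, sizes)
        ]
        # Complete pooling: one overall mean of all observations.
        overall = sum(m * n for m, n in zip(ybar, sizes)) / sum(sizes)
        # Partial pooling: shrink each cluster mean toward a precision-weighted grand mean.
        w = [1.0 / (sigma_u**2 + sigma_e**2 / n) for n in sizes]
        grand = sum(wi * m for wi, m in zip(w, ybar)) / sum(w)
        for mu, m, n in zip(true_means, ybar, sizes):
            alpha = sigma_u**2 / (sigma_u**2 + sigma_e**2 / n)
            totals["complete"] += (overall - mu) ** 2
            totals["none"] += (m - mu) ** 2
            totals["partial"] += (alpha * m + (1 - alpha) * grand - mu) ** 2
    return {k: v / (reps * J) for k, v in totals.items()}


for n_bar in (8, 3):
    print(n_bar, pooling_mse(n_bar=n_bar))
```

Lowering `n_bar` raises the no-pooling error much faster than the partial-pooling error, which mirrors the n̄ = 3 bullet.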
Suppose you study 8 hospitals: hospital A has 500 patients with a mortality rate of 4.2 %; hospital B has 5 patients with a mortality rate of 60 %. Which estimate should you trust more, and how does the mixed-effects fit improve the second one? Think about α_A vs α_B and what "borrowing strength" means here.
References
- Pinheiro, J.C., Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
- Gelman, A., Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. (Comprehensive applied treatment.)
- Bates, D., Maechler, M., Bolker, B., Walker, S. (2015). "Fitting linear mixed-effects models using lme4." J. Statistical Software 67(1), 1–48.
- McCulloch, C.E., Searle, S.R., Neuhaus, J.M. (2008). Generalized, Linear, and Mixed Models, 2nd ed. Wiley.
- Henderson, C.R. (1975). "Best linear unbiased estimation and prediction under a selection model." Biometrics 31(2), 423–447. (The BLUP shrinkage formula.)