Mixed-effects and hierarchical models intro

Part 5 — Generalised linear models

Learning objectives

Recognise clustered data (repeated measures, multi-level groups) as requiring hierarchical models
Add random intercepts to allow group-specific deviations from the population mean
Distinguish FIXED effects (population-level coefficients of interest) from RANDOM effects (group-specific adjustments)
Use lme4 / nlme / statsmodels.MixedLM for fitting
Recognise when ignoring clustering inflates Type-I errors

So far, Part 4 and §5.1–5.4 treated all observations as independent. Real data often violate this assumption: students within schools, patients within hospitals, repeated measurements per subject, plants within plots. The grouped structure creates within-group correlation that ordinary GLMs ignore, which understates SEs and inflates Type-I error rates.

The random intercept model

For data grouped into $J$ clusters, with $n_j$ observations in cluster $j$, the response of observation $i$ in cluster $j$ is modelled as:

$$Y_{ij} = \beta_0 + u_j + \beta_1 x_{ij} + \varepsilon_{ij}, \quad u_j \sim N(0, \sigma^2_u), \quad \varepsilon_{ij} \sim N(0, \sigma^2_e).$$

The $u_j$ is a cluster-specific random effect: each cluster gets its own intercept deviation from the population mean $\beta_0$. The variance components $\sigma^2_u$ (between-cluster) and $\sigma^2_e$ (within-cluster) are estimated from the data.
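A minimal simulation sketch of this model may help make the within-cluster correlation concrete. It uses only the Python standard library. The parameter values, the standard-Normal covariate, and the helper names simulate_random_intercept and pearson are illustrative assumptions, not part of the course material. Under the model, two observations from the same cluster share $u_j$, so their correlation (after removing the fixed part) should be close to $\sigma^2_u / (\sigma^2_u + \sigma^2_e)$.

```python
import math
import random


def simulate_random_intercept(J, n_per_cluster, beta0, beta1, sigma_u, sigma_e, rng):
    """Draw (cluster, x, y) rows from Y_ij = beta0 + u_j + beta1 * x_ij + eps_ij."""
    rows = []
    for j in range(J):
        u_j = rng.gauss(0.0, sigma_u)          # one intercept deviation per cluster
        for _ in range(n_per_cluster):
            x = rng.gauss(0.0, 1.0)            # covariate distribution is an arbitrary choice
            eps = rng.gauss(0.0, sigma_e)      # observation-level noise
            rows.append((j, x, beta0 + u_j + beta1 * x + eps))
    return rows


def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
beta0, beta1, sigma_u, sigma_e = 10.0, 0.5, 1.5, 2.0   # illustrative values
rows = simulate_random_intercept(5000, 2, beta0, beta1, sigma_u, sigma_e, rng)

# Remove the fixed part so only u_j + eps_ij remains, then pair up the
# two observations that share a cluster (rows 2j and 2j + 1).
resid = [y - (beta0 + beta1 * x) for _, x, y in rows]
first, second = resid[0::2], resid[1::2]

icc_theory = sigma_u**2 / (sigma_u**2 + sigma_e**2)
icc_empirical = pearson(first, second)
print(f"theoretical within-cluster correlation: {icc_theory:.3f}")
print(f"empirical within-cluster correlation:   {icc_empirical:.3f}")
```

The ratio $\sigma^2_u / (\sigma^2_u + \sigma^2_e)$ is usually called the intraclass correlation; the simulated value should land near it, up to sampling noise.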

Why this matters

Without the random effect, observations within a cluster are treated as independent. They are NOT: students within the same school share teachers, a neighbourhood, and so on. The "effective sample size" is therefore smaller than the raw n, often much smaller when the within-cluster correlation is high or the clusters are large. Ignoring clustering ⇒ SEs underestimated ⇒ Type-I error inflated (sometimes dramatically).
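The size of the problem can be sketched numerically. The block below first computes the standard design effect for equal cluster sizes, $1 + (m - 1)\rho$ with $\rho = \sigma^2_u / (\sigma^2_u + \sigma^2_e)$, and then simulates a null experiment in which a treatment is assigned to whole clusters. The setup (16 clusters of 8, half treated), the hard-coded critical values, and the helper pooled_t are assumptions chosen for illustration. A simple t-test on cluster means stands in for a full mixed-model fit here, as a clustering-aware comparison.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (treats observations as independent)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


J, m = 16, 8                     # clusters and observations per cluster (illustrative)
sigma_u, sigma_e = 1.5, 2.0
icc = sigma_u**2 / (sigma_u**2 + sigma_e**2)
design_effect = 1 + (m - 1) * icc
n_eff = J * m / design_effect
print(f"design effect = {design_effect:.2f}, effective n = {n_eff:.1f} of {J * m}")

T_CRIT_NAIVE = 1.979             # two-sided 5% critical value, t with 126 df
T_CRIT_CLUSTER = 2.145           # two-sided 5% critical value, t with 14 df

rng = random.Random(0)
n_sims = 2000
naive_rejections = 0
cluster_rejections = 0
for _ in range(n_sims):
    # Null is true: no treatment effect, only cluster effects and noise.
    clusters = []
    for _ in range(J):
        u_j = rng.gauss(0.0, sigma_u)
        clusters.append([u_j + rng.gauss(0.0, sigma_e) for _ in range(m)])
    treated, control = clusters[: J // 2], clusters[J // 2:]

    # Naive analysis: pool all individual observations, ignoring clusters.
    t_naive = pooled_t([y for c in treated for y in c], [y for c in control for y in c])
    # Cluster-aware analysis: one mean per cluster, so the unit is the cluster.
    t_cluster = pooled_t([statistics.fmean(c) for c in treated],
                         [statistics.fmean(c) for c in control])

    naive_rejections += abs(t_naive) > T_CRIT_NAIVE
    cluster_rejections += abs(t_cluster) > T_CRIT_CLUSTER

naive_rate = naive_rejections / n_sims
cluster_rate = cluster_rejections / n_sims
print(f"naive Type-I error rate:         {naive_rate:.3f}")
print(f"cluster-level Type-I error rate: {cluster_rate:.3f}")
```

With these settings the 128 raw observations carry roughly the information of 36 independent ones. The naive test should reject far more often than the nominal 5 %, while the cluster-level test should stay close to it.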

Random slopes

Beyond random intercepts: random slopes let the effect of a covariate vary by cluster. Example: in a study of teaching methods across schools, the EFFECT of the method might differ across schools. Model:

$$Y_{ij} = \beta_0 + u_j + (\beta_1 + v_j) x_{ij} + \varepsilon_{ij},$$

where $(u_j, v_j) \sim N(0, \Sigma)$ is jointly Normal with $2 \times 2$ covariance matrix $\Sigma$, so a cluster's intercept deviation and slope deviation may be correlated.
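To see what a joint Normal for $(u_j, v_j)$ implies, here is a small standard-library sketch that draws correlated intercept and slope deviations and builds each cluster's own regression line. The parameterisation of $\Sigma$ through sigma_u, sigma_v and a correlation rho, the numeric values, and the helper name draw_random_effects are all illustrative assumptions.

```python
import math
import random


def draw_random_effects(sigma_u, sigma_v, rho, rng):
    """Draw (u_j, v_j) from a bivariate Normal with mean 0 and covariance
    [[sigma_u^2, rho*sigma_u*sigma_v], [rho*sigma_u*sigma_v, sigma_v^2]],
    using the Cholesky factor of that 2 x 2 matrix."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = sigma_u * z1
    v = sigma_v * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return u, v


beta0, beta1 = 10.0, 0.5                      # population-level (fixed) effects
sigma_u, sigma_v, rho = 1.5, 0.3, -0.4        # illustrative random-effect parameters
rng = random.Random(2)

effects = [draw_random_effects(sigma_u, sigma_v, rho, rng) for _ in range(20000)]

# Each cluster gets its own line: intercept beta0 + u_j, slope beta1 + v_j.
lines = [(beta0 + u, beta1 + v) for u, v in effects]
print("first three cluster lines (intercept, slope):")
for intercept, slope in lines[:3]:
    print(f"  {intercept:.2f}, {slope:.2f}")

# Check that the empirical covariance matrix is close to Sigma.
n = len(effects)
mean_u = sum(u for u, _ in effects) / n
mean_v = sum(v for _, v in effects) / n
var_u = sum((u - mean_u) ** 2 for u, _ in effects) / (n - 1)
var_v = sum((v - mean_v) ** 2 for _, v in effects) / (n - 1)
cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in effects) / (n - 1)
print(f"var(u) = {var_u:.3f}  (target {sigma_u**2:.3f})")
print(f"var(v) = {var_v:.3f}  (target {sigma_v**2:.3f})")
print(f"cov(u, v) = {cov_uv:.3f}  (target {rho * sigma_u * sigma_v:.3f})")
```

A negative rho, as assumed here, would mean that clusters with higher-than-average intercepts tend to have weaker-than-average slopes.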

Fitting

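Estimation is by maximum likelihood (ML) or restricted maximum likelihood (REML). R: lme4::lmer for linear mixed models; lme4::glmer for generalised linear mixed models. Python: statsmodels MixedLM for linear mixed models; for non-Normal responses, a Bayesian fit (for example with PyMC) is one option.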
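As a rough illustration of the Python route, the sketch below simulates data from the random-intercept model and fits it with the statsmodels formula interface. The simulated data, the column names y, x and group, and the parameter values are assumptions made for the example. It requires numpy, pandas and statsmodels to be installed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate clustered data from the random-intercept model (illustrative values).
rng = np.random.default_rng(0)
J, n_j = 30, 20
beta0, beta1, sigma_u, sigma_e = 10.0, 0.5, 1.5, 2.0

group = np.repeat(np.arange(J), n_j)                 # cluster label per observation
u = rng.normal(0.0, sigma_u, size=J)                 # one random intercept per cluster
x = rng.normal(0.0, 1.0, size=J * n_j)
y = beta0 + u[group] + beta1 * x + rng.normal(0.0, sigma_e, size=J * n_j)
data = pd.DataFrame({"y": y, "x": x, "group": group})

# Random intercept for each level of `group`; fitted by REML.
model = smf.mixedlm("y ~ x", data, groups=data["group"])
result = model.fit(reml=True)

beta1_hat = float(result.fe_params["x"])             # fixed-effect slope
sigma2_u_hat = float(result.cov_re.iloc[0, 0])       # between-cluster variance
sigma2_e_hat = float(result.scale)                   # within-cluster (residual) variance

print(result.summary())
print(f"slope: {beta1_hat:.3f}, sigma_u^2: {sigma2_u_hat:.3f}, sigma_e^2: {sigma2_e_hat:.3f}")
```

The mixedlm function also accepts an re_formula argument (for example re_formula="~x") to add a random slope for x alongside the random intercept. The estimated cluster-level deviations are available as result.random_effects.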

When to switch from fixed-effects-only to mixed-effects

Clear hierarchical structure (students-in-schools, patients-in-hospitals).
Repeated measurements per subject.
Suspected within-cluster correlation that would bias the SEs of a non-mixed analysis.

Shrinkage — the key idea, visualised

Partial pooling (the mixed-effects fit) sits between two extremes: COMPLETE POOLING (use one number for all clusters, which ignores the structure) and NO POOLING (estimate each cluster separately, which gives noisy estimates for small clusters). For a random-intercept model with no covariates, the mixed-effects (BLUP) estimate of the mean of cluster j is:

$$\hat{\mu}_j = \bar{y} + \alpha_j (\bar{y}_j - \bar{y}), \qquad \alpha_j = \frac{n_j \sigma^2_u}{n_j \sigma^2_u + \sigma^2_e}.$$

Here $\bar{y}$ is the grand mean and $\bar{y}_j$ is the observed mean of cluster $j$. Small clusters (small $n_j$) get a smaller $\alpha_j$, so their estimates are shrunk more strongly TOWARD the grand mean. Large clusters get $\alpha_j$ near 1, so they are barely shrunk. The widget shows this graphically; toggle between the complete / no-pool / partial-pool views and watch the cluster-specific estimates contract.
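The shrinkage weight is easy to tabulate by hand. The short sketch below evaluates $\alpha_j$ for a few cluster sizes and applies it to one made-up cluster. The function names shrinkage_weight and partial_pool, and the example means of 14 and 10, are hypothetical; the variance components are treated as known, whereas a real fit would estimate them.

```python
def shrinkage_weight(n_j, sigma_u, sigma_e):
    """alpha_j = n_j * sigma_u^2 / (n_j * sigma_u^2 + sigma_e^2)."""
    return n_j * sigma_u**2 / (n_j * sigma_u**2 + sigma_e**2)


def partial_pool(cluster_mean, grand_mean, n_j, sigma_u, sigma_e):
    """Move the cluster mean toward the grand mean by a factor (1 - alpha_j)."""
    alpha = shrinkage_weight(n_j, sigma_u, sigma_e)
    return grand_mean + alpha * (cluster_mean - grand_mean)


sigma_u, sigma_e = 1.5, 2.0          # variance components assumed known here
for n_j in (1, 3, 8, 50):
    print(f"n_j = {n_j:>2}: alpha_j = {shrinkage_weight(n_j, sigma_u, sigma_e):.3f}")
# n_j =  1: alpha_j = 0.360
# n_j =  3: alpha_j = 0.628
# n_j =  8: alpha_j = 0.818
# n_j = 50: alpha_j = 0.966

# A hypothetical cluster of 3 observations with mean 14, grand mean 10:
estimate = partial_pool(cluster_mean=14.0, grand_mean=10.0, n_j=3,
                        sigma_u=sigma_u, sigma_e=sigma_e)
print(f"partially pooled estimate: {estimate:.2f}")   # 12.51
```

Note that even a single observation keeps a weight of 0.36 with these variances: how close $\alpha_j$ gets to 0 depends on the ratio $\sigma^2_e / \sigma^2_u$ as well as on $n_j$.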

Try it

Default settings (J = 16, n̄ = 8, σ_u = 1.5, σ_e = 2.0). Toggle between the three views. Notice in PARTIAL pooling: the small-n_j clusters move strongly toward the grand mean line (dashed); large-n_j ones barely budge. The dashed lines in the partial view trace exactly this contraction.
Crank σ_u down to 0.2 (low between-cluster variance — i.e., the cluster means are nearly identical). Watch ALL partial-pool estimates collapse onto the grand mean. Complete pooling and partial pooling become indistinguishable.
Now crank σ_u up to 5 (very different cluster means). Partial pool now barely shrinks at all — it's nearly identical to no-pool. The mixed-effects fit "knows" the clusters are genuinely different and trusts each within-cluster mean.
Set σ_u = 1.5, σ_e = 5 (high within-cluster noise). Partial pool now shrinks aggressively — the within-cluster means are too noisy to trust without borrowing strength.
Set n̄ = 3 (very small clusters). All α_j drop low, even with moderate σ_u. Compare MSEs: partial pool typically beats both extremes — sometimes by 2-3×.
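The MSE comparison in the last step can also be approximated outside the widget. The sketch below is not the widget's code: it is a small standard-library simulation with equal cluster sizes, an arbitrary grand mean of 10, and the true variance components plugged into $\alpha_j$ (a real mixed-effects fit would estimate them by ML or REML). The function name one_replication is hypothetical.

```python
import random
import statistics


def one_replication(J, n, mu, sigma_u, sigma_e, rng):
    """Simulate J clusters of size n and return the MSE of each pooling strategy."""
    true_means = [mu + rng.gauss(0.0, sigma_u) for _ in range(J)]
    cluster_means = [
        statistics.fmean([t + rng.gauss(0.0, sigma_e) for _ in range(n)])
        for t in true_means
    ]
    grand = statistics.fmean(cluster_means)
    alpha = n * sigma_u**2 / (n * sigma_u**2 + sigma_e**2)   # same for all clusters here

    estimates = {
        "complete": [grand] * J,                                      # one number for all
        "none": cluster_means,                                        # raw cluster means
        "partial": [grand + alpha * (m - grand) for m in cluster_means],
    }
    return {
        name: statistics.fmean([(e - t) ** 2 for e, t in zip(est, true_means)])
        for name, est in estimates.items()
    }


rng = random.Random(3)
J, n, mu, sigma_u, sigma_e = 16, 3, 10.0, 1.5, 2.0
reps = [one_replication(J, n, mu, sigma_u, sigma_e, rng) for _ in range(500)]
mse = {name: statistics.fmean([r[name] for r in reps]) for name in ("complete", "none", "partial")}

for name, value in mse.items():
    print(f"{name:>8} pooling: MSE = {value:.3f}")
```

Changing sigma_u, sigma_e and n in this sketch mirrors the slider experiments above: the gap between partial pooling and the two extremes narrows whenever $\alpha_j$ approaches 0 or 1.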

Suppose you study 8 hospitals: hospital A has 500 patients and a mortality rate of 4.2 %; hospital B has 5 patients and a mortality rate of 60 %. Which hospital's estimate should you trust more, and how does the mixed-effects fit improve the second estimate? Think about $\alpha_A$ versus $\alpha_B$ and what "borrowing strength" means here.

References

Pinheiro, J.C., Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
Gelman, A., Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. (Comprehensive applied treatment.)
Bates, D., Maechler, M., Bolker, B., Walker, S. (2015). "Fitting linear mixed-effects models using lme4." J. Statistical Software 67(1), 1–48.
McCulloch, C.E., Searle, S.R., Neuhaus, J.M. (2008). Generalized, Linear, and Mixed Models, 2nd ed. Wiley.
Henderson, C.R. (1975). "Best linear unbiased estimation and prediction under a selection model." Biometrics 31(2), 423–447. (The BLUP shrinkage formula.)