Mixed-effects and hierarchical models intro

Part 5 — Generalised linear models

Learning objectives

  • Recognise clustered data (repeated measures, multi-level groups) as requiring hierarchical models
  • Add random intercepts to allow group-specific deviations from the population mean
  • Distinguish FIXED effects (population-level coefficients of interest) from RANDOM effects (group-specific adjustments)
  • Fit models with lme4 or nlme (R) or statsmodels MixedLM (Python)
  • Recognise when ignoring clustering inflates Type-I errors

So far, Part 4 and §5.1-5.4 have treated all observations as independent. Real data often violate this: students within schools, patients within hospitals, repeated measurements per subject, plants within plots. The grouping creates within-group correlation that ordinary GLMs ignore, which inflates Type-I errors and produces misstated (often too small) standard errors.

The random intercept model

For data grouped into J clusters, with n_j observations in cluster j (i = 1, …, n_j; j = 1, …, J):

$$Y_{ij} = \beta_0 + u_j + \beta_1 x_{ij} + \varepsilon_{ij}, \qquad u_j \sim N(0, \sigma^2_u), \qquad \varepsilon_{ij} \sim N(0, \sigma^2_e).$$

The u_j is a cluster-specific random effect: each cluster gets its own intercept deviation from the population mean β_0. The u_j and ε_ij are assumed independent of each other. The variance components σ_u² (between-cluster) and σ_e² (within-cluster) are estimated from the data.
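
A minimal simulation sketch in Python with NumPy draws data from the random intercept model above. The helper `simulate_random_intercept` and all parameter values are hypothetical. Because two observations in the same cluster share the same u_j, their correlation (after removing β_1 x_ij) is σ_u² / (σ_u² + σ_e²), the intraclass correlation. The last lines compare that theoretical value with the empirical one.

```python
import numpy as np

def simulate_random_intercept(J, n_per, beta0, beta1, sigma_u, sigma_e, seed=0):
    """Hypothetical helper: draw Y_ij = beta0 + u_j + beta1 * x_ij + eps_ij
    for J clusters with n_per rows each."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=J)              # random intercept u_j per cluster
    g = np.repeat(np.arange(J), n_per)                 # cluster label j for every row
    x = rng.normal(0.0, 1.0, size=J * n_per)           # covariate x_ij
    eps = rng.normal(0.0, sigma_e, size=J * n_per)     # within-cluster error eps_ij
    y = beta0 + u[g] + beta1 * x + eps
    return g, x, y

J, beta1, sigma_u, sigma_e = 2000, 0.5, 1.5, 2.0
g, x, y = simulate_random_intercept(J, 2, beta0=10.0, beta1=beta1,
                                    sigma_u=sigma_u, sigma_e=sigma_e)

# Rows are sorted by cluster, so reshaping pairs up the two members of each cluster.
pairs = (y - beta1 * x).reshape(J, 2)
empirical_icc = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
theoretical_icc = sigma_u**2 / (sigma_u**2 + sigma_e**2)   # 2.25 / 6.25 = 0.36
print(theoretical_icc, empirical_icc)
```

With 2000 simulated clusters the empirical correlation should land close to 0.36; it is exactly this shared-u_j correlation that an ordinary GLM ignores.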

Why this matters

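Without the random effect, observations within a cluster are treated as independent. They are NOT: students within the same school share teachers, a neighborhood, and so on. The "effective sample size" is therefore smaller than the raw n, often much smaller when clusters are large and the within-cluster correlation is strong. Ignoring clustering ⇒ SEs underestimated (especially for covariates that vary only between clusters) ⇒ Type-I error inflated (sometimes dramatically).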
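
To make the Type-I inflation concrete, here is a simulation sketch (NumPy plus the standard library; every setting is hypothetical) of a cluster-randomised comparison with no true effect. For equal cluster sizes m, a standard approximation is the Kish design effect 1 + (m − 1)ρ, where ρ is the intraclass correlation: a mean of n clustered observations has roughly the variance of a mean of n / (1 + (m − 1)ρ) independent ones. The naive test below ignores this, and the t-test on cluster means respects it. The hard-coded critical value 2.1009 applies only to J = 20 clusters (18 degrees of freedom).

```python
import math
import numpy as np

J = 20                 # clusters: 10 treated, 10 control
T_CRIT_18 = 2.1009     # two-sided 5% critical value of t with J - 2 = 18 df

def design_effect(m, icc):
    """Kish design effect for equal cluster size m (variance inflation of a mean)."""
    return 1 + (m - 1) * icc

def rejection_rates(m=25, sigma_u=1.0, sigma_e=2.0, reps=2000, seed=1):
    """Cluster-randomised design with NO true treatment effect. Returns the share of
    5%-level rejections for (a) a naive test treating all J*m rows as independent
    and (b) a pooled two-sample t-test on the J cluster means."""
    rng = np.random.default_rng(seed)
    treated = np.repeat([0, 1], J // 2)      # treatment varies only between clusters
    naive_hits = cluster_hits = 0
    for _ in range(reps):
        u = rng.normal(0.0, sigma_u, size=J)
        y = u[:, None] + rng.normal(0.0, sigma_e, size=(J, m))   # row j = cluster j
        # (a) naive two-sample z-test on individual observations
        y1, y0 = y[treated == 1].ravel(), y[treated == 0].ravel()
        se = math.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
        z = (y1.mean() - y0.mean()) / se
        naive_hits += math.erfc(abs(z) / math.sqrt(2.0)) < 0.05   # two-sided p < 0.05
        # (b) pooled two-sample t-test on the cluster means
        c1, c0 = y[treated == 1].mean(axis=1), y[treated == 0].mean(axis=1)
        sp2 = ((c1.size - 1) * c1.var(ddof=1) + (c0.size - 1) * c0.var(ddof=1)) / (J - 2)
        t = (c1.mean() - c0.mean()) / math.sqrt(sp2 * (1 / c1.size + 1 / c0.size))
        cluster_hits += bool(abs(t) > T_CRIT_18)
    return naive_hits / reps, cluster_hits / reps

icc = 1.0**2 / (1.0**2 + 2.0**2)   # sigma_u^2 / (sigma_u^2 + sigma_e^2) = 0.2
print(design_effect(25, icc))      # about 5.8: 500 rows carry roughly 86 rows' worth of information
naive_rate, cluster_rate = rejection_rates()
print(naive_rate, cluster_rate)
```

With these settings the naive test should reject far more often than the nominal 5%, while the cluster-level test should stay close to 5%. Analysing cluster means is only a simple stand-in here; for a balanced design like this one, a random-intercept model gives essentially the same test while also accommodating within-cluster covariates.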

Random slopes

Beyond random intercepts: random slopes let the effect of a covariate vary by cluster. Example: in a study of teaching methods across schools, the EFFECT of the method might differ across schools. Model:

$$Y_{ij} = \beta_0 + u_j + (\beta_1 + v_j) x_{ij} + \varepsilon_{ij},$$

where (u_j, v_j) ~ N(0, Σ), i.e. the random intercept and random slope are jointly Normal with 2 × 2 covariance matrix Σ.
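
A simulation sketch of the random-slope model (NumPy; Σ, β_0, β_1, σ_e and the sizes are hypothetical values). It draws each (u_j, v_j) pair from N(0, Σ) and then fits an ordinary least-squares line inside each cluster. The spread of those per-cluster slopes is roughly the variance of v_j (the lower-right entry of Σ, here 0.25) plus within-cluster sampling noise, and separating those two sources is what a random-slope model does.

```python
import numpy as np

rng = np.random.default_rng(2)
J, n = 200, 50
beta0, beta1, sigma_e = 1.0, 2.0, 1.0
# Hypothetical Sigma: var(u_j) = 1.0, var(v_j) = 0.25, cov(u_j, v_j) = 0.3
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.25]])
uv = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=J)   # shape (J, 2)
u, v = uv[:, 0], uv[:, 1]

slopes = np.empty(J)
for j in range(J):
    x = rng.normal(0.0, 1.0, size=n)
    y = beta0 + u[j] + (beta1 + v[j]) * x + rng.normal(0.0, sigma_e, size=n)
    slopes[j] = np.polyfit(x, y, deg=1)[0]    # per-cluster OLS slope (no pooling)

# Mean should be near beta1; variance near 0.25 + sigma_e^2 / (n * var(x)), about 0.27.
print(slopes.mean(), slopes.var(ddof=1))
```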

Fitting

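Maximum likelihood (ML) or restricted maximum likelihood (REML). R: lme4::lmer for linear mixed models (REML by default); lme4::glmer for generalised linear mixed models (ML only, via the Laplace approximation or adaptive Gauss–Hermite quadrature). Python: statsmodels MixedLM for linear mixed models; for non-Normal responses, Bayesian methods, e.g. PyMC (statsmodels also offers the approximate-Bayes BinomialBayesMixedGLM and PoissonBayesMixedGLM).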
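
A minimal fitting sketch with statsmodels, assuming pandas and statsmodels are installed. The data are simulated from the random intercept model with hypothetical parameter values (β_0 = 5, β_1 = 2, σ_u = 1.5, σ_e = 1). `smf.mixedlm` with `groups=` gives one random intercept per group, and `fit(reml=True)` uses REML.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data from the random intercept model (hypothetical parameter values).
rng = np.random.default_rng(3)
J, n_per = 40, 20
g = np.repeat(np.arange(J), n_per)
u = rng.normal(0.0, 1.5, size=J)                                    # sigma_u = 1.5
x = rng.normal(0.0, 1.0, size=J * n_per)
y = 5.0 + u[g] + 2.0 * x + rng.normal(0.0, 1.0, size=J * n_per)    # sigma_e = 1.0
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Random intercept for each level of g; REML is the fitting criterion.
model = smf.mixedlm("y ~ x", df, groups=df["g"])
result = model.fit(reml=True)

print(result.fe_params)                    # fixed effects: Intercept (beta_0) and x (beta_1)
print(np.asarray(result.cov_re)[0, 0])     # estimated sigma_u^2
print(result.scale)                        # estimated sigma_e^2
```

For random slopes, statsmodels takes `re_formula="~x"` in the `mixedlm` call. The lme4 equivalents are `lmer(y ~ x + (1 | g), data = df)` for the random intercept model and `lmer(y ~ x + (1 + x | g), data = df)` for correlated random intercepts and slopes.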

When to switch from fixed-effects-only to mixed-effects

  • Clear hierarchical structure (students-in-schools, patients-in-hospitals).
  • Repeated measurements per subject.
  • Suspected within-cluster correlation that would bias the SEs of an analysis without random effects.

Shrinkage — the key idea, visualised

Partial pooling (the mixed-effects fit) sits between two extremes: COMPLETE POOLING (use one number for all clusters, ignoring the structure) and NO POOLING (treat each cluster separately, which gives noisy estimates for small clusters). For the intercept-only model, with the variance components treated as known, the mixed-effects BLUP estimate of cluster j's mean is:

$$\hat{\mu}_j = \bar{y} + \alpha_j (\bar{y}_j - \bar{y}), \qquad \alpha_j = \frac{n_j \sigma^2_u}{n_j \sigma^2_u + \sigma^2_e}.$$

Here ȳ_j is the sample mean of cluster j and ȳ is the grand mean (standing in for β_0). Small clusters (n_j small relative to σ_e²/σ_u²) get α_j closer to 0: strong shrinkage TOWARD the grand mean. Large clusters get α_j near 1: almost no shrinkage. The widget shows this graphically; toggle between the complete-, no- and partial-pooling views and watch the cluster-specific estimates contract.
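
A simulation sketch of the three estimators (NumPy; the cluster sizes, the true mean of 10 and the σ values are hypothetical). It treats σ_u and σ_e as known and uses the overall observation mean as ȳ, whereas lme4 or statsmodels would estimate the variance components and use the GLS estimate of β_0. It therefore illustrates the shrinkage formula rather than reproducing a fitted model.

```python
import numpy as np

def pooling_mse(J=16, sigma_u=1.5, sigma_e=2.0, reps=200, seed=4):
    """Mean squared error of complete-, no- and partial-pooling estimates of the
    cluster means mu_j = beta0 + u_j, with sigma_u and sigma_e treated as KNOWN."""
    rng = np.random.default_rng(seed)
    totals = {"complete": 0.0, "none": 0.0, "partial": 0.0}
    for _ in range(reps):
        n = rng.integers(2, 16, size=J)                   # cluster sizes n_j in 2..15
        mu = 10.0 + rng.normal(0.0, sigma_u, size=J)      # true cluster means
        clusters = [mu[j] + rng.normal(0.0, sigma_e, size=n[j]) for j in range(J)]
        ybar_j = np.array([c.mean() for c in clusters])   # no pooling
        ybar = np.concatenate(clusters).mean()            # complete pooling: grand mean
        alpha = n * sigma_u**2 / (n * sigma_u**2 + sigma_e**2)
        partial = ybar + alpha * (ybar_j - ybar)          # partial pooling (shrinkage)
        totals["complete"] += np.sum((ybar - mu) ** 2)
        totals["none"] += np.sum((ybar_j - mu) ** 2)
        totals["partial"] += np.sum((partial - mu) ** 2)
    return {k: float(v) / (reps * J) for k, v in totals.items()}

mse = pooling_mse()
print(mse)
```

Averaged over many simulated data sets, partial pooling should show the smallest mean squared error, no pooling the next smallest, and complete pooling the largest at these settings, because σ_u² = 2.25 is large relative to σ_e²/n_j for most clusters.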

[Interactive figure: Mixed Effects Shrinkage Demo]

Try it

  • Default settings (J = 16, n̄ = 8, σ_u = 1.5, σ_e = 2.0). Toggle between the three views. Notice in PARTIAL pooling: the small-n_j clusters move strongly toward the grand mean line (dashed); large-n_j ones barely budge. The dashed lines in the partial view trace exactly this contraction.
  • Crank σ_u down to 0.2 (low between-cluster variance — i.e., the cluster means are nearly identical). Watch ALL partial-pool estimates collapse onto the grand mean. Complete pooling and partial pooling become indistinguishable.
  • Now crank σ_u up to 5 (very different cluster means). Partial pool now barely shrinks at all — it's nearly identical to no-pool. The mixed-effects fit "knows" the clusters are genuinely different and trusts each within-cluster mean.
  • Set σ_u = 1.5, σ_e = 5 (high within-cluster noise). Partial pool now shrinks aggressively — the within-cluster means are too noisy to trust without borrowing strength.
  • Set n̄ = 3 (very small clusters). All α_j fall, even with moderate σ_u. Compare MSEs: partial pooling typically beats both extremes, sometimes by 2-3×.
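
The "Try it" settings can be checked against the formula for α_j directly. This sketch plugs n̄ in for n_j, which is an assumption, since the widget's clusters vary in size around n̄; the helper name `shrinkage_weight` is hypothetical.

```python
def shrinkage_weight(n_j, sigma_u, sigma_e):
    """alpha_j = n_j * sigma_u^2 / (n_j * sigma_u^2 + sigma_e^2)."""
    return n_j * sigma_u**2 / (n_j * sigma_u**2 + sigma_e**2)

# (n_j, sigma_u, sigma_e) for each bullet, using n-bar as a representative n_j
scenarios = {
    "default (n=8, su=1.5, se=2)": (8, 1.5, 2.0),
    "low between-cluster variance (su=0.2)": (8, 0.2, 2.0),
    "high between-cluster variance (su=5)": (8, 5.0, 2.0),
    "noisy clusters (se=5)": (8, 1.5, 5.0),
    "small clusters (n=3)": (3, 1.5, 2.0),
}
for name, (n_j, su, se) in scenarios.items():
    print(f"{name:40s} alpha = {shrinkage_weight(n_j, su, se):.2f}")
```

The printed α values run from about 0.07 (σ_u = 0.2, essentially complete pooling) to about 0.98 (σ_u = 5, essentially no pooling), with 0.82 at the defaults, 0.42 for noisy clusters and 0.63 for n̄ = 3.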

Suppose you measure 8 hospitals: hospital A has 500 patients with a mortality rate of 4.2 %; hospital B has 5 patients with a mortality rate of 60 %. Which estimate should you trust more, and how does the mixed-effects fit improve hospital B's estimate? Think about α_A vs α_B and what "borrowing strength" means here.

References

  • Pinheiro, J.C., Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
  • Gelman, A., Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. (Comprehensive applied treatment.)
  • Bates, D., Maechler, M., Bolker, B., Walker, S. (2015). "Fitting linear mixed-effects models using lme4." J. Statistical Software 67(1), 1–48.
  • McCulloch, C.E., Searle, S.R., Neuhaus, J.M. (2008). Generalized, Linear, and Mixed Models, 2nd ed. Wiley.
  • Henderson, C.R. (1975). "Best linear unbiased estimation and prediction under a selection model." Biometrics 31(2), 423–447. (The BLUP shrinkage formula.)
