When the GLM is not enough

Part 5 — Generalised linear models

Learning objectives

Recognise when GLM assumptions break: non-exponential-family responses, complex link functions, deep nonlinearity
Map alternative tools to specific GLM failures: GAMs (nonlinear), Bayesian models (sparse data + priors), survival analysis (time-to-event), generalised additive mixed models (clustering + nonlinearity)
Set realistic expectations for the limits of regression-based inference

Parts 4 and 5 have built a rich regression toolkit. But it's not complete. §5.6 closes Part 5 — and Round 2 — by mapping out what GLMs cannot do, and what tools take over.

Limit 1: nonlinear response surfaces

A GLM's linear predictor $\eta = X\boldsymbol{\beta}$, combined with a link function, captures some nonlinearity in the mean, but each covariate still enters linearly on the link scale. For genuinely smooth nonlinear effects (e.g., a U-shaped effect of age), use Generalised Additive Models (GAMs):

$$g(\mu) = \beta_0 + s_1(x_1) + s_2(x_2) + \ldots$$

where each $s_j$ is a smooth function (e.g., a cubic regression spline or a thin-plate spline) estimated from the data. GAMs keep GLM-style inference for the parametric terms while adding flexibility through the smooth terms. In R: mgcv::gam.
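To make "smooth function" concrete, one standard formulation (the one described in Wood, 2017) writes each smooth as a basis expansion and fits the coefficients by penalised likelihood. The basis functions $b_{jk}$, penalty matrices $S_j$, and smoothing parameters $\lambda_j$ are notation introduced here for illustration, not symbols from earlier sections:

$$s_j(x_j) = \sum_{k=1}^{K_j} \beta_{jk}\, b_{jk}(x_j), \qquad \hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \Big\{ \ell(\boldsymbol{\beta}) - \tfrac{1}{2} \sum_j \lambda_j\, \boldsymbol{\beta}^\top S_j \boldsymbol{\beta} \Big\}$$

Here $\ell$ is the GLM log-likelihood. As $\lambda_j$ grows, the penalty shrinks $s_j$ towards a very smooth function (a straight line for the usual second-derivative penalty); with $\lambda_j = 0$ the fit is an unpenalised regression on the basis.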

Limit 2: complex link / non-exponential families

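Some responses don't fit the standard GLM families: overdispersed counts or proportions, ordered categories, continuous amounts with exact zeros. Solutions:

Quasi-likelihood: drop the full distributional assumption and specify only the mean-variance relationship, with a dispersion parameter estimated from the data (as in quasi-Poisson and quasi-binomial fits). Fit via IRLS.
Ordinal regression: cumulative-link models for ordinal outcomes (e.g., the proportional-odds model).
Tweedie distributions: an exponential dispersion family outside the standard GLM set. For a power parameter between 1 and 2 it is a compound Poisson-Gamma distribution, with a point mass at zero and a continuous positive part, which makes it a common model for insurance claim amounts.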
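The quasi-likelihood idea can be sketched in a few lines: keep the Poisson variance function $V(\mu) = \mu$ but estimate a dispersion factor from the Pearson statistic. This is a minimal illustration on simulated data; the Gamma-mixed rates, the sample size, and the helper names `sample_poisson` and `pearson_dispersion` are all assumptions made for the example, and the model is intercept-only so that the fitted mean is just the sample mean.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / residual df, with Poisson variance function V(mu) = mu."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

rng = random.Random(0)
n = 2000

# Overdispersed counts: each observation gets its own Gamma-distributed rate
# (shape 2, scale 1.5, so the mean rate is 3). Hypothetical values.
rates = [rng.gammavariate(2.0, 1.5) for _ in range(n)]
y = [sample_poisson(rng, r) for r in rates]

# Intercept-only Poisson GLM: the MLE of the mean is the sample mean.
mu_hat = sum(y) / n
phi_hat = pearson_dispersion(y, [mu_hat] * n, n_params=1)

print(f"mean = {mu_hat:.2f}, estimated dispersion = {phi_hat:.2f}")
```

A dispersion estimate well above 1 signals that Poisson standard errors are too small; a quasi-Poisson fit keeps the same coefficient estimates and multiplies the standard errors by the square root of the estimated dispersion.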

Limit 3: small samples + strong priors

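With small n and many covariates, GLM maximum-likelihood estimates are unstable, and under complete separation in logistic regression they do not exist at all. Switch to a Bayesian regularised GLM (Part 7's machinery): place priors on β and base inference on the posterior, which yields principled, regularised estimates.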
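A small sketch of why a prior helps: with four perfectly separated observations, the logistic log-likelihood keeps increasing as the slope grows, while adding a Normal prior gives a finite maximum. The example computes only the posterior mode (MAP), not the full posterior that Part 7 would work with. The data, the prior standard deviation of 2.5, and the function names are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(beta, x, y):
    """Derivative of the (unpenalised) logistic log-likelihood, no intercept."""
    return sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))

def map_estimate(x, y, prior_sd, n_iter=50):
    """Newton iterations on log-likelihood + log N(0, prior_sd^2) prior."""
    beta = 0.0
    for _ in range(n_iter):
        p = [sigmoid(beta * xi) for xi in x]
        grad = score(beta, x, y) - beta / prior_sd ** 2
        hess = -sum(xi ** 2 * pi * (1 - pi) for xi, pi in zip(x, p)) - 1.0 / prior_sd ** 2
        beta -= grad / hess
    return beta

# Completely separated toy data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

# The likelihood score is still positive at a large slope: no finite MLE.
print("score at beta = 10:", score(10.0, x, y))

beta_map = map_estimate(x, y, prior_sd=2.5)
print("MAP slope with N(0, 2.5^2) prior:", round(beta_map, 3))
```

The prior acts like a ridge penalty on the log-likelihood; a tighter prior pulls the estimate closer to zero, so the choice of prior scale should be justified rather than tuned until the answer looks right.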

Limit 4: time-to-event data

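Survival data (time until an event, possibly censored) need survival analysis rather than a standard GLM, because a right-censored observation gives only a lower bound on the event time. The Cox proportional hazards model estimates hazard ratios semiparametrically, leaving the baseline hazard unspecified. Parametric survival models instead assume a distribution for the event time: Weibull, lognormal, etc.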
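The cost of ignoring censoring can be seen with the simplest parametric survival model, the exponential (a Weibull with shape 1). In the sketch below, the true mean, the censoring time, and the sample size are made-up values; the naive estimate averages the recorded times as if they were all events, while the censoring-aware maximum-likelihood estimate divides total time at risk by the number of observed events.

```python
import random

rng = random.Random(1)
true_mean = 2.0    # hypothetical true mean time-to-event
censor_at = 3.0    # follow-up stops here; later events are right-censored
n = 5000

latent = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
observed = [min(t, censor_at) for t in latent]   # what the analyst records
event = [t <= censor_at for t in latent]         # True if the event was seen

# Naive: treat censored times as if they were event times.
naive_mean = sum(observed) / n

# Exponential MLE under right-censoring: total time at risk / number of events.
mle_mean = sum(observed) / sum(event)

print(f"true mean   = {true_mean:.2f}")
print(f"naive mean  = {naive_mean:.2f}")
print(f"MLE w/ cens = {mle_mean:.2f}")
```

The naive estimate is biased downward because every censored subject contributes a time that is known to be too short; the likelihood-based estimate uses the same data but credits censored subjects only with survival up to the censoring time.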

Limit 5: causal inference

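GLMs estimate associations. For causal effects, see Part 6: randomised controlled trials (RCTs), instrumental variables (IV), regression discontinuity designs (RDD), difference-in-differences (DiD), and propensity scores.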

The bigger picture

GLMs are the workhorse of applied statistics, but they sit within a larger ecosystem. The Box-Cox transformation framework, the GLM extensions, GAMs, mixed models, Bayesian methods, survival analysis, and causal inference all share intellectual ancestry with OLS. Understanding GLMs gives you the foundation to pick up any of these when needed.

What you now know (closing Part 5)

GLMs extend OLS to non-Normal responses via the family + link framework: logistic regression for binary outcomes, Poisson and negative-binomial regression for counts, and mixed-effects models for clustered data. Deviance residuals and dispersion checks form the GLM-specific diagnostic toolkit. When GLMs fall short (nonlinear surfaces, complex outcomes, survival, causal inference), the next layer of statistical machinery awaits in Parts 6-9.

SDS Round 2 is now complete: Parts 1-5 all live.

References

Wedderburn, R.W.M. (1974). "Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method." Biometrika 61(3), 439–447.
Cox, D.R. (1972). "Regression models and life-tables." J. Roy. Stat. Soc. B 34(2), 187–220. (The Cox proportional hazards paper.)
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. (2013). Bayesian Data Analysis, 3rd ed. Chapman & Hall.
Klein, J.P., Moeschberger, M.L. (2003). Survival Analysis, 2nd ed. Springer.
Wood, S.N. (2017). Generalized Additive Models: An Introduction with R, 2nd ed. Chapman & Hall. (GAM reference.)
Huber, P.J. (1964). "Robust estimation of a location parameter." Annals of Math. Stat. 35(1), 73–101.

Four failure modes, side-by-side

The widget below lets you load four canonical scenarios where plain GLM is the wrong tool. Each shows: data + linear-GLM fit (solid green) + true mean structure (dashed grey), plus a residual-vs-fitted diagnostic that screams "the model is wrong."

Try it

Scenario A (nonlinear): Look at the residual plot — clear sinusoidal pattern. The linear fit has no chance against a sinusoidal truth. A GAM with smooth s(x) would absorb the nonlinearity. KEY: residual structure tells you what is missing.
Scenario B (heavy-tailed): Re-seed several times. Red points have |residual| > 2. The OLS estimate β̂₁ jumps around from seed to seed (sometimes 0.4, sometimes 0.7) because outliers wield disproportionate influence. Median (quantile) regression or maximum likelihood with t-distributed errors would be far more stable.
Scenario C (censored): Red dots are censored at y = 4. The fitted slope is artificially flattened at the top. Censoring-aware methods (Tobit regression for a censored Normal response, or survival models such as Cox PH for time-to-event data) preserve the slope by treating those points as "known to exceed 4." Ignoring censoring is a very common error in dose-response and reliability analyses.
Scenario D (heterogeneous slopes): The "population" slope estimated by OLS is near zero, but NO INDIVIDUAL CLUSTER has a slope near zero. The pooled slope averages over clusters and describes none of them. A random-slope mixed model would correctly estimate σ²_slope ≈ 0.6² and predict each cluster's individual slope.
For each scenario, ask: what does the residual plot SAY? Wave pattern → nonlinearity. Fan / outliers → heavy tails or heteroscedasticity. Flat ceiling → censoring. Multimodal cluster bands → clustering. Diagnostics ARE the model-selection process.
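Scenario D is easy to reproduce by hand. The following sketch is not the widget's data-generating process; it assumes six hypothetical clusters whose true slopes alternate between +0.6 and -0.6 (chosen to echo σ_slope ≈ 0.6), a shared x grid, and Normal noise with standard deviation 0.2. It compares the pooled OLS slope with the per-cluster OLS slopes.

```python
import random

def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(2)
x_grid = [-2 + 4 * i / 19 for i in range(20)]        # same design in every cluster
true_slopes = [0.6, -0.6, 0.6, -0.6, 0.6, -0.6]      # hypothetical cluster slopes

cluster_slopes = []
pooled_x, pooled_y = [], []
for slope in true_slopes:
    y = [slope * xi + rng.gauss(0, 0.2) for xi in x_grid]
    cluster_slopes.append(ols_slope(x_grid, y))
    pooled_x += x_grid
    pooled_y += y

pooled_slope = ols_slope(pooled_x, pooled_y)

print("pooled slope:  ", round(pooled_slope, 3))
print("cluster slopes:", [round(s, 2) for s in cluster_slopes])
```

The pooled fit reports "no effect of x" even though x has a clear effect in every cluster; plotting residuals by cluster would show the opposing trends that the pooled residual plot blurs into bands.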

If you fit a logistic regression to predict bond default and a colleague says "the model has AUC 0.85, so it's good", what TWO additional checks would you insist on before deploying it to make billion-dollar capital decisions? (Hint: think about distribution shift, calibration, and what the model can't see.)