From linear to generalised: link and family

Part 5 — Generalised linear models

Learning objectives

State the three GLM components: random (response distribution from exponential family), systematic (linear predictor η = Xβ), link (g connecting μ to η)
Identify the canonical link for Normal, Binomial, Poisson, Gamma
Fit GLMs via Iteratively Reweighted Least Squares (IRLS)
Recognise OLS as a GLM with Normal family and identity link

Linear regression assumes the response Y is conditionally Normal with constant variance. This breaks for binary outcomes (Y in {0, 1}), count outcomes (Y in {0, 1, 2, ...}), positively skewed continuous outcomes (lifetimes, costs), and proportions. Generalised Linear Models extend the OLS framework to these cases, keeping the linear-in-parameters predictor while replacing Normality with a more general exponential-family distribution.

The three GLM components

Random component: $Y_i$ follows an exponential-family distribution such as Normal, Binomial, Poisson, Gamma, or Inverse Gaussian (the Negative Binomial qualifies only when its shape parameter is held fixed, and the Multinomial requires a multivariate extension). It has mean $\mu_i = E[Y_i|X_i]$ and variance $\mathrm{Var}(Y_i|X_i) = \phi V(\mu_i)$, where $V$ is the variance function and $\phi$ the dispersion parameter.
Systematic component: linear predictor $\eta_i = \mathbf{x}_i^T \boldsymbol{\beta}$.
Link function: invertible $g$ such that $g(\mu_i) = \eta_i$. A well-chosen link CONSTRAINS μ to a valid range: its inverse maps every real η to an admissible mean.
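To make the variance function concrete, here is a minimal sketch of $V(\mu)$ for the four families used in this section. The dictionary and function names are illustrative and not taken from any library; the formulas are the standard variance functions.

```python
# Variance functions V(mu) for four common GLM families.
# Var(Y | X) = phi * V(mu), with phi the dispersion parameter.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                # constant variance
    "binomial": lambda mu: mu * (1.0 - mu),  # largest at mu = 0.5
    "poisson": lambda mu: mu,                # variance equals the mean
    "gamma": lambda mu: mu ** 2,             # constant coefficient of variation
}


def response_variance(family, mu, phi=1.0):
    """Return phi * V(mu) for the named family (hypothetical helper)."""
    return phi * VARIANCE_FUNCTIONS[family](mu)


for family in VARIANCE_FUNCTIONS:
    values = [round(response_variance(family, mu), 3) for mu in (0.1, 0.5, 0.9)]
    print(family, values)
```

Only the Normal family has a variance that ignores the mean, which is the constant-variance assumption of OLS.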

Canonical links

Normal: identity link, $\mu = \eta$. Recovers OLS.
Binomial (binary, proportions): logit link, $\log\frac{\mu}{1-\mu} = \eta$. Logistic regression.
Poisson (counts): log link, $\log \mu = \eta$. Poisson regression.
Gamma (positive continuous): inverse link, $1/\mu = \eta$ (more common in practice: log link).
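The four canonical links and their inverses can be written out directly. The sketch below uses only the standard library; the `LINKS` table and helper names are illustrative, and the round trip $g^{-1}(g(\mu)) = \mu$ is checked at a single value, μ = 0.3, which is admissible for all four families.

```python
import math


def logit(mu):
    """Canonical Binomial link: log-odds of mu, for mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse of the logit link: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Each entry holds (link g, inverse link g^{-1}).
LINKS = {
    "normal/identity": (lambda mu: mu, lambda eta: eta),
    "binomial/logit": (logit, inv_logit),
    "poisson/log": (math.log, math.exp),
    "gamma/inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}

mu = 0.3
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    # g_inv(eta) should recover mu up to floating-point error.
    print(f"{name:16s} eta = {eta:+.4f}  round trip = {g_inv(eta):.4f}")
```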

The "canonical" link is the one for which the linear predictor equals the natural parameter of the exponential-family distribution. Using the canonical link gives the nicest properties: $\mathbf{X}^T\mathbf{y}$ is a sufficient statistic for $\boldsymbol{\beta}$, the log-likelihood is concave, and IRLS coincides with Newton-Raphson and so converges quickly. Still, ANY invertible link whose inverse maps into the valid range of μ can work.
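As a worked check of this definition, the Poisson and Bernoulli mass functions can be rewritten in exponential form. The symbol $\theta$ for the natural parameter is introduced here for illustration; it is the coefficient multiplying $y$ inside the exponent.

$$
\begin{aligned}
\text{Poisson:}\quad f(y;\mu) &= \frac{\mu^{y} e^{-\mu}}{y!} = \exp\{\, y\log\mu - \mu - \log y! \,\}, & \theta &= \log\mu, \\
\text{Bernoulli:}\quad f(y;\mu) &= \mu^{y}(1-\mu)^{1-y} = \exp\Big\{\, y\log\frac{\mu}{1-\mu} + \log(1-\mu) \Big\}, & \theta &= \log\frac{\mu}{1-\mu}.
\end{aligned}
$$

Setting $\eta = \theta$ gives the log link for the Poisson family and the logit link for the Binomial family.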

IRLS fitting

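GLMs are fitted by Iteratively Reweighted Least Squares: at each iteration, form a "working response" $z_i = \eta_i + (y_i - \mu_i)\,g'(\mu_i)$ and a working weight $w_i = 1/[V(\mu_i)\,g'(\mu_i)^2]$, then regress $z$ on $X$ by weighted least squares to update $\boldsymbol{\beta}$. With a canonical link, IRLS coincides with Newton-Raphson and so converges quadratically near the solution. It is implemented in R's glm() and Python's statsmodels GLM.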
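The iteration can be sketched for one concrete case: Poisson family, log link, an intercept and a single covariate. For this case $g'(\mu) = 1/\mu$ and $V(\mu) = \mu$, so the working response is $z = \eta + (y - \mu)/\mu$ and the working weight is $w = \mu$. The function name, starting values, and data below are made up for illustration. The code is dependency-free and solves the 2×2 weighted normal equations by hand; it is not how glm() or statsmodels is implemented internally.

```python
import math


def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by IRLS (Poisson family, log link)."""
    # Start from the intercept-only fit: mu = mean(y) for every observation.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response: z = eta + (y - mu) * g'(mu), with g'(mu) = 1/mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Working weight: w = 1 / (V(mu) * g'(mu)^2) = mu for Poisson + log.
        w = mu
        # Weighted least squares of z on (1, x) via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up count data that grow roughly exponentially in x.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 2, 4, 7, 12]
b0_hat, b1_hat = irls_poisson(x_demo, y_demo)
print("intercept:", b0_hat, "slope:", b1_hat)
```

At convergence the fitted means satisfy the score equations $\sum_i (y_i - \mu_i) = 0$ and $\sum_i x_i (y_i - \mu_i) = 0$, which is a convenient sanity check on any implementation.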

Visualising the four canonical links

The same linear predictor η = β₀ + β₁·x flows through four different inverse-link functions to produce μ in four very different ranges. Move the sliders — note that:

Normal · identity permits any μ, including negative values (which are absurd for counts, proportions, or lifetimes).
Binomial · logit squashes η ∈ ℝ into (0, 1) — large positive η ⇒ μ → 1, large negative η ⇒ μ → 0.
Poisson · log maps η ∈ ℝ to (0, ∞) — small η changes have multiplicative effects on μ.
Gamma · inverse yields a valid mean only for η > 0; for η ≤ 0, 1/η is undefined or negative, so no valid Gamma mean exists.

Try it

Set β₁ = 0 and slide β₀ from -3 to +3. Watch how the Binomial μ moves from near 0 to near 1 along an S-curve, while the Poisson μ scales from ≈ 0.05 to ≈ 20. Same η range, totally different μ behaviour.
Set β₀ = -2, β₁ = 0.5. For which x-values is the Gamma panel UNDEFINED? Convince yourself that the inverse link is unsuitable when η can cross zero — log-link Gamma is the workhorse alternative.
Set β₁ = 1.5. In the Binomial panel, how steep is the sigmoid near η = 0? Now flatten it: β₁ = 0.2. The same one-unit change in x produces a much smaller change in μ when β₁ is small (logistic regression's "marginal effect" depends on where you are on the curve).
Crank n samples up to 200 and look at the Poisson panel: do the dots span the full y-range or hug the lower portion? At η = -2, μ ≈ 0.14, so about 87% of samples are 0. This is why Poisson regression on rare events needs lots of data.
Try β₀ = 3, β₁ = 0. The Normal panel happily predicts μ = 3 for all x. The Gamma panel correctly gives μ = 1/3 ≈ 0.33. The Binomial panel saturates at μ ≈ 0.95. The Poisson panel gives μ ≈ 20. Four totally different stories from the same η.
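The numbers quoted in these exercises can be checked without the sliders. The short sketch below uses only the standard library and evaluates the inverse links at the η values mentioned above; the variable names are illustrative.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Binomial and Poisson means at the two ends of the beta0 slider range.
for eta in (-3.0, 3.0):
    print(f"eta = {eta:+.0f}: binomial mu = {inv_logit(eta):.3f}, "
          f"poisson mu = {math.exp(eta):.2f}")

# Gamma mean under the inverse link at eta = 3.
gamma_mu = 1.0 / 3.0
print(f"gamma mu at eta = 3: {gamma_mu:.2f}")

# Poisson at eta = -2: the mean, and the probability that a draw is exactly 0.
mu_rare = math.exp(-2.0)
p_zero = math.exp(-mu_rare)  # P(Y = 0) = exp(-mu) for a Poisson variable
print(f"poisson mu at eta = -2: {mu_rare:.3f}, P(Y = 0) = {p_zero:.3f}")
```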

If your response is a measured proportion (e.g., germination rate across 200 trials), and you accidentally fit OLS instead of logistic regression, what TWO concrete things go wrong? Hint: think about predicted values that fall outside (0, 1), and variance that is wrongly assumed constant when it actually depends on μ.
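A small numerical illustration of the hint, under stated assumptions: the five proportions below are made up, each is imagined to come from 200 trials, and the OLS line is fitted with the textbook closed-form slope and intercept rather than a library call.

```python
# Made-up germination proportions at five covariate values, 200 trials each.
x = [0, 1, 2, 3, 4]
p = [0.05, 0.20, 0.50, 0.80, 0.95]
n_trials = 200

# Ordinary least squares for a straight line, in closed form.
x_bar = sum(x) / len(x)
p_bar = sum(p) / len(p)
slope = (sum((xi - x_bar) * (pr - p_bar) for xi, pr in zip(x, p))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = p_bar - slope * x_bar


def ols_predict(x_new):
    """Fitted straight-line value at x_new (hypothetical helper)."""
    return intercept + slope * x_new


# Problem 1: just outside the observed x range the line leaves (0, 1).
print("OLS prediction at x = -1:", ols_predict(-1))
print("OLS prediction at x =  5:", ols_predict(5))

# Problem 2: the variance of a proportion is mu * (1 - mu) / n, not constant.
binom_var = [pr * (1.0 - pr) / n_trials for pr in p]
print("variance at each x:", binom_var)
print("largest / smallest variance:", max(binom_var) / min(binom_var))
```

A logit link addresses the first problem by construction, and the Binomial variance function addresses the second.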

What you now know

GLMs extend OLS to non-Normal responses via the family + link framework. Neither choice is stylistic: a suitable link CONSTRAINS μ to a valid range, and the family ties the variance structure to the mean through $V(\mu)$. §5.2-5.4 cover the two most important non-Normal cases (logistic and Poisson) and the GLM-specific diagnostics. §5.5 introduces mixed-effects extensions for clustered data. §5.6 closes Part 5 with honest scope: what GLM cannot do.

References

Nelder, J.A., Wedderburn, R.W.M. (1972). "Generalized linear models." J. Roy. Stat. Soc. A 135(3), 370–384. (The foundational GLM paper.)
McCullagh, P., Nelder, J.A. (1989). Generalized Linear Models, 2nd ed. Chapman & Hall. (The canonical book.)
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley.
Dobson, A.J., Barnett, A.G. (2018). An Introduction to Generalized Linear Models, 4th ed. Chapman & Hall.
Wood, S.N. (2017). Generalized Additive Models: An Introduction with R, 2nd ed. Chapman & Hall/CRC.