Logistic regression and odds ratios

Part 5 — Generalised linear models

Learning objectives

Fit and interpret logistic regression with the logit link
Translate coefficients into odds ratios via exp(β)
Distinguish probability, odds, and log-odds
Apply McFadden's pseudo-R² to assess logistic model fit
Recognise common logistic-regression pitfalls

Logistic regression is the most widely used GLM after OLS. It models a binary outcome Y ∈ {0, 1} through P(Y=1 | X), using the logit link.

The model

$$\log\frac{p_i}{1 - p_i} = \mathbf{x}_i^T \boldsymbol{\beta}, \quad p_i = P(Y_i = 1 \mid X_i).$$

Equivalently: $p_i = \frac{e^{\mathbf{x}_i^T \boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T \boldsymbol{\beta}}} = \sigma(\mathbf{x}_i^T \boldsymbol{\beta})$ where $\sigma$ is the sigmoid.
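The two forms of the model can be checked against each other numerically. The sketch below is illustrative only: the coefficient values and the covariate value are assumptions chosen for the demonstration, and the function names `logit` and `sigmoid` are ours.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(eta: float) -> float:
    """Inverse logit: maps a linear predictor eta to a probability."""
    # Branch on the sign so exp() is only ever called on a non-positive number.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Assumed values for illustration (one predictor plus an intercept).
beta0, beta1 = -0.5, 1.5
x = 1.0

eta = beta0 + beta1 * x   # linear predictor = log-odds
p = sigmoid(eta)          # probability of Y = 1

print(f"log-odds = {eta:.3f}, probability = {p:.3f}")
print(f"logit(p) recovers the log-odds: {logit(p):.3f}")
```

Applying `logit` to the output of `sigmoid` returns the linear predictor, which is exactly the statement that the two equations above are equivalent.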

Interpreting coefficients

The coefficient $\beta_j$ is the change in the log-odds of Y=1 per unit increase in $x_j$, holding the other predictors fixed. The corresponding odds ratio is $e^{\beta_j}$:

$e^{\beta_j} > 1$: higher $x_j$ is associated with higher odds of Y=1.
$e^{\beta_j} < 1$: higher $x_j$ is associated with lower odds of Y=1.
$e^{\beta_j} = 1$: $x_j$ has no association with the odds of Y=1, given the other predictors.

Example: $\beta_{\text{smoking}} = 0.7$ ⇒ $e^{0.7} \approx 2.0$: smokers have about 2× the odds of lung cancer compared to non-smokers, holding age, sex, etc. constant.
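The smoking example can be reproduced in a few lines. This is a minimal sketch: the standard error `se_smoking` is a hypothetical value (the text gives none), and the helper names are ours. The confidence interval is a Wald interval, built on the log-odds scale and then exponentiated.

```python
import math

beta_smoking = 0.7  # log-odds coefficient from the example above

# Odds ratio for a one-unit change (smoker vs. non-smoker).
odds_ratio = math.exp(beta_smoking)

def odds_ratio_for_change(beta: float, k: float) -> float:
    """Odds multiplier for a k-unit change in the predictor: exp(k * beta)."""
    return math.exp(k * beta)

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Wald interval: compute beta +/- z * se on the log-odds scale, then exponentiate."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

se_smoking = 0.2  # HYPOTHETICAL standard error, not taken from the text
low, high = odds_ratio_ci(beta_smoking, se_smoking)

print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself. For a continuous predictor, `odds_ratio_for_change(beta, k)` gives the multiplier for a k-unit change, which is the one-unit odds ratio raised to the power k.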

Probability ≠ odds ≠ log-odds

For p = 0.5: odds = 1, log-odds = 0. For p = 0.99: odds = 99, log-odds ≈ 4.6. The COEFFICIENT scale (β) is log-odds; the EXPONENTIATED scale (e^β) is odds; the PROBABILITY (p) needs the sigmoid transform. Don't interpret e^β as a "probability ratio": it is an ODDS ratio, and the change in probability it implies depends on the baseline probability.
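To see why an odds ratio is not a probability ratio, it helps to apply the same odds ratio at several baseline probabilities. The sketch below uses an odds ratio of 2 and three arbitrary baselines chosen for illustration; the function names are ours.

```python
def prob_to_odds(p: float) -> float:
    return p / (1.0 - p)

def odds_to_prob(odds: float) -> float:
    return odds / (1.0 + odds)

def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Probability after multiplying the odds of a baseline probability p."""
    return odds_to_prob(prob_to_odds(p) * odds_ratio)

# Same odds ratio, three illustrative baselines.
for baseline in (0.01, 0.5, 0.9):
    new_p = apply_odds_ratio(baseline, 2.0)
    print(f"baseline p = {baseline:.2f} -> new p = {new_p:.4f} "
          f"(probability ratio {new_p / baseline:.2f})")
```

At a small baseline the probability nearly doubles, because odds and probability almost coincide there. At p = 0.5 it rises from 0.5 to about 0.67, and at p = 0.9 it only moves to about 0.95.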

Model fit diagnostics

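Deviance: -2(log L_model - log L_saturated), a likelihood-ratio statistic against the saturated model. Compare nested models via the difference in their deviances, which is asymptotically χ² with degrees of freedom equal to the number of extra parameters.
McFadden's pseudo-R²: 1 - log L_model / log L_null. Values of 0.2-0.4 already indicate "good fit"; pseudo-R² values run much lower than OLS R² values, so the two should not be read on the same scale.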
ROC / AUC: discrimination ability — how well the model ranks Y=1 above Y=0. AUC 0.5 = random; 0.8+ = good discrimination.
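Hosmer-Lemeshow test: calibration test that bins observations by predicted probability and compares observed with expected event counts. Subject of debate (the result depends on how the bins are chosen); a visual calibration plot is often better.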
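Deviance, McFadden's pseudo-R², and AUC can all be computed directly from the observed outcomes and the predicted probabilities. The sketch below uses a tiny set of HYPOTHETICAL outcomes and predictions (not output from a real fit) and assumes ungrouped 0/1 data, for which the saturated log-likelihood is 0. The function names are ours.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def deviance(y, p):
    # For ungrouped 0/1 data the saturated model fits perfectly (log L = 0),
    # so the deviance reduces to -2 * log L_model.
    return -2.0 * log_likelihood(y, p)

def mcfadden_r2(y, p):
    # Null model: intercept only, so every prediction is the sample mean of y.
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

def auc(y, p):
    """Share of (Y=1, Y=0) pairs where the Y=1 case gets the higher score."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

print(f"deviance    = {deviance(y, p):.3f}")
print(f"McFadden R2 = {mcfadden_r2(y, p):.3f}")
print(f"AUC         = {auc(y, p):.3f}")
```

The pairwise AUC loop is quadratic in the sample size, which is fine for teaching purposes; for large data a rank-based formula does the same job faster.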

Pitfalls

Complete separation: when one predictor (or a linear combination of predictors) perfectly distinguishes Y=1 from Y=0, no finite MLE exists and the estimates diverge (β → ±∞). Use penalised likelihood (Firth correction; ridge).
Rare events: with very few Y=1 observations, the MLE has small-sample bias and tends to underestimate the probability of the rare event. King & Zeng (2001) propose corrections.
Out-of-sample prediction: check probability calibration on held-out data and, if needed, recalibrate with Platt scaling or isotonic regression (see §3.5).
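The separation pitfall is easy to reproduce on a toy dataset. The sketch below makes several simplifying assumptions: the four data points are invented, the intercept is fixed at 0 (the data are symmetric around x = 0), and the penalty is a plain ridge term λβ²/2 on the slope rather than the Firth correction. It shows that the unpenalised log-likelihood keeps increasing as the slope grows, while the ridge-penalised fit settles on a finite slope.

```python
import math

def log_sigmoid(eta: float) -> float:
    """Numerically stable log(sigmoid(eta))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))

def sigmoid(eta: float) -> float:
    return math.exp(log_sigmoid(eta))

# Invented, perfectly separated data: every x < 0 has y = 0, every x > 0 has y = 1.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def log_lik(b: float) -> float:
    # y = 1 contributes log sigmoid(b*x); y = 0 contributes log sigmoid(-b*x).
    return sum(log_sigmoid(b * xi if yi == 1 else -b * xi)
               for xi, yi in zip(x, y))

def ridge_slope(lam: float, n_iter: int = 50) -> float:
    """Newton iterations on the penalised log-likelihood log_lik(b) - lam * b**2 / 2."""
    b = 0.0
    for _ in range(n_iter):
        p = [sigmoid(b * xi) for xi in x]
        grad = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p)) - lam * b
        hess = -sum(xi * xi * pi * (1.0 - pi) for xi, pi in zip(x, p)) - lam
        b -= grad / hess  # hess <= -lam < 0, so the division is always safe
    return b

# Unpenalised: the log-likelihood climbs toward 0 as the slope grows without bound.
for b in (1.0, 5.0, 25.0):
    print(f"slope {b:5.1f}: log-likelihood = {log_lik(b):.3e}")

# Penalised: a finite maximiser exists; a stronger penalty shrinks it further.
for lam in (0.1, 1.0):
    print(f"lambda {lam}: ridge slope = {ridge_slope(lam):.3f}")
```

The penalty strength λ is a tuning choice with no "correct" value here; in practice it would be chosen by cross-validation or replaced by the Firth correction named above.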

Fitting logistic regression by IRLS

The widget below simulates a 1-D binary dataset from a TRUE logistic model with chosen β₀ and β₁, then fits a logistic regression by IRLS (Fisher scoring on the binomial log-likelihood). Watch deviance drop in 4-8 iterations, compare fitted vs. true sigmoid, and read off the odds ratio with its 95 % CI.
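For readers without access to the widget, the same experiment can be sketched in plain Python. This is not the widget's code: it assumes x is drawn from a standard normal, starts the iteration at β = 0, and uses a Wald interval for the odds ratio. With the canonical logit link, Fisher scoring and Newton-Raphson coincide, so each IRLS step is β ← β + I(β)⁻¹ · score(β). All function names are ours.

```python
import math
import random

def log_sigmoid(eta):
    """Numerically stable log(sigmoid(eta))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))

def simulate(beta0, beta1, n, seed):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # ASSUMPTION: x ~ N(0, 1)
    ys = [1 if rng.random() < math.exp(log_sigmoid(beta0 + beta1 * x)) else 0
          for x in xs]
    return xs, ys

def deviance(xs, ys, b0, b1):
    # Ungrouped 0/1 data: deviance = -2 * log-likelihood.
    ll = sum(log_sigmoid(b0 + b1 * x) if y == 1 else log_sigmoid(-(b0 + b1 * x))
             for x, y in zip(xs, ys))
    return -2.0 * ll

def score_and_information(xs, ys, b0, b1):
    """Score vector and 2x2 Fisher information for (intercept, slope)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = math.exp(log_sigmoid(b0 + b1 * x))
        w = p * (1.0 - p)  # IRLS weight
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)

def fit_irls(xs, ys, max_iter=25, tol=1e-8):
    b0 = b1 = 0.0
    trace = [deviance(xs, ys, b0, b1)]
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(xs, ys, b0, b1)
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system I * delta = score by explicit inversion.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
        trace.append(deviance(xs, ys, b0, b1))
        if abs(trace[-2] - trace[-1]) < tol:
            break
    # Standard error of the slope from the inverse information at the estimate.
    _, (i00, i01, i11) = score_and_information(xs, ys, b0, b1)
    se1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    ci = (math.exp(b1 - 1.96 * se1), math.exp(b1 + 1.96 * se1))
    return {"beta0": b0, "beta1": b1, "se1": se1,
            "odds_ratio": math.exp(b1), "or_ci": ci, "trace": trace}

xs, ys = simulate(beta0=-0.5, beta1=1.5, n=120, seed=0)
fit = fit_irls(xs, ys)
print(f"beta1_hat = {fit['beta1']:.3f} (SE {fit['se1']:.3f})")
print(f"odds ratio = {fit['odds_ratio']:.2f}, 95% CI = "
      f"({fit['or_ci'][0]:.2f}, {fit['or_ci'][1]:.2f})")
print("deviance per iteration:", [round(d, 3) for d in fit["trace"]])
```

Changing `seed`, `n`, or `beta1` in the `simulate` call mirrors the sliders described in the exercises that follow. On perfectly separated data `det` underflows and the loop fails, which is the separation pathology showing up in code.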

Try it

Start with true β₀ = -0.5, β₁ = 1.5, n = 120. Note: the fitted β̂₁ usually lands within 10-20 % of the true 1.5, and the odds ratio CI usually brackets the true e^1.5 ≈ 4.48. Sample noise drives the discrepancy: IRLS just finds the MLE of the data you handed it, not the truth.
Drop n to 30 and re-simulate several times (seed slider). The CI for the odds ratio gets WIDER and sometimes excludes the truth — small-sample logistic regression is unreliable, especially near zero counts in either class.
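Set β₁ = 3.0 and watch the deviance curve in the middle panel. With strong signal IRLS still converges in 4-6 iterations: that's the quadratic convergence of Newton-Raphson at work. With β₁ = 0.2 the likelihood is much flatter, but Newton-type steps rescale by the curvature, so a flat likelihood shows up mainly as a large standard error relative to β̂₁ rather than as slower convergence; if the iteration starts near β = 0, expect it to need no more iterations than before.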
Push β₁ to its maximum and set the seed to one where the data are nearly separable. The standard error on β̂₁ explodes; the CI for the odds ratio becomes ridiculously wide. This is the LOGISTIC-REGRESSION SEPARATION pathology — fix with Firth correction or ridge penalty.
Compare the dashed (true) sigmoid with the solid (fitted) sigmoid. They almost always overlap well when n ≥ 100. The AUC reads off how well the model ranks Y=1 above Y=0 — at β₁ = 1.5 with n = 120 you should see AUC ≈ 0.85 - 0.90.

Suppose you fit a logistic regression and read β̂₁ = 0.69 (so e^0.69 ≈ 2), and a colleague says "the predicted PROBABILITY of Y=1 doubles per unit increase in x." Why is that wrong, and what is the actual probability change between x = 0 and x = 1? (Hint: use the sigmoid, not the odds-ratio shortcut. The answer depends on the intercept, so pick a value such as β̂₀ = 0 to work it through.)

References

Hosmer, D.W., Lemeshow, S., Sturdivant, R.X. (2013). Applied Logistic Regression, 3rd ed. Wiley.
Agresti, A. (2018). Statistical Methods for the Social Sciences, 5th ed. Pearson.
King, G., Zeng, L. (2001). "Logistic regression in rare events data." Political Analysis 9(2), 137–163.
Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika 80(1), 27–38.