Logistic regression and odds ratios
Learning objectives
- Fit and interpret logistic regression with the logit link
- Translate coefficients into odds ratios via exp(β)
- Distinguish probability, odds, and log-odds
- Apply McFadden's pseudo-R² to assess logistic model fit
- Recognise common logistic-regression pitfalls
After OLS, logistic regression is the most widely used GLM. It models the probability P(Y=1 | X) of a binary outcome Y ∈ {0, 1} through the logit link.
The model
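$$\operatorname{logit} P(Y=1 \mid X) = \log\frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$
Equivalently, $P(Y=1 \mid X) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$, where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid.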
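The two forms of the model are linked by a pair of inverse functions: the logit maps a probability to log-odds, and the sigmoid maps log-odds back to a probability. A minimal Python sketch follows; the function names and the coefficient values are illustrative, not taken from any library. The sigmoid branches on the sign of its argument so that `math.exp` cannot overflow for large |z|.

```python
import math

def sigmoid(z: float) -> float:
    """Inverse logit: maps a log-odds value z to a probability in (0, 1)."""
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1); the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

# Linear predictor eta = beta0 + beta1 * x for one observation (illustrative values).
beta0, beta1, x = -0.5, 1.5, 1.0
eta = beta0 + beta1 * x          # log-odds of Y=1
p = sigmoid(eta)                 # P(Y=1 | x)
print(round(eta, 3), round(p, 4), round(logit(p), 3))  # prints: 1.0 0.7311 1.0
```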
Interpreting coefficients
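The coefficient $\beta_j$ is the change in the log-odds of Y=1 per unit increase in $x_j$, holding the other predictors fixed. The odds ratio is $e^{\beta_j}$:
- $e^{\beta_j} > 1$ ($\beta_j > 0$): x_j increases the odds of Y=1.
- $e^{\beta_j} < 1$ ($\beta_j < 0$): x_j decreases the odds of Y=1.
- $e^{\beta_j} = 1$ ($\beta_j = 0$): x_j has no association with the odds of Y=1, given the other predictors in the model.
Example: $\beta_{\text{smoker}} = \ln 2 \approx 0.69 \Rightarrow e^{\beta_{\text{smoker}}} = 2$: smokers have twice the odds of lung cancer compared with non-smokers, holding age, sex, etc. constant.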
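Why the odds ratio is the natural summary: a one-unit increase in x adds β₁ to the log-odds, which multiplies the odds by e^β₁ whatever the baseline, whereas the ratio of probabilities changes with the baseline risk. A short Python check, assuming β₁ = 0.69 (so e^β₁ ≈ 2, matching the doubled odds in the smoking example) and a few hypothetical intercepts:

```python
import math

def prob(x, beta0, beta1):
    """P(Y=1 | x) under a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p):
    """Odds p / (1 - p) of a probability p."""
    return p / (1.0 - p)

beta1 = 0.69                              # log-odds change per unit of x (assumed)
for beta0 in (-3.0, -1.0, 2.0):           # hypothetical baselines: low to high risk
    for x in (0.0, 1.0, 5.0):
        p0, p1 = prob(x, beta0, beta1), prob(x + 1, beta0, beta1)
        # Odds ratio is exp(beta1) on every row; the probability ratio is not.
        print(f"beta0={beta0:+.1f} x={x:.0f}  OR={odds(p1) / odds(p0):.4f}  "
              f"prob ratio={p1 / p0:.4f}")
```

The OR column prints the same value on every row; the probability ratio approaches 2 only where the baseline probability is small.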
Probability ≠ odds ≠ log-odds
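With odds = p / (1 - p) and log-odds = log(odds): for p = 0.5, odds = 1 and log-odds = 0; for p = 0.99, odds = 99 and log-odds ≈ 4.6. The COEFFICIENT scale (β) is log-odds; exponentiating gives the odds scale, on which e^β is a RATIO of odds; getting back to the PROBABILITY (p) needs the sigmoid transform. Don't interpret e^β as a "probability ratio": it is an ODDS ratio, and the two are close only when p is small.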
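The same conversions in code, including the step back to the probability scale. This is a sketch assuming a coefficient β = 0.69: starting from p = 0.99, doubling the odds moves p only to about 0.995, whereas "doubling the probability" would give the impossible value 1.98.

```python
import math

def to_odds(p):
    """Probability -> odds p / (1 - p)."""
    return p / (1.0 - p)

def to_log_odds(p):
    """Probability -> log-odds (the logit)."""
    return math.log(to_odds(p))

def from_log_odds(eta):
    """Log-odds -> probability via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-eta))

beta = 0.69                                # coefficient on the log-odds scale (assumed)
for p in (0.5, 0.99):
    eta = to_log_odds(p)
    p_new = from_log_odds(eta + beta)      # effect of a one-unit increase in the predictor
    print(f"p={p}: odds={to_odds(p):.3f}, log-odds={eta:.3f}, "
          f"odds x{math.exp(beta):.3f} -> new p={p_new:.4f}")
```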
Model fit diagnostics
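- Deviance: D = -2(log L_model - log L_saturated), the likelihood-ratio statistic of the fitted model against the saturated model. For ungrouped 0/1 data log L_saturated = 0, so D = -2 log L_model. Compare nested models via the deviance difference, which is asymptotically χ² with degrees of freedom equal to the number of extra parameters.
- McFadden's pseudo-R²: 1 - log L_model / log L_null. Values of 0.2-0.4 are conventionally read as a very good fit: McFadden's R² runs much lower than OLS R² for a model of comparable quality, so the two scales are not comparable.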
- ROC / AUC: discrimination ability — how well the model ranks Y=1 above Y=0. AUC 0.5 = random; 0.8+ = good discrimination.
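- Hosmer-Lemeshow test: a calibration (goodness-of-fit) test that groups observations by predicted probability and compares observed with expected counts of Y=1. Its result depends on the number of groups chosen, which is one reason it is debated; a visual calibration plot is often more informative.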
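The first three diagnostics can be computed directly from 0/1 outcomes and fitted probabilities, as in the NumPy sketch below. The outcomes and probabilities are made up for illustration. The null model for McFadden's R² predicts the sample proportion of Y=1 for every observation, and the AUC is computed as the proportion of (Y=1, Y=0) pairs ranked correctly, with ties counted as one half (the Mann-Whitney form).

```python
import numpy as np

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return float(np.sum(y * np.log(p) + (1 - y) * np.log1p(-p)))

def deviance(y, p):
    """-2 (log L_model - log L_saturated); for ungrouped 0/1 data log L_saturated = 0."""
    return -2.0 * log_likelihood(y, p)

def mcfadden_r2(y, p):
    """1 - log L_model / log L_null, where the null model predicts mean(y) for everyone."""
    p_null = np.full_like(p, y.mean(), dtype=float)
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, p_null)

def auc(y, p):
    """P(score of a random Y=1 case > score of a random Y=0 case); ties count 1/2."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]          # every (Y=1, Y=0) pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Illustrative outcomes and predicted probabilities (made up for this sketch).
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
print(f"deviance={deviance(y, p):.3f}  McFadden R2={mcfadden_r2(y, p):.3f}  AUC={auc(y, p):.4f}")
```

Here one positive (p = 0.4) is ranked below one negative (p = 0.6), so 15 of the 16 pairs are ordered correctly and the AUC is 15/16.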
Pitfalls
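- Complete separation: when a predictor (or a linear combination of predictors) perfectly distinguishes Y=1 from Y=0, the MLE does not exist: the likelihood keeps increasing as β → ±∞, so the fitted coefficient diverges. Use penalised likelihood (Firth correction; ridge).
- Rare events: with very few Y=1 observations, the MLE is biased in small samples (coefficients tend to be inflated in magnitude) and the probability of the rare event tends to be underestimated. King & Zeng (2001) propose corrections.
- Out-of-sample prediction: check probability calibration on held-out data and, if needed, recalibrate with Platt scaling or isotonic regression (see §3.5).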
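A toy illustration of complete separation, assuming a slope-only model (no intercept) to keep it one-dimensional. On perfectly separated data the log-likelihood keeps rising toward 0 as β₁ grows, so there is no finite MLE; adding a ridge penalty λβ₁²/2 restores a finite maximiser. The ridge fit below uses plain Newton steps with a hypothetical λ = 1; Firth's correction is not shown.

```python
import numpy as np

# Perfectly separated toy data: every x < 0 has Y=0, every x > 0 has Y=1.
x = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def log_lik(beta1):
    """Log-likelihood of a no-intercept logistic model with slope beta1."""
    eta = beta1 * x
    # log P(Y=y) = y*eta - log(1 + e^eta), computed stably with logaddexp.
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

for b in (1.0, 5.0, 10.0, 20.0):
    print(b, log_lik(b))   # keeps rising toward 0: no finite maximiser

def ridge_fit(lam, iters=50):
    """Newton's method on the ridge-penalised log-likelihood
    log L(beta1) - lam/2 * beta1**2, which has a finite maximiser for lam > 0."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-b * x))
        grad = np.sum((y - p) * x) - lam * b             # derivative of penalised log-lik
        hess = -np.sum(p * (1.0 - p) * x**2) - lam       # second derivative (always < 0)
        b -= grad / hess
    return b

print("ridge slope:", ridge_fit(1.0))
```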
Fitting logistic regression by IRLS
The widget below simulates a 1-D binary dataset from a TRUE logistic model with chosen β₀ and β₁, then fits a logistic regression by IRLS (Fisher scoring on the binomial log-likelihood, which for the canonical logit link is identical to Newton-Raphson). Watch the deviance drop over 4-8 iterations, compare the fitted and true sigmoids, and read off the odds ratio with its 95 % CI.
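A compact NumPy sketch of the same procedure. The source does not say how the widget draws x, so this assumes x ~ N(0, 1); it starts IRLS at β = 0, records the deviance after every update, and reports a Wald 95 % CI for the odds ratio, exp(β̂₁ ± 1.96 SE). The function names and the seed are illustrative, not the widget's own code.

```python
import numpy as np

def simulate(beta0, beta1, n, seed):
    """Draw x ~ N(0, 1) (an assumption) and Y ~ Bernoulli(sigmoid(beta0 + beta1 * x))."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    y = rng.binomial(1, p)
    return x, y

def irls_logistic(X, y, tol=1e-8, max_iter=25):
    """Fit logistic regression by IRLS (Fisher scoring = Newton-Raphson for the logit link).
    X is the n-by-k design matrix (include a column of ones for the intercept).
    Returns the coefficients, their covariance matrix, and the deviance after each update."""
    beta = np.zeros(X.shape[1])
    deviances = []
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                        # IRLS weights = Var(Y | x)
        z = eta + (y - mu) / w                     # working response
        XtW = X.T * w                              # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        # Binary-data deviance: -2 * log-likelihood (saturated log-likelihood is 0).
        deviances.append(-2.0 * np.sum(y * eta - np.logaddexp(0.0, eta)))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv((X.T * (mu * (1.0 - mu))) @ X)   # inverse Fisher information
    return beta, cov, deviances

x, y = simulate(beta0=-0.5, beta1=1.5, n=120, seed=1)
X = np.column_stack([np.ones_like(x), x])
beta, cov, devs = irls_logistic(X, y)
se1 = np.sqrt(cov[1, 1])
lo, hi = np.exp(beta[1] - 1.96 * se1), np.exp(beta[1] + 1.96 * se1)
print("iterations:", len(devs), "deviance path:", np.round(devs, 3))
print(f"beta1_hat = {beta[1]:.3f}, OR = {np.exp(beta[1]):.3f}, 95% Wald CI = ({lo:.3f}, {hi:.3f})")
```

Changing `beta1`, `n`, or `seed` in the last lines mimics the experiments in the list that follows; the exact numbers will differ from the widget's, since its random draws are not known.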
Try it
- Start with true β₀ = -0.5, β₁ = 1.5, n = 120. The fitted β̂₁ usually lands within 10-20 % of the true 1.5, and the odds-ratio CI usually brackets the true e^1.5 ≈ 4.48. Sampling noise drives the discrepancy: IRLS finds the MLE for the data you handed it, not the truth.
- Drop n to 30 and re-simulate several times (seed slider). The CI for the odds ratio gets WIDER and sometimes excludes the truth: small-sample logistic regression is unreliable, especially when either class has only a handful of observations.
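- Set β₁ = 3.0 and watch the deviance curve in the middle panel. With strong signal IRLS still converges in 4-6 iterations, thanks to the quadratic convergence of Newton-Raphson near the MLE. With β₁ = 0.2, expect a similar or even smaller iteration count (assuming the fit starts from β = 0): Newton-Raphson's speed depends on how close the log-likelihood is to a quadratic, not on how flat it is, and with a weak signal the starting point is already near the MLE. The flatter likelihood shows up instead as a larger standard error and a wider CI.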
- Push β₁ to its maximum and set the seed to one where the data are nearly separable. The standard error on β̂₁ explodes; the CI for the odds ratio becomes ridiculously wide. This is the LOGISTIC-REGRESSION SEPARATION pathology — fix with Firth correction or ridge penalty.
- Compare the dashed (true) sigmoid with the solid (fitted) sigmoid. They almost always overlap well when n ≥ 100. The AUC reads off how well the model ranks Y=1 above Y=0 — at β₁ = 1.5 with n = 120 you should see AUC ≈ 0.85 - 0.90.
If you fit a logistic regression and read β̂₁ = 0.69 (so e^0.69 ≈ 2), a colleague says "the predicted PROBABILITY of Y=1 doubles per unit increase in x." Why is that wrong, and what is the actual probability change from x = 0 to x = 1 when, say, β̂₀ = 0? (Hint: use the sigmoid, not the odds-ratio shortcut.)
References
- Hosmer, D.W., Lemeshow, S., Sturdivant, R.X. (2013). Applied Logistic Regression, 3rd ed. Wiley.
- Agresti, A. (2018). Statistical Methods for the Social Sciences, 5th ed. Pearson.
- King, G., Zeng, L. (2001). "Logistic regression in rare events data." Political Analysis 9(2), 137–163.
- Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika 80(1), 27–38.