Robust regression

Part 4 — Linear regression, done seriously

Learning objectives

  • Diagnose when outliers are contaminating an OLS fit (vs when heteroscedasticity is the issue)
  • Apply Huber M-regression: bounded influence on residuals, ~95% Gaussian efficiency at k=1.345·σ
  • Apply MM-estimators: 50% breakdown point + high Gaussian efficiency
  • Choose between robust regression vs OLS + sandwich SEs vs outlier removal
  • Recognise that robust regression is NOT 'OLS but ignore outliers' — it has a coherent statistical theory

§1.8 introduced robust and M-estimators for the location problem. §4.5 extends them to regression: when OLS coefficients are pulled by outliers (high residual + high leverage, from §4.3), robust regression uses a bounded influence function to limit the damage. It is the natural follow-up to §4.4: where §4.4 was the response to NON-CONSTANT variance, §4.5 is the response to OUTLIERS.

The problem with OLS under contamination

OLS minimises $\sum e_i^2$. Under quadratic loss a residual's cost grows with its square, so residuals in the tails count far more than residuals near the centre. A single outlier with residual 4σ contributes 16 times as much to the sum as a typical residual at 1σ. The fit is pulled toward the outlier to reduce its squared cost, sometimes dramatically.

Formally, OLS has BREAKDOWN POINT 0: a single sufficiently extreme observation can drag the fit arbitrarily far, however large the sample. This is why outliers + high leverage = catastrophic damage.
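To make the breakdown-point claim concrete, here is a minimal sketch that fits OLS to clean simulated data and then pushes one high-leverage point further and further from the line. The data, seed, true coefficients, and the helper name `ols_slope` are illustrative assumptions, not part of the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed
n = 40
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)    # assumed truth: intercept 1, slope 2

def ols_slope(x, y):
    """Slope of the least-squares line y ~ a + b*x."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]

clean_slope = ols_slope(x, y)

# Corrupt only the last point (largest x, hence highest leverage)
# with an increasingly extreme response, refitting each time.
slopes = {}
for shift in (10.0, 100.0, 1000.0):
    y_bad = y.copy()
    y_bad[-1] += shift
    slopes[shift] = ols_slope(x, y_bad)

print(f"clean slope: {clean_slope:.3f}")
for shift, slope in slopes.items():
    print(f"outlier shifted by {shift:6.0f}: slope = {slope:.3f}")
```

Because the OLS estimate is linear in the responses, the slope moves in proportion to the outlier's displacement, with no upper limit.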

The robust regression idea

Replace the quadratic loss with one that GROWS LESS THAN QUADRATICALLY for large residuals. The estimator solves

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \rho(Y_i - \mathbf{x}_i^T \boldsymbol{\beta}),$$

where $\rho$ is the loss function. The influence function $\psi = \rho'$ controls how much each residual pulls the fit:

  • OLS: $\rho(u) = u^2 / 2$, $\psi(u) = u$. Unbounded: a residual's pull grows without limit as the residual grows.
  • L1 / median regression: $\rho(u) = |u|$, $\psi(u) = \mathrm{sign}(u)$. Bounded by ±1. Resistant but inefficient at the Gaussian (~64%).
  • Huber M-regression: $\rho(u) = u^2/2$ for $|u| < k$, $\rho(u) = k|u| - k^2/2$ for $|u| \geq k$. Quadratic near 0 (efficient), linear in the tails (bounded influence).
  • Tukey biweight: $\rho$ saturates entirely beyond $|u| > c$. Influence drops to ZERO for extreme outliers, so they are completely ignored.
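The influence functions in the list can be written down directly, and the quoted Gaussian efficiencies can be checked numerically. The sketch below assumes the standard asymptotic efficiency formula for a location M-estimator at N(0, 1), (E[ψ'])² / E[ψ²], and the usual tuning constants 1.345 and 4.685. The function names are illustrative, and the Tukey ψ shown is the common form u(1 − (u/c)²)².

```python
import numpy as np

def psi_huber(u, k=1.345):
    """Huber psi: identity in the centre, clipped at +/- k."""
    return np.clip(u, -k, k)

def dpsi_huber(u, k=1.345):
    """Derivative of the Huber psi: 1 inside (-k, k), 0 outside."""
    return (np.abs(u) < k).astype(float)

def psi_tukey(u, c=4.685):
    """Tukey biweight psi: redescends to exactly 0 for |u| > c."""
    t = u / c
    return np.where(np.abs(u) <= c, u * (1 - t**2) ** 2, 0.0)

def dpsi_tukey(u, c=4.685):
    """Derivative of the Tukey biweight psi."""
    t = u / c
    return np.where(np.abs(u) <= c, (1 - t**2) * (1 - 5 * t**2), 0.0)

def gaussian_efficiency(psi, dpsi):
    """(E[psi'])^2 / E[psi^2] under N(0, 1), by a Riemann sum on a fine grid."""
    u = np.linspace(-10.0, 10.0, 200_001)
    du = u[1] - u[0]
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    e_dpsi = np.sum(dpsi(u) * phi) * du
    e_psi2 = np.sum(psi(u) ** 2 * phi) * du
    return e_dpsi**2 / e_psi2

eff_huber = gaussian_efficiency(psi_huber, dpsi_huber)
eff_tukey = gaussian_efficiency(psi_tukey, dpsi_tukey)
eff_l1 = 2 / np.pi          # closed form for the median / L1 case

print(f"Huber (k=1.345): {eff_huber:.3f}")
print(f"Tukey (c=4.685): {eff_tukey:.3f}")
print(f"L1 / median    : {eff_l1:.3f}")
```

Both tuned estimators should come out close to 0.95, against roughly 0.64 for L1, which is the efficiency trade-off the list describes.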

Huber M-regression

The workhorse of robust regression. The tuning constant $k = 1.345\,\hat{\sigma}$ (with $\hat{\sigma}$ a robust scale estimate such as MAD/0.6745) gives ~95% efficiency at exactly-Gaussian errors and good resistance up to ~10-15% contamination. It is solved via Iteratively Reweighted Least Squares (IRLS):

  • Initial fit (e.g., LS or LAD).
  • Compute residuals $e_i$ and weights $w_i = \psi(e_i / \hat{\sigma}) / (e_i / \hat{\sigma})$.
  • Re-fit by WLS with these weights.
  • Iterate until convergence (typically 3-10 iterations).
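The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the algorithm as implemented in any particular library: it starts from least squares, re-estimates the MAD scale on every pass, and stops when the coefficients stop changing. The names `mad_scale` and `huber_irls` and the simulated data are hypothetical.

```python
import numpy as np

def mad_scale(r):
    """MAD rescaled by 0.6745 so it estimates sigma under Gaussian errors."""
    return np.median(np.abs(r - np.median(r))) / 0.6745

def huber_irls(X, y, k=1.345, tol=1e-8, max_iter=50):
    """Huber M-regression by IRLS. X must already contain an intercept column."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # step 1: LS start
    w = np.ones(len(y))
    for _ in range(max_iter):
        r = y - X @ beta                                 # step 2: residuals
        s = max(mad_scale(r), 1e-12)                     # guard against zero scale
        u = np.abs(r) / s
        # Huber weights psi(u)/u: 1 inside [-k, k], k/|u| outside.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]  # step 3: WLS
        if np.max(np.abs(beta_new - beta)) < tol:        # step 4: convergence check
            return beta_new, w
        beta = beta_new
    return beta, w

# Illustrative data: assumed true line 1 + 2x, with five vertical outliers.
rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
y[10:15] += 15.0
X = np.column_stack([np.ones(n), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber, weights = huber_irls(X, y)
print("OLS   (intercept, slope):", beta_ols)
print("Huber (intercept, slope):", beta_huber)
print("weights on the outliers :", weights[10:15])
```

The returned weights double as a diagnostic: points with weights well below 1 are the ones the fit has downweighted.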

R: MASS::rlm. Python: RLM in statsmodels.api.

MM-estimators: high breakdown + high efficiency

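Huber has good Gaussian efficiency, but it bounds the influence of large residuals, not of leverage. In practice it copes with roughly 10-15% contamination in the response, yet a single sufficiently extreme high-leverage point can still break it. Yohai (1987) proposed MM-estimators to close this gap: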

  • First stage: S-estimator gives a 50%-breakdown initial fit (highly resistant but inefficient).
  • Second stage: refit with a smooth bounded influence function (Tukey biweight, tuning for 95% Gaussian efficiency).

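Result: 50% breakdown AND ~95% Gaussian efficiency. The modern default for robust regression. R: robustbase::lmrob. Python: statsmodels' RLM and scikit-learn's sklearn.linear_model.HuberRegressor are M-estimators rather than MM-estimators. RLM accepts a Tukey biweight norm, but by default it starts from a least-squares fit, which does not provide the 50% breakdown of the S-estimator stage.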
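The two-stage idea can be sketched in NumPy. This is a deliberately crude illustration, not Yohai's algorithm as published and not what lmrob does internally: the S-stage here is a plain random search over exact fits to p-point subsets (real implementations use refined "fast-S" searches), and the constants c = 1.547, b = 0.5 for the S-stage and c = 4.685 for the M-stage are the commonly quoted biweight values, taken as assumptions. All function names and the simulated data are hypothetical.

```python
import numpy as np

def rho_biweight(u, c):
    """Tukey biweight rho, normalised so that rho = 1 for |u| >= c."""
    t = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - t**2) ** 3

def m_scale(r, c=1.547, b=0.5, n_iter=50):
    """M-scale s solving mean(rho(r/s)) = b, by fixed-point iteration."""
    s = np.median(np.abs(r)) / 0.6745
    for _ in range(n_iter):
        if s <= 0:
            break
        s *= np.sqrt(np.mean(rho_biweight(r / s, c)) / b)
    return s

def s_estimate(X, y, n_subsets=500, seed=0):
    """Crude S-stage: keep the exact-fit candidate with the smallest M-scale."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_s = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                                  # degenerate subset, skip
        s = m_scale(y - X @ beta)
        if 0 < s < best_s:
            best_beta, best_s = beta, s
    return best_beta, best_s

def mm_estimate(X, y, c=4.685, n_iter=50, tol=1e-8):
    """M-stage: biweight IRLS from the S-start, with the S-scale held fixed."""
    beta, s = s_estimate(X, y)
    for _ in range(n_iter):
        u = (y - X @ beta) / s
        w = np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)   # psi(u)/u
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, s

# Illustrative data: assumed true line 1 + 2x, with 30% high-leverage contamination.
rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
y[-12:] = rng.normal(0.0, 1.0, 12)          # last 12 points sit near y = 0
X = np.column_stack([np.ones(n), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_mm, s_hat = mm_estimate(X, y)
print("OLS (intercept, slope):", beta_ols)
print("MM  (intercept, slope):", beta_mm)
```

Holding the scale fixed at the S-stage value is what lets the second stage raise efficiency without giving up the breakdown point of the first. For real analyses a tested implementation such as robustbase::lmrob is the safer choice.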

Choosing the right tool

  • OLS + sandwich SEs (§4.4): heteroscedastic data, no outlier concerns.
  • Huber M-regression: light-to-moderate contamination (5-15%), Normal-tailed otherwise.
  • MM-regression: severe contamination (anything short of 50%), or unknown contamination level.
  • OLS without sandwich, after outlier removal: BIAS-INDUCING and brittle. Avoid.

Honest caveats

  • Robust regression assumes the bulk of the data follows a single model with some outliers. If the data is a MIXTURE of two regimes, robust regression fits the larger regime and ignores the other — which may not be what you want.
  • Tuning constants ARE tunable. Defaults (k=1.345 Huber, c=4.685 Tukey) achieve 95% Gaussian efficiency. Lower values are more resistant but less efficient.
  • Inference (SEs, CIs) for robust regression is less standardised than for OLS. R's rlm uses asymptotic SEs based on $\psi'$ and $(X^T X)^{-1}$; lmrob uses sandwich-style SEs. Bootstrap CIs are often the safer choice.
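One way to act on the bootstrap suggestion is a pairs (case-resampling) percentile bootstrap around a Huber fit. The sketch below is one reasonable variant under stated assumptions, not a recommended default: the compact `huber_fit` helper, the number of resamples, and the simulated data are all illustrative, and with heavy contamination a resample can contain a larger share of outliers than the original sample.

```python
import numpy as np

def huber_fit(X, y, k=1.345, n_iter=30):
    """Compact Huber IRLS with the MAD scale re-estimated on each pass."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / s
        sw = np.sqrt(np.where(u <= k, 1.0, k / np.maximum(u, 1e-12)))
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def pairs_bootstrap_ci(X, y, n_boot=500, level=0.95, seed=0):
    """Percentile CIs from resampling (x_i, y_i) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = huber_fit(X[idx], y[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return lo, hi

# Illustrative data: assumed true line 1 + 2x, with five vertical outliers.
rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
y[10:15] += 15.0
X = np.column_stack([np.ones(n), x])

beta_hat = huber_fit(X, y)
lo, hi = pairs_bootstrap_ci(X, y)
print(f"Huber slope: {beta_hat[1]:.3f}")
print(f"95% bootstrap CI for the slope: [{lo[1]:.3f}, {hi[1]:.3f}]")
```

Resampling pairs rather than residuals avoids assuming constant error variance, which fits the setting where both outliers and heteroscedasticity may be present.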

[Interactive figure: Robust Regression]

Try it

  • Start with clean Gaussian data. Confirm OLS and Huber give nearly identical estimates (within ~0.01) and nearly identical SEs (within ~5%).
  • Add a single high-leverage outlier (drag a point to a far X position and pull it vertically off the line). Watch the OLS line rotate sharply; Huber barely moves.
  • Add 5 outliers to a sample of n = 40. Now OLS is severely biased; Huber is still close to truth; MM is essentially unbiased.
  • Push contamination to 30% (12 of 40 points are outliers). Huber starts to break; MM is still fine.
  • Push to 55% contamination. Even MM breaks — by definition, the breakdown point is 50% and the "majority" the estimator follows has flipped.

A colleague says "I removed 3 outliers from my dataset by eye, then ran OLS. The conclusion held; I'll report that." What are the TWO methodological objections — and what would you advise them to do instead, using the §4.5 toolkit?

What you now know

Robust regression replaces quadratic loss with a bounded-influence loss, sacrificing some Gaussian efficiency in exchange for resistance to outliers. Huber M-regression is the moderate-contamination default; MM-estimators give 50% breakdown with 95% Gaussian efficiency. Robust regression is the principled alternative to manual outlier removal — coherent theory, reproducible, no p-hacking risk. §4.6 turns to a related question: what if the underlying RELATIONSHIP is nonlinear or has interactions OLS doesn't capture?

References

  • Huber, P.J. (1964). "Robust estimation of a location parameter." Annals of Math. Stat. 35(1), 73–101. (The foundational M-estimator paper.)
  • Huber, P.J. (1981). Robust Statistics. Wiley. (First edition; the canonical book-length treatment.)
  • Yohai, V.J. (1987). "High breakdown-point and high efficiency robust estimates for regression." Annals of Statistics 15(2), 642–656. (The MM-estimator paper.)
  • Rousseeuw, P.J., Leroy, A.M. (1987). Robust Regression and Outlier Detection. Wiley. (The applied-regression treatment, including LMS, LTS, S-estimators.)
  • Maronna, R.A., Martin, R.D., Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley. (The modern comprehensive reference.)
