Robust regression

Part 4 — Linear regression, done seriously

Learning objectives

Diagnose when outliers are contaminating an OLS fit (vs when heteroscedasticity is the issue)
Apply Huber M-regression: bounded influence on residuals, ~95% Gaussian efficiency at k=1.345·σ
Apply MM-estimators: 50% breakdown point + high Gaussian efficiency
Choose between robust regression vs OLS + sandwich SEs vs outlier removal
Recognise that robust regression is NOT 'OLS but ignore outliers' — it has a coherent statistical theory

§1.8 introduced robust estimators and M-estimators for the location problem. §4.5 extends them to regression. When OLS coefficients are pulled by outliers (large residuals, especially at the high-leverage points of §4.3), robust regression uses a bounded influence function to limit the damage. It is the natural follow-up to §4.4: where §4.4 responded to NON-CONSTANT variance, §4.5 responds to OUTLIERS.

The problem with OLS under contamination

OLS minimises $\sum e_i^2$. The quadratic loss makes residuals in the tails count quadratically more than residuals near the centre. A single outlier with residual 4σ contributes 16× as much to the sum as a typical residual at 1σ. The fit is pulled toward the outlier to reduce its squared cost, sometimes dramatically.

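Specifically, OLS has BREAKDOWN POINT 0: moving a single observation far enough drags the fit arbitrarily far. An outlying Y combined with high leverage is the worst case. Such a point rotates the line toward itself, so its own residual can look unremarkable while the damage is catastrophic.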
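The breakdown claim can be checked numerically. The sketch below is minimal and uses only the standard library. The helper `ols_line` and the toy data are illustrative and do not come from the source. Ten points lie exactly on $y = 1 + 2x$, and the response at $x = 9$ is pushed further and further off the line.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

x = [float(i) for i in range(10)]        # 0, 1, ..., 9
y_clean = [1.0 + 2.0 * xi for xi in x]    # exact line: intercept 1, slope 2

for shift in [0.0, 10.0, 100.0, 1000.0]:
    y = list(y_clean)
    y[9] += shift                         # contaminate ONE point (x = 9)
    a, b = ols_line(x, y)
    print(f"shift={shift:7.1f}  intercept={a:9.3f}  slope={b:8.3f}")
```

The OLS slope is linear in each $Y_i$. Every unit of shift therefore adds $4.5/82.5 \approx 0.055$ to the slope, with no ceiling.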

The robust regression idea

Replace the quadratic loss with one that GROWS LESS THAN QUADRATICALLY for large residuals. The estimator solves

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \rho(Y_i - \mathbf{x}_i^T \boldsymbol{\beta}),$$

where $\rho$ is the loss function. The influence function $\psi = \rho'$ controls how much each residual pulls the fit:

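OLS: $\rho(u) = u^2 / 2$, $\psi(u) = u$. Unbounded: a residual's pull grows without limit as the residual grows.
L1 / median regression: $\rho(u) = |u|$, $\psi(u) = \mathrm{sign}(u)$. Bounded by ±1. Resistant to outlying Y but inefficient at the Gaussian (~64%, i.e. $2/\pi$).
Huber M-regression: $\rho(u) = u^2/2$ for $|u| < k$, $\rho(u) = k|u| - k^2/2$ for $|u| \geq k$, so $\psi(u) = \max(-k, \min(k, u))$. Quadratic near 0 (efficient), linear in the tails (bounded influence).
Tukey biweight: $\rho(u) = \frac{c^2}{6}\left[1 - \left(1 - (u/c)^2\right)^3\right]$ for $|u| \leq c$ and $\rho(u) = c^2/6$ for $|u| > c$, so $\psi(u) = u\left(1 - (u/c)^2\right)^2$ inside and $0$ outside. $\rho$ saturates (becomes constant) for $|u| > c$, so the influence of extreme outliers drops to ZERO: they are ignored entirely.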
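The four losses above and their $\psi = \rho'$ can be written out directly. This is a minimal sketch. It assumes residuals $u$ have already been standardised by a scale estimate $\hat\sigma$, and the defaults $k = 1.345$ and $c = 4.685$ are the document's tuning constants. The function names are illustrative.

```python
def rho_ols(u):
    return 0.5 * u * u

def psi_ols(u):
    return u                                  # unbounded pull

def rho_l1(u):
    return abs(u)

def psi_l1(u):
    return (u > 0) - (u < 0)                  # sign(u), bounded by +-1

def rho_huber(u, k=1.345):
    return 0.5 * u * u if abs(u) < k else k * abs(u) - 0.5 * k * k

def psi_huber(u, k=1.345):
    return max(-k, min(k, u))                 # identity near 0, clipped at +-k

def rho_biweight(u, c=4.685):
    if abs(u) > c:
        return c * c / 6.0                    # saturated: constant beyond c
    t = 1.0 - (u / c) ** 2
    return (c * c / 6.0) * (1.0 - t ** 3)

def psi_biweight(u, c=4.685):
    if abs(u) > c:
        return 0.0                            # extreme residuals exert no pull
    t = 1.0 - (u / c) ** 2
    return u * t * t

for u in [0.5, 2.0, 10.0]:
    print(u, psi_ols(u), psi_l1(u), psi_huber(u), round(psi_biweight(u), 4))
```

At $u = 10$ the OLS $\psi$ is 10, L1 gives 1, Huber caps at 1.345, and the biweight gives 0. This is the "unbounded / bounded / redescending" ordering of the list.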

Huber M-regression

The workhorse robust regression. The tuning constant $k = 1.345\hat{\sigma}$, with $\hat{\sigma}$ a robust scale estimate such as MAD/0.6745, gives ~95% efficiency when the errors are exactly Gaussian. It also gives good resistance to outlying Y values at light-to-moderate contamination (~10-15%). It is solved via Iteratively Reweighted Least Squares (IRLS):

Initial fit (e.g., LS or LAD).
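Compute residuals $e_i$ and weights $w_i = \psi(e_i / \hat{\sigma}) / (e_i / \hat{\sigma})$, with $\hat{\sigma}$ usually re-estimated from the current residuals. For Huber with $k = 1.345$ on the standardised scale, this is $w_i = \min(1, k\hat{\sigma}/|e_i|)$.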
Re-fit by WLS with these weights.
Iterate until convergence (typically 3-10 iterations).
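The four IRLS steps can be sketched for a single predictor in pure Python. This is an illustrative sketch, not the MASS or statsmodels code. The helper names are hypothetical, and re-estimating $\hat\sigma$ as MAD/0.6745 on every pass is one common choice that is assumed here.

```python
import random
import statistics

def wls_line(x, y, w):
    """Weighted least-squares (intercept, slope) for a single predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def mad_scale(r):
    """Robust scale: MAD / 0.6745 (estimates sigma under Gaussian errors)."""
    med = statistics.median(r)
    return statistics.median([abs(ri - med) for ri in r]) / 0.6745

def huber_irls(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-regression via IRLS, re-estimating the MAD scale each pass."""
    a, b = wls_line(x, y, [1.0] * len(x))           # 1. initial fit: OLS
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(r)                             # 2. residuals and scale
        # Huber weight psi(u)/u with u = r/s, i.e. min(1, k*s/|r|)
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a_new, b_new = wls_line(x, y, w)             # 3. weighted refit
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:                                # 4. stop when stable
            break
    return a, b

# Illustrative data: true line y = 1 + 2x with N(0, 0.5^2) noise, then the
# 4 responses at the right end of the x range (10% of 40) pushed down by 15.
rng = random.Random(1)
x = [i / 4 for i in range(40)]
y = [1 + 2 * xi + rng.gauss(0, 0.5) for xi in x]
for i in (36, 37, 38, 39):
    y[i] -= 15

print("OLS  :", wls_line(x, y, [1.0] * len(x)))
print("Huber:", huber_irls(x, y))
```

OLS tilts well below the true slope of 2. The Huber fit stays close to it because the four outliers end up with small weights.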

R: MASS::rlm. Python: statsmodels.api.RLM (its default norm, HuberT, uses t = 1.345).

MM-estimators: high breakdown + high efficiency

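Huber has good Gaussian efficiency, but it bounds only the influence of large residuals, not of high-leverage points in X. Its breakdown point in regression is therefore 0: a single bad leverage point can carry the fit away, although Huber copes well with moderate contamination confined to Y. Yohai (1987) proposed MM-estimators: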

First stage: S-estimator gives a 50%-breakdown initial fit (highly resistant but inefficient).
Second stage: starting from that fit and holding its robust scale fixed, refit with a smooth, redescending bounded influence function (Tukey biweight with c = 4.685, tuned for 95% Gaussian efficiency).

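Result: 50% breakdown AND ~95% Gaussian efficiency, the modern default for robust regression. R: robustbase::lmrob (an MM-estimator by default). In Python, statsmodels' RLM and scikit-learn's linear_model.HuberRegressor fit M-estimators (Huber by default), not MM-estimators, so they do not inherit the 50% breakdown point.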
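The two-stage idea can be sketched for one predictor. This is a simplified stand-in, NOT Yohai's estimator and not lmrob's implementation. Yohai uses an S-estimator in stage 1. Here Siegel's repeated-median line, which also has 50% breakdown, supplies the start, and the normalised MAD of its residuals replaces the S-scale. All function names and the data are illustrative.

```python
import random
import statistics

def wls_line(x, y, w):
    """Weighted least-squares (intercept, slope) for a single predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def repeated_median_line(x, y):
    """Siegel's repeated-median line (50% breakdown); stands in for an S-estimator."""
    n = len(x)
    per_point = []
    for i in range(n):
        slopes = [(y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if x[j] != x[i]]
        per_point.append(statistics.median(slopes))
    b = statistics.median(per_point)
    a = statistics.median([yi - b * xi for xi, yi in zip(x, y)])
    return a, b

def mm_style_fit(x, y, c=4.685, max_iter=100, tol=1e-10):
    """Stage 1: resistant start + fixed scale. Stage 2: biweight IRLS."""
    a, b = repeated_median_line(x, y)
    r0 = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    med = statistics.median(r0)
    s = statistics.median([abs(ri - med) for ri in r0]) / 0.6745   # held fixed
    for _ in range(max_iter):
        u = [(yi - (a + b * xi)) / s for xi, yi in zip(x, y)]
        # biweight weight psi(u)/u = (1 - (u/c)^2)^2 inside c, 0 outside
        w = [(1 - (ui / c) ** 2) ** 2 if abs(ui) < c else 0.0 for ui in u]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

# Illustrative data: 28 clean points on y = 1 + 2x plus 12 bad leverage points
# far out in x with y near 0 (30% contamination).
rng = random.Random(7)
x = [i / 4 for i in range(28)] + [15 + 0.1 * j for j in range(12)]
y = [1 + 2 * xi + rng.gauss(0, 0.3) for xi in x[:28]] + [rng.gauss(0, 0.3) for _ in range(12)]

print("OLS     :", wls_line(x, y, [1.0] * len(x)))
print("MM-style:", mm_style_fit(x, y))
```

OLS is dragged to a negative slope by the leverage cluster. The two-stage fit gives those points zero weight and recovers a slope near 2. The key design choice is that stage 2 starts from a fit the contamination could not capture.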

Choosing the right tool

OLS + sandwich SEs (§4.4): heteroscedastic data, no outlier concerns.
Huber M-regression: light-to-moderate contamination (5-15%) in Y, no bad leverage points, Normal-tailed errors otherwise.
MM-regression: severe contamination (up to 50%), high-leverage outliers, or an unknown contamination level.
OLS with classical SEs after ad hoc outlier removal: BIAS-INDUCING and brittle. Avoid.

Honest caveats

Robust regression assumes the bulk of the data follows a single model with some outliers. If the data is a MIXTURE of two regimes, robust regression fits the larger regime and ignores the other — which may not be what you want.
Tuning constants ARE tunable. Defaults (k=1.345 Huber, c=4.685 Tukey) achieve 95% Gaussian efficiency. Lower values are more resistant but less efficient.
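Inference (SEs, CIs) for robust regression is less standardised than for OLS. R's rlm uses Huber's asymptotic covariance $\hat{\sigma}^2 \kappa^2 \frac{(n-p)^{-1}\sum_i \psi(r_i)^2}{\left(n^{-1}\sum_i \psi'(r_i)\right)^2} (X^T X)^{-1}$, where $r_i = e_i/\hat{\sigma}$ and $\kappa \geq 1$ is a small-sample correction. lmrob uses sandwich-style SEs. Bootstrap CIs are often the safer choice.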
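The 95% figures for the default tuning constants can be checked from the location-model efficiency $(E\,\psi'(Z))^2 / E\,\psi(Z)^2$ with $Z \sim N(0,1)$. This is a minimal sketch with illustrative function names. The Huber moments use closed forms. The biweight moments use Simpson's rule on $[-c, c]$, since $\psi$ vanishes outside that interval.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def huber_efficiency(k):
    """Gaussian efficiency (E psi')^2 / E psi^2 of Huber's psi (closed form)."""
    e_dpsi = 2.0 * std_normal_cdf(k) - 1.0                # P(|Z| < k)
    e_psi2 = (e_dpsi - 2.0 * k * std_normal_pdf(k)         # E[Z^2; |Z| < k]
              + 2.0 * k * k * (1.0 - std_normal_cdf(k)))   # + k^2 P(|Z| > k)
    return e_dpsi ** 2 / e_psi2

def biweight_efficiency(c, n=2000):
    """Same ratio for Tukey's biweight, by Simpson's rule on [-c, c] (n even)."""
    h = 2.0 * c / n
    e_dpsi = e_psi2 = 0.0
    for i in range(n + 1):
        z = -c + i * h
        wt = 1 if i in (0, n) else (4 if i % 2 else 2)   # Simpson weights
        t = 1.0 - (z / c) ** 2
        psi = z * t * t
        dpsi = t * (1.0 - 5.0 * (z / c) ** 2)            # derivative of psi
        e_dpsi += wt * dpsi * std_normal_pdf(z)
        e_psi2 += wt * psi * psi * std_normal_pdf(z)
    e_dpsi *= h / 3.0
    e_psi2 *= h / 3.0
    return e_dpsi ** 2 / e_psi2

print(round(huber_efficiency(1.345), 3), round(biweight_efficiency(4.685), 3))
print(round(huber_efficiency(1.0), 3), round(biweight_efficiency(3.0), 3))
```

Both defaults come out at about 0.95. Smaller constants (more resistance) give visibly lower efficiency, which is the trade-off described in the caveat on tuning constants.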
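A bootstrap CI for a robust slope can be sketched by resampling $(x_i, y_i)$ pairs and refitting each time. The Huber fit below is a compact, hypothetical IRLS helper with MAD scale, and the percentile interval is one simple choice among several bootstrap CIs. The data are illustrative.

```python
import random
import statistics

def wls_line(x, y, w):
    """Weighted least-squares (intercept, slope) for a single predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line(x, y, k=1.345, max_iter=30, tol=1e-8):
    """Compact Huber IRLS (MAD scale re-estimated each pass); returns (a, b)."""
    a, b = wls_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(r)
        s = statistics.median([abs(ri - med) for ri in r]) / 0.6745
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

def bootstrap_slope_ci(x, y, n_boot=400, level=0.95, seed=0):
    """Percentile CI for the Huber slope from a case (pairs) bootstrap."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample (x_i, y_i) pairs
        slopes.append(huber_line([x[i] for i in idx], [y[i] for i in idx])[1])
    slopes.sort()
    alpha = 1.0 - level
    lo = slopes[int(alpha / 2 * n_boot)]
    hi = slopes[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Illustrative data: y = 1 + 2x + noise, with 3 vertical outliers (+10).
rng = random.Random(2)
x = [i / 4 for i in range(40)]
y = [1 + 2 * xi + rng.gauss(0, 0.5) for xi in x]
for i in (12, 25, 33):
    y[i] += 10

print("Huber slope:", huber_line(x, y)[1])
print("95% bootstrap CI:", bootstrap_slope_ci(x, y))
```

Resampling pairs rather than residuals keeps the bootstrap valid under heteroscedasticity as well, at the cost of refitting the robust estimator a few hundred times.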

Try it

Start with clean Gaussian data. Confirm OLS and Huber give nearly identical estimates (within ~0.01) and nearly identical SEs (within ~5%).
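Add a single outlier. If you pull it vertically off the line at a typical X position, the OLS line shifts while Huber barely moves. If you also drag it to a far X position (a bad leverage point), it can rotate OLS and Huber alike, because Huber bounds the influence of large residuals but not of leverage. MM holds.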
Add 5 outliers to a sample of n = 40. Now OLS is severely biased; Huber is still close to truth; MM is essentially unbiased.
Push contamination to 30% (12 of 40 points are outliers). Huber starts to break; MM is still fine.
Push to 55% contamination. Even MM breaks — by definition, the breakdown point is 50% and the "majority" the estimator follows has flipped.
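Whether Huber survives a single outlier depends on where that outlier sits in X. The sketch below reproduces both cases outside the interactive demo. It is self-contained, with hypothetical helpers `wls_line` and `huber_irls` and illustrative data. The two cases are one vertical outlier, and one bad leverage point at $x = 1000$.

```python
import random
import statistics

def wls_line(x, y, w):
    """Weighted least-squares (intercept, slope) for a single predictor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_irls(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-regression via IRLS with MAD/0.6745 scale re-estimated each pass."""
    a, b = wls_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(r)
        s = statistics.median([abs(ri - med) for ri in r]) / 0.6745
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

rng = random.Random(3)
x = [i / 2 for i in range(20)]                      # 0.0, 0.5, ..., 9.5
y = [1 + 2 * xi + rng.gauss(0, 0.3) for xi in x]    # true slope: 2

y_vert = list(y)
y_vert[18] += 30              # case 1: vertical outlier at x = 9.0
x_lev = x + [1000.0]          # case 2: bad leverage point far out in x ...
y_lev = y + [0.0]             # ... with a response near 0

for label, xs, ys in [("vertical", x, y_vert), ("leverage", x_lev, y_lev)]:
    b_ols = wls_line(xs, ys, [1.0] * len(xs))[1]
    b_hub = huber_irls(xs, ys)[1]
    print(f"{label:9s} OLS slope {b_ols:7.3f}   Huber slope {b_hub:7.3f}")
```

In the leverage case, Huber ends up passing close to the far point. Its slope equation cannot balance a capped $\psi$ multiplied by such a large $x$ unless that point's residual is small. This is why MM, not Huber, is the tool for leverage contamination.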

A colleague says "I removed 3 outliers from my dataset by eye, then ran OLS. The conclusion held; I'll report that." What are the TWO methodological objections — and what would you advise them to do instead, using the §4.5 toolkit?

What you now know

Robust regression replaces quadratic loss with a bounded-influence loss, sacrificing a little Gaussian efficiency in exchange for resistance to outliers. Huber M-regression is the default for moderate contamination in Y. MM-estimators give 50% breakdown, including against leverage points, with 95% Gaussian efficiency. Robust regression is the principled alternative to manual outlier removal: it has a coherent theory, is reproducible, and leaves far less room for p-hacking. §4.6 turns to a related question: what if the underlying RELATIONSHIP is nonlinear or has interactions OLS doesn't capture?

References

Huber, P.J. (1964). "Robust estimation of a location parameter." Annals of Mathematical Statistics 35(1), 73–101. (The foundational M-estimator paper.)
Huber, P.J. (1981). Robust Statistics. Wiley. (First edition; the canonical book-length treatment.)
Yohai, V.J. (1987). "High breakdown-point and high efficiency robust estimates for regression." Annals of Statistics 15(2), 642–656. (The MM-estimator paper.)
Rousseeuw, P.J., Leroy, A.M. (1987). Robust Regression and Outlier Detection. Wiley. (The applied-regression treatment, including LMS, LTS, S-estimators.)
Maronna, R.A., Martin, R.D., Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley. (The modern comprehensive reference.)