Heteroscedasticity, GLS, and weighted regression

Part 4 — Linear regression, done seriously

Learning objectives

  • Diagnose heteroscedasticity from residuals and from Breusch-Pagan / White tests
  • Explain why OLS coefficients stay UNBIASED under heteroscedasticity but their SEs are WRONG
  • Apply the White (1980) sandwich estimator to obtain heteroscedasticity-consistent SEs
  • Set up Weighted Least Squares (WLS) when the variance pattern is known or estimated
  • Apply Generalized Least Squares (GLS) when errors have a known full covariance structure

§4.2 named heteroscedasticity as one of the assumption failures; §4.3 showed how to spot it on the diagnostic 4-panel. §4.4 is the response: when residual variance varies with the fitted value or covariates, what do you actually do? The answer comes in three flavours: keep OLS but fix the SEs (sandwich estimator), reweight the regression (WLS), or model the full error covariance (GLS).

What heteroscedasticity does to OLS — and what it doesn't

Under heteroscedasticity, $\mathrm{Var}(\varepsilon_i) = \sigma^2_i$ is not constant. The Gauss–Markov consequence:

  • OLS $\hat{\boldsymbol{\beta}}$ remains UNBIASED: exogeneity is still satisfied, so the point estimate is fine.
  • OLS is no longer BLUE: there exists a more efficient estimator (WLS).
  • The "classical" SE formula $\hat{\sigma}^2 (X^T X)^{-1}$ is BIASED: it can be too small OR too large depending on which observations have large $\sigma^2_i$.

So: t-statistics, p-values, and confidence intervals from "standard" OLS output are wrong under heteroscedasticity, even though the coefficients themselves are fine.
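A small simulation can make the "coefficients fine, SEs wrong" claim concrete. The sketch below is illustrative only: the sample size, true coefficients, and the variance pattern (error SD proportional to x squared) are assumptions chosen for the demonstration, not values from this course.

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps = 200, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])       # design matrix with intercept
beta_true = np.array([1.0, 2.0])           # assumed true coefficients
sigma = 0.3 * x**2                         # assumed error SD, growing with x

XtX_inv = np.linalg.inv(X.T @ X)

# Simulate all replications at once: each row of Y is one dataset.
Y = X @ beta_true + sigma * rng.standard_normal((reps, n))
betas = Y @ X @ XtX_inv                    # row r = OLS estimate for dataset r
resid = Y - betas @ X.T
s2 = (resid**2).sum(axis=1) / (n - X.shape[1])   # classical sigma^2 estimate
classical_se = np.sqrt(s2 * XtX_inv[1, 1])       # classical SE of the slope

mean_slope = betas[:, 1].mean()            # should sit near the true slope
empirical_sd = betas[:, 1].std(ddof=1)     # actual sampling spread of the slope
mean_classical_se = classical_se.mean()    # what the classical formula reports

print(f"mean slope estimate      : {mean_slope:.3f}")
print(f"empirical SD of slope    : {empirical_sd:.3f}")
print(f"average classical SE     : {mean_classical_se:.3f}")
```

Under this particular variance pattern the average classical SE comes out smaller than the empirical spread of the slope estimates, while the mean slope stays close to the true value. A different pattern (for example, large variance near the centre of x) could push the classical SE in the other direction.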

Solution 1: keep OLS, fix the SEs (sandwich estimator)

White (1980) showed that even under arbitrary heteroscedasticity, the correct covariance of $\hat{\boldsymbol{\beta}}$ is

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$$

where $\Omega = \mathrm{diag}(\sigma^2_1, \ldots, \sigma^2_n)$. The "sandwich" name comes from $(X^T X)^{-1}$ on both sides of $X^T \Omega X$. White's estimator replaces $\sigma^2_i$ with $e_i^2$ (the squared residual):

$$\widehat{\mathrm{Cov}}_{\text{HC0}}(\hat{\boldsymbol{\beta}}) = (X^T X)^{-1} \left( \sum_i e_i^2 \, \mathbf{x}_i \mathbf{x}_i^T \right) (X^T X)^{-1}.$$

Variants HC1, HC2, HC3 add finite-sample corrections (HC3 is usually preferred for small n). All are heteroscedasticity-CONSISTENT: as n → ∞, they recover the correct asymptotic SE.

Practical pitch: if you don't know the form of the heteroscedasticity, sandwich SEs are the safe default. R: sandwich::vcovHC(model, type="HC3"); statsmodels: results.get_robustcov_results(cov_type="HC3") on a fitted OLS results object, or pass cov_type="HC3" to fit().
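To see what the library calls compute, the sandwich formula can be written out directly. The following is a minimal numpy sketch, not a replacement for the packaged implementations; the function name `ols_with_sandwich` and the simulated data are hypothetical. HC1 rescales HC0 by n/(n − p), and HC3 divides each squared residual by (1 − h_ii)², where h_ii is the leverage.

```python
import numpy as np

def ols_with_sandwich(X, y, kind="HC3"):
    """Return (beta_hat, classical_cov, robust_cov) for an OLS fit.

    kind is one of "HC0", "HC1", "HC3". X must already contain an
    intercept column if one is wanted.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    classical = (e @ e) / (n - p) * XtX_inv

    # Leverages h_ii = x_i^T (X^T X)^{-1} x_i, used by HC3.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    if kind == "HC0":
        omega = e**2
    elif kind == "HC1":
        omega = e**2 * n / (n - p)
    elif kind == "HC3":
        omega = e**2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    meat = (X * omega[:, None]).T @ X          # sum_i omega_i x_i x_i^T
    robust = XtX_inv @ meat @ XtX_inv          # bread @ meat @ bread
    return beta, classical, robust

# Illustrative data (assumed): error SD proportional to x.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)

beta, classical, hc3 = ols_with_sandwich(X, y, kind="HC3")
_, _, hc0 = ols_with_sandwich(X, y, kind="HC0")
_, _, hc1 = ols_with_sandwich(X, y, kind="HC1")

print("classical SEs:", np.sqrt(np.diag(classical)))
print("HC0 SEs      :", np.sqrt(np.diag(hc0)))
print("HC3 SEs      :", np.sqrt(np.diag(hc3)))
```

Because 1/(1 − h_ii)² is at least 1, the HC3 standard errors are never smaller than the HC0 ones on the same fit, which is one way to read "finite-sample correction".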

Solution 2: weight the regression (WLS)

If you know (or can estimate) the variance structure $\sigma^2_i = \sigma^2 / w_i$ for some weights $w_i$, Weighted Least Squares minimises

$$\sum_i w_i \, (Y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2.$$

Closed form: $\hat{\boldsymbol{\beta}}_{\text{WLS}} = (X^T W X)^{-1} X^T W Y$ with $W = \mathrm{diag}(w_i)$. WLS gives:

  • UNBIASED point estimates (same as OLS — both rely only on exogeneity).
  • BLUE (smaller variance than OLS) if the weights match the true variance structure.
  • Correct SEs from the standard formula applied to the WLS fit.
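The closed form above is short enough to write out. This is a sketch under assumptions: the helper name `wls` and the simulated data (variance proportional to x, so weights 1/x) are hypothetical. It also checks the standard equivalence that WLS is OLS applied to data rescaled by the square root of the weights.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via the closed form.

    Returns (beta_hat, cov_hat), where cov_hat = sigma2_hat * (X^T W X)^{-1}
    and sigma2_hat estimates sigma^2 in sigma_i^2 = sigma^2 / w_i.
    """
    n, p = X.shape
    XtWX_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    beta = XtWX_inv @ (X.T @ (w * y))
    e = y - X @ beta
    sigma2 = (w * e**2).sum() / (n - p)
    return beta, sigma2 * XtWX_inv

# Illustrative data (assumed): Var(eps_i) proportional to x_i.
rng = np.random.default_rng(2)
n = 200
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.sqrt(x) * rng.standard_normal(n)

w = 1.0 / x
beta_wls, cov_wls = wls(X, y, w)

# Same estimate from OLS on rows rescaled by sqrt(w_i).
sw = np.sqrt(w)
beta_rescaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# With unit weights, WLS reduces to OLS.
beta_unit, _ = wls(X, y, np.ones(n))
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("WLS estimate:", beta_wls)
print("WLS SEs     :", np.sqrt(np.diag(cov_wls)))
```

Multiplying all weights by a constant leaves the estimate unchanged; only the relative weights matter for the coefficients.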

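The catch: weights must be fixed BEFORE seeing the residuals of the fit you report, from prior knowledge or a pre-specified variance model; tuning them by hand against those residuals is in-sample over-fitting. Common defensible weights: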

  • $w_i = N_i$ when each observation is an average of $N_i$ underlying measurements, since the variance of such an average is $\sigma^2 / N_i$.
  • $w_i = 1/x_i$ when the variance is proportional to a known positive covariate $x_i$.
  • $w_i = 1/\hat{\sigma}^2_i$ where $\hat{\sigma}^2_i$ comes from an auxiliary model (Feasible WLS).

Solution 3: full covariance via GLS

Generalized Least Squares handles BOTH heteroscedasticity AND autocorrelation. The model:

$$Y = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathrm{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \Omega$$

for a known positive-definite matrix $\Omega$ (it must be invertible, so positive semi-definiteness alone is not enough). The GLS estimator:

$$\hat{\boldsymbol{\beta}}_{\text{GLS}} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y.$$

WLS is the special case where $\Omega$ is diagonal, $\Omega = \mathrm{diag}(1/w_i)$. Full GLS handles AR(1) error structure, random effects (cluster correlation), and panel data.
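As an illustration of the GLS estimator with a non-diagonal covariance, the sketch below uses an AR(1) correlation matrix with entries ρ^|i−j|. The function name `gls`, the value ρ = 0.6, and the simulated data are assumptions for the example. It computes the estimate by "whitening" with a Cholesky factor, which requires Ω to be positive definite, and compares the result with the direct formula.

```python
import numpy as np

def gls(X, y, Omega):
    """GLS by whitening: with Omega = L L^T, run OLS on L^{-1} X and L^{-1} y."""
    n, p = X.shape
    L = np.linalg.cholesky(Omega)            # requires positive-definite Omega
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    e = yw - Xw @ beta
    sigma2 = (e @ e) / (n - p)
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)  # sigma2 * (X^T Omega^{-1} X)^{-1}
    return beta, cov

# Illustrative AR(1) correlation structure (assumed rho).
rng = np.random.default_rng(3)
n, rho = 150, 0.6
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])

x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
eps = np.linalg.cholesky(Omega) @ rng.standard_normal(n)   # correlated errors
y = 1.0 + 2.0 * x + eps

beta_gls, cov_gls = gls(X, y, Omega)

# Direct formula for comparison.
Omega_inv = np.linalg.inv(Omega)
beta_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

print("GLS estimate:", beta_gls)
print("GLS SEs     :", np.sqrt(np.diag(cov_gls)))
```

Passing the identity matrix as Omega reproduces OLS, and passing a diagonal matrix reproduces WLS with weights equal to the reciprocal of the diagonal.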

Detecting heteroscedasticity formally

  • Breusch–Pagan test (1979): regress $e_i^2$ on the covariates; $H_0$ is that the slopes are zero (no association of variance with predictors).
  • White's test (1980): regress $e_i^2$ on covariates, their squares, and their interactions; broader alternative.
  • Visual: scale-location panel (§4.3) — sharper than formal tests in practice, especially for small n.

Both tests have low power in small n and high power in large n (where they reject for tiny practically-irrelevant deviations). Plot first, test second.
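The auxiliary regression behind the Breusch–Pagan idea can be sketched in a few lines. Note the swap: this computes Koenker's studentized variant, LM = n·R² from regressing the squared residuals on the covariates, rather than the original 1979 statistic. The function name and the simulated data are hypothetical. With a single covariate the statistic is compared with a chi-square distribution with 1 degree of freedom, whose upper tail can be written with the complementary error function.

```python
import math
import numpy as np

def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan statistic.

    X must include an intercept column. Returns (lm, df), where
    lm = n * R^2 from regressing squared OLS residuals on X and
    df = number of non-intercept columns.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ gamma
    r2 = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return n * r2, p - 1

# Illustrative data (assumed): error SD proportional to x.
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)

lm, df = breusch_pagan_lm(X, y)

# For df = 1 only: P(chi2_1 > lm) = erfc(sqrt(lm / 2)).
p_value = math.erfc(math.sqrt(lm / 2.0))

print(f"LM statistic = {lm:.2f}, df = {df}, p-value = {p_value:.3g}")
```

For more than one covariate the p-value needs a general chi-square tail function, which a statistics library would supply.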

Choosing among the three

  • You don't want to model the variance structure: USE SANDWICH SEs (HC3). Simple, conservative, asymptotically valid.
  • You know the variance structure: USE WLS. More efficient than OLS+sandwich.
  • You have autocorrelation too: USE GLS (or Newey–West HAC SEs for time-series).
  • Variance structure is unknown but you want efficiency: FEASIBLE WLS. Estimate $\hat{\sigma}^2_i$ from an auxiliary model and plug in $w_i = 1/\hat{\sigma}^2_i$. Has finite-sample drawbacks but is often workable.
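One common way to carry out the feasible WLS option is a two-step fit. The sketch below assumes a log-linear variance model (regress the log of the squared OLS residuals on the covariates); that auxiliary model is an assumption of this example, one of several reasonable choices, and the function name and data are hypothetical.

```python
import numpy as np

def feasible_wls(X, y, Z=None):
    """Two-step feasible WLS with an assumed log-linear variance model.

    Step 1: OLS, then regress log(e_i^2) on Z (defaults to X).
    Step 2: WLS with weights w_i = 1 / exp(fitted log-variance).
    Returns (beta_fwls, w).
    """
    if Z is None:
        Z = X
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    # log(e^2) is undefined for an exactly zero residual; with continuous
    # data this essentially never occurs, so it is not handled here.
    gamma, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)
    w = 1.0 / np.exp(Z @ gamma)
    sw = np.sqrt(w)
    beta_fwls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta_fwls, w

# Illustrative data (assumed): error SD proportional to x.
rng = np.random.default_rng(5)
n = 500
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)

beta_fwls, w = feasible_wls(X, y)

print("feasible WLS estimate:", beta_fwls)
print("weight at smallest x :", w[0])
print("weight at largest x  :", w[-1])
```

Because the weights are estimated from the same data, the usual WLS standard errors are only approximately valid; pairing the feasible WLS fit with sandwich SEs is a common safeguard.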

Interactive figure: Hetero vs Fix (requires JavaScript).

Try it

  • In the widget, start with a clean homoscedastic dataset. Confirm OLS classical SEs and HC3 sandwich SEs agree closely; WLS with $w = 1$ matches OLS exactly.
  • Increase the heteroscedasticity slider. Watch classical SEs DIVERGE from HC3 (often biased downward). The point estimate stays unbiased; only the SE is wrong.
  • Apply WLS with correctly-specified weights $w_i = 1/x_i$. Compare WLS SEs to HC3-on-OLS. WLS should be tighter when weights are right.
  • Mis-specify the weights (use $w_i = x_i$ instead of $1/x_i$). WLS becomes worse than OLS, emphasising that wrong weights hurt more than no weights.
  • Run the Breusch–Pagan diagnostic shown in the widget. At what sample size does it reliably detect moderate heteroscedasticity?

A reviewer flags your OLS output: classical t-statistic = 4.5 (p < 0.001), HC3 sandwich t-statistic = 1.8 (p ≈ 0.07). You did not change β̂; only the standard error changed. Which result should you report as the headline, and what one-sentence justification do you give?

What you now know

Heteroscedasticity leaves $\hat{\boldsymbol{\beta}}$ unbiased but breaks the SE formula. Three principled fixes: (1) sandwich SEs (White 1980 / HC3), the no-modelling-needed safe default; (2) WLS with correct weights, most efficient when weights are right, dangerous when wrong; (3) full GLS, for combined heteroscedasticity + autocorrelation. §4.5 takes the next step: when the issue is OUTLIERS rather than non-constant variance, switch to robust regression (M-estimators) with a bounded influence function.

References

  • White, H. (1980). "A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity." Econometrica 48(4), 817–838. (The foundational sandwich-estimator paper.)
  • Breusch, T.S., Pagan, A.R. (1979). "A simple test for heteroscedasticity and random coefficient variation." Econometrica 47(5), 1287–1294. (The Breusch–Pagan test.)
  • MacKinnon, J.G., White, H. (1985). "Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties." J. Econometrics 29(3), 305–325. (HC1, HC2, HC3 variants.)
  • Aitken, A.C. (1935). "On least squares and linear combination of observations." Proc. Royal Society of Edinburgh 55, 42–48. (The foundational GLS paper.)
  • Greene, W.H. (2018). Econometric Analysis, 8th ed. Pearson. (Chapter 9 has the canonical applied treatment of heteroscedasticity and the WLS/GLS/sandwich trio.)
