Heteroscedasticity, GLS, and weighted regression

Part 4 — Linear regression, done seriously

Learning objectives

Diagnose heteroscedasticity from residual plots and with the Breusch–Pagan and White tests
Explain why OLS coefficients stay UNBIASED under heteroscedasticity but their SEs are WRONG
Apply the White (1980) sandwich estimator to get heteroscedasticity-consistent SEs
Set up Weighted Least Squares (WLS) when the variance pattern is known or estimated
Apply Generalized Least Squares (GLS) when errors have a known full covariance structure

§4.2 named heteroscedasticity as one of the assumption failures; §4.3 showed how to spot it on the diagnostic 4-panel. §4.4 is the response: when residual variance varies with the fitted value or covariates, what do you actually do? The answer comes in three flavours: keep OLS but fix the SEs (sandwich estimator), reweight the regression (WLS), or model the full error covariance (GLS).

What heteroscedasticity does to OLS — and what it doesn't

Under heteroscedasticity, $\mathrm{Var}(\varepsilon_i) = \sigma^2_i$ is not constant. The Gauss–Markov consequence:

OLS $\hat{\boldsymbol{\beta}}$ remains UNBIASED — exogeneity is still satisfied. The point estimate is fine.
OLS is no longer BLUE: a more efficient linear unbiased estimator exists (WLS with the right weights).
The "classical" SE formula $\hat{\sigma}^2 (X^T X)^{-1}$ is BIASED. It tends to be too small when the high-variance observations are also the high-leverage ones (far from the centre of the covariates) and too large in the opposite case.

So: t-statistics, p-values, and confidence intervals from "standard" OLS output are wrong under heteroscedasticity, even though the coefficients themselves are fine.
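A small simulation makes the two claims concrete. The sketch below is illustrative only: the sample size, the true coefficients, and the variance pattern (error SD proportional to $x^2$) are all assumptions chosen for the demonstration. It refits OLS on many simulated datasets and compares the slope's actual sampling spread with what the classical formula reports.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta_true = np.array([1.0, 2.0])      # illustrative intercept and slope

x = rng.uniform(1.0, 10.0, size=n)    # design held fixed across replications
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
sigma_i = 0.1 * x**2                  # assumed pattern: error SD grows with x

slopes = np.empty(reps)
classical_se = np.empty(reps)
for r in range(reps):
    y = X @ beta_true + rng.normal(0.0, sigma_i)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - X.shape[1])        # classical sigma^2 estimate
    slopes[r] = beta_hat[1]
    classical_se[r] = np.sqrt(s2 * XtX_inv[1, 1])

mean_slope = slopes.mean()                # average estimate across replications
empirical_sd = slopes.std(ddof=1)         # the slope's actual sampling variability
mean_classical_se = classical_se.mean()   # what the classical formula reports

print(f"mean slope estimate : {mean_slope:.3f} (true value 2.0)")
print(f"empirical SD        : {empirical_sd:.3f}")
print(f"mean classical SE   : {mean_classical_se:.3f}")
```

In this setup the average slope sits close to the true value while the classical SE understates the real spread, because the noisiest observations are also far from the centre of $x$.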

Solution 1: keep OLS, fix the SEs (sandwich estimator)

White (1980) showed that even under arbitrary heteroscedasticity, the correct covariance of $\hat{\boldsymbol{\beta}}$ is

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$$

where $\Omega = \mathrm{diag}(\sigma^2_1, \ldots, \sigma^2_n)$. The "sandwich" name comes from $(X^T X)^{-1}$ on both sides of $X^T \Omega X$. White's estimator replaces $\sigma^2_i$ with $e_i^2$ (the squared residual):

$$\widehat{\mathrm{Cov}}_{\text{HC0}}(\hat{\boldsymbol{\beta}}) = (X^T X)^{-1} \left( \sum_i e_i^2 \, \mathbf{x}_i \mathbf{x}_i^T \right) (X^T X)^{-1}.$$

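Variants HC1, HC2, HC3 add finite-sample corrections: HC1 rescales HC0 by $n/(n-p)$, HC2 replaces $e_i^2$ with $e_i^2/(1-h_{ii})$, and HC3 replaces it with $e_i^2/(1-h_{ii})^2$, where $h_{ii}$ is the leverage of observation $i$ and $p$ is the number of coefficients. HC3 is usually preferred for small n. All are heteroscedasticity-CONSISTENT: as n → ∞, they recover the correct asymptotic SE.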

In practice: if you don't know the form of the heteroscedasticity, sandwich SEs are the safe default. R: sandwich::vcovHC(model, type="HC3"); statsmodels: call get_robustcov_results(cov_type="HC3") on a fitted OLS results object, or pass cov_type="HC3" to fit().
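To see what those library calls compute, here is a minimal NumPy sketch of the classical, HC0, and HC3 standard errors. The function name `ols_standard_errors` and the simulated data are hypothetical, and HC3 is taken in its common form $e_i^2/(1-h_{ii})^2$ with $h_{ii}$ the leverage. For real work, prefer the library routines.

```python
import numpy as np

def ols_standard_errors(X, y):
    """OLS fit returning classical, HC0 and HC3 standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)               # the "bread"
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                               # residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_ii = x_i^T (X^T X)^{-1} x_i

    def sandwich(u2):
        meat = (X * u2[:, None]).T @ X             # sum_i u2_i * x_i x_i^T
        return XtX_inv @ meat @ XtX_inv

    cov = {
        "classical": (e @ e / (n - p)) * XtX_inv,
        "HC0": sandwich(e**2),
        "HC3": sandwich((e / (1.0 - h)) ** 2),
    }
    return beta, {name: np.sqrt(np.diag(c)) for name, c in cov.items()}

# Illustrative heteroscedastic data (assumed pattern: error SD proportional to x^2)
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1 * x**2)

beta, se = ols_standard_errors(X, y)
for name, values in se.items():
    print(f"{name:9s} SEs (intercept, slope): {values}")
```

Because $1/(1-h_{ii})^2 \ge 1$ for every observation, the HC3 standard errors are never smaller than the HC0 ones; the gap shrinks as n grows and the leverages go to zero.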

Solution 2: weight the regression (WLS)

If you know, or can estimate, the variance structure $\sigma^2_i = \sigma^2 / w_i$ for some weights $w_i$, Weighted Least Squares minimises

$$\sum_i w_i \, (Y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2.$$

Closed form: $\hat{\boldsymbol{\beta}}_{\text{WLS}} = (X^T W X)^{-1} X^T W Y$ with $W = \mathrm{diag}(w_i)$. WLS gives:

UNBIASED point estimates (same as OLS — both rely only on exogeneity).
BLUE (smaller variance than OLS) if the weights match the true variance structure.
Correct SEs from the standard formula applied to the WLS fit, again provided the weights match the true variance structure.

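The catch: the weights must come from prior knowledge or from a variance model specified in advance, not from tuning them against the residuals until the fit looks good (that is in-sample over-fitting). Common defensible weights: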

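$w_i = N_i$ when each observation is an average of $N_i$ underlying measurements, since the variance of an average is $\sigma^2 / N_i$ (use $w_i = 1/N_i$ when each observation is a sum instead).
$w_i = 1/x_i$ when the variance is proportional to a known positive covariate, $\sigma^2_i = \sigma^2 x_i$.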
$w_i = 1/\hat{\sigma}^2_i$ where $\hat{\sigma}^2_i$ comes from an auxiliary model (Feasible WLS).
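The closed form is short enough to write out directly. The sketch below is a minimal NumPy version under assumed conditions: the helper name `wls` is hypothetical, and the data are simulated with $\mathrm{Var}(\varepsilon_i) = x_i$ so that $w_i = 1/x_i$ is the correct weighting. It also checks the equivalent view of WLS as ordinary least squares after scaling each row by $\sqrt{w_i}$.

```python
import numpy as np

def wls(X, y, w):
    """Closed-form WLS: (X^T W X)^{-1} X^T W y with W = diag(w)."""
    n, p = X.shape
    XtW = X.T * w                              # X^T W without building the n x n matrix
    XtWX_inv = np.linalg.inv(XtW @ X)
    beta = XtWX_inv @ (XtW @ y)
    resid = y - X @ beta
    sigma2 = (w * resid**2).sum() / (n - p)    # estimates sigma^2 in sigma_i^2 = sigma^2 / w_i
    se = np.sqrt(np.diag(sigma2 * XtWX_inv))   # standard formula applied to the WLS fit
    return beta, se

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(x))   # Var(eps_i) = x_i, so w_i = 1/x_i

beta_wls, se_wls = wls(X, y, 1.0 / x)
beta_ols, se_ols = wls(X, y, np.ones(n))          # unit weights reproduce OLS

# Equivalent view: plain least squares on rows scaled by sqrt(w_i)
s = np.sqrt(1.0 / x)
beta_scaled, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)

print("WLS coefficients        :", beta_wls, "SEs:", se_wls)
print("OLS coefficients        :", beta_ols, "SEs:", se_ols)
print("row-scaled least squares:", beta_scaled)
```

The row-scaling view is why WLS inherits the OLS theory: dividing each row by its error SD produces a homoscedastic regression.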

Solution 3: full covariance via GLS

Generalized Least Squares handles BOTH heteroscedasticity AND autocorrelation. The model:

$$Y = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathrm{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \Omega$$

for a known positive-definite matrix $\Omega$ (it must be invertible, since the estimator uses $\Omega^{-1}$). The GLS estimator:

$$\hat{\boldsymbol{\beta}}_{\text{GLS}} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y.$$

WLS is the special case where $\Omega$ is diagonal, with $W = \Omega^{-1}$. Full GLS handles AR(1) error structure, random effects (cluster correlation), and panel data.
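As an illustration, here is a sketch of GLS with an AR(1) error structure, $\Omega_{ts} = \rho^{|t-s|}$. The helper name `gls`, the value of $\rho$, and the simulated data are assumptions for the example, and $\Omega$ is treated as known. The function whitens the data with a Cholesky factor and then runs ordinary least squares, which is algebraically the same as the closed form above.

```python
import numpy as np

def gls(X, y, omega):
    """GLS via whitening: with omega = L L^T, run OLS on L^{-1} X and L^{-1} y."""
    L = np.linalg.cholesky(omega)          # needs omega to be positive definite
    Xs = np.linalg.solve(L, X)             # whitened design
    ys = np.linalg.solve(L, y)             # whitened response
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

rng = np.random.default_rng(3)
n, rho = 150, 0.6
t = np.arange(n)
omega = rho ** np.abs(t[:, None] - t[None, :])     # AR(1): Omega_ts = rho^|t-s|

x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
eps = np.linalg.cholesky(omega) @ rng.normal(size=n)   # errors with covariance omega
y = 1.0 + 2.0 * x + eps

beta_gls = gls(X, y, omega)

# Same estimator from the closed form (X^T Omega^{-1} X)^{-1} X^T Omega^{-1} y
omega_inv = np.linalg.inv(omega)
beta_closed = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print("GLS via whitening:", beta_gls)
print("GLS closed form  :", beta_closed)
```

Whitening avoids forming $\Omega^{-1}$ explicitly, which is usually the better-conditioned route. In practice $\rho$ is rarely known and has to be estimated, which turns this into feasible GLS.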

Detecting heteroscedasticity formally

Breusch–Pagan test (1979): regress $e_i^2$ on the covariates; $H_0$ is that the slopes are zero (no association of variance with predictors).
White's test (1980): regress $e_i^2$ on covariates, their squares, and their interactions; broader alternative.
Visual: scale-location panel (§4.3) — sharper than formal tests in practice, especially for small n.

Both tests have low power in small n and high power in large n (where they reject for tiny practically-irrelevant deviations). Plot first, test second.
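The auxiliary regression behind the Breusch–Pagan test is easy to sketch. The version below uses the studentized form of the statistic, $n R^2$ from regressing $e_i^2$ on the covariates (usually attributed to Koenker), which is approximately chi-squared under $H_0$ with as many degrees of freedom as there are non-constant covariates. The function name and the simulated data are hypothetical, and the statistic is compared with the 5% critical value for one degree of freedom rather than converted to a p-value.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan statistic: n * R^2 from regressing e^2 on X.

    X must include an intercept column. Under H0 the statistic is roughly
    chi-squared with (number of columns - 1) degrees of freedom.
    """
    n = X.shape[0]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                           # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)     # auxiliary regression
    ss_res = ((e2 - X @ gamma) ** 2).sum()
    ss_tot = ((e2 - e2.mean()) ** 2).sum()
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
signal = 1.0 + 2.0 * x

lm_hom = breusch_pagan_lm(X, signal + rng.normal(0.0, 1.0, size=n))   # constant variance
lm_het = breusch_pagan_lm(X, signal + rng.normal(0.0, 0.3 * x))       # SD grows with x

CHI2_1DF_5PCT = 3.841   # 5% critical value of chi-squared with 1 degree of freedom
print(f"homoscedastic data  : LM = {lm_hom:.2f}")
print(f"heteroscedastic data: LM = {lm_het:.2f} (reject at 5% if above {CHI2_1DF_5PCT})")
```

Rerunning with smaller `n` is a quick way to see the power problem: the same variance pattern that is flagged decisively at n = 300 can easily be missed at n = 30.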

Choosing among the three

You don't want to model the variance structure: USE SANDWICH SEs (HC3). Simple, conservative, asymptotically valid.
You know the variance structure: USE WLS. More efficient than OLS+sandwich.
You have autocorrelation too: USE GLS (or Newey–West HAC SEs for time-series).
Variance structure is unknown but you want efficiency: FEASIBLE WLS. Estimate $\hat{\sigma}^2_i$ from an auxiliary model and plug it in. Has finite-sample drawbacks but is often workable.
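The feasible WLS route can be sketched in three steps. The version below assumes a log-linear variance model, $\log \sigma^2_i = \mathbf{z}_i^T \boldsymbol{\gamma}$, which is one common choice and not the only one. The function name `feasible_wls`, the choice of $\mathbf{z}_i = (1, \log x_i)$, and the simulated data are all assumptions for the illustration.

```python
import numpy as np

def feasible_wls(X, y, Z):
    """Two-step feasible WLS with a log-linear variance model.

    1. Fit OLS and take the residuals e.
    2. Regress log(e^2) on Z and exponentiate the fitted values.
    3. Refit by WLS with weights 1 / fitted variance.
    """
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    gamma, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)
    var_hat = np.exp(Z @ gamma)     # positive by construction; correct only up to a
                                    # constant factor, which does not affect WLS
    w = 1.0 / var_hat
    XtW = X.T * w
    beta_fwls = np.linalg.solve(XtW @ X, XtW @ y)
    return beta_fwls, w, gamma

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), np.log(x)])       # assumed variance covariates
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3 * x)       # true variance proportional to x^2

beta_fwls, w, gamma = feasible_wls(X, y, Z)
print("feasible WLS coefficients:", beta_fwls)
print("estimated variance exponent (true value 2):", gamma[1])
```

Because the weights are themselves estimated, the usual WLS standard errors are only approximately right in finite samples; pairing the feasible WLS fit with sandwich SEs is a common safeguard.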

Try it

In the widget, start with a clean homoscedastic dataset. Confirm OLS classical SEs and HC3 sandwich SEs agree closely; WLS with $w = 1$ matches OLS exactly.
Increase the heteroscedasticity slider. Watch classical SEs DIVERGE from HC3 (often biased downward). The point estimate stays unbiased; only the SE is wrong.
Apply WLS with correctly-specified weights $w_i = 1/x_i$. Compare WLS SEs to HC3-on-OLS. WLS should be tighter when weights are right.
Mis-specify the weights (use $w_i = x_i$ instead of $1/x_i$). WLS becomes worse than OLS, which shows that badly wrong weights can hurt more than no weights.
Run the Breusch–Pagan diagnostic shown in the widget. At what sample size does it reliably detect moderate heteroscedasticity?

A reviewer flags your OLS output: classical t-statistic = 4.5 (p < 0.001), HC3 sandwich t-statistic = 1.8 (p ≈ 0.07). You did not change β̂; only the standard error changed. Which result should you report as the headline, and what one-sentence justification do you give?

What you now know

Heteroscedasticity leaves $\hat{\boldsymbol{\beta}}$ unbiased but breaks the SE formula. Three principled fixes: (1) sandwich SEs (White 1980 / HC3) — the no-modelling-needed safe default, (2) WLS with correct weights — most efficient when weights are right, dangerous when wrong, (3) full GLS — for combined heteroscedasticity + autocorrelation. §4.5 takes the next step: when the issue is OUTLIERS rather than non-constant variance, switch to robust regression (M-estimators) with a bounded influence function.

References

White, H. (1980). "A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity." Econometrica 48(4), 817–838. (The foundational sandwich-estimator paper.)
Breusch, T.S., Pagan, A.R. (1979). "A simple test for heteroscedasticity and random coefficient variation." Econometrica 47(5), 1287–1294. (The Breusch–Pagan test.)
MacKinnon, J.G., White, H. (1985). "Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties." J. Econometrics 29(3), 305–325. (HC1, HC2, HC3 variants.)
Aitken, A.C. (1935). "On least squares and linear combination of observations." Proc. Royal Society of Edinburgh 55, 42–48. (The foundational GLS paper.)
Greene, W.H. (2018). Econometric Analysis, 8th ed. Pearson. (Chapter 9 has the canonical applied treatment of heteroscedasticity and the WLS/GLS/sandwich trio.)