Leave-one-out cross-validation for kriging

Part 6 — Cross-validation and QC

Learning objectives

Define LEAVE-ONE-OUT CROSS-VALIDATION for geostatistical models
Compute the standard diagnostics: ME, RMSE, MAE, conditional bias
Check whether the assumed VARIOGRAM gives WELL-CALIBRATED kriging variance via the variance ratio
Diagnose under-fitting / over-fitting / variogram misspecification from CV residuals
Apply LOO-CV as the primary internal validation tool for any kriging deployment

You've built a variogram (Parts 3–4) and set up a kriging system (Part 5). The next question: is the model any good? LEAVE-ONE-OUT CROSS-VALIDATION (LOO-CV) provides the standard internal validation: hold each data point out, predict it from the rest using the assumed variogram, compute residuals. The residuals reveal the model's bias, accuracy, and uncertainty calibration.

The LOO-CV procedure

For each data point $i$: temporarily REMOVE it from the dataset.
Use the remaining N - 1 points and the assumed variogram to predict $\hat{z}_i$ via ordinary kriging.
Compute the residual $r_i = z_i - \hat{z}_i$ and record the kriging variance $\sigma_K^2(x_i)$ of that prediction.
Repeat for all N points.

The collection of N residuals provides multiple validation diagnostics: Mean Error (bias), RMSE/MAE (accuracy), and the variance ratio (kriging-variance calibration).
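The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the exponential variogram with a practical-range factor of 3, the helper names (`exp_variogram`, `ordinary_kriging`, `loo_cv`), the use of all remaining points as the kriging neighbourhood, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def exp_variogram(h, sill=1.0, vrange=5.0, nugget=0.0):
    """Exponential semivariogram (assumed model); gamma(0) = 0, total sill = `sill`."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / vrange))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(coords, values, target, variogram):
    """Ordinary kriging at one target; returns (estimate, kriging variance)."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    # Kriging system in variogram form, bordered by the sum-to-one constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    return weights @ values, weights @ b[:n] + mu

def loo_cv(coords, values, variogram):
    """Hold out each point in turn; return residuals, predictions, kriging variances."""
    n = len(values)
    pred, kvar = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred[i], kvar[i] = ordinary_kriging(
            coords[keep], values[keep], coords[i], variogram
        )
    return values - pred, pred, kvar

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
values = np.sin(coords[:, 0] / 2.0) + np.cos(coords[:, 1] / 3.0)
residuals, pred, kvar = loo_cv(coords, values, exp_variogram)
```

The three returned arrays are everything the diagnostics in the rest of this section need.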

Diagnostic 1: Mean Error (ME) — unbiasedness

$$\text{ME} = \frac{1}{N} \sum_{i=1}^{N} (z_i - \hat{z}_i).$$

For an UNBIASED predictor, ME should be approximately 0. Systematic positive ME indicates the kriging is UNDER-PREDICTING the data; negative ME, OVER-PREDICTING. Often this indicates a wrongly-assumed mean (for simple kriging) or a mis-specified drift.

Diagnostic 2: RMSE and MAE — accuracy

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (z_i - \hat{z}_i)^2}, \quad \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |z_i - \hat{z}_i|.$$

Both quantify prediction accuracy. RMSE is more sensitive to large residuals; MAE more robust to outliers. Compare across competing variogram models: smaller RMSE/MAE = better fit.
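The first two diagnostics follow directly from the residual definitions above. A minimal sketch, where the function name `cv_summary` and the four-point example are hypothetical:

```python
import numpy as np

def cv_summary(actual, predicted):
    """ME, RMSE and MAE of the residuals r_i = z_i - zhat_i."""
    r = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return {
        "ME": r.mean(),
        "RMSE": np.sqrt(np.mean(r ** 2)),
        "MAE": np.mean(np.abs(r)),
    }

# Made-up numbers, for illustration only.
actual = np.array([2.0, 3.0, 5.0, 4.0])
predicted = np.array([2.5, 2.5, 4.0, 4.0])
summary = cv_summary(actual, predicted)
print(summary)
```

In this toy case the single residual of 1.0 lifts RMSE above MAE, which is the sensitivity to large residuals described above.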

Diagnostic 3: Variance ratio — uncertainty calibration

The CRITICAL geostat diagnostic. The kriging variance $\sigma_K^2(x_0)$ at a prediction location $x_0$ quantifies the model's claimed uncertainty. In LOO-CV, $\sigma_K^2(x_i)$ is the kriging variance at $x_i$ computed with point $i$ held out. If the variogram is well-specified, the mean squared residual should approximately equal the average kriging variance:

$$\text{Variance ratio} = \frac{\frac{1}{N} \sum_{i=1}^{N} r_i^2}{\frac{1}{N} \sum_{i=1}^{N} \sigma_K^2(x_i)} \approx 1.$$

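If ratio >> 1: residuals are LARGER than the kriging variance says, so the model UNDER-STATES uncertainty (over-confidence). The assumed variogram is too low at the lags that matter: the sill or nugget may be too low, or the range too long (a longer range implies stronger correlation between neighbours and therefore a smaller kriging variance).

If ratio << 1: residuals are SMALLER than predicted, so the model OVER-STATES uncertainty (under-confidence). The assumed variogram is too high: the sill or nugget may be too high, or the range too short.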

Modern best practice: aim for ratio in [0.8, 1.2]. Larger deviations warrant variogram revision.
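A minimal sketch of the ratio as defined above, with the [0.8, 1.2] band applied as a simple flag. The function name `variance_ratio` and the example arrays are hypothetical; in practice the inputs are the LOO residuals and the matching held-out kriging variances.

```python
import numpy as np

def variance_ratio(residuals, kriging_var):
    """Mean squared LOO residual divided by mean LOO kriging variance."""
    r = np.asarray(residuals, dtype=float)
    s2 = np.asarray(kriging_var, dtype=float)
    return np.mean(r ** 2) / np.mean(s2)

# Made-up numbers, for illustration only.
residuals = np.array([0.5, -1.0, 1.5, -0.5])
kriging_var = np.array([0.5, 0.5, 1.0, 1.0])

ratio = variance_ratio(residuals, kriging_var)
within_band = 0.8 <= ratio <= 1.2
print(f"variance ratio = {ratio:.2f}, within [0.8, 1.2]: {within_band}")
```

With only a handful of points the ratio is noisy, so a value just outside the band on a small dataset is weaker evidence than the same value on a large one.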

Conditional bias

Scatter predicted vs actual. Under perfect prediction, the cloud lies on the diagonal. SYSTEMATIC departures from the diagonal reveal CONDITIONAL BIAS:

Compressed range: predictions less variable than data — kriging smooths too aggressively. Typically variogram nugget is too high.
Expanded range: predictions more variable than data — kriging over-extrapolates. Rare; usually only with too-short ranges.
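Slope departure: a regression of actual on predicted should have a slope near 1. With this orientation, over-smoothed (compressed) predictions give a slope above 1, while a slope below 1 means the predictions are more spread out than the data support.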
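The scatter-plot checks above can also be summarised numerically. The sketch below fits the regression of actual on predicted with `np.polyfit` and compares the spread of predictions with the spread of the data. The function name `conditional_bias_summary` and the toy data are hypothetical; a spread ratio below 1 corresponds to the compressed-range case.

```python
import numpy as np

def conditional_bias_summary(actual, predicted):
    """Slope and intercept of actual ~ predicted, plus std(predicted) / std(actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(predicted, actual, 1)
    spread_ratio = predicted.std() / actual.std()
    return slope, intercept, spread_ratio

# Toy data: predictions deliberately compressed toward the mean of 3.0.
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = 3.0 + 0.5 * (actual - 3.0)

slope, intercept, spread_ratio = conditional_bias_summary(actual, predicted)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, spread ratio = {spread_ratio:.2f}")
```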

Interpreting CV results

Symptom	Likely cause	Fix
ME ≠ 0	Mean assumption (SK) or drift (UK) wrong	Use OK or re-specify drift
RMSE high	Variogram poorly fit OR no spatial signal	Refit variogram, try more flexible models
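Var ratio >> 1	Kriging variance too low (sill or nugget too low, or range too long)	Raise sill or nugget, or shorten range
Var ratio << 1	Kriging variance too high (sill or nugget too high, or range too short)	Lower sill or nugget, or lengthen range
Conditional bias	Over-smoothing (compressed predictions, slope of actual~pred above 1)	Reduce nugget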

Try it

Defaults: N = 30, range = 5.0, nugget = 0. The plots show predicted vs actual and residuals vs predicted. Variance ratio near 1 (well-calibrated).
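Drop range to 1.0. Now the assumed range is TOO SHORT: kriging treats points as nearly uncorrelated, predictions revert toward the local mean, and the kriging variance climbs toward the sill. Expect RMSE to worsen and the variance ratio to fall below 1 (over-stated uncertainty).
Crank range to 15.0. The assumed range is TOO LONG: kriging is over-confident, extending each point's influence too far and reporting a small kriging variance. Expect residuals larger than the predicted variance, so the variance ratio rises above 1 (under-stated uncertainty).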
Add nugget 0.5. This effectively smooths the prediction; conditional bias becomes visible (predicted range compressed vs actual).
Crank N from 30 to 80. With more data, kriging predictions improve uniformly; variance ratio stabilises closer to 1 (better calibrated).
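The range experiments above can be approximated offline. The sketch below makes several assumptions that the interactive tool does not state: an exponential model with unit sill, a true practical range of 5.0, points scattered uniformly on a 10 x 10 square, and fields simulated by Cholesky factorisation of the covariance matrix. All function names are hypothetical. It prints the variance ratio averaged over repeated simulations for each assumed range, so the direction of the effect can be checked directly.

```python
import numpy as np

def gamma_exp(h, vrange, sill=1.0, nugget=0.0):
    """Exponential semivariogram with practical range `vrange`; gamma(0) = 0."""
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / vrange))
    return np.where(h > 0, g, 0.0)

def pairwise_dist(coords):
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def loo_variance_ratio(coords, z, vrange, nugget=0.0):
    """LOO ordinary kriging; mean squared residual / mean kriging variance."""
    n = len(z)
    G = gamma_exp(pairwise_dist(coords), vrange, nugget=nugget)
    sq_res, kvar = np.empty(n), np.empty(n)
    for i in range(n):
        idx = np.delete(np.arange(n), i)   # the N - 1 points that are kept
        A = np.ones((n, n))                # N - 1 weights plus one Lagrange multiplier
        A[:-1, :-1] = G[np.ix_(idx, idx)]
        A[-1, -1] = 0.0
        b = np.append(G[idx, i], 1.0)
        sol = np.linalg.solve(A, b)
        sq_res[i] = (z[i] - sol[:-1] @ z[idx]) ** 2
        kvar[i] = sol @ b                  # sum_j lambda_j * gamma_j0 + mu
    return sq_res.mean() / kvar.mean()

def simulate_field(coords, true_range, rng, sill=1.0):
    """One Gaussian realisation with exponential covariance (assumed truth)."""
    cov = sill * np.exp(-3.0 * pairwise_dist(coords) / true_range)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))
    return chol @ rng.standard_normal(len(coords))

def average_ratios(assumed_ranges=(1.0, 5.0, 15.0), n=30, true_range=5.0,
                   n_rep=50, seed=0):
    rng = np.random.default_rng(seed)
    totals = {a: 0.0 for a in assumed_ranges}
    for _ in range(n_rep):
        coords = rng.uniform(0.0, 10.0, size=(n, 2))
        z = simulate_field(coords, true_range, rng)
        for a in assumed_ranges:
            totals[a] += loo_variance_ratio(coords, z, a)
    return {a: t / n_rep for a, t in totals.items()}

ratios = average_ratios()
for a, r in ratios.items():
    print(f"assumed range {a:5.1f}: mean variance ratio {r:.2f}")
```

Averaging over repetitions matters here: with N = 30 a single realisation gives a noisy ratio, which is also why the ratio stabilises as N grows.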

A LOO-CV reports ME = 0.0, RMSE = 1.2, variance ratio = 2.5. Diagnose the issue and recommend a fix.
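As a starting point for this exercise, the reported numbers can be converted into the average kriging variance the model is implicitly claiming. This uses only the definitions above, since the mean squared residual is RMSE squared:

```latex
\frac{1}{N}\sum_{i=1}^{N} r_i^2 = \text{RMSE}^2 = 1.2^2 = 1.44,
\qquad
\frac{1}{N}\sum_{i=1}^{N} \sigma_K^2(x_i) = \frac{1.44}{2.5} = 0.576,
\qquad
\sqrt{2.5} \approx 1.58.
```

The model therefore claims a typical error standard deviation of about $\sqrt{0.576} \approx 0.76$, while the observed errors are roughly 1.58 times larger.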

What you now know

LOO-CV is the gold-standard internal validation: remove each data point, predict from rest, compute residuals. The MUST-REPORT diagnostics: ME (bias), RMSE/MAE (accuracy), variance ratio (kriging-variance calibration). Modern geostat best practice: any kriging deployment without LOO-CV results is incomplete. §6.2 next: accuracy plots and reliability diagnostics — the geostat-specific reliability diagram for kriging-variance calibration.
