Leave-one-out cross-validation for kriging
Learning objectives
- Define LEAVE-ONE-OUT CROSS-VALIDATION for geostatistical models
- Compute the standard diagnostics: ME, RMSE, MAE, conditional bias
- Check whether the assumed VARIOGRAM gives WELL-CALIBRATED kriging variance via the variance ratio
- Diagnose under-fitting / over-fitting / variogram misspecification from CV residuals
- Apply LOO-CV as the primary internal validation tool for any kriging deployment
You've built a variogram (Parts 3–4) and set up a kriging system (Part 5). The next question: is the model any good? LEAVE-ONE-OUT CROSS-VALIDATION (LOO-CV) provides the standard internal validation: hold each data point out, predict it from the rest using the assumed variogram, compute residuals. The residuals reveal the model's bias, accuracy, and uncertainty calibration.
The LOO-CV procedure
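- For each data point $z(x_i)$, $i = 1, \dots, N$: temporarily REMOVE it from the dataset.
- Use the remaining N - 1 points and the assumed variogram to predict $\hat{z}_{-i}(x_i)$ via ordinary kriging, and record the kriging variance $\sigma^2_K(x_i)$.
- Compute the residual $e_i = z(x_i) - \hat{z}_{-i}(x_i)$ (actual minus predicted).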
- Repeat for all N points.
The collection of N residuals provides multiple validation diagnostics: Mean Error (bias), RMSE/MAE (accuracy), and the variance ratio (kriging-variance calibration).
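The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the exponential variogram, its parameter names (`psill`, `range_`, `nugget`), the helper names, and the synthetic data are all assumptions made for the example, and every LOO prediction uses all remaining points rather than a search neighbourhood.

```python
import numpy as np

def exp_variogram(h, psill=1.0, range_=5.0, nugget=0.0):
    """Exponential variogram (practical range `range_`); gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + psill * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(coords, values, target, variogram):
    """Return (prediction, kriging variance) at `target` using all supplied points."""
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Ordinary kriging system: variogram matrix bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(dists)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    weights, lagrange = sol[:n], sol[n]
    return weights @ values, weights @ b[:n] + lagrange

def loo_cv(coords, values, variogram):
    """Leave each point out in turn; return predictions, kriging variances, residuals."""
    n = len(values)
    preds, kvars = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask that drops point i
        preds[i], kvars[i] = ordinary_kriging(
            coords[keep], values[keep], coords[i], variogram
        )
    return preds, kvars, values - preds  # residual = actual - predicted

# Synthetic demo data (illustrative only): 30 random locations, values drawn from
# a Gaussian field whose covariance matches the assumed variogram (sill = 1).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
cov = 1.0 - exp_variogram(d)  # C(h) = sill - gamma(h)
values = np.linalg.cholesky(cov + 1e-10 * np.eye(30)) @ rng.standard_normal(30)

preds, kvars, residuals = loo_cv(coords, values, exp_variogram)
print(f"ME = {residuals.mean():.3f}, RMSE = {np.sqrt(np.mean(residuals**2)):.3f}")
```

Solving one dense system per held-out point costs roughly O(N^4) overall, which is fine for a few hundred points; larger datasets would normally restrict each prediction to a local neighbourhood.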
Diagnostic 1: Mean Error (ME) — unbiasedness
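With residuals defined as actual minus predicted, $e_i = z(x_i) - \hat{z}_{-i}(x_i)$, the mean error is

$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} e_i$$

For an UNBIASED predictor, ME should be approximately 0. Systematic positive ME indicates the kriging is UNDER-PREDICTING the data; negative ME, OVER-PREDICTING. A persistent nonzero ME often points to a wrongly assumed mean (for simple kriging) or a mis-specified drift.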
Diagnostic 2: RMSE and MAE — accuracy
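$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} e_i^2}, \qquad \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |e_i|$$

Both quantify prediction accuracy, in the units of the data, from the LOO residuals $e_i$ (actual minus predicted). RMSE is more sensitive to large residuals; MAE is more robust to outliers. Compare across competing variogram models fitted to the same data: smaller RMSE/MAE means better predictive accuracy.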
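As a quick illustration of Diagnostics 1 and 2, the snippet below computes ME, RMSE and MAE from a handful of made-up actual and predicted values. The numbers are purely illustrative and do not come from any real kriging run.

```python
import numpy as np

# Illustrative held-out values and their LOO predictions (not real data).
actual = np.array([2.0, 3.5, 1.0, 4.0, 2.5])
predicted = np.array([2.5, 3.0, 1.5, 3.0, 2.5])

residuals = actual - predicted          # residual = actual - predicted
me = residuals.mean()                   # bias: positive means under-prediction
rmse = np.sqrt(np.mean(residuals**2))   # penalises large residuals more heavily
mae = np.mean(np.abs(residuals))        # more robust to outliers

print(f"ME = {me:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

RMSE is never smaller than MAE; a large gap between the two suggests that a few large residuals dominate the error.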
Diagnostic 3: Variance ratio — uncertainty calibration
The CRITICAL geostat diagnostic. The kriging variance $\sigma^2_K(x_i)$ at each predicted point quantifies the model's claimed uncertainty. If the variogram is well-specified, the SAMPLE residuals should have variance approximately equal to the average kriging variance:

$$\text{ratio} = \frac{\operatorname{Var}(e_i)}{\frac{1}{N} \sum_{i=1}^{N} \sigma^2_K(x_i)} \approx 1$$
If ratio >> 1: residuals are LARGER than the kriging variance says, so the model UNDER-STATES uncertainty (over-confidence). The kriging variance is too small: the sill may be too low, the nugget too small, or the range too long (a longer range implies stronger correlation and therefore a smaller kriging variance).
If ratio << 1: residuals are SMALLER than predicted, so the model OVER-STATES uncertainty (under-confidence). The kriging variance is too large: the sill or nugget may be too high, or the range too short.
A common working target is a ratio in [0.8, 1.2]. Larger deviations warrant variogram revision.
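A minimal sketch of the variance-ratio check follows. The function names and the toy inputs are assumptions for illustration; the [0.8, 1.2] band is the target quoted in this section, and the sample variance uses `ddof=1`, which is one reasonable choice rather than a fixed convention.

```python
import numpy as np

def variance_ratio(residuals, kriging_var):
    """Sample variance of LOO residuals divided by the mean kriging variance."""
    residuals = np.asarray(residuals, dtype=float)
    kriging_var = np.asarray(kriging_var, dtype=float)
    return residuals.var(ddof=1) / kriging_var.mean()

def calibration_label(ratio, low=0.8, high=1.2):
    """Map a ratio onto the qualitative reading used in this section."""
    if ratio > high:
        return "under-stated uncertainty (over-confident)"
    if ratio < low:
        return "over-stated uncertainty (under-confident)"
    return "roughly calibrated"

# Toy inputs (illustrative only).
residuals = np.array([-0.5, 0.5, -0.5, 1.0, 0.0])
kriging_var = np.array([0.40, 0.45, 0.40, 0.45, 0.425])

ratio = variance_ratio(residuals, kriging_var)
print(f"variance ratio = {ratio:.2f}: {calibration_label(ratio)}")
```

With only a few dozen residuals the ratio is itself noisy, so values just outside the band are weaker evidence than values far from it.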
Conditional bias
Scatter predicted vs actual. Under perfect prediction, the cloud lies on the diagonal. SYSTEMATIC departures from the diagonal reveal CONDITIONAL BIAS:
- Compressed range: predictions less variable than data — kriging smooths too aggressively. Typically variogram nugget is too high.
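- Expanded range: predictions more variable than data, so kriging overshoots. Rare; it typically requires a very continuous assumed model (too-long range, no nugget) that produces large negative weights.
- Slope departure: regression of actual on predicted should have slope 1. Slope > 1 means the predictions are over-smoothed (compressed toward the mean); slope < 1 means they are over-dispersed.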
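The slope check can be sketched with an ordinary least-squares fit of actual on predicted. The helper name and the toy data are assumptions: the "predictions" here are simply the actual values shrunk halfway toward their mean, which mimics over-smoothing but is not real kriging output.

```python
import numpy as np

def conditional_bias(actual, predicted):
    """Slope and intercept of actual regressed on predicted, plus the spread ratio."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(predicted, actual, 1)  # actual ~ slope * predicted + intercept
    spread_ratio = predicted.std(ddof=1) / actual.std(ddof=1)
    return slope, intercept, spread_ratio

actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Toy over-smoothed predictions: shrink each value halfway toward the mean.
compressed = actual.mean() + 0.5 * (actual - actual.mean())

slope, intercept, spread_ratio = conditional_bias(actual, compressed)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, spread ratio = {spread_ratio:.2f}")
```

In this toy case the predictions have half the spread of the data, and the regression of actual on predicted has slope 2: low predictions are too high and high predictions are too low.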
Interpreting CV results
| Symptom | Likely cause | Fix |
|---|---|---|
| ME ≠ 0 | Mean assumption (SK) or drift (UK) wrong | Use OK or re-specify drift |
| RMSE high | Variogram poorly fit OR no spatial signal | Refit variogram, try more flexible models |
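| Var ratio >> 1 | Kriging variance too small: sill or nugget too low, or range too long | Increase sill, add nugget, or shorten range |
| Var ratio << 1 | Kriging variance too large: sill or nugget too high, or range too short | Decrease sill or nugget, or lengthen range |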
| Conditional bias | Slope of actual~pred departs from 1 (above 1: over-smoothing) | Reduce nugget if predictions are over-smoothed |
Try it
- Defaults: N = 30, range = 5.0, nugget = 0. The scatter shows predicted vs actual; residuals vs predicted. Variance ratio near 1 (well-calibrated).
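- Drop range to 1.0. Now the assumed variogram is TOO SHORT: kriging believes points are nearly uncorrelated, so predictions fall back toward the mean of the neighbours and the kriging variance rises toward the sill. Expect RMSE to increase and the variance ratio to drift away from 1, typically below it (over-stated uncertainty).
- Crank range to 15.0. Variogram is TOO LONG: kriging is over-confident, extending influence too far and reporting small kriging variances. Residuals tend to be larger than the predicted variance (ratio above 1, under-stated uncertainty).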
- Add nugget 0.5. This effectively smooths the prediction; conditional bias becomes visible (predicted range compressed vs actual).
- Crank N from 30 to 80. With more data, kriging predictions improve uniformly; variance ratio stabilises closer to 1 (better calibrated).
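The range experiments can also be checked offline. The sketch below computes the ordinary kriging variance at one target for three assumed ranges, holding the geometry and the sill fixed. The exponential variogram, the four-point layout, and the function names are assumptions chosen for illustration; the interactive widget may use a different model.

```python
import numpy as np

def exp_variogram(h, range_, psill=1.0, nugget=0.0):
    """Exponential variogram (practical range `range_`); gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-3.0 * h / range_)), 0.0)

def ok_variance(coords, target, range_):
    """Ordinary kriging variance at `target`; depends only on geometry and variogram."""
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(dists, range_)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1), range_)
    sol = np.linalg.solve(A, b)
    return sol[:n] @ b[:n] + sol[n]  # sum(weights * gamma_0) + Lagrange multiplier

# Four data points at unit distance around the target (illustrative layout).
coords = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
target = np.array([0.0, 0.0])

variances = {r: ok_variance(coords, target, r) for r in (1.0, 5.0, 15.0)}
for r, v in variances.items():
    print(f"assumed range {r:5.1f} -> kriging variance {v:.3f}")
```

For this layout the claimed uncertainty shrinks as the assumed range grows, because a longer range tells kriging the neighbours are more informative. The data values never enter the kriging variance, which is why it must be checked against actual residuals.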
A LOO-CV reports ME = 0.0, RMSE = 1.2, variance ratio = 2.5. Diagnose the issue and recommend a fix.
What you now know
LOO-CV is the gold-standard internal validation: remove each data point, predict from rest, compute residuals. The MUST-REPORT diagnostics: ME (bias), RMSE/MAE (accuracy), variance ratio (kriging-variance calibration). Modern geostat best practice: any kriging deployment without LOO-CV results is incomplete. §6.2 next: accuracy plots and reliability diagnostics — the geostat-specific reliability diagram for kriging-variance calibration.