The kriging variance — what it means and what it doesn't

Part 5 — Kriging

Learning objectives

State precisely WHAT σ²_K IS. The kriging variance σ²_K(s₀) is the MEAN SQUARED ERROR of the kriging estimator at $\mathbf{s}_0$ — $\sigma_K^2(\mathbf{s}_0) = E[(Z^*(\mathbf{s}_0) - Z(\mathbf{s}_0))^2]$ — UNDER the assumed variogram model and the kriging variant's mean assumption. It quantifies the estimation uncertainty due to SPATIAL SEPARATION from data: low near samples (where information is dense), high far from samples (where the predictor has no leverage). The full formula across variants is $\sigma_K^2(\mathbf{s}_0) = C(0) - \mathbf{w}^\top \mathbf{k} + \text{(Lagrange terms)}$ — zero terms for SK, $\mu$ for OK, $\boldsymbol\mu^\top \mathbf{f}_0$ for UK / KED.
Diagnose the CRUCIAL PROPERTY: σ²_K is a function of GEOMETRY (sample locations + query location) + VARIOGRAM (c₀, c, a, anisotropy axes) ONLY — NOT of the data VALUES. Two datasets with identical sample locations but completely different z_i's produce IDENTICAL σ²_K MAPS. This data-independence makes the variance a PROSPECTIVE TOOL: you can plan a sampling campaign by computing the σ²_K map of the *proposed* layout before any measurements are taken, then add samples where σ²_K is large. The kriging-variance widgets in §5.1 demonstrate this — sliding the known mean μ in simple kriging slides the estimate map but leaves the variance map untouched.
List what σ²_K is NOT. (a) NOT a true CONFIDENCE INTERVAL: the standardised residual $(Z(\mathbf{s}_0) - Z^*(\mathbf{s}_0)) / \sigma_K(\mathbf{s}_0)$ is approximately $\mathcal{N}(0, 1)$ ONLY UNDER the MULTI-GAUSSIAN assumption with the CORRECT variogram model. Real data are rarely multi-Gaussian (skewed marginals, heavy tails, bimodal mixtures) and variograms are estimated with uncertainty. (b) NOT a measure of MODEL UNCERTAINTY: σ²_K assumes the variogram is fixed and CORRECT. Variogram parameters $(c_0, c, a)$ are themselves estimated from finite data — that uncertainty does NOT appear in σ²_K. (c) NOT SUPPORT-AWARE: a POINT estimate's variance is structurally different from a BLOCK AVERAGE variance — preview §5.5 (block kriging) for how the variance shrinks with averaging support. (d) NOT a map of WHERE you're WRONG — only of HOW UNCERTAIN you should be on average. The actual error at a specific location may be much smaller or much larger than σ_K(s₀).
Identify the HONEST INTERPRETATION. σ²_K is best read as a RELATIVE-UNCERTAINTY MAP — useful for SAMPLING DESIGN (where to add the next sample), for COMPARING TWO KRIGING CONFIGURATIONS (which layout has lower aggregate variance), and for IDENTIFYING DATA GAPS. It is not appropriate to use σ²_K directly to compute P10 / P50 / P90 percentiles of the predicted value distribution without simulation. Pyrcz & Deutsch 2014 §4.7 and Goovaerts 1997 §4.7 both emphasise this distinction explicitly: "σ²_K is a relative ranking, not an absolute probability."
Apply the CALIBRATION CHECK via LEAVE-ONE-OUT CROSS-VALIDATION. For each sample $i$ : refit the kriging system on the other $N-1$ samples; compute $\hat z_{-i} = Z^*_{OK}(\mathbf{s}_i)$ and $\hat\sigma_{-i} = \sigma_K(\mathbf{s}_i)$ ; form the STANDARDISED RESIDUAL $u_i = (z_i - \hat z_{-i}) / \hat\sigma_{-i}$ . Compute the empirical statistics $\bar u = \frac{1}{N} \sum u_i$ and $s_u = \sqrt{\frac{1}{N-1} \sum (u_i - \bar u)^2}$ . Under MULTI-GAUSSIAN data + correct variogram, $u_i \sim \mathcal{N}(0, 1)$ , so $\bar u \approx 0$ and $s_u \approx 1$ .
Interpret the CALIBRATION DIAGNOSTICS. $\bar u \approx 0$ confirms UNBIASEDNESS (OK enforces it analytically for stationary fields). $s_u \approx 1$ confirms VARIANCE CALIBRATION. (a) $s_u > 1$ → the variogram UNDERESTIMATES variance (typically: assumed sill or range too small; missing nugget; under-fit). The σ²_K map is OPTIMISTIC — actual errors exceed what the variance promises. Sampling-design decisions based on an optimistic σ²_K under-recommend infill in problem regions. (b) $s_u < 1$ → the variogram OVERESTIMATES variance (typically: assumed sill or range too large; over-fit). The σ²_K map is PESSIMISTIC — actual errors are smaller than the variance suggests. Sampling-design decisions over-recommend sampling. Goovaerts 1997 §4.7.3 documents this as the canonical diagnostic; Chilès & Delfiner 2012 §3.4.4 develops the $\chi^2$ -style hypothesis test.
Be honest about the FAILURE MODE — non-Gaussian marginals. Even with a perfectly fit variogram, if the data are LOGNORMAL, BIMODAL, or otherwise non-Gaussian, the standardised residuals $u_i$ may have the right MEAN and right SD but the WRONG SHAPE. The N(0,1) interpretation of $\sigma_K$ as a one-sigma error bar fails. The principled remedy (Pyrcz & Deutsch 2014 §6; Deutsch & Journel 1998 §V.1) is: (i) apply an N-SCORE TRANSFORM (§1.2) to transform the data marginal to standard-normal; (ii) compute the variogram on the N-score variable; (iii) krige in N-score space; (iv) for uncertainty quantification, use SEQUENTIAL GAUSSIAN SIMULATION (Part 7) rather than relying on the back-transformed σ²_K.
List the ALTERNATIVE UNCERTAINTY QUANTIFICATIONS that replace σ²_K when its assumptions break. (a) CONDITIONAL SIMULATION (preview Part 7): instead of computing the kriging variance map analytically, simulate MANY EQUIPROBABLE REALISATIONS $\{Z^{(\ell)}(\mathbf{s})\}_{\ell=1}^L$ each consistent with the data at sample locations and the assumed variogram. The empirical SPREAD across realisations at $\mathbf{s}_0$ — its variance, its 10th and 90th percentile — IS the uncertainty quantification. Conditional simulation handles non-Gaussian marginals via Gaussian simulation in N-score space + back-transform. (b) BAYESIAN KRIGING (advanced): treat the variogram parameters $(c_0, c, a)$ as RANDOM with prior distributions, propagate their uncertainty through the kriging system. Yields posterior distributions for both Z(s₀) and the variogram. See Diggle & Ribeiro 2007 ch. 3-4 for the model-based geostatistics framework.
Apply the PRACTICAL USE GUIDELINES. (a) USE σ²_K for SAMPLING DESIGN — where to add the next K samples to most reduce aggregate uncertainty. Just compute σ²_K(s) on the proposed augmented layout (current samples + candidates) and pick the candidates with the largest σ²_K-reduction. (b) USE σ²_K for COMPARING KRIGING CONFIGURATIONS — which neighbourhood radius, which variogram model, which kriging variant gives more confident predictions at a given target. (c) USE σ²_K as a GUARDRAIL against extrapolation — UK / KED's σ²_K explodes outside the convex hull of the samples; cap the predictions there. (d) AVOID using σ²_K directly to compute P10/P50/P90 — for that, use conditional simulation.
Recognise the CANONICAL FAILURE MODES. (i) Apparent under-coverage on heavy-tailed data despite a well-fit variogram — symptom of marginal-distribution mismatch, not variogram mis-specification. (ii) σ²_K = 0 exactly at every sample (without nugget) but the cross-validation $s_u$ deviates from 1 — the assumed variogram is wrong even though the surface honours the data. (iii) σ²_K explodes at the boundary in UK / KED — basis values $f_k(\mathbf{s}_0)$ extrapolate outside the data envelope and the Lagrange-multiplier terms blow up. (iv) σ²_K nearly identical across two competing variograms but cross-validation $s_u$ very different — the variance MAP shape is robust to small variogram changes; the calibrated SCALE is not.
Apply the §5.4 widgets. The first widget (kriging-variance-anatomy) shows a 2-D field with samples and lets the reader drag a query and add / remove samples. Shuffling z confirms the variance map is independent of values. The second widget (variance-calibration-check) generates a synthetic field, runs leave-one-out CV under chosen assumed variogram (correct / under-fit / over-fit), and shows the standardised-residual histogram with N(0,1) overlay plus σ_K-vs-|error| scatter. Reader observes: SD ≈ 1 under correct; SD > 1 under under-fit; SD < 1 under over-fit; non-Gaussian marginals can break SHAPE even with correct variance.
Connect §5.4 to the broader PART 5 architecture and PART 6 / PART 7 PREVIEWS. The interpretation developed here applies UNIFORMLY across SK, OK, UK, KED: every variant produces a σ²_K(s₀) that lives in the same conceptual space — a relative-uncertainty map of GEOMETRY + VARIOGRAM. §5.5 (block kriging) modifies the right-hand side $\mathbf{k}$ to a point-to-block AVERAGE covariance vector, so the variance is a BLOCK MSE — different scale, same interpretation. §5.6 (neighbourhood selection) shows how restricting the data to a local window CHANGES σ²_K — a tight neighbourhood produces a LOCALLY-STATIONARY variance map at the cost of LESS-PRECISE estimates. Part 6 (uncertainty quantification) develops the calibration apparatus more rigorously; Part 7 (geostatistical simulation) develops the simulation-based alternative when σ²_K's Gaussianity assumption breaks.

§5.1, §5.2, and §5.3 each ended by computing the kriging variance $\sigma_K^2(\mathbf{s}_0)$ for their respective variant: SK with its $\sigma_{SK}^2 = C(0) - \mathbf{w}^\top \mathbf{k}$, OK with the $+ \mu$ correction, UK / KED with the $+ \boldsymbol\mu^\top \mathbf{f}_0$ block. Three formulas, one common pattern. The formulas are clean; the INTERPRETATION is where most working geostatisticians stumble.

The classic trap is to read $\sigma_K^2$ as if it were a confidence interval. "The kriged estimate at $\mathbf{s}_0$ is 0.72 with kriging variance 0.04, so $\sigma_K = 0.2$ and the value lies between 0.32 and 1.12 (0.72 ± 2 σ_K) with 95% confidence." That sentence is right ONLY if the field is multi-Gaussian and the variogram is correctly specified. In production geostatistics (mining grade control, hydrocarbon reservoir characterisation, environmental site assessment) at least one of those two conditions is usually violated. The kriging variance is still useful, but as a RELATIVE UNCERTAINTY MAP, not as an absolute probability statement.

§5.4 makes the interpretation explicit. We state what $\sigma_K^2$ IS (the MSE under the model), what it is NOT (a true CI, a measure of model uncertainty, a support-aware quantity, a "where am I wrong" map), and how to TEST whether it is calibrated (leave-one-out cross-validation; standardised residuals; N(0, 1) reference). We close by previewing the principled alternative when calibration fails: sequential Gaussian simulation (Part 7).

What σ²_K is — the MSE under the model

The kriging predictor $Z^*(\mathbf{s}_0)$ is the LINEAR predictor that minimises the mean squared error subject to the variant's unbiasedness constraint. The MINIMISED MSE is the kriging variance:

\boxed{\;\sigma_K^2(\mathbf{s}_0) \;=\; E\bigl[(Z^*(\mathbf{s}_0) - Z(\mathbf{s}_0))^2\bigr] \;=\; C(0) \,-\, \mathbf{w}^\top \mathbf{k} \,+\, \text{(Lagrange terms)}.\;}

The Lagrange terms differ across the variants: 0 for SK, $\mu \cdot 1$ for OK, $\sum_k \mu_k f_k(\mathbf{s}_0)$ for UK / KED. They are non-negative on aggregate: each new constraint costs variance because it removes a degree of freedom from the unconstrained optimum. The HIERARCHY $\sigma_{SK}^2 \le \sigma_{OK}^2 \le \sigma_{UK}^2$ holds at every target: assuming you know more (a known mean for SK; a constant unknown mean for OK; a polynomial-trend mean for UK) gives you a LOWER kriging variance.

The formula has THREE structural ingredients: $C(0)$ (the variance of the field, fixed by the variogram), $\mathbf{w}^\top \mathbf{k}$ (the data-leverage term, large when the samples are close to the target with respect to the variogram), and the Lagrange aggregate (the unbiasedness penalty). All three depend on $(c_0, c, a)$ and on the GEOMETRY of $\{\mathbf{s}_i\} \cup \{\mathbf{s}_0\}$. None of them depends on the sample VALUES $\{z_i\}$.
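A minimal NumPy sketch of the boxed formula for SK and OK. It assumes a spherical covariance with the nugget applied only at $h = 0$, and the sign convention $\mathbf{K}\mathbf{w} - \mu\mathbf{1} = \mathbf{k}$ under which the OK Lagrange term enters as $+\mu$. The names `sph_cov` and `sk_ok_variance`, the parameter values, and the random layout are illustrative assumptions, not the widgets' implementation.

```python
import numpy as np

def sph_cov(h, c0, c, a):
    """Spherical covariance: C(0) = c0 + c; C(h) = c*(1 - 1.5 h/a + 0.5 (h/a)^3) for 0 < h < a; 0 beyond a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0.0, c * (1.0 - 1.5 * r + 0.5 * r**3), c0 + c)

def sk_ok_variance(xy, x0, c0=0.1, c=0.9, a=0.4):
    """Return (sigma2_SK, sigma2_OK) at target x0. Only coordinates and variogram enter."""
    n = len(xy)
    K = sph_cov(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), c0, c, a)
    k = sph_cov(np.linalg.norm(xy - x0, axis=1), c0, c, a)
    C0 = c0 + c

    # Simple kriging: K w = k, no constraint, hence no Lagrange term.
    w_sk = np.linalg.solve(K, k)
    s2_sk = C0 - w_sk @ k

    # Ordinary kriging: K w - mu 1 = k and sum(w) = 1, so sigma2 = C(0) - w.k + mu.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    sol = np.linalg.solve(A, np.append(k, 1.0))
    w_ok, mu = sol[:n], sol[n]
    s2_ok = C0 - w_ok @ k + mu
    return s2_sk, s2_ok

rng = np.random.default_rng(1)
xy = rng.random((10, 2))                      # hypothetical layout on the unit square
s2_sk, s2_ok = sk_ok_variance(xy, np.array([0.5, 0.5]))
print(f"sigma2_SK = {s2_sk:.3f}, sigma2_OK = {s2_ok:.3f}")
```

No `z` array appears anywhere in the function, and the two printed values should respect $\sigma_{SK}^2 \le \sigma_{OK}^2$.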

σ²_K is INDEPENDENT of the data values — the geometry property

This is the load-bearing teaching point of §5.4. Inspect the OK system $\mathbf{w} = \mathbf{A}^{-1} \mathbf{b}$ where $\mathbf{A}$ is the augmented covariance matrix (depends on $\{\mathbf{s}_i\}$ and the variogram) and $\mathbf{b}$ is the augmented sample-to-target covariance vector (depends on $\{\mathbf{s}_i\}$, $\mathbf{s}_0$, and the variogram). NEITHER $\mathbf{A}$ nor $\mathbf{b}$ uses the data $\{z_i\}$. The kriging weights are functions of GEOMETRY + VARIOGRAM. The kriging variance $\sigma_K^2 = C(0) - \mathbf{w}^\top \mathbf{k} + \mu$ uses the weights and the covariance vector: again NO data values.

Practical consequence: TWO datasets with the SAME sample locations but DIFFERENT z-values produce IDENTICAL σ²_K maps. A clean dataset and a corrupted-noise version of itself have the same variance map. The variance map is a property of the EXPERIMENTAL DESIGN (where you put the samples), not of what you found there.

The first widget (above) shows this directly. The variance heatmap (left) responds when you move samples, add new ones, remove existing ones, or change the variogram. It does NOT respond when you shuffle the sample VALUES (the "Shuffle z" button — same locations, new seed). The estimate map (right) does respond to shuffling. This is the geometric character of σ²_K made tangible.
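The same behaviour can be reproduced numerically. The sketch below is a minimal ordinary-kriging solve under an assumed spherical covariance; `ok_predict`, the layout, and the values are hypothetical and chosen only to mimic the "Shuffle z" experiment.

```python
import numpy as np

def sph_cov(h, c0, c, a):
    """Spherical covariance with the nugget applied only at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0.0, c * (1.0 - 1.5 * r + 0.5 * r**3), c0 + c)

def ok_predict(xy, z, x0, c0=0.0, c=1.0, a=0.5):
    """Ordinary kriging at x0: returns (estimate, variance)."""
    n = len(xy)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = sph_cov(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), c0, c, a)
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    k = sph_cov(np.linalg.norm(xy - x0, axis=1), c0, c, a)
    sol = np.linalg.solve(A, np.append(k, 1.0))
    w, mu = sol[:n], sol[n]
    # z enters the estimate only; the variance line never touches it.
    return w @ z, (c0 + c) - w @ k + mu

rng = np.random.default_rng(0)
xy = rng.random((12, 2))
x0 = np.array([0.5, 0.5])
z_a = rng.normal(size=12)
z_b = rng.permutation(z_a)          # same locations, reshuffled values

est_a, var_a = ok_predict(xy, z_a, x0)
est_b, var_b = ok_predict(xy, z_b, x0)
print("estimates:", est_a, est_b)
print("variances identical:", var_a == var_b)
```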

The data-independence makes σ²_K a PROSPECTIVE tool. You can compute σ²_K for a PROPOSED sample layout BEFORE any measurements are taken: pick candidate sample locations, compute the variance map under the assumed variogram, pick the candidates that most reduce aggregate variance. This is the foundation of OPTIMAL SAMPLING DESIGN (van Groenigen 2000 Comput. Geosci.; Pyrcz & Deutsch 2014 §4.7.4): minimise an objective like the average σ²_K over the domain, the maximum σ²_K, or a weighted combination, subject to a budget constraint on the number of samples.
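A hedged sketch of one greedy infill step, assuming the objective is the mean OK variance over a target grid. The function names (`ok_variance_map`, `best_infill`), the layout, the candidate list, and the variogram parameters are all illustrative assumptions; a real design study would typically iterate this step or use a global optimiser.

```python
import numpy as np

def sph_cov(h, c0, c, a):
    """Spherical covariance with the nugget applied only at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0.0, c * (1.0 - 1.5 * r + 0.5 * r**3), c0 + c)

def pairdist(p, q):
    """Euclidean distances between every row of p and every row of q."""
    return np.linalg.norm(p[:, None] - q[None, :], axis=-1)

def ok_variance_map(xy, targets, c0=0.0, c=1.0, a=0.4):
    """OK variance at every target row; no data values are required."""
    n, m = len(xy), len(targets)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = sph_cov(pairdist(xy, xy), c0, c, a)
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    K0 = sph_cov(pairdist(xy, targets), c0, c, a)          # shape (n, m)
    sol = np.linalg.solve(A, np.vstack([K0, np.ones(m)]))  # one column per target
    w, mu = sol[:n], sol[n]
    return (c0 + c) - np.sum(w * K0, axis=0) + mu

def best_infill(xy, candidates, targets, **vg):
    """Index of the single candidate whose addition gives the lowest mean variance."""
    scores = [ok_variance_map(np.vstack([xy, cand]), targets, **vg).mean()
              for cand in candidates]
    return int(np.argmin(scores)), scores

# Hypothetical layout: three existing samples clustered in the top-left corner.
xy = np.array([[0.10, 0.90], [0.18, 0.84], [0.22, 0.93]])
candidates = np.array([[0.12, 0.88], [0.70, 0.30], [0.45, 0.60]])
g = np.linspace(0.05, 0.95, 10)
targets = np.array([[x, y] for x in g for y in g])

j, scores = best_infill(xy, candidates, targets)
print("add candidate", j, candidates[j])
print("mean variance before:", ok_variance_map(xy, targets).mean(), "after:", scores[j])
```

The whole computation runs before any measurement exists at the candidate locations, which is what makes the variance a planning tool.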

What σ²_K is NOT — four common misreadings

Not a true confidence interval. The ratio $(Z(\mathbf{s}_0) - Z^*(\mathbf{s}_0)) / \sigma_K(\mathbf{s}_0)$ is approximately $\mathcal{N}(0, 1)$ ONLY IF the data are MULTI-GAUSSIAN and the variogram is CORRECTLY specified. "$Z^*(\mathbf{s}_0) \pm 1.96\,\sigma_K(\mathbf{s}_0)$ is a 95% CI" requires both conditions. Real geo-data often have skewed marginals (porosity, permeability, ore grade, soil-pollution concentration: all positively skewed), heavy tails, or bimodal mixtures (sand vs shale lithology). Even with a perfect variogram, the standardised residuals do not have a Gaussian shape on such data, so the 1.96 multiplier is wrong.

Not a measure of model uncertainty. σ²_K assumes the variogram is FIXED and CORRECT. The variogram itself was estimated from a finite empirical variogram + a fitted parametric model — that estimation process has uncertainty. σ²_K does NOT include the variogram-fit uncertainty. Bayesian model-based geostatistics (Diggle & Ribeiro 2007) does propagate it, at the cost of posterior MCMC over variogram parameters. In production kriging, the standard remedy is variogram-sensitivity analysis: re-run the kriging with c₀ ± 20%, c ± 30%, a ± 25%, and check the σ²_K map's sensitivity.

Not support-aware. The kriging variance computed for a POINT target $\mathbf{s}_0$ is structurally different from the variance for a BLOCK (volume average) target. Block kriging (preview §5.5) modifies the right-hand side to use point-to-block AVERAGE covariances — the variance shrinks because averaging reduces noise. The point and block variances measure DIFFERENT quantities. Confusing them — reporting a point σ²_K and then drawing block-scale conclusions — is a common error.

Not a "where am I wrong" map. σ²_K measures HOW UNCERTAIN you should be on average — it does not tell you that the ACTUAL ERROR at $\mathbf{s}_0$ is large. A specific location can have $\sigma_K^2(\mathbf{s}_0)$ small and yet the realised error $|Z(\mathbf{s}_0) - Z^*(\mathbf{s}_0)|$ may be unusually large (a "tail event"); or σ²_K(s₀) can be large and yet the realised error small. σ²_K predicts the AVERAGE size of the error over many realisations, not the error at any one realisation.

The honest interpretation — relative uncertainty map

What CAN you do with σ²_K? Treat it as a RELATIVE-UNCERTAINTY MAP. The relative ordering of σ²_K values is robust to the assumptions that the absolute calibration violates. Specifically:

Use σ²_K to compare LOCATIONS: if σ²_K(s_A) < σ²_K(s_B), location A is more confidently estimated than B — even if neither value is calibrated to a Gaussian CI. This relative ordering is what drives optimal sampling design.
Use σ²_K to compare KRIGING CONFIGURATIONS: two competing neighbourhood radii, two competing variogram model choices, two competing kriging variants — the one with lower aggregate σ²_K (mean over the target domain) is the more confident choice.
Use σ²_K as a GUARDRAIL against extrapolation: UK / KED's σ²_K explodes outside the convex hull of the samples; flag those targets in the output map.
Don't use σ²_K directly for P10 / P50 / P90: for percentile-based uncertainty quantification, use conditional simulation (Part 7), which honours the data marginal AND the variogram and propagates both into the percentile estimates.

This is the Goovaerts 1997 §4.7 / Pyrcz & Deutsch 2014 §4.7 / Chilès & Delfiner 2012 §3.4 honest framing. σ²_K is a tool, and like every tool it has a domain of validity. Use it within that domain (sampling design, configuration comparison, extrapolation guardrails) and reach for alternatives (simulation, Bayesian) when you push past it.

The calibration check — leave-one-out cross-validation

Even within its domain of validity, σ²_K rests on the variogram being correctly specified. The calibration check tests this. The standard procedure (Cressie 1993 §2.6.4; Goovaerts 1997 §4.7.3):

For each sample $i = 1, \ldots, N$ : refit the kriging system on the OTHER $N - 1$ samples (leave-one-out, LOO).
Compute the LOO prediction $\hat z_{-i} = Z^*_{OK}(\mathbf{s}_i)$ and the LOO kriging standard deviation $\hat\sigma_{-i} = \sqrt{\sigma_K^2(\mathbf{s}_i; \text{data} \setminus i)}$.
Form the STANDARDISED RESIDUAL $u_i = (z_i - \hat z_{-i}) / \hat\sigma_{-i}$ .
Compute the empirical mean $\bar u = \frac{1}{N} \sum_i u_i$ and standard deviation $s_u = \sqrt{\frac{1}{N - 1} \sum_i (u_i - \bar u)^2}$ .
Under MULTI-GAUSSIAN data + CORRECT variogram: $u_i \sim \mathcal{N}(0, 1)$, so we expect $\bar u \approx 0$ (standard error about $1/\sqrt{N}$) and $s_u \approx 1$ (standard error about $1/\sqrt{2N}$, i.e. a rough 95% band of $\pm \sqrt{2/N}$).
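The steps above can be sketched as follows, assuming ordinary kriging with a spherical covariance and synthetic multi-Gaussian data drawn from that same model. `ok_predict` and `loo_residuals` are hypothetical helper names, and the seed and parameter values are arbitrary.

```python
import numpy as np

def sph_cov(h, c0, c, a):
    """Spherical covariance with the nugget applied only at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0.0, c * (1.0 - 1.5 * r + 0.5 * r**3), c0 + c)

def ok_predict(xy, z, x0, c0, c, a):
    """Ordinary kriging at x0: returns (estimate, variance)."""
    n = len(xy)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = sph_cov(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), c0, c, a)
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    k = sph_cov(np.linalg.norm(xy - x0, axis=1), c0, c, a)
    sol = np.linalg.solve(A, np.append(k, 1.0))
    w, mu = sol[:n], sol[n]
    return w @ z, (c0 + c) - w @ k + mu

def loo_residuals(xy, z, c0, c, a):
    """Standardised LOO residuals u_i = (z_i - zhat_{-i}) / sigmahat_{-i}."""
    n = len(z)
    u = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # drop sample i
        est, var = ok_predict(xy[keep], z[keep], xy[i], c0, c, a)
        u[i] = (z[i] - est) / np.sqrt(var)
    return u

# Synthetic multi-Gaussian data generated from the "true" model via Cholesky.
rng = np.random.default_rng(42)
n, c0, c, a = 40, 0.1, 0.9, 0.4
xy = rng.random((n, 2))
K = sph_cov(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), c0, c, a)
z = np.linalg.cholesky(K) @ rng.standard_normal(n)

u = loo_residuals(xy, z, c0, c, a)
print(f"correct model:  u_bar = {u.mean():.2f}, s_u = {u.std(ddof=1):.2f}")

# Under-fit sill: quarter c0 and c. OK weights are unchanged and sigma_K halves,
# so every u_i doubles.
u_under = loo_residuals(xy, z, 0.25 * c0, 0.25 * c, a)
print(f"quartered sill: s_u = {u_under.std(ddof=1):.2f}")
```

Rescaling the whole covariance is the simplest mis-specification to reason about: the estimates do not move at all, only the promised error scale does.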

The diagnostic outcomes:

$\bar u \approx 0$ — UNBIASEDNESS. OK enforces this analytically through $\sum w = 1$ ; it should always hold for stationary fields with OK.
$s_u \approx 1$ — CALIBRATION. σ²_K is the right SCALE of the prediction error.
$s_u > 1$ — variogram UNDERESTIMATES variance. Typically: assumed sill too small, range too narrow, missing nugget. The kriging variance map is OPTIMISTIC — actual errors exceed what σ²_K promises. Sampling-design conclusions drawn from this map UNDER-recommend infill in problem regions. Fit a larger sill / wider range / add nugget; re-check.
$s_u < 1$ — variogram OVERESTIMATES variance. Typically: assumed sill too large, range too wide. The σ²_K map is PESSIMISTIC — actual errors are smaller than the variance suggests. Sampling-design conclusions OVER-recommend sampling. Fit a smaller sill / narrower range; re-check.

The Chilès & Delfiner 2012 §3.4.4 $\chi^2$-test formalises this: under correct variogram and multi-Gaussian data, and treating the $u_i$ as approximately independent, $(N - 1) s_u^2 / 1^2 \sim \chi^2_{N-1}$. With $N = 30$ samples, the 95% interval for $s_u$ is approximately $[0.74, 1.26]$; with $N = 100$, approximately $[0.86, 1.14]$. Inference from small $N$ is noisy: a single LOO calibration that comes in at $s_u = 0.85$ might just be sampling fluctuation, not evidence of over-fit.
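A small standard-library sketch for computing such a band at any $N$. It swaps the exact $\chi^2$ quantile for the Wilson-Hilferty cube-root approximation so that no extra dependency is needed; an exact quantile function (for example `scipy.stats.chi2.ppf`) could be substituted. The function name `su_band` is an assumption, and the band inherits the independence approximation for the $u_i$.

```python
from math import sqrt
from statistics import NormalDist

def su_band(n, level=0.95):
    """Approximate two-sided band for s_u when (n - 1) * s_u**2 ~ chi2 with n - 1 dof."""
    k = n - 1
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    def chi2_quantile(zq):
        # Wilson-Hilferty: chi2_k quantile ~ k * (1 - 2/(9k) + zq * sqrt(2/(9k)))**3
        return k * (1.0 - 2.0 / (9.0 * k) + zq * sqrt(2.0 / (9.0 * k))) ** 3

    return sqrt(chi2_quantile(-z) / k), sqrt(chi2_quantile(z) / k)

for n in (30, 40, 100):
    lo, hi = su_band(n)
    print(f"N = {n:3d}: s_u band = [{lo:.2f}, {hi:.2f}]")
```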

The second widget (above) makes this concrete. A 40-sample Gaussian field is generated on the unit square under a TRUE Spherical variogram. The reader picks an ASSUMED variogram — Correct, Under-fit (too small + too narrow), or Over-fit (too large + too wide) — and the widget runs LOO cross-validation, plots the histogram of $u_i$ with the $\mathcal{N}(0, 1)$ reference overlaid, and reports $\bar u$ and $s_u$ . The diagnostic pills turn ORANGE when $s_u$ is outside $[0.85, 1.15]$ . Reshuffling the seed shows the sampling noise band — a single value of $s_u$ near 1 is consistent with correctness, but values consistently above 1.15 or below 0.85 across reshuffles signal a real miscalibration.

The DATA DISTRIBUTION picker (Multi-Gaussian / Lognormal-skew / Bimodal) shows the OTHER failure mode: even with the CORRECT variogram, non-Gaussian data can break the $\mathcal{N}(0, 1)$ interpretation of standardised residuals. The HISTOGRAM shape departs from the bell curve — even if $\bar u$ and $s_u$ are close to (0, 1), the tails over- or under-cover relative to a Gaussian. This is the cue to switch to N-score transform + sequential Gaussian simulation (Part 7).

Honest failure modes — when σ²_K misleads

Non-Gaussian marginals. Real geo-data are often LOGNORMAL (porosity, permeability, ore grade, contaminant concentration — all positively skewed). Even with a variogram that calibrates the SECOND moment correctly (so $s_u \approx 1$ ), the SHAPE of the standardised-residual distribution can be non-Gaussian, and the σ²_K-based one-sigma error bar covers more or less than the nominal 68%. The principled remedy: §1.2 N-SCORE TRANSFORM applied to the data, kriging in N-score space, back-transform with proper attention to the back-transform's bias. For full uncertainty quantification, use sequential Gaussian simulation (Part 7) rather than back-transformed σ²_K.
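A minimal sketch of the forward N-score transform using only the standard library. It assumes distinct values (no tie handling), equal weights (no declustering), and plotting positions $(\text{rank} + 0.5)/n$; the function name `nscore` and the sample values are hypothetical. The back-transform and its tail handling are not shown.

```python
from statistics import NormalDist

def nscore(values):
    """Map each value to the standard-normal quantile of its rank-based cumulative probability."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices from smallest to largest value
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n                            # keeps p strictly inside (0, 1)
        scores[i] = std_normal.inv_cdf(p)
    return scores

grades = [0.2, 1.5, 0.7, 9.3, 0.4]                      # hypothetical positively skewed data
for g, y in zip(grades, nscore(grades)):
    print(f"{g:5.1f} -> {y:+.3f}")
```

The transform preserves the ordering of the data and makes the scores symmetric about zero, which is what allows the variogram and the kriging to be carried out in Gaussian space.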

Variogram uncertainty. The variogram parameters $(c_0, c, a)$ are themselves uncertain: they are estimated from a finite empirical variogram with binning choices, lag-tolerance choices, and outlier sensitivities. σ²_K assumes the variogram is correct as given. Practical remedy: variogram-sensitivity analysis (rerun the kriging with $c_0 \pm 30\%$, $c \pm 30\%$, $a \pm 25\%$; check the σ²_K map's sensitivity). Bayesian remedy: put priors on $(c_0, c, a)$ and propagate to the posterior via MCMC (see Diggle & Ribeiro 2007 ch. 3-4). Most production workflows do the sensitivity analysis, not the full Bayesian.

Extrapolation. UK / KED's σ²_K explodes outside the convex hull of the samples — the Lagrange-multiplier terms $\boldsymbol\mu^\top \mathbf{f}_0$ contain basis values $f_k(\mathbf{s}_0)$ that grow without bound for polynomial / external-drift bases when $\mathbf{s}_0$ moves outside the data envelope. The σ²_K growth IS the system's honest signal that the prediction is unreliable. Heed it. A practical convention: flag any target where $\sigma_K^2(\mathbf{s}_0) > 0.8 \cdot (c_0 + c)$ as "extrapolation — use with caution."
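The flagging convention is a one-line mask. In this sketch the variance values are made up and the function name `flag_low_information` is an assumption; the 0.8 fraction is the convention quoted in the text.

```python
import numpy as np

def flag_low_information(sigma2, c0, c, frac=0.8):
    """Boolean mask: True where the kriging variance exceeds frac * sill."""
    return np.asarray(sigma2) > frac * (c0 + c)

sigma2_map = np.array([[0.05, 0.30],
                       [0.85, 1.40]])       # hypothetical variance values on a 2 x 2 grid
mask = flag_low_information(sigma2_map, c0=0.1, c=0.9)
print(mask)                                  # True marks "extrapolation, use with caution"
```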

Sample-clustering effects. Highly clustered samples (e.g. all in one corner of the domain) produce σ²_K very low in the cluster and very high everywhere else. The empirical CV residuals from leave-one-out come predominantly from the cluster — they don't test the variance in the high-σ²_K gaps. The diagnostic SD value may suggest "well calibrated" while the global map is highly miscalibrated. Pyrcz & Deutsch 2014 §4.7.3 recommends declustered cross-validation (weight the residuals by the §2 cell-declustering weights) for clustered datasets — see also Goovaerts 1997 §4.6.

Data-honouring but variogram-failing. A subtle case: σ²_K = 0 EXACTLY at every sample (without nugget, kriging is an exact interpolator), so the predictor honours every data value perfectly. Yet leave-one-out $s_u$ can still deviate from 1 because the model is wrong AWAY from the samples. The on-sample test (σ²_K(s_i) = 0) is consistent with any variogram that produces an exact interpolator; only LOO actually probes the model.

Alternative uncertainty quantifications

When σ²_K's assumptions fail, three alternatives carry more honest uncertainty information.

Conditional simulation (preview Part 7). Instead of computing the kriging variance analytically, simulate MANY equiprobable realisations $\{Z^{(\ell)}(\mathbf{s})\}_{\ell=1}^L$, each consistent with the data at sample locations and with the assumed variogram. At any target $\mathbf{s}_0$, the empirical SPREAD across $\ell$ (the variance, the 10th and 90th percentiles, the full empirical distribution) IS the uncertainty. Conditional simulation handles non-Gaussian marginals automatically: simulate in N-score space, back-transform each realisation, compute empirical percentiles in original units. The price: 100+ realisations needed (vs. one kriging solve). In return they ALSO give you the spatial DEPENDENCE structure of the uncertainty (e.g., "P90 of the regional total grade" requires the joint distribution, not just marginals at each cell).
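A hedged illustration of the idea at a handful of targets. This is NOT sequential Gaussian simulation: it draws directly from the joint conditional Gaussian distribution (simple-kriging mean and conditional covariance, then a Cholesky factor), which is only practical for small target sets. It assumes zero-mean data already in N-score space and a nugget-free spherical covariance; all names and values are illustrative.

```python
import numpy as np

def sph_cov(h, c, a):
    """Nugget-free spherical covariance."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return c * (1.0 - 1.5 * r + 0.5 * r**3)

def pairdist(p, q):
    """Euclidean distances between every row of p and every row of q."""
    return np.linalg.norm(p[:, None] - q[None, :], axis=-1)

def conditional_realisations(xy, y, targets, n_real=200, c=1.0, a=0.4, seed=0):
    """Realisations of a zero-mean Gaussian field at `targets`, conditional on y at xy."""
    K = sph_cov(pairdist(xy, xy), c, a)
    K0 = sph_cov(pairdist(xy, targets), c, a)             # shape (n, m)
    Ktt = sph_cov(pairdist(targets, targets), c, a)       # shape (m, m)
    W = np.linalg.solve(K, K0)                            # simple-kriging weights, one column per target
    cond_mean = W.T @ y
    cond_cov = Ktt - K0.T @ W
    # Small jitter guards the Cholesky factorisation against round-off.
    L = np.linalg.cholesky(cond_cov + 1e-9 * np.eye(len(targets)))
    rng = np.random.default_rng(seed)
    return cond_mean + rng.standard_normal((n_real, len(targets))) @ L.T

rng = np.random.default_rng(11)
xy = rng.random((15, 2))
y = rng.normal(size=15)                                   # hypothetical N-score data
targets = np.array([[0.50, 0.50], [0.25, 0.75], [0.90, 0.10]])

sims = conditional_realisations(xy, y, targets, n_real=500)
p10, p50, p90 = np.percentile(sims, [10, 50, 90], axis=0)
print("P10:", p10)
print("P50:", p50)
print("P90:", p90)
```

Because each row of `sims` is a joint draw across all targets, functions of several locations (a regional total, for instance) can be evaluated per realisation before taking percentiles.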

Bayesian kriging (advanced). Place priors on the variogram parameters $(c_0, c, a)$ and the mean $m$ (or trend coefficients $\boldsymbol\beta$ ), and update them with the observed data via Bayes' theorem. The result is a POSTERIOR predictive distribution for $Z(\mathbf{s}_0)$ that propagates BOTH the kriging-internal uncertainty AND the variogram-parameter uncertainty. Implementations: GP regression in PyMC, Stan, or the geoR R package (Diggle & Ribeiro). Computationally heavier; offers the most honest UQ when variogram-parameter uncertainty matters (small $N$ , weakly identified variogram).

Variogram-ensemble kriging (pragmatic). Run the kriging with several plausible variograms (e.g. five alternative fits from a variogram-modelling workshop) and report the ENSEMBLE $\sigma_K^2$ as the variance of the kriged estimates across the ensemble, plus a baseline σ²_K from any one variogram. This gives a back-of-envelope measure of variogram-fit sensitivity without the full Bayesian apparatus. Used widely in production for "what would another variogram fit have given me?"
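A minimal sketch of the ensemble idea, assuming ordinary kriging with spherical covariances. The five $(c_0, c, a)$ triples, the data, and the helper `ok_predict` are hypothetical stand-ins for whatever alternative fits a real study would produce.

```python
import numpy as np

def sph_cov(h, c0, c, a):
    """Spherical covariance with the nugget applied only at h = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0.0, c * (1.0 - 1.5 * r + 0.5 * r**3), c0 + c)

def ok_predict(xy, z, x0, c0, c, a):
    """Ordinary kriging at x0: returns (estimate, variance)."""
    n = len(xy)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = sph_cov(np.linalg.norm(xy[:, None] - xy[None, :], axis=-1), c0, c, a)
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    k = sph_cov(np.linalg.norm(xy - x0, axis=1), c0, c, a)
    sol = np.linalg.solve(A, np.append(k, 1.0))
    w, mu = sol[:n], sol[n]
    return w @ z, (c0 + c) - w @ k + mu

rng = np.random.default_rng(3)
xy = rng.random((25, 2))
z = rng.normal(size=25)                      # hypothetical data values
x0 = np.array([0.5, 0.5])

# Hypothetical ensemble of plausible fits, each a (c0, c, a) triple.
ensemble = [(0.05, 0.95, 0.30), (0.10, 0.90, 0.40), (0.10, 1.10, 0.35),
            (0.20, 0.80, 0.50), (0.00, 1.00, 0.45)]
res = np.array([ok_predict(xy, z, x0, *vg) for vg in ensemble])
est, sig2 = res[:, 0], res[:, 1]

between_var = est.var(ddof=1)                # spread of the estimates across variograms
baseline = sig2[1]                           # sigma2_K under one reference variogram
print(f"between-variogram variance of estimate: {between_var:.4f}")
print(f"baseline sigma2_K: {baseline:.4f}; range over ensemble: {sig2.min():.4f} to {sig2.max():.4f}")
```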

Practical use of σ²_K — yes, no, careful

Yes: sampling design. The relative ordering of σ²_K values is robust to the assumptions that the absolute calibration violates. Compute σ²_K(s) on the proposed augmented layout, pick candidates that most reduce aggregate σ²_K. This is the foundation of "infill drilling" decisions in mining and "adaptive sampling" in environmental remediation.

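Yes: configuration comparison. Compare two kriging configurations (neighbourhood radius A vs B; variogram model M1 vs M2; OK vs UK; isotropic vs anisotropic) by their aggregate σ²_K over a domain of interest. The configuration with lower aggregate variance is more confidently predicting that domain. When the configurations differ in the variogram itself, the raw σ²_K values are not on a common scale (a smaller assumed sill lowers σ²_K everywhere), so check each with LOO $s_u$ before ranking them.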

Yes: extrapolation guardrail. UK / KED σ²_K explodes outside the data envelope. Use this as a built-in flag: anywhere σ²_K is greater than $0.8 \cdot (c_0 + c)$ , the prediction is barely better than the global mean — flag it as low-information.

Careful: absolute one-sigma bars. $Z^*(\mathbf{s}_0) \pm \sigma_K(\mathbf{s}_0)$ is a one-sigma band ONLY UNDER multi-Gaussian + correct variogram. Always cross-check with LOO $s_u$ . If $s_u$ deviates from 1, the band is the wrong scale; if data are non-Gaussian, the band has the wrong SHAPE.

No: P10 / P50 / P90 percentiles. Do not use $Z^* \pm 1.28 \sigma_K$ as P10 / P90 bounds. For percentile-based uncertainty, use conditional simulation.

No: comparing across variograms with different fits. If you tried two variograms with different fits and got different σ²_K maps, the σ²_K values are not on the same scale — calibrate each one (LOO) and report the scale next to the map.

§5.4 in the architecture of Part 5

§5.4 generalises across SK / OK / UK / KED — every variant's kriging variance lives in the same interpretive frame. The DATA-INDEPENDENCE property is universal; the CALIBRATION check via LOO standardised residuals applies uniformly; the MULTI-GAUSSIAN caveat applies wherever σ²_K is read as a CI; the EXTRAPOLATION explosion is most acute in UK / KED but visible in OK with high-order anisotropy too.

§5.5 (block kriging) modifies the right-hand side $\mathbf{k}$ to be a POINT-TO-BLOCK average covariance vector, so the variance is a BLOCK MSE — different scale, same interpretation. Critically, the SUPPORT EFFECT (block size affects variance) is a structural difference, not an interpretive one — §5.5 makes this explicit.

§5.6 (neighbourhood selection) interacts with σ²_K in a subtle way: a tight neighbourhood produces a LOCALLY-CALIBRATED variance map (the within-neighbourhood variogram dominates) but LESS-PRECISE estimates; a wide neighbourhood is the opposite. The neighbourhood-selection decision is fundamentally a σ²_K-vs-bias trade-off.

Part 6 (uncertainty quantification — geostatistics-wide) develops the calibration apparatus more rigorously, including the $\chi^2$ -test for $s_u$ , declustered cross-validation for sample-clustered datasets, and the joint variance/bias diagnostic.

Part 7 (geostatistical simulation) is the LANDING for everything §5.4 flagged. When σ²_K's assumptions fail — non-Gaussian data, variogram-parameter uncertainty matters, percentile-based UQ is needed — sequential Gaussian simulation is the principled replacement.

Try it

In kriging-variance-anatomy, click "Shuffle z (same locations)" several times. Watch the ESTIMATE map (right) change with every shuffle while the VARIANCE map (left) stays IDENTICAL. This is the data-independence of σ²_K made tangible.
In kriging-variance-anatomy, set nugget $c_0 = 0$ . Drag the query diamond to sit EXACTLY on a sample dot. Read the σ²_K pill: ≈ 0. Now raise the nugget to $c_0 = 0.20$ — query σ²_K rises to ≈ 0.20 even on a sample. With nugget, kriging is no longer an exact interpolator and the on-sample variance equals the nugget.
In kriging-variance-anatomy, click "Clear all" then place 3 samples in the top-left corner of the domain. Drag the query toward the bottom-right empty region. Read σ²_K — it approaches the sill $c_0 + c$ far from any sample. Add a cluster in the bottom-right and watch σ²_K drop there.
In kriging-variance-anatomy, lower the range $a$ to 0.10. Notice the variance map's "wells of low variance" shrink to small halos around each sample — short range means samples have little reach. Raise $a$ to 0.80 and the wells expand to cover the whole domain.
In variance-calibration-check, pick "Correct" assumed variogram + "Multi-Gaussian" data. Reshuffle the seed 5-10 times. Read SD(u): it hovers near 1, within a rough 95% band of $\pm\sqrt{2/N} \approx 0.22$. This is the calibration band: values inside $[0.78, 1.22]$ at $N = 40$ are statistically consistent with a correct variogram.
In variance-calibration-check, switch to "Under-fit" assumed variogram. SD(u) jumps above 1 (typically 1.4 — 2.5). The variogram's assumed sill is too small; the actual residual spread exceeds what σ_K promises. Reshuffle a few times — the bias is systematic, not noise.
In variance-calibration-check, switch to "Over-fit" assumed variogram. SD(u) drops below 1 (typically 0.4 to 0.7). The σ²_K map is pessimistic: the kriging is under-confident, promising larger errors than actually occur. Sampling-design decisions based on this map over-recommend infill drilling.
In variance-calibration-check, set "Correct" variogram but pick "Lognormal-skew" data. Look at the HISTOGRAM SHAPE: it departs from the $\mathcal{N}(0, 1)$ overlay even when SD(u) is near 1. This is the marginal-distribution failure mode that motivates §1.2 N-score transform + Part 7 sequential Gaussian simulation.
Without coding: an analyst kriges a soil-pH dataset (40 samples, lognormal-ish marginal) and reports "the kriged estimate at the unmeasured location is 6.3 ± 0.20 (one-sigma)." Their leave-one-out SD(u) = 0.92. Critique the report: where is the analyst's honest-uncertainty story weak, even though SD(u) is near 1?
Without coding: a mining engineer compares two variograms (one isotropic, one anisotropic with $a_{\max}/a_{\min} = 3$ ) on a vein-controlled ore deposit. The isotropic σ²_K map is smoother and lower on average; the anisotropic σ²_K map is rougher with high-variance zones across the vein. Which σ²_K map is more honest, and how would leave-one-out cross-validation tell you?

Pause and reflect: the kriging variance $\sigma_K^2(\mathbf{s}_0)$ depends only on GEOMETRY (sample locations + query location) + VARIOGRAM. Two datasets with identical layouts but completely different values have identical variance maps. Why does this PROPERTY make σ²_K a powerful PROSPECTIVE tool (you can map uncertainty before measurements are taken) and simultaneously a LIMITED RETROSPECTIVE tool (it cannot tell you WHERE the actual prediction errors are large in a SPECIFIC dataset)? How does this dual character constrain when σ²_K is the right diagnostic versus when conditional simulation is required?

What you now know — and the open of §5.5

You can STATE what σ²_K IS: the MSE of the kriging estimator under the assumed variogram model, with the formula $\sigma_K^2 = C(0) - \mathbf{w}^\top \mathbf{k} + (\text{Lagrange terms})$. You know that the Lagrange terms differ across SK (zero), OK ($\mu$), and UK / KED ($\boldsymbol\mu^\top \mathbf{f}_0$), and that the hierarchy $\sigma_{SK}^2 \le \sigma_{OK}^2 \le \sigma_{UK}^2$ holds at every target.
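
As a worked check of the first step of the hierarchy, consider the simplest possible configuration: a single datum at lag $h$ from the target, with correlation $\rho = C(h)/C(0)$. This one-sample setup is an illustrative assumption, and the OK system is taken in the sign convention for which the Lagrange term enters the variance with a plus sign, as in the formula above.

$$
\begin{aligned}
\text{SK:}\quad & w = \rho, & \sigma_{SK}^2 &= C(0) - w\,C(h) = C(0)\,(1 - \rho^2), \\
\text{OK:}\quad & w = 1,\ \ \mu = C(0) - C(h), & \sigma_{OK}^2 &= C(0) - w\,C(h) + \mu = 2\,C(0)\,(1 - \rho), \\
& & \sigma_{OK}^2 - \sigma_{SK}^2 &= C(0)\,(1 - \rho)^2 \ \ge\ 0 .
\end{aligned}
$$

The gap is the price of not knowing the mean: it vanishes when the datum sits on the target ($\rho = 1$) and is largest, equal to $C(0)$, when the datum lies beyond the range ($\rho = 0$).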

You can DIAGNOSE the DATA-INDEPENDENCE property: σ²_K depends on geometry + variogram only, not on sample values. Two datasets with identical locations but different values produce identical variance maps. This makes σ²_K a prospective tool for sampling design.
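
The data-independence property can be seen directly in code. The following is a minimal sketch, assuming a 1-D transect, an exponential covariance with unit sill and practical range 0.5, and two made-up value sets; all names and numbers are illustrative. It uses only the standard library, so the small linear solver stands in for what would normally be a NumPy call. The function that returns the ordinary-kriging variance never receives the data values.

```python
import math

def cov(h, sill=1.0, a=0.5):
    """Exponential covariance with practical range a (an assumed model)."""
    return sill * math.exp(-3.0 * abs(h) / a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ok_weights_and_variance(xs, x0):
    """Ordinary kriging weights and variance. No data values are passed in."""
    n = len(xs)
    # System: sum_j w_j C_ij - mu = C_i0 for each i, and sum_j w_j = 1.
    A = [[cov(xs[i] - xs[j]) for j in range(n)] + [-1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    k = [cov(xi - x0) for xi in xs]
    sol = solve(A, k + [1.0])
    w, mu = sol[:n], sol[n]
    var = cov(0.0) - sum(wi * ki for wi, ki in zip(w, k)) + mu
    return w, var

xs = [0.05, 0.20, 0.45, 0.70, 0.90]      # shared sample locations
z_a = [6.1, 6.4, 5.9, 6.8, 6.3]          # hypothetical dataset A
z_b = [120.0, 15.0, 300.0, 42.0, 7.0]    # hypothetical dataset B

w, var = ok_weights_and_variance(xs, 0.55)
est_a = sum(wi * zi for wi, zi in zip(w, z_a))
est_b = sum(wi * zi for wi, zi in zip(w, z_b))
print(f"estimate A = {est_a:.3f}, estimate B = {est_b:.3f}, shared variance = {var:.4f}")
```

The two estimates differ because the same weights are applied to different values, while the variance is computed once from the layout and the covariance model alone. The same call can be run on a proposed layout before any measurement exists.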

You can RECOGNISE what σ²_K is NOT: not a true confidence interval (assumes multi-Gaussian + correct variogram), not a measure of model uncertainty (variogram is assumed correct), not support-aware (point variance ≠ block variance), not a "where am I wrong" map (predicts average uncertainty, not realised error).

You can APPLY the calibration check via leave-one-out cross-validation: standardised residuals $u_i = (z_i - \hat z_{-i})/\hat\sigma_{-i}$; $\bar u \approx 0$ is consistent with unbiasedness; $s_u \approx 1$ is consistent with variance calibration; $s_u > 1$ indicates under-fit (variance underestimated); $s_u < 1$ indicates over-fit (variance overestimated).
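
The leave-one-out check can be sketched in a few lines. This is a minimal illustration, not the widget's implementation: it assumes ordinary kriging on a 1-D transect, a nugget-free exponential covariance, and eight made-up observations, and it uses only the standard library. All function names are hypothetical.

```python
import math
import statistics

def cov(h, sill=1.0, a=0.5):
    """Exponential covariance with practical range a (an assumed model)."""
    return sill * math.exp(-3.0 * abs(h) / a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ok_predict(xs, zs, x0, sill=1.0, a=0.5):
    """Ordinary-kriging estimate and variance at x0."""
    n = len(xs)
    A = [[cov(xs[i] - xs[j], sill, a) for j in range(n)] + [-1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    k = [cov(xi - x0, sill, a) for xi in xs]
    sol = solve(A, k + [1.0])
    w, mu = sol[:n], sol[n]
    est = sum(wi * zi for wi, zi in zip(w, zs))
    var = cov(0.0, sill, a) - sum(wi * ki for wi, ki in zip(w, k)) + mu
    return est, var

def loo_standardised_residuals(xs, zs, sill=1.0, a=0.5):
    """u_i = (z_i - estimate without sample i) / sigma without sample i."""
    u = []
    for i in range(len(xs)):
        est, var = ok_predict(xs[:i] + xs[i + 1:], zs[:i] + zs[i + 1:], xs[i], sill, a)
        u.append((zs[i] - est) / math.sqrt(var))
    return u

xs = [0.03, 0.12, 0.25, 0.31, 0.48, 0.60, 0.77, 0.92]
zs = [0.4, 0.1, -0.6, -0.3, 0.8, 1.1, 0.2, -0.5]   # hypothetical values
for sill in (1.0, 0.25):
    u = loo_standardised_residuals(xs, zs, sill=sill)
    print(f"assumed sill {sill}: mean(u) = {statistics.mean(u):+.2f}, "
          f"SD(u) = {statistics.stdev(u):.2f}")
```

With only eight points the absolute numbers are dominated by sampling noise, so they should not be read as a calibration verdict. The useful part is the comparison: in this nugget-free setup, dividing the assumed sill by four leaves every estimate unchanged and exactly doubles every $u_i$, which is the under-fit signature $s_u > 1$ in miniature.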

You can INTERPRET non-Gaussian marginals as a failure mode that can leave $\bar u$ and $s_u$ near (0, 1) while the SHAPE of the residual distribution departs from N(0, 1) — calling for §1.2 N-score transform + Part 7 sequential Gaussian simulation.

You can NAME the alternative uncertainty quantifications when σ²_K's assumptions break: conditional simulation (Part 7 — propagates non-Gaussianity and gives empirical percentiles), Bayesian kriging (Diggle & Ribeiro 2007 — propagates variogram-parameter uncertainty), and variogram-ensemble kriging (pragmatic — runs the kriging with several plausible variograms).

You can APPLY the practical-use guidelines: σ²_K is reliable for SAMPLING DESIGN, CONFIGURATION COMPARISON, and EXTRAPOLATION GUARDRAILS. It is unreliable as an absolute confidence interval without LOO calibration + multi-Gaussian check, and inappropriate for P10/P50/P90 without conditional simulation.

This OPENS §5.5 — block kriging and the change of support. The next section modifies the kriging target from a POINT $\mathbf{s}_0$ to a BLOCK (volume average) $Z_V = (1/|V|) \int_V Z(\mathbf{s}) d\mathbf{s}$ . The kriging system's right-hand side becomes a point-to-block AVERAGE covariance vector; the variance changes scale; the "support effect" — that block-average variances are SMALLER than point variances — becomes operational. The σ²_K interpretation developed in §5.4 carries over, with the additional caveat that POINT σ²_K and BLOCK σ²_K measure structurally different quantities.
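
As a preview in formulas, the two quantities named above can be written as follows. The bar notation is an assumption here, and §5.5 may use different symbols.

$$
\operatorname{Var}(Z_V) = \bar C(V, V) = \frac{1}{|V|^2} \int_V \int_V C(\mathbf{s} - \mathbf{s}')\, d\mathbf{s}\, d\mathbf{s}' \;\le\; C(0),
\qquad
\bar k_i = \frac{1}{|V|} \int_V C(\mathbf{s}_i - \mathbf{s})\, d\mathbf{s} .
$$

In the simple-kriging form the block variance then reads $\sigma_{K,V}^2 = \bar C(V, V) - \mathbf{w}^\top \bar{\mathbf{k}}$: the leading term drops from $C(0)$ to the within-block average covariance, which is where the support effect enters.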

References

Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58(8), 1246–1266. (The foundational paper. Defines the kriging variance $\sigma_K^2 = C(0) - \mathbf{w}^\top \mathbf{k}$ in the simple-kriging form, develops the constrained-MSE Lagrangian framework, and establishes the data-independence property — σ²_K depends on the variogram and the geometry alone. The reference for everything in §5.4.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (§2.6.4 develops the leave-one-out cross-validation diagnostic; §3.2 treats the kriging-variance interpretation at mathematical-statistics rigour. The reference textbook for both the MSE derivation and the calibration apparatus.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (§3.4-3.5 covers the kriging variance interpretation with detailed treatment of the multi-Gaussian assumption, the $\chi^2$ -distribution of standardised residuals under correct specification, and the failure modes. §3.4.4 specifically develops the calibration hypothesis test. The standard graduate-school reference for §5.4 material.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.7 develops the cross-validation diagnostics from a practitioner perspective with worked examples on the WL dataset. §4.7.3 specifically treats the standardised-residual calibration: "if the SD of standardised residuals departs from 1, the variogram is mis-fit." The practitioner-textbook reference at graduate-school level.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (§13-14 develops the kriging-variance interpretation at the entry-level pedagogy. The treatment of why σ²_K is independent of the data is particularly clear and is the source for the "trust map" framing.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (§IV.1.10 documents the kt3d program's leave-one-out cross-validation output. The §5.4 calibration widget implements the same diagnostic at smaller scale.)
Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling (2nd ed.). Oxford University Press. (§4.7 develops the kriging-variance interpretation in the production-reservoir-characterisation context. Particularly explicit on the "σ²_K is a relative ranking, not an absolute probability" framing. §6 develops the N-score transform + sequential Gaussian simulation alternative for non-Gaussian data.)
Diggle, P.J., Ribeiro, P.J. Jr. (2007). Model-based Geostatistics. Springer. (Chapter 3 develops the Bayesian model-based geostatistics framework with proper priors on variogram parameters. The reference for propagating variogram-parameter uncertainty into the predictive distribution — the principled alternative when σ²_K's "variogram-is-correct" assumption is too strong.)
Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer. (The rigorous theoretical-statistics treatment of the kriging variance. Develops the asymptotic calibration theory under increasing-domain and infill asymptotics, the misspecified-variogram bias of σ²_K, and the connection between LOO cross-validation and the $\chi^2$ -distribution.)
van Groenigen, J.W. (2000). The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Computers & Geosciences, 26(7), 951–964. (The reference paper for SAMPLING DESIGN based on σ²_K. Develops the optimisation framework: pick the next K sample locations to minimise the aggregate σ²_K over a domain. The practical application of σ²_K's data-independence property.)
Wackernagel, H. (2003). Multivariate Geostatistics (3rd ed.). Springer. (Chapter 11 develops the kriging-variance interpretation within the multivariate framework. Useful for seeing how σ²_K extends to cokriging — §5.7 — and to the multivariate UQ that combines point variance with cross-variable covariance.)