Ordinary kriging and the unbiasedness constraint

Part 5 — Kriging

Learning objectives

Diagnose the WEAKNESS of simple kriging: it requires $m = E[Z(\mathbf{s})]$ to be KNOWN a priori. In practice, $m$ is almost never known — we ESTIMATE it (from the §1.4-§1.5 declustered mean, from a regional reference, from prior studies) and then PROMOTE the estimate to a known constant for kriging. The kriging variance $\sigma_K^2$ in §5.1 does NOT propagate the uncertainty of that estimate. It therefore UNDER-ESTIMATES the true prediction error whenever the mean had to be estimated rather than given. This under-estimate is small when the data are abundant and the variogram is well-fit, large when either fails. Ordinary kriging fixes this systematically by changing the unbiasedness constraint.
State the ORDINARY KRIGING assumption: the field mean $m = E[Z(\mathbf{s})]$ is CONSTANT across the kriging neighbourhood, but its VALUE IS UNKNOWN. This is strictly weaker than the simple-kriging assumption (which fixes a specific value of $m$ ) and strictly stronger than the universal-kriging assumption (§5.3, which allows $m$ to be a basis-function trend). OK is the right model when you have no reliable prior for $m$ but you're confident the data come from a region where the mean is reasonably stationary. In practice OK works with LOCAL neighbourhoods (§5.6), so the stationarity requirement is local, not global — a major reason OK works well even on fields with subtle, slowly-varying trends.
Impose the UNBIASEDNESS CONSTRAINT directly on the linear predictor $Z^*(\mathbf{s}_0) = \sum_{i=1}^N w_i Z(\mathbf{s}_i)$ . Take the expectation: $E[Z^*(\mathbf{s}_0)] = \sum_i w_i E[Z(\mathbf{s}_i)] = m \sum_i w_i$ . For this to equal $E[Z(\mathbf{s}_0)] = m$ regardless of the (unknown) value of $m$ , we MUST have $\sum_{i=1}^N w_i = 1$ . The OK predictor is therefore a WEIGHTED AVERAGE of the sample values — not a deviation from a hardcoded mean, as in SK. There is no separate constant offset $w_0$ ; the constraint $\Sigma w = 1$ absorbs that role.
Set up the CONSTRAINED OPTIMISATION: minimise $E[(Z^*(\mathbf{s}_0) - Z(\mathbf{s}_0))^2]$ subject to $\sum_i w_i = 1$ . Use a Lagrange multiplier $\mu$ for the constraint: form the Lagrangian $L(\mathbf{w}, \mu) = \text{MSE}(\mathbf{w}) - 2\mu(\sum_i w_i - 1)$ (the factor of 2 is the conventional choice that gives a clean system). Differentiate with respect to each $w_k$ and $\mu$ , set the gradients to zero, and assemble the result into a single linear system. The $\mathbf{w}$ -derivatives give $\sum_j w_j C(\mathbf{s}_i, \mathbf{s}_j) - \mu = C(\mathbf{s}_i, \mathbf{s}_0)$ for each $i$ ; the $\mu$ -derivative gives the constraint $\sum_i w_i = 1$ .
Write the ORDINARY KRIGING SYSTEM in augmented matrix form. With the same $\mathbf{K}, \mathbf{k}, \mathbf{w}$ as in §5.1, the OK system is $\begin{pmatrix} \mathbf{K} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \mathbf{w} \\ \mu \end{pmatrix} = \begin{pmatrix} \mathbf{k} \\ 1 \end{pmatrix}$ where $\mathbf{1}$ is the $N \times 1$ column of ones. The augmented matrix is $(N+1)\times(N+1)$ . It is SYMMETRIC but NOT positive-definite (the zero at the bottom-right corner kills PD-ness), so Cholesky no longer works; the standard solver is LU DECOMPOSITION WITH PARTIAL PIVOTING. The matrix is still INVERTIBLE whenever the sample-to-sample covariance matrix $\mathbf{K}$ is SPD and the samples are distinct.
Compute the ORDINARY KRIGING VARIANCE $\sigma_{OK}^2(\mathbf{s}_0) = C(0) - \sum_i w_i \, C(\mathbf{s}_i, \mathbf{s}_0) + \mu = C(0) - \mathbf{w}^\top \mathbf{k} + \mu$ (using the Cressie-1993/Chilès-Delfiner sign convention where $\mu \ge 0$ ; some references absorb the sign into $\mu$ and write $-\mu$ ). Two interpretations of the $+\mu$ term. (a) ALGEBRAICALLY: it falls out of substituting the optimal $\mathbf{w}, \mu$ back into the Lagrangian, just as the SK variance fell out of substituting the SK weights back into the MSE. (b) STATISTICALLY: $\mu$ is the EXTRA UNCERTAINTY introduced by estimating $m$ on the fly. The constraint $\sum w = 1$ forces the kriging system to do that estimation implicitly, and $\mu$ is its price. $\mu \ge 0$ in the typical case, so $\sigma_{OK}^2 \ge \sigma_{SK}^2$ at any matched target — the OK variance honestly reflects the mean-estimation uncertainty the SK variance silently ignored.
Solve the OK system in practice. Real implementations build the augmented matrix once per neighbourhood (the $\mathbf{K}$ block depends only on samples and variogram, not on target), factorise by LU with partial pivoting, then forward/back-substitute on each target's $(\mathbf{k}, 1)^\top$ . Numerical robustness: add a tiny jitter $\epsilon \sim 10^{-9}$ to the diagonal of $\mathbf{K}$ to handle near-coincident samples without making the augmented matrix singular. The OK augmented matrix can still go singular if (a) two samples coincide exactly, (b) the constraint row is degenerate (e.g. you accidentally use $N=0$ samples), or (c) the variogram is non-permissible. All three are configuration errors, not OK-specific pathologies.
Walk through the WORKED EXAMPLE on the same five-sample 1-D layout used for SK in §5.1. Five samples on $[0,1]$ , Spherical variogram with $c_0 = 0, c = 1, a = 0.5$ , target $\mathbf{s}_0 = 0.5$ . The $5 \times 5$ $\mathbf{K}$ and $5 \times 1$ $\mathbf{k}$ are the same as in §5.1. Add the augmented row and column: now you have a $6 \times 6$ system. Solving by LU gives a slightly DIFFERENT weight vector — the unbiasedness constraint shifts each weight by a small amount to ensure they sum to 1. The Lagrange multiplier $\mu$ comes out small and positive (typically a few percent of the sill). $\sigma_{OK}^2$ exceeds $\sigma_{SK}^2$ by exactly $|\mu|$ . The estimate $Z^*_{OK}(0.5)$ is close to $Z^*_{SK}(0.5)$ — the practical impact at well-sampled targets is small.
Recognise OK as the PRACTICAL DEFAULT for geostatistics. Production codes — GSLIB's $\text{kt3d}$ in its OK mode, gstat::krige() in R, pykrige.OrdinaryKriging, Petrel and Leapfrog kriging — all DEFAULT to ordinary kriging. The reason is simple: in mineral exploration, petroleum reservoir characterisation, soil-property mapping, hydrology, environmental remediation, the analyst rarely has a defensible prior for $m$ that they can promote to a known constant. Estimating $m$ from the data and feeding it to SK is the WRONG move — it under-reports variance. Estimating $m$ implicitly via OK's $\sum w = 1$ constraint is the RIGHT move — the kriging variance $\sigma_{OK}^2$ honestly reflects the cost.
Catalogue the PROPERTIES of OK that match SK. (a) EXACT INTERPOLATOR at samples in the no-nugget case: with $c_0 = 0$ and $\mathbf{s}_0 = \mathbf{s}_i$ , the system gives $w_i = 1$ , $w_j = 0$ for $j \ne i$ , $\mu = 0$ , so $Z^*_{OK}(\mathbf{s}_i) = z_i$ and $\sigma_{OK}^2(\mathbf{s}_i) = 0$ . With a nugget the exactness breaks the same way it did for SK. (b) SMOOTHING: a kriged surface from OK is just as smooth as one from SK — the kriging variance reduction implies the predictor's variance is below $\sigma^2 = c_0 + c$ . (c) SCREEN EFFECT: OK weights can be NEGATIVE for the same reason as in SK — an intervening sample shields a more distant one. The constraint $\sum w = 1$ doesn't preclude individual weights from going negative; it just constrains their sum.
List the PROPERTIES of OK that DIFFER from SK. (a) Weights do NOT depend on the mean — OK doesn't use $m$ anywhere in the system. (b) $\sigma_{OK}^2 \ge \sigma_{SK}^2$ at any matched target (assuming the SK uses an unbiased estimator of $m$ ). The difference is $|\mu|$ . (c) OK is ROBUST to mean misspecification — the side-by-side widget shows this clearly: SK estimates drift toward the assumed $m$ in sparsely-sampled regions, while OK estimates stay anchored to nearby data. (d) OK works on LOCAL neighbourhoods even when the global mean varies slowly — the constraint $\sum w = 1$ adapts to local data clusters, so the implicit mean estimate adjusts neighbourhood by neighbourhood (§5.6 develops this).
Identify the STATIONARITY caveat. OK assumes $m$ is constant ACROSS THE NEIGHBOURHOOD, not the entire domain. With a small local neighbourhood (e.g. the 15-30 nearest samples), OK can handle fields whose mean varies slowly with location — the local-neighbourhood mean estimate adapts to where the target is. With a global neighbourhood and a real trend, OK will under-perform because the constraint $\sum w = 1$ forces a single mean estimate over the whole domain. The cleanest handling for explicit trends is UNIVERSAL KRIGING (§5.3), which adds basis-function constraint rows for the trend coefficients. OK with local neighbourhoods often does almost as well as UK in practice — see the comparison studies in Chilès & Delfiner 2012 §3.6 and Goovaerts 1997 §6.2.
Apply the HONEST CAVEATS catalogue. (a) The constraint $\sum w = 1$ does NOT prevent NEGATIVE weights — the screen effect persists. (b) OK can give POORLY-CONSTRAINED estimates in regions far from any sample — the constraint $\sum w = 1$ has to be honoured even when $\mathbf{k}$ is essentially zero, forcing the weights to distribute somehow over the samples; the resulting estimate is something like a simple average rather than a defensible local prediction. The kriging variance $\sigma_{OK}^2$ does correctly flag this as high-uncertainty, but the POINT ESTIMATE can be misleading on its own. (c) OK still requires the variogram to be permissible (§4.1) and well-fitted (§4.5); BAD variogram → bad OK kriging, just like bad variogram → bad SK kriging.
Use the WORKFLOW for a defensible ordinary-kriging report. (1) Confirm the variogram model from Parts 3-4 is correct. (2) Decide on a NEIGHBOURHOOD strategy: global if $N$ is modest (a few hundred), otherwise local with the $k$ -nearest samples (§5.6). (3) Build the augmented $(N+1)\times(N+1)$ matrix at each target, factorise by LU, forward/back-substitute on the $(\mathbf{k}, 1)^\top$ right-hand side. (4) Compute $Z^*_{OK}(\mathbf{s}_0) = \sum w_i z_i$ and $\sigma_{OK}^2(\mathbf{s}_0) = C(0) - \mathbf{w}^\top \mathbf{k} + \mu$ (with $\mu \ge 0$ in the standard convention). (5) Verify $\sum w \approx 1$ on the output as a sanity check. (6) Cross-validate (§4.5 leave-one-out) — the SD of standardised OK residuals should be near 1. (7) Report the estimate map and the variance map, plus the cross-validation diagnostics.

§5.1 derived the simple-kriging system from first principles under the strong assumption that the field mean $m = E[Z(\mathbf{s})]$ is KNOWN. That assumption is mathematically clean — it gives the cleanest derivation in the kriging family, with the cleanest expression for the predictor and the variance. It is also the assumption that real geostatistical workflows almost never satisfy. The global declustered mean from Parts 1-2 is an ESTIMATE of $m$ , not the truth. Regional reference values are noisy. Prior studies cover similar but not identical fields. Even when you write down a specific number and call it "known", you are smuggling estimation uncertainty into the system without propagating it.

The honest fix is to drop the known-mean assumption and let the kriging system handle the mean implicitly. That is ORDINARY KRIGING (OK). The price is one extra equation — an UNBIASEDNESS CONSTRAINT enforced via a Lagrange multiplier — and a slightly larger linear system. The reward is enormous: a kriging predictor that doesn't need a known mean, doesn't under-report variance, and adapts naturally to LOCAL neighbourhoods where the mean can vary slowly across the domain. Ordinary kriging is the PRACTICAL DEFAULT of geostatistics. The GSLIB $\text{kt3d}$ program defaults to OK. So do gstat in R, pykrige in Python, the kriging panes in Petrel, Leapfrog, Vulcan, Datamine. In 90%+ of real geostatistical work, the answer to "which kriging do I use?" is "ordinary kriging".

This section develops ordinary kriging from the same first principles as §5.1. We diagnose the simple-kriging weakness, state the OK assumption (mean constant but unknown), derive the unbiasedness constraint $\sum w = 1$ , set up the constrained optimisation via a Lagrange multiplier $\mu$ , write the AUGMENTED $(N+1)\times(N+1)$ kriging system, solve it via LU decomposition, and compute the OK variance. Two widgets bring it to life — a 1-D step-by-step that shows the augmented system and overlays the OK answer against the SK answer for the same data, and a 2-D side-by-side that compares SK (with a draggable assumed mean) against OK on the same field. The teaching arc is built around one observation: OK is what you actually do in practice; SK is the foundation that explains why OK works.

The problem with simple kriging

The simple-kriging predictor is $Z^*_{SK}(\mathbf{s}_0) = m + \sum_i w_i (Z(\mathbf{s}_i) - m)$ . It requires the value $m$ as input. In all but a handful of cases — regulatory reference concentrations, regional-baseline depositional values from prior studies, well-controlled experimental fields — we do not know $m$ . We estimate it. The most defensible estimator is the §1.4-§1.5 declustered global mean, denoted $\hat m$ . Plugging $\hat m$ into the SK formula gives a predictor that LOOKS the same:

Z^*_{SK}(\mathbf{s}_0) \;=\; \hat m + \sum_i w_i (Z(\mathbf{s}_i) - \hat m).

But $\hat m$ is a random variable with its own sampling distribution. Its variance is approximately $\text{Var}(\hat m) \approx \sigma^2 / N_{\text{eff}}$ where $N_{\text{eff}}$ is an effective sample size accounting for spatial correlation between the samples used to compute $\hat m$ . The simple-kriging variance $\sigma_K^2(\mathbf{s}_0) = C(0) - \mathbf{w}^\top \mathbf{k}$ DOES NOT include any term for $\text{Var}(\hat m)$ . It treats $m$ as a known constant.

The consequence: SK with an estimated mean systematically UNDER-REPORTS the prediction uncertainty. The under-estimate is small in well-sampled regions (where $\mathbf{w}^\top \mathbf{k}$ is large and the $\hat m$ contribution is dominated by data) but grows in sparsely-sampled regions (where $\mathbf{w}^\top \mathbf{k}$ shrinks, the predictor falls back to $\hat m$ , and the unaccounted $\text{Var}(\hat m)$ becomes a meaningful fraction of the residual variance). For risk decisions sensitive to tail probabilities — environmental remediation, ore-grade cutoffs, regulatory compliance — this miscalibration matters.

There is a second, subtler weakness. Even when $\hat m$ is correct ON AVERAGE, it is a GLOBAL estimate. Real fields often have slowly-varying means — a depositional facies with an underlying compositional gradient, a porosity trend tied to depth, a contamination plume with a non-stationary background. SK with a global $\hat m$ silently averages over these local variations. The predictor pulls toward $\hat m$ in sparsely-sampled regions even when local data would argue for a different anchor.

Ordinary kriging fixes both weaknesses by changing the unbiasedness constraint, not the variogram or the predictor form. The change is small mathematically and large practically.

The ordinary-kriging assumption — mean constant but unknown

ORDINARY KRIGING assumes:

E[Z(\mathbf{s})] \;=\; m, \qquad \text{constant in } \mathbf{s}, \text{ value UNKNOWN}.

Compare to simple kriging's assumption, which has the same constant-mean form but ADDITIONALLY requires the value to be known. The OK assumption is STRICTLY WEAKER: every situation where SK's assumption holds, OK's assumption also holds. Conversely, every situation where you can't reliably commit to a numerical value of $m$ but you do have stationarity, OK applies and SK doesn't.

In practice, OK is almost always combined with a LOCAL NEIGHBOURHOOD: the $k$ -nearest samples to each target, with $k$ typically in the 15-30 range. Within such a neighbourhood, the constancy of $m$ is a much milder requirement than across the entire domain. A field with a slow global trend can be approximated as locally stationary within each neighbourhood. The kriging system's implicit mean estimate adjusts neighbourhood by neighbourhood, automatically. §5.6 develops the neighbourhood-selection apparatus in full; for §5.2, just note that "stationary mean" in OK is properly LOCAL stationarity, not the rigid GLOBAL stationarity that SK would require if you tried to use a single value of $m$ .

The unbiasedness constraint — Σ w = 1

Without the offset $w_0$ , the OK predictor is

Z^*_{OK}(\mathbf{s}_0) \;=\; \sum_{i=1}^N w_i \, Z(\mathbf{s}_i).

Take the expectation, using $E[Z(\mathbf{s}_i)] = m$ for every $i$ :

E[Z^*_{OK}(\mathbf{s}_0)] \;=\; \sum_i w_i \, E[Z(\mathbf{s}_i)] \;=\; m \sum_i w_i.

For the predictor to be UNBIASED — $E[Z^*_{OK}(\mathbf{s}_0)] = m = E[Z(\mathbf{s}_0)]$ — REGARDLESS OF the value of $m$ , we MUST have

\boxed{\;\sum_{i=1}^N w_i \;=\; 1.\;}

This is the OK unbiasedness constraint, the SIGNATURE of ordinary kriging. It says the kriging predictor is a TRUE WEIGHTED AVERAGE of the sample values — the weights distribute the "share" each sample contributes to the prediction, and the shares total 1. No hardcoded mean appears anywhere; the constraint absorbs the role of $w_0 = (1 - \sum w_i) m$ from SK's formulation.

A few consequences worth noting. (a) The constraint is INDEPENDENT of the variogram — it holds whatever covariance you choose, even a non-permissible one (though the resulting kriging would be invalid). (b) It does not preclude individual weights from being NEGATIVE — the screen effect persists. (c) It generalises gracefully to universal kriging: §5.3 will add one constraint row per basis function in the assumed trend; $\sum w = 1$ corresponds to a "trend = constant" basis function.

The constrained optimisation — Lagrange multiplier μ

We now minimise the MSE $E[(Z^*_{OK}(\mathbf{s}_0) - Z(\mathbf{s}_0))^2]$ subject to $\sum_i w_i = 1$ . The standard tool for constrained optimisation is the LAGRANGE MULTIPLIER. Introduce a scalar $\mu$ and form the Lagrangian

L(\mathbf{w}, \mu) \;=\; \text{MSE}(\mathbf{w}) \;+\; 2\mu\Bigl(\sum_i w_i - 1\Bigr).

The factor of 2 in front of $\mu$ is a conventional choice that keeps the resulting system clean (it cancels the 2 from differentiating the squared loss). The $+$ sign matches the Cressie 1993 / Chilès-Delfiner 2012 convention and yields $\mu \ge 0$ in the typical case (other references use $-$ ; the physical content is identical). The optimum satisfies

$\dfrac{\partial L}{\partial w_k} = 0$ for each $k = 1, \ldots, N$ .
$\dfrac{\partial L}{\partial \mu} = 0$ , which is just the constraint $\sum_i w_i = 1$ .

Differentiating MSE w.r.t. $w_k$ — same calculation as in §5.1 — gives $2 \sum_j w_j C(\mathbf{s}_k, \mathbf{s}_j) - 2 C(\mathbf{s}_k, \mathbf{s}_0)$ . The $+2\mu$ term from the Lagrangian contributes $+2\mu \cdot 1 = +2\mu$ to each $\partial L / \partial w_k$ . Setting the sum to zero and dividing by 2:

\sum_{j=1}^N w_j \, C(\mathbf{s}_k, \mathbf{s}_j) \;+\; \mu \;=\; C(\mathbf{s}_k, \mathbf{s}_0), \quad k = 1, \ldots, N.

Combined with the constraint, that's $N+1$ linear equations in $N+1$ unknowns (the $N$ weights plus $\mu$ ). Rewritten as $\sum_j w_j C(\mathbf{s}_k, \mathbf{s}_j) = C(\mathbf{s}_k, \mathbf{s}_0) - \mu$ , the constraint can be folded into a single block-matrix system in the next step.

The ordinary kriging system — augmented (N+1)×(N+1)

The full system writes cleanly in BLOCK MATRIX form. With $\mathbf{K}$ the $N \times N$ sample-to-sample covariance matrix (same as in §5.1), $\mathbf{k}$ the $N \times 1$ sample-to-target covariance vector, $\mathbf{w}$ the $N \times 1$ vector of weights, $\mathbf{1}$ the $N \times 1$ column of ones, and $\mu$ the Lagrange multiplier:

\boxed{\;\begin{pmatrix} \mathbf{K} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \mathbf{w} \\ \mu \end{pmatrix} \;=\; \begin{pmatrix} \mathbf{k} \\ 1 \end{pmatrix}.\;}

The augmented matrix is $(N+1)\times(N+1)$ . Its structure is worth dissecting. The upper-left $N \times N$ block is exactly the SK matrix — the same covariance values, the same symmetry, the same positive-definiteness. The augmented row and column carry the constraint. The bottom-right entry is ZERO, which kills positive-definiteness of the whole matrix: Cholesky decomposition is no longer the right tool because Cholesky requires PSD. The standard solver is LU DECOMPOSITION WITH PARTIAL PIVOTING, which works on any non-singular square matrix. Production codes use this.

The right-hand side is the SK right-hand side augmented with a 1 — that 1 carries the $\sum w = 1$ constraint into the system. Solving gives you $\mathbf{w}$ AND $\mu$ simultaneously, in a single LU call.

The system is INVERTIBLE whenever (a) $\mathbf{K}$ is symmetric positive-definite (the permissibility from §4.1 plus distinct sample locations) and (b) $N \ge 1$ (you can't krige with zero samples). Both conditions are routinely satisfied in production. Numerical robustness against near-coincident samples: add $\epsilon \sim 10^{-9}$ jitter to the diagonal of the $\mathbf{K}$ block before forming the augmented matrix.

Once solved, the OK weights are used directly in the predictor:

Z^*_{OK}(\mathbf{s}_0) \;=\; \sum_{i=1}^N w_i \, Z(\mathbf{s}_i),

with the property $\sum w_i = 1$ verifiable on the output as a sanity check. Notice no $m$ anywhere — the OK predictor needs only the data values and the weights.

The ordinary kriging variance

Plugging the optimal $\mathbf{w}, \mu$ back into the Lagrangian and using the OK system equations to simplify gives the ORDINARY-KRIGING VARIANCE:

\boxed{\;\sigma_{OK}^2(\mathbf{s}_0) \;=\; C(0) \,-\, \sum_{i=1}^N w_i \, C(\mathbf{s}_i, \mathbf{s}_0) \,+\, \mu \;=\; C(0) \,-\, \mathbf{w}^\top \mathbf{k} \,+\, \mu.\;}

(Sign convention: this presentation uses $\mu \ge 0$ with a $+\mu$ in the variance formula, matching Cressie 1993 eq. 3.2.6 in covariance form and Chilès & Delfiner 2012 §3.4. Some references — Goovaerts 1997 eq. 5.20 included — absorb the sign into $\mu$ and write $-\mu$ with their $\mu \le 0$ ; the physical content is identical and what matters is $\sigma_{OK}^2 \ge \sigma_{SK}^2$ .) Compare to the SK variance $\sigma_K^2 = C(0) - \mathbf{w}^\top \mathbf{k}$ (using SK-optimal weights). The OK variance is LARGER by approximately the magnitude of $\mu$ . The Lagrange multiplier $\mu$ has units of variance and physically represents the COST of estimating the mean on the fly. When the data are dense and the constraint $\sum w = 1$ is easy to satisfy with locally-concentrated weights, $\mu$ is small. When the data are sparse and the constraint forces the weights to spread out (any one sample contributing only a little), $\mu$ grows.

Three things to notice about the OK variance.

Non-negative. $\sigma_{OK}^2 \ge 0$ for any permissible variogram. The proof rides on the same PSD argument as for SK, with the augmented constraint folded in. Production-quality implementations clip any tiny-negative variance (rare, due to floating-point precision near the zero floor) to zero.
Bounded by the sill from above. When the target is far from every sample, $\mathbf{k} \to \mathbf{0}$ . The constraint $\sum w = 1$ forces the weights to distribute over the samples (since the OK system can't put zero on all weights), but their $\mathbf{w}^\top \mathbf{k}$ inner product still tends to zero. The variance approaches $C(0) - \mu_{\infty}$ where $\mu_{\infty}$ is the limiting Lagrange multiplier for a far-away target — in practice, this approaches the sill, but the constraint cost is real.
Strictly greater than σ²_SK at matched targets. Compare OK at a target (with OK-optimal weights) against SK at the same target (with SK-optimal weights, using an unbiased estimator of $m$ ). The OK variance exceeds the SK variance by approximately $\mu \ge 0$ . The Lagrange multiplier is the HONEST cost of estimating the mean. SK with an estimated mean would under-report its variance by exactly that amount.

A worked five-sample example by hand

Use the same setup as §5.1's worked example. Five samples on $[0, 1]$ :

\begin{array}{c|ccccc} i & 1 & 2 & 3 & 4 & 5 \\ \hline x_i & 0.10 & 0.25 & 0.45 & 0.70 & 0.90 \\ z_i & 1.0 & 2.0 & 3.5 & 2.5 & 1.5 \end{array}

Variogram: Spherical $c_0 = 0, c = 1, a = 0.5$ . Target: $\mathbf{s}_0 = 0.55$ . The $5\times 5$ $\mathbf{K}$ and $5\times 1$ $\mathbf{k}$ are exactly as computed in §5.1.

Step 1 — AUGMENT. Append a column of ones and a row of ones to $\mathbf{K}$ , with a zero in the bottom-right corner. Append a 1 to $\mathbf{k}$ . You now have a $6 \times 6$ matrix and a $6 \times 1$ vector.

Step 2 — SOLVE by LU with partial pivoting. The pivot rows track which rows get swapped during elimination; the forward-substitution applies the same permutation to $(\mathbf{k}, 1)^\top$ . The result is the solution $(\mathbf{w}, \mu)^\top$ .

Step 3 — VERIFY. Compute $\sum w_i$ from the solved weights. It should be exactly 1 (up to floating-point precision). If it isn't, the solve failed and you have a configuration problem (likely a near-singular matrix from coincident samples).

For our example with target $\mathbf{s}$ , the OK solve produces weights very close to the SK weights from §5.1. The SK weights were $\mathbf{w}$ {SK} \approx (-0.04, -0.03, 0.60, 0.40, -0.06) $w_{S K} \approx (- 0.04, - 0.03, 0.60, 0.40, - 0.06)$ summing to about 0.87. The OK weights, adjusted to sum to 1, come out $\mathbf{w}$ — the three innermost samples take roughly the same share they did under SK, the outer samples shift slightly toward positive. The Lagrange multiplier $\mu \approx 0.045$ (a few percent of the sill $c_0 + c = 1$ ). The OK estimate is $Z^*$ {OK}(0.55) = \sum w_i z_i \approx 3.17 $Z_{O K}^{*} (0.55) = \sum w_{i} z_{i} \approx 3.17$ (close to the SK estimate $\approx 3.18$ ). The OK variance is $\sigma_{OK}^2 \approx 1.0 - 0.673 + 0.045 \approx 0.37$ — slightly larger than $\sigma_K^2 \approx 0.37$ at three decimal places, with the difference reflecting the Lagrange-multiplier penalty.

The pattern generalises. Wherever the SK weights nearly summed to 1, the OK adjustment is small. Wherever the SK weights summed to something far from 1 (rare with clean data and a reasonable variogram), the OK adjustment is larger and the $\mu$ correction matters more.

The first widget for §5.2 sits next to the §5.1 simple-kriging-step widget in spirit. Same six samples on $[0, 1]$ , same Spherical variogram with adjustable $c_0, c, a$ . The reader drags the query point $\mathbf{s}_0$ along the x-axis; the widget recomputes the OK augmented system at each position and shows everything.

The TOP panel plots the samples plus the OK KRIGED CURVE $Z^*$ in solid green (with the ±1 $\sigma$ {OK} $σ_{O K}$ shaded envelope), PLUS the SK curve from §5.1 in dashed grey using the empirical sample mean as the assumed $m$ . Watch the two curves: where data are dense, OK and SK trace the same curve. Where data thin out, the SK curve falls back toward the dashed-grey horizontal at the sample mean, while the OK curve stays anchored to nearby data. The MIDDLE panel shows the six OK weights ${w_i}$ as green/red bars (positive in green, negative in red) with the corresponding SK weights overlaid as small grey tick marks. The Σ $w_i$ value is annotated at the top — under OK it must equal 1, and the widget checks this on screen. The BOTTOM panel reports the numerical solve in two rows: top row shows the OK output (estimate, variance, μ, Σw), the bottom row shows the SK output for comparison plus the variance difference $\sigma^2_{OK} - \sigma^2_{SK} = |\mu|$ .

Three things to do with this widget. FIRST, drag the query to $x = 0.50$ (between samples 3 and 4). Watch the weights: samples 3 and 4 dominate under OK just as they did under SK in §5.1. The Σw pill above the bar chart reads 1.000. The Lagrange multiplier $\mu$ is small. The OK and SK estimates are nearly identical; the OK variance is slightly larger. The grey ticks on each bar essentially overlap the OK bars — when data are dense and the variogram is well-fit, OK and SK behave the same. SECOND, drag the query out to $x = 0.97$ (far right). The SK dashed curve drops toward the empirical-mean dashed-grey line. The OK solid curve stays closer to sample 6's value. The Σw still reads 1.000. The Lagrange multiplier $\mu$ has grown — that's the cost of estimating m at a target where the data have weak leverage. The variance difference $\sigma^2_{OK} - \sigma^2_{SK}$ in the bottom-row pill is now visibly positive. THIRD, set the nugget $c_0$ to 0.30. The kriged curves both pull away from exact interpolation at samples (both OK and SK lose exactness with nugget). The OK and SK weights diverge slightly more under heavier nugget — the constraint is harder to satisfy locally — and the Lagrange multiplier $\mu$ trends up.

When does the SK-vs-OK difference matter?

Most introductions to kriging emphasise that "OK gives slightly different weights than SK". That's true and uninteresting. The interesting question is: WHEN do OK and SK give substantially different ANSWERS, and which is right?

The honest framing comes from the SK-assumption being violated. SK requires you to commit to a specific value of $m$ . If you commit to $m_{\text{assumed}} \approx m_{\text{true}}$ , SK and OK give very similar answers — slightly different weights, slightly different variances. If you commit to $m_{\text{assumed}} \ne m_{\text{true}}$ , SK produces a SYSTEMATICALLY BIASED predictor; OK doesn't.

The bias manifests where the data have weak leverage. In well-sampled regions, the kriging weights overwhelmingly favour nearby data and the predictor is data-dominated for both SK and OK. In sparsely-sampled regions, the SK predictor falls back to $m_{\text{assumed}}$ as $\mathbf{w}^\top \mathbf{k}$ shrinks — if $m_{\text{assumed}}$ is wrong, the SK map is wrong wherever data are sparse. The OK predictor handles the same situation by spreading the weights over the available samples (subject to $\sum w = 1$ ), producing an estimate that's some kind of weighted average of the nearby samples rather than a pull toward a hardcoded value.

The second widget makes this concrete on a 2-D field.

The reader picks a sample layout (clustered, uniform, random), tunes the Spherical variogram, and — crucially — slides the ASSUMED MEAN $m_{\text{assumed}}$ that feeds into SK. The widget renders four heatmaps on the unit square: SK estimate (top-left), OK estimate (top-right), SK variance (bottom-left), OK variance (bottom-right). A row of pills underneath reports the sample mean, the assumed mean used by SK, their absolute difference, the maximum estimate-map difference $\max |Z^$ over the grid, and the range of the variance-map difference $\sigma^2$ {OK} - \sigma^2_{SK} $σ_{O K}^{2} - σ_{S K}^{2}$ (which is always non-negative).

The "Set m to sample mean" button snaps $m_{\text{assumed}}$ onto the empirical mean of the samples — the calibrated SK input that an analyst would use if they followed the §1.4-§1.5 declustering workflow. Click it and watch what happens to the maps.

Three things to do with this widget. FIRST, click "Set m to sample mean" to put SK in its calibrated regime. Note that the SK and OK estimate maps look nearly IDENTICAL — small differences only, concentrated in the few cells far from any sample. The SK and OK variance maps are very similar, with the OK map slightly hotter (red shifted toward yellow) by a uniform-ish offset. That offset IS the Lagrange-multiplier penalty $|\mu|$ — paid uniformly across the field as the honest cost of estimating m. Read the "max |Z*_OK − Z*_SK|" pill: it should be small (a few percent of the sill).

SECOND, slide $m_{\text{assumed}}$ AWAY from the sample mean. Push it up to 1.2 or down to 0.0. The SK estimate map starts to shift visibly — corners of the unit square far from samples take on the colour of $m_{\text{assumed}}$ . The OK map BARELY MOVES; it stays anchored to the actual data values. The "max |Z*_OK − Z*_SK|" pill grows. The variance difference range stays roughly constant — that is the Lagrange-multiplier penalty, which doesn't depend on the assumed mean. The teaching point: OK is ROBUST to mean misspecification; SK is NOT.

THIRD, pick the CLUSTERED layout (10 samples in a tight cluster plus 3 outliers). Set $m_{\text{assumed}}$ to 0.2 (low). The SK estimate map shows blue patches (low values) in regions far from the cluster — that's the predictor falling back to $m_{\text{assumed}}$ where there's no data leverage. The OK map shows estimates closer to the local sample values. Now slide $m_{\text{assumed}}$ up to 1.0 (high). The SK map shifts toward yellow in the far regions; the OK map barely changes. Read the "|m_assumed − sample mean|" pill: at extremes it's 0.3-0.5; the "max |Z*_OK − Z*_SK|" pill is correspondingly large. This is the pathology that pushes practitioners toward OK as the default — without OK you can produce wildly different maps just by choosing different "known" means.

OK is the practical default

The decision tree in a real geostatistical workflow looks like this:

Do you have a defensible, externally validated value for $m$ ? (Regulatory reference, regional baseline from a trusted prior study, a well-controlled experimental field.) → Use SIMPLE KRIGING. The known-mean assumption is satisfied, the SK system is the cleanest, and you get $\sigma_K^2$ without the Lagrange-multiplier penalty.
Do you have data with a stationary mean across the kriging neighbourhood, but no externally validated value? → Use ORDINARY KRIGING. The unbiasedness constraint absorbs the mean estimation, and $\sigma_{OK}^2$ honestly reflects the cost. This is the typical case.
Does the field have an obvious trend (depth-dependent porosity, contamination plume gradient, deposition-axis compositional shift)? → Use UNIVERSAL KRIGING or KED (§5.3). These extend OK by modelling the trend as a basis-function expansion with unknown coefficients; the constraint rows generalise from one ( $\sum w = 1$ ) to one per basis function.

The dominant case in production is the middle one. Mineral exploration, petroleum reservoir characterisation, soil-property mapping, hydrology, environmental remediation, snow-pack water-equivalent estimation, atmospheric pollutant concentration — all are usually addressed with OK first. Universal kriging is brought in only when a specific trend is identified by the geological/physical interpretation, not as a default.

The implementations match. GSLIB's $\text{kt3d}$ defaults to OK (kriging type $itype = 1$ ; SK is $itype = 0$ ; UK and KED are $itype = 2,3,4,5$ ). gstat in R has a $\text{krige(formula = z\textasciitilde 1, \ldots)}$ where the right-hand side $1$ means "constant unknown mean" — pure OK. pykrige.OrdinaryKriging is the named class for OK. Petrel's Make Property dialog defaults to OK. None of these tools defaults to SK because none assumes the user has a defensible prior on $m$ .

Stationarity assumption — local, not global

OK's "mean constant" assumption is local in practice. Production OK uses a LOCAL SEARCH NEIGHBOURHOOD: the $k$ -nearest samples to each target (typically $k = 15-30$ ), often constrained to an ANISOTROPIC SEARCH ELLIPSOID aligned with the variogram's correlation lengths. Within this neighbourhood, the constancy of $m$ is a much milder assumption than across the entire field.

Why does this work? The OK constraint $\sum w = 1$ within the local neighbourhood is essentially fitting a LOCAL CONSTANT MEAN ESTIMATE from the $k$ nearby samples — an implicit local-mean estimator that adapts to wherever the target sits. As the target moves across the domain, the local neighbourhood changes; the implicit mean estimate changes with it. So OK behaves as a SPATIALLY ADAPTIVE estimator even though its formal assumption is constancy of $m$ .

The catch: a slowly-varying field that satisfies local constancy at the $k = 20$ -sample scale will be miscalibrated if you use a global neighbourhood (all $N$ samples). The variance $\sigma_{OK}^2$ stays calibrated; the estimate $Z^*_{OK}$ is biased toward the global mean wherever the local mean differs. This is one of the most common reasons OK kriged maps look "too flat" or "missed the trend" in production — the kriging used too large a neighbourhood. §5.6 develops the neighbourhood-selection apparatus in detail with the explicit cost-quality tradeoffs.

For fields with a clear, identifiable TREND (linear or quadratic in coordinates, or driven by a secondary variable like depth or distance from a feature), the cleanest handling is to model the trend explicitly. That is UNIVERSAL KRIGING (§5.3, when the trend is in coordinates) or KED (§5.3, when the trend is driven by an external variable). OK is the right tool when you have local stationarity or a slowly-varying trend that's well-approximated by local mean estimates; UK/KED is the right tool when you have a clear, modellable global trend.

Honest caveats — what ordinary kriging does and does not do

The screen effect persists. The $\sum w = 1$ constraint does NOT preclude individual weights from being negative. An intervening sample still shields a more distant one in the same direction — the OK weights for the screened sample come out negative just as they did in SK. Negative weights remain a STRUCTURAL feature of the kriging system, not a bug.

The kriging variance still under-reports unless the variogram is right. $\sigma_{OK}^2$ is the variance UNDER the assumed variogram and stationarity. Wrong variogram → wrong kriging, just as in SK. The §4.5 cross-validation diagnostic (SD of standardised OK residuals near 1) is the empirical test that the variogram + OK setup is calibrated for the data being deployed. §6.3 develops the full calibration apparatus.

Far-from-data estimates can be misleading. The constraint $\sum w = 1$ has to be honoured even when $\mathbf{k} \to \mathbf{0}$ . The weights distribute over the samples and the OK estimate becomes essentially an unweighted (or near-unweighted) average — which is not a defensible local prediction. The variance $\sigma_{OK}^2$ does correctly flag this as high-uncertainty (it approaches the sill, with a Lagrange-multiplier add-on), but the point estimate looks more confident than it is. Practical workaround: clip the kriging neighbourhood, so that targets far from any sample produce a "no estimate" flag rather than a poorly-constrained one. §5.6 develops the search-ellipsoid machinery that supports this.

OK does not magically handle non-stationarity. If the field has a clear trend, OK with a global neighbourhood will systematically bias toward the global mean. OK with a local neighbourhood handles slow trends but not sharp ones. The right tool for explicit trends is UNIVERSAL KRIGING (§5.3); OK is the workhorse when the field is approximately stationary at the neighbourhood scale.

Coincident-sample singularity. Just as in SK, if two samples have the same location, the $\mathbf{K}$ block of the augmented matrix has two identical rows; the matrix is singular and LU fails. Combine duplicates, raise the nugget, or use jitter regularisation — the same defences as in SK, applied to the $\mathbf{K}$ block.

§5.2 in the architecture of Part 5

§5.2 is the SECOND FOUNDATION of Part 5. §5.1 gave the foundation with the simplest assumption; §5.2 dropped that assumption to give the practical default. The next sections continue the generalisation. §5.3 — UNIVERSAL KRIGING and KED — extends OK by allowing the mean to be a basis-function expansion. The $\sum w = 1$ constraint generalises to one constraint per basis function in the assumed trend: $\sum w_i f_p(\mathbf{s}_i) = f_p(\mathbf{s}_0)$ for each basis function $f_p$ . The system grows from $(N+1)\times(N+1)$ to $(N+P)\times(N+P)$ where $P$ is the number of basis functions. The Lagrange multipliers $\mu_p$ multiply, and the kriging variance gains one extra $-\mu_p$ term per multiplier.

§5.4 cashes out what the kriging variance means — and what it doesn't. The OK variance $\sigma_{OK}^2$ is a function of layout + variogram + $\mu$ ; its calibration to actual error depends on the variogram being right (Part 6). §5.5 generalises the target from a point to a BLOCK — the right-hand side $\mathbf{k}$ becomes point-to-block average covariances. §5.6 develops the LOCAL NEIGHBOURHOOD apparatus that production OK relies on. §5.7 — COKRIGING — brings in secondary variables, building a block-structured system from cross-covariances. §5.8 catalogues the modal kriging pathologies and how to diagnose them.

The takeaway from §5.2: OK is the kriging you actually run. The unbiasedness constraint is the single mathematical insight that makes it work. The Lagrange multiplier $\mu$ is the honest accounting of what it costs you. §5.1 is how you understand the system; §5.2 is what you deploy.

Try it

In ordinary-kriging-step, drag the query to $x = 0.50$ . Read Σw_i in the middle panel — it should be 1.000 exactly. Read the Lagrange multiplier μ from the bottom-row pill — it should be small and positive (a few percent of the sill). The variance difference σ²_OK − σ²_SK pill is close to but not exactly μ (each variance is computed with its own optimal weights — the relationship σ²_OK − σ²_SK = μ holds exactly only when σ²_SK is recomputed with OK weights).
In ordinary-kriging-step, drag the query to $x = 0.97$ (far right). The SK dashed curve drops toward the empirical-mean reference line (dotted grey). The OK solid curve stays closer to sample 6's value. Read the variance difference: it has grown. This is the Lagrange-multiplier penalty growing as the data lose leverage on the target.
In ordinary-kriging-step, find a query location with at least one NEGATIVE weight. Try $x = 0.30$ . Confirm that the screen effect persists in OK — negative weights are not eliminated by the constraint $\sum w = 1$ .
In ordinary-kriging-step, raise the nugget $c_0$ to 0.30. Both the OK and SK curves no longer pass through samples exactly — both lose exactness with nugget. The Σw still reads 1.000 under OK. The Lagrange multiplier μ trends up because each sample now contributes less to the prediction.
In sk-vs-ok-comparison, click "Set m to sample mean". The four heatmaps reach their CALIBRATED-SK regime — SK and OK estimate maps nearly identical, OK variance slightly hotter than SK by the |μ| offset. Read "max |Z*_OK − Z*_SK|" — should be small. This is the regime where SK and OK agree.
In sk-vs-ok-comparison, slide m_assumed up to 1.2 (well above sample mean). Watch the SK estimate map shift toward yellow in regions far from samples, while the OK map barely changes. Read "max |Z*_OK − Z*_SK|" — it has grown substantially. This is the regime where SK breaks: the assumed mean is wrong, and SK silently follows the assumption away from the data.
In sk-vs-ok-comparison, pick the CLUSTERED layout and slide m_assumed from 0.0 to 1.2. The SK estimate map shifts dramatically in regions outside the cluster; the OK estimate map shifts only slightly. The variance maps differ by the Lagrange-multiplier offset, which stays roughly constant across the slide. This is why OK is the default: maps are insensitive to an input you can't pin down.
In sk-vs-ok-comparison, set the variogram range $a$ to 0.10 (short). With short range, even OK has limited data leverage in the gaps. Compare the OK estimate map at short range vs long range (a = 0.80). Long range → smoother OK estimates with better gap coverage; short range → OK estimates jagged with each sample dominating its own cell.
Without coding: a soil-property mapping team has 80 samples from a single field, no external prior for the mean, no obvious large-scale trend. They want to predict moisture content on a 10-m grid. Which kriging variant is the right default, and why? What does the unbiasedness constraint enforce in their case?
Without coding: the same team runs leave-one-out cross-validation on their OK output. SD of standardised residuals is 0.85. What does this diagnose, and how would it affect the variance maps reported to the client? Is OK with this variogram over- or under-confident?

Pause and reflect: the Lagrange multiplier $\mu$ in the OK system has dimensions of variance and equals exactly the AMOUNT by which $\sigma_{OK}^2$ exceeds $\sigma_{SK}^2$ at the same target (when the SK uses an unbiased estimator of $m$ ). It can be read as the HONEST PRICE of estimating the mean on the fly. Why is this price LARGER far from samples (where $\mathbf{k} \to \mathbf{0}$ ) and SMALLER near sample clusters? What does that tell you about when SK and OK will agree closely vs differ substantially?

What you now know — and the open of §5.3

You can DIAGNOSE the weakness of simple kriging — the known-mean assumption that almost never holds in practice, and the resulting silent under-reporting of variance when $m$ is estimated from the data.

You can STATE the ORDINARY KRIGING assumption (mean CONSTANT but UNKNOWN within the kriging neighbourhood) and DERIVE the unbiasedness constraint $\sum w = 1$ from the requirement that $E[Z^*_{OK}(\mathbf{s}_0)] = m$ regardless of the unknown value of $m$ .

You can SET UP the constrained optimisation as a Lagrangian $L(\mathbf{w}, \mu) = \text{MSE}(\mathbf{w}) + 2\mu(\sum w_i - 1)$ and DERIVE the AUGMENTED $(N+1)\times(N+1)$ ordinary-kriging system

\begin{pmatrix} \mathbf{K} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \mathbf{w} \\ \mu \end{pmatrix} = \begin{pmatrix} \mathbf{k} \\ 1 \end{pmatrix}

. You can SOLVE this system by LU decomposition with partial pivoting (Cholesky no longer works because of the zero on the augmented diagonal) and you can COMPUTE the ordinary-kriging variance $\sigma_{OK}^2 = C(0) - \mathbf{w}^\top \mathbf{k} + \mu$ in the $\mu \ge 0$ convention.

You can INTERPRET the Lagrange multiplier $\mu$ as the HONEST COST of estimating $m$ on the fly — it is exactly the amount by which $\sigma_{OK}^2$ exceeds $\sigma_K^2$ at the same target. You can WALK THROUGH a five-sample worked example by hand and confirm OK and SK give similar but distinct weights, with $\sum w_{OK} = 1$ exactly.

You can RECOGNISE the four PROPERTIES OK shares with SK (exact interpolation at zero nugget, smoothing, screen effect with potentially negative weights, coincident-sample handling) and the four PROPERTIES OK DIFFERS in (no $m$ in the system; $\sigma_{OK}^2 \ge \sigma_K^2$ ; ROBUST to mean misspecification; LOCAL-stationarity assumption with implicit local-mean estimation).

You can read the WIDGET output — the OK kriged curve compared to the SK curve, the OK weights with Σw = 1 verified on screen, the heatmap comparison showing OK is robust to the assumed-mean slider while SK isn't. You can IDENTIFY OK as the PRACTICAL DEFAULT of geostatistics and recognise it as the choice made by every production code (kt3d, gstat, pykrige, Petrel, Leapfrog).

You know the HONEST CAVEATS — the screen effect persists; far-from-data estimates are poorly constrained (the variance flags this but the point estimate looks confident); a clear trend needs UK/KED, not OK with a global neighbourhood; coincident-sample handling is the same as in SK.

This OPENS §5.3 — UNIVERSAL KRIGING and KED. §5.3 generalises the OK unbiasedness constraint from a single equation $\sum w = 1$ to one equation per BASIS FUNCTION in an assumed trend. The mean becomes $m(\mathbf{s}) = \sum_p \beta_p f_p(\mathbf{s})$ with unknown coefficients $\beta_p$ ; the constraint rows become $\sum_i w_i f_p(\mathbf{s}_i) = f_p(\mathbf{s}_0)$ for each $p$ . The augmented kriging matrix grows from $(N+1)\times(N+1)$ to $(N+P)\times(N+P)$ . The Lagrange multipliers multiply correspondingly. Universal kriging is the right tool when the field has a clear, modellable trend; ordinary kriging is the right tool when the field is approximately stationary at the neighbourhood scale. §5.3 develops the universal-kriging system the same way §5.2 developed OK — from the unbiasedness constraint outward.

References

Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58(8), 1246–1266. (The foundational paper for the kriging family. Defines the regionalised-variable framework, the variogram, and the BLUE criterion. The unbiasedness constraint and the Lagrange-multiplier derivation are sketched here; Matheron 1971 develops them in full mathematical rigour. The original cited reference for everything in §5.1 and §5.2.)
Matheron, G. (1971). The Theory of Regionalized Variables and Its Applications. Cahiers du Centre de Morphologie Mathématique, École des Mines de Paris, No. 5. (The expanded mathematical treatment. §4 develops simple, ordinary, and universal kriging from the constrained-optimisation framework with the Lagrange-multiplier apparatus made explicit. The reference for the mathematical underpinnings of the augmented kriging system.)
Krige, D.G. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa, 52(6), 119–139. (The original empirical paper that anticipated the optimal-weighting approach to ore-grade estimation. Krige observed the conditional-bias phenomenon — small samples in high-grade areas systematically over-estimated grade — and proposed a corrective regression. Matheron 1963 formalised the underlying mathematics and named the technique after Krige.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (§3 covers the kriging family at mathematical-statistics rigour. §3.4.5 specifically treats ordinary kriging, deriving the augmented system from the constrained-MSE-minimisation framework. The reference textbook for the formal derivation of the OK system and its asymptotic properties.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (Chapter 3 is the comprehensive modern reference for the kriging family. §3.4 specifically develops ordinary kriging with detailed treatment of the Lagrange multiplier, the geometry of the screen effect under the constraint, and the local-neighbourhood implementation. The standard graduate-school reference.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§5.4 develops ordinary kriging from the practitioner perspective with worked examples on the WL dataset. §5.5 catalogues the properties — exact interpolation, smoothing, BLUE under the unbiasedness constraint. The practitioner-textbook reference for §5.2 at graduate-school level.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapters 12–14 develop the kriging family at the entry-level practitioner-pedagogy level. Chapter 14 specifically treats ordinary kriging with detailed walk-through examples comparing OK weights against SK weights on the same data. The most readable entry-level reference for §5.2.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (§IV.1 documents the kt3d program — the canonical GSLIB ordinary-kriging implementation. The kt3d source code is how to verify that a custom OK implementation matches a reference; the §IV.1.5 discussion of the augmented system and the LU solve is the production-engineering reference. The §5.2 widgets here implement the same algorithm at smaller scale.)
Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling (2nd ed.). Oxford University Press. (§4 documents the production kriging workflow as used in reservoir characterisation, emphasising OK as the default. §4.4 covers the local-neighbourhood implementation that production OK relies on, with practical cost-quality tradeoffs.)
Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer. (The rigorous mathematical-statistics treatment. Develops the asymptotic theory of OK under increasing-domain and infill asymptotics, the connection between OK and best linear prediction in general Hilbert spaces, and the consistency conditions for OK predictors. The reference for the theoretical foundations.)
Wackernagel, H. (2003). Multivariate Geostatistics (3rd ed.). Springer. (§11-12 develops OK and UK within the multivariate framework. Useful for seeing how the §5.7 cokriging system specialises to §5.2 ordinary kriging in the single-variable case, and how UK in §5.3 specialises to OK when the trend has only the constant basis function.)