Cokriging with secondary data

Part 5 — Kriging

Learning objectives

Define the primary and secondary variables, and the cross-variogram $\gamma_{12}(\mathbf{h})$
Write the cokriging system as an extension of ordinary kriging with cross-covariance blocks
Apply the Linear Model of Coregionalisation (LMC) for permissible cross-variograms
Recognise when cokriging materially improves over univariate kriging vs when it does not
Distinguish strict cokriging from collocated cokriging (a computationally lighter approximation)

So far Part 5 has used a single variable: porosity, grade, hydraulic head. In practice, samples often include a SECONDARY variable that is correlated with the primary but cheaper or denser to measure. Cokriging incorporates this auxiliary information into the kriging system. When the cross-correlation is strong AND the secondary is densely sampled, cokriging typically gives a lower estimation variance than univariate kriging.

The cokriging setup

Two variables:

Primary $Z_1(\mathbf{s})$ : the variable to estimate; expensive / sparse samples.
Secondary $Z_2(\mathbf{s})$ : cheaper / denser samples; correlated with $Z_1$ .

Concrete examples:

Petroleum: $Z_1$ = porosity from cores (sparse); $Z_2$ = seismic-derived acoustic impedance (dense).
Mining: $Z_1$ = gold grade (assays, expensive); $Z_2$ = magnetic susceptibility (cheap, dense).
Environmental: $Z_1$ = soil contaminant (lab); $Z_2$ = portable XRF reading (handheld).

The cross-variogram

The cross-variogram captures spatial co-variation:

\gamma_{12}(\mathbf{h}) = \frac{1}{2} E[(Z_1(\mathbf{s} + \mathbf{h}) - Z_1(\mathbf{s})) (Z_2(\mathbf{s} + \mathbf{h}) - Z_2(\mathbf{s}))].

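The corresponding cross-covariance is $C_{12}(\mathbf{h}) = E[(Z_1(\mathbf{s}) - m_1)(Z_2(\mathbf{s} + \mathbf{h}) - m_2)]$. Under second-order stationarity the two are linked by $\gamma_{12}(\mathbf{h}) = C_{12}(\mathbf{0}) - \frac{1}{2}[C_{12}(\mathbf{h}) + C_{12}(-\mathbf{h})]$, so they carry the same information only when the cross-covariance is symmetric in $\mathbf{h}$. The cross-variogram is always symmetric and cannot represent a spatial lag (delay) between the two variables.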

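Estimate it from data the same way as a univariate variogram: for each lag bin, average the cross-products of the increments of $Z_1$ and $Z_2$ over all sample pairs in that bin, then halve. Only locations where BOTH variables are measured contribute, because each pair needs an increment of each variable.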
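The lag-binned estimator can be sketched in a few lines of NumPy. This is a minimal illustration, assuming omnidirectional lags and samples where both variables are measured at every location; the function name `cross_variogram` and the toy data are hypothetical, not part of any library.

```python
import numpy as np

def cross_variogram(coords, z1, z2, lag_edges):
    """Experimental cross-variogram from collocated samples.

    coords: (n, d) sample locations; z1, z2: (n,) values at those locations.
    lag_edges: increasing bin edges for the separation distance.
    Returns (gamma, counts), one entry per lag bin (NaN where a bin is empty).
    """
    coords = np.asarray(coords, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    i, j = np.triu_indices(len(z1), k=1)           # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    prod = (z1[i] - z1[j]) * (z2[i] - z2[j])       # cross-product of increments
    bins = np.digitize(dist, lag_edges) - 1        # pairs outside the edges get -1 or nbins
    nbins = len(lag_edges) - 1
    gamma = np.full(nbins, np.nan)
    counts = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        mask = bins == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * prod[mask].mean()
    return gamma, counts

# Toy 1-D data (made up): z2 is exactly twice z1.
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
z1 = np.array([0.0, 1.0, 2.0, 3.0])
z2 = 2.0 * z1
gamma12, npairs = cross_variogram(coords, z1, z2, lag_edges=[0.5, 1.5, 2.5])
print(gamma12, npairs)
```

Passing the same array for `z1` and `z2` returns the ordinary (direct) variogram, which is a convenient sanity check. Unlike a direct variogram, a cross-variogram can be negative when the two variables are negatively correlated.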

Permissibility: the Linear Model of Coregionalisation (LMC)

The matrix of variograms $[\gamma_{ij}(\mathbf{h})]$ must be conditionally negative-definite for ALL h. The standard way to ensure this: model each component as a NESTED linear combination of the same set of basic variogram structures:

\gamma_{ij}(\mathbf{h}) = \sum_{k=0}^K b_{ij}^{(k)} \gamma^{(k)}(\mathbf{h}),

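where each $\gamma^{(k)}$ is a permissible univariate variogram (Spherical, Exponential, etc.) and each coregionalisation matrix $B^{(k)} = [b_{ij}^{(k)}]$ is symmetric positive semi-definite. This is the LMC. It is a sufficient condition that guarantees joint permissibility of all direct and cross-variograms. For two variables, positive semi-definiteness means $b_{11}^{(k)} \geq 0$, $b_{22}^{(k)} \geq 0$ and $(b_{12}^{(k)})^2 \leq b_{11}^{(k)} b_{22}^{(k)}$ for every structure k, so a structure can appear in the cross-variogram only if it appears in both direct variograms.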
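A quick numerical check of the LMC condition is to test each coefficient matrix for positive semi-definiteness before using the model. The sketch below assumes a two-structure model (nugget plus spherical with range 100); the helper names and all sill values are made up for illustration.

```python
import numpy as np

def nugget(h):
    """Unit-sill nugget structure: 0 at h = 0, 1 for any positive lag."""
    return np.where(np.abs(h) > 0, 1.0, 0.0)

def spherical(h, a=100.0):
    """Unit-sill spherical variogram with range a."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def is_psd(B, tol=1e-10):
    """True if B is symmetric and has no eigenvalue below -tol."""
    B = np.asarray(B, dtype=float)
    return bool(np.allclose(B, B.T) and np.linalg.eigvalsh(B).min() >= -tol)

def lmc_variogram(h, B_list, structures):
    """Matrix of direct and cross-variograms at lag h: sum_k B^(k) * gamma^(k)(h)."""
    return sum(np.asarray(B, dtype=float) * g(h) for B, g in zip(B_list, structures))

B_nugget = [[0.1, 0.0], [0.0, 0.2]]   # nugget sills, no cross-nugget
B_sph = [[0.9, 0.6], [0.6, 0.8]]      # 0.6**2 <= 0.9 * 0.8, so PSD
B_bad = [[0.9, 1.0], [1.0, 0.8]]      # 1.0**2 >  0.9 * 0.8, so not PSD

print(is_psd(B_nugget), is_psd(B_sph), is_psd(B_bad))
G = lmc_variogram(50.0, [B_nugget, B_sph], [nugget, spherical])
print(G)   # G[0, 1] is the modelled cross-variogram at lag 50
```

The rejected matrix `B_bad` has a cross-sill larger than the geometric mean of the two direct sills, which is the most common way a hand-fitted cross-variogram breaks permissibility.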

The cokriging system

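Block-structured: with $N_1$ primary samples and $N_2$ secondary samples, the cokriging system has size $N_1 + N_2 + K$, where K is the number of Lagrange multipliers, one per unbiasedness constraint. Ordinary cokriging has one constraint per variable, so K = 2 with one primary and one secondary. (This K is unrelated to the upper limit K of the LMC sum and to the covariance blocks $K_{ij}$.) The system is: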

\begin{bmatrix} K_{11} & K_{12} & F_1 \\ K_{21} & K_{22} & F_2 \\ F_1^T & F_2^T & 0 \end{bmatrix} \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \\ \boldsymbol{\mu} \end{bmatrix} = \begin{bmatrix} \mathbf{k}_{10} \\ \mathbf{k}_{20} \\ \mathbf{f}_0 \end{bmatrix}

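where $K_{ij}$ is the block of covariances ($i = j$) or cross-covariances ($i \neq j$) between the samples of variable i and the samples of variable j, $F_i$ enforces unbiasedness on $Z_i$, and $\mathbf{k}_{i0}$ holds the (cross-)covariances between the samples of variable i and the primary variable at the query location. In traditional ordinary cokriging the primary weights sum to 1 and the secondary weights sum to 0, so $\mathbf{f}_0 = (1, 0)^T$.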
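To make the block structure concrete, the following sketch assembles and solves the system for a small 1-D example. It is an illustration under stated assumptions, not a production implementation: the covariances follow a single-structure model $C_{ij}(h) = b_{ij}\,\rho(h)$ with an exponential $\rho$, the constraints are the traditional ones (primary weights sum to 1, secondary weights sum to 0), and the function names, locations and parameters are all hypothetical.

```python
import numpy as np

def rho_exp(h, a=10.0):
    """Exponential correlation function with range parameter a (assumed model)."""
    return np.exp(-np.abs(h) / a)

def ordinary_cokriging(x1, x2, x0, B, a=10.0):
    """Weights and variance for estimating the primary at x0 (1-D locations).

    x1, x2: primary / secondary sample locations; B: 2x2 PSD coefficient matrix.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    B = np.asarray(B, dtype=float)
    n1, n2 = len(x1), len(x2)

    def lag(u, v):
        return np.abs(u[:, None] - v[None, :])

    K11 = B[0, 0] * rho_exp(lag(x1, x1), a)
    K12 = B[0, 1] * rho_exp(lag(x1, x2), a)
    K22 = B[1, 1] * rho_exp(lag(x2, x2), a)
    F1 = np.column_stack([np.ones(n1), np.zeros(n1)])   # primary constraint column
    F2 = np.column_stack([np.zeros(n2), np.ones(n2)])   # secondary constraint column
    A = np.block([[K11, K12, F1],
                  [K12.T, K22, F2],
                  [F1.T, F2.T, np.zeros((2, 2))]])
    k10 = B[0, 0] * rho_exp(x1 - x0, a)                 # primary samples to query
    k20 = B[0, 1] * rho_exp(x2 - x0, a)                 # secondary samples to query
    rhs = np.concatenate([k10, k20, [1.0, 0.0]])        # f_0 = (1, 0)
    sol = np.linalg.solve(A, rhs)
    w1, w2, mu = sol[:n1], sol[n1:n1 + n2], sol[n1 + n2:]
    variance = B[0, 0] - w1 @ k10 - w2 @ k20 - mu[0]
    return w1, w2, variance

x1 = [0.0, 20.0]                       # sparse primary
x2 = [0.0, 5.0, 10.0, 15.0, 20.0]      # denser secondary
x0 = 10.0
w1, w2, var_cok = ordinary_cokriging(x1, x2, x0, [[1.0, 0.8], [0.8, 1.0]])
_, w2_zero, var_ok = ordinary_cokriging(x1, x2, x0, [[1.0, 0.0], [0.0, 1.0]])
print(w1.sum(), w2.sum(), var_cok, var_ok)
```

With zero cross-covariance the secondary weights vanish and the result is the ordinary kriging solution from the primary alone, so `var_ok` serves as the univariate baseline for `var_cok`. The secondary is deliberately placed at more locations than the primary: with this proportional covariance model and both variables sampled at exactly the same locations, cokriging would give no gain over kriging.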

Collocated cokriging

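Full cokriging can involve thousands of secondary samples per estimate, which makes the systems large. Collocated cokriging simplifies: at each kriging location, use only the secondary value AT THE QUERY LOCATION plus the primary samples in the neighbourhood. The system size becomes $N_1 + 1 + K$ instead of $N_1 + N_2 + K$. This requires the secondary to be available at every query location. Some information is lost, but the solve is dramatically faster, and the loss is usually small when the secondary is densely sampled, because the collocated value tends to screen the influence of secondary data farther from the query. Note that a sum-to-zero constraint on a single secondary weight would force that weight to zero, so collocated cokriging is normally run as simple cokriging or with a single constraint on all weights.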
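A compact way to see the reduced system is collocated SIMPLE cokriging on standardised variables (zero mean, unit variance). The sketch below additionally assumes the Markov-type model of Almeida and Journel (1994), in which the cross-correlogram is the primary correlogram scaled by the collocated correlation $\rho_0$; both choices are simplifying assumptions made here for illustration, and the function names and parameters are hypothetical.

```python
import numpy as np

def rho_exp(h, a=10.0):
    """Exponential correlogram of the primary (assumed model)."""
    return np.exp(-np.abs(h) / a)

def collocated_simple_cokriging(x1, x0, rho0, a=10.0):
    """Weights and variance using primary samples at x1 plus the secondary at x0.

    Variables are assumed standardised; rho0 is the collocated correlation.
    Markov-type assumption: rho_12(h) = rho0 * rho_11(h).
    """
    x1 = np.asarray(x1, dtype=float)
    n1 = len(x1)
    R11 = rho_exp(x1[:, None] - x1[None, :], a)   # primary-primary block
    r10 = rho_exp(x1 - x0, a)                     # primary samples to query
    c12 = rho0 * r10                              # primary samples to collocated secondary
    A = np.block([[R11, c12[:, None]],
                  [c12[None, :], np.ones((1, 1))]])
    rhs = np.append(r10, rho0)
    sol = np.linalg.solve(A, rhs)                 # system of size N1 + 1
    w1, w2 = sol[:n1], sol[n1]
    variance = 1.0 - w1 @ r10 - w2 * rho0
    return w1, w2, variance

x1, x0 = [0.0, 20.0], 10.0
_, w2_sk, var_sk = collocated_simple_cokriging(x1, x0, rho0=0.0)
_, w2_col, var_col = collocated_simple_cokriging(x1, x0, rho0=0.8)
print(var_sk, var_col, w2_col)
```

Because the means are treated as known there are no Lagrange multipliers, so the system has size $N_1 + 1$ regardless of how many secondary samples exist elsewhere. Setting `rho0=0.0` recovers simple kriging from the primary alone.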

When does cokriging materially help?

Strong cross-correlation between primary and secondary (as a rule of thumb, |ρ| > 0.5; a strong negative correlation is as useful as a positive one).
Secondary much denser than primary.
Spatially structured cross-correlation (the cross-variogram shows clear structure rather than pure nugget).

When these all hold, cokriging can reduce estimation error by roughly 20-50% relative to univariate kriging. When the cross-correlation is weak (|ρ| < 0.3) or the secondary is no denser than the primary, the gain is marginal.
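A back-of-envelope calculation shows why the correlation thresholds sit where they do. Consider the limiting case, assumed here purely for illustration, in which the only informative datum is one collocated secondary value and both means are known (simple cokriging). With $\rho$ the correlation between $Z_1$ and $Z_2$ at the same location and $\sigma_1, \sigma_2$ their standard deviations, the estimator reduces to a linear regression on the secondary:

```latex
Z_1^*(\mathbf{s}_0) = m_1 + \rho \, \frac{\sigma_1}{\sigma_2} \left( z_2(\mathbf{s}_0) - m_2 \right),
\qquad
\sigma^2_{CK}(\mathbf{s}_0) = \sigma_1^2 \left( 1 - \rho^2 \right).
```

At ρ = 0.3 the variance falls by only 9% (about 5% in standard deviation); at ρ = 0.5 it falls by 25% (about 13%); at ρ = 0.8 it falls by 64% (40%). Nearby primary samples shrink these gains further, since part of what the secondary says is then already known.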

Honest caveats

LMC fitting is HARD — requires matching multiple variograms simultaneously while keeping $B^{(k)}$ PSD. Automated tools (e.g., R's gstat) help but can fail; manual tuning is common.
If the primary and secondary have different SUPPORTS (cores vs seismic blocks), cross-variogram interpretation needs change-of-support care.
Cokriging assumes a stationary linear relationship between primary and secondary. Nonlinearity violates this; consider transforms (N-score on both, §1.2) first.

Try it

In the widget, start with zero cross-correlation. Confirm cokriging reduces to univariate kriging (the secondary contributes nothing).
Increase cross-correlation to 0.8. Observe that the cokriging variance drops: cokriging extracts information from the dense secondary.
Add more secondary samples to a previously sparse secondary. Watch the cokriging variance drop further; univariate kriging variance stays the same.
Toggle "collocated cokriging" mode. Compare to full cokriging: the variance reduction is similar when the secondary is dense, but collocated is much faster.
Set primary samples to sparse and secondary to dense, then increase cross-correlation. Note the variance ratio (cokriging / univariate kriging) drops below 1 when correlation is strong.

A team has dense seismic-impedance data (secondary) and sparse porosity wells (primary). They run cokriging and find the resulting porosity map is essentially the SAME as kriging with primary alone. Diagnose ONE plausible reason and ONE remediation.

What you now know

Cokriging extends ordinary kriging to multiple variables. The Linear Model of Coregionalisation ensures permissible cross-variograms. The cokriging system is block-structured with primary-primary, primary-secondary, and secondary-secondary covariance blocks. Collocated cokriging is the standard computational simplification. Gains over univariate kriging depend on cross-correlation strength AND secondary density. §5.8 closes Part 5 with the common KRIGING PATHOLOGIES — failure modes you've glimpsed throughout Part 5, catalogued and diagnosed.

References

Wackernagel, H. (2003). Multivariate Geostatistics, 3rd ed. Berlin: Springer. (The canonical book on cokriging and the Linear Model of Coregionalisation.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. New York: Oxford University Press. (§6 — multivariate geostatistics with extensive cokriging examples.)
Journel, A.G., Huijbregts, C.J. (1978). Mining Geostatistics. Academic Press. (Chapter 5 introduces cokriging in the mining context; the foundational treatment.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, 2nd ed. Wiley. (Chapter 5 — multivariate geostatistics with the LMC formal theory.)
Almeida, A.S., Journel, A.G. (1994). "Joint simulation of multiple variables with a Markov-type coregionalization model." Math. Geol. 26(5), 565–588. (Practical Markov-1 simplification of LMC.)