Nested structures and additive sills

Part 4 — Variogram modeling

Learning objectives

  • State the FUNDAMENTAL RULE for combining variogram models: the SUM of permissible variograms is permissible. If γ1(h)\gamma_1(h) and γ2(h)\gamma_2(h) are both conditionally negative definite, so is γ1(h)+γ2(h)\gamma_1(h) + \gamma_2(h) — a direct corollary of the PSD-sum theorem applied to the matching covariances. SUBTRACTION is NOT generally permissible. This is the lever that lets §4.4 build multi-scale models from the §4.1 canonical list
  • Write the GENERAL NESTED MODEL γ(h)=c01[h>0]+k=1Kckρk(h;ak)\gamma(h) = c_0 \cdot \mathbb{1}[h > 0] + \sum_{k=1}^{K} c_k \cdot \rho_k(h; a_k) where each ρk\rho_k is the §4.1 normalised shape (Spherical, Exponential, or Gaussian) and each component has its own range aka_k and sill contribution ckc_k. The TOTAL SILL is c0+kckc_0 + \sum_k c_k — sills add directly because the matching covariances add directly, and C(0)C(0) is the sum of per-component variances
  • Recognise the TWO-STRUCTURE + NUGGET model — the workhorse form for real data — as γ(h)=c0+c1ρsph(h;a1)+c2ρsph(h;a2)\gamma(h) = c_0 + c_1 \rho_{\mathrm{sph}}(h; a_1) + c_2 \rho_{\mathrm{sph}}(h; a_2) with a2a1a_2 \gg a_1 (typically 5–20× larger). The short-range component captures local heterogeneity; the long-range component captures regional / facies-scale trend; the nugget captures measurement noise + sub-sampling-scale variability. Mining geostatistics has used this form continuously since Krige and Matheron (Journel & Huijbregts 1978 §II.A.4)
  • Read MULTI-SCALE PHYSICAL INTERPRETATIONS from the per-component ranges. Mining: short-range (10–50 m, vein / lithological heterogeneity) + long-range (200–2000 m, regional / structural). Petroleum: short-range (1–10 m, lamination) + long-range (100–1000 m, depositional facies). Hydrology: short-range (local hydraulic conductivity) + long-range (formation-scale trend). Each scale corresponds to a different geological process, and the nested model explicitly separates them
  • Apply the FITTING STRATEGY for nested models (Goovaerts 1997 §4.2; Chilès & Delfiner 2012 §2.5; full machinery in §4.5): (1) identify the break in slope from γ^\hat\gamma; (2) start with nugget c0c_0 from short-lag extrapolation; (3) fit the short-range component to the steep near-origin rise; (4) fit the long-range component to the gentler long-lag climb; (5) iterate. Each component's ckc_k + aka_k is tuned to a different region of the empirical curve
  • Allow PER-COMPONENT ANISOTROPY. Each γ_k can be wrapped in its own anisotropic ellipsoid (§4.3) with its own orientation and ranges. The result is still permissible (sum of permissibles). Different geological processes often have different orientations — short-range component along stratigraphic dip, long-range component along regional strike — and the per-component anisotropy lets the model capture both. ZONAL ANISOTROPY (Goovaerts 1997 §4.3; Wackernagel 2003 §7.5) is a special case: one nested component contributes additional sill in one direction only, with effectively infinite range along the others
  • Recognise the HOLE EFFECT — a non-monotonic γ that DIPS BELOW the sill at intermediate lags before returning. Signals CYCLIC behaviour at intermediate scale (bedding rhythms in sedimentary rocks; periodic deposition cycles). Modelled via a cosine-decay term: γhole(h)=c[1cos(2πh/a)]\gamma_{\mathrm{hole}}(h) = c \cdot [1 - \cos(2\pi h / a)] in 1-D (Chilès & Delfiner 2012 §2.5; Wackernagel 2003 §7.4 catalogue further forms). Permissibility is delicate — cosine variograms are permissible in 1-D but not in Rd\mathbb{R}^d for d2d \ge 2 without dampening. Uncommon but worth flagging when γ^\hat\gamma shows a clear dip
  • Apply the HONEST CAVEATS: STOP at 2-3 structures. Four or more components usually means OVER-FITTING the noise in γ^\hat\gamma rather than identifying real physical processes. Per-component ranges become UNRESOLVED — limited pair counts cannot separately constrain two components whose ranges differ by less than ~3×. Each added component adds 2–3 parameters; N(hk)N(h_k) per bin caps the parameter-count budget. The principle: simpler nested models with well-separated ranges generalise better than baroque multi-component fits
  • Locate §4.4 honestly in Part 4: §4.1 gave permissible isotropic families; §4.2 added the nugget; §4.3 wrapped the families in an anisotropic ellipsoid; §4.4 sums them. §4.5 covers the formal FITTING procedures — by eye, by weighted least squares, by maximum likelihood — that pin down the per-component parameters jointly. Part 5 plugs the (possibly anisotropic, possibly nested) model into the kriging system. The §4.4 sum-of-permissible rule is what makes the practitioner-standard multi-scale modelling defensible

By §4.3 you can build any anisotropic variogram from a permissible isotropic family — Spherical, Exponential, or Gaussian, with or without a nugget, wrapped in an anisotropy ellipsoid that captures a directional bias. Each of those constructions has ONE structured component. They fit empirical variograms that show a single rise from a positive intercept to a single sill, with one characteristic range. Real earth-science variograms are rarely so clean.

Plot a §3.2 empirical variogram on a campaign with reasonable lag coverage and you almost always see something different: a steep rise from the smallest lag bins, a brief plateau (or change of slope), then a SECOND rise to a higher sill at much longer lags. The eye sees two "knees" in the curve — one near the short lags, one near the long lags — not one. A single Spherical, Exponential, or Gaussian curve cannot fit that shape. Trying produces a compromise: either the model traces the short-range rise and overshoots the high-lag plateau, or it traces the long-range climb and misses the steep near-origin behaviour. Either way the kriging system in Part 5 sees a systematically biased variogram, and the kriged map inherits that bias.

The practitioner-standard fix, in continuous use since Matheron 1971 and codified in Journel & Huijbregts 1978 §II.A.4: model real variograms as SUMS of permissible components. Each component is a permissible family from §4.1 with its own sill and range; the components capture different spatial scales; their sum is itself permissible (Cressie 1993 §2.5 documents the permissibility-preserving property of variogram-sum operations); and the resulting model fits the multi-knee empirical shape that no single family can match. §4.4 develops this NESTED-STRUCTURES machinery: the sum-of-permissibles theorem, the additive-sill arithmetic, the practitioner two-structure-plus-nugget form, the per-component anisotropy extension, the hole-effect special case, the fitting strategy preview, and the over-fitting caveats. By the end of the section you can build a defensible multi-scale variogram model from any combination of §4.1 families.

The fundamental rule: sum of permissibles is permissible

The §4.1 permissibility constraint (a variogram must be conditionally negative definite, equivalently its matching covariance must be positive semi-definite) is preserved under the simplest operation on variograms — addition. Suppose γ1(h)\gamma_1(h) and γ2(h)\gamma_2(h) are both permissible. Then for any finite sample configuration s1,,sn\mathbf{s}_1, \ldots, \mathbf{s}_n and any weight vector λ\boldsymbol\lambda with iλi=0\sum_i \lambda_i = 0, the CND double sum for the SUM γ=γ1+γ2\gamma = \gamma_1 + \gamma_2 is

i,jλiλjγ(sisj)=i,jλiλjγ1(sisj)+i,jλiλjγ2(sisj).\sum_{i,j} \lambda_i \lambda_j \, \gamma(\|\mathbf{s}_i - \mathbf{s}_j\|) = \sum_{i,j} \lambda_i \lambda_j \, \gamma_1(\|\mathbf{s}_i - \mathbf{s}_j\|) + \sum_{i,j} \lambda_i \lambda_j \, \gamma_2(\|\mathbf{s}_i - \mathbf{s}_j\|).

Each term on the right is 0\le 0 (by the assumed permissibility of γ1\gamma_1 and γ2\gamma_2), so the left side is 0\le 0 as well: the sum γ1+γ2\gamma_1 + \gamma_2 is conditionally negative definite. Equivalently in covariance form: if C1C_1 and C2C_2 are PSD, the sum C1+C2C_1 + C_2 is PSD (the matrix Loewner order is closed under addition). Cressie 1993 §2.5 gives the formal statement; Christakos 1984 lays out the broader catalogue of permissibility-preserving operations.

Three corollaries pin down what nested modelling can and cannot do:

  • SUM of any finite number of permissibles is permissible — by induction on the two-term case. So γ=c01[h>0]+γSph(h;a1)+γExp(h;a2)\gamma = c_0 \cdot \mathbb{1}[h>0] + \gamma_{\mathrm{Sph}}(h; a_1) + \gamma_{\mathrm{Exp}}(h; a_2) (nugget + Spherical + Exponential) is permissible, and so are arbitrary K-component sums.
  • POSITIVE-SCALAR multiplication is permissible: cγ(h)c \cdot \gamma(h) is CND if γ\gamma is and c0c \ge 0. This is what lets the per-component sill ckc_k be a free parameter.
  • SUBTRACTION is NOT generally permissible. The difference of two permissibles can violate CND on some sample configurations. The novice mistake of "fitting by drawing whatever curve looks reasonable" — which implicitly subtracts pieces of permissibles from one another — almost always produces a non-permissible curve. The rule is "add canonical families together", never "subtract bits off".

The general nested model — and its additive sill

The general nested variogram is

γ(h)  =  c01[h>0]  +  k=1Kckρk(h;ak)\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0] \;+\; \sum_{k=1}^{K} c_k \cdot \rho_k(h; a_k)

where each ρk(h;ak)\rho_k(h; a_k) is one of the §4.1 normalised shapes:

  • Spherical: ρSph(h;a)=1.5(h/a)0.5(h/a)3\rho_{\mathrm{Sph}}(h; a) = 1.5,(h/a) - 0.5,(h/a)^3 for hah \le a, else 11;
  • Exponential: ρExp(h;a)=1exp(h/a)\rho_{\mathrm{Exp}}(h; a) = 1 - \exp(-h/a);
  • Gaussian: ρGau(h;a)=1exp((h/a)2)\rho_{\mathrm{Gau}}(h; a) = 1 - \exp(-(h/a)^2).

Each component contributes a sill of ckc_k; the nugget contributes c0c_0. The TOTAL SILL — the asymptotic value γ(h)\gamma(h) approaches at large hh — is

total sill  =  c0+k=1Kck.\text{total sill} \;=\; c_0 + \sum_{k=1}^{K} c_k.

Sills add directly because the matching COVARIANCES add directly. The covariance corresponding to the nested variogram is

C(h)  =  (c0+kck)γ(h)  =  c01[h=0]+kck(1ρk(h;ak)).C(h) \;=\; \left(c_0 + \sum_k c_k\right) - \gamma(h) \;=\; c_0 \cdot \mathbb{1}[h = 0] + \sum_k c_k \cdot (1 - \rho_k(h; a_k)).

The total variance C(0)=c0+kckC(0) = c_0 + \sum_k c_k is partitioned across components: the nugget contributes c0c_0, each structured component kk contributes ckc_k. This partitioning is exactly the §4.2 nugget interpretation lifted up to multiple structured components — each component is "one nested structure" with its own sill and range.

The fitted parameters of the nested model are (c0,c1,a1,c2,a2,,cK,aK)(c_0, c_1, a_1, c_2, a_2, \ldots, c_K, a_K): 1+2K1 + 2K numbers, plus any per-component anisotropy parameters in the multi-dimensional case (§4.3). With three families to choose from for each component, plus the family-choice itself, the parameter space is genuinely large. §4.5 covers the formal fitting strategies; the §4.4 widgets give you the geometric intuition behind each parameter.

Build a nested γ(h) interactively — first widget

The first widget for §4.4 lets you build a two-structure-plus-nugget variogram by setting each component's family, sill, and range. The display shows the per-component dashed curves (one for each structured γk(h)\gamma_k(h)), an orange dot + horizontal stub at the nugget c0c_0, and a bold solid curve for the summed total γ(h)=c0+γ1(h)+γ2(h)\gamma(h) = c_0 + \gamma_1(h) + \gamma_2(h). A side readout reports the total sill (as a sum of contributions), the c₀/(total) ratio, and the scale gap a2/a1a_2 / a_1.

Nested Structure BuilderInteractive figure — enable JavaScript to interact.

Three things to do with this widget. First, set nugget c0=0.05c_0 = 0.05, then component 1 to Spherical with c1=0.30c_1 = 0.30 and a1=0.15a_1 = 0.15, and component 2 to Spherical with c2=0.50c_2 = 0.50 and a2=0.70a_2 = 0.70. The two dashed curves climb at different scales — the short-range Spherical rises steeply and plateaus at h=a1=0.15h = a_1 = 0.15, the long-range Spherical rises more gently and plateaus at h=a2=0.70h = a_2 = 0.70. The SUM (solid yellow) is what a real empirical variogram with two structures would look like: a steep near-origin rise (driven by the short-range component plus the nugget), a brief plateau / shoulder around ha1h \approx a_1, then a second slower rise that levels off at ha2h \approx a_2. Total sill = 0.05 + 0.30 + 0.50 = 0.85.

Second, slide a2a_2 down toward a1a_1. Watch what happens to the summed total. At a22a1a_2 \approx 2,a_1 the two component knees start to merge into a single rounded shoulder; at a2a1a_2 \approx a_1 the sum looks like a single Spherical at the shared range with the combined sill c1+c2c_1 + c_2. THIS is why nested-structure resolvability needs a scale gap: with a2/a1<3a_2 / a_1 < 3 the two components are not separable from the binned empirical variogram alone. The §4.4 honest caveat: practitioner nested models almost always have a_2/a_1 in 5–20×, deliberately chosen well past the resolvability threshold.

Third, change component 1's family to Gaussian and watch the curve near the origin. The Gaussian ρ\rho rises with zero tangent and curls upward parabolically (§4.1 — the parabolic near-origin behaviour). The total γ(h) inherits that smoothness near h=0h = 0: instead of leaving the c₀ stub with a linear lift-off (Spherical / Exponential short range), it leaves the stub with a parabolic lift-off. This is how the per-component family choice controls the near-origin smoothness of the COMPOUND model. The §4.4 widget makes that geometry visible.

The two-structure + nugget model — the workhorse form

The general nested form with KK components is mathematically clean but most real-data fits use the simplest non-trivial case: K=2K = 2. The TWO-STRUCTURE + NUGGET model

γ(h)  =  c01[h>0]  +  c1ρSph(h;a1)  +  c2ρSph(h;a2),\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0] \;+\; c_1 \cdot \rho_{\mathrm{Sph}}(h; a_1) \;+\; c_2 \cdot \rho_{\mathrm{Sph}}(h; a_2),

with a2a1a_2 \gg a_1, is the dominant form in published mining variographies and reservoir-characterisation work. It has six parameters (c0,c1,a1,c2,a2c_0, c_1, a_1, c_2, a_2 plus family choice), which is at the upper end of what limited-pair-count empirical variograms can reliably constrain. Adding a third structured component (K=3K = 3) requires either denser sampling or a strong physical prior on the third range; without one, the per-component parameters become non-identifiable from data alone.

Why two structured components, not one? Because real geological fields almost always have processes at multiple scales acting simultaneously:

  • Mining grades — short-range (10–50 m, vein-scale heterogeneity from individual mineralisation events; the high-grade pods that drive grade-control campaigns) plus long-range (200–2000 m, regional / structural; the larger orebody envelope and its grade trend). The two scales are physically distinct: short-range captures mineralogical micro-controls, long-range captures the ore-body geometry. Journel & Huijbregts 1978 §II.A.4 documents this two-scale pattern as the modal form fitted in the original Fontainebleau-tradition mining campaigns.
  • Petroleum reservoirs — short-range (1–10 m, lamination-scale heterogeneity within a single bedding unit; the bedform-scale porosity variability) plus long-range (100–1000 m, depositional-facies-scale; the channel-bar, channel-fill, levee morphologies of a fluvial system). Pyrcz & Deutsch 2014 §5 documents typical reservoir analogue values; the short-range / long-range ratio sits in 10–100× in published examples.
  • Hydrology — short-range (local hydraulic conductivity heterogeneity within a single sedimentary unit; tens of metres) plus long-range (formation-scale trend in the depositional environment; hundreds to thousands of metres). The two-scale model lets the kriged map respect both the local conductivity contrasts (drilling targets) and the regional gradient (basin-scale flow direction).
  • Geochemistry / contamination — short-range (local-source heterogeneity; tens of metres around individual spill or seepage points) plus long-range (regional gradient; the diffusive plume on a kilometre scale). Goovaerts 1997 §4.4 develops the nested-structure analysis for soil contamination fields.

Each pair of (short-range, long-range) values reports a different physical story. The total sill c0+c1+c2c_0 + c_1 + c_2 is the total variance of the field; the partition (c0,c1,c2)(c_0, c_1, c_2) tells you what fraction of that variance lives at what scale. A field with c1/(c1+c2)=0.8c_1 / (c_1 + c_2) = 0.8 is "short-range dominated" — most of its variance is at the fine scale, and the kriged map should preserve that fine-scale texture. A field with c2/(c1+c2)=0.8c_2 / (c_1 + c_2) = 0.8 is "long-range dominated" — most of the variance is the broad trend, and the kriged map will look smooth with only modest fine-scale variation. The shape of the kriged map is set by which partition the data implies, and the §4.4 nested model is what makes that partition explicit.

Fitting nested structures — strategy (§4.5 develops the machinery)

The formal fitting machinery — weighted least squares, maximum likelihood, REML — is §4.5's job. The §4.4 STRATEGY for nested models, codified in Goovaerts 1997 §4.2 and Chilès & Delfiner 2012 §2.5, is:

  • Identify breaks in slope. Plot γ^(h)\hat\gamma(h) and look for changes in slope: a steep rise that suddenly slows, or a plateau followed by a second rise. Each break is a candidate component boundary. Two breaks ⇒ two components plus nugget; three breaks ⇒ three components (rare and usually over-fit).
  • Estimate the nugget by short-lag extrapolation. Read the smallest-lag bins and extrapolate the empirical curve back to h=0+h = 0^+. The intercept is c0c_0. Recall the §4.2 caveat: the visible smallest-lag bin under-reads the true nugget by sampling noise, so use the SHAPE of the curve, not a single point.
  • Fit the short-range component to the steep near-origin rise. Subtract c0c_0 from the empirical curve and look at the region between the first lag and the first break. Pick a family (Spherical for finite-range patches, Exponential for continuous rough decorrelation, Gaussian for smooth fields — see §4.1). Set a1a_1 to the lag at the first break / shoulder; set c1c_1 so the component's sill matches the empirical plateau / shoulder height above c0c_0.
  • Fit the long-range component to the residual climb. What remains of the empirical curve above the first plateau is the long-range structure. Pick a family for it (often the same as the short-range family); set a2a_2 to the lag at which the total empirical curve finally levels off; set c2=total sillc0c1c_2 = \text{total sill} - c_0 - c_1 so the modelled total sill matches the empirical asymptotic plateau.
  • Iterate. The component fits are not independent — adjusting a1a_1 changes how much of the empirical curve "belongs" to the long-range component, which changes c2c_2 and a2a_2, which feeds back. Adjust each parameter in turn, watching the residual SSE drop with each adjustment. §4.5 formalises this with weighted least squares; the §4.4 widgets give you the geometric intuition.

The strategy is sequential — fit the nugget first, then the short-range, then the long-range — because each later component refines a residual that the earlier components left behind. This is essentially a coordinate-descent optimisation in parameter space; it converges to a sensible local optimum because the components have well-separated scales (so the parameters are decoupled across scales). §4.5 covers the joint-optimisation alternative and the conditions under which it is worth using.

Stage-by-stage fitting — second widget

The second widget for §4.4 makes the fitting strategy concrete. A synthetic empirical variogram is drawn from a KNOWN nested truth (nugget + short-range Spherical + long-range Spherical). The reader works through three explicit stages: Stage 1 fits the nugget c0c_0 alone; Stage 2 adds a short-range component (c1,a1)(c_1, a_1); Stage 3 adds the long-range component (c2,a2)(c_2, a_2). Per-stage weighted SSE is reported live, so each added component's contribution to fit quality is visible.

Multi Range FittingInteractive figure — enable JavaScript to interact.

Three things to do with this widget. First, at Stage 1, slide c0fitc_0^{\mathrm{fit}} until the dashed fit line's intercept matches the smallest-lag binned points. Stage-1 SSE will be large (the model is a flat line at c0c_0 while the bins climb past it). Click Stage 2: now the dashed curve has an elbow at a1fita_1^{\mathrm{fit}} instead of being flat. Adjust a1fita_1^{\mathrm{fit}} and c1fitc_1^{\mathrm{fit}} until the fit traces the bins through the first break. Stage-2 SSE has dropped from Stage-1 SSE — the short-range component is doing work.

Second, at Stage 3, the long-range component (c2fit,a2fit)(c_2^{\mathrm{fit}}, a_2^{\mathrm{fit}}) enters. Adjust a2fita_2^{\mathrm{fit}} to match the location of the final empirical plateau; adjust c2fitc_2^{\mathrm{fit}} so the fit's asymptotic level matches the empirical sill. Stage-3 SSE drops again — each component's addition cuts residual variance. The percent-of-variance-explained readout for each stage tells you, quantitatively, how much of the binned-empirical signal each component captures. Toggle "Reveal truth" to compare the fit to the generator. Click "Resample noise" to see how the per-stage SSE drops behave under different noise realisations of the same truth.

Third, deliberately TURN OFF the long-range fit (set c2fit=0c_2^{\mathrm{fit}} = 0) and look at the Stage-3 SSE. It is no better than Stage 2 — without the second component, the model cannot capture the long-lag climb, so Stage 3 collapses to Stage 2 in fit quality. The point of nested modelling is precisely that each component adds explanatory power; remove any one and the residual variance the others can't reach stays in the residual SSE. This is the empirical justification for nested structures: real empirical variograms have multi-scale structure, and a single-family model leaves systematic residuals that a nested model resolves.

Per-component anisotropy and zonal anisotropy

The §4.4 sum-of-permissibles theorem extends naturally to the anisotropic case from §4.3. Each nested component can wrap its own permissible family inside its OWN anisotropy ellipsoid:

γ(h)  =  c01[h>0]  +  k=1Kckρk ⁣(Akh;1),\gamma(\mathbf{h}) \;=\; c_0 \cdot \mathbb{1}[\|\mathbf{h}\| > 0] \;+\; \sum_{k=1}^{K} c_k \cdot \rho_k\!\left(\|\mathbf{A}_k\,\mathbf{h}\|; 1\right),

with a different anisotropy matrix Ak\mathbf{A}_k for each component. The construction stays permissible (each per-component term is permissible by the §4.3 wrap-in-A construction; the sum is permissible by the §4.4 sum theorem). Each Ak\mathbf{A}_k encodes its own orientation and per-direction ranges; the components can have entirely different anisotropies, capturing physically different processes.

Different orientations across components are not just permitted — they're common. In a fluvial reservoir, the SHORT-RANGE component often captures lamination-scale heterogeneity oriented along bedding dip (a few degrees off horizontal), while the LONG-RANGE component captures channel-scale heterogeneity oriented along the paleo-flow direction (the channel's azimuth). The two orientations are physically distinct and the nested anisotropy model captures both. In a fractured reservoir the short-range component might align with the dominant joint set while the long-range component aligns with the regional structural fabric. Pyrcz & Deutsch 2014 §5 documents the per-component-anisotropy modelling practice in reservoir-characterisation workflows.

Zonal anisotropy — the case §4.3 deferred to §4.4 — fits naturally here. A zonal-anisotropy component contributes EXTRA sill in one direction only, with effectively infinite range along the others. In 2-D, model it as a nested component with amaja_{\mathrm{maj}} very large (so the component barely starts climbing along the major direction over the data's lag range) and amina_{\mathrm{min}} small (so the component reaches its sill quickly along the minor direction). The variogram along the major direction sees this component as a slow drift; the variogram along the minor direction sees it as a finite-range structure. The combined model has DIFFERENT effective sills along different directions — the signature of zonal anisotropy in the empirical variogram (Goovaerts 1997 §4.3; Wackernagel 2003 §7.5).

Practical zonal-anisotropy modelling is a balance. Strict zonal anisotropy (infinite range along non-zonal directions) is permissible only as a limit of finite-range components; for numerical stability the practitioner typically fits a long-but-finite range in the non-zonal directions. The resulting nested model has 6–10 free parameters per zonal component plus 6 per geometric component — at the upper end of what limited-pair-count empirical variograms can constrain. Reserve zonal-anisotropy components for cases where the directional empirical variograms clearly plateau at different levels, not for routine geometric-anisotropy fits.

The hole effect — non-monotonic γ for cyclic processes

Most variogram models are monotonic: γ(h)\gamma(h) rises with hh up to a sill and stays there (or, for the power model, keeps rising). Some real fields show a NON-MONOTONIC empirical variogram: γ̂ rises to a peak at some intermediate lag, then DIPS BELOW the apparent sill, then returns to it. The dip is called the HOLE EFFECT (Chilès & Delfiner 2012 §2.5; Journel & Huijbregts 1978 §II.A.4).

The hole effect is a signature of CYCLIC behaviour at intermediate scale. The classic earth-science example is bedded sedimentary rocks: alternating high-permeability and low-permeability layers at a characteristic bedding spacing TT. Two samples separated by h=T/2h = T/2 are in OPPOSITE phases of the cycle (one in a high-perm layer, the other in a low-perm layer) — maximally different. Two samples separated by h=Th = T are in the SAME phase — more similar than the h=T/2h = T/2 pair, so γ at h=Th = T is LOWER than γ at h=T/2h = T/2. The variogram traces this in-phase / out-of-phase oscillation, dipping at h=Th = T before returning toward the sill.

The simplest hole-effect variogram is the COSINE-DECAY form

γhole(h)  =  c[1cos ⁣(2πha)](undamped, 1-D only).\gamma_{\mathrm{hole}}(h) \;=\; c \cdot \left[1 - \cos\!\left(\dfrac{2\pi h}{a}\right)\right] \quad \text{(undamped, 1-D only).}

This form is permissible in R1\mathbb{R}^1 but NOT in Rd\mathbb{R}^d for d2d \ge 2 — the cosine covariance is positive semi-definite on the line but not on the plane (Chilès & Delfiner 2012 §2.5 gives the proof via Bochner's theorem on the spectral measure). For practical 2-D / 3-D use, dampened forms like γ(h)=c[1exp(h/a)cos(2πh/T)]\gamma(h) = c \cdot [1 - \exp(-h/a) \cos(2\pi h / T)] (Wackernagel 2003 §7.4 catalogues several) are permissible across dimensions, at the cost of additional parameters.

Hole-effect models are uncommon in routine practice. They require (1) a real cyclic structure in the field, (2) sufficient lag coverage at and beyond the cycle period to see the dip, and (3) the dampened multi-dimensional form to be permissible. When all three conditions hold — bedded sedimentary geology, dense sampling, and 2-D / 3-D modelling — they're a worthwhile extension. Most variogram reports default to monotonic permissible families and flag the hole effect as something to look for in the empirical curve but rarely fit explicitly.

Honest caveats — stop at 2-3, watch for over-fitting

Nested-structure modelling has known failure modes. The §4.4 honest caveats:

Stop at 2-3 components. The general nested form admits KK components with no upper limit. Most published variogram fits in mining and reservoir geostatistics use K=1K = 1 (single structured component plus nugget) or K=2K = 2. K=3K = 3 appears occasionally in fields with three clearly-separated physical scales (e.g., bedding + bar + channel in a fluvial reservoir). K4K \ge 4 almost always means OVER-FITTING the noise in γ^\hat\gamma rather than identifying real spatial processes. The empirical variogram has 1225\sim 12-25 bins of useful data; fitting six free parameters from one component plus three from each additional component eats up the degrees of freedom rapidly. A defensible fit prefers K=1K = 1 or K=2K = 2 and only reaches for K=3K = 3 with a strong physical motivation.

Per-component ranges are not always resolvable. Even with two components, the data must support the scale separation. If a2/a1<3a_2 / a_1 < 3, the binned empirical variogram cannot distinguish two components from one component at the geometric-mean range. The pair-count budget N(hk)N(h_k) per bin caps the resolution: a fit with a1=100a_1 = 100 m and a2=150a_2 = 150 m on a campaign with 8 lag bins is identifying parameters from one component, not two. The practitioner rule of thumb (Pyrcz & Deutsch 2014 §5; Goovaerts 1997 §4.2): fit nested structures only when a2/a15a_2 / a_1 \ge 5. The §4.4 widgets visualise this resolvability threshold — set a21.5a1a_2 \approx 1.5,a_1 and watch the two component curves overlap so closely that the summed total looks indistinguishable from a single Spherical at the joint range.

The nugget is always present. Every nested model includes c0c_0 as the first term — even if you fit c0=0c_0 = 0, the parameter is in the model. Treating the nugget as one of the K+1K + 1 terms in γ=c0+k=1Kckρk\gamma = c_0 + \sum_{k=1}^{K} c_k \rho_k is the consistent practice. The §4.2 honest caveats about measurement error vs microscale variability extend to the nested case: the c0c_0 in a nested model can still arise from either source, and the kriging-mode choice (denoise vs interpolate) still has to be made.

Component family choice is constrained by the near-origin behaviour. The §4.1 origin-behaviour theorem (Stein 1999 §2.7) connects near-origin variogram smoothness to process smoothness. In a nested model the SHORTEST-RANGE component dominates near-origin behaviour — its near-origin shape is the field's effective smoothness. If the short-range component is Gaussian (parabolic origin), the field is effectively differentiable; if it is Spherical or Exponential (linear origin), the field is continuous-but-not-differentiable. The long-range component contributes essentially nothing to the very-near-origin behaviour because ρk(h;a2)\rho_k(h; a_2) for a2a1a_2 \gg a_1 is approximately linear-in-h at small h regardless of family. So the family choice for the LONG-range component is less consequential than the choice for the SHORT-range; spend the modelling effort where it matters.

Cross-validation provides a sanity check. The defensible practice for nested-model selection is to fit K=1K = 1 and K=2K = 2 (and possibly K=3K = 3) versions of the model and compare them via leave-one-out cross-validation on the data (Goovaerts 1997 §6.5). Each version produces a kriged prediction at each held-out sample location; the prediction errors are compared. A K=2 model that beats K=1 by a meaningful margin in cross-validation MSE justifies the added parameters; one that doesn't is over-fitting. §4.5 develops the formal model-selection workflow.

Try it

  • In nested-structure-builder, set c0=0.05c_0 = 0.05, component 1 to Spherical with c1=0.30c_1 = 0.30 and a1=0.15a_1 = 0.15, component 2 to Spherical with c2=0.50c_2 = 0.50 and a2=0.75a_2 = 0.75. Read the total sill (0.85) and the scale gap (5×). The summed γ(h) shows a steep near-origin rise, a shoulder near a1a_1, and a slower rise to the sill near a2a_2. Compute the variance partition: nugget 5/85=6%5/85 = 6%, short-range 30/85=35%30/85 = 35%, long-range 50/85=59%50/85 = 59% — a long-range-dominated field.
  • Still in nested-structure-builder, slide a2a_2 down from 0.75 to 0.20 (closer to a1a_1). Watch the two dashed component curves overlap progressively, and the summed total morph from a clear two-knee shape into a single rounded shoulder. At a2/a11.3a_2 / a_1 \approx 1.3, the bins cannot tell the difference between this nested model and a single Spherical with c=c1+c2=0.80c = c_1 + c_2 = 0.80 at the geometric-mean range. This is the resolvability threshold.
  • In nested-structure-builder, change component 1 to Gaussian and look at the near-origin region. The summed γ(h) now leaves the c0c_0 stub with a parabolic lift-off instead of the linear lift-off of a Spherical short-range. Recall §4.1 — Gaussian without a nugget is dangerous (ill-conditioning). Slide c0c_0 up to 0.05 and the Gaussian-component-induced ill-conditioning is regularised; the model is now usable.
  • In multi-range-fitting, start at Stage 1 with the default truth (c₀=0.10, c₁=0.30, a₁=0.10, c₂=0.45, a₂=0.55). Slide c0fitc_0^{\mathrm{fit}} to match the visible smallest-lag bins (≈0.10–0.13). Note the Stage-1 wSSE. Now click Stage 2 and adjust (c1fit,a1fit)(c_1^{\mathrm{fit}}, a_1^{\mathrm{fit}}) to trace the steep near-origin rise. The Stage-2 wSSE drops sharply — the short-range component is doing most of the work fitting the bins below h0.20h \approx 0.20.
  • In multi-range-fitting at Stage 3, adjust (c2fit,a2fit)(c_2^{\mathrm{fit}}, a_2^{\mathrm{fit}}) to trace the gentler long-range climb. The Stage-3 wSSE drops further. Toggle "Reveal truth" to compare your three-stage fit to the generating curve. How close did you get? The "% variance explained" readout for each stage tells you the relative contribution of each component to the total fit quality.
  • In multi-range-fitting, deliberately set c2fit=0c_2^{\mathrm{fit}} = 0 at Stage 3 (turn off the long-range component). The Stage-3 wSSE no longer improves on Stage 2 — without the second structured component, there is no further fit improvement available. This is the empirical justification for nesting: each component adds explanatory power that is unavailable to the simpler model.
  • Without coding: a reservoir-characterisation team reports a nested variogram fit "nugget 0.04, Spherical short-range c₁=0.18 at a₁=12 m, Spherical long-range c₂=0.32 at a₂=180 m". Compute the total sill, the variance partition across the three terms, and the scale gap a2/a1a_2/a_1. Classify the model: nugget-heavy or signal-heavy? Short-range or long-range dominated? Is the scale gap above or below the resolvability threshold?
  • Without coding: a mining-grade variogram shows a clear empirical dip — γ̂ rises to a peak near h=25h = 25 m, dips to a local minimum near h=50h = 50 m, and rises again toward a sill near h=100h = 100 m. Diagnose. Is this a candidate hole-effect model? What physical process could produce a cycle at 50\sim 50 m? Would you fit the cosine-decay form, or look for a confounder (sampling-design artefact, mis-specified bin centres) before committing to a hole-effect interpretation?

Pause and reflect: nested structures combine multiple permissible variograms into a single multi-scale model. The total sill is the sum of per-component contributions, and the data must support the scale separation for the components to be resolvable. What would you most want to document in a defensible variogram report about a nested model — the per-component family choices, the scale ratios, the variance partitions, the cross-validation evidence — so that a downstream geomodeller can verify the model was justified by the data rather than over-fit to noise?

What you now know — and what Part 4 will build on it

You can state the sum-of-permissibles theorem: if γ1\gamma_1 and γ2\gamma_2 are permissible (conditionally negative definite), so is γ1+γ2\gamma_1 + \gamma_2. You can write the general nested model γ(h)=c01[h>0]+k=1Kckρk(h;ak)\gamma(h) = c_0 \cdot \mathbb{1}[h>0] + \sum_{k=1}^K c_k , \rho_k(h; a_k) and compute its total sill as the sum of per-component contributions. You know that subtraction is NOT permissibility-preserving and that "fitting by drawing whatever curve looks reasonable" almost always violates CND somewhere.

You can build the two-structure-plus-nugget model — the practitioner workhorse — from a pair of §4.1 families plus a nugget, and you can interpret each component's sill and range in terms of a physical process operating at that scale. You know the typical scale ratios (5–20× in mining, 10–100× in fluvial reservoirs) and the resolvability rule of thumb (a2/a15a_2 / a_1 \ge 5 for the components to be separable from binned empirical data). You can apply the sequential fitting strategy — nugget by extrapolation, then short-range to the steep near-origin rise, then long-range to the residual climb, then iterate — and you understand why each component's contribution to fit quality is visible in the per-stage SSE drop.

You can extend the nested model to the anisotropic case from §4.3: each component carries its own anisotropy ellipsoid, the per-component orientations are independent, and the sum is still permissible. You can describe zonal anisotropy as a nested component with very long range along non-zonal directions and short range along the zonal direction. You know that the hole-effect cosine-decay model exists for cyclic processes (bedded sediments) but is permissible only in 1-D in its undamped form and requires dampening for multi-dimensional use.

You can apply the honest caveats: stop at 2-3 components; per-component ranges are not resolvable below a scale ratio of about 3-5; the nugget is always one component; the short-range component dominates near-origin smoothness; cross-validation is the defensible model-selection criterion. You understand why baroque nested models with K4K \ge 4 usually over-fit and why simpler well-separated nested models generalise better.

Part 4 closes here with §4.5 — FITTING STRATEGIES — which develops the formal machinery: by-eye fitting (§4.4 strategy made rigorous), by weighted least squares (the field standard), by maximum likelihood and REML (the statistician's preference). §4.5 also covers cross-validation as the model-selection criterion that compares nested models of different complexity. Part 5 then plugs the fitted nested model into the kriging system: the (possibly anisotropic, possibly nested) γ(h) becomes the entries of the kriging matrix, with per-component anisotropy applied to each lag distance separately and the search-ellipse machinery from §4.3 selecting the local neighbourhood.

The §4.4 sum-of-permissibles rule is the lever. Real variograms have multi-scale structure. A single-family model can only ever capture one scale. Nested models, built from sums of permissibles with carefully chosen scale gaps, capture the full multi-scale story while preserving the permissibility constraint that makes kriging's guarantees hold. Get the nesting right and the kriged map respects every scale of the field's variability simultaneously.

References

  • Matheron, G. (1971). The Theory of Regionalized Variables and Its Applications. Les Cahiers du Centre de Morphologie Mathématique, No. 5. Fontainebleau, France: École Nationale Supérieure des Mines de Paris. (The foundational reference. Establishes the permissibility constraint for the variogram and the sum-preserving property under which nested structures are themselves permissible. Lays out the §II.B framework for nested-component decomposition that subsequent textbooks elaborate.)
  • Journel, A.G., Huijbregts, C.J. (1978). Mining Geostatistics. Academic Press. (§II.A.4 is the canonical practitioner reference for nested-structure modelling. Documents the two-structure-plus-nugget workhorse form for South African gold and other base-metal deposits, the per-component physical interpretations, and the by-eye fitting strategy that subsequent textbooks formalise. The book that established nested structures as the default form in mining-geostatistics practice.)
  • Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.2 develops the nested-structure machinery for environmental and natural-resources applications. §4.3 covers zonal anisotropy as a nested component with very long range along the non-zonal directions. §6.5 develops cross-validation as the model-selection criterion. The standard practitioner-oriented reference for the §4.4 material at a graduate-school level.)
  • Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (§2.5 covers nested-structure modelling and the sum-of-permissibles theorem at a mathematical-statistics level of rigour. The hole-effect cosine-decay model and its dimension-dependent permissibility are developed in this section. The cleanest derivation of why CND is preserved under variogram sums.)
  • Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (§2.5 develops the permissibility-preserving operations on variograms, including the sum theorem and the catalogue of constructions that preserve CND. The cleanest mathematical-statistics treatment of why nested structures stay permissible.)
  • Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (§III.4 documents how nested variogram models are encoded and consumed in GSLIB. Each nested component is specified separately (family, sill, range, optional per-component anisotropy) and summed inside the kriging routines. The canonical software-side reference; reading it is the way to verify that a nested-model fit can be re-implemented identically in another package.)
  • Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling (2nd ed.). Oxford University Press. (§5 catalogues nested-structure parameters across reservoir analogue datasets — fluvial, deltaic, carbonate. Documents the typical scale ratios (10-100× horizontal-to-vertical, 5-20× across structured components) and the per-component-anisotropy practice for capturing distinct depositional processes within a single reservoir model.)
  • Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 16 introduces nested-structure modelling at an entry level. The two-structure-plus-nugget Spherical example is the canonical first nested model that subsequent textbooks generalise. The book's pedagogy on by-eye fitting strategy informs the multi-range-fitting widget design.)
  • Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications (3rd ed.). Springer. (§7.4 catalogues additional permissible variogram families including damped hole-effect forms. §7.5 develops zonal anisotropy as a nested-structure construction and connects it to the broader linear-coregionalisation framework that the §4.4 machinery generalises to multivariate variograms.)

This page is prerendered for SEO and accessibility. The interactive widgets above hydrate on JavaScript load.