Nested structures and additive sills

Part 4 — Variogram modeling

Learning objectives

State the FUNDAMENTAL RULE for combining variogram models: the SUM of permissible variograms is permissible. If $\gamma_1(h)$ and $\gamma_2(h)$ are both conditionally negative definite, so is $\gamma_1(h) + \gamma_2(h)$ — a direct corollary of the PSD-sum theorem applied to the matching covariances. SUBTRACTION is NOT generally permissible. This is the lever that lets §4.4 build multi-scale models from the §4.1 canonical list
Write the GENERAL NESTED MODEL $\gamma(h) = c_0 \cdot \mathbb{1}[h > 0] + \sum_{k=1}^{K} c_k \cdot \rho_k(h; a_k)$ where each $\rho_k$ is the §4.1 normalised shape (Spherical, Exponential, or Gaussian) and each component has its own range $a_k$ and sill contribution $c_k$ . The TOTAL SILL is $c_0 + \sum_k c_k$ — sills add directly because the matching covariances add directly, and $C(0)$ is the sum of per-component variances
Recognise the TWO-STRUCTURE + NUGGET model — the workhorse form for real data — as $\gamma(h) = c_0 + c_1 \rho_{\mathrm{sph}}(h; a_1) + c_2 \rho_{\mathrm{sph}}(h; a_2)$ with $a_2 \gg a_1$ (typically 5–20× larger). The short-range component captures local heterogeneity; the long-range component captures regional / facies-scale trend; the nugget captures measurement noise + sub-sampling-scale variability. Mining geostatistics has used this form continuously since Krige and Matheron (Journel & Huijbregts 1978 §II.A.4)
Read MULTI-SCALE PHYSICAL INTERPRETATIONS from the per-component ranges. Mining: short-range (10–50 m, vein / lithological heterogeneity) + long-range (200–2000 m, regional / structural). Petroleum: short-range (1–10 m, lamination) + long-range (100–1000 m, depositional facies). Hydrology: short-range (local hydraulic conductivity) + long-range (formation-scale trend). Each scale corresponds to a different geological process, and the nested model explicitly separates them
Apply the FITTING STRATEGY for nested models (Goovaerts 1997 §4.2; Chilès & Delfiner 2012 §2.5; full machinery in §4.5): (1) identify the break in slope from $\hat\gamma$ ; (2) start with nugget $c_0$ from short-lag extrapolation; (3) fit the short-range component to the steep near-origin rise; (4) fit the long-range component to the gentler long-lag climb; (5) iterate. Each component's $c_k$ + $a_k$ is tuned to a different region of the empirical curve
Allow PER-COMPONENT ANISOTROPY. Each γ_k can be wrapped in its own anisotropic ellipsoid (§4.3) with its own orientation and ranges. The result is still permissible (sum of permissibles). Different geological processes often have different orientations — short-range component along stratigraphic dip, long-range component along regional strike — and the per-component anisotropy lets the model capture both. ZONAL ANISOTROPY (Goovaerts 1997 §4.3; Wackernagel 2003 §7.5) is a special case: one nested component contributes additional sill in one direction only, with effectively infinite range along the others
Recognise the HOLE EFFECT — a non-monotonic γ that DIPS BELOW the sill at intermediate lags before returning. Signals CYCLIC behaviour at intermediate scale (bedding rhythms in sedimentary rocks; periodic deposition cycles). Modelled via a cosine-decay term: $\gamma_{\mathrm{hole}}(h) = c \cdot [1 - \cos(2\pi h / a)]$ in 1-D (Chilès & Delfiner 2012 §2.5; Wackernagel 2003 §7.4 catalogue further forms). Permissibility is delicate — cosine variograms are permissible in 1-D but not in $\mathbb{R}^d$ for $d \ge 2$ without dampening. Uncommon but worth flagging when $\hat\gamma$ shows a clear dip
Apply the HONEST CAVEATS: STOP at 2-3 structures. Four or more components usually means OVER-FITTING the noise in $\hat\gamma$ rather than identifying real physical processes. Per-component ranges become UNRESOLVED — limited pair counts cannot separately constrain two components whose ranges differ by less than ~3×. Each added component adds 2–3 parameters; $N(h_k)$ per bin caps the parameter-count budget. The principle: simpler nested models with well-separated ranges generalise better than baroque multi-component fits
Locate §4.4 honestly in Part 4: §4.1 gave permissible isotropic families; §4.2 added the nugget; §4.3 wrapped the families in an anisotropic ellipsoid; §4.4 sums them. §4.5 covers the formal FITTING procedures — by eye, by weighted least squares, by maximum likelihood — that pin down the per-component parameters jointly. Part 5 plugs the (possibly anisotropic, possibly nested) model into the kriging system. The §4.4 sum-of-permissible rule is what makes the practitioner-standard multi-scale modelling defensible

By §4.3 you can build any anisotropic variogram from a permissible isotropic family — Spherical, Exponential, or Gaussian, with or without a nugget, wrapped in an anisotropy ellipsoid that captures a directional bias. Each of those constructions has ONE structured component. They fit empirical variograms that show a single rise from a positive intercept to a single sill, with one characteristic range. Real earth-science variograms are rarely so clean.

Plot a §3.2 empirical variogram on a campaign with reasonable lag coverage and you almost always see something different: a steep rise from the smallest lag bins, a brief plateau (or change of slope), then a SECOND rise to a higher sill at much longer lags. The eye sees two "knees" in the curve — one near the short lags, one near the long lags — not one. A single Spherical, Exponential, or Gaussian curve cannot fit that shape. Trying produces a compromise: either the model traces the short-range rise and overshoots the high-lag plateau, or it traces the long-range climb and misses the steep near-origin behaviour. Either way the kriging system in Part 5 sees a systematically biased variogram, and the kriged map inherits that bias.

The practitioner-standard fix, in continuous use since Matheron 1971 and codified in Journel & Huijbregts 1978 §II.A.4: model real variograms as SUMS of permissible components. Each component is a permissible family from §4.1 with its own sill and range; the components capture different spatial scales; their sum is itself permissible (Cressie 1993 §2.5 documents the permissibility-preserving property of variogram-sum operations); and the resulting model fits the multi-knee empirical shape that no single family can match. §4.4 develops this NESTED-STRUCTURES machinery: the sum-of-permissibles theorem, the additive-sill arithmetic, the practitioner two-structure-plus-nugget form, the per-component anisotropy extension, the hole-effect special case, the fitting strategy preview, and the over-fitting caveats. By the end of the section you can build a defensible multi-scale variogram model from any combination of §4.1 families.

The fundamental rule: sum of permissibles is permissible

The §4.1 permissibility constraint (a variogram must be conditionally negative definite, equivalently its matching covariance must be positive semi-definite) is preserved under the simplest operation on variograms — addition. Suppose $\gamma_1(h)$ and $\gamma_2(h)$ are both permissible. Then for any finite sample configuration $\mathbf{s}_1, \ldots, \mathbf{s}_n$ and any weight vector $\boldsymbol\lambda$ with $\sum_i \lambda_i = 0$ , the CND double sum for the SUM $\gamma = \gamma_1 + \gamma_2$ is

\sum_{i,j} \lambda_i \lambda_j \, \gamma(\|\mathbf{s}_i - \mathbf{s}_j\|) = \sum_{i,j} \lambda_i \lambda_j \, \gamma_1(\|\mathbf{s}_i - \mathbf{s}_j\|) + \sum_{i,j} \lambda_i \lambda_j \, \gamma_2(\|\mathbf{s}_i - \mathbf{s}_j\|).

Each term on the right is $\le 0$ (by the assumed permissibility of $\gamma_1$ and $\gamma_2$ ), so the left side is $\le 0$ as well: the sum $\gamma_1 + \gamma_2$ is conditionally negative definite. Equivalently in covariance form: if $C_1$ and $C_2$ are PSD, the sum $C_1 + C_2$ is PSD (the matrix Loewner order is closed under addition). Cressie 1993 §2.5 gives the formal statement; Christakos 1984 lays out the broader catalogue of permissibility-preserving operations.

Three corollaries pin down what nested modelling can and cannot do:

SUM of any finite number of permissibles is permissible — by induction on the two-term case. So $\gamma = c_0 \cdot \mathbb{1}[h>0] + \gamma_{\mathrm{Sph}}(h; a_1) + \gamma_{\mathrm{Exp}}(h; a_2)$ (nugget + Spherical + Exponential) is permissible, and so are arbitrary K-component sums.
POSITIVE-SCALAR multiplication is permissible: $c \cdot \gamma(h)$ is CND if $\gamma$ is and $c \ge 0$ . This is what lets the per-component sill $c_k$ be a free parameter.
SUBTRACTION is NOT generally permissible. The difference of two permissibles can violate CND on some sample configurations. The novice mistake of "fitting by drawing whatever curve looks reasonable" — which implicitly subtracts pieces of permissibles from one another — almost always produces a non-permissible curve. The rule is "add canonical families together", never "subtract bits off".

The general nested model — and its additive sill

The general nested variogram is

\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0] \;+\; \sum_{k=1}^{K} c_k \cdot \rho_k(h; a_k)

where each $\rho_k(h; a_k)$ is one of the §4.1 normalised shapes:

Spherical: $\rho_{\mathrm{Sph}}(h; a) = 1.5,(h/a) - 0.5,(h/a)^3$ for $h \le a$ , else $1$ ;
Exponential: $\rho_{\mathrm{Exp}}(h; a) = 1 - \exp(-h/a)$ ;
Gaussian: $\rho_{\mathrm{Gau}}(h; a) = 1 - \exp(-(h/a)^2)$ .

Each component contributes a sill of $c_k$ ; the nugget contributes $c_0$ . The TOTAL SILL — the asymptotic value $\gamma(h)$ approaches at large $h$ — is

\text{total sill} \;=\; c_0 + \sum_{k=1}^{K} c_k.

Sills add directly because the matching COVARIANCES add directly. The covariance corresponding to the nested variogram is

C(h) \;=\; \left(c_0 + \sum_k c_k\right) - \gamma(h) \;=\; c_0 \cdot \mathbb{1}[h = 0] + \sum_k c_k \cdot (1 - \rho_k(h; a_k)).

The total variance $C(0) = c_0 + \sum_k c_k$ is partitioned across components: the nugget contributes $c_0$ , each structured component $k$ contributes $c_k$ . This partitioning is exactly the §4.2 nugget interpretation lifted up to multiple structured components — each component is "one nested structure" with its own sill and range.

The fitted parameters of the nested model are $(c_0, c_1, a_1, c_2, a_2, \ldots, c_K, a_K)$ : $1 + 2K$ numbers, plus any per-component anisotropy parameters in the multi-dimensional case (§4.3). With three families to choose from for each component, plus the family-choice itself, the parameter space is genuinely large. §4.5 covers the formal fitting strategies; the §4.4 widgets give you the geometric intuition behind each parameter.

The first widget for §4.4 lets you build a two-structure-plus-nugget variogram by setting each component's family, sill, and range. The display shows the per-component dashed curves (one for each structured $\gamma_k(h)$ ), an orange dot + horizontal stub at the nugget $c_0$ , and a bold solid curve for the summed total $\gamma(h) = c_0 + \gamma_1(h) + \gamma_2(h)$ . A side readout reports the total sill (as a sum of contributions), the c₀/(total) ratio, and the scale gap $a_2 / a_1$ .

Three things to do with this widget. First, set nugget $c_0 = 0.05$ , then component 1 to Spherical with $c_1 = 0.30$ and $a_1 = 0.15$ , and component 2 to Spherical with $c_2 = 0.50$ and $a_2 = 0.70$ . The two dashed curves climb at different scales — the short-range Spherical rises steeply and plateaus at $h = a_1 = 0.15$ , the long-range Spherical rises more gently and plateaus at $h = a_2 = 0.70$ . The SUM (solid yellow) is what a real empirical variogram with two structures would look like: a steep near-origin rise (driven by the short-range component plus the nugget), a brief plateau / shoulder around $h \approx a_1$ , then a second slower rise that levels off at $h \approx a_2$ . Total sill = 0.05 + 0.30 + 0.50 = 0.85.

Second, slide $a_2$ down toward $a_1$ . Watch what happens to the summed total. At $a_2 \approx 2,a_1$ the two component knees start to merge into a single rounded shoulder; at $a_2 \approx a_1$ the sum looks like a single Spherical at the shared range with the combined sill $c_1 + c_2$ . THIS is why nested-structure resolvability needs a scale gap: with $a_2 / a_1 < 3$ the two components are not separable from the binned empirical variogram alone. The §4.4 honest caveat: practitioner nested models almost always have a_2/a_1 in 5–20×, deliberately chosen well past the resolvability threshold.

Third, change component 1's family to Gaussian and watch the curve near the origin. The Gaussian $\rho$ rises with zero tangent and curls upward parabolically (§4.1 — the parabolic near-origin behaviour). The total γ(h) inherits that smoothness near $h = 0$ : instead of leaving the c₀ stub with a linear lift-off (Spherical / Exponential short range), it leaves the stub with a parabolic lift-off. This is how the per-component family choice controls the near-origin smoothness of the COMPOUND model. The §4.4 widget makes that geometry visible.

The two-structure + nugget model — the workhorse form

The general nested form with $K$ components is mathematically clean but most real-data fits use the simplest non-trivial case: $K = 2$ . The TWO-STRUCTURE + NUGGET model

\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0] \;+\; c_1 \cdot \rho_{\mathrm{Sph}}(h; a_1) \;+\; c_2 \cdot \rho_{\mathrm{Sph}}(h; a_2),

with $a_2 \gg a_1$ , is the dominant form in published mining variographies and reservoir-characterisation work. It has six parameters ( $c_0, c_1, a_1, c_2, a_2$ plus family choice), which is at the upper end of what limited-pair-count empirical variograms can reliably constrain. Adding a third structured component ( $K = 3$ ) requires either denser sampling or a strong physical prior on the third range; without one, the per-component parameters become non-identifiable from data alone.

Why two structured components, not one? Because real geological fields almost always have processes at multiple scales acting simultaneously:

Mining grades — short-range (10–50 m, vein-scale heterogeneity from individual mineralisation events; the high-grade pods that drive grade-control campaigns) plus long-range (200–2000 m, regional / structural; the larger orebody envelope and its grade trend). The two scales are physically distinct: short-range captures mineralogical micro-controls, long-range captures the ore-body geometry. Journel & Huijbregts 1978 §II.A.4 documents this two-scale pattern as the modal form fitted in the original Fontainebleau-tradition mining campaigns.
Petroleum reservoirs — short-range (1–10 m, lamination-scale heterogeneity within a single bedding unit; the bedform-scale porosity variability) plus long-range (100–1000 m, depositional-facies-scale; the channel-bar, channel-fill, levee morphologies of a fluvial system). Pyrcz & Deutsch 2014 §5 documents typical reservoir analogue values; the short-range / long-range ratio sits in 10–100× in published examples.
Hydrology — short-range (local hydraulic conductivity heterogeneity within a single sedimentary unit; tens of metres) plus long-range (formation-scale trend in the depositional environment; hundreds to thousands of metres). The two-scale model lets the kriged map respect both the local conductivity contrasts (drilling targets) and the regional gradient (basin-scale flow direction).
Geochemistry / contamination — short-range (local-source heterogeneity; tens of metres around individual spill or seepage points) plus long-range (regional gradient; the diffusive plume on a kilometre scale). Goovaerts 1997 §4.4 develops the nested-structure analysis for soil contamination fields.

Each pair of (short-range, long-range) values reports a different physical story. The total sill $c_0 + c_1 + c_2$ is the total variance of the field; the partition $(c_0, c_1, c_2)$ tells you what fraction of that variance lives at what scale. A field with $c_1 / (c_1 + c_2) = 0.8$ is "short-range dominated" — most of its variance is at the fine scale, and the kriged map should preserve that fine-scale texture. A field with $c_2 / (c_1 + c_2) = 0.8$ is "long-range dominated" — most of the variance is the broad trend, and the kriged map will look smooth with only modest fine-scale variation. The shape of the kriged map is set by which partition the data implies, and the §4.4 nested model is what makes that partition explicit.

Fitting nested structures — strategy (§4.5 develops the machinery)

The formal fitting machinery — weighted least squares, maximum likelihood, REML — is §4.5's job. The §4.4 STRATEGY for nested models, codified in Goovaerts 1997 §4.2 and Chilès & Delfiner 2012 §2.5, is:

Identify breaks in slope. Plot $\hat\gamma(h)$ and look for changes in slope: a steep rise that suddenly slows, or a plateau followed by a second rise. Each break is a candidate component boundary. Two breaks ⇒ two components plus nugget; three breaks ⇒ three components (rare and usually over-fit).
Estimate the nugget by short-lag extrapolation. Read the smallest-lag bins and extrapolate the empirical curve back to $h = 0^+$ . The intercept is $c_0$ . Recall the §4.2 caveat: the visible smallest-lag bin under-reads the true nugget by sampling noise, so use the SHAPE of the curve, not a single point.
Fit the short-range component to the steep near-origin rise. Subtract $c_0$ from the empirical curve and look at the region between the first lag and the first break. Pick a family (Spherical for finite-range patches, Exponential for continuous rough decorrelation, Gaussian for smooth fields — see §4.1). Set $a_1$ to the lag at the first break / shoulder; set $c_1$ so the component's sill matches the empirical plateau / shoulder height above $c_0$ .
Fit the long-range component to the residual climb. What remains of the empirical curve above the first plateau is the long-range structure. Pick a family for it (often the same as the short-range family); set $a_2$ to the lag at which the total empirical curve finally levels off; set $c_2 = \text{total sill} - c_0 - c_1$ so the modelled total sill matches the empirical asymptotic plateau.
Iterate. The component fits are not independent — adjusting $a_1$ changes how much of the empirical curve "belongs" to the long-range component, which changes $c_2$ and $a_2$ , which feeds back. Adjust each parameter in turn, watching the residual SSE drop with each adjustment. §4.5 formalises this with weighted least squares; the §4.4 widgets give you the geometric intuition.

The strategy is sequential — fit the nugget first, then the short-range, then the long-range — because each later component refines a residual that the earlier components left behind. This is essentially a coordinate-descent optimisation in parameter space; it converges to a sensible local optimum because the components have well-separated scales (so the parameters are decoupled across scales). §4.5 covers the joint-optimisation alternative and the conditions under which it is worth using.

The second widget for §4.4 makes the fitting strategy concrete. A synthetic empirical variogram is drawn from a KNOWN nested truth (nugget + short-range Spherical + long-range Spherical). The reader works through three explicit stages: Stage 1 fits the nugget $c_0$ alone; Stage 2 adds a short-range component $(c_1, a_1)$ ; Stage 3 adds the long-range component $(c_2, a_2)$ . Per-stage weighted SSE is reported live, so each added component's contribution to fit quality is visible.

Three things to do with this widget. First, at Stage 1, slide $c_0^{\mathrm{fit}}$ until the dashed fit line's intercept matches the smallest-lag binned points. Stage-1 SSE will be large (the model is a flat line at $c_0$ while the bins climb past it). Click Stage 2: now the dashed curve has an elbow at $a_1^{\mathrm{fit}}$ instead of being flat. Adjust $a_1^{\mathrm{fit}}$ and $c_1^{\mathrm{fit}}$ until the fit traces the bins through the first break. Stage-2 SSE has dropped from Stage-1 SSE — the short-range component is doing work.

Second, at Stage 3, the long-range component $(c_2^{\mathrm{fit}}, a_2^{\mathrm{fit}})$ enters. Adjust $a_2^{\mathrm{fit}}$ to match the location of the final empirical plateau; adjust $c_2^{\mathrm{fit}}$ so the fit's asymptotic level matches the empirical sill. Stage-3 SSE drops again — each component's addition cuts residual variance. The percent-of-variance-explained readout for each stage tells you, quantitatively, how much of the binned-empirical signal each component captures. Toggle "Reveal truth" to compare the fit to the generator. Click "Resample noise" to see how the per-stage SSE drops behave under different noise realisations of the same truth.

Third, deliberately TURN OFF the long-range fit (set $c_2^{\mathrm{fit}} = 0$ ) and look at the Stage-3 SSE. It is no better than Stage 2 — without the second component, the model cannot capture the long-lag climb, so Stage 3 collapses to Stage 2 in fit quality. The point of nested modelling is precisely that each component adds explanatory power; remove any one and the residual variance the others can't reach stays in the residual SSE. This is the empirical justification for nested structures: real empirical variograms have multi-scale structure, and a single-family model leaves systematic residuals that a nested model resolves.

Per-component anisotropy and zonal anisotropy

The §4.4 sum-of-permissibles theorem extends naturally to the anisotropic case from §4.3. Each nested component can wrap its own permissible family inside its OWN anisotropy ellipsoid:

\gamma(\mathbf{h}) \;=\; c_0 \cdot \mathbb{1}[\|\mathbf{h}\| > 0] \;+\; \sum_{k=1}^{K} c_k \cdot \rho_k\!\left(\|\mathbf{A}_k\,\mathbf{h}\|; 1\right),

with a different anisotropy matrix $\mathbf{A}_k$ for each component. The construction stays permissible (each per-component term is permissible by the §4.3 wrap-in-A construction; the sum is permissible by the §4.4 sum theorem). Each $\mathbf{A}_k$ encodes its own orientation and per-direction ranges; the components can have entirely different anisotropies, capturing physically different processes.

Different orientations across components are not just permitted — they're common. In a fluvial reservoir, the SHORT-RANGE component often captures lamination-scale heterogeneity oriented along bedding dip (a few degrees off horizontal), while the LONG-RANGE component captures channel-scale heterogeneity oriented along the paleo-flow direction (the channel's azimuth). The two orientations are physically distinct and the nested anisotropy model captures both. In a fractured reservoir the short-range component might align with the dominant joint set while the long-range component aligns with the regional structural fabric. Pyrcz & Deutsch 2014 §5 documents the per-component-anisotropy modelling practice in reservoir-characterisation workflows.

Zonal anisotropy — the case §4.3 deferred to §4.4 — fits naturally here. A zonal-anisotropy component contributes EXTRA sill in one direction only, with effectively infinite range along the others. In 2-D, model it as a nested component with $a_{\mathrm{maj}}$ very large (so the component barely starts climbing along the major direction over the data's lag range) and $a_{\mathrm{min}}$ small (so the component reaches its sill quickly along the minor direction). The variogram along the major direction sees this component as a slow drift; the variogram along the minor direction sees it as a finite-range structure. The combined model has DIFFERENT effective sills along different directions — the signature of zonal anisotropy in the empirical variogram (Goovaerts 1997 §4.3; Wackernagel 2003 §7.5).

Practical zonal-anisotropy modelling is a balance. Strict zonal anisotropy (infinite range along non-zonal directions) is permissible only as a limit of finite-range components; for numerical stability the practitioner typically fits a long-but-finite range in the non-zonal directions. The resulting nested model has 6–10 free parameters per zonal component plus 6 per geometric component — at the upper end of what limited-pair-count empirical variograms can constrain. Reserve zonal-anisotropy components for cases where the directional empirical variograms clearly plateau at different levels, not for routine geometric-anisotropy fits.

The hole effect — non-monotonic γ for cyclic processes

Most variogram models are monotonic: $\gamma(h)$ rises with $h$ up to a sill and stays there (or, for the power model, keeps rising). Some real fields show a NON-MONOTONIC empirical variogram: γ̂ rises to a peak at some intermediate lag, then DIPS BELOW the apparent sill, then returns to it. The dip is called the HOLE EFFECT (Chilès & Delfiner 2012 §2.5; Journel & Huijbregts 1978 §II.A.4).

The hole effect is a signature of CYCLIC behaviour at intermediate scale. The classic earth-science example is bedded sedimentary rocks: alternating high-permeability and low-permeability layers at a characteristic bedding spacing $T$ . Two samples separated by $h = T/2$ are in OPPOSITE phases of the cycle (one in a high-perm layer, the other in a low-perm layer) — maximally different. Two samples separated by $h = T$ are in the SAME phase — more similar than the $h = T/2$ pair, so γ at $h = T$ is LOWER than γ at $h = T/2$ . The variogram traces this in-phase / out-of-phase oscillation, dipping at $h = T$ before returning toward the sill.

The simplest hole-effect variogram is the COSINE-DECAY form

\gamma_{\mathrm{hole}}(h) \;=\; c \cdot \left[1 - \cos\!\left(\dfrac{2\pi h}{a}\right)\right] \quad \text{(undamped, 1-D only).}

This form is permissible in $\mathbb{R}^1$ but NOT in $\mathbb{R}^d$ for $d \ge 2$ — the cosine covariance is positive semi-definite on the line but not on the plane (Chilès & Delfiner 2012 §2.5 gives the proof via Bochner's theorem on the spectral measure). For practical 2-D / 3-D use, dampened forms like $\gamma(h) = c \cdot [1 - \exp(-h/a) \cos(2\pi h / T)]$ (Wackernagel 2003 §7.4 catalogues several) are permissible across dimensions, at the cost of additional parameters.

Hole-effect models are uncommon in routine practice. They require (1) a real cyclic structure in the field, (2) sufficient lag coverage at and beyond the cycle period to see the dip, and (3) the dampened multi-dimensional form to be permissible. When all three conditions hold — bedded sedimentary geology, dense sampling, and 2-D / 3-D modelling — they're a worthwhile extension. Most variogram reports default to monotonic permissible families and flag the hole effect as something to look for in the empirical curve but rarely fit explicitly.

Honest caveats — stop at 2-3, watch for over-fitting

Nested-structure modelling has known failure modes. The §4.4 honest caveats:

Stop at 2-3 components. The general nested form admits $K$ components with no upper limit. Most published variogram fits in mining and reservoir geostatistics use $K = 1$ (single structured component plus nugget) or $K = 2$ . $K = 3$ appears occasionally in fields with three clearly-separated physical scales (e.g., bedding + bar + channel in a fluvial reservoir). $K \ge 4$ almost always means OVER-FITTING the noise in $\hat\gamma$ rather than identifying real spatial processes. The empirical variogram has $\sim 12-25$ bins of useful data; fitting six free parameters from one component plus three from each additional component eats up the degrees of freedom rapidly. A defensible fit prefers $K = 1$ or $K = 2$ and only reaches for $K = 3$ with a strong physical motivation.

Per-component ranges are not always resolvable. Even with two components, the data must support the scale separation. If $a_2 / a_1 < 3$ , the binned empirical variogram cannot distinguish two components from one component at the geometric-mean range. The pair-count budget $N(h_k)$ per bin caps the resolution: a fit with $a_1 = 100$ m and $a_2 = 150$ m on a campaign with 8 lag bins is identifying parameters from one component, not two. The practitioner rule of thumb (Pyrcz & Deutsch 2014 §5; Goovaerts 1997 §4.2): fit nested structures only when $a_2 / a_1 \ge 5$ . The §4.4 widgets visualise this resolvability threshold — set $a_2 \approx 1.5,a_1$ and watch the two component curves overlap so closely that the summed total looks indistinguishable from a single Spherical at the joint range.

The nugget is always present. Every nested model includes $c_0$ as the first term — even if you fit $c_0 = 0$ , the parameter is in the model. Treating the nugget as one of the $K + 1$ terms in $\gamma = c_0 + \sum_{k=1}^{K} c_k \rho_k$ is the consistent practice. The §4.2 honest caveats about measurement error vs microscale variability extend to the nested case: the $c_0$ in a nested model can still arise from either source, and the kriging-mode choice (denoise vs interpolate) still has to be made.

Component family choice is constrained by the near-origin behaviour. The §4.1 origin-behaviour theorem (Stein 1999 §2.7) connects near-origin variogram smoothness to process smoothness. In a nested model the SHORTEST-RANGE component dominates near-origin behaviour — its near-origin shape is the field's effective smoothness. If the short-range component is Gaussian (parabolic origin), the field is effectively differentiable; if it is Spherical or Exponential (linear origin), the field is continuous-but-not-differentiable. The long-range component contributes essentially nothing to the very-near-origin behaviour because $\rho_k(h; a_2)$ for $a_2 \gg a_1$ is approximately linear-in-h at small h regardless of family. So the family choice for the LONG-range component is less consequential than the choice for the SHORT-range; spend the modelling effort where it matters.

Cross-validation provides a sanity check. The defensible practice for nested-model selection is to fit $K = 1$ and $K = 2$ (and possibly $K = 3$ ) versions of the model and compare them via leave-one-out cross-validation on the data (Goovaerts 1997 §6.5). Each version produces a kriged prediction at each held-out sample location; the prediction errors are compared. A K=2 model that beats K=1 by a meaningful margin in cross-validation MSE justifies the added parameters; one that doesn't is over-fitting. §4.5 develops the formal model-selection workflow.

Try it

In nested-structure-builder, set $c_0 = 0.05$ , component 1 to Spherical with $c_1 = 0.30$ and $a_1 = 0.15$ , component 2 to Spherical with $c_2 = 0.50$ and $a_2 = 0.75$ . Read the total sill (0.85) and the scale gap (5×). The summed γ(h) shows a steep near-origin rise, a shoulder near $a_1$ , and a slower rise to the sill near $a_2$ . Compute the variance partition: nugget $5/85 = 6%$ , short-range $30/85 = 35%$ , long-range $50/85 = 59%$ — a long-range-dominated field.
Still in nested-structure-builder, slide $a_2$ down from 0.75 to 0.20 (closer to $a_1$ ). Watch the two dashed component curves overlap progressively, and the summed total morph from a clear two-knee shape into a single rounded shoulder. At $a_2 / a_1 \approx 1.3$ , the bins cannot tell the difference between this nested model and a single Spherical with $c = c_1 + c_2 = 0.80$ at the geometric-mean range. This is the resolvability threshold.
In nested-structure-builder, change component 1 to Gaussian and look at the near-origin region. The summed γ(h) now leaves the $c_0$ stub with a parabolic lift-off instead of the linear lift-off of a Spherical short-range. Recall §4.1 — Gaussian without a nugget is dangerous (ill-conditioning). Slide $c_0$ up to 0.05 and the Gaussian-component-induced ill-conditioning is regularised; the model is now usable.
In multi-range-fitting, start at Stage 1 with the default truth (c₀=0.10, c₁=0.30, a₁=0.10, c₂=0.45, a₂=0.55). Slide $c_0^{\mathrm{fit}}$ to match the visible smallest-lag bins (≈0.10–0.13). Note the Stage-1 wSSE. Now click Stage 2 and adjust $(c_1^{\mathrm{fit}}, a_1^{\mathrm{fit}})$ to trace the steep near-origin rise. The Stage-2 wSSE drops sharply — the short-range component is doing most of the work fitting the bins below $h \approx 0.20$ .
In multi-range-fitting at Stage 3, adjust $(c_2^{\mathrm{fit}}, a_2^{\mathrm{fit}})$ to trace the gentler long-range climb. The Stage-3 wSSE drops further. Toggle "Reveal truth" to compare your three-stage fit to the generating curve. How close did you get? The "% variance explained" readout for each stage tells you the relative contribution of each component to the total fit quality.
In multi-range-fitting, deliberately set $c_2^{\mathrm{fit}} = 0$ at Stage 3 (turn off the long-range component). The Stage-3 wSSE no longer improves on Stage 2 — without the second structured component, there is no further fit improvement available. This is the empirical justification for nesting: each component adds explanatory power that is unavailable to the simpler model.
Without coding: a reservoir-characterisation team reports a nested variogram fit "nugget 0.04, Spherical short-range c₁=0.18 at a₁=12 m, Spherical long-range c₂=0.32 at a₂=180 m". Compute the total sill, the variance partition across the three terms, and the scale gap $a_2/a_1$ . Classify the model: nugget-heavy or signal-heavy? Short-range or long-range dominated? Is the scale gap above or below the resolvability threshold?
Without coding: a mining-grade variogram shows a clear empirical dip — γ̂ rises to a peak near $h = 25$ m, dips to a local minimum near $h = 50$ m, and rises again toward a sill near $h = 100$ m. Diagnose. Is this a candidate hole-effect model? What physical process could produce a cycle at $\sim 50$ m? Would you fit the cosine-decay form, or look for a confounder (sampling-design artefact, mis-specified bin centres) before committing to a hole-effect interpretation?

Pause and reflect: nested structures combine multiple permissible variograms into a single multi-scale model. The total sill is the sum of per-component contributions, and the data must support the scale separation for the components to be resolvable. What would you most want to document in a defensible variogram report about a nested model — the per-component family choices, the scale ratios, the variance partitions, the cross-validation evidence — so that a downstream geomodeller can verify the model was justified by the data rather than over-fit to noise?

What you now know — and what Part 4 will build on it

You can state the sum-of-permissibles theorem: if $\gamma_1$ and $\gamma_2$ are permissible (conditionally negative definite), so is $\gamma_1 + \gamma_2$ . You can write the general nested model $\gamma(h) = c_0 \cdot \mathbb{1}[h>0] + \sum_{k=1}^K c_k , \rho_k(h; a_k)$ and compute its total sill as the sum of per-component contributions. You know that subtraction is NOT permissibility-preserving and that "fitting by drawing whatever curve looks reasonable" almost always violates CND somewhere.

You can build the two-structure-plus-nugget model — the practitioner workhorse — from a pair of §4.1 families plus a nugget, and you can interpret each component's sill and range in terms of a physical process operating at that scale. You know the typical scale ratios (5–20× in mining, 10–100× in fluvial reservoirs) and the resolvability rule of thumb ( $a_2 / a_1 \ge 5$ for the components to be separable from binned empirical data). You can apply the sequential fitting strategy — nugget by extrapolation, then short-range to the steep near-origin rise, then long-range to the residual climb, then iterate — and you understand why each component's contribution to fit quality is visible in the per-stage SSE drop.

You can extend the nested model to the anisotropic case from §4.3: each component carries its own anisotropy ellipsoid, the per-component orientations are independent, and the sum is still permissible. You can describe zonal anisotropy as a nested component with very long range along non-zonal directions and short range along the zonal direction. You know that the hole-effect cosine-decay model exists for cyclic processes (bedded sediments) but is permissible only in 1-D in its undamped form and requires dampening for multi-dimensional use.

You can apply the honest caveats: stop at 2-3 components; per-component ranges are not resolvable below a scale ratio of about 3-5; the nugget is always one component; the short-range component dominates near-origin smoothness; cross-validation is the defensible model-selection criterion. You understand why baroque nested models with $K \ge 4$ usually over-fit and why simpler well-separated nested models generalise better.

Part 4 closes here with §4.5 — FITTING STRATEGIES — which develops the formal machinery: by-eye fitting (§4.4 strategy made rigorous), by weighted least squares (the field standard), by maximum likelihood and REML (the statistician's preference). §4.5 also covers cross-validation as the model-selection criterion that compares nested models of different complexity. Part 5 then plugs the fitted nested model into the kriging system: the (possibly anisotropic, possibly nested) γ(h) becomes the entries of the kriging matrix, with per-component anisotropy applied to each lag distance separately and the search-ellipse machinery from §4.3 selecting the local neighbourhood.

The §4.4 sum-of-permissibles rule is the lever. Real variograms have multi-scale structure. A single-family model can only ever capture one scale. Nested models, built from sums of permissibles with carefully chosen scale gaps, capture the full multi-scale story while preserving the permissibility constraint that makes kriging's guarantees hold. Get the nesting right and the kriged map respects every scale of the field's variability simultaneously.

References

Matheron, G. (1971). The Theory of Regionalized Variables and Its Applications. Les Cahiers du Centre de Morphologie Mathématique, No. 5. Fontainebleau, France: École Nationale Supérieure des Mines de Paris. (The foundational reference. Establishes the permissibility constraint for the variogram and the sum-preserving property under which nested structures are themselves permissible. Lays out the §II.B framework for nested-component decomposition that subsequent textbooks elaborate.)
Journel, A.G., Huijbregts, C.J. (1978). Mining Geostatistics. Academic Press. (§II.A.4 is the canonical practitioner reference for nested-structure modelling. Documents the two-structure-plus-nugget workhorse form for South African gold and other base-metal deposits, the per-component physical interpretations, and the by-eye fitting strategy that subsequent textbooks formalise. The book that established nested structures as the default form in mining-geostatistics practice.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.2 develops the nested-structure machinery for environmental and natural-resources applications. §4.3 covers zonal anisotropy as a nested component with very long range along the non-zonal directions. §6.5 develops cross-validation as the model-selection criterion. The standard practitioner-oriented reference for the §4.4 material at a graduate-school level.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (§2.5 covers nested-structure modelling and the sum-of-permissibles theorem at a mathematical-statistics level of rigour. The hole-effect cosine-decay model and its dimension-dependent permissibility are developed in this section. The cleanest derivation of why CND is preserved under variogram sums.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (§2.5 develops the permissibility-preserving operations on variograms, including the sum theorem and the catalogue of constructions that preserve CND. The cleanest mathematical-statistics treatment of why nested structures stay permissible.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (§III.4 documents how nested variogram models are encoded and consumed in GSLIB. Each nested component is specified separately (family, sill, range, optional per-component anisotropy) and summed inside the kriging routines. The canonical software-side reference; reading it is the way to verify that a nested-model fit can be re-implemented identically in another package.)
Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling (2nd ed.). Oxford University Press. (§5 catalogues nested-structure parameters across reservoir analogue datasets — fluvial, deltaic, carbonate. Documents the typical scale ratios (10-100× horizontal-to-vertical, 5-20× across structured components) and the per-component-anisotropy practice for capturing distinct depositional processes within a single reservoir model.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 16 introduces nested-structure modelling at an entry level. The two-structure-plus-nugget Spherical example is the canonical first nested model that subsequent textbooks generalise. The book's pedagogy on by-eye fitting strategy informs the multi-range-fitting widget design.)
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications (3rd ed.). Springer. (§7.4 catalogues additional permissible variogram families including damped hole-effect forms. §7.5 develops zonal anisotropy as a nested-structure construction and connects it to the broader linear-coregionalisation framework that the §4.4 machinery generalises to multivariate variograms.)