Permissible model families (spherical, exponential, Gaussian)

Part 4 — Variogram modeling

Learning objectives

State the PERMISSIBILITY requirement: a variogram model is permissible if and only if it is CONDITIONALLY NEGATIVE DEFINITE (CND), equivalently if the matching covariance $C(h) = C(0) - \gamma(h)$ is POSITIVE SEMI-DEFINITE (PSD). The CND/PSD constraint is what guarantees the kriging variance $\sigma_K^2 \ge 0$ for every linear estimator — fit any old curve to $\hat\gamma$ and the kriging system can return negative variances (physically impossible)
Write the SPHERICAL family $\gamma_{\mathrm{Sph}}(h) = c \cdot [1.5\,(h/a) - 0.5\,(h/a)^3]$ for $h \le a$ , else $c$ . Reaches the sill EXACTLY at $h = a$ ; $a$ is a TRUE range. Linear at the origin ( $\gamma \sim 1.5\,c\,(h/a)$ ) — corresponding process is continuous but not differentiable. Permissible in 1D, 2D, 3D (Matheron 1971)
Write the EXPONENTIAL family $\gamma_{\mathrm{Exp}}(h) = c \cdot [1 - \exp(-h/a)]$. Approaches the sill ASYMPTOTICALLY. PRACTICAL RANGE $3a$: the lag at which $\gamma \approx 0.95\,c$ (Journel & Huijbregts 1978 convention). Linear at the origin ($\gamma \sim c\,h/a$): process continuous but not differentiable, ROUGHER than spherical when the two are matched on practical range $R$ (origin slope $3c/R$ versus $1.5\,c/R$)
Write the GAUSSIAN family $\gamma_{\mathrm{Gau}}(h) = c \cdot [1 - \exp(-(h/a)^2)]$ . Approaches the sill ASYMPTOTICALLY. PRACTICAL RANGE $a\sqrt{3}$ — the lag at which $\gamma \approx 0.95\,c$ . PARABOLIC at the origin ( $\gamma \sim c\,(h/a)^2$ ) — process is DIFFERENTIABLE, very smooth. Barely permissible; numerically dangerous in pure form (Wackernagel 2003 §7.3, Chilès & Delfiner 2012 §2.5)
Read NEAR-ORIGIN BEHAVIOUR as process smoothness: $\gamma(h) \sim h$ (linear) ⇒ continuous-not-differentiable field (Spherical, Exponential); $\gamma(h) \sim h^2$ (parabolic) ⇒ differentiable field (Gaussian). The shape of $\gamma$ at the origin IS the smoothness of $Z(s)$ . This is THE single most important diagnostic for choosing a model family
Pick a family by matching observed Z behaviour: drillhole grades, soil contamination, porosity in a single facies — Exponential (continuous, slightly rough). Mining grade with a finite-range structure — Spherical. Highly smooth fields (rare; some seismic-derived attributes; bathymetry) — Gaussian, but always with caveats and always with a nugget. Default to Spherical or Exponential; reach for Gaussian only with a physical justification
Recognise NESTED STRUCTURES as a permissibility-preserving operation: the SUM of permissible variograms is permissible (§4.4 develops this). Common nested form: nugget $c_0$ + spherical at short range $a_1$ + spherical or exponential at long range $a_2 \gg a_1$ . Subtracting permissible variograms is NOT generally permissible — that's a frequent novice mistake when fitting by eye
Treat the GAUSSIAN model honestly: it is BARELY permissible in $\mathbb{R}^d$ for $d \ge 1$ (its covariance is PSD by construction) but the kriging matrices it produces are extremely ill-conditioned, kriging weights oscillate wildly, and tiny perturbations to the data produce large changes in the kriged map. Standard practice: never use a pure Gaussian model — always pair it with a nugget $c_0 > 0$ to regularise, or prefer Spherical / Exponential / Matérn outright
Know the POWER MODEL $\gamma_{\mathrm{Pow}}(h) = c \cdot h^\omega$ for $0 < \omega < 2$ : UNBOUNDED — no sill, only intrinsic stationarity (not second-order). Permissible for the same parameter range, but used mainly for fractal / self-similar processes that have no characteristic scale. Uncommon in routine geostatistics; reach for it only if $\hat\gamma(h)$ grows monotonically over the full data range with no sign of leveling off
Locate §4.1 honestly in Part 4: this section opens the modelling chapter. §4.2 reintroduces the nugget and fits it. §4.3 develops the anisotropic ellipsoid that wraps these isotropic families. §4.4 builds nested structures. §4.5 covers fitting strategies. Part 5 plugs the fitted model into kriging — and a wrong family choice in §4.1 propagates silently into bad kriged maps no matter how clean the rest of the workflow is

Part 3 produced an EMPIRICAL variogram $\hat\gamma(h)$: six sections of careful work to give you the most defensible binned estimate of $\gamma$ that any reasonable dataset can support. §3.1 defined what $\gamma(h)$ is theoretically; §3.2 binned it from data; §3.3 added directional variants; §3.4 lifted isotropy; §3.5 indicator-transformed it for categorical analysis; §3.6 made it outlier-resistant. By the end of Part 3 you had a binned set of points $\{(h_k, \hat\gamma(h_k)) : k = 1, \ldots, n_{\mathrm{bins}}\}$ on a finite grid of lag distances, possibly with directional and categorical variants, but a SET OF POINTS, not a smooth continuous function. Part 4 picks up here. Part 4's job is to fit a continuous, smooth, PARAMETRIC model $\gamma(h; \boldsymbol\theta)$ to those binned points so the result can be evaluated at any lag $h$, which is exactly what the kriging system in Part 5 needs.

The §4.1 question is: which functional forms are allowed? You cannot just spline-fit any curve to $\hat\gamma$ . The variogram must satisfy a precise mathematical condition called conditional negative definiteness (CND), and only a small canonical set of functional forms — the so-called permissible families — is known to satisfy it for general dimensions. Pick a permissible family and you are guaranteed that the kriging matrices in Part 5 will be positive definite, the kriging variance will be non-negative, and the predictor will be the BLUP (Best Linear Unbiased Predictor). Pick a non-permissible curve and the kriging system can return NEGATIVE variances at some target locations — physically impossible, and a silent diagnostic that something has gone wrong upstream.

This section introduces the three canonical isotropic families — Spherical, Exponential, and Gaussian — that account for the vast majority of variogram fits in published earth-science geostatistics. It explains the permissibility constraint, the formulae and parameters of each family, the near-origin smoothness diagnostic that picks among them, the unbounded power-model exception, and the rule that sums of permissibles stay permissible (the preview to nested structures in §4.4). It closes with an honest treatment of the Gaussian model's practical pitfalls. The first widget puts all three families on one γ(h) axis; the second draws Cholesky realisations from each family so you can SEE that the shape of $\gamma$ at the origin determines the SMOOTHNESS of $Z(s)$ . By the end you will be able to look at an empirical variogram and propose a defensible family.

Why "permissible"? Conditional negative definiteness in plain English

The variogram models we will fit are not arbitrary curves. The fundamental requirement is:

For any finite set of sample locations $\mathbf{s}_1, \ldots, \mathbf{s}_n$ and any real weights $\lambda_1, \ldots, \lambda_n$ with $\sum_i \lambda_i = 0$ :

\sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \, \gamma(\|\mathbf{s}_i - \mathbf{s}_j\|) \;\le\; 0.

This is called conditional negative definiteness (CND): the double sum is non-positive for every weight vector whose components sum to zero. Equivalently, under second-order stationarity where $C(h) = C(0) - \gamma(h)$, the matching covariance is POSITIVE SEMI-DEFINITE (PSD): $\sum_i \sum_j \lambda_i \lambda_j \, C(\|\mathbf{s}_i - \mathbf{s}_j\|) \ge 0$ for every real weight vector (Cressie 1993 §2.5; Christakos 1984).

Why does this matter? The Part 5 kriging variance at a target location $\mathbf{s}_0$ from a linear combination $Z^*(\mathbf{s}_0) = \sum_i \lambda_i Z(\mathbf{s}_i)$ with $\sum_i \lambda_i = 1$ (the unbiasedness constraint) works out to

\sigma_K^2(\mathbf{s}_0) \;=\; 2 \sum_{i} \lambda_i \gamma(\|\mathbf{s}_i - \mathbf{s}_0\|) \;-\; \sum_{i, j} \lambda_i \lambda_j \, \gamma(\|\mathbf{s}_i - \mathbf{s}_j\|).

To see the connection with CND, append the target as a zeroth point and set $\mu_0 = -1$ and $\mu_i = \lambda_i$ for $i = 1, \ldots, n$. The unbiasedness constraint makes these augmented weights sum to zero, and because $\gamma(0) = 0$ the kriging variance is exactly $\sigma_K^2(\mathbf{s}_0) = -\sum_{i=0}^{n} \sum_{j=0}^{n} \mu_i \mu_j \, \gamma(\|\mathbf{s}_i - \mathbf{s}_j\|)$, that is, minus the CND form. If $\gamma$ is CND, that double sum is $\le 0$ and the kriging variance is non-negative for every admissible choice of weights. If $\gamma$ is NOT CND, the guarantee fails: there exist sample configurations and weights at which $\sigma_K^2 < 0$. A negative variance is physically meaningless, and an algorithm that produces one has broken its own guarantees silently: the kriged map looks normal, but the uncertainty quantification is mathematically incoherent.
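The CND inequality can be checked numerically on a small configuration. The sketch below evaluates the double sum for three collinear points with zero-sum weights, once for the exponential model and once for a made-up curve `gamma_bad` (a hypothetical "fitted by eye" shape, not a model from this section) that looks like a variogram but is not CND. The point layout, weights, and function names are illustrative assumptions.

```python
import numpy as np

def cnd_form(gamma, coords, weights):
    """Return sum_i sum_j w_i w_j gamma(|s_i - s_j|) for weights that sum to zero."""
    w = np.asarray(weights, dtype=float)
    if abs(w.sum()) > 1e-12:
        raise ValueError("weights must sum to zero")
    s = np.asarray(coords, dtype=float).reshape(len(w), -1)
    h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    return float(w @ gamma(h) @ w)

def gamma_exp(h, a=1.0, c=1.0):
    # Exponential model: permissible, so the form must be <= 0.
    return c * (1.0 - np.exp(-h / a))

def gamma_bad(h, a=1.0, c=1.0):
    # HYPOTHETICAL curve for illustration: rises smoothly to a sill, but is not CND.
    return c * (1.0 - np.exp(-(h / a) ** 4))

coords = [0.0, 0.5, 1.0]      # three collinear sample locations
weights = [1.0, -2.0, 1.0]    # components sum to zero

q_exp = cnd_form(gamma_exp, coords, weights)
q_bad = cnd_form(gamma_bad, coords, weights)
print(f"exponential : {q_exp:+.4f}")
print(f"made-up     : {q_bad:+.4f}")
```

A non-positive result on one configuration does not prove permissibility, since CND must hold for every configuration. A single positive result, however, is enough to rule a curve out.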

The permissibility constraint is therefore not a mathematical nicety; it is the foundation that makes kriging's "B.L.U.P." properties hold. Christakos 1984 lays out the full mathematical machinery; Cressie 1993 §2.5 and Chilès & Delfiner 2012 §2.4 give the practitioner-oriented treatment. In practice, "permissible" reduces to "pick a curve from the canonical list, or build one by adding members of the list together". The canonical list for isotropic 2D and 3D geostatistics is short, and the rest of §4.1 develops it.

The three canonical isotropic families

All three families have two parameters: a range $a > 0$ (a characteristic spatial scale) and a sill $c > 0$ (the asymptotic value $\gamma$ approaches at large lag). Below they are written without a nugget $c_0$ ; §4.2 puts the nugget back in. The formulae operate on the SCALAR lag $h = |\mathbf{h}| \ge 0$ — generalising to anisotropic 2D / 3D is the job of §4.3 via the search-ellipsoid transform. For now treat $h$ as one-dimensional.

Spherical

\gamma_{\mathrm{Sph}}(h) \;=\; \begin{cases} c \cdot \left[\dfrac{3}{2}\dfrac{h}{a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^3\right] & 0 \le h \le a \\ c & h \ge a \end{cases}

The spherical model is the only one of the three with a TRUE range: $\gamma_{\mathrm{Sph}}(h) = c$ exactly for every $h \ge a$. Pairs farther apart than $a$ are uncorrelated by construction. Near the origin $\gamma_{\mathrm{Sph}}(h) \approx 1.5\,c\,(h/a)$ (the linear term dominates; the cubic term contributes only at higher $h$), so the spherical variogram rises with a finite linear slope at $h = 0$. The corresponding process $Z(\mathbf{s})$ is continuous but not differentiable, typical of mining grades, sediment thicknesses, and other geological fields with a finite "patch size" beyond which correlation vanishes.

The spherical model is permissible in $\mathbb{R}^d$ for $d \le 3$ (Matheron 1971; the cubic polynomial is, up to normalisation, the volume of intersection of two balls of diameter $a$ in 3D, which is why the covariance has compact support of radius $a$). It is the historical workhorse of mining geostatistics: Krige and Matheron used it on South African gold deposits in the 1960s, and it has remained the default for most variography in mineral resources, hydrogeology, and reservoir characterisation. If you have no prior reason to favour a different family, Spherical is the conservative starting point.

Exponential

\gamma_{\mathrm{Exp}}(h) \;=\; c \cdot \left[1 - \exp\!\left(-\dfrac{h}{a}\right)\right]

The exponential model has no compact support: $\gamma_{\mathrm{Exp}}(h) < c$ for every finite $h$. The sill is approached ASYMPTOTICALLY, never reached exactly. The conventional practical range is $3a$, the lag at which $\gamma_{\mathrm{Exp}} = c \cdot (1 - e^{-3}) \approx 0.95\,c$, and the parameter $a$ itself is the "1/e range" (where $\gamma \approx 0.63\,c$). Practitioners typically QUOTE the practical range $3a$ when reporting an exponential fit; some software reports the parameter $a$ directly. ALWAYS check which convention your software uses; misreading by a factor of 3 is a common bug (Goovaerts 1997 §4.2 documents the convention pitfall).

Near the origin $\gamma_{\mathrm{Exp}}(h) \approx c\,(h/a)$ (linear), so the exponential process is also continuous but not differentiable. How its near-origin slope compares with the spherical depends on what you match. At the same parameter $a$, the exponential slope $c/a$ is shallower than the spherical slope $1.5\,c/a$. At the same practical range $R$ (exponential parameter $a = R/3$, spherical range $a = R$), the exponential slope $3c/R$ is twice the spherical slope $1.5\,c/R$. On that matched-practical-range footing the exponential process is visibly ROUGHER than the spherical at small lags. Drillhole grades, soil contamination above a background level, and many geochemical fields are fitted better by the exponential than by the spherical because the underlying physical noise has no hard decorrelation cutoff: the correlation simply decays continuously with separation.

The exponential model is permissible in $\mathbb{R}^d$ for all $d \ge 1$ (its covariance $C(h) = c \, e^{-h/a}$ is a Markovian covariance and is PSD by direct check via Bochner's theorem; see Stein 1999 §2.7).

Gaussian

\gamma_{\mathrm{Gau}}(h) \;=\; c \cdot \left[1 - \exp\!\left(-\dfrac{h^2}{a^2}\right)\right]

The Gaussian model has the same asymptotic-approach structure as the exponential (it never reaches $c$ exactly), but the exponent is QUADRATIC in $h$. The conventional practical range is $a\sqrt{3}$ (where $\gamma_{\mathrm{Gau}} = c\,(1 - e^{-3}) \approx 0.95\,c$). At $h = a$ the variogram is at $c\,(1 - e^{-1}) \approx 0.63\,c$, just like the exponential at $h = a$, but the WAY it gets there is fundamentally different.

Near the origin $\gamma_{\mathrm{Gau}}(h) \approx c\,(h/a)^2$. The tangent at $h = 0$ is ZERO; the curve curls upward parabolically. This near-origin behaviour is the most consequential feature of any variogram family: the Gaussian model corresponds to a DIFFERENTIABLE underlying process $Z(\mathbf{s})$. Realisations from a Gaussian variogram look painterly-smooth: the field has well-defined local gradients, much like a smoothly varying mathematical function rather than a rough physical signal.

Earth-science fields that match the Gaussian model are RARE. Drillhole grades, soil contamination, porosity, water-table elevation, seismic-derived attributes, geochemical concentrations — all of these are typically rough enough that the linear-origin Exponential or finite-range Spherical describes them better. Cases where Gaussian wins are unusual: a few very smooth seismic attributes (after band-pass filtering), some kinds of mean-vegetation-index data, occasional bathymetric or topographic continuity studies. The honest default is "do not use Gaussian unless you have a specific physical reason and have paired it with a nontrivial nugget". We unpack the why below.

The Gaussian model is permissible in $\mathbb{R}^d$ for all $d \ge 1$ (its covariance is the Gaussian kernel, which is PSD via Bochner's theorem applied to a positive measure on $\mathbb{R}^d$ ). It is, however, barely so — the next subsection unpacks the practical pitfall.
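The three formulae translate directly into code. The following is a minimal sketch, assuming NumPy and the parameterisation used above (parameter $a$, sill $c$, no nugget); the function names are illustrative, not taken from any package. It evaluates each family at $h = a$ and at its practical range.

```python
import numpy as np

def spherical(h, a, c=1.0):
    # Clamp h/a at 1 so that gamma equals the sill c for every h >= a.
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return c * (1.5 * r - 0.5 * r**3)

def exponential(h, a, c=1.0):
    return c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def gaussian(h, a, c=1.0):
    return c * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))

a, c = 0.30, 1.0
models = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}
# True range for Spherical, 95%-of-sill lag for the two asymptotic families.
practical_range = {"spherical": a, "exponential": 3.0 * a, "gaussian": np.sqrt(3.0) * a}

for name, model in models.items():
    at_a = float(model(a, a, c))
    at_pr = float(model(practical_range[name], a, c))
    print(f"{name:12s} gamma(a) = {at_a:.3f}   gamma(practical range) = {at_pr:.3f}")
```

The loop should show Spherical at the sill in both columns, and the two asymptotic families at about 63% of the sill at $h = a$ and about 95% at their practical ranges.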

Compare the three on one axis

The first widget for §4.1. All three families on a common $\gamma(h)$ axis, sharing the same parameter $a$ and sill $c$. Toggle each family on or off; slide $a$ and $c$; toggle the practical-range markers (where $\gamma$ reaches $c$ for Spherical, $0.95\,c$ for the asymptotic families).

Three observations to make. First, look at the ORIGIN. Spherical and Exponential rise with a finite linear tangent: the slope at $h = 0$ is $1.5\,c/a$ for Spherical and $c/a$ for Exponential. Gaussian rises with a ZERO tangent and curls upward parabolically. The eye picks up the difference immediately when the three curves are superimposed: Gaussian "hugs" the horizontal axis near $h = 0$ before lifting off, while the other two leave the axis with visible slope. THIS is the most important diagnostic of §4.1: the shape at the origin tells you which family the process belongs to.

Second, look at how each curve APPROACHES THE SILL. Spherical has a clean kink at $h = a$: the cubic-polynomial form reaches $c$ with a discontinuous second derivative there, then stays flat. Exponential and Gaussian both glide asymptotically: Exponential reaches 63% of the sill at $h = a$ and 95% at $h = 3a$; Gaussian also reaches 63% at $h = a$ but gets to 95% much sooner, at $h = a\sqrt{3} \approx 1.73\,a$. So even at the SAME parameter $a$, the three families have very different effective decorrelation distances. When practitioners report "range = 100 m" without specifying the family or the convention (true range vs 1/e range vs 95%-of-sill range), the number is ambiguous by a factor of up to 3. Always document which convention you are using.

Third, use the marker toggle. With practical-range markers on, each family's "the curve has effectively reached the sill" point sits on the curve at a different $h$ . For the same nominal parameter $a$ : Spherical's marker sits at $a$ , Gaussian's at $a\sqrt{3}$ , Exponential's at $3a$ . If your empirical variogram looks like it levels off around $h = 100$ m, that single observation is consistent with $a = 100$ m (Spherical), $a \approx 58$ m (Gaussian, via $100/\sqrt{3}$ ), or $a \approx 33$ m (Exponential, via $100/3$ ). The family choice IS the range interpretation.
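The conversion in the last observation is a single division per family. A small sketch follows, with a hypothetical helper name and the 100 m levelling-off lag from the example above.

```python
import math

# Ratio (practical range) / (model parameter a) for each family, as stated above.
RANGE_FACTOR = {"spherical": 1.0, "gaussian": math.sqrt(3.0), "exponential": 3.0}

def parameter_from_levelling_lag(h_level, family):
    """Model parameter a implied by an observed levelling-off lag h_level."""
    return h_level / RANGE_FACTOR[family]

for family in RANGE_FACTOR:
    a = parameter_from_levelling_lag(100.0, family)
    print(f"{family:12s} a = {a:6.1f} m")
```

The same table read in the other direction converts a software-reported $a$ into the practical range you would quote in a report.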

Near-origin behaviour IS process smoothness

Spend more time on this because it is the single most consequential idea in §4.1. The shape of $\gamma(h)$ at $h = 0$ controls the smoothness of the corresponding random field $Z(\mathbf{s})$ .

Spherical: $\gamma(h) \sim \dfrac{3c}{2a}\,h$ as $h \to 0$. Linear. Process is CONTINUOUS but NOT DIFFERENTIABLE. Realisations have sharp boundaries between high-value and low-value patches; you can interpolate sample-to-sample smoothly inside a patch, but there is no well-defined local gradient at a generic point.
Exponential: $\gamma(h) \sim \dfrac{c}{a}\,h$ as $h \to 0$. Linear (slope smaller than Spherical for the same $a$, but twice as steep for the same practical range). Process is CONTINUOUS but NOT DIFFERENTIABLE, and ROUGHER than the spherical analogue at matched practical range. Realisations look grainy at every scale; the field has fractal-like irregularity.
Gaussian: $\gamma(h) \sim \dfrac{c}{a^2}\,h^2$ as $h \to 0$. Parabolic: the slope at $h = 0$ is ZERO. Process is DIFFERENTIABLE. Realisations look painterly-smooth, with well-defined local gradients. A random field with a Gaussian variogram is in fact infinitely differentiable in the mean-square sense.

This is not a heuristic; it is a theorem. The smoothness of a stationary random field is precisely controlled by the behaviour of its variogram (equivalently, its covariance) at the origin. Stein 1999 §2.7 gives the formal statement: a stationary process $Z(\mathbf{s})$ has $k$ mean-square derivatives if and only if $\gamma(h)$ is $2k$-times differentiable at $h = 0$. Linear-at-origin variograms (Spherical, Exponential) correspond to non-differentiable fields ($k = 0$); parabolic-at-origin variograms (Gaussian) correspond to fields with at least one mean-square derivative ($k \ge 1$), and the Gaussian model specifically gives infinite mean-square differentiability.
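The linear-versus-parabolic distinction can be read off numerically as a log-log slope at small lags. The sketch below estimates the exponent $p$ in $\gamma(h) \sim h^p$ from two small lags; the helper `origin_exponent` and the chosen lags are assumptions made for illustration.

```python
import numpy as np

def spherical(h, a, c=1.0):
    r = np.minimum(h / a, 1.0)
    return c * (1.5 * r - 0.5 * r**3)

def exponential(h, a, c=1.0):
    return c * (1.0 - np.exp(-h / a))

def gaussian(h, a, c=1.0):
    return c * (1.0 - np.exp(-(h / a) ** 2))

def origin_exponent(model, a=1.0, x1=1e-3, x2=2e-3):
    """Log-log slope of gamma between the two small lags x1*a and x2*a."""
    h1, h2 = x1 * a, x2 * a
    return float(np.log(model(h2, a) / model(h1, a)) / np.log(h2 / h1))

exponents = {
    "spherical": origin_exponent(spherical),
    "exponential": origin_exponent(exponential),
    "gaussian": origin_exponent(gaussian),
}
for name, p in exponents.items():
    print(f"{name:12s} near-origin exponent ~ {p:.3f}")
```

An exponent near 1 corresponds to a continuous, non-differentiable field and an exponent near 2 to a differentiable one. On real data the same estimate is much less reliable, because the first few bins of $\hat\gamma$ are the noisiest.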

The practical takeaway: look at your data, decide whether the process is smooth or rough, and let that decision pick the family. Drillhole grades sampled at 5 m intervals are clearly rough — pick a linear-origin model. Smoothed seismic-attribute maps interpolated by the seismic processor before you got them are smoother — pick a Gaussian model, but understand that some of that smoothness is upstream processing, not real geology. Soil contamination above a background level is rough — Exponential. Bathymetric continuity in deep ocean is smooth — Gaussian. The diagnostic is the field, not the empirical variogram alone (because $\hat\gamma$ at small lags is the noisiest part of the binned estimate, so the origin behaviour is the hardest part of the empirical curve to read confidently — §4.2 and §4.5 develop the workflow).

The variogram model IS the field's smoothness

The second widget for §4.1. Three 28×28 unconditional realisations from the SAME shared white-noise vector, drawn from the Spherical / Exponential / Gaussian covariances at the same range and sill. The reader sees the smoothness contrast directly.

Each panel runs the same recipe: build the $784 \times 784$ covariance matrix $\Sigma_{ij} = c \cdot \rho(|\mathbf{s}_i - \mathbf{s}_j|) = c - \gamma(|\mathbf{s}_i - \mathbf{s}_j|)$ on the $28 \times 28$ grid, Cholesky-factorise it $\Sigma = L L^\top$ , and form $\mathbf{Z} = L \mathbf{z}$ where $\mathbf{z}$ is a shared standard-normal white-noise vector. The same $\mathbf{z}$ is used for all three panels — the only thing that changes between them is which $\gamma$ goes into $\Sigma$ . So the comparison strictly isolates the effect of the MODEL FAMILY on the realisation's appearance.
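The recipe above can be sketched in a few lines of NumPy. This is not the widget's own code. Two assumptions are made here: the grid covers the unit square, and a tiny diagonal jitter is added so that the nearly singular Gaussian matrix factorises in floating point. The roughness number at the end is a crude texture measure added for illustration.

```python
import numpy as np

def cov_spherical(h, a, c=1.0):
    r = np.minimum(h / a, 1.0)
    return c * (1.0 - 1.5 * r + 0.5 * r**3)

def cov_exponential(h, a, c=1.0):
    return c * np.exp(-h / a)

def cov_gaussian(h, a, c=1.0):
    return c * np.exp(-(h / a) ** 2)

n, a, c = 28, 0.25, 1.0
jitter = 1e-6 * c   # ASSUMPTION: small diagonal term for numerical stability

# Grid nodes on the unit square and the 784 x 784 matrix of pairwise distances.
axis = np.linspace(0.0, 1.0, n)
xx, yy = np.meshgrid(axis, axis)
pts = np.column_stack([xx.ravel(), yy.ravel()])
h = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

rng = np.random.default_rng(0)
z = rng.standard_normal(n * n)   # one white-noise vector shared by all three panels

families = [("spherical", cov_spherical),
            ("exponential", cov_exponential),
            ("gaussian", cov_gaussian)]
fields, roughness = {}, {}
for name, cov in families:
    sigma = cov(h, a, c) + jitter * np.eye(n * n)
    L = np.linalg.cholesky(sigma)            # sigma = L @ L.T
    fields[name] = (L @ z).reshape(n, n)
    # Mean squared difference between horizontal neighbours.
    roughness[name] = float(np.mean(np.diff(fields[name], axis=1) ** 2))
    print(f"{name:12s} roughness = {roughness[name]:.4f}")
```

Because the expected squared difference between neighbours is $2\gamma(\Delta)$ at grid spacing $\Delta$, the Gaussian panel should come out far smoother than the other two on this measure.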

Three things to do with this widget. Set range to a moderate value (say $a = 0.25$ ) and click "New realisation" three or four times. With every reseed, the three panels change values but their RELATIVE TEXTURES stay the same: Spherical is grainy with discernible patch boundaries; Exponential is grainier than Spherical at every scale; Gaussian is smooth — almost like a low-pass-filtered version of the same underlying randomness. This is what "variogram model = process smoothness" means at the pixel level.

Slide the range up. As $a$ grows, all three panels develop larger blobs — but the relative-smoothness ranking does not change. Even at $a = 0.5$ where each panel is essentially a single blob, the Gaussian panel still looks visibly smoother than the other two, because the smoothness ordering is set by the family, not the range. Slide the sill: only the colour-contrast amplitude changes (the dynamic range of the field), not the texture. The two parameters $a$ and $c$ control SCALE and AMPLITUDE; the family controls TEXTURE.

Now go back to the field on your desk and look at the texture. If a realisation from one of the widget panels "looks like the data", that family is a candidate. If the data looks grainy at every scale, consider Exponential. If it has visible finite-range patches, consider Spherical. If it looks painterly-smooth, consider Gaussian, but think hard about whether that smoothness is real geology or upstream processing.

Honest caveats: the Gaussian model is dangerous

The Gaussian model is permissible — its covariance is PSD by direct check via Bochner's theorem on the Gaussian kernel. But "permissible" is not the same as "well-behaved in finite-precision arithmetic", and the Gaussian model fails the second test badly enough that field practice rarely uses it in pure form. Two issues are worth knowing.

Ill-conditioning. The kriging matrix $\mathbf{K}_{ij} = C(\|\mathbf{s}_i - \mathbf{s}_j\|)$ built from a pure Gaussian covariance is extremely ill-conditioned when sample locations are close together: at small $h$ the covariance is $C(h) \approx c \cdot (1 - h^2/a^2)$, nearly equal to $c$ for any small $h$, so rows of $\mathbf{K}$ for nearby samples are nearly linearly dependent. The condition number of $\mathbf{K}$ grows very rapidly as the smallest pair distance shrinks relative to $a$, and kriging weights $\boldsymbol\lambda = \mathbf{K}^{-1} \mathbf{k}$ become extremely sensitive to tiny perturbations of the data. Tiny rounding noise produces large oscillations in the kriged map; near-duplicate samples can produce weights with absolute value $\gg 1$ (one weight is +50, the next is -50, and they nearly cancel). This is documented in Chilès & Delfiner 2012 §3.4 and Wackernagel 2003 §7.3 as the principal practical failure mode of the Gaussian model.

The standard fix is a non-zero nugget. Adding $c_0 > 0$ to the diagonal of $\mathbf{K}$ regularises the system; algebraically it has the same form as Tikhonov (ridge) regularisation. A nugget of even $c_0 = 0.01\,c$ (1% of the structural sill) is usually enough to make the Gaussian model usable. So if you have a real reason to fit Gaussian (very smooth field, physical model that requires differentiability), always pair it with a small nugget. §4.2 picks up the nugget story; §4.4 picks up the nested-structures story; §4.5 picks up the fitting story.
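The effect of the nugget on conditioning is easy to see on a toy layout. The sketch below compares condition numbers for 21 closely spaced samples on a line; the spacing, the value of $a$, and the 1% nugget are illustrative choices, not recommendations.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 21)            # 21 samples, spacing 0.05 (illustrative)
h = np.abs(x[:, None] - x[None, :])      # pairwise distances
a, c = 1.0, 1.0

K_exp = c * np.exp(-h / a)                           # exponential covariance
K_gau = c * np.exp(-(h / a) ** 2)                    # pure Gaussian covariance
K_gau_nugget = K_gau + 0.01 * c * np.eye(len(x))     # nugget c0 = 1% of the sill

cond_exp = np.linalg.cond(K_exp)
cond_gau = np.linalg.cond(K_gau)
cond_gau_nugget = np.linalg.cond(K_gau_nugget)

print(f"exponential       : {cond_exp:.3e}")
print(f"Gaussian, pure    : {cond_gau:.3e}")
print(f"Gaussian + nugget : {cond_gau_nugget:.3e}")
```

The pure Gaussian matrix should sit many orders of magnitude above the other two, and the nugget caps its condition number at roughly the largest eigenvalue divided by $c_0$.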

The pragmatic recommendation, which most modern textbooks endorse (Goovaerts 1997 §4.2, Chilès & Delfiner 2012 §2.5, Cressie 1993 §2.5): default to Spherical or Exponential; reach for Gaussian only when you have explicit physical justification AND you pair it with a nugget. Some practitioners go further and avoid Gaussian altogether, preferring the Matérn family (a one-parameter generalisation that interpolates between Exponential and Gaussian smoothness — covered briefly in §4.4 and developed fully in Stein 1999). The Matérn is the modern preference where infrastructure permits, because it lets you fit the smoothness parameter $\nu$ rather than committing to a binary "rough or differentiable" choice.

The power model: unbounded variograms for fractal processes

One more family deserves mention, mainly as a contrast to the three above. The power model is

\gamma_{\mathrm{Pow}}(h) \;=\; c \cdot h^\omega, \quad 0 < \omega < 2.

It has NO SILL. As $h$ grows, $\gamma_{\mathrm{Pow}}$ grows without bound: the field has no finite total variance and no characteristic decorrelation distance. The power model is permissible for $\omega \in (0, 2)$, endpoints excluded. At $\omega = 2$ the form degenerates: it corresponds to a field that is a random linear trend rather than a genuinely fluctuating process, so it is excluded from the family. For $\omega > 2$ the form is not CND. Two reference cases: $\omega = 1$ gives a linear variogram (Brownian-motion-like, the canonical fractal variogram in 1D); $\omega = 2 - \epsilon$ for small $\epsilon$ approaches parabolic behaviour at the origin (smooth fractal).
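The role of the bound $\omega < 2$ shows up already on three equally spaced points. With locations 0, 1, 2 and zero-sum weights $(1, -2, 1)$, the CND double sum for $\gamma(h) = c\,h^\omega$ works out to $2c\,(2^\omega - 4)$, which changes sign exactly at $\omega = 2$. The sketch below checks this numerically; the helper name is an assumption.

```python
import numpy as np

def power_cnd_form(omega, c=1.0):
    """CND double sum for gamma(h) = c * h**omega on points 0, 1, 2."""
    x = np.array([0.0, 1.0, 2.0])
    w = np.array([1.0, -2.0, 1.0])               # components sum to zero
    gamma = c * np.abs(x[:, None] - x[None, :]) ** omega
    return float(w @ gamma @ w)

for omega in (0.5, 1.0, 1.5, 2.0, 2.5):
    q = power_cnd_form(omega)
    closed_form = 2.0 * (2.0**omega - 4.0)
    print(f"omega = {omega:3.1f}   double sum = {q:+.4f}   closed form = {closed_form:+.4f}")
```

A negative value on this one configuration does not by itself establish permissibility for $\omega < 2$, but the positive value at $\omega = 2.5$ is a genuine counterexample to CND.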

When would you reach for the power model? When the empirical $\hat\gamma(h)$ grows monotonically over the full available lag range with NO sign of leveling off. This is the signature of a fractal / self-similar process — one that has structure at every scale, with no characteristic spatial extent. Examples in earth science are real but uncommon: fractured-rock permeability over many orders of magnitude (very heterogeneous fields with no characteristic patch size), some kinds of topographic relief over long ranges, certain atmospheric-trace-gas fields. The power model is, however, formally only INTRINSICALLY stationary (not second-order stationary): the field has no finite variance, so $C(h)$ does not exist as a finite-valued function — only $\gamma(h)$ does. The Part 5 kriging system can still be assembled (it only needs $\gamma$ values), but uncertainty quantification (Part 6) requires careful handling.

For routine geostatistics — mining grades, porosity / permeability in a single facies, contamination above background — the empirical variogram almost always shows a sill, and the power model is not the right choice. Keep it in the toolbox for the rare fractal case and reach for it when $\hat\gamma$ tells you the field has no characteristic scale.

Combining permissible models — preview of §4.4 nested structures

The permissibility constraint is preserved under a few simple operations on variogram models — and violated under others. The single most useful rule:

The SUM of two permissible variograms is permissible.

If $\gamma_1(h)$ and $\gamma_2(h)$ are both CND, then so is their sum $\gamma_1(h) + \gamma_2(h)$ . This follows directly from the linearity of the double-sum CND inequality. It is the foundation of nested structures: practitioners almost never fit a single canonical family to real data; they fit a SUM of two or three. A typical nested form is

\gamma(h) \;=\; \underbrace{c_0 \cdot \mathbb{1}[h > 0]}_{\text{nugget}} \;+\; \underbrace{c_1 \cdot \gamma_{\mathrm{Sph}}(h; a_1)}_{\text{short-range structure}} \;+\; \underbrace{c_2 \cdot \gamma_{\mathrm{Sph}}(h; a_2)}_{\text{long-range structure}}

with $a_2 \gg a_1$, where each $\gamma_{\mathrm{Sph}}(h; a_k)$ is taken with unit sill so that $c_1$ and $c_2$ carry the amplitudes. Each term contributes its own sill (so the total sill is $c_0 + c_1 + c_2$); the empirical variogram looks like the sum of three rising steps. Real data routinely has nested structure: a short-range component from local lithological variability, a long-range component from regional-scale drift or facies change, and a nugget capturing measurement error and unresolvable sub-sample-spacing variability. §4.4 develops the full nested-structures machinery.
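A nested model is just a sum of component functions. The sketch below builds the nugget plus two-spherical form written above, using unit-sill spherical components; every parameter value is made up for illustration.

```python
import numpy as np

def spherical_unit(h, a):
    """Spherical variogram with unit sill and range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def nested(h, c0, c1, a1, c2, a2):
    """Nugget c0 plus short-range (c1, a1) and long-range (c2, a2) sphericals."""
    h = np.asarray(h, dtype=float)
    nugget = c0 * (h > 0)            # indicator: zero at h = 0, c0 for any h > 0
    return nugget + c1 * spherical_unit(h, a1) + c2 * spherical_unit(h, a2)

# Illustrative values only: ranges in metres, sills in squared data units.
params = dict(c0=0.1, c1=0.4, a1=50.0, c2=0.5, a2=500.0)
lags = np.array([0.0, 1.0, 50.0, 500.0, 2000.0])
values = nested(lags, **params)
for lag, value in zip(lags, values):
    print(f"h = {lag:7.1f}   gamma = {value:.4f}")
```

The value at $h = 0$ is zero, the curve jumps by the nugget immediately after, and from $h = a_2$ onward it sits at the total sill $c_0 + c_1 + c_2$.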

Other permissibility-preserving operations: multiplying a permissible variogram by a positive constant, and multiplying two permissible covariances together (the product of two PSD covariances is again PSD by the Schur product theorem; this is a statement about covariances, and the product of two variograms is not in general a permissible variogram; see Christakos 1984 for the formal conditions). The operations that ARE NOT generally permissible: subtraction (the difference of two permissibles can violate CND) and arbitrary nonlinear transforms (e.g., squaring $\gamma$ is not generally permissible). The novice mistake is to "fit by eye" by drawing a curve that locally looks reasonable but that is not the sum of canonical families; almost any such curve violates CND somewhere on its domain.

How to pick a family in practice

The §4.5 fitting workflow develops this systematically, but here is the §4.1 short version. Given a binned $\hat\gamma(h)$ from Part 3:

Look at the near-origin behaviour. Does $\hat\gamma$ rise with a finite linear slope at small $h$ , or does it stay near zero and curl up later? Linear-at-origin ⇒ Spherical or Exponential (continuous-but-rough field). Parabolic-at-origin ⇒ Gaussian (differentiable field). This is the most diagnostic property — but also the noisiest part of the empirical curve, so do not over-interpret a single bin's value at $h_1 = \Delta$ .
Look at how the curve approaches the sill. A clean kink to a flat sill at finite $h \approx a$ suggests Spherical. An asymptotic approach without a clear corner suggests Exponential or Gaussian. The location of the practical range tells you which: if $\hat\gamma \approx 0.95\,c$ at $h$ where the curve is just leveling off, then Spherical has $a \approx h$, Gaussian has $a \approx h/\sqrt{3}$, Exponential has $a \approx h/3$.
Check for an unbounded tail. If $\hat\gamma$ keeps growing without a clear sill across the full data range, consider the power model — but first check that you haven't simply hit $h_{\max}$ short of where the sill would have appeared (the §3.2 half-domain rule). If the data has a real sill out there beyond the empirical reach, the bounded families are still right; if the data is fractal, the power model is.
Default to Spherical or Exponential. Use Gaussian only with a specific physical justification (smooth process) AND a nugget. Use the power model only for fractal cases. Use nested structures (§4.4) when one family alone cannot match both the short-lag rise and the long-lag tail.
Sanity-check against the field, not just the empirical variogram. Look at the data on a map (or in a borehole strip plot). Does the field look rough or smooth? Does it have patches with clean boundaries (Spherical) or grainy continuity (Exponential)? The empirical $\hat\gamma$ is one diagnostic; the field is another. They should agree; if they disagree the empirical variogram is probably misleading you (small $N(h)$ bins, contamination, non-stationarity).
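As a complement to the visual checks in this list, candidate families can be compared by how well each fits the binned points. The sketch below is a deliberately simple stand-in for the §4.5 workflow, not a preview of it: an unweighted least-squares grid search over $a$, with the sill solved in closed form at each step, run on a synthetic noise-free variogram. All names and numbers are assumptions.

```python
import numpy as np

def spherical_unit(h, a):
    r = np.minimum(h / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def exponential_unit(h, a):
    return 1.0 - np.exp(-h / a)

def gaussian_unit(h, a):
    return 1.0 - np.exp(-(h / a) ** 2)

FAMILIES = {"spherical": spherical_unit,
            "exponential": exponential_unit,
            "gaussian": gaussian_unit}

def fit_family(h, g_hat, shape, a_grid):
    """Grid-search a; for each a the least-squares sill c is f.g / f.f."""
    best = (np.inf, None, None)          # (sse, a, c)
    for a in a_grid:
        f = shape(h, a)
        c = float(f @ g_hat / (f @ f))
        sse = float(np.sum((g_hat - c * f) ** 2))
        if sse < best[0]:
            best = (sse, float(a), c)
    return best

# Synthetic binned variogram: noise-free exponential with a = 40, c = 2 (made up).
h = np.arange(10.0, 310.0, 20.0)
g_hat = 2.0 * exponential_unit(h, 40.0)
a_grid = np.arange(5.0, 305.0, 5.0)

results = {name: fit_family(h, g_hat, shape, a_grid) for name, shape in FAMILIES.items()}
best_family = min(results, key=lambda name: results[name][0])
for name, (sse, a, c) in results.items():
    print(f"{name:12s} a = {a:6.1f}   c = {c:.3f}   SSE = {sse:.3e}")
print("lowest SSE:", best_family)
```

On real data the residuals of two families are often close, which is why the near-origin behaviour and the look of the field carry more weight than a small difference in misfit.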

Connection to kriging — and why the family choice cascades

The fitted variogram model $\gamma(h; \boldsymbol\theta)$ is the INPUT to the Part 5 kriging system. Every entry of the kriging matrix and the right-hand-side vector is a value of $\gamma$ evaluated at a specific lag. The KRIGING WEIGHTS — the $\lambda_i$ that produce the predicted value $Z^*(\mathbf{s}_0) = \sum_i \lambda_i Z(\mathbf{s}_i)$ — are computed from those values. Different families produce different weights for the SAME sample configuration:

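Spherical: samples farther than $a$ from the target are uncorrelated with it. In simple kriging, a sample that lies beyond the range of both the target and every other sample gets exactly zero weight; in ordinary kriging, or when the sample is correlated with nearer samples, it can still receive a small weight through the unbiasedness constraint. In practice the kriging neighbourhood is finite and well-defined.
Exponential: weights decay smoothly with distance. There is no hard cutoff; even samples far from the target get a tiny non-zero weight. The kriging neighbourhood is effectively finite (the correlation with samples beyond $3a$ is below 5%) but not exactly finite.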
Gaussian: weights decay even more smoothly, but with the ill-conditioning caveat above. Weights near sample locations can become large and oscillatory — the source of the wildly-fluctuating kriged maps that the §2.5 critique of Gaussian models picks up on.
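To make the dependence of the weights on the family concrete, the sketch below solves a small ordinary kriging system in variogram form for each family on the same four samples, and evaluates the kriging variance with the formula given earlier in this section. The sample layout, target, and parameter values are invented for illustration; Part 5 develops the system properly.

```python
import numpy as np

def spherical(h, a, c=1.0):
    r = np.minimum(h / a, 1.0)
    return c * (1.5 * r - 0.5 * r**3)

def exponential(h, a, c=1.0):
    return c * (1.0 - np.exp(-h / a))

def gaussian(h, a, c=1.0):
    return c * (1.0 - np.exp(-(h / a) ** 2))

def ordinary_kriging(gamma, samples, target, a, c=1.0):
    """Solve the variogram-form ordinary kriging system; return (weights, variance)."""
    n = len(samples)
    h = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    h0 = np.linalg.norm(samples - target, axis=-1)
    A = np.ones((n + 1, n + 1))          # bordered matrix [[Gamma, 1], [1^T, 0]]
    A[:n, :n] = gamma(h, a, c)
    A[n, n] = 0.0
    b = np.append(gamma(h0, a, c), 1.0)  # right-hand side, last row enforces sum = 1
    lam = np.linalg.solve(A, b)[:n]      # drop the Lagrange multiplier
    g0 = b[:n]
    variance = 2.0 * lam @ g0 - lam @ A[:n, :n] @ lam
    return lam, float(variance)

# Made-up configuration: four samples, all within distance a of the target.
samples = np.array([[0.2, 0.0], [1.0, 0.3], [0.0, 1.2], [0.9, 1.1]])
target = np.array([0.5, 0.5])
a, c = 1.0, 1.0

out = {}
for name, model in [("spherical", spherical), ("exponential", exponential), ("gaussian", gaussian)]:
    out[name] = ordinary_kriging(model, samples, target, a, c)
    lam, var = out[name]
    print(f"{name:12s} weights = {np.round(lam, 3)}   variance = {var:.4f}")
```

The weights sum to one in every case, and the variance stays non-negative because all three models are permissible. The individual weights and the variance differ from family to family even though the data geometry is identical.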

The Part 5.8 catalogue of common kriging failure modes lists "wrong variogram family" as one of the top three. A field that is rough but fitted with a Gaussian model produces over-smoothed kriged maps that miss real variation. A field that is smooth but fitted with an Exponential model produces under-smoothed maps with artefacts near sample locations. A field with a real finite range but fitted with the power model produces unphysical long-range correlation. The family choice in §4.1 propagates forward into every downstream product: kriged map, kriging variance, simulated realisations (Part 7), reserve estimates (Part 8). Get it right.

Try it

In model-family-explorer, show all three families with $a = 0.30$ and $c = 1.0$. Read $\gamma(h = 0.30)$ for each family directly off the plot. You should see Spherical = 1.0 (it has reached the sill exactly at $h = a$); Exponential = $1 - e^{-1} \approx 0.63$; Gaussian = $1 - e^{-1} \approx 0.63$. Spherical is already at the sill while the other two are only 63% of the way there. Exponential and Gaussian coincide at $h = a$ only because $(h/a)^2 = h/a$ at that one lag; everywhere else they differ (Gaussian is lower for $h < a$ and higher for $h > a$). The family choice is not cosmetic.
Still in model-family-explorer, turn on the practical-range markers and read off each family's 95%-of-sill (or true-sill for Spherical) location. Spherical: $a = 0.30$. Gaussian: $a\sqrt{3} \approx 0.52$. Exponential: $3a = 0.90$. For the same nominal $a$, the three "ranges" span a factor of 3 between the smallest (Spherical) and the largest (Exponential). When reading a published variogram, ALWAYS check which convention is used.
In model-family-explorer, hide Spherical and Exponential and look at Gaussian alone near the origin. The curve hugs the $\gamma = 0$ axis for the first $\sim 0.05\,a$ of lag, then curls upward. Now show Exponential alongside and look at the same near-origin region. Exponential leaves the axis with a visible slope. THIS is the diagnostic: Gaussian "knees up" parabolically, Exponential "lifts off" linearly.
In realization-from-model, set $a = 0.25$, $c = 1.0$, and click "New realisation" three times. After each reseed, compare the three panels. The texture differences are stable across seeds: Spherical and Exponential have visible grain; Gaussian is smooth. The smoothness is the FAMILY signature, not noise in a single realisation.
In realization-from-model, hold the seed and slide $a$ from 0.10 to 0.50. The blob size in each panel grows in lockstep — the range controls SCALE, not texture. Now slide $c$ from 0.4 to 1.4 (range fixed). Only the colour-contrast amplitude changes; the texture is unchanged. Sill controls AMPLITUDE.
In realization-from-model, pick one of the three panels (your choice) and imagine it is a sampled porosity field. Which family would you fit and why? Why might the OTHER two families produce visibly worse kriged maps for that hypothetical dataset?
Without coding: a colleague's empirical variogram on drillhole grades shows $\hat\gamma(h_1 = 5\text{ m}) = 0.4\,c$ (already 40% of the sill at the first bin). Is this evidence of a SHORT-RANGE structure, of a LARGE NUGGET, or both? How would the three families compare in fitting this empirical shape, and what does the diagnostic tell you about the field?
Without coding: a published variogram fit reports "Gaussian, range = 200 m, sill = 0.02, nugget = 0". A reviewer flags the lack of a nugget. Why is the reviewer right? What is the practical risk of accepting this fit for downstream kriging, and what would the minimum acceptable nugget look like?
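The numbers quoted in the first two exercises can be reproduced without the widget. The following is a minimal sketch, assuming NumPy; the three functions are illustrative transcriptions of the §4.1 formulas, not the widget's own code.

```python
import numpy as np

def spherical(h, a, c):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)  # flat at the sill beyond h = a
    return c * (1.5 * r - 0.5 * r**3)

def exponential(h, a, c):
    return c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def gaussian(h, a, c):
    return c * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))

a, c = 0.30, 1.0
families = {"Spherical": spherical, "Exponential": exponential, "Gaussian": gaussian}

# Value of each model at the nominal range parameter h = a.
at_a = {name: float(f(a, a, c)) for name, f in families.items()}

# "Range" under each convention: true range for Spherical,
# 95%-of-sill practical range for Exponential (3a) and Gaussian (a*sqrt(3)).
ranges = {"Spherical": a, "Exponential": 3.0 * a, "Gaussian": a * np.sqrt(3.0)}

# Fraction of the sill actually reached at that range.
frac_at_range = {name: float(families[name](ranges[name], a, c)) / c
                 for name in families}

for name in families:
    print(f"{name:12s} gamma(a) = {at_a[name]:.3f}   "
          f"range = {ranges[name]:.3f}   "
          f"gamma(range)/c = {frac_at_range[name]:.4f}")
```

Both practical ranges land on $1 - e^{-3} \approx 0.950$ of the sill, which is where the 95% convention comes from.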

Pause and reflect: the same empirical $\hat\gamma(h)$ can be fitted by any of the three canonical families — Spherical, Exponential, Gaussian — with different parameters. The CHOICE of family is not made by the empirical variogram alone; it is made by combining the empirical variogram with a physical understanding of the field's SMOOTHNESS. What information about your field would you want to gather BEFORE looking at $\hat\gamma$ to make this choice defensible?

What you now know, and how the rest of Part 4 builds on it

You can state and explain the PERMISSIBILITY constraint: a variogram model is permissible iff it is conditionally negative definite, equivalently its covariance is positive semi-definite. You understand why this matters — without CND, the kriging system can return negative variances. You know that "permissible" reduces in practice to "pick a family from the canonical list, or sum members of the list".
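The negative-variance risk can be made concrete. For weights that sum to zero, the variance of $\sum_i w_i Z(\mathbf{s}_i)$ equals $-\sum_i \sum_j w_i w_j \gamma(\mathbf{s}_i - \mathbf{s}_j)$, so a conditionally negative definite $\gamma$ always gives a non-negative result. The sketch below, assuming NumPy, evaluates that quadratic form on a 2D grid with checkerboard weights. The "linear-with-sill" model is an illustrative addition not taken from the list above: it is a well-known example of a curve that is permissible in 1D but not in 2D. Grid size and spacing are arbitrary choices made to expose the failure.

```python
import numpy as np

def spherical(h, a, c):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return c * (1.5 * r - 0.5 * r**3)

def exponential(h, a, c):
    return c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def gaussian(h, a, c):
    return c * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))

def linear_with_sill(h, a, c):
    # Illustrative non-example: valid in 1D only, NOT permissible in 2D.
    return c * np.minimum(np.asarray(h, dtype=float) / a, 1.0)

a, c = 0.30, 1.0
n = 10
spacing = a / np.sqrt(2.0)  # diagonal neighbours sit at distance a

ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
coords = spacing * np.column_stack([ii.ravel(), jj.ravel()])
w = ((-1.0) ** (ii + jj)).ravel()  # checkerboard of +1 / -1, sums to zero

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

variances = {}
for name, family in [("Spherical", spherical),
                     ("Exponential", exponential),
                     ("Gaussian", gaussian),
                     ("Linear-with-sill", linear_with_sill)]:
    # Var[sum_i w_i Z(s_i)] = -w^T Gamma w when the weights sum to zero.
    variances[name] = float(-w @ family(d, a, c) @ w)
    print(f"{name:18s} variance of the combination = {variances[name]: .3f}")
```

The three canonical families return positive variances; the linear-with-sill curve returns a negative one on this configuration, which is the kind of impossibility the permissibility constraint rules out.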

You can write the three canonical isotropic families: Spherical (linear at origin, true range $a$), Exponential (linear at origin, practical range $3a$), Gaussian (parabolic at origin, practical range $a\sqrt{3}$). You understand that the near-origin behaviour of $\gamma(h)$ IS the smoothness of the underlying process: linear ⇒ continuous-but-not-differentiable; parabolic ⇒ differentiable. You have seen this directly in the realisation widget: same seed, same parameters, only the family changes, and the texture of the realisation changes.
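As a worked check, the near-origin behaviour of each family follows from a Taylor expansion of its formula around $h = 0$ (for $h \le a$ in the spherical case). The leading term is what decides linear versus parabolic:

```latex
\begin{align*}
\gamma_{\mathrm{Sph}}(h) &= c\left[1.5\,\frac{h}{a} - 0.5\left(\frac{h}{a}\right)^{3}\right]
  = 1.5\,c\,\frac{h}{a} + O(h^{3}) \\
\gamma_{\mathrm{Exp}}(h) &= c\left[1 - e^{-h/a}\right]
  = c\,\frac{h}{a} - \frac{c}{2}\left(\frac{h}{a}\right)^{2} + O(h^{3}) \\
\gamma_{\mathrm{Gau}}(h) &= c\left[1 - e^{-(h/a)^{2}}\right]
  = c\left(\frac{h}{a}\right)^{2} - \frac{c}{2}\left(\frac{h}{a}\right)^{4} + O(h^{6})
\end{align*}
```

Spherical and Exponential both start with a term proportional to $h$, with slopes $1.5\,c/a$ and $c/a$ respectively; the Gaussian has no linear term at all, so its first non-zero term is proportional to $h^2$.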

You know the practical defaults: Spherical or Exponential for almost all earth-science fields; Gaussian only with physical justification and always paired with a nugget; the power model only for fractal cases. You have the rule that sums of permissibles are permissible (the preview of §4.4 nested structures), and you know the operations that DO NOT preserve permissibility (subtraction, arbitrary curve fits). You can recognise the Gaussian model's ill-conditioning pitfall and the standard nugget regularisation.
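The ill-conditioning pitfall and the nugget fix can be seen in a few lines. The following is a minimal sketch, assuming NumPy, a hypothetical 1D transect of 20 closely spaced samples, and an arbitrary nugget of 1% of the sill chosen only for illustration. It compares the condition number of the sample-to-sample covariance matrix $C(h) = C(0) - \gamma(h)$ for the exponential model, the pure Gaussian model, and the Gaussian model with a nugget.

```python
import numpy as np

a, c = 0.25, 1.0
x = np.arange(20) * 0.02               # 20 samples, spacing much smaller than a
d = np.abs(x[:, None] - x[None, :])    # pairwise lags

cov_exp = c * np.exp(-d / a)           # exponential covariance
cov_gau = c * np.exp(-(d / a) ** 2)    # pure Gaussian covariance

# A nugget adds a discontinuity at h = 0+, i.e. extra variance on the diagonal only.
nugget = 0.01 * c                      # illustrative value, not a recommendation
cov_gau_nug = cov_gau + nugget * np.eye(len(x))

cond_exp = np.linalg.cond(cov_exp)
cond_gau = np.linalg.cond(cov_gau)
cond_gau_nug = np.linalg.cond(cov_gau_nug)

print(f"Exponential        cond = {cond_exp:.3g}")
print(f"Gaussian, no nugget cond = {cond_gau:.3g}")
print(f"Gaussian + nugget  cond = {cond_gau_nug:.3g}")
```

The pure Gaussian matrix is numerically close to singular on dense sampling, while even a small nugget bounds the smallest eigenvalue from below and brings the condition number back to a workable size.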

Part 4 continues here. §4.2 reintroduces the nugget effect — the discontinuity at $h = 0^+$ that combines measurement error with unresolvable sub-sample-spacing variability — and develops the field-standard practice of fitting it explicitly. §4.3 lifts the isotropy assumption by wrapping the canonical families in an anisotropic ELLIPSOID (range becomes a tensor, defining a "search ellipse" of statistically equivalent lag vectors). §4.4 develops NESTED STRUCTURES in full: how to sum two or three permissible variograms to fit a real empirical shape that no single family can match. §4.5 covers FITTING STRATEGIES — by eye, by weighted least squares (the field-standard), and by likelihood — and how to choose among them. By the end of Part 4 you have a defensible permissible variogram model $\gamma(h; \boldsymbol\theta)$ that Part 5 can plug into the kriging system.

The slow build pays off here too. The empirical variogram from Part 3 is binned and noisy; the fitted model from Part 4 is smooth and continuous. Kriging needs the latter. The family choice in §4.1 is the first commitment in the modelling chain. Make it carefully.

References

Matheron, G. (1971). The Theory of Regionalized Variables and Its Applications. Les Cahiers du Centre de Morphologie Mathématique, No. 5. Fontainebleau, France: École Nationale Supérieure des Mines de Paris. (The foundational reference. Defines the spherical, exponential, and Gaussian variogram families and proves the permissibility (conditional negative definiteness) constraint that the canonical list satisfies. The spherical model's compact-support property in 3D is established here.)
Christakos, G. (1984). On the problem of permissible covariance and variogram models. Water Resources Research, 20(2), 251–265. (The canonical formal-statistics reference for the permissibility constraint. Develops the conditional-negative-definiteness machinery, proves which families satisfy it in $\mathbb{R}^d$ for which $d$, and catalogues the permissibility-preserving operations on variogram models, including the sum rule that justifies nested structures.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (Chapter 2 is the modern mathematical-statistics treatment of variogram models. §2.5 covers the canonical isotropic families and the permissibility constraint; the Gaussian model's ill-conditioning is documented as a practical caveat.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (Chapter 2 is the modern comprehensive practitioner reference. §2.4 develops permissibility; §2.5 catalogues the canonical families with their parameter conventions; §3.4 documents the Gaussian model's numerical pitfalls for kriging.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.2 covers variogram models in a practitioner-oriented format. The recommendation to default to Spherical or Exponential, and to use Gaussian only with caveats and nuggets, is documented in this textbook's style.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 7 introduces variogram models at an entry level. The 0.95-of-sill practical-range convention for Exponential and Gaussian is presented in the same form used by the §4.1 widgets.)
Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer. (The definitive mathematical-statistics treatment of variogram smoothness. §2.7 establishes the theorem connecting near-origin variogram behaviour to mean-square differentiability of the underlying random field, and develops the Matérn family as the canonical one-parameter family that interpolates between Exponential and Gaussian smoothness.)
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications (3rd ed.). Springer. (§7.3 develops the variogram-modelling machinery, including the Gaussian model's ill-conditioning and the standard nugget-regularisation fix. The author was one of the developers of the modern multivariate-geostatistics framework that builds on top of these canonical families.)
Journel, A.G., Huijbregts, C.J. (1978). Mining Geostatistics. Academic Press. (The classic mining-geostatistics textbook. The 0.95-of-sill convention for the exponential practical range (3a) originates from the Fontainebleau tradition and is documented here in the practitioner form that GSLIB and downstream software inherit.)