Nugget effect and short-scale variability

Part 4 — Variogram modeling

Learning objectives

Define the NUGGET EFFECT: γ(h) does NOT smoothly go to zero at h = 0; instead γ(0⁺) = c₀ > 0, a JUMP DISCONTINUITY at the origin. The model is $\gamma(h) = c_0 \cdot \mathbb{1}[h > 0] + \gamma_{\mathrm{str}}(h)$ where γ_str is a permissible family (Spherical, Exponential, Gaussian, ...) with its OWN sill c — so the total sill is $c_0 + c$
Distinguish the TWO PHYSICAL SOURCES of the nugget: (1) MEASUREMENT ERROR — each sample value carries IID white-noise error $\varepsilon$ with variance $\sigma_\varepsilon^2 = c_0$ ; (2) SUB-SAMPLING-SCALE VARIABILITY — real spatial structure exists at scales below the minimum sampling interval. Both produce the same γ̂ shape; they are NOT distinguishable from the variogram alone
Diagnose source via EXPERIMENT: RESAMPLING at a finer spatial resolution shrinks the nugget if it was sub-sampling variability (revealing structure at the new scale) and leaves it unchanged if it was measurement error. INDEPENDENT REPLICATE SAMPLES at the same physical location directly estimate $\sigma_\varepsilon^2$
Apply the c₀/(c₀+c) RATIO as a structure diagnostic: < 0.10 well-structured; 0.10–0.25 mostly structured; 0.25–0.55 moderate; 0.55–0.75 noise-dominated; > 0.75 nearly pure nugget; → 1 essentially IID (PURE NUGGET model)
Pure-nugget model $\gamma(h) = c_0 \cdot \mathbb{1}[h > 0]$ : γ jumps to c₀ at h = 0⁺ and stays flat. No spatial correlation at sampled scales; kriging reduces to the global MEAN. Usually means: sampling too coarse OR measurement noise dominates. NOT a defensible final model — a diagnostic that something upstream needs to change
Read the kriging implication: under the MEASUREMENT-ERROR interpretation the kriged surface SMOOTHS THROUGH samples (denoising); under the MICROSCALE-VARIABILITY interpretation the kriged surface PASSES THROUGH samples (faithful interpolation). Software defaults vary; the choice has to be made explicitly and propagates to every downstream product
Recognise SMALL-LAG BIAS: the EMPIRICAL γ̂ at the smallest lag is itself a noisy estimate, downward-biased under Cressie–Hawkins-style robustness corrections (Cressie 1993 §2.4). The MODELED c₀ — the intercept of the fitted curve at h = 0⁺ — is the right number to report and use downstream, not the value of the visible smallest-lag bin alone
Know that NOT ALL PROCESSES HAVE A NUGGET: very smooth Gaussian random fields, Brownian-motion-style processes after enough integration, and bandlimited seismic-derived attributes can have γ(0⁺) = 0, with a truly continuous approach to the origin. Reporting nugget = 0 honestly is appropriate when the field is smooth and the sampling is not noisy

§4.1 introduced the three canonical isotropic variogram families — Spherical, Exponential, Gaussian — all written with the assumption that $\gamma(0) = 0$ : as the lag $h$ shrinks to zero, the variogram smoothly returns to zero, and pairs of values measured at the same location are identical (perfectly correlated, with covariance $C(0)$ equal to the total variance). In a perfectly-resolved noise-free measurement campaign of a perfectly-smooth spatial process, that assumption holds. In any actual earth-science dataset, it does not.

What real empirical variograms show, in case after case, is a clean linear (or near-linear) climb from a positive intercept at the smallest lag, not from zero. Plot $\hat\gamma(h)$ at lags $h_1, h_2, \ldots$ for any drillhole-grade campaign, soil-contamination survey, or porosity log, and the smallest-lag bin already sits at some positive value $c_0 \approx 0.05\,c$ to $0.5\,c$ above zero. Extend the bins with a straight edge back to $h = 0^+$ and you do NOT land at the origin: you land at $c_0$ . The variogram has a jump discontinuity at the origin: $\gamma(0) = 0$ by definition (a value differenced with itself is zero) but $\gamma(0^+) = c_0 > 0$ as soon as you separate by any positive amount.

This jump is the nugget effect. The name comes from gold mining: in the mid-20th-century South African gold deposits whose assay data motivated Krige's early work (later formalised by Matheron), the variability between two assay samples taken from the same drillhole interval did not vanish even as the samples got closer together. Sometimes you assayed a piece of rock and it contained a small gold nugget; sometimes you didn't. The variability between adjacent samples remained finite: a "nugget effect" in the literal mineralogical sense, generalised over decades into the standard term for any positive $\gamma(0^+)$ . §4.2 takes the nugget seriously, separates its two physical sources, develops the fitting practice that handles it, and works through the consequences for downstream kriging.

The model: nugget + structured component

The variogram with nugget is written as a sum of two terms:

\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0] \;+\; \gamma_{\mathrm{str}}(h)

where $\gamma_{\mathrm{str}}(h)$ is a permissible family from §4.1 (Spherical, Exponential, Gaussian, or a power model) with its own sill $c$ and range $a$ . The indicator $\mathbb{1}[h > 0]$ encodes the jump: at $h = 0$ the indicator is zero, so $\gamma(0) = 0$ as required; for any $h > 0$ the indicator is 1 and $\gamma(h)$ inherits the constant $c_0$ contribution. The full variogram is

\gamma(h) \;=\; c_0 \;+\; c \cdot \rho^{\mathrm{(model)}}(h; a) \quad \text{for } h > 0

where $\rho^{\mathrm{(model)}}$ is the family-specific shape (e.g., $1.5(h/a) - 0.5(h/a)^3$ for Spherical with $h \le a$ , etc., from §4.1). The total sill — the value $\gamma(h)$ approaches at large $h$ — is

\text{total sill} \;=\; c_0 + c

where $c_0$ is the nugget contribution and $c$ is the structured-component sill. Both are positive. Both are unknown a priori. Both are fitted from the empirical variogram in §4.5.

The corresponding covariance is

C(h) \;=\; (c_0 + c) - \gamma(h) \;=\; \begin{cases} c_0 + c & h = 0 \\ c \cdot \bigl(1 - \rho^{\mathrm{(model)}}(h; a)\bigr) & h > 0 \end{cases}

so the covariance has its OWN jump at $h = 0$ : at zero lag, $C(0) = c_0 + c$ (total variance); at any positive lag, $C(h) \le c$ (no nugget contribution to spatial correlation). The nugget contributes to the total variance but NOT to the structured spatial correlation. This is the key idea — and the source of the two physical interpretations developed below.
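The two formulas above can be checked numerically. The following is a minimal sketch, assuming a Spherical structured component; the function names `spherical_shape`, `variogram`, and `covariance` are illustrative and not taken from any particular library.

```python
import numpy as np

def spherical_shape(h, a):
    """Normalised spherical shape: 0 at h = 0, rising to 1 for h >= a."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def variogram(h, c0, c, a):
    """gamma(h) = c0 * 1[h > 0] + c * shape(h; a)."""
    h = np.asarray(h, dtype=float)
    return c0 * (h > 0) + c * spherical_shape(h, a)

def covariance(h, c0, c, a):
    """C(h) = (c0 + c) - gamma(h)."""
    return (c0 + c) - variogram(h, c0, c, a)

# Lags: exactly zero, just above zero, mid-range, at the range, beyond it.
h = np.array([0.0, 1e-9, 0.2, 0.4, 1.0])
gamma = variogram(h, c0=0.2, c=0.8, a=0.4)
cov = covariance(h, c0=0.2, c=0.8, a=0.4)
print(gamma)  # jumps from 0 to about c0 between the first two lags
print(cov)    # drops from c0 + c to about c between the first two lags
```

The first two entries of each array show the jump: the variogram goes from 0 to roughly $c_0$ , and the covariance goes from $c_0 + c$ to roughly $c$ , while the remaining entries follow the structured component alone.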

Two physical sources of the nugget

The same nugget jump can arise from two fundamentally different physical processes — and the difference matters for what kriging should do.

1. Measurement error

The first source is measurement noise. Suppose the true field $Z(\mathbf{s})$ is perfectly smooth (no nugget, $\gamma_Z(0^+) = 0$ ), but every sample value carries an independent additive measurement error $\varepsilon$ with variance $\sigma_\varepsilon^2$ . What you observe is

Y(\mathbf{s}_i) \;=\; Z(\mathbf{s}_i) + \varepsilon_i, \quad \varepsilon_i \stackrel{\text{iid}}{\sim} (0, \sigma_\varepsilon^2), \quad \varepsilon_i \perp Z, \;\varepsilon_i \perp \varepsilon_j \text{ for } i \ne j.

The empirical variogram of $Y$ at lag $h > 0$ averages the squared differences $(Y(\mathbf{s}_i) - Y(\mathbf{s}_j))^2 / 2$ , which expand to

\tfrac{1}{2}\,[(Z(\mathbf{s}_i) - Z(\mathbf{s}_j)) + (\varepsilon_i - \varepsilon_j)]^2.

Taking expectations, the cross terms vanish (because $\varepsilon$ is independent of $Z$ and of itself across locations), and you get

\gamma_Y(h) \;=\; \gamma_Z(h) + \sigma_\varepsilon^2 \quad \text{for } h > 0.

The noise variance $\sigma_\varepsilon^2$ adds a CONSTANT to $\gamma_Z(h)$ at every positive lag. Visually, it shifts the whole curve up by $\sigma_\varepsilon^2$ for all $h > 0$ but leaves $\gamma_Y(0) = 0$ (since the same sample minus itself is zero). The result is a JUMP at $h = 0^+$ of exactly $\sigma_\varepsilon^2$ : $c_0 = \sigma_\varepsilon^2$ .
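A quick simulation illustrates the constant shift. This is a sketch under stated assumptions: the smooth field is taken to be a sine curve on a regular 1D grid, the noise is Gaussian with $\sigma_\varepsilon^2 = 0.04$ , and `semivariogram` is a hypothetical helper that averages half squared differences at a fixed grid lag.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20000
x = np.linspace(0.0, 200.0, n)
z = np.sin(x)                      # smooth "true" field (assumed for illustration)
sigma2 = 0.04                      # measurement-error variance
y = z + rng.normal(0.0, np.sqrt(sigma2), size=n)

def semivariogram(v, k):
    """Classical (Matheron) estimate at a lag of k grid steps."""
    d = v[k:] - v[:-k]
    return 0.5 * np.mean(d**2)

lags = [1, 5, 20, 50]
offsets = [semivariogram(y, k) - semivariogram(z, k) for k in lags]
for k, off in zip(lags, offsets):
    print(f"lag {k:3d} steps: gamma_Y - gamma_Z = {off:.4f}")
```

Each printed offset should sit close to 0.04 regardless of the lag, which is the constant upward shift described above.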

Under the measurement-error interpretation, the nugget is NOT a property of the underlying spatial field. It is a property of the MEASUREMENT PROCESS. The true field $Z$ has $\gamma_Z(0^+) = 0$ (perfectly smooth, no nugget); only the observed $Y$ shows a nugget. The kriging implication is sharp: you do not want to reconstruct $Y$ ; you want to reconstruct $Z$ . The kriged surface should DENOISE — pass through the smoothed underlying field, not the noisy observations. At a sample location, the kriged value should lie BETWEEN the noisy sample and the surrounding field's smooth trend — a Bayesian-style shrinkage toward the prior.

2. Sub-sampling-scale variability

The second source is a property of the spatial field itself. Suppose the true field $Z(\mathbf{s})$ has real spatial structure at scales BELOW your minimum sampling interval. Two samples taken at the same spatial location would give identical values (no measurement error). But two samples taken even slightly apart, within your minimum sampling step, already capture some real spatial variability. By the time the lag reaches the smallest sampled value $h_1 = \Delta$ , that variability has already accumulated: the empirical variogram at the smallest lag is at some positive value, because $Z$ genuinely varies at sub- $\Delta$ scales.

Under this interpretation, the nugget IS a property of the spatial field. There is no measurement error. The samples are perfectly accurate at their own locations. But the field has structure that the sampling design cannot resolve. The kriging implication is different: every sample is a faithful observation of $Z$ at its location; kriging must HONOUR each sample exactly (pass through it). The unresolved sub-sample-spacing structure shows up in the variance of the kriging predictor (Part 6's kriging variance), but the predictor itself interpolates exactly at sample locations.

The two interpretations produce IDENTICAL empirical variograms. You cannot tell them apart from $\hat\gamma$ alone. The shape of the variogram has the same nugget either way. The downstream kriging is qualitatively different. The first widget in §4.2 shows the variogram side; the second widget shows the kriging side.

The first widget for §4.2 takes a synthetic binned empirical variogram drawn from a known nugget + spherical truth, and asks you to fit $c_0$ , $c$ , and $a$ against the noisy bins. The truth sliders set the generator; the fit sliders set your model. The display reports the c₀/(c₀+c) ratio and the weighted SSE of the fit.

Three things to do with this widget. First, set the truth to a moderate nugget — $c_0^{\mathrm{true}} = 0.20$ , $c^{\mathrm{true}} = 0.80$ , $a^{\mathrm{true}} = 0.40$ — and fit by eye WITHOUT revealing the truth. Drag the $c_0$ -fit slider until the dashed curve's y-intercept at $h = 0^+$ matches what the smallest-lag binned points seem to be pointing to (NOT just the lowest-lag bin's value, which is noisy); drag $c$ -fit until the high-lag flat region matches the binned plateau; drag $a$ -fit until the elbow location matches the kink in the bins. Then click "Reveal truth" to overlay the generating curve. You will rarely get $c_0$ exactly right by eye — the lowest-lag bin is the noisiest part of the empirical variogram, and beginners almost always under-shoot the modeled nugget if they aim at the visible bin value.

Second, drop the N pairs slider to 20 (very few pairs per bin) and watch the empirical points get noisier. The fit slider positions you chose earlier may no longer track the bins well; in particular, the lowest-lag bin's value swings wildly with each resample. The MODELED $c_0$ — the intercept you fit using the SHAPE of the curve, not the position of a single noisy point — is more stable than any single bin's value. This is the §4.2 honest caveat: the visible smallest-lag bin is biased low (and noisy) under the Cressie–Hawkins-style robustness corrections; the modeled $c_0$ after fitting is the right number to use (Cressie 1993 §2.4).

Third, slide the truth's $c_0$ all the way up to 0.60 with $c^{\mathrm{true}} = 0.40$ (a nugget that is most of the total variance). The c₀/(c₀+c) ratio reads "noise-dominated"; the binned variogram is essentially flat with a small linear lift near the origin; most of the variance is unstructured. This is the warning signal that something has gone wrong upstream: usually sampling too coarse for the underlying structure, or a measurement process much noisier than the signal of interest. A defensible report says "ratio = 0.60, noise-dominated: collect denser samples or improve the measurement before kriging".

How to tell measurement error from microscale variability

The empirical variogram alone cannot distinguish the two sources of the nugget. The same $c_0$ shape arises from white-noise measurement error on a smooth $Z$ and from real microscale structure with no measurement noise. To tell them apart, you need an EXPERIMENT — a deliberate change in either the sampling resolution or the replication structure.

Resampling at finer spatial resolution

If you can take additional samples at LAGS SHORTER than your original minimum sampling interval $\Delta$ , the variogram at those new finer lags reveals the source:

If the nugget SHRINKS as the lag shrinks: the variability you were calling "nugget" was real spatial structure at scales between $\Delta$ and the new finer $\Delta'$ . Below $\Delta'$ , even finer sampling would reveal even more structure. The "nugget" was an artefact of your sampling resolution; the field has structure all the way down (or until you hit measurement-error scale).
If the nugget STAYS THE SAME as the lag shrinks: the variability is genuine measurement noise that you carry on every sample regardless of where it is taken. No matter how fine you sample, two samples (even at the same physical location, if you could) would still differ by the same $\sigma_\varepsilon$ . The "nugget" is real and is a property of the measurement process.

This is the gold-standard test. It requires you to be able to take additional samples — feasible for laboratory measurements, soil cores, seismic-attribute pixels, but expensive for deep drillholes. When practical, the resampling test settles the question.

Independent replicate samples at the same location

The second experiment: take $k$ independent replicate samples at the SAME physical location (same drillhole interval, same soil core depth, same seismic-pixel reflection) and compute their sample variance. If the measurement process is error-free, the $k$ samples are identical (variance = 0). If there is measurement error, the $k$ samples differ by IID noise, and their sample variance is a direct estimate of $\sigma_\varepsilon^2 = c_0^{\mathrm{measurement}}$ .

This is the laboratory gold standard. Send the same physical specimen through your assay machine ten times; the variance across the ten reads is the measurement-error variance. Send the same drillhole interval to two different labs; the variance across labs is a different (between-lab) component. The fraction of the total empirical nugget that is explained by these replicated-measurement variances is the measurement-error part of $c_0$ . The rest is microscale variability.

In published practice (Goovaerts 1997 §4.4; Chilès & Delfiner 2012 §2.3), authors who care about the distinction report BOTH a measurement-error variance from QA/QC replicates and a total empirical nugget from $\hat\gamma$ ; the difference is attributed to microscale structure. Authors who do not report this typically treat the entire nugget as if it were microscale variability (kriging interpolates) and accept the consequence that any measurement error in the data is preserved in the kriged map.
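The bookkeeping behind that split is short enough to sketch. The replicate values and the fitted nugget below are hypothetical numbers chosen for illustration; the pooled within-location variance is used as the estimate of $\sigma_\varepsilon^2$ , which assumes the same error variance at every location.

```python
import numpy as np

# Rows are locations, columns are replicate measurements of the same specimen
# (hypothetical values).
replicates = np.array([
    [1.02, 0.98, 1.00],
    [2.10, 1.90, 2.00],
    [0.55, 0.45, 0.50],
])

# Unbiased sample variance within each location, then pooled across locations.
within_var = replicates.var(axis=1, ddof=1)
sigma2_eps = within_var.mean()

c0_total = 0.012   # hypothetical modeled nugget from the variogram fit
c0_micro = max(c0_total - sigma2_eps, 0.0)   # remainder attributed to microscale

print(f"measurement-error part: {sigma2_eps:.4f}")
print(f"microscale part:        {c0_micro:.4f}")
print(f"measurement fraction:   {sigma2_eps / c0_total:.2f}")
```

Clipping the remainder at zero is a design choice of this sketch: with few replicates the estimated $\sigma_\varepsilon^2$ can exceed the fitted nugget by chance, and a negative microscale variance has no physical meaning.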

The second widget for §4.2 takes the SAME 1D sample profile and the SAME nugget+spherical variogram and produces two kriged curves side by side — one per interpretation. The LEFT panel treats the nugget as measurement error and DENOISES: the fitted curve passes through a smoothed underlying signal, not the raw samples. The RIGHT panel treats the nugget as microscale variability and INTERPOLATES: the fitted curve passes through every sample exactly.

Slide the nugget $c_0$ from 0 toward 0.20 and watch the two panels diverge. At $c_0 = 0$ the two interpretations collapse to the same predictor; both interpolate every sample. As $c_0$ grows, the LEFT panel's curve pulls away from the data — peaks get clipped, troughs get filled in, the fit approaches the smoother underlying trend (which the grey "true Z(s)" line shows). The RIGHT panel's curve continues to honour every sample, regardless of how big the nugget gets.

Both behaviours are mathematically correct under their respective interpretations. The DENOISE behaviour minimises mean-squared error to the underlying smooth $Z$ when the measurement-error model is right; the INTERPOLATE behaviour gives the best estimate AT THE SAMPLE SUPPORT when the microscale-variability model is right. Pick the wrong interpretation and you get the wrong product:

If the nugget is genuine measurement error and you use the interpolating predictor (RIGHT), your map preserves all the measurement noise. The kriged surface has spurious bumps at every sample location — peaks where a sample happened to be noisy-high, troughs where another happened to be noisy-low. Downstream products (volumetrics, hot-spot identification) inherit those artefacts.
If the nugget is genuine microscale variability and you use the denoising predictor (LEFT), you erase real spatial structure that exists at sub-sampling-scale. The kriged map is too smooth; small-scale features (high-grade pods, contamination hot-spots, fine-scale facies variation) get averaged into bland mean values. Downstream decisions (drill targets, remediation locations) miss the things that matter.
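The difference between the two predictors comes down to where the nugget appears in the kriging system. The sketch below uses simple kriging with a known mean of zero and a Spherical structured covariance; both are simplifying assumptions, and `simple_krige` with its `mode` argument is a hypothetical helper, not the interface of any real package.

```python
import numpy as np

def spherical_cov(h, c, a):
    """Structured covariance c * (1 - shape(h; a)) for the spherical model."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return c * (1.0 - 1.5 * r + 0.5 * r**3)

def simple_krige(xs, ys, x_new, c0, c, a, mode):
    """Simple kriging with known mean 0 (an assumption of this sketch).

    mode="interpolate": the nugget belongs to the field, so it also enters
                        the data-to-target covariance at zero distance.
    mode="denoise":     the nugget is measurement error, so it appears only
                        on the diagonal of the data-to-data covariance.
    """
    xs, ys, x_new = (np.asarray(v, dtype=float) for v in (xs, ys, x_new))
    K = spherical_cov(xs[:, None] - xs[None, :], c, a) + c0 * np.eye(xs.size)
    k = spherical_cov(x_new[:, None] - xs[None, :], c, a)
    if mode == "interpolate":
        k = k + c0 * (x_new[:, None] == xs[None, :])
    weights = np.linalg.solve(K, k.T)   # one column of weights per target
    return weights.T @ ys

xs = np.array([0.0, 0.1, 0.25, 0.4, 0.6])
ys = np.array([0.3, -0.2, 0.5, 0.1, -0.4])

# Predict back at the sample locations under each interpretation.
pred_interp = simple_krige(xs, ys, xs, c0=0.1, c=0.2, a=0.3, mode="interpolate")
pred_denoise = simple_krige(xs, ys, xs, c0=0.1, c=0.2, a=0.3, mode="denoise")
print(pred_interp)   # reproduces ys
print(pred_denoise)  # shrunk away from ys
```

In interpolating mode the right-hand side at a sample location equals a row of the data covariance matrix, so the weights collapse onto that sample and the prediction reproduces it. In denoising mode the nugget is missing from the right-hand side, and the prediction is pulled toward the mean and the neighbouring samples.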

Most general-purpose geostatistics software (GSLIB, SGeMS, gslib-R) defaults to the INTERPOLATING behaviour: the nugget is treated as part of the field's covariance at zero lag, on both the data-to-data and the data-to-target sides of the kriging system, so the predictor honours every sample exactly. This is the standard default in mining geostatistics where samples are physical drillhole intervals that you genuinely want to honour. If you have a measurement-error story for your nugget (e.g., laboratory replicate variance from QA/QC), you typically have to invoke a non-default "kriging with measurement error" mode (Cressie 1993 §3.2.1; Christensen 1991; Diggle & Ribeiro 2007 §6) to get the DENOISE behaviour. Always check the documentation; the default may not match your physical setup.

The c₀/(c₀+c) ratio as a structure diagnostic

A single number summarises how much of the field's total variance is unstructured (nugget) versus structured (spatial): the ratio

\frac{c_0}{c_0 + c} \;\in\; [0, 1].

This ratio is sometimes called the relative nugget. Its rough interpretation, codified in practitioner guidance over many decades (Cambardella et al. 1994, with adaptations in Goovaerts 1997 §4.2):

< 0.10: well-structured. The field has a clean spatial signal; kriging will produce sharp, faithful maps.
0.10 – 0.25: mostly structured. Spatial structure dominates; modest noise on top.
0.25 – 0.55: moderate structure. Real spatial signal exists but a substantial fraction of the variance is unstructured. Kriging is still useful but maps are necessarily smoother than the underlying truth.
0.55 – 0.75: noise-dominated. Most of the variance is nugget. Kriging maps are smooth (close to the global mean); the structure that remains is short-range only.
> 0.75: nearly pure nugget. There is little spatial signal at the sampled scales. Kriging reduces to global mean estimation. Usually a diagnostic of inadequate sampling resolution or excessive measurement noise — collect denser samples or improve the measurement before kriging.

The ratio is independent of $a$ (range) and of the overall variance level: it normalises out scale and amplitude, leaving only the structure-vs-noise composition. Report it whenever you report a variogram fit; it conveys at a glance whether the kriged map will be informative or simply smooth-and-flat.
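As a small helper, the ratio and its band can be computed as follows. This is a sketch: the function names are illustrative, and the handling of values that fall exactly on a band boundary (assigned to the upper band, except 0.75 which stays noise-dominated) is an assumption, since the bands above share their end points.

```python
def relative_nugget(c0, c):
    """Return c0 / (c0 + c) for a nugget c0 and structured sill c."""
    if c0 < 0 or c < 0 or (c0 + c) == 0:
        raise ValueError("c0 and c must be non-negative with a positive total sill")
    return c0 / (c0 + c)

def classify(ratio):
    """Map the relative nugget onto the bands listed above."""
    if ratio < 0.10:
        return "well-structured"
    if ratio < 0.25:
        return "mostly structured"
    if ratio < 0.55:
        return "moderate structure"
    if ratio <= 0.75:
        return "noise-dominated"
    return "nearly pure nugget"

for c0, c in [(0.05, 0.95), (0.20, 0.80), (0.60, 0.40), (0.90, 0.10)]:
    r = relative_nugget(c0, c)
    print(f"c0={c0:.2f}, c={c:.2f}: ratio={r:.2f} -> {classify(r)}")
```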

The pure-nugget model — when c₀/(c₀+c) → 1

The extreme case is the PURE-NUGGET model:

\gamma(h) \;=\; c_0 \cdot \mathbb{1}[h > 0], \quad C(h) \;=\; c_0 \cdot \mathbb{1}[h = 0].

The variogram jumps from 0 to $c_0$ at $h = 0^+$ and stays flat at $c_0$ for every positive lag. The covariance has $C(0) = c_0$ and $C(h) = 0$ for every $h > 0$ . Pairs of values at different locations are UNCORRELATED, no matter how close. The field is effectively a sequence of IID draws from a distribution with variance $c_0$ ; the spatial coordinate carries no information.

Kriging this model reduces to global-mean prediction. The sample at $\mathbf{s}_i$ tells you nothing about the value at $\mathbf{s}_j$ (zero covariance). The best linear unbiased estimator at any new location $\mathbf{s}^*$ is the global mean $\bar Z = \frac{1}{n} \sum_i Z(\mathbf{s}_i)$ . The kriging variance is identical at every unsampled location: $c_0$ when the mean is known (simple kriging), or $c_0 (1 + 1/n)$ when the mean is estimated from the $n$ samples (ordinary kriging). There is no spatial map to produce; only a single number with an uncertainty.
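The collapse to the global mean can be verified by solving the ordinary-kriging system directly. The sketch below assumes the standard ordinary-kriging formulation with a Lagrange multiplier and a target location that coincides with no sample; `ordinary_kriging_pure_nugget` is a hypothetical name.

```python
import numpy as np

def ordinary_kriging_pure_nugget(n, c0):
    """Solve the ordinary-kriging system for n samples under C(h) = c0 * 1[h = 0].

    System: [[C, 1], [1^T, 0]] [w; mu] = [c_star; 1], with C = c0 * I and
    c_star = 0 because the target is uncorrelated with every sample.
    """
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = c0 * np.eye(n)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.zeros(n + 1)
    b[n] = 1.0                       # unbiasedness constraint: weights sum to 1
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    variance = c0 - w @ b[:n] - mu   # C(0) - sum_i w_i * c_star_i - mu
    return w, variance

w, variance = ordinary_kriging_pure_nugget(n=8, c0=0.5)
print(w)          # every weight equals 1/8: the predictor is the sample mean
print(variance)   # c0 * (1 + 1/8)
```

Every sample receives weight $1/n$ , so the predictor is the sample mean, and the ordinary-kriging variance comes out as $c_0 (1 + 1/n)$ , which tends to $c_0$ as $n$ grows.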

You almost never fit a final variogram as pure nugget. The pure-nugget model is a DIAGNOSTIC: if your empirical $\hat\gamma$ looks roughly flat across all lags, with no structured rise, then your data has no useful spatial signal at the sampled scales. The conclusion is not "kriging will produce a flat map" but rather "sample more densely" or "reduce measurement noise" or "rethink whether the field actually has spatial structure". Goovaerts 1997 §4.4 and Isaaks & Srivastava 1989 §16.1 both flag pure-nugget $\hat\gamma$ as an inadequate-experiment finding rather than a defensible final model.

Pure-nugget $\hat\gamma$ does sometimes appear in real datasets. It typically means: (a) sampling spacing $\Delta$ exceeds the underlying field's range, so every sample pair is essentially independent; (b) measurement noise is much larger than the true field's variance; or (c) the variable being measured has no spatial structure at all (e.g., a categorical variable distributed at random across locations).

Honest caveats — small-lag bias and processes without a nugget

The §4.2 picture has two important honesty caveats worth knowing.

Small-lag bias. The visible value of $\hat\gamma$ at the smallest empirical lag $h_1$ is NOT necessarily a good estimate of $c_0$ . Three sources of bias are worth flagging:

Sampling-distribution bias. Even with the unbiased Matheron estimator, $\hat\gamma(h_1)$ has finite-sample variance that scales like $1/N(h_1)$ ; if $N(h_1)$ is small (poor short-lag pair coverage — common because most random sample configurations have few short-lag pairs), the smallest-lag empirical value swings widely with the data realisation. The MODELED $c_0$ after fitting uses the SHAPE of the curve, not the single noisy point, and is more stable.
Cressie–Hawkins downward bias. The robust Cressie–Hawkins estimator (§3.6) takes the fourth power of the mean square-root absolute difference and divides it by the correction factor $(0.457 + 0.494/N(h))$ , derived under the Gaussian-increment model. If your software computes the robust statistic without applying the correction, the visible small-lag empirical value under-reads the true variogram substantially: the factor tends to 0.457 for large $N(h)$ , so the uncorrected value is less than half of the corrected one. The corrected estimator (or a Matheron baseline) and the modeled $c_0$ after fitting are the right numbers.
Bin-edge effects. The first empirical bin spans $[0, h_1 + \Delta h /2]$ ; it averages variability at lags between 0 and a small positive value. If the true $\gamma_{\mathrm{str}}$ rises sharply across this range (e.g., a steep Exponential), the first bin's average is HIGHER than $c_0$ alone, but still LOWER than the full $\gamma(h_1) = c_0 + \gamma_{\mathrm{str}}(h_1)$ would be at the bin centre. Read the first bin as an estimate of the AVERAGED variogram across the bin, not as a point estimate of $\gamma$ at any single lag.
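The size of the Cressie–Hawkins correction is easy to see on simulated increments. The sketch below assumes Gaussian increments with a known semivariance of 0.5 and uses the two-term correction factor quoted above; the function names are illustrative.

```python
import numpy as np

def matheron(d):
    """Classical estimator: half the mean squared increment."""
    return 0.5 * np.mean(d**2)

def cressie_hawkins(d, corrected=True):
    """Robust estimator built from square-root absolute increments."""
    n = d.size
    fourth = np.mean(np.sqrt(np.abs(d))) ** 4
    if corrected:
        fourth = fourth / (0.457 + 0.494 / n)
    return 0.5 * fourth

rng = np.random.default_rng(1)
gamma_true = 0.5
# Increments Z(s+h) - Z(s) have variance 2 * gamma(h) under the Gaussian model.
d = rng.normal(0.0, np.sqrt(2 * gamma_true), size=200_000)

g_matheron = matheron(d)
g_ch = cressie_hawkins(d)
g_ch_raw = cressie_hawkins(d, corrected=False)
print(g_matheron, g_ch, g_ch_raw)
```

With the correction applied, both estimators land near the true value; without it, the robust statistic reads at roughly 0.457 of the corrected one.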

The combined effect: the visible smallest-lag empirical value is an unreliable stand-in for the true $c_0$ : noisy when $N(h_1)$ is small, biased low under an uncorrected robust estimator, and smeared by bin averaging. The defensible practice is to FIT the model across all bins (using weighted least squares or maximum likelihood, see §4.5), extract the modeled $c_0$ from the fit's intercept at $h = 0^+$ , and report THAT as the nugget. Reporting " $c_0 = \hat\gamma(h_1)$ " is a common beginner mistake.
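One simple way to obtain a modeled $c_0$ from all bins at once is sketched below. It assumes a nugget + Spherical model, weights proportional to the pair count $N(h)$ , and a grid search over the range $a$ ; for a fixed $a$ the model is linear in $(c_0, c)$ , so each grid point needs only a linear least-squares solve. This is an illustration, not the fitting procedure of §4.5, and it does not enforce $c_0, c \ge 0$ .

```python
import numpy as np

def spherical_shape(h, a):
    """Normalised spherical shape: 0 at h = 0, rising to 1 for h >= a."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return 1.5 * r - 0.5 * r**3

def fit_nugget_spherical(h, gamma_hat, n_pairs, a_grid):
    """Weighted least squares for (c0, c) at each candidate a; keep the best."""
    sqrt_w = np.sqrt(n_pairs)            # weights proportional to N(h)
    best = None
    for a in a_grid:
        X = np.column_stack([np.ones_like(h), spherical_shape(h, a)])
        coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], gamma_hat * sqrt_w, rcond=None)
        resid = (X @ coef - gamma_hat) * sqrt_w
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], a)
    return best[1], best[2], best[3]

# Noise-free bins generated from a known truth, to check that the fit recovers it.
h = np.linspace(0.05, 1.0, 20)
gamma_bins = 0.2 + 0.8 * spherical_shape(h, 0.4)
n_pairs = np.full(h.size, 60.0)
a_grid = np.linspace(0.1, 1.0, 91)

c0_fit, c_fit, a_fit = fit_nugget_spherical(h, gamma_bins, n_pairs, a_grid)
print(c0_fit, c_fit, a_fit)
```

The fitted $c_0$ is the intercept of the whole curve, so it draws on every bin and not only the first one.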

Not every process has a nugget. The opposite extreme also exists. Very smooth Gaussian random fields (Gaussian variogram, no measurement error, sampled at fine resolution) genuinely have $\gamma(0^+) = 0$ : the variogram approaches zero smoothly as $h \to 0^+$ . Brownian motion has linear $\gamma(h) \propto h$ with $\gamma(0^+) = 0$ — but it is not a stationary process (variance grows linearly), so it lies outside the second-order-stationary geostatistics framework. Bandlimited seismic-derived attributes after enough low-pass filtering can approach $\gamma(0^+) = 0$ : the very-short-lag variability has been processed away upstream.

Reporting $c_0 = 0$ honestly is appropriate when (a) the data is genuinely smooth, (b) measurement noise is negligible, and (c) sampling resolution is fine enough to resolve all the spatial structure that exists. In earth-science practice these conditions are uncommon but not impossible. The §4.2 default — assume there is a nugget, fit it explicitly — is the right starting point; the $c_0 = 0$ outcome is a fitted result, not a baseline assumption.

Try it

In nugget-fit-explorer, set $c_0^{\mathrm{true}} = 0.20$ , $c^{\mathrm{true}} = 0.80$ , $a^{\mathrm{true}} = 0.40$ , $N\,\mathrm{pairs} = 60$ . Fit by eye WITHOUT clicking Reveal. Then click Reveal: how close did you get? The ratio c₀/(c₀+c) for these truth values is 0.20, which falls in the mostly structured band (0.10 – 0.25). Did your fit recover that ratio?
Hold the same truth and drop the N pairs slider to 20. Click Resample several times — the binned points swing visibly with each resample. Now refit. The lowest-lag bin's value is much less stable than before; the MODELED $c_0$ is more stable than the visible bin. This is the small-lag bias point: trust the shape, not a single point.
Slide the truth to $c_0^{\mathrm{true}} = 0.60$ , $c^{\mathrm{true}} = 0.40$ , $a^{\mathrm{true}} = 0.30$ . Ratio = 0.60 (noise-dominated). Look at the binned plot — almost flat across all lags, with only a small linear rise near the origin. Report what you would tell a reservoir engineer who handed you this empirical variogram and asked for a kriged map.
In measurement-error-vs-microscale, set $c_0 = 0$ , $c = 0.20$ , $a = 0.18$ . Both panels look identical and pass through every sample. Slide $c_0$ to 0.04, then 0.10, then 0.20. Watch the LEFT panel's curve pull away from the samples; the RIGHT panel's curve stays glued. Compare each to the grey "true Z(s)" line — at $c_0 = 0.10$ , which interpretation's fit is closer to the truth?
In measurement-error-vs-microscale, set the sample noise σ_ε slider to 0.12 (substantial Gaussian noise on the generated profile) and the nugget to 0.10. Click Resample several times. With every reseed, the sample dots move; the LEFT (denoising) panel's curve stays close to the grey truth line; the RIGHT (interpolating) panel's curve faithfully chases the new noise pattern. Under measurement-error interpretation, the LEFT is the desired downstream product.
Same widget: set sample noise to 0.00 (no measurement error at all) and nugget to 0.10. The sample dots now lie exactly on the grey true Z(s) curve (no noise was added when generating). The LEFT panel still smooths AWAY from the samples — but here the samples ARE the truth, so the LEFT panel's prediction misses the truth at every sample location. This is the failure mode of choosing the wrong interpretation: under microscale-variability truth with zero measurement noise, the denoising interpretation throws away signal you wanted to keep.
Without coding: a geochemistry survey of soil contamination reports a variogram fit "Spherical, range = 80 m, sill = 0.50, nugget = 0.12". Compute the c₀/(c₀+c) ratio and classify the structure level. A QA/QC programme that ran lab duplicates measures $\sigma_\varepsilon^2 = 0.05$ in the same units. What fraction of the nugget is measurement error, and what fraction is microscale variability? What kriging mode would you recommend?
Without coding: a porosity log shows an empirical variogram that is essentially flat at $\hat\gamma \approx 0.04$ across every lag from 5 m to 200 m, with the smallest-lag bin at $\hat\gamma(h_1 = 5\,\mathrm{m}) = 0.038$ . Diagnose. Is this a defensible pure-nugget model fit, and what would you propose to the operator before committing to a kriged porosity map?

Pause and reflect: the same nugget can arise from two physically opposite mechanisms (white-noise measurement error vs real sub-sampling-scale variability), and the variogram alone cannot tell them apart. Choosing the wrong interpretation systematically biases downstream kriging maps. What information about your data's collection process would you most want to gather BEFORE committing to a kriging mode? What QA/QC steps would you build into a future survey to make the distinction defensible upfront?

What you now know, and how Part 4 builds on it

You can write the variogram model with a nugget: $\gamma(h) = c_0 \cdot \mathbb{1}[h > 0] + \gamma_{\mathrm{str}}(h)$ , with the structured component a permissible family from §4.1 and total sill $c_0 + c$ . You understand that $\gamma$ has a jump discontinuity at $h = 0^+$ — $\gamma(0) = 0$ by definition but $\gamma(0^+) = c_0 > 0$ — and that this is the standard observation in real data, not an exception.

You know the two physical sources of the nugget — measurement error (white noise on top of a smooth $Z$ ) and sub-sampling-scale variability (real spatial structure below the minimum sampling interval) — and you understand that they produce IDENTICAL empirical variograms. You can describe the experimental designs that distinguish them: resampling at a finer spatial resolution, and independent replicate samples at the same physical location. You know that the choice between the two interpretations matters for downstream kriging: measurement-error mode DENOISES (kriged surface passes through a smoothed underlying signal); microscale-variability mode INTERPOLATES (kriged surface honours every sample).

You can compute and interpret the c₀/(c₀+c) ratio as a quick structure diagnostic. You recognise the pure-nugget extreme (ratio → 1) as a diagnostic of inadequate sampling or excessive measurement noise rather than a defensible final fit. You understand the small-lag bias of the visible empirical variogram and know to read the modeled $c_0$ (the fitted intercept) rather than the value of the smallest-lag bin alone. You also know that genuinely smooth processes with $c_0 = 0$ exist, even if they are uncommon in earth-science fields.

Part 4 continues here. §4.3 lifts the isotropy assumption: the canonical families and the nugget machinery developed so far all assume $\gamma$ depends only on $|\mathbf{h}|$ , but real fields routinely have different ranges along different directions — geological grain, sedimentary trend, regional structural fabric. §4.3 wraps the isotropic families in an ANISOTROPIC ELLIPSOID, turning the scalar range $a$ into a tensor with principal axes and orientations. §4.4 develops NESTED STRUCTURES in full: sums of two or three permissible variograms to fit empirical shapes that no single family can match — and the nugget+structured pattern of §4.2 is itself the simplest nested model (nugget + one structured component). §4.5 covers FITTING STRATEGIES — by eye, by weighted least squares, and by maximum likelihood — and how to choose $c_0$ , $c$ , and $a$ jointly from data with defensible statistical machinery.

By the end of Part 4 you will have a defensible permissible variogram model $\gamma(h; \boldsymbol\theta)$ — possibly anisotropic, possibly nested, with an honest nugget — that Part 5 can plug into the kriging system. The §4.2 honesty about what the nugget is and is not propagates forward: kriging mode (denoise vs interpolate), kriging variance, simulation realisations (Part 7), reserve estimates (Part 8). Make the nugget call carefully and report it explicitly.

References

Matheron, G. (1962). Traité de géostatistique appliquée, Tome I. Éditions Technip, Paris. (The foundational reference for the nugget effect under that name. The term is documented here as the standard discontinuity at $h = 0^+$ in real-data variograms and traced to the mining-geostatistics tradition of South African gold-deposit assays in the 1950s.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (Chapter 2 develops the variogram-with-nugget machinery formally. §2.4 covers the small-lag bias of the empirical variogram, including the Cressie–Hawkins correction. §3.2.1 develops kriging with measurement error, the canonical mathematical-statistics treatment of the DENOISE-vs-INTERPOLATE choice.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (Chapter 2 is the modern comprehensive practitioner reference for variogram modelling including nugget. §2.3 develops the two physical interpretations and the resampling test that distinguishes them. §3.3.4 covers kriging with a nugget under each interpretation.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.2 covers nugget modelling in a practitioner-oriented format. §4.4 discusses inadequate-resolution diagnosis when c₀/(c₀+c) approaches 1. The standard c₀/(c₀+c) classification used in mining and reservoir geostatistics is documented here.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 7 introduces the nugget effect at an entry level. §16 develops the pure-nugget model as a diagnostic and the practitioner advice to resample more densely rather than commit to a pure-nugget final model.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (The canonical software-side reference for how nugget models are encoded and consumed in production geostatistics tools. Documents the INTERPOLATING-default behaviour of the standard kriging system in GSLIB, and the optional kriging-with-measurement-error mode.)
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications (3rd ed.). Springer. (§4.4 covers the nugget effect in the context of a broader variogram-modelling framework. The chapter explicitly connects the nugget interpretation to the choice of kriging neighbourhood and to the effect on the kriging variance.)
Cressie, N., Hawkins, D.M. (1980). Robust estimation of the variogram: I. Journal of the International Association for Mathematical Geology, 12(2), 115–125. (The original derivation of the Cressie–Hawkins robust variogram estimator with its known finite-sample bias correction. The downward bias at small lags relative to the classical Matheron estimator is documented here as part of the small-sample machinery.)