Covariance, correlogram, and variogram — three views of the same thing

Part 3 — Experimental variograms

Learning objectives

Define the three spatial-correlation functions under second-order stationarity: the covariance $C(h) = \mathrm{Cov}(Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h}))$ , the correlogram $\rho(h) = C(h)/C(0)$ , and the (semi)variogram $\gamma(h) = (1/2)\,\mathrm{Var}(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s}))$ , and state the identity $\gamma(h) = C(0) - C(h)$ that links them
Explain why three names coexist: covariance is natural in stochastic-process theory; the correlogram is dimensionless and easy to compare across variables; the variogram is the only one of the three that survives under intrinsic stationarity (strictly weaker than second-order stationarity) and is therefore the geostatistical workhorse
State the strict 2γ vs γ naming convention ( $2\gamma(h)$ is the variogram, $\gamma(h)$ is the semivariogram in Matheron 1963; modern practice including this textbook uses $\gamma(h)$ throughout and calls it the variogram), and recognise that the factor-of-2 distinction matters only when you derive the kriging equations
Read the anatomy of γ(h): the sill C(0), the (practical) range a, the nugget effect c₀ as the $\lim_{h \to 0^+} \gamma(h)$ jump, and the transition zone between the origin and the sill — and connect each component to a feature of the underlying spatial structure
Position γ(h) as the INPUT to kriging (Part 5): kriging weights minimise a prediction variance whose every term is a value of γ. A wrong variogram model produces silently miscalibrated kriging maps — the modal way kriging projects fail in practice (Part 5.8)
Distinguish the IDEAL variogram (continuous function on a stationary population) from the EMPIRICAL variogram (binned, noisy, possibly non-stationary estimator from a finite dataset). §3.1 defines the ideal; §3.2 onward binds it to data. The gap is what makes Parts 3 and 4 of this book non-trivial
Recognise what the variogram does NOT tell you: third-order moments, higher-order joint distributions, asymmetry. Two fields with the same γ(h) can have very different multipoint statistics — the Part 9 motivation for moving beyond two-point geostatistics

Part 0 set up spatial data and the assumption of stationarity. Parts 1 and 2 worked exclusively with the marginal distribution: histograms, normal-score transform, declustering, robust statistics. Every method in those parts was a single-point statistic — what is the value distribution averaged over the whole study area, ignoring how the values are arranged spatially. Part 3 takes the next step. It moves from "what is the distribution of values" to "how does the value at one location relate to the value at another, as a function of their separation". That relationship is the central object of geostatistics, and this section defines it three different ways.

The three ways — the covariance function $C(h)$ , the correlogram $\rho(h)$ , and the variogram $\gamma(h)$ — are algebraically equivalent under second-order stationarity. They carry exactly the same information, presented in three slightly different forms, and you can convert between them with a single arithmetic step. So why does the field use three names? Each one has a piece of theory that is easier to write in its language, and one of them — the variogram — has a theoretical advantage that makes it the canonical choice for the rest of this book. The §3.1 widgets make all three visible side by side, so you can see the shapes line up before any algebra arrives.

This is the foundational section of Part 3, and it is the section that the rest of the textbook leans on most heavily. Every kriging equation in Part 5, every simulation routine in Part 7, every indicator method in Part 8, and every capstone in Part 10 will use a variogram model that was fitted in Part 4 to an empirical variogram that was computed in Part 3. If you understand the variogram clearly here, the rest of the book is a series of natural elaborations. If you do not, every subsequent algorithm looks like a black box that needs a mysterious " $\gamma(h)$ " plugged in. Take the time.

The setup: second-order stationarity (briefly)

To talk about how the value at $\mathbf{s}$ relates to the value at $\mathbf{s}+\mathbf{h}$ as a function of $\mathbf{h}$ alone — not also of $\mathbf{s}$ — we need the relationship to be the SAME no matter where in the study area we drop the pair. That is the substance of second-order stationarity. Formally: the random function $Z(\mathbf{s})$ is second-order stationary if (i) $\mathbb{E}[Z(\mathbf{s})] = \mu$ is constant across the domain, and (ii) the covariance $\mathrm{Cov}(Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h}))$ depends only on the lag vector $\mathbf{h}$ , not on the base location $\mathbf{s}$ . The lag vector has a direction and a length; for now we will treat it as a scalar $h$ (the isotropic case — §3.3 lifts the isotropy assumption).

Section §0.2 unpacks stationarity, ergodicity, and the practical compromises that real data force on you. The short version: NO geological field is genuinely second-order stationary across an arbitrary domain — even the most uniform sandstone has compositional drift, depositional fabric, or compaction trends that violate the constant-mean assumption strictly. What works in practice is to choose a domain over which stationarity is APPROXIMATELY true (often after detrending or after restricting to a single facies), and to interpret the variogram as a description of the residual spatial correlation inside that domain. §3.1 takes the second-order-stationary assumption as given so the algebra is clean; you have already met the honest framing in §0.2.

The covariance function C(h)

The first description of the spatial-correlation structure is the most direct. The covariance function at lag $h$ is

C(h) \;=\; \mathrm{Cov}\!\left(Z(\mathbf{s}),\, Z(\mathbf{s}+\mathbf{h})\right) \;=\; \mathbb{E}\!\left[(Z(\mathbf{s}) - \mu)(Z(\mathbf{s}+\mathbf{h}) - \mu)\right].

At zero lag, $C(0) = \mathrm{Var}(Z) = \sigma^2$ : the variance of the field. At large lags, where values at the two locations are uncorrelated, $C(h) \to 0$ . In between, $C(h)$ decreases monotonically (for any reasonable model — Spherical, Exponential, Gaussian, all the canonical permissible families do this). The covariance function is natural in stochastic-process theory; it is the object that appears directly in the integrals that define second-order stationarity, and it is the object that classical time-series spectral analysis ports unchanged into space.

For practical work the covariance has one inconvenience: its scale depends on the units of $Z$ . A covariance of "0.32" tells you nothing without knowing whether $Z$ is porosity (so $\sigma^2 \approx 0.05$ ) or seismic amplitude (so $\sigma^2 \approx 10^6$ ). To compare structures across variables you want the next form.

The correlogram ρ(h)

The correlogram — also called the correlation function — is the dimensionless covariance:

\rho(h) \;=\; \frac{C(h)}{C(0)} \;=\; \frac{C(h)}{\sigma^2}.

By construction $\rho(0) = 1$ and $\rho(h) \to 0$ at large lag, so the correlogram always lives on the interval $[0, 1]$ (or $[-1, 1]$ if you allow negative correlation — rare for natural fields). The SHAPE of $\rho(h)$ is identical to the shape of $C(h)$ ; only the y-axis units change. The correlogram is what you reach for when you want to compare the spatial structure of porosity in one reservoir to the structure of permeability in another — both are now on the same dimensionless scale, and a 0.5 correlation at $h = 100$ m means the same thing in both contexts.

The variogram γ(h) — and why it is the workhorse

The third description trades a different change of variable for a different theoretical reach. The variogram (strictly the semivariogram, but more on that in a moment) is

\gamma(h) \;=\; \tfrac{1}{2}\,\mathrm{Var}\!\left(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})\right).

It is the variance of the increment — how much the value typically changes when you move a distance $h$ . Under second-order stationarity, a direct algebraic step shows

\gamma(h) \;=\; \tfrac{1}{2}\,\mathbb{E}\!\left[(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s}))^2\right] \;=\; \tfrac{1}{2}\!\left[\mathrm{Var}(Z(\mathbf{s}+\mathbf{h})) + \mathrm{Var}(Z(\mathbf{s})) - 2 C(h)\right] \;=\; \sigma^2 - C(h) \;=\; C(0) - C(h).

So $\gamma(h)$ is just $C(0) - C(h)$ flipped upside down. At zero lag $\gamma(0) = 0$ (the variance of $Z(\mathbf{s}) - Z(\mathbf{s}) = 0$ is zero). At large lag, where $C(h) \to 0$ , $\gamma(h) \to C(0) = \sigma^2$ — the variance of the field. So while $C(h)$ and $\rho(h)$ go DOWN with increasing $h$ , $\gamma(h)$ goes UP. The information is the same; the picture is inverted. The first §3.1 widget below puts all three side by side so the mirror-image relationship is immediate.

Three views, one curve

The first widget for §3.1. Pick one of three canonical model families — Spherical, Exponential, or Gaussian — and slide the range, sill, and nugget parameters. The three panels — covariance $C(h)$ , correlogram $\rho(h)$ , variogram $\gamma(h)$ — all redraw in lockstep. Watch:

C(h) starts at $C(0) = c_0 + c$ (the total variance — sometimes called the total sill) and decreases to zero at large lag.
ρ(h) has the SAME SHAPE as $C(h)$ , but its y-axis is fixed to $[0, 1]$ . The covariance and correlogram differ only by a constant scale factor — they carry identical structural information.
γ(h) is the mirror image of $C(h)$ about the horizontal line at $C(0)/2$ . It starts at zero and rises to the sill $C(0)$ at large lag.

Move the range slider and all three curves stretch or compress in lockstep — same information, three presentations. Move the sill slider and the y-axes of $C$ and $\gamma$ scale together, while $\rho$ stays pinned to $[0, 1]$ — dimensionless by construction. Move the nugget slider above zero and watch a discontinuity appear at the origin: $\gamma$ jumps UP from $0$ to $c_0$ at $h = 0^+$ ; $C$ and $\rho$ both jump DOWN from their full values at $h = 0$ to a lower value at $h = 0^+$ . The nugget is a fourth structural feature that we will dissect in detail in §4.2; for now, recognise that it shows up identically in all three views as the size of the jump at the origin.

So why three names for one object?

If the three functions are algebraically equivalent under second-order stationarity, why does the field use all three? Two reasons.

Algebraic convenience. Each form makes a different piece of theory easier to write. The covariance $C(h)$ is the natural object when you do stochastic-process theory in the spectral domain — its Fourier transform is the power spectral density, exactly the Wiener–Khinchin form from time-series analysis. The correlogram $\rho(h)$ is the natural form for cross-variable comparison and for the $\rho \to 0$ visualisations that appear in classical autocorrelation diagnostics. The variogram $\gamma(h)$ is the natural form for the kriging system: kriging variances are expressed as sums of $\gamma$ values, and the matrix equations are conventionally written with $\gamma$ rather than $C$ in modern references (Goovaerts 1997, Chilès & Delfiner 2012). You will see all three in the literature; converting between them is a one-line operation and you should be comfortable doing it.

The variogram's theoretical advantage. This is the deeper reason and the reason the variogram is the canonical choice for the rest of this book. The covariance $C(h)$ is only defined when $\mathrm{Var}(Z)$ is finite — that is, when the field is second-order stationary. The variogram $\gamma(h)$ only needs the INCREMENTS $Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})$ to have finite variance, NOT $Z$ itself. This weaker assumption is called intrinsic stationarity:

$\mathbb{E}[Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})] = 0$ for all $\mathbf{s}, \mathbf{h}$ (the increments have constant mean — zero), and
$\mathrm{Var}(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})) = 2\gamma(h)$ depends only on $\mathbf{h}$ .

Intrinsic stationarity is strictly WEAKER than second-order stationarity: every second-order stationary process is intrinsic, but not the converse. The classic example is Brownian motion, which has infinite variance (so $C$ does not exist) but well-defined increment variance (so $\gamma$ does exist — it is linear in $h$ ). More importantly for geostatistics, many real geological fields have apparent long-range trends that violate the constant-mean assumption of second-order stationarity but become well-behaved when viewed through $\gamma$ — the variogram absorbs the trend by working with differences rather than absolute values. Working in the variogram domain gives you a workflow that survives a wider class of real geological data than the covariance domain would.

This is why every modern reference (Cressie 1993, Goovaerts 1997, Chilès & Delfiner 2012) builds the spatial-correlation machinery on the variogram rather than on the covariance, even though the two are algebraically equivalent in the second-order-stationary case. The variogram is robust in a precise mathematical sense; the covariance is not. The rest of this textbook reflects that choice: kriging in Part 5 is derived from a system of $\gamma$ -values, simulation in Part 7 uses $\gamma$ -conditioned realisations, and the empirical estimator in §3.2 estimates $\gamma$ directly rather than going through $C$ .

The 2γ vs γ naming convention

One small but persistent source of confusion in the literature. Matheron's original 1963 paper defined the variogram as $2\gamma(h) = \mathrm{Var}(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s}))$ — the full variance of the increment — and called $\gamma(h) = \tfrac{1}{2}\mathrm{Var}(\cdot)$ the semivariogram. Strictly, then, "the variogram" is $2\gamma$ and "the semivariogram" is $\gamma$ . By the late 1970s the $\gamma$ form had become so much more common in practice — because the relationship $\gamma(h) = C(0) - C(h)$ is cleaner than $2\gamma(h) = 2(C(0) - C(h))$ — that virtually every modern textbook (Goovaerts 1997, Isaaks & Srivastava 1989, Pyrcz & Deutsch 2014, Chilès & Delfiner 2012) uses $\gamma$ and calls it the variogram.

This textbook follows the modern convention: we will use $\gamma(h)$ throughout and call it the variogram. The factor-of-2 distinction matters in exactly one place — when you derive the kriging equations from first principles (Part 5.1) and want the algebra to match the Matheron-form references — and we will be explicit then. For every other purpose in the book, "variogram" means $\gamma(h) = \tfrac{1}{2}\mathrm{Var}(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s}))$ .

The anatomy of γ(h): sill, range, nugget, transition zone

A variogram in the wild has four named structural features, each carrying a different piece of spatial information. The second widget below makes them interactive; here we define them precisely.

Sill $C(0)$ . The horizontal asymptote that $\gamma(h)$ approaches at large lag. Under second-order stationarity $C(0) = \sigma^2 = \mathrm{Var}(Z)$ , so the sill equals the variance of the field. Two samples separated by a distance well beyond the range are essentially uncorrelated, and their increment variance is $2\sigma^2 - 0 = 2\sigma^2$ , which divided by 2 gives $\gamma = \sigma^2 = C(0)$ . The total sill is the sum of the structural sill $c$ plus the nugget $c_0$ , so $C(0) = c_0 + c$ .
Range $a$ . The lag distance at which $\gamma(h)$ reaches (or effectively reaches) the sill. For the Spherical model the range is exact — $\gamma(h) = c_0 + c$ for every $h \ge a$ . For the Exponential and Gaussian models the sill is approached asymptotically; the conventional practical range $a$ is the lag at which $\gamma$ reaches $0.95,(c_0 + c)$ . Beyond the range, the field is uncorrelated; inside the range, there is exploitable spatial structure. The range is the most important number a practitioner extracts from a variogram — it sets the spatial scale of the prediction problem.
Nugget effect $c_0$ . The y-intercept-like jump at very small lag. By definition $\gamma(0) = 0$ (the variance of a zero increment is zero), but if $\lim_{h \to 0^+} \gamma(h) = c_0 > 0$ , the function has a discontinuity at the origin. Physically, the nugget combines two effects: measurement error (the assay machine has a nonzero precision, so re-measuring at the same location returns a different value), and unresolved short-scale variability (the field has structure at distances shorter than the smallest sampling interval, which appears as noise from the sample's perspective). §4.2 dissects the nugget in detail; for now, recognise that a nonzero nugget means part of the field's variance is unattributable to spatial structure at the scales you can observe.
Transition zone. The part of the curve between the origin and the range — the rising shoulder where $\gamma$ is growing fastest. The shape of the transition zone encodes the model family: Spherical gives a roughly linear rise that hits the sill at $a$ ; Exponential gives an exponential-decay-style approach with a softer knee; Gaussian gives a parabolic origin (very smooth at $h = 0$ ) and a sharper knee near the range. Part 4 develops these distinctions for fitting purposes.

Parameters ARE spatial structure

The widget below makes the most important pedagogical point of §3.1 unmissable: a variogram is not an abstract curve. Each parameter — nugget, sill, range — is a DIRECT description of what the spatial field looks like. Slide a parameter and a sample of the corresponding field redraws.

The widget shows the annotated $\gamma(h)$ on the left, with the sill, range, and nugget components called out, and a 28×28 sample of a Gaussian random field on the right whose covariance matches the current variogram. Slide the range: the blob size of the field changes in lockstep — short range produces fine-grained patchy noise, long range produces broad smooth zones. Slide the sill: the colour-contrast amplitude changes — large sill produces bold dark / bright extremes, small sill produces a muted nearly-uniform image. Slide the nugget: a salt-and-pepper speckle appears on top of the structural pattern — the nugget is the unresolved noise that breaks the otherwise smooth continuity. Switch model family: the texture changes — Gaussian gives very smooth blobs, Spherical gives crisper boundaries, Exponential gives an intermediate texture with rougher edges.

The "New realisation" button resamples the white noise that feeds the Cholesky decomposition. The variogram parameters stay the same; only the random seed changes. You will see that the FIELD changes but the same structural pattern of blob size, contrast, and speckle is preserved. The variogram fully characterises the second-order spatial statistics of the field; the realisations vary in their specific values but share that statistical signature. This is the same machinery that powers Sequential Gaussian Simulation in Part 7 — every realisation of a kriged field is one draw from this Gaussian-process distribution conditioned on the data.

Connection to kriging (Part 5 preview)

The reason this section is so important: the variogram is the INPUT to kriging. The Part 5 kriging estimator at an unsampled location $\mathbf{s}_0$ is a weighted average

Z^*(\mathbf{s}_0) \;=\; \sum_{i=1}^{n} \lambda_i\, Z(\mathbf{s}_i)

where the weights $\lambda_i$ minimise the kriging variance subject to an unbiasedness constraint. The minimisation produces a linear system $\mathbf{K}\boldsymbol{\lambda} = \mathbf{k}$ whose every entry is a value of $\gamma$ — specifically, $K_{ij} = \gamma(|\mathbf{s}_i - \mathbf{s}_j|)$ for the data-to-data covariance matrix and $k_i = \gamma(|\mathbf{s}_i - \mathbf{s}_0|)$ for the data-to-target vector. The kriging variance itself is a sum of $\gamma$ -values too. If your variogram model is correct, the kriging estimator is the BEST LINEAR UNBIASED PREDICTOR (BLUP) for the field, and its variance is the correct prediction uncertainty. If your variogram model is WRONG, both the estimator and the variance are wrong — often silently, with smooth-looking maps that are quietly miscalibrated. Part 5.8 catalogues the modal failure modes.

This is why Parts 3 and 4 build the variogram so thoroughly before Part 5 uses it. The empirical variogram estimator (§3.2), the directional and anisotropic variants (§3.3, §3.4), the indicator variant (§3.5), the robust estimators (§3.6), the permissible model families (§4.1), the nugget treatment (§4.2), the anisotropic ellipsoid (§4.3), nested structures (§4.4), and fitting strategies (§4.5) all exist to give you the most defensible $\gamma$ you can extract from your data before you commit it to the kriging system. None of those steps can be skipped; each one corresponds to a real way that the input to kriging can go wrong if you cut a corner. The §2.4 cascade story has its analogue here: a bad variogram propagates into a bad kriged map, a bad kriging variance, bad uncertainty bounds on the kriged map, biased SGS realisations, and biased reserves. The reward for the slow build is calibrated downstream products that survive peer review.

Ideal vs empirical variogram — an honest scope warning

Everything above defined $\gamma(h)$ as a CONTINUOUS function on a STATIONARY POPULATION. That is the ideal variogram. In practice you never have access to that population — you have $n$ samples at locations $\mathbf{s}_1, \ldots, \mathbf{s}_n$ , and you must ESTIMATE $\gamma(h)$ from those samples. The estimator is the empirical variogram (§3.2), which bins all pairs of samples by their separation distance and computes

\hat{\gamma}(h) \;=\; \frac{1}{2\,N(h)} \sum_{(i,j) \in N(h)} \!\left(Z(\mathbf{s}_i) - Z(\mathbf{s}_j)\right)^2

where $N(h)$ is the set of all pairs whose separation falls in a lag-bin around $h$ , and $|N(h)|$ is its size. This is the Matheron 1963 estimator. It is unbiased under second-order stationarity but it is NOISY (each lag bin has a finite number of pairs, often <30), it can be sensitive to outliers (the squared-difference term is non-robust — §3.6 introduces the Cressie–Hawkins robust variant), it depends on your choice of bin width, and it can show artefacts (jitter, monotonicity violations, apparent anisotropy) that do not reflect the true $\gamma(h)$ but rather the finite-sample variance of the estimator. Parts 3 and 4 navigate this gap between the IDEAL variogram (defined here) and the EMPIRICAL variogram (computed from finite data): §3.2 defines the empirical estimator, §3.3–§3.6 handle directional, anisotropic, indicator, and robust variants, and §4.1–§4.5 fit a permissible MODEL to the empirical estimate so that the result is a smooth continuous function that can feed the kriging system.

The honest framing: §3.1 has shown you what $\gamma(h)$ IS. The rest of Parts 3 and 4 show you what $\gamma(h)$ LOOKS LIKE WHEN YOU TRY TO ESTIMATE IT FROM REAL DATA — and what to do when it does not look like the clean curves the widgets above produced. That gap is not a defect of the theory; it is the working content of experimental and modelled variography, and it is what occupies the next nine sections of the book.

What the variogram CANNOT tell you

The variogram is a TWO-POINT statistic. It summarises the joint distribution of $Z$ at PAIRS of locations — and only the second-order moment of that joint distribution. There are real geological structures that two-point statistics cannot describe:

Higher-order moments and asymmetry. A field with high-grade hotspots surrounded by low-grade halos has very different third-order joint statistics than a field with low-grade pits surrounded by high-grade rims — but the two can have identical $\gamma(h)$ . The variogram is symmetric in $Z$ around the mean; it cannot distinguish hotspot-with-halo from pit-with-rim.
Connectivity and curvilinear features. A channelised fluvial reservoir with long sinuous sand bodies has the same variogram as an isotropic Gaussian field with the same range and sill — but the connectivity (do the sand bodies link from injector to producer?) is completely different. Two-point statistics cannot see channels. This is the Part 9 motivation for moving to multipoint statistics (SNESIM, FILTERSIM, training images).
Multimodal distributions. The variogram is derived from second-order moments and is most naturally interpreted for fields with approximately Gaussian marginals. For categorical facies, lognormal grades, or any field with multiple physical populations, the variogram tells you something useful about the average pair structure but misses the categorical / multimodal structure. Indicator variograms (§3.5) and indicator kriging (Part 8) extend the two-point machinery to categorical and quantile-thresholded data.

These limitations do not make the variogram less important — they bound what it can do. For continuous, approximately-Gaussian, single-population fields, the variogram captures essentially all the structural information that matters for kriging and simulation, and the rest of this textbook lives in that regime. For more complex fields you reach for additional machinery (indicator methods, multipoint statistics, generative models) on TOP of the variogram, not as a replacement for it. Part 9 takes up that thread.

Try it

In the three-views-explorer, set the model to Spherical with range $a = 0.4$ , sill $c = 1.0$ , nugget $c_0 = 0$ . Read the value of $\gamma(h)$ at $h = a$ from the rightmost panel. Now read the value of $C(h)$ at the same $h$ from the leftmost panel. Confirm the identity $\gamma(h) + C(h) = C(0)$ — they should add to the total sill, 1.0.
Still in the three-views-explorer, set the nugget $c_0 = 0.3$ and keep $c = 1.0$ . The TOTAL sill is now $C(0) = c_0 + c = 1.3$ . Look at the $\gamma$ panel near the origin: there is now a JUMP at $h = 0^+$ from $\gamma(0) = 0$ to $\gamma(0^+) = c_0 = 0.3$ . The same jump appears in the $C$ panel as a DROP from $C(0) = 1.3$ to $C(0^+) = 1.0$ . Why does the correlogram show a DROP and not a JUMP? (Hint: think about $\rho(h) = C(h)/C(0)$ .)
In the three-views-explorer, compare Spherical and Gaussian at the same range and sill. Look at the origin of the $\gamma$ panel. Spherical has a LINEAR rise (the $1.5(h/a) - 0.5(h/a)^3$ term is approximately linear for small $h$ ). Gaussian has a PARABOLIC rise (the $1 - \exp(-3(h/a)^2)$ form starts with a flat tangent and curls upward). Which one would produce a smoother spatial field? Verify by switching models in the variogram-anatomy widget below — Gaussian gives almost painterly smooth blobs; Spherical gives sharper boundaries.
In the variogram-anatomy widget, set the model to Exponential with range $a = 0.30$ , sill $c = 0.9$ , nugget $c_0 = 0.05$ . Click "New realisation" five or six times. The field changes each time but the BLOB SIZE stays the same (range is fixed), the CONTRAST stays the same (sill is fixed), and the SPECKLE level stays the same (nugget is fixed). This is what "the variogram fully characterises the second-order statistics" means in practice — the FIELDS vary, but the FIELD STATISTICS do not.
In the variogram-anatomy widget, hold range and sill fixed and slide the nugget from 0 to 0.6. Watch the texture change: at $c_0 = 0$ the field is smooth and continuous; by $c_0 = 0.4$ a visible salt-and-pepper speckle has appeared; by $c_0 = 0.6$ the structural pattern is nearly drowned out by noise. Real geological data almost always has a small to moderate nugget (typically 10-30% of the total sill); a nugget ratio over 50% indicates either very noisy measurements or a sample spacing too coarse to resolve the underlying spatial structure.
In the variogram-anatomy widget, set range to its minimum ( $a = 0.05$ ) and look at the field. The blobs are smaller than the eye can comfortably resolve — most pixels are essentially independent of their neighbours. Now slide the range to its maximum ( $a = 0.55$ ). The field is now essentially one big smooth gradient; even points far apart are correlated. Find the range at which the blob diameter looks comparable to one tenth of the image — this is the "natural correlation length" for that field. That distance is what the variogram's $a$ encodes.
Without coding: a colleague brings you a porosity dataset and reports an experimental variogram with $c_0 = 0.002$ , $c = 0.008$ , $a = 35$ m. What is the total sill? What is the variance of the field? What is the nugget ratio? Is this a "high" or "low" nugget dataset for porosity? (Hint: the total sill is the variance; for porosity in clean sandstones a variance around 0.01 is typical. The ratio 0.002 / 0.010 = 0.2 is in the typical real-data range — moderate but not pathological.)
Without coding: a kriging estimate uses a variogram with sill 1.0 and range 100 m. The query point is 250 m from every data point. Without solving the kriging system, predict roughly what the kriging variance will be and why. (Hint: at $h > a$ the variogram is at its sill, meaning the data points are uncorrelated with the query point. The kriging estimator will collapse to the mean and the kriging variance will be close to the field's variance — there is no spatial information to exploit beyond the range.)
Without coding: name three real-world phenomena whose spatial structure could plausibly be summarised by a variogram, and three that could not. (Hints for "could": porosity in a single facies, soil contamination above background, daily rainfall in a small watershed. Hints for "could not": a fault network — connectivity matters more than pair statistics; a river system — directionality and curvilinearity dominate; a sedimentary stack with three distinct facies — multimodal, needs indicator methods.)

Pause and reflect: the variogram is built from second-order moments of differences. Two fields with identical $\gamma(h)$ can look very different to the eye if they differ in higher-order moments — for example one has all its variance concentrated in compact hotspots, the other has its variance spread in long sinuous channels. The variogram cannot tell those two apart. When would that limitation matter in your work, and when would it not?

What you now know — and what Part 3 will build on it

You have the three equivalent descriptions of spatial correlation under second-order stationarity: the covariance $C(h) = \mathrm{Cov}(Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h}))$ , the dimensionless correlogram $\rho(h) = C(h)/C(0)$ , and the variogram $\gamma(h) = (1/2),\mathrm{Var}(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})) = C(0) - C(h)$ . You know that the three are algebraically equivalent in the second-order-stationary case but that the variogram is the canonical workhorse because it survives under the weaker assumption of intrinsic stationarity — which matters in practice for fields with apparent long-range trends or undefined variance. You have the standard naming convention (we use $\gamma$ throughout and call it the variogram, with the strict 2γ-vs-γ distinction reserved for the kriging derivation in Part 5). You can read the four structural features of $\gamma(h)$ — sill, range, nugget, transition zone — and connect each one to a visible feature of the spatial field via the variogram-anatomy widget.

You have the connection to kriging: every entry in the kriging system is a value of $\gamma$ , and a bad variogram model propagates silently into bad kriging maps and miscalibrated uncertainty. That is why Parts 3 and 4 build the variogram so carefully before Part 5 uses it. You have the honest scope warning: §3.1 defined the IDEAL variogram on a stationary population; §3.2 onward binds it to data via the empirical estimator, with all the noise, bin-choice sensitivity, and outlier vulnerability that finite-sample estimation introduces. And you have the limitation: the variogram is a two-point statistic, blind to higher-order structure, connectivity, and multimodality — features that motivate indicator methods (Part 8) and multipoint statistics (Part 9).

The rest of Part 3 builds on this foundation. §3.2 takes up the empirical estimator from the $h$ -scatterplot and the lag-binning rules. §3.3 lifts the isotropy assumption with directional variograms. §3.4 develops anisotropy (range and sill depending on direction). §3.5 introduces the indicator variogram, which transforms continuous data into a categorical indicator before computing $\gamma$ — the bridge to indicator kriging in Part 8. §3.6 introduces robust variogram estimators (Cressie–Hawkins and others), bringing the §1.4 robust-statistics ideas into the variogram domain. By the end of Part 3 you will have a defensible experimental variogram from any reasonable dataset. Part 4 then fits a permissible model to it. Part 5 plugs the model into the kriging system. Parts 6 through 10 elaborate the consequences.

The slow build pays off. The variogram defined here is what unlocks every product in the rest of the book.

References

Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58(8), 1246–1266. (The foundational paper. Defines the regionalised-variable framework, second-order and intrinsic stationarity, and the variogram. The 2γ-vs-γ naming convention originates here.)
Matheron, G. (1971). The Theory of Regionalized Variables and Its Applications. Les Cahiers du Centre de Morphologie Mathématique, No. 5. Fontainebleau, France: École Nationale Supérieure des Mines de Paris. (The monograph that formalised the framework set out in Matheron 1963. The canonical reference for the strict mathematical foundations of the variogram and the intrinsic-vs-second-order-stationarity distinction.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (Chapter 2, "Geostatistical Data" — the modern mathematical-statistics reference for the second-order vs intrinsic distinction and the algebraic equivalence of the three forms. The definitive treatment of when each assumption is appropriate.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (Chapter 2, "Basic Geostatistics" — the modern comprehensive reference. The clearest practitioner-oriented treatment of the three forms, the variogram's theoretical advantages, and the connection to kriging.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (Chapter 2 on spatial random functions and the variogram. The textbook this section's organisation most closely follows, especially the practical framing of "variogram as kriging input".)
Journel, A.G., Huijbregts, C.J. (1978). Mining Geostatistics. Academic Press. (The classic mining-geostatistics textbook. The "practical range" convention for Exponential and Gaussian models — the $0.95$ -of-sill rule that this section's widgets use — originates from the Fontainebleau tradition this book represents.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 7 on spatial continuity and the variogram. The most readable introductory treatment; the §3.1 widgets are designed to support an Isaaks–Srivastava-style first-pass reading.)
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications (3rd ed.). Springer. (Chapter 2 on the variogram. Strong on the algebraic equivalence and the conditional-negative-definiteness theory; useful complement to the Goovaerts treatment.)