Indicator variograms

Part 3 — Experimental variograms

Learning objectives

Define the indicator transform: for a continuous attribute $Z(\mathbf{s})$ and a chosen cutoff $z_c$ , the indicator $I(\mathbf{s}; z_c) = \mathbf{1}\{Z(\mathbf{s}) \le z_c\}$ is a 0/1 Bernoulli random variable with $p(z_c) = P(Z \le z_c) = F_Z(z_c)$ ; state $\mathbb{E}[I] = p(z_c)$ and $\mathrm{Var}(I) = p(z_c)(1 - p(z_c))$
Write the indicator (semi)variogram at lag $\mathbf{h}$ as $\gamma_I(\mathbf{h}; z_c) = (1/2)\,\mathrm{Var}\!\left(I(\mathbf{s}+\mathbf{h}; z_c) - I(\mathbf{s}; z_c)\right)$ , its empirical estimator $\hat\gamma_I(h; z_c) = (1/(2 N(h)))\sum [I(\mathbf{s}_i + \mathbf{h}; z_c) - I(\mathbf{s}_i; z_c)]^2$ , and recognise it as the ordinary Matheron 1965 estimator applied to the 0/1 transformed data
State the indicator sill $\gamma_I(\infty; z_c) = p(z_c)(1 - p(z_c))$ as the Bernoulli variance and show that it is maximised at the median cutoff $p = 0.5 \Rightarrow$ sill = 0.25, drops to 0 as $p \to 0$ or $p \to 1$
Explain WHY indicator variograms are useful: (i) ROBUSTNESS — indicators are bounded in [0, 1], so a single wild outlier moves $\hat\gamma_I$ by at most 1/(2N(h)) per pair; (ii) NON-GAUSSIAN STRUCTURE — high-value zones and low-value zones can have different connectivity, and $\gamma_I$ at different cutoffs reveals this where the continuous $\gamma$ averages it away; (iii) FOUNDATION OF SISIM (§8) — Sequential Indicator Simulation simulates each cutoff's indicator field separately and reassembles
Run the standard multi-cutoff workflow: choose ~5-10 cutoffs at percentiles of $Z$ (typically $\{0.10, 0.25, 0.50, 0.75, 0.90\}$ ), compute $\hat\gamma_I(h; z_c^{(k)})$ for each, fit a permissible variogram model to each in Part 4, and use the full suite to reconstruct the conditional CDF $F(z \mid \mathbf{h})$ — the input that SISIM in §8 consumes
Interpret the role of the MEDIAN cutoff $z_c = z_{0.5}$ as the default diagnostic: maximum Bernoulli variance gives maximum signal-to-noise for $\hat\gamma_I$ , the indicator field is balanced 50/50, and this is the cutoff at which 'is there spatial structure here?' is easiest to answer
Interpret EXTREME cutoffs (very small or very large $z_c$ ): $p(1-p)$ shrinks toward 0, the indicator field is mostly all-0 or all-1, $\hat\gamma_I$ becomes noisy because few pairs straddle the cutoff. High-end cutoffs answer 'is the rare tail connected or scattered?' but require many more samples than the median; low-end cutoffs do the symmetric thing
Recognise the ORDER-RELATIONS issue: indicator variogram estimates at different cutoffs are not algebraically guaranteed to be consistent with a valid joint distribution of $(Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h}))$ . Practical software (GSLIB `sisim`, SGeMS) post-processes the per-cutoff $F(z \mid \mathbf{h})$ estimates with order-relations CORRECTIONS to enforce monotonicity in $z$ and the bounds $F \in [0, 1]$
Extend the framework to categorical (facies) data: with $K$ mutually exclusive categories, define one indicator per category $I_k(\mathbf{s}) = \mathbf{1}\{C(\mathbf{s}) = k\}$ , use $K - 1$ as the independent set when the categories are exhaustive, and use the per-category indicator variograms exactly as in the continuous case — the foundation of facies-model indicator kriging in §8 and the gateway to multipoint statistics in §9

§3.1 through §3.4 built the experimental variogram of a CONTINUOUS attribute $Z(\mathbf{s})$ : the binning, the directional sweep, the anisotropy ellipse, the geometric / zonal distinction. Every estimator in those sections worked on the raw measured values directly, and the resulting $\hat\gamma(h)$ described how the squared increments $(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s}))^2$ behave as a function of separation. That is the right object for Gaussian-like attributes whose extreme values matter less than their average behaviour. It is the wrong object — or at least an incomplete object — for attributes whose CONNECTIVITY of high or low values matters. Reservoirs are predictable as a whole but production hinges on the connectivity of the high-permeability layers. Ore deposits are estimated globally but mining economics depend on the connectivity of the high-grade lenses. Contamination plumes are mapped statistically but cleanup decisions depend on whether the high-concentration cells form a connected channel or scattered hotspots. The two-point variogram of the continuous values averages over those connectivity differences and tells you only the mean squared increment at each lag. The INDICATOR VARIOGRAM, the subject of §3.5, recovers the connectivity information by transforming the continuous attribute into a 0/1 indicator at each chosen cutoff and computing the variogram of the indicator.

The idea is two lines of algebra. Pick a cutoff $z_c$ . Define $I(\mathbf{s}; z_c) = 1$ if $Z(\mathbf{s}) \le z_c$ , else $0$ . Compute the ordinary Matheron 1965 empirical variogram on the resulting 0/1 field. That is it. The indicator variogram inherits every piece of machinery you have already met — lag binning, directional sweep, anisotropy fits, robust estimators — but applied to a transformed dataset whose only possible values are 0 and 1. The Bernoulli-variance algebra makes the sill identical to $p(z_c)(1 - p(z_c))$ , automatically bounded above by 0.25, and the indicator at the median cutoff is the canonical "is there any spatial structure here?" diagnostic that practitioners reach for first. Stack a handful of cutoffs (say $p \in {0.1, 0.25, 0.5, 0.75, 0.9}$ ) and you have a SUITE of variograms — each cutoff a different view of the same field — which together can reconstruct the conditional CDF $F(z \mid \mathbf{h})$ at any lag. That conditional CDF is what the §8 Sequential Indicator Simulation (SISIM) algorithm consumes. §3.5 is the foundational section for every indicator-based method that follows.

The indicator transform

For a continuous attribute $Z(\mathbf{s})$ and a chosen numerical cutoff $z_c \in \mathbb{R}$ , the indicator transform at $z_c$ is the binary random variable

I(\mathbf{s}; z_c) \;=\; \mathbf{1}\{Z(\mathbf{s}) \le z_c\} \;=\; \begin{cases} 1 & \text{if } Z(\mathbf{s}) \le z_c, \\ 0 & \text{otherwise.} \end{cases}

Two facts follow immediately. First, $I(\mathbf{s}; z_c)$ is a Bernoulli random variable with success probability $p(z_c) = P(Z(\mathbf{s}) \le z_c) = F_Z(z_c)$ — the CDF of $Z$ evaluated at the cutoff. Under second-order stationarity this $p(z_c)$ is constant across the domain (the same fraction of locations falls below $z_c$ everywhere). Second, the variance of $I$ is the Bernoulli formula

\mathrm{Var}(I(\mathbf{s}; z_c)) \;=\; p(z_c)\,(1 - p(z_c)).

This last expression is maximised at $p = 0.5$ — i.e. when $z_c$ is the MEDIAN of $Z$ — and equals 0.25 there. At the extremes $p \to 0$ or $p \to 1$ , the variance collapses to 0: a constant 0 (or constant 1) field has no variance to model. The Bernoulli variance is the SILL of the indicator variogram, and the cutoff that maximises it is the one with the cleanest signal-to-noise for empirical estimation. Practitioners call $z_c = z_{0.5}$ the median or "0.5 cutoff" and treat it as the default diagnostic cutoff for any new dataset (Goovaerts 1997 §7.3.1; Deutsch & Journel 1998 §IV.1).

The transform itself is information-LOSSY in one direction: knowing $I$ tells you only whether $Z$ is above or below $z_c$ , not the actual value. A pair $(0.42, 0.43)$ and a pair $(0.42, 0.99)$ both map to $(0, 1)$ under the cutoff $z_c = 0.5$ . So one indicator variogram on its own cannot recover $\gamma(h)$ — too much detail has been thrown away. The recovery happens when you stack indicator variograms across MANY cutoffs: as $z_c$ slides from a low quantile to a high quantile, the indicator changes its 0/1 pattern, and the collection of $\gamma_I(h; z_c)$ across all chosen cutoffs reconstructs the bivariate joint distribution $P(Z(\mathbf{s}) \le z_c^{(1)}, Z(\mathbf{s}+\mathbf{h}) \le z_c^{(2)})$ that any two-point method needs to know. This is the deep reason the multi-cutoff suite — not any single cutoff — is what feeds the §8 algorithms.

The indicator variogram and its empirical estimator

Once the data has been indicator-transformed, the rest of §3.1-§3.2 ports unchanged. The indicator variogram at lag $\mathbf{h}$ and cutoff $z_c$ is

\gamma_I(\mathbf{h}; z_c) \;=\; \tfrac{1}{2}\,\mathrm{Var}\!\bigl(I(\mathbf{s}+\mathbf{h}; z_c) - I(\mathbf{s}; z_c)\bigr).

It is the half-variance of the indicator increment, exactly mirroring the continuous case. Under second-order stationarity this is

\gamma_I(\mathbf{h}; z_c) \;=\; C_I(\mathbf{0}; z_c) - C_I(\mathbf{h}; z_c),

where $C_I(\mathbf{h}; z_c) = \mathrm{Cov}(I(\mathbf{s}; z_c), I(\mathbf{s}+\mathbf{h}; z_c))$ is the indicator covariance and $C_I(\mathbf{0}; z_c) = p(z_c)(1 - p(z_c))$ . As $|\mathbf{h}| \to \infty$ , $C_I(\mathbf{h}; z_c) \to 0$ (distant indicator values become uncorrelated) and $\gamma_I$ rises to its asymptotic sill

\gamma_I(\infty; z_c) \;=\; p(z_c)\,(1 - p(z_c)).

The Matheron 1965 estimator on a finite dataset of $n$ samples is

\hat\gamma_I(h; z_c) \;=\; \dfrac{1}{2\, N(h)} \sum_{(i,j): \,\|\mathbf{s}_i - \mathbf{s}_j\| \in \text{bin}(h)} \bigl[I(\mathbf{s}_i; z_c) - I(\mathbf{s}_j; z_c)\bigr]^2,

where $N(h)$ is the number of unordered pairs whose separation falls into the $h$ -bin. Because the squared increment $[I_i - I_j]^2$ is 1 if $I_i \ne I_j$ and 0 otherwise, $\hat\gamma_I(h; z_c)$ is just the FRACTION of pairs in the bin that straddle the cutoff, divided by 2. Equivalently, $2\hat\gamma_I(h; z_c)$ is the empirical probability that two samples separated by lag $h$ end up on opposite sides of $z_c$ . This re-reading is the cleanest one to keep in mind: the indicator variogram is the probability that the cutoff is "crossed" between two samples, scaled by the half. Two samples in the same indicator class contribute 0; two in different classes contribute 1. Sum, divide, done.

All the §3.2 machinery — bin width, max lag, half-domain rule, rule-of-30, pair-count diagnostics — applies to $\hat\gamma_I$ exactly as it did to $\hat\gamma$ . The §3.3 directional sweep, the §3.4 anisotropy ellipse fit, and the §3.6 robust estimator (coming next) all work without modification on indicator data. The indicator transform does not change the EXPERIMENTAL VARIOGRAPHY framework; it changes only the SCALAR FIELD on which the framework runs.

Indicator transform and the resulting variogram — one cutoff at a time

The first §3.5 widget. A 32×32 Gaussian field is generated with an isotropic Spherical covariance (range 0.30, sill 1.0). 160 samples are drawn on the unit square. The reader slides a CUTOFF PERCENTILE $p \in [5, 95]%$ , which sets $z_c$ at the corresponding sample quantile. The left panel shows the field with cells coloured by the indicator overlay: blue for cells below $z_c$ (the $I = 1$ class), amber for cells above (the $I = 0$ class). Sample dots inherit the same indicator colours. The right panel overlays TWO empirical variograms on the same lag axis: the continuous $\hat\gamma(h)$ in gold (left y-axis, sill ≈ 1) and the indicator $\hat\gamma_I(h; z_c)$ in blue (right y-axis, sill p(1-p)). The theoretical Bernoulli sill is drawn as a dashed reference line at $p(1-p)$ .

Three things to watch as the cutoff slider moves. First, the FIELD APPEARANCE: at very low $p$ the field is mostly amber (almost everything is above the cutoff); at very high $p$ it is mostly blue; at the median, a roughly balanced mosaic. Click the "Median cutoff" button to snap to $p = 0.5$ and see the maximum-contrast version. Second, the INDICATOR SILL: as $p$ slides from 0.05 to 0.50 to 0.95, the theoretical Bernoulli sill traces out the parabolic shape $p(1-p)$ — 0.05 at the extremes, 0.25 at the median. The blue dashed line on the right panel moves with the slider, and the observed indicator sill (the blue curve's plateau) tracks it within sampling noise. Third, the SHAPE: the indicator variogram and the continuous variogram both rise from 0 at $h = 0$ to their respective sills around $h \approx 0.30$ (the range of the underlying Spherical model). They have the same RANGE because the indicator structure inherits the spatial continuity of the underlying continuous field. Only the SILL is determined by the cutoff. This is the key qualitative insight: the indicator transform repackages the connectivity information without altering the underlying correlation length.

The diagnostic at the median cutoff is especially clean. $p(1-p) = 0.25$ , so the indicator variogram reaches its highest possible sill, and the empirical estimator has the largest pair-wise increment range to work with. If a dataset is going to show any spatial structure on the indicator scale, the median cutoff is where it shows up most clearly. Practitioners who do not have time to run a five-cutoff suite default to a single median-cutoff $\hat\gamma_I$ as their first-look diagnostic. Reaching the Bernoulli sill 0.25 cleanly is the headline confirmation that the spatial structure is real (Goovaerts 1997 §7.3.2; Isaaks & Srivastava 1989 §13.4).

At the extreme cutoffs the picture is qualitatively different. With $p = 0.10$ , only ~10% of samples fall below the cutoff; ~90% of pair-cross events happen between two amber samples (both above) and contribute 0 to the indicator variogram. The few non-zero increments come from the rare blue-amber pairs, and with finite $N$ those few pairs produce a noisy estimate. The Bernoulli sill 0.10 × 0.90 = 0.09 is small, so the entire indicator variogram lives near the baseline. Extreme cutoffs answer specific questions ("is the rare top decile connected or scattered?") but they require many more samples than the median to estimate reliably — a point the second widget reinforces visually.

Three reasons the indicator variogram earns its place

The indicator transform is not just a curiosity. It buys three concrete practical advantages over working with the continuous variogram alone.

Robustness to outliers. The continuous Matheron estimator squares the increment $(Z(\mathbf{s}_i) - Z(\mathbf{s}_j))^2$ . A single contaminated pair where one sample is, say, ten standard deviations off contributes 100 units of squared increment — enough on its own to shift $\hat\gamma(h)$ by an entire sill within one bin. The indicator increment, by contrast, is bounded in ${-1, 0, +1}$ ; its square is at most 1; one wild outlier contributes at most $1 / (2 N(h))$ to $\hat\gamma_I$ . The indicator variogram absorbs outliers automatically (Cressie & Hawkins 1980 contrast; Journel 1983 §3). When real data has bad samples — instrumental glitches, transcription errors, dilution / smearing in the lab — and you do not have time to clean them all, switching to indicator variography is a low-effort robust baseline. §3.6 develops the formal CRESSIE-HAWKINS robust estimator that takes a different route to the same robustness goal.
Sensitivity to non-Gaussian connectivity. A continuous variogram averages over all pairs equally: a pair $(0.42, 0.43)$ contributes the same squared increment 0.0001 regardless of whether both samples are in the high-grade ore zone or the low-grade halo. The indicator variogram at $z_c = z_{0.5}$ separates these cases: a pair where BOTH are below the median contributes 0 (same class); a pair where both are above contributes 0 (same class); a pair that straddles contributes 1. The cutoff acts as a CLASSIFIER on the underlying value space, and the indicator variogram measures the spatial structure of that classification. A field whose high values cluster in dense lenses and whose low values are scattered will have an indicator variogram at the high-cutoff with a LONGER range than the indicator variogram at the low-cutoff — even though the continuous variogram averages those two ranges together. Real ore deposits, real shale-versus-sand facies, real plume cores versus dilute halos all exhibit this CUTOFF-DEPENDENT connectivity (Journel & Isaaks 1984; Goovaerts 1997 §7.3.4).
The foundation of indicator kriging and SISIM. When you fit a permissible variogram model to $\hat\gamma_I(h; z_c)$ in §4, you can use it inside an ORDINARY KRIGING system to estimate $P(Z(\mathbf{s}^$ at any unsampled location — the conditional CDF, evaluated at the cutoff. Stack indicator-kriging estimates across many cutoffs and you have a full conditional CDF $F(z \mid \mathbf{s}^$ , \text{data}) $F (z ∣ s^{*}, data)$ at every location, without ever assuming Gaussianity. That is the §8.2 INDICATOR KRIGING workflow. The dynamic version, where you simulate one cutoff at a time and progressively condition on the previously simulated indicators, is SEQUENTIAL INDICATOR SIMULATION (SISIM, Journel & Isaaks 1984; Deutsch & Journel 1998 GSLIB sisim program). Every indicator-based method in the rest of this textbook runs on the variograms you compute in §3.5.

The multi-cutoff workflow and the conditional CDF

One indicator variogram is a slice of information. A FULL FAMILY of indicator variograms across a grid of cutoffs is the input that downstream algorithms actually consume. The standard practitioner workflow is:

Choose 5-10 cutoffs. Almost always at PERCENTILES of $Z$ rather than at evenly-spaced absolute values, because the percentile grid spends sample density where the data is — a high-cutoff $p = 0.95$ in absolute units might sit out beyond all sampled values for a heavy-tailed variable. The canonical default is ${0.10, 0.25, 0.50, 0.75, 0.90}$ at five cutoffs; some practitioners use deciles ${0.1, 0.2, \ldots, 0.9}$ at nine cutoffs. The choice is a trade-off between resolution of the reconstructed $F(z \mid \mathbf{h})$ and sample-size limits on each individual $\hat\gamma_I$ (Goovaerts 1997 §7.3.2; Pyrcz & Deutsch 2014 §3.4).
Compute $\hat\gamma_I(h; z_c^{(k)})$ for each cutoff. Same lag bins, same directional cones (if anisotropic), same pair set — only the indicator-transformed values change. The arithmetic is cheap: pre-compute the pair geometry once, then re-bin the indicator increments per cutoff. The §3.5 multi-cutoff widget below does exactly this.
Fit a permissible model to each $\hat\gamma_I$ . Part 4 of this textbook develops permissible variogram models (Spherical, Exponential, Gaussian, Matérn, nested combinations). Each indicator variogram gets its own fitted model with its own range, sill, and nugget. The sill is constrained to be near $p_k(1-p_k)$ at convergence — a sanity check that the fit is consistent with the Bernoulli structure.
At a prediction location $\mathbf{s}^*$ , krige each indicator separately. Indicator kriging using the per-cutoff fitted models gives an estimate $\hat F(z_c^{(k)} \mid \mathbf{s}^$ at every cutoff $k$ . The collection of these estimates forms a piecewise-constant or interpolated conditional CDF $\hat F(z \mid \mathbf{s}^*)$ at the prediction location. (Details in §8.2.)
Apply order-relations corrections. Because each cutoff is kriged independently, the raw estimates $\hat F(z_c^{(k)} \mid \mathbf{s}^*)$ are not algebraically guaranteed to be monotone in $k$ — a low-cutoff estimate might come out larger than a higher-cutoff estimate by sampling noise, and individual estimates can spill outside [0, 1]. Order-relations corrections post-process the $K$ -cutoff family to enforce monotonicity and the unit-interval bound. The standard fix (Deutsch & Journel 1998 §IV.1.7) sweeps once upward, then once downward, taking running maxima / minima as needed. The result is a valid conditional CDF that downstream simulation can use.
Simulate or summarise. From the corrected $\hat F(z \mid \mathbf{s}^*)$ , you can either draw realisations (Sequential Indicator Simulation in §8.3) or extract summary statistics (estimated mean, conditional median, probability of exceeding a threshold) for decision support.

The second §3.5 widget below shows what step 2 looks like in practice: five indicator variograms on a common lag axis, with a per-cutoff diagnostic table that reports the observed sill, the theoretical Bernoulli value, and a rough practical range. It also flags how well the observed sills agree with the theoretical $p(1-p)$ values — the most useful single-number sanity check for whether a multi-cutoff suite is internally consistent before it gets fed into Part 4 model fitting.

Three observations to make with the multi-cutoff widget. First, the FIVE CURVES have similar SHAPES — all rise from 0 and plateau around the same lag (the practical range), because they describe the same underlying spatial structure. The differences are in the SILLS: the median-cutoff curve sits highest at ~0.25, the 25 and 75 cutoffs sit around 0.19, and the 10 and 90 cutoffs sit near 0.09. The pattern traces out the Bernoulli $p(1-p)$ parabola. Second, click "New realisation" several times. The median-cutoff curve stays stable across draws; the extreme-cutoff curves jitter noticeably because they depend on fewer cross-cutoff pairs. Practical guidance: trust the median cutoff most, treat extreme cutoffs as approximate. Third, the per-cutoff table reports the OBSERVED sill alongside the THEORETICAL $p(1-p)$ . With 220 samples and a Spherical model, the agreement is usually within 5-15% — a defensible suite. If the worst discrepancy exceeds ~25% of the median sill, either the lag range is too short (the suite has not reached its asymptote within the displayed lags) or the cutoff is too extreme for the available pair counts. The widget reports the worst discrepancy explicitly and gives a one-sentence verdict.

Order-relations issues and their correction

The indicator variogram is straightforward; the joint distribution it implies is not. Recall that $\hat F(z \mid \mathbf{h})$ — the conditional CDF that the multi-cutoff workflow ultimately reconstructs — must satisfy three properties to be a valid distribution function: monotonicity in $z$ , the bounds $0 \le \hat F \le 1$ , and the limits $\hat F \to 0$ as $z \to -\infty$ and $\hat F \to 1$ as $z \to +\infty$ . A multi-cutoff suite that is fit and kriged INDEPENDENTLY per cutoff does not respect any of these properties automatically. Two cutoffs $z_c^{(1)} < z_c^{(2)}$ should produce estimates $\hat F(z_c^{(1)} \mid \mathbf{h}) \le \hat F(z_c^{(2)} \mid \mathbf{h})$ , but sampling noise in the per-cutoff variogram fits and the per-cutoff kriging weights can flip this in either direction. Individual $\hat F$ values can spill below 0 or above 1.

This is the ORDER-RELATIONS problem (Goovaerts 1997 §7.3.3; Deutsch & Journel 1998 §IV.1.7). It is real, it is unavoidable with the independent-cutoff approach, and it has a standard fix. After running indicator kriging at every cutoff at a location, sweep the resulting $\hat F$ values upward and clip them to a running maximum: this enforces monotonicity. Then sweep downward and clip to the running minimum (so the corrected values do not drift above subsequent estimates). Finally clamp the result to $[0, 1]$ . The corrected sequence is a valid CDF that can be sampled from in SISIM. The cost of the correction is modest and is incurred per prediction location; the benefit is a downstream simulation that produces realisations whose pointwise distributions match the data.

Alternatives to the independent-cutoff approach exist but are more complex. JOINT INDICATOR KRIGING couples the per-cutoff systems through cross-covariance terms (the indicator at $z_c^{(1)}$ at one location and the indicator at $z_c^{(2)}$ at another location are correlated); the joint system enforces order relations algebraically but requires fitting cross-variograms for every cutoff pair — $K(K-1)/2$ extra fits. PRACTICALLY most software (GSLIB sisim, SGeMS sisim) sticks with the independent-cutoff approach and applies order-relations corrections post-hoc. The choice is a quality-effort trade-off; for the textbook the take-away is to be aware the issue exists and to KNOW the standard correction.

The §3.5 widgets do not run the full kriging-and-correction pipeline (that is §8). They stop at step 2 of the workflow: compute $\hat\gamma_I$ at each cutoff, display the suite, and sanity-check the Bernoulli sills. The internal consistency check the second widget reports — how close each observed sill is to its theoretical $p(1-p)$ — is the multi-cutoff analogue of the half-domain pair-count check from §3.2: a fast diagnostic to catch a fundamentally broken suite before any downstream effort is spent on it.

Categorical (facies) indicators

The indicator framework extends naturally to categorical attributes. If $C(\mathbf{s})$ takes one of $K$ mutually exclusive categories ${c_1, c_2, \ldots, c_K}$ — sand, shale, conglomerate, limestone, dolomite for a reservoir facies model; iron, manganese, gangue for an ore-classification dataset; oak, pine, grass, water for a land-cover map — define one indicator per category:

I_k(\mathbf{s}) \;=\; \mathbf{1}\{C(\mathbf{s}) = c_k\}, \qquad k = 1, 2, \ldots, K.

Each $I_k$ is Bernoulli with success probability $p_k = P(C = c_k)$ . Its sill is $p_k(1 - p_k)$ . The same Matheron estimator $\hat\gamma_{I_k}(h)$ describes the spatial connectivity of the $k$ -th category. Because the $K$ categories are exhaustive (every location is in exactly one), the indicators sum to 1 at every point — $\sum_k I_k(\mathbf{s}) = 1$ — which means the $K$ indicator variograms are linearly dependent. In practice you keep $K - 1$ as the independent set and derive the $K$ -th by the sum constraint. (Equivalently, you can compute all $K$ and check the sum constraint as a consistency diagnostic.)

Categorical indicator variograms are the backbone of FACIES-MODEL INDICATOR KRIGING in §8.2 and the natural launching pad for MULTIPOINT STATISTICS in §9. The two-point indicator variogram captures the typical "size" and "anisotropy" of each facies (a fluvial channel has a long major-axis range; a thin shale drape has a short minor-axis range) but cannot capture the topological CONNECTIVITY of how channels link or how shales pinch out — that is exactly the gap multipoint statistics fills. §3.5 lays the two-point foundation; §9 develops the multipoint generalisation.

Practical caveats for real-data indicator variography

The indicator framework is robust and clean in theory, but four practical hazards trip up indicator-variogram workflows on real datasets.

Sample-size limits at extreme cutoffs. $\hat\gamma_I$ at $p = 0.10$ needs enough cross-cutoff pairs (pairs where one sample is below and one is above) to estimate stably. With $n = 100$ samples and a cutoff at the 10th percentile, only ~10 samples are below; cross-cutoff pairs number roughly $10 \times 90 = 900$ total, spread across many lag bins. Per-bin pair counts at the very low or very high cutoffs can drop below the rule-of-30 threshold. Practical fix: report per-bin pair counts alongside $\hat\gamma_I$ , dim or omit under-supported bins, and consider using fewer cutoffs (three or five rather than nine) for sample-poor datasets.
Order-relations failures get worse with more cutoffs. The risk of a downstream order-relations correction having a noticeable effect on the corrected $\hat F$ rises roughly with $K$ . Stick to five cutoffs for routine work; reserve nine cutoffs for datasets and applications where the extra resolution is genuinely worth the extra correction load.
The percentile grid moves with the data. If you compute indicator variograms on a training dataset and then apply the fitted models to a new dataset, the percentile cutoffs ${z_{0.10}, z_{0.25}, \ldots}$ may correspond to slightly different absolute values in the new context. For comparability across datasets, fix the cutoffs at known absolute values (e.g., a regulatory contamination threshold, a mineable-grade economic cutoff) rather than at percentiles. The Bernoulli-sill diagnostic then becomes a check that the OBSERVED proportion in the new dataset matches the expected one — a useful diagnostic but it shifts the workflow.
Order-relations corrections cannot fix a broken suite. If two indicator variograms at very close cutoffs have wildly different fitted ranges (because of sampling noise), the corrected $\hat F$ will still be valid but its conditional shape will be misshapen — overly bumpy or stair-step-like. The fix is upstream: smooth the per-cutoff fitted models (constrain the range parameters across cutoffs to vary smoothly with $p$ ), or use a joint indicator-kriging approach that fits all cutoffs simultaneously. This is mostly a §4 model-fitting issue, not a §3 estimation issue, but it is worth knowing about.

Try it

In the indicator-cutoff-explorer widget, click "Median cutoff (p = 0.5)". Read off the theoretical sill p(1−p) and the observed indicator sill from the readout pills. They should agree to within ~5-10% (sampling noise). Now slide the cutoff to p = 25%. The theoretical sill drops to 0.25 × 0.75 = 0.1875. Does the observed sill follow it down?
Same widget. Slide the cutoff to p = 10%. The indicator field is almost all amber (above cutoff) with a sparse blue overlay of the low-tail samples. The Bernoulli sill is 0.10 × 0.90 = 0.09 — small. Look at the blue indicator-variogram curve: it should sit very low across all lags, mostly below 0.10. The continuous gold curve is unchanged. Why? Because the continuous variogram does not depend on the cutoff; the cutoff only affects the indicator transform.
Same widget. Click "New realisation" several times in a row at p = 0.5. The continuous γ̂(h) and the indicator γ̂_I(h) both jitter. Which one jitters MORE in relative terms (compared to its own sill)? (Hint: the continuous estimator squares the increments, which weights the tails heavily and is sensitive to extreme samples; the indicator is bounded. The indicator estimate at the median is typically more stable than the continuous estimate at the same sample size.)
In the multi-cutoff-suite widget, find the median (p = 0.50) curve. What is its observed sill? What is its theoretical Bernoulli sill? Now compare to the 10% and 90% curves. The 10% and 90% cutoffs should have roughly EQUAL sills (both 0.09) by Bernoulli symmetry — confirm this from the table.
Multi-cutoff widget. Click "New realisation" 5 times. Read the "worst discrepancy" verdict in the message box. With 220 samples and a Spherical model, the suite should land in EXCELLENT or OK most of the time. Roughly what fraction of realisations triggers a WARNING? That fraction is your empirical confidence in this dataset / cutoff-grid combination.
Without coding: you analyse a regional gold-grade dataset with n = 800 samples in a vein-hosted deposit. You compute indicator variograms at five canonical cutoffs and find the 0.50-cutoff variogram has a much LONGER practical range than the 0.10-cutoff variogram. What does this tell you about the geology? (Hint: the high-grade indicator at the high cutoff is spatially MORE CONTINUOUS than the median indicator — high-grade lenses are large, contiguous structures. This is exactly the cutoff-dependent connectivity that motivates indicator methods over a single continuous variogram.)
Without coding: a colleague hands you indicator-kriging estimates at five cutoffs at a single location, and reports F(z|h) values of (0.18, 0.34, 0.51, 0.49, 0.85) for cutoffs at the 10, 25, 50, 75, 90 percentiles of Z. What problem do you see? What standard fix would you apply? (Hint: the 0.50 estimate 0.51 is HIGHER than the 0.75 estimate 0.49 — a monotonicity violation. Order-relations corrections: sweep upward taking running max, so the corrected 0.75 value becomes max(0.49, 0.51) = 0.51, then a downward sweep if any later values would have created violations.)
Without coding: in a binary facies model with two categories "sand" and "shale", the sand fraction is p = 0.30 in your stationarity domain. You compute the indicator variogram for "sand" and find a clean Spherical fit with range 80 m and sill 0.20. Sanity check: is this consistent? (Hint: sill should be p(1−p) = 0.30 × 0.70 = 0.21 — yes, observed sill 0.20 within rounding. The range 80 m is the typical "size" of a contiguous sand patch.) What additional information would the indicator variogram for "shale" give you? (Hint: shale is the complementary indicator, so its variogram is identical to sand's — only one indicator carries independent information for two exhaustive categories.)

Pause and reflect: the continuous variogram tells you about average squared increments; the indicator variogram tells you about the probability of crossing a cutoff. Two fields can have the same continuous γ(h) but completely different indicator variograms at the high-cutoff — one with rare-but-connected hotspots, one with rare-but-scattered hotspots. Which one would matter more for an oil-recovery decision? Which for a contamination-cleanup decision? When is the continuous variogram alone enough?

What you now know — and what §3.6 and Part 8 build on it

You can write the indicator transform $I(\mathbf{s}; z_c) = \mathbf{1}{Z(\mathbf{s}) \le z_c}$ at a chosen cutoff, the indicator variogram $\gamma_I(\mathbf{h}; z_c)$ , and its empirical Matheron estimator. You know the Bernoulli sill $p(z_c)(1 - p(z_c))$ is maximised at the median cutoff (sill = 0.25) and drops to 0 at extreme cutoffs, and you can read off the implications: median cutoff for diagnostics, full multi-cutoff suite for downstream simulation. You know the three reasons indicator variograms earn their place: robustness to outliers, sensitivity to cutoff-dependent connectivity that continuous variograms average away, and the foundational role for indicator kriging (§8.2) and SISIM (§8.3).

You can run the multi-cutoff workflow: choose five-or-so percentile cutoffs, compute $\hat\gamma_I$ at each on the same lag bins, sanity-check each observed sill against its theoretical $p(1-p)$ , and recognise that the per-cutoff fits feed into Part 4 model selection and §8 indicator kriging. You know the order-relations issue — that independent per-cutoff kriging does not algebraically guarantee a valid $\hat F(z \mid \mathbf{h})$ — and the standard upward-then-downward-sweep correction that the GSLIB sisim program applies post-hoc. You know how the framework extends to categorical (facies) attributes via one indicator per category and the $K-1$ independent-indicators constraint that ports unchanged into facies-model indicator kriging.

§3.6 takes a different angle on the outlier-robustness problem. Rather than transforming $Z$ into a bounded indicator, the Cressie-Hawkins 1980 robust estimator modifies the empirical variogram formula itself — taking $(|\Delta Z|^{1/2})^4$ rather than $(\Delta Z)^2$ — so that the squared-increment outliers are damped without losing the continuous-scale information. The two approaches answer different parts of the same question: indicator variography gives you bounded estimates plus connectivity information; the robust estimator keeps the continuous scale but de-weights extremes. Practitioners use both, sometimes in the same workflow.

Part 4 will fit permissible models to the per-cutoff $\hat\gamma_I$ exactly as it fits permissible models to the continuous $\hat\gamma$ . §8.2 will run ordinary kriging on each indicator separately to get the conditional CDF $\hat F(z \mid \mathbf{s}^*)$ ; §8.3 will turn that into Sequential Indicator Simulation; §9 will pivot to MULTIPOINT statistics, which uses the multi-cutoff suite as a stepping stone toward modelling categorical topology that two-point indicator variograms cannot capture. Every method that uses the words "indicator" or "facies" in this textbook downstream rests on the foundation built in §3.5.

References

Journel, A.G. (1983). Nonparametric estimation of spatial distributions. Mathematical Geology, 15(3), 445–468. (The foundational paper on indicator-based geostatistics. Introduces the indicator transform as the basis for distribution-free spatial estimation, derives the Bernoulli-sill expression for $\gamma_I$ , and sets out the per-cutoff kriging framework that this section previews and §8.2 develops fully.)
Journel, A.G., Isaaks, E.H. (1984). Conditional indicator simulation: Application to a Saskatchewan uranium deposit. Mathematical Geology, 16(7), 685–718. (The first published application of Sequential Indicator Simulation to a real ore-grade dataset. Demonstrates the multi-cutoff workflow and the order-relations corrections on a high-stakes industrial dataset; one of the formative case studies for the SISIM algorithm subsequently codified in GSLIB.)
Matheron, G. (1965). Les Variables Régionalisées et Leur Estimation. Masson, Paris. (The classical Matheron 1965 estimator $\hat\gamma(h)$ that the indicator variogram inherits unchanged. The half-squared-increment formula at the heart of every two-point variogram estimator in this textbook.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§7.3 "Indicator-based methods" is the most thorough practitioner-level treatment of the multi-cutoff workflow, percentile-grid choice, order-relations corrections, and the connection to conditional CDFs. The organisation of §3.5 closely follows Goovaerts §7.3.1-§7.3.3.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide (2nd ed.). Oxford University Press. (§IV.1 documents the sisim program and its order-relations correction algorithm — the upward-then-downward sweep that enforces monotonicity and the [0, 1] bounds on the corrected CDF. The industry-standard implementation of multi-cutoff indicator simulation since the early 1990s.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 13 introduces the indicator transform and the median-cutoff variogram as a robust alternative to the continuous estimator. The most readable practitioner-level introduction to the ideas in §3.5.)
Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed.). Wiley. (§5.5 develops the mathematical-statistics foundation of indicator covariances and their relation to bivariate distributions of $(Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h}))$ . The formal reference for the joint-distribution reconstruction that §3.5 sketches.)
Cressie, N. (1993). Statistics for Spatial Data (revised ed.). Wiley. (§2.4.1 discusses the indicator approach in the broader context of non-Gaussian spatial-statistical methods. The bridge between the indicator framework of this section and the broader mathematical-statistics treatment of spatial random functions.)
Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling (2nd ed.). Oxford University Press. (§3.4 develops indicator methods for facies modelling in reservoir-engineering settings. The connection between the indicator variogram of §3.5 and the facies-classified reservoir models that §8.2 builds for production-forecasting workflows.)