Comparing methods on the same dataset

Part 2 — Declustering

Learning objectives

State the four-scenario comparison framework: methods AGREE on (a) easy uniform-with-clustering and (b) canonical preferential-near-high; methods DISAGREE on (c) irregular-boundary cases where cell uses the bounding box but polygonal clips to the geological domain; both methods LOSE POWER on (d) very-sparse-n cases where the choice between them matters less than acquiring more samples
Choose between cell and polygonal declustering by the three dominant signals — boundary geometry (regular → cell, irregular → polygonal), sampling regularity (regular grid + localised clusters → cell, irregular clusters + gaps → polygonal), and software availability (Voronoi clipper available → polygonal viable, GSLIB-only → cell default)
Use three quantitative comparison metrics: (a) MSE of the declustered mean against a known truth on synthetic data, (b) declustered-mean stability across resampling / bootstrap, (c) downstream variogram-fit quality (preview of §2.4) — and recognise that no single metric crowns a universal winner because the methods are HEURISTICS
Apply Bourgault (1997) multi-cell-size averaging as the robust cell-declustering choice when the cell-size curve is noisy or ambiguous, and recognise that combining cell + polygonal (e.g. average their declustered means, or accept the more conservative of the two) is a defensible 'committee' strategy when neither method dominates
Recognise the structural failure modes that the side-by-side comparison exposes: cell declustering misuses bounding-box weights on L-shaped domains; polygonal declustering inflates edge weights without explicit domain clipping; cell-size curves can have multiple local minima at moderate n; polygonal weights are dominated by a few edge polygons at small n
Write the decision rule for production: cell declustering is the GSLIB default for clean rectangular domains with regular boundaries; polygonal declustering is the principled choice for irregular geological domains where a Voronoi clipper is available; in ambiguous cases RUN BOTH, report the agreement, and use any divergence as a QC signal — never report just one number without a cross-check
Position §2.3 honestly: declustering is a HEURISTIC family, not a population-model-derived correction; ONE number won't be better than another in ALL cases; the §2.3 deliverable is a defensible reporting standard ('cell-declustered mean of X with polygonal cross-check of Y; methods agree within Z%') and a decision rule, NOT a universal winner

§2.1 built cell declustering: discretise the domain into a regular grid, weight each sample by $w_i = 1/(n_c \cdot N_{\text{occ}})$ (where $n_c$ is the number of samples sharing sample $i$'s cell and $N_{\text{occ}}$ is the number of occupied cells), sweep cell sizes, average over grid origins, pick an optimum by an explicit criterion. §2.2 built the polygonal alternative: each sample's weight is the area of its Voronoi polygon clipped to the domain, with no tuning parameter and origin invariance. Both methods are valid declustering tools, both are HEURISTIC rather than derived from a population model, and both ship in modern geostatistical practice. The natural question, and the one practitioners ask first, is "how do I CHOOSE between them on my actual dataset?" §2.3 answers it.
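A minimal sketch of the §2.1 weight formula for a single cell size and a single grid origin, in plain Python. The function names and the toy coordinates and values are illustrative assumptions, not the GSLIB DECLUS implementation; the cell-size sweep and the origin averaging are left out.

```python
from collections import Counter

def cell_decluster_weights(xs, ys, cell_size, origin=(0.0, 0.0)):
    """Cell-declustering weights w_i = 1 / (n_c * N_occ) for one grid.

    n_c   = number of samples sharing sample i's cell
    N_occ = number of occupied cells
    """
    ox, oy = origin
    # Integer cell index of each sample on a square grid anchored at `origin`.
    cells = [(int((x - ox) // cell_size), int((y - oy) // cell_size))
             for x, y in zip(xs, ys)]
    counts = Counter(cells)
    n_occ = len(counts)
    return [1.0 / (counts[c] * n_occ) for c in cells]

def weighted_mean(values, weights):
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)

# Toy data (hypothetical): three clustered high values, two isolated low values.
xs = [0.10, 0.12, 0.14, 0.60, 0.90]
ys = [0.10, 0.12, 0.14, 0.60, 0.90]
zs = [9.0, 9.0, 9.0, 1.0, 2.0]

w = cell_decluster_weights(xs, ys, cell_size=0.25)
raw_mean = sum(zs) / len(zs)
declustered_mean = weighted_mean(zs, w)
```

On this toy set the three clustered samples share one cell and split its weight, so the declustered mean (4.0) falls well below the raw mean (6.0).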

The structural argument has three pieces. First, on EASY datasets (regular sampling, regular boundary, moderate-to-large $n$) the two methods AGREE: they give declustered means within 1–3% of each other and both cut most of the raw bias. The choice is mostly a software-availability question (GSLIB DECLUS for cell, scipy.spatial.Voronoi + Shapely for polygonal). Second, on HARD datasets (irregular boundaries, tightly-spaced cluster geometries, very small samples) the methods DIVERGE, sometimes substantially. The §2.3 widget makes the divergence visible; the §2.3 prose explains why it happens and what to do. Third, declustering is one tool among many. When the methods diverge by more than a few percent on your dataset, the responsible reporting move is to run BOTH and disclose the spread, not to pick the answer that flatters the project. §2.4 takes this story further: when does the choice between cell and polygonal CASCADE into the kriged map and the simulation reference distribution? Sometimes a lot, sometimes barely at all.

When the two methods agree

The textbook empirical finding (Isaaks & Srivastava 1989, Chapter 10; Goovaerts 1997, §4.1; Olea 2009 on the Walker Lake comparison) is that for typical preferential-sampling datasets in clean rectangular domains, cell and polygonal declustering produce declustered means within a few percent of each other, both cutting most of the raw sampling bias. The agreement is not algebraic — the formulas $w_i = 1/(n_c \cdot N_{\text{occ}})$ and $w_i = A(V_i) / A(D)$ are entirely different — but it is empirical, robust, and easy to confirm on any synthetic field where the truth is known.

The pattern has a clean interpretation. Both methods are attempting to approximate the Horvitz–Thompson inverse-inclusion-probability weight $w_i \propto 1/\pi(\mathbf{s}_i)$, the design-unbiased correction when the sampling probabilities are known, using LOCAL SAMPLE DENSITY as a proxy for $\pi$. Cell declustering estimates the density by counting samples per box; polygonal declustering estimates it by polygon area (small polygon = high local density). When the underlying density varies smoothly and the domain boundary is well-behaved, both proxies recover similar density estimates, so both produce similar weights. The differences are second-order: cell weights are piecewise-constant on the cell grid (a sample in a 10-sample cell gets exactly 1/10 of that cell's contribution regardless of where it sits inside the cell); polygonal weights vary continuously with position. On smooth-density datasets that difference is in the noise.
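The density-proxy reading can be written out in the section's own notation. This is a sketch: the symbols $\hat{\pi}_{\text{cell}}$, $\hat{\pi}_{\text{poly}}$ and $n_c(i)$ (the sample count in sample $i$'s cell) are introduced here for illustration only.

$$
\bar{Z}_d = \frac{\sum_i Z_i / \pi(\mathbf{s}_i)}{\sum_i 1 / \pi(\mathbf{s}_i)}, \qquad
\hat{\pi}_{\text{cell}}(\mathbf{s}_i) \propto n_c(i), \qquad
\hat{\pi}_{\text{poly}}(\mathbf{s}_i) \propto \frac{1}{A(V_i)}
$$

Substituting the cell proxy gives $w_i \propto 1/n_c(i)$; since $\sum_i 1/n_c(i) = N_{\text{occ}}$, normalising recovers $w_i = 1/(n_c \cdot N_{\text{occ}})$. Substituting the polygonal proxy gives $w_i \propto A(V_i)$; since the clipped polygons tile the domain, $\sum_i A(V_i) = A(D)$, and normalising recovers $w_i = A(V_i)/A(D)$.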

The §2.3 widget below makes this directly visible — pick scenario (a) "uniform-with-clustering" or (b) "preferential-near-high" and watch the cell and polygonal declustered means land within 1–2% of each other on most resamples. Then pick scenario (c) "irregular-boundary" or (d) "very-sparse-n" and watch them DIVERGE. The teaching arc is built into the scenario selector.

The side-by-side comparison: four scenarios

The first widget puts both methods on the SAME sample set across four scenarios — easy, canonical, irregular boundary, and very sparse — with the MSE-vs-truth averaged over five resamples so the verdict is stable rather than seed-dependent. Cell declustering runs the full GSLIB DECLUS sweep (4–30 cells per side, 5 random origins per size, MIN criterion). Polygonal declustering uses a raster Voronoi nearest-neighbour scan clipped to the actual domain mask (so scenario (c)'s L-shape is handled correctly on the polygonal side; the cell side intentionally uses the bounding box — the textbook failure mode).
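The raster nearest-neighbour idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the widget's code: the function names, the 100 × 100 raster, and the four sample positions are hypothetical, and the domain is assumed to lie inside the unit square.

```python
def raster_voronoi_weights(xs, ys, in_domain, n_grid=100):
    """Approximate polygonal weights on the unit square.

    Each in-domain pixel is assigned to its nearest sample; a sample's
    weight is its share of the in-domain pixels, i.e. A(V_i) / A(D).
    """
    counts = [0] * len(xs)
    for row in range(n_grid):
        for col in range(n_grid):
            px, py = (col + 0.5) / n_grid, (row + 0.5) / n_grid  # pixel centre
            if not in_domain(px, py):
                continue  # clipping: pixels outside the domain carry no area
            nearest = min(range(len(xs)),
                          key=lambda k: (xs[k] - px) ** 2 + (ys[k] - py) ** 2)
            counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts]

def unit_square(x, y):
    """Bounding-box domain: every pixel counts."""
    return True

def l_shape(x, y):
    """Unit square with the block [0.55, 1] x [0.55, 1] removed."""
    return not (x >= 0.55 and y >= 0.55)

# Hypothetical sample positions: three near corners, one in the centre.
xs = [0.2, 0.8, 0.2, 0.5]
ys = [0.2, 0.2, 0.8, 0.5]

w_box = raster_voronoi_weights(xs, ys, unit_square)
w_l = raster_voronoi_weights(xs, ys, l_shape)
```

With the bounding box as the domain, the central sample's polygon extends into the removed block; once the L-shaped mask is applied, that area disappears and the sample's weight drops. The raster resolution trades accuracy for speed, and an exact alternative would clip true Voronoi polygons against the domain polygon.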

What each scenario teaches:

(a) Uniform-with-clustering. About 70% of samples are placed uniformly across the unit square; the remaining 30% form a tight cluster around the highest peak. Both methods cleanly detect and correct the cluster; MSE values are tiny and roughly equal. The verdict line reads "methods AGREE". This is the case where the choice between cell and polygonal is a non-issue, and you report by software convention (GSLIB DECLUS for most production work).
(b) Preferential-near-high. The canonical §2.1 / §2.2 case: cubic acceptance probability on the normalised field value, so high-zone pixels are heavily oversampled. The raw bias is +15 to +25%; both methods cut it to within 1–3% of truth. Cell and polygonal declustered means typically agree to within 1–2 percentage points on each individual realisation, and the MSE ratio averaged over 5 resamples is usually 0.7×–1.4× — neither method dominates. This is the default reporting case: cell is the GSLIB default, polygonal is the cross-check.
(c) Irregular-boundary (L-shape). The domain is the unit square with the upper-right quadrant $[0.55, 1] \times [0.55, 1]$ removed. Preferential sampling pushes points toward the L's inner corner. The cell method, ALWAYS run with the full bounding box (the modal production mistake), now allocates weights for a domain that includes a missing quadrant — its declustered mean is biased because it inflates the weights of samples near the L's missing-corner boundary. Polygonal declustering, clipped to the actual L, gives a cleaner answer. MSE ratios of 2–5× favouring polygonal are typical here, and the divergence between cell and polygonal declustered means is the visible signature of the irregular-boundary problem. The fix is the same one §2.2 prescribed: supply the geological-domain polygon as the clipping mask and (for cell declustering) post-process the cell grid to drop cells outside the lease.
(d) Very-sparse-n. Only 25 samples, uniformly distributed (no preferential bias). The challenge is purely statistical power. Both methods produce noisy declustered means; both essentially approach the raw sample mean. Cell weights collapse toward $1/n$ because most cells become singletons (per §2.1's small-cell-size limit). Polygonal weights are dominated by 2–3 edge samples whose polygons stretch across large empty regions. MSE values are large for BOTH methods; the ratio between them flips seed-by-seed. The verdict line: report the small-$n$ caveat clearly; method choice is secondary to acquiring more samples or a geologically-justified prior.

The four-scenario sweep is the §2.3 teaching arc compressed into one interactive comparison. Pick a scenario, resample a few times, and watch the verdict line interpret each draw. The pattern that emerges across scenarios — easy cases agree, hard cases diverge, sparse cases are noisy — is what informed practitioners report verbally; the widget puts numbers on it.

Why methods diverge: the three structural drivers

The four scenarios cover the canonical failure modes, but it is worth naming the THREE structural drivers of divergence explicitly so the reader can recognise them on a real dataset.

Domain-boundary geometry. Polygonal declustering is structurally sensitive to where the domain boundary lives (§2.2's edge-effect problem); cell declustering is structurally sensitive to whether the cell grid covers the actual domain (the production-workflow tendency to use the bounding box). On regular rectangular domains both methods handle the boundary correctly with minimal effort. On irregular domains — L-shapes, lease polygons, geological pinch-outs, coastlines — they diverge unless explicitly told about the domain. Polygonal with a supplied clipping polygon is the principled choice; cell declustering can match it if the cell grid is post-processed to drop cells outside the geological domain, but that post-processing is a manual step that most GSLIB DECLUS workflows skip.
Cluster-vs-gap geometry. Cell declustering bundles samples by spatial location — every sample inside a cell is treated as one cell's worth of evidence. When samples are arranged in tight clusters separated by large empty gaps, the empty cells contribute nothing and the cell-size knob can fight you: a cell size large enough to bundle the clusters together gives essentially-zero weight to between-cluster regions; a cell size small enough to resolve between-cluster sparsity puts each cluster sample in its own cell and applies no correction. Polygonal declustering adapts to the cluster geometry without a tuning knob — the Voronoi polygons are small in the clusters and large between them automatically. For very irregular sampling, polygonal is more principled.
Sample-size regime. Both methods are most reliable at $n \geq 100$. Between 50 and 100 they work but are sensitive to realisation-level noise. Below 50 both methods lose statistical power: cell weights collapse toward $1/n$; polygonal weights are dominated by a handful of edge polygons. The §2.3 widget's scenario (d) demonstrates this directly. In the small-$n$ regime, the choice between cell and polygonal matters less than additional samples, better geological priors, or formal Bayesian methods that propagate the small-$n$ uncertainty into the report.

A fourth driver is more practical than structural: the AVAILABLE SOFTWARE. Cell declustering ships with GSLIB DECLUS out of the box; the implementation is a few hundred lines of Fortran and runs in well under a second on thousands of samples. Polygonal declustering requires a Voronoi library (scipy.spatial.Voronoi + Shapely, R deldir with the rw boundary argument, CGAL in C++, MATLAB's built-in voronoi() with manual clipping). For a team whose toolchain is GSLIB-only, polygonal declustering is implementable but pays a 1–2 day tooling cost the first time. This software gate is the reason cell declustering remains the default in many mining and petroleum workflows even when polygonal is more principled on the geometry.

Three comparison metrics

If the methods are heuristics and neither dominates on every dataset, how do we decide which one is "better" on OUR dataset? Three metrics are in common use, in order of practical importance:

MSE against a known truth (on synthetic data). If you have a synthetic problem where the true population mean is known — a Walker Lake exhaustive dataset, a benchmark from Isaaks & Srivastava (1989) Chapter 10, a held-out validation subset of a densely-sampled deposit — compute $\mathrm{MSE} = \mathbb{E}[(\bar{Z}_d - \mu)^2]$ across many resamples of the preferential-sampling design, separately for each method. The method with lower MSE wins on that dataset. The §2.3 widget runs this metric live, averaged over 5 resamples, for each of the four scenarios. The textbook result (Isaaks & Srivastava 1989, Walker Lake; Goovaerts 1997; Olea 2009) is that for typical preferential-sampling designs on regular domains, the two methods have similar MSE; for irregular domains polygonal wins; for sparse samples both have large MSE. The §2.3 widget reproduces this story.
Declustered-mean stability across resampling. Even without ground truth, you can measure how much the declustered mean MOVES when you bootstrap-resample the sampling design (or, if the design is fixed, when you bootstrap-resample the values from the original sample). A stable declustered mean across resamples is a sign that the declustering correction is well-conditioned; an unstable one signals that the method is at the edge of its statistical-power regime. For cell declustering, the Bourgault (1997) multi-cell-size average is a stability-enhancing modification that addresses exactly this concern.
Downstream variogram-fit quality. Declustering weights feed into the experimental-variogram pair counts under robust estimators (Part 3.6 covers this). A "better" declustering is one whose downstream variogram is closer to the variogram fitted on the (unobserved) true population — or, in the absence of ground truth, one whose variogram fits a permissible model (spherical, exponential, Gaussian — Part 4.1) more cleanly. §2.4 walks this downstream chain quantitatively. For §2.3 the relevant point is that two methods which give similar declustered MEANS can produce visibly different VARIOGRAMS, especially in scenarios (c) and (d).

None of these metrics is sufficient on its own. MSE depends on having a truth you do not always have; stability depends on a bootstrap or resampling protocol that introduces its own choices; downstream variogram quality is a tail-wags-the-dog argument unless you have a permissible-model fit to start from. The honest position is to compute as many of the three as the dataset supports, report the spread, and pick a primary method based on the structural drivers in the previous subsection rather than letting any single metric drive the choice.
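The first two metrics can be sketched in plain Python. The function names and the toy numbers are hypothetical. The bootstrap here resamples (value, weight) pairs with the weights held fixed, which is a simplifying assumption; a fuller protocol would recompute the declustering weights on every resample.

```python
import random
import statistics

def mse_against_truth(estimates, truth):
    """Mean squared error of declustered means from many resamples."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def bootstrap_spread(values, weights, n_boot=500, seed=0):
    """Std. dev. of the weighted mean under resampling with replacement.

    Simplification: the weights travel with their samples and are only
    renormalised, not recomputed, on each bootstrap draw.
    """
    rng = random.Random(seed)
    idx = range(len(values))
    means = []
    for _ in range(n_boot):
        pick = [rng.choice(idx) for _ in idx]
        total_w = sum(weights[k] for k in pick)
        means.append(sum(weights[k] * values[k] for k in pick) / total_w)
    return statistics.pstdev(means)

# Hypothetical declustered means from five resamples, truth known to be 2.0.
mse_demo = mse_against_truth([1.9, 2.1, 2.0, 2.2, 1.8], truth=2.0)

# Hypothetical sample with one down-weighted high value.
spread_demo = bootstrap_spread([1.0, 2.0, 3.0, 10.0], [0.3, 0.3, 0.3, 0.1])
```

A small spread relative to the gap between the cell and polygonal means suggests the gap is structural; a spread of similar size suggests the two methods are statistically indistinguishable on that dataset.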

The §2.3 decision rule for production

Putting the structural drivers and the comparison metrics together, here is the explicit rule the §2.3 recommender widget below implements — and the rule we recommend you apply on a real dataset:

If the boundary is REGULAR (rectangular / box) and sampling is roughly REGULAR with localised clusters: use CELL declustering. This is the GSLIB default for good reason. Run the full DECLUS sweep with multi-origin averaging (5 origins, 10–20 cell sizes), pick the optimum by the criterion matching your sampling design (MIN for high-grade preferential, MAX for low-grade, closest-to-target if a defensible prior exists). Report the declustered mean alongside the chosen cell size and the criterion. Use Bourgault (1997) multi-cell-size averaging as the robustness fallback if the cell-size curve is noisy.
If the boundary is IRREGULAR (L-shape, lease polygon, geological pinch-out) OR sampling is highly IRREGULAR (large clusters with empty gaps): use POLYGONAL declustering. Supply the geological-domain polygon as the clipping mask. Use scipy.spatial.Voronoi + Shapely (Python), R deldir with the rw boundary argument (R), or CGAL (C++). Verify the clipped polygon areas sum to $A(D)$ as a unit test that the clipping is correct. Report the declustered mean and the polygon-area distribution (max-area to min-area ratio is a useful diagnostic — values above ~30:1 signal a few edge samples are dominating).
If your toolchain is GSLIB-only and the boundary is irregular: use CELL declustering with the cell grid POST-PROCESSED to drop cells outside the geological domain. This is a hybrid: the cell-size sweep proceeds as normal, but cells whose centres lie outside the lease polygon are excluded from $N_{\text{occ}}$ and from the weight computation. The result is a cell-declustered mean that respects the geological boundary without requiring a Voronoi clipper. In practice, post-process the GSLIB DECLUS output to apply a domain mask before computing $\sum w_i Z_i$.
If $n < 50$ or the two methods' declustered means diverge by more than a few percent: RUN BOTH and report the spread. The honest move is to disclose that two equally-defensible declustering methods give different answers, and to use that disclosure as a QC signal for the next stage of the workflow (variogram, kriging, simulation). If one method's answer aligns more closely with an independent prior, cite the prior; otherwise present the spread and let the downstream decision-maker weigh them.
If you have a prior on the bias direction (an independent line of evidence about over- or under-estimation): use CELL declustering with the "closest-to-target" criterion. This is the most defensible cell-size choice when a target is available, because the target is independent of the data being declustered. Polygonal declustering has no comparable mechanism (no tuning parameter to optimise against the prior).
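The five clauses above can be condensed into a small function. This is a sketch, not the recommender widget's scoring logic: the function name, the argument names, the returned strings, and the precedence between clauses (in particular, applying the prior only when cell is already the primary method) are assumptions made for illustration.

```python
def recommend_declustering(boundary_regular, sampling_regular, n,
                           voronoi_available, have_prior=False):
    """Return (primary, cross_check, caveats) for one dataset profile."""
    caveats = []
    if boundary_regular and sampling_regular:
        # Clause 1: clean rectangular domain, regular sampling.
        primary = "cell"
        cross_check = "polygonal" if voronoi_available else None
    elif voronoi_available:
        # Clause 2: irregular boundary or irregular sampling, clipper available.
        primary, cross_check = "polygonal, clipped to domain", "cell"
    else:
        # Clause 3: GSLIB-only toolchain; mask the grid if the boundary is irregular.
        primary = "cell" if boundary_regular else "cell, grid masked to domain"
        cross_check = None
        caveats.append("no Voronoi clipper: polygonal cross-check unavailable")
    if have_prior and primary.startswith("cell"):
        # Clause 5 (assumed to refine a cell recommendation only).
        primary += ", closest-to-target criterion"
    if n < 50:
        # Clause 4: small-n caveat applies regardless of the other answers.
        caveats.append("n < 50: report the small-n caveat and the spread between methods")
    return primary, cross_check, caveats

# 35 samples, L-shaped block, GSLIB-only toolchain.
reservoir = recommend_declustering(False, True, 35, voronoi_available=False)
# 1500 samples, coastline boundary, Voronoi clipper available.
watershed = recommend_declustering(False, False, 1500, voronoi_available=True)
```

The two calls return the masked-grid cell hybrid with a small-n caveat for the first profile, and clipped polygonal with a cell cross-check for the second. The divergence half of clause 4 is not encoded here because it needs both declustered means as inputs.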

The "Bourgault multi-cell-size average" deserves a special note. Bourgault (1997) proposed averaging the declustering weights across a RANGE of cell sizes rather than picking a single optimum. The resulting weights are still a probability mass on the sample, and the resulting declustered mean is much more stable than any single-cell-size estimate. In §2.3's decision-rule terms, the Bourgault average IS the practical hybrid that often works best: it inherits cell declustering's software availability (GSLIB DECLUS computes per-cell-size weights as a side product), it removes the cell-size optimisation step (no min / max / target criterion required), and it produces stable declustered means even when the cell-size curve is ambiguous. For an analyst who wants a single robust answer and is willing to defend "I averaged across cell sizes per Bourgault (1997)" in the methods section, Bourgault is the most defensible single cell-declustering report.
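A minimal sketch of the multi-cell-size average as described above: compute the cell weights at each size in a range and average them sample by sample. The function names, the toy coordinates, and the three cell sizes are hypothetical, and origin averaging is omitted; the details of Bourgault's published procedure may differ.

```python
from collections import Counter

def cell_weights(xs, ys, cell_size):
    """Single-size cell weights w_i = 1 / (n_c * N_occ), origin at (0, 0)."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in zip(xs, ys)]
    counts = Counter(cells)
    return [1.0 / (counts[c] * len(counts)) for c in cells]

def multi_size_average_weights(xs, ys, cell_sizes):
    """Average each sample's weight over a range of cell sizes."""
    avg = [0.0] * len(xs)
    for size in cell_sizes:
        for i, w in enumerate(cell_weights(xs, ys, size)):
            avg[i] += w / len(cell_sizes)
    return avg

# Toy data (hypothetical): three clustered samples, two isolated ones.
xs = [0.11, 0.13, 0.15, 0.62, 0.91]
ys = [0.11, 0.13, 0.15, 0.62, 0.91]
zs = [9.0, 9.0, 9.0, 1.0, 2.0]

avg_w = multi_size_average_weights(xs, ys, cell_sizes=[0.2, 0.25, 0.5])
averaged_mean = sum(w * z for w, z in zip(avg_w, zs))
```

Because the weights at every cell size sum to one, their average also sums to one, so no renormalisation is needed before forming the declustered mean.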

The decision-tree recommender

The second widget takes the §2.3 rule and turns it into a five-question interactive workflow. The reader answers: (Q1) is sampling regular or irregular? (Q2) is the boundary regular or irregular? (Q3) is the sample size large / moderate / small? (Q4) is a Voronoi clipper available? (Q5) is there an a-priori prior on the bias direction? The widget then names a primary method, a cross-check method, and any caveats — with the reasoning panel showing which question contributed which signal so the reader can audit the call.

The widget intentionally exposes every contribution to the score so a reader can disagree with a default weighting and override it on their own project. The structural-driver weights (Q1 = ±2, Q2 = ±2 to ±3, Q4 = −4 for "no Voronoi clipper", Q5 = +1 for "a-priori prior") reflect §2.3's qualitative argument: boundary geometry is the strongest signal, software availability is a hard gate, and the prior nudges toward cell declustering only when it exists. Reset the answers and try several profiles. Notice how a clean rectangular domain with regular sampling and no Voronoi library always recommends cell; an L-shape with a Voronoi library always recommends polygonal; the small-$n$ caveat appears regardless. The widget is a teaching tool, not a production substitute: the reasoning is what matters, not the badge.

Combining methods: the "committee" approach

When the two methods are within a percent or two of each other (the typical case for clean rectangular domains), reporting just one number is defensible. When they diverge by more than a few percent (irregular boundaries, sparse samples, ambiguous cluster geometries), several "committee" strategies exist that combine the two:

Average the two declustered means. $\bar{Z}_d^{\text{combined}} = \tfrac{1}{2}(\bar{Z}_d^{\text{cell}} + \bar{Z}_d^{\text{poly}})$. The simplest committee. Defensible when both methods are individually credible and you do not want to choose. Report the spread alongside.
Take the more conservative of the two. For high-grade preferential sampling, the LARGER of the two declustered means is more conservative (less aggressive in reducing the unweighted mean). For low-grade preferential, the SMALLER is more conservative. Reporting the conservative value alongside the spread is appropriate for feasibility studies where the consequence of over-estimating recoverable ore is severe.
Use the cell method with a polygonal cross-check, or vice-versa, with a "methods agree within X%" disclosure. The reporting standard most practitioners actually use: a primary number, a cross-check number, and a one-line statement of agreement. The §2.3 widget's output is exactly this format.
Use Bourgault's multi-cell-size average as the primary, polygonal as the cross-check. Bourgault is robust against the cell-size choice; polygonal is robust against the cell-origin choice. Together they cover both of cell declustering's structural-sensitivity dimensions and form a defensible pair for routine reporting.
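The first three strategies reduce to a few lines of arithmetic. The sketch below is illustrative: the function name and the returned keys are hypothetical, the spread is expressed relative to the average of the two means (one of several reasonable conventions), and the direction of preferential sampling is inferred from where the raw mean sits, which is an assumption rather than a substitute for knowing the sampling design.

```python
def committee_report(cell_mean, poly_mean, raw_mean):
    """Combine two declustered means into a reportable summary."""
    average = 0.5 * (cell_mean + poly_mean)
    spread_pct = 100.0 * abs(cell_mean - poly_mean) / average
    if raw_mean >= max(cell_mean, poly_mean):
        # Raw mean above both: high-grade preferential sampling assumed,
        # so the larger declustered mean is the less aggressive correction.
        conservative = max(cell_mean, poly_mean)
    elif raw_mean <= min(cell_mean, poly_mean):
        # Raw mean below both: low-grade preferential sampling assumed.
        conservative = min(cell_mean, poly_mean)
    else:
        conservative = None  # methods disagree on the bias direction
    return {"average": average,
            "conservative": conservative,
            "spread_pct": spread_pct}

# Figures from the mining exercise in "Try it": 3.2 g/t, 3.4 g/t, raw 4.1 g/t.
report = committee_report(cell_mean=3.2, poly_mean=3.4, raw_mean=4.1)
```

For those figures the summary is an average of 3.3 g/t, a conservative value of 3.4 g/t, and a spread of about 6%, which is the "methods agree within X%" line in numeric form.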

The committee approach is consistent with broader geostatistical practice. Pyrcz, Strebelle & Deutsch (2005) make the same argument in the multiple-point-statistics (MPS) literature: when two methods disagree, the responsible move is to report both and use the divergence as information about model uncertainty rather than hiding it. The same logic applies one level lower in the workflow, at the declustering stage.

Why the declustering choice matters for §2.4

§2.4 picks up the downstream chain explicitly: how does the choice between cell and polygonal declustering propagate into the variogram, the N-score transform, the kriged map, and the SGS reference distribution? The preview answer is: usually not much, sometimes a lot. For scenarios (a) and (b) — easy and canonical — both methods produce essentially the same variogram, the same N-score transform, the same kriged map. For scenarios (c) and (d) — irregular boundary, sparse sample — the two methods produce noticeably different reference distributions, which cascade into different kriging means and different SGS realisations. The §2.4 widgets will compare both routes side-by-side and quantify the cascade.

The §2.3 takeaway for the §2.4 reader is this: the declustering choice is NOT a one-step "pick a method and forget it" decision. It is the first link in a chain, and on hard datasets the chain amplifies the choice. Reporting both methods' declustered means and propagating both through Part 5 (kriging) and Part 7 (SGS) — the "committee" approach extended downstream — is the most defensible position when the §2.3 widget shows divergence. Part 6 (cross-validation and QC) provides the diagnostics that confirm whether the chain is consistent.

Try it

In the methods-side-by-side widget, set the scenario to "(a) uniform-with-clustering". Run "Resample" 5 times. On how many seeds do cell and polygonal agree to within 1%? On how many do they agree to within 2%? Confirm the §2.3 thesis: easy cases AGREE. Now switch to "(c) irregular-boundary" and repeat. The MSE ratio between methods is now usually 2× or more — the L-shape is genuinely harder for cell.
In scenario "(c) irregular-boundary", read the cell-declustered mean and the polygonal-declustered mean against the true population mean (shown in the stats row). Which method is closer to truth? Now click "Resample" several times. Is the same method ALWAYS closer? When polygonal wins, by how much? When cell wins (occasionally — the structural argument is statistical, not deterministic), by how much? The pattern across resamples is the case for polygonal on irregular boundaries.
In scenario "(d) very-sparse-n", read MSE_cell and MSE_poly from the computational-details panel. Both should be large — neither method has much power at n = 25. Click "Resample" and observe how the verdict-line winner flips between cell and poly. The §2.3 lesson: at sparse n, neither method is reliable, and the comparison is a coin flip.
In the decluster-recommender widget, set: Q1 = regular sampling, Q2 = regular boundary, Q3 = large n, Q4 = no Voronoi clipper, Q5 = no a-priori. Read the recommendation. Now flip Q2 to irregular boundary. How does the recommendation change? Now also flip Q4 to yes (Voronoi clipper available). Now flip Q1 to irregular sampling. Each flip moves the recommendation in a predictable direction; the reasoning panel shows which question contributed which signal. The widget is auditable.
In the recommender, set Q3 = small (n < 50) and answer the other four questions arbitrarily. Note the small-n caveat appears regardless of the other answers. The §2.3 lesson: at small n, method choice is secondary; explicit reporting of the sample-size caveat is the most important thing the analyst can do.
Without coding: a mining geologist reports a cell-declustered mean of 3.2 g/t and a polygonal-declustered mean of 3.4 g/t on the same 800-sample drillout in a clean rectangular lease. The raw (unweighted) mean is 4.1 g/t. Is the 0.2 g/t spread between methods cause for concern, or expected? What is the §2.3-compliant way to report this finding? (Hint: the spread is small relative to the bias correction; the two methods pull the raw mean of 4.1 g/t down by roughly 22% and 17% respectively. Report the cell-declustered mean as the primary, the polygonal as the cross-check, the agreement-within-6% disclosure, and the methods-agree verdict from the §2.3 widget.)
Without coding: a petroleum engineer has 35 well-log samples in an L-shaped reservoir block. The team's toolchain is GSLIB-only (no Voronoi library). What does the §2.3 decision rule recommend? (Hint: the irregular boundary calls for polygonal, but the toolchain calls for cell. The hybrid in rule 3 — cell declustering with the cell grid post-processed to drop cells outside the L — is the production-friendly answer. The small-n caveat also applies; the engineer should disclose both the boundary-hybrid approach and the small-n limitation in the methods section.)
Without coding: a hydrologist has a dataset of 1500 rainfall-gauge readings across a watershed with a coastline boundary. The Voronoi clipping is implemented via R's deldir package with the rw argument. The cell-declustered mean is 24.3 mm; the polygonal-declustered mean is 22.7 mm; the raw mean is 27.1 mm. Which method does §2.3 favour here, and why? (Hint: the irregular coastline favours polygonal; the toolchain supports it (R deldir); the large $n$ removes the small-n caveat. Polygonal is the primary recommendation; cell is the cross-check; the 1.6 mm spread between methods (about 7% of the values) merits an explicit disclosure in the report.)

Pause and reflect: the §2.3 widget runs MSE-vs-truth as the headline metric, averaged over 5 resamples. On a real dataset you do not have the truth — that's the whole reason you are declustering. Are there reasonable proxies for MSE-vs-truth on real data that you could compute? (Hint: leave-one-out cross-validation of the declustered mean is one approach — refit the declustering after omitting each sample and see how much the declustered mean swings. Bootstrap resampling of the original sample is another. Both are imperfect — they share the underlying sampling design rather than randomising over independent realisations of it — but they are better than nothing as stability diagnostics.) Part 6 develops these tools in full generality; for now, just notice that the §2.3 widget's use of multiple resamples is a synthetic-data shortcut that real practitioners replicate via bootstrap on their actual datasets.
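The leave-one-out idea from the hint can be sketched for cell declustering in plain Python. The function names, the fixed cell size, and the toy data are hypothetical; a fuller version would redo the cell-size sweep on every reduced sample.

```python
from collections import Counter

def cell_declustered_mean(xs, ys, zs, cell_size):
    """Declustered mean with weights w_i = 1 / (n_c * N_occ), origin at (0, 0)."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in zip(xs, ys)]
    counts = Counter(cells)
    n_occ = len(counts)
    return sum(z / (counts[c] * n_occ) for z, c in zip(zs, cells))

def leave_one_out_means(xs, ys, zs, cell_size):
    """Recompute the declustered mean with each sample omitted in turn."""
    means = []
    for k in range(len(zs)):
        keep = [i for i in range(len(zs)) if i != k]
        means.append(cell_declustered_mean([xs[i] for i in keep],
                                           [ys[i] for i in keep],
                                           [zs[i] for i in keep],
                                           cell_size))
    return means

# Toy data (hypothetical): three clustered high values, two isolated low values.
xs = [0.11, 0.13, 0.15, 0.62, 0.91]
ys = [0.11, 0.13, 0.15, 0.62, 0.91]
zs = [9.0, 9.0, 9.0, 1.0, 2.0]

full_mean = cell_declustered_mean(xs, ys, zs, cell_size=0.25)
loo = leave_one_out_means(xs, ys, zs, cell_size=0.25)
swing = max(loo) - min(loo)
```

On this toy set, dropping any one clustered sample leaves the declustered mean at 4.0, while dropping either isolated sample moves it to 5.5 or 5.0. The swing is driven by the heavily weighted isolated samples, which mirrors the small-sample behaviour discussed for scenario (d).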

What you now know — and where Part 2 goes next

You have the explicit comparison framework for cell vs polygonal declustering: agreement on easy / canonical datasets, divergence on irregular boundaries and sparse samples, and a clear set of structural drivers (boundary geometry, sampling regularity, sample size, software availability, prior availability) that determine which method to pick. You have three comparison metrics (MSE-vs-truth on synthetic data, declustered-mean stability across resampling, downstream variogram-fit quality) and the honest position that no single metric crowns a universal winner. You have the §2.3 decision rule in five clauses, including the toolchain-aware hybrid for GSLIB-only teams and the explicit "run both and report the spread" rule for ambiguous cases. You have the Bourgault (1997) multi-cell-size average as the robust single answer from cell declustering when no clean optimum cell size exists, and the "committee" combinations (average the means, take the more conservative, report both with a spread disclosure) for when divergence is real.

You have the honest scope: declustering is a HEURISTIC family. The §2.3 widget's scenario-by-scenario verdicts make this concrete — neither method wins everywhere, and the "best" answer on your dataset depends on its specific geometry, your software, and your prior. The §2.3 deliverable is therefore a defensible reporting standard ("cell-declustered mean of X with polygonal cross-check of Y; methods agree within Z%; chosen cell size selected by MIN criterion per GSLIB DECLUS sweep") and an auditable decision rule — not a single right answer.

§2.4 picks up the downstream chain. Both methods' declustered means feed the N-score transform (§1.2), the experimental variogram (Part 3) under robust estimators (§3.6), the simple-kriging mean (§5.1), the ordinary-kriging unbiasedness constraint (§5.2), and the SGS reference distribution (Part 7). §2.4's job is to walk that chain quantitatively: show that for easy datasets the choice barely cascades, but for irregular-boundary and sparse-sample datasets the choice CASCADES into the kriged map and the simulation realisations — sometimes by 5%, sometimes by 15%, depending on the geometry. By the end of §2.4 the reader has the full declustering picture: motivation (§1.3), cell algorithm (§2.1), polygonal algorithm (§2.2), comparison and decision framework (§2.3), and downstream propagation (§2.4). Then Part 3 takes the experimental variogram up — the next workhorse — and the declustering weights you computed here will reappear, weighting variogram pair counts.

References

Journel, A.G. (1983). Nonparametric estimation of spatial distributions. Mathematical Geology, 15(3), 445–468. (The cell-declustering foundation paper. §2.1's primary reference; cited here as the cell side of the §2.3 comparison.)
Deutsch, C.V. (1989). DECLUS: a Fortran 77 program for determining optimum spatial declustering weights. Computers & Geosciences, 15(3), 325–332. (The GSLIB DECLUS program. Specifies the cell-size sweep, the random-origin averaging trick, and the min/max/a-priori criterion. §2.3's decision rule for clauses 1 and 5 traces back to this paper.)
Bourgault, G. (1997). Spatial declustering weights. Mathematical Geology, 29(2), 277–290. (The multi-cell-size averaging refinement. §2.3 cites Bourgault as the practical hybrid that often works best for cell declustering when no single optimum cell size is clearly right — a robust default when the §2.1 cell-size sweep is ambiguous.)
Pyrcz, M.J., Strebelle, S., Deutsch, C.V. (2005). Multiple-point geostatistics for stochastic modeling of clastic reservoirs. Mathematical Geology, 37(1), 1–34. (Multi-method critique from the MPS literature. The argument for running multiple methods and reporting the spread — which §2.3 extends from MPS down to declustering — is articulated cleanly in this paper.)
Olea, R.A. (2009). A Practical Primer on Geostatistics. U.S. Geological Survey Open-File Report 2009-1103. (Comparative practical guide. Olea's Walker Lake comparison chapter is one of the cleanest published illustrations of the §2.3 agreement-on-easy / divergence-on-hard story.)
Isaaks, E.H., Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press. (Chapter 10 introduces cell and polygonal declustering side-by-side with the Walker Lake worked example. The §2.1 / §2.2 / §2.3 arc of Part 2 mirrors Isaaks & Srivastava's chapter structure; §2.3's decision rule is the modernised version of their concluding-paragraph guidance.)
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press. (§4.1 covers cell and polygonal declustering with the Walker Lake worked example. The comparison metrics in §2.3 — MSE, stability, downstream variogram quality — are developed in Goovaerts' broader discussion of declustering as one tool among many.)
Deutsch, C.V., Journel, A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide, 2nd ed. Oxford University Press. (The GSLIB suite assumes cell declustering as the default but discusses polygonal as an alternative for irregular domains. §2.3's decision rule 3 — the GSLIB-only hybrid — implements the GSLIB authors' own guidance on post-processing the cell grid to respect a geological-domain polygon.)
Pyrcz, M.J., Deutsch, C.V. (2014). Geostatistical Reservoir Modeling, 2nd ed. Oxford University Press. (Reservoir-engineering treatment of declustering in production workflow. §2.3's downstream-propagation preview (cascade into variogram, kriging, SGS) is developed in detail across Pyrcz & Deutsch's chapters on uncertainty quantification.)