Exploratory data analysis for spatial data

Spatial data fundamentals

Learning objectives

Apply the standard EDA toolkit (location map, histogram, Q-Q plot, boxplot) to spatial data
Diagnose distribution shape (Normal, skewed, multi-modal, contaminated by outliers)
Recognise spatial trend visually via bubble maps and moving-window profiles
Decide when to log-transform or N-score-transform before variogram analysis
Identify outliers and decide whether to remove, retain, or Winsorize

Before fitting any variogram or running kriging, do EXPLORATORY DATA ANALYSIS. EDA reveals: distribution shape, spatial trends, outliers, suspicious clusters, and any other issue that would invalidate downstream geostatistics. This is the closing section of Part 0; everything in Parts 1+ assumes you've done EDA first.

The standard EDA toolkit

Bubble / location map: scatter of sample locations with bubble size or colour proportional to value. Reveals spatial trends, clusters, and outliers (giant bubbles).
Histogram: distribution shape (Normal? skewed? bimodal?). Watch for outliers as far-right or far-left bars.
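Q-Q plot vs Normal: with sample quantiles on the vertical axis, deviations from a straight reference line signal departures from Normality. Right-skewed: the points bend upward away from the line, most visibly at the upper end. Heavy tails: S-shape, with points below the line at the lower end and above it at the upper end. Outliers: isolated points off the line.
Boxplot: median, quartiles, and whiskers. Points beyond the whiskers (conventionally more than 1.5 × IQR from the box) are drawn as separate dots and are candidate outliers.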
Moving-mean / moving-variance profile: along a transect or coordinate axis. Reveals trends and heteroscedasticity.
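
The numerical core of two of these tools can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the plotting-position convention (i − 0.5)/n, and the window layout are assumptions made for the example, and only the standard library is used.

```python
import statistics
from statistics import NormalDist


def qq_points(values):
    """Return (theoretical, sample) quantile pairs for a Normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def moving_window_profile(coords, values, width, step):
    """Mean and variance of the values inside windows along one coordinate axis."""
    profile = []
    hi = max(coords)
    centre = min(coords) + width / 2
    while centre - width / 2 <= hi:
        inside = [v for c, v in zip(coords, values) if abs(c - centre) <= width / 2]
        if len(inside) >= 2:  # a variance needs at least two samples
            profile.append(
                (centre, statistics.fmean(inside), statistics.variance(inside))
            )
        centre += step
    return profile


# Hypothetical transect: values rise steadily with x, plus an alternating wiggle.
x = [float(i) for i in range(20)]
z = [0.5 * xi + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x)]

qq = qq_points(z)
profile = moving_window_profile(x, z, width=4.0, step=4.0)
for centre, mean, var in profile:
    print(f"x = {centre:5.1f}   mean = {mean:6.2f}   variance = {var:6.2f}")
```

A window mean that climbs steadily along the axis, as it does in this made-up transect, is the numerical counterpart of the trend you would see in the plotted profile. Plotting the pairs returned by qq_points against a straight line gives the Q-Q plot.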

Decisions from EDA

Different EDA outcomes lead to different downstream decisions:

Approximately Normal histogram + flat moving mean: proceed with standard kriging directly.
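Strongly right-skewed (lognormal-like): log-transform OR Normal-score transform Z̃ = Φ⁻¹(F̂(Z)), where F̂ is the empirical CDF of the data and Φ⁻¹ is the standard Normal quantile function. The transformed variable is approximately Normal, so a few extreme values no longer dominate the experimental variogram; back-transform results at the end.
Visible trend in moving-mean profile: account for the trend before variogram analysis. Either fit and remove it and compute the variogram on the residuals, or use a method that models the trend explicitly (universal kriging or kriging with external drift, KED).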
Outliers detected: investigate them (data error or genuine extreme value?). Decide: remove, retain, Winsorize. ALWAYS document.
Bimodal histogram: probably two facies / zones / regimes — separate them before pooling.
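
As an illustration of the Normal-score transform Z̃ = Φ⁻¹(F̂(Z)), here is a minimal standard-library sketch. Several choices are assumptions made for the example rather than part of the definition: F̂ is taken as (rank − 0.5)/n, ties are broken by order of appearance, and the back-transform interpolates linearly inside the data range and clamps outside it (production workflows usually add tie handling and tail extrapolation).

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Map each value to Phi^{-1}(F_hat(z)) using a rank-based F_hat."""
    n = len(values)
    std_normal = NormalDist()
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


def back_transform(score, values, scores):
    """Map a Normal score back to data units by linear interpolation."""
    table = sorted(zip(scores, values))
    if score <= table[0][0]:
        return table[0][1]  # clamp: no lower-tail extrapolation in this sketch
    if score >= table[-1][0]:
        return table[-1][1]  # clamp: no upper-tail extrapolation in this sketch
    for (s0, v0), (s1, v1) in zip(table, table[1:]):
        if s0 <= score <= s1:
            t = (score - s0) / (s1 - s0)
            return v0 + t * (v1 - v0)


# Hypothetical right-skewed sample.
z = [0.3, 1.2, 0.8, 15.0, 2.1, 0.5, 4.4, 0.9]
y = normal_score_transform(z)
for zi, yi in zip(z, y):
    print(f"z = {zi:5.1f}   normal score = {yi:6.3f}")
```

The transform preserves the ranking of the data and discards only the shape of the distribution, which is why the extreme value 15.0 ends up as just the largest of eight evenly spread Normal scores.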

Outlier handling — the hardest call

An extreme observation could be:

A DATA ERROR (typo, unit conversion, instrument fault). Trace it back; correct or remove.
A GENUINE EXTREME VALUE (one of nature's right-tail draws). Keep it — your model needs to represent the population including its tail.
A POINT FROM A DIFFERENT POPULATION (a different facies, a different zone). Either model it as a mixture or exclude it from the current zone's variogram.

Always investigate before deciding. Removing 'outliers' thoughtlessly is one of the most common sources of bias in applied geostatistics.
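
Flagging and Winsorizing are mechanical once the decision is made; the judgement lies in the investigation. The sketch below flags candidates with Tukey's 1.5 × IQR fences and Winsorizes at chosen percentiles. The function names, the 5th/95th percentile limits, and the sample values are illustrative assumptions, not recommendations.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Indices of observations outside the Tukey fences (candidates only)."""
    lo, hi = tukey_fences(values, k)
    return [i for i, v in enumerate(values) if v < lo or v > hi]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given percentiles instead of deleting them."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical grades with one extreme observation.
grades = [1.1, 0.9, 1.4, 1.0, 1.3, 1.2, 0.8, 1.5, 1.0, 12.0]

suspects = flag_outliers(grades)
capped = winsorize(grades)
print("flagged indices:", suspects)
print("max before / after Winsorizing:", max(grades), max(capped))
```

A flag is a prompt to trace the sample back to its source, not a verdict. Whatever is decided, record the indices and the rule used so the choice can be audited later.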

Try it

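Gaussian scenario. Bubble map shows orderly spatial trend (increases left-to-right and bottom-to-top). Histogram is symmetric. Q-Q plot near the diagonal. Skewness near 0. Recommended action: proceed without a transform. The distribution is fine; the visible trend still has to be handled as described under Decisions from EDA.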
Skewed scenario. Bubble map still shows trend but with a few very-large bubbles. Histogram is right-skewed. Q-Q plot CURVES UP at the upper end. Skewness substantially > 1. Recommended: log-transform or N-score before variogram.
Outlier scenario. Bubble map has ONE giant bubble at a single location. Histogram has a far-right bar. Q-Q plot has the topmost point WAY off the line. Outlier-present flag is YES. Recommended: investigate the outlier.
Compare summary statistics across scenarios — note how MEAN and SD respond to outliers vs MEDIAN and IQR (which are robust).
Increase the number of samples n. EDA diagnostics become more reliable as n grows. At n = 30, distinguishing skewed from outlier-contaminated is hard; at n = 300 it's easy.
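
The contrast between classical and robust summaries can also be checked outside the interactive panel. The following sketch uses two made-up samples that differ in a single value; the helper name summarise and the numbers are assumptions for illustration.

```python
import statistics


def summarise(values):
    """Classical (mean, SD) and robust (median, IQR) summaries side by side."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }


# Hypothetical samples: identical except that the last value is replaced
# by a single extreme observation.
clean = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
contaminated = clean[:-1] + [100.0]

s_clean = summarise(clean)
s_cont = summarise(contaminated)
for key in ("mean", "sd", "median", "iqr"):
    print(f"{key:>6}: clean = {s_clean[key]:7.3f}   contaminated = {s_cont[key]:7.3f}")
```

One changed value out of ten moves the mean and inflates the SD dramatically, while the median and IQR barely shift.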

Your data has skewness 2.5 (strongly right-skewed) AND one observation that is 6 SDs above the mean. Should you treat this as one problem (the outlier's contribution to skewness) or two (genuine skew + a separate outlier)? What single plot best disentangles them?
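
One numerical aid for this question is to recompute the skewness with the largest observation set aside. The sketch below uses the moment-based skewness m₃ / m₂^(3/2) and two made-up samples; the function names and data are assumptions, and the comparison is a diagnostic hint rather than a decision rule.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_without_max(values):
    """Skewness after setting aside the single largest observation."""
    return skewness(sorted(values)[:-1])


# Hypothetical case 1: symmetric data plus one extreme value.
symmetric_plus_one = [8.0, 9.0, 10.0, 11.0, 12.0] * 4 + [40.0]
# Hypothetical case 2: lognormal-like data, skewed throughout the upper tail.
skewed_throughout = [math.exp(0.4 * k) for k in range(20)]

for name, data in [("symmetric + 1 extreme", symmetric_plus_one),
                   ("skewed throughout", skewed_throughout)]:
    print(f"{name:>22}: all = {skewness(data):5.2f}   "
          f"without max = {skewness_without_max(data):5.2f}")
```

If the skewness collapses towards zero once the top value is set aside, that single point was driving it; if it stays clearly positive, the skew belongs to the body of the distribution as well.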

What you now know — and Part 0 closes for both books

EDA is the first step of every spatial analysis. Bubble maps, histograms, Q-Q plots, boxplots, and moving-window profiles are the key diagnostics. EDA outcomes lead to direct decisions: transform skewed data (log or N-score), account for trends, investigate outliers, separate zones. Part 0 of the Geostatistics textbook is now COMPLETE: spatial data fundamentals (§0.1), stationarity (§0.2), support and scale (§0.3), coordinate systems (§0.4), sampling biases (§0.5), and EDA (§0.6). Part 1 builds on these foundations with univariate statistics for geo-data.
