Machine-learning QI: neural networks for facies classification

Part 8 — Advanced QI Topics

Learning objectives

Contrast rule-based (§7.5) vs data-driven (ML) facies classification
Recognize OVER-FITTING: sparse training data + flexible model = wrong decision regions
Understand the role of the training/validation/test split for honest accuracy estimates
Identify when ML adds value vs when simpler methods suffice
Apply practical safeguards: data augmentation, regularization, blind-well validation

The classifiers in §7.5 used a RULE-BASED approach: explicit Gaussian likelihoods per class + Bayes’ rule to get posteriors. That works beautifully when the class distributions are well-characterized. But what if the class boundaries are non-Gaussian, or the rock physics is too complex to write down? Machine-learning classifiers sidestep the explicit rules by LEARNING the boundaries directly from labeled training data.
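For reference, the rule-based posterior described above can be written compactly. The notation is assumed here for illustration and may differ from §7.5: x is the feature vector (for example Ip and Vp/Vs), and pi_c, mu_c, Sigma_c are the prior, mean, and covariance of class c.

```latex
P(c \mid \mathbf{x}) =
  \frac{\pi_c \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)}
       {\sum_{k} \pi_k \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)},
\qquad
\hat{c}(\mathbf{x}) = \arg\max_{c} P(c \mid \mathbf{x})
```

An ML classifier replaces the explicit Gaussian terms with a learned function, but its output can still be read as an estimate of the same posterior.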

ML in QI is already mainstream for facies classification, and rapidly expanding into seismic-to-property mapping (ML-based inversion), horizon auto-tracking, noise attenuation, and even salt-body picking. This section focuses on the SIMPLEST and most MATURE application: ML classification on (Ip, Vp/Vs) features. The lessons about training-data discipline apply to every ML workflow.

What ML classifiers learn

Given a training set of (features, labels) pairs — where features are elastic attributes (Ip, Vp/Vs, density, ...) and labels are class identifiers (shale, brine sand, oil sand, gas sand) — an ML classifier constructs a function f(features) → class label. The function is parameterized by weights that are learned by minimizing a training loss (typically cross-entropy for classification).
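To make "weights learned by minimizing a training loss" concrete, here is a minimal sketch of multinomial logistic regression trained by gradient descent on cross-entropy. Everything in it is an assumption for illustration: the three synthetic classes, their means in a standardized two-feature space, the learning rate, and the iteration count. It is not the classifier used by the course widget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, standardized two-feature data for 3 illustrative classes.
means = np.array([[1.0, 1.0], [0.0, -0.5], [-1.5, -1.0]])
X = np.vstack([m + 0.4 * rng.standard_normal((100, 2)) for m in means])
y = np.repeat(np.arange(3), 100)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, X, y):
    # Mean negative log-probability assigned to the true class.
    p = softmax(X @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

W = np.zeros((2, 3))   # one weight column per class
b = np.zeros(3)
onehot = np.eye(3)[y]
loss_start = cross_entropy(W, b, X, y)

for _ in range(500):
    p = softmax(X @ W + b)
    grad = (p - onehot) / len(y)   # gradient of the mean cross-entropy w.r.t. the logits
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

loss_end = cross_entropy(W, b, X, y)
train_acc = np.mean(np.argmax(X @ W + b, axis=1) == y)
print(f"loss {loss_start:.3f} -> {loss_end:.3f}, in-sample accuracy {train_acc:.2f}")
```

The accuracy printed here is measured on the training samples themselves, so it only shows that the optimizer works; it is not an honest performance estimate.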

Architecture options:

Linear (logistic regression): decision boundary is a hyperplane. Simple, robust, under-fits complex class shapes.
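k-Nearest Neighbors (kNN): vote among the k closest training samples. No training step beyond storing the samples (prediction is a lookup); boundaries follow the training data. Sensitive to noise and to feature scaling, since Ip and Vp/Vs can differ by orders of magnitude in raw units.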
Decision trees / random forests: axis-aligned splits. Robust, interpretable, good with tabular data.
Neural networks (MLP): smooth flexible boundaries. Can approximate almost any boundary shape given enough data, and for the same reason can over-fit when data are sparse.
Convolutional networks: for IMAGE or volume inputs (rarely used for tabular QI, more for seismic image segmentation).
Gradient-boosted trees (e.g., XGBoost): often the top performer on tabular data. Industry favorite.

Exercise — overfitting in action

Open in Bayesian truth mode. You see the same (Ip, Vp/Vs) crossplot from §7.5 with four class regions colored by Bayesian argmax. Smooth boundaries between classes. No training data shown — this is the analytical reference.
Switch to ML (well-trained) with N train = 80. The decision map looks VERY similar to Bayesian truth — almost indistinguishable. Small dots show the training samples scattered inside each class. This is what GOOD ML looks like: many balanced samples, smooth boundaries, high accuracy.
Slide N train down to 10. The boundaries start to wobble slightly as the sample count drops. Still close to truth but clearly affected by the specific random training sample locations.
Switch to ML (sparse training, over-fitting). The widget forces a small sample count (around 10 per class). Now the decision map is full of STRANGE REGIONS — small enclaves of one class surrounded by another, jagged boundaries, non-physical shapes. This is OVER-FITTING made visible.
Compare the two ML modes. Same architecture, same algorithm — just different training data density. Good ML looks like truth; poor ML looks like noise. This is the CENTRAL LESSON: ML quality is TRAINING DATA quality.
The training-sample dots show why: in the sparse mode, the bizarre decision regions latch onto INDIVIDUAL training samples. The classifier is memorizing the training set rather than learning the underlying structure.
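The memorization effect can be reproduced outside the widget with a small sketch. It swaps in a 1-nearest-neighbor classifier as a deliberately extreme "flexible model" (the widget's own model is not specified here), and uses two synthetic overlapping classes whose means and unit covariance are assumptions. Because the true distributions are known, the optimal Bayesian rule is available as a reference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two overlapping synthetic classes in a standardized 2-D feature space.
# Means and unit covariance are illustrative assumptions, not field values.
mu = np.array([[0.0, 0.0], [1.5, 0.0]])

def sample(n_per_class):
    X = np.vstack([m + rng.standard_normal((n_per_class, 2)) for m in mu])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def predict_1nn(X_train, y_train, X_query):
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

def predict_bayes(X_query):
    # With equal priors and equal isotropic covariances, the optimal rule
    # is "assign to the nearest true class mean".
    d2 = ((X_query[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

X_train, y_train = sample(10)     # sparse: 10 samples per class
X_test, y_test = sample(2000)     # large held-out set from the same distributions

acc_train = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
acc_test = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)
acc_bayes = np.mean(predict_bayes(X_test) == y_test)

print(f"1-NN training accuracy: {acc_train:.2f}")
print(f"1-NN held-out accuracy: {acc_test:.2f}")
print(f"Bayes reference:        {acc_bayes:.2f}")
```

The training accuracy is perfect by construction, because every training sample is its own nearest neighbor. The gap between that number and the held-out accuracy is the numerical counterpart of the strange enclaves in the decision map.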

The training / validation / test split

The #1 rule of ML in QI (and everywhere): NEVER evaluate your model on data it was trained on. You need three separate splits:

Training set: used to fit the model weights. Typically 60-80% of labeled data.
Validation set: used during training to tune HYPERPARAMETERS (learning rate, architecture size, regularization). Typically 10-20% of data.
Test set: NEVER TOUCHED until the final model is locked in. Used ONCE to report the model’s honest accuracy. Typically 10-20% of data.

For QI, the splits must respect SPATIAL and GEOLOGICAL boundaries. Random splitting at the sample level leaks information: adjacent samples from the same well have similar features. Instead split by WELL: some wells fully in training, others fully in test. This is the "blind well" validation from §7.5 — the standard for honest QI ML performance.
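A well-level split can be sketched in a few lines. The well names, the sample counts, and the helper split_by_well are hypothetical; the point is only that whole wells, not individual samples, are assigned to each set.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical labeled data set: 10 wells with 50 samples each.
well_ids = np.repeat([f"W{i:02d}" for i in range(10)], 50)

def split_by_well(well_ids, frac_val=0.2, frac_test=0.2, rng=rng):
    """Assign whole wells to train/val/test and return boolean sample masks."""
    wells = rng.permutation(np.unique(well_ids))
    n_test = max(1, round(frac_test * len(wells)))
    n_val = max(1, round(frac_val * len(wells)))
    test_wells = wells[:n_test]
    val_wells = wells[n_test:n_test + n_val]
    train_wells = wells[n_test + n_val:]
    return {
        "train": np.isin(well_ids, train_wells),
        "val": np.isin(well_ids, val_wells),
        "test": np.isin(well_ids, test_wells),
    }

masks = split_by_well(well_ids)
for name, mask in masks.items():
    print(name, sorted(set(well_ids[mask])), int(mask.sum()), "samples")
```

With real data the masks would index the feature and label arrays directly, for example X[masks["train"]]. Wells with very different sample counts will make the realized fractions deviate from the nominal ones, which is an acceptable price for avoiding leakage.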

Published QI ML papers that don’t report blind-well testing give little basis for practical deployment. Always ask: how was the data split? If the paper doesn’t say, don’t trust the reported accuracy.

When ML beats rule-based (and when it doesn't)

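ML wins: (1) Non-Gaussian class distributions: multi-modal, curved, or irregularly shaped clusters. (2) Many features (5+): a full-covariance Gaussian classifier must estimate d(d+1)/2 covariance terms per class in d dimensions, so it becomes unstable as features are added unless the training set grows with them, whereas regularized ML models often cope better. (3) Abundant training data (thousands of samples per class). (4) Complex interactions between features that rule-based models struggle to encode.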
Rule-based wins: (1) Simple, ellipsoidal class shapes (like our §7.5 example). (2) LIMITED training data. (3) When EXPLAINABILITY matters: rule-based classifiers can be traced through specific equations, while a black-box ML model is harder to defend in regulated environments. (4) When you have GOOD ROCK-PHYSICS understanding that you don’t want to relearn from data.
Hybrid wins (common in practice): use rule-based as the DEFAULT, then use ML to IMPROVE specific regions where rule-based struggles (e.g., non-Gaussian lobes, thin beds). Compare both outputs; trust regions where they agree, flag regions where they disagree.
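The "trust where they agree, flag where they disagree" step of the hybrid approach can be sketched as follows. The posterior values are made up for four samples, and the REVIEW sentinel is a hypothetical convention, not part of any standard workflow.

```python
import numpy as np

CLASSES = ["shale", "brine sand", "oil sand", "gas sand"]
REVIEW = -1  # hypothetical sentinel: the two classifiers disagree, needs review

# Made-up per-sample class posteriors from the two classifiers (rows sum to 1).
p_rule = np.array([[0.80, 0.10, 0.05, 0.05],
                   [0.10, 0.60, 0.20, 0.10],
                   [0.05, 0.15, 0.45, 0.35],
                   [0.70, 0.20, 0.05, 0.05]])
p_ml = np.array([[0.90, 0.05, 0.03, 0.02],
                 [0.05, 0.70, 0.15, 0.10],
                 [0.05, 0.10, 0.30, 0.55],
                 [0.60, 0.30, 0.05, 0.05]])

label_rule = p_rule.argmax(axis=1)
label_ml = p_ml.argmax(axis=1)

agree = label_rule == label_ml
combined = np.where(agree, label_rule, REVIEW)

for i, c in enumerate(combined):
    if c == REVIEW:
        print(f"sample {i}: REVIEW (rule-based={CLASSES[label_rule[i]]}, ML={CLASSES[label_ml[i]]})")
    else:
        print(f"sample {i}: {CLASSES[c]}")
```

Here the third sample is flagged because the rule-based classifier favors oil sand while the ML classifier favors gas sand. On a real volume the same comparison yields a disagreement map that can guide where to look more closely.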

Beyond classification: ML for inversion itself

The ML front has pushed beyond facies classification into SEISMIC INVERSION directly. Approaches:

ML-assisted inversion: use an ML regressor to learn the mapping from seismic waveforms (near, mid, far offset data) to elastic properties (Ip, Is, ρ). Skips the rock-physics + inversion pipeline. Fast. Unreliable when training data doesn’t cover the target rock regime.
Physics-informed neural networks (PINNs): neural networks with physics constraints (wave equation residual) as part of the loss. Combines ML flexibility with physical consistency. Emerging research, not yet mainstream.
Unsupervised anomaly detection: autoencoders that learn the "typical" seismic appearance. Regions with high reconstruction error are potential anomalies (bright spots, unusual features). Useful for rapid screening of large surveys.
Generative models: GANs and diffusion models that generate plausible reservoir realizations matching observed seismic. Explored for uncertainty quantification beyond what conventional stochastic inversion provides.
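The reconstruction-error idea behind the anomaly-detection item can be illustrated with a linear stand-in. The sketch below uses PCA (via SVD) in place of an autoencoder; a PCA projection behaves like a linear autoencoder but is not a neural network. The synthetic "typical" windows, the three-component bottleneck, and the 99th-percentile threshold are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic "typical" windows: 200 windows of 32 samples, each a random mix of
# three low-frequency waveforms plus a little noise (illustrative only).
t = np.linspace(0.0, 1.0, 32)
basis = np.stack([np.sin(2 * np.pi * f * t) for f in (1, 2, 3)])   # shape (3, 32)
typical = rng.standard_normal((200, 3)) @ basis + 0.05 * rng.standard_normal((200, 32))

# "Train": keep the mean and the top-3 principal directions.
mean = typical.mean(axis=0)
_, _, Vt = np.linalg.svd(typical - mean, full_matrices=False)
components = Vt[:3]                                                # shape (3, 32)

def reconstruction_error(X):
    Z = (X - mean) @ components.T      # encode into the 3-D bottleneck
    X_hat = Z @ components + mean      # decode back to 32 samples
    return np.sqrt(np.mean((X - X_hat) ** 2, axis=1))   # RMS error per window

# Flag anything reconstructed worse than 99% of the training windows.
threshold = np.percentile(reconstruction_error(typical), 99)

normal_window = np.array([0.5, -1.0, 0.3]) @ basis
odd_window = normal_window + np.sin(2 * np.pi * 9 * t)   # high-frequency burst never seen in training
errors = reconstruction_error(np.stack([normal_window, odd_window]))
flags = errors > threshold
print("RMS errors:", errors.round(3), "flagged:", flags)
```

The second window is flagged because its high-frequency component cannot be expressed in the learned bottleneck. The same caveat applies as for supervised models: "anomalous" only means "unlike the training data", which may or may not be geologically interesting.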

Most of these are in the RESEARCH-to-PILOT transition zone as of 2025. They will probably be mainstream in 5-10 years. The lessons from classification carry over: training data discipline matters more than architecture sophistication.

Practical ML-QI deployment checklist

Training data audit: how many samples are there, are they balanced across classes, and do they come from diverse geological settings? (More than 100 samples per class is a practical minimum.)
Blind-well validation: measure test accuracy on wells NOT used in training. Report this as THE accuracy, not the in-sample training accuracy.
Regularization: L2 weight decay, dropout, early stopping to prevent over-fitting.
Feature engineering: don’t just throw raw elastic attributes at the network. Include derived features (ratios, differences, log-space coordinates) that make the class boundaries more separable.
Uncertainty quantification: use Monte Carlo dropout, ensemble methods, or Bayesian neural networks to get per-voxel confidence estimates. Don’t ship hard classifications without uncertainty.
Post-deployment monitoring: every time a new well is drilled, compare predicted vs observed facies. Track accuracy over time. Retrain annually or when drift exceeds a threshold.
Explainability: for high-stakes decisions (drill go/no-go), the ML prediction should come with FEATURE IMPORTANCE maps showing which elastic attributes drove the prediction. Options include SHAP, LIME, or simpler gradient-based methods.
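As one illustration of the uncertainty-quantification item, the sketch below builds a bootstrap ensemble and converts its vote fractions into a per-sample entropy. It swaps in a nearest-centroid classifier as the ensemble member to stay short; a production workflow would more likely ensemble neural networks or use Monte Carlo dropout. The synthetic two-class data and the ensemble size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic two-class training set in a standardized feature space (illustrative values).
mu = np.array([[-1.0, 0.0], [1.0, 0.0]])
X = np.vstack([m + 0.8 * rng.standard_normal((60, 2)) for m in mu])
y = np.repeat([0, 1], 60)

def fit_centroids(X, y, n_classes=2):
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, Xq):
    d2 = ((Xq[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def ensemble_vote_fractions(X, y, Xq, n_members=200, n_classes=2):
    votes = np.zeros((len(Xq), n_classes))
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap resample of the training set
        labels = predict(fit_centroids(X[idx], y[idx], n_classes), Xq)
        votes[np.arange(len(Xq)), labels] += 1
    return votes / n_members

def entropy(p):
    safe = np.where(p > 0, p, 1.0)    # log(1) = 0, so zero-probability terms drop out
    return -np.sum(p * np.log(safe), axis=1)   # in nats; maximum is ln(2) for two classes

queries = np.vstack([[-2.0, 0.0],                        # deep inside class 0
                     fit_centroids(X, y).mean(axis=0)])  # on the fitted decision boundary
p = ensemble_vote_fractions(X, y, queries)
H = entropy(p)
print("vote fractions:", p.round(2))
print("entropy (nats):", H.round(2))
```

The first query gets a unanimous vote and near-zero entropy, while the boundary query splits the ensemble and approaches the maximum entropy. Mapped over a volume, this is the kind of per-voxel confidence that should accompany any hard classification.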

ML is a powerful tool for QI but not a magic wand. The same disciplines that govern good rule-based QI (well calibration, physical plausibility, uncertainty tracking, blind validation) apply with even more force to ML. A well-validated ML classifier can approach the analytical Bayesian optimum (§7.5) with minimal assumptions about class distributions; a poorly validated one produces confident predictions that are systematically wrong.

§8.6 closes Part 8 with the fastest-growing application of QI today: CO2 storage and subsurface monitoring. It combines everything (rock physics, inversion, probabilistic facies, 4D, and ML) into a single life-cycle monitoring program.
