Why ML for Geoscience? Problem Framing and Inference

Part 1, Chapter 1: Welcome, ML for Earth and Subsurface Data

Learning objectives

Understand the role of ML in modern geosciences
Define inverse problems and forward models
Distinguish supervised vs unsupervised learning
Identify key ML applications in geoscience

What Is Machine Learning?

Imagine you are a geologist examining thousands of thin-section images of rocks. After seeing enough examples, you develop an intuition: "This pattern of mineral grains looks like sandstone; that one looks like limestone." Machine Learning (ML) is the science of giving computers a similar ability: learning patterns from data without being explicitly programmed with a fixed set of rules.

More formally, a computer program is said to learn from experience $E$ with respect to some task $T$ and performance measure $P$ if its performance on $T$, as measured by $P$, improves with experience $E$. This is Tom Mitchell's classic definition (1997).

Key Idea

Traditional programming: you give the computer rules and data, and it produces answers.
Machine learning: you give the computer data and answers, and it learns the rules.
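
The contrast can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the gamma-ray readings, the labels, the hand-picked cutoff of 75, and the function names are all invented here. The first function encodes a rule chosen by a person; the second picks the rule from labeled examples.

```python
# Hypothetical gamma-ray readings (API units) with lithology labels.
# All numbers are invented for illustration.
samples = [(35.0, "sandstone"), (48.0, "sandstone"), (60.0, "sandstone"),
           (95.0, "shale"), (110.0, "shale"), (130.0, "shale")]

def classify_with_rule(gamma_ray, cutoff=75.0):
    """Traditional programming: a human supplies the rule (the cutoff)."""
    return "shale" if gamma_ray > cutoff else "sandstone"

def learn_cutoff(labeled_samples):
    """Machine learning in miniature: choose the cutoff from data and answers.

    Tries the midpoint between each pair of neighbouring readings and keeps
    the one that misclassifies the fewest training samples.
    """
    values = sorted(g for g, _ in labeled_samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum(classify_with_rule(g, cutoff) != label
                   for g, label in labeled_samples)

    return min(candidates, key=errors)

learned_cutoff = learn_cutoff(samples)
print(learned_cutoff)                             # cutoff chosen from the data
print(classify_with_rule(100.0, learned_cutoff))  # prediction for a new reading
```

Real models learn far richer rules than a single threshold, but the division of labor is the same: the data and answers go in, and the rule comes out.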

Why Machine Learning in Geosciences?

Geosciences generate enormous volumes of complex data: seismic surveys that span terabytes, satellite images covering the entire planet daily, well-log measurements from thousands of boreholes, and real-time sensor data from earthquake monitoring networks. Traditional analysis methods struggle with this scale and complexity. Machine learning thrives on it.

Here are some of the most impactful applications of ML in geoscience:

Seismic Interpretation: Automatically picking first arrivals, identifying faults, and classifying seismic facies from reflection data. Neural networks can process 3D seismic volumes that would take human interpreters months.
Well-Log Analysis: Predicting lithology (rock type), porosity, and permeability from geophysical well logs (gamma ray, resistivity, sonic, density). ML models can correlate multiple logs simultaneously.
Mineral Exploration: Identifying prospective mineral deposits from geochemical, geophysical, and remote sensing data. ML can find subtle multi-dimensional patterns that humans miss.
Climate Modeling: Improving predictions of temperature, precipitation, and extreme weather events. ML can learn complex nonlinear relationships in climate data.
Earthquake Prediction & Seismology: Detecting seismic events, classifying earthquake types, and estimating ground-motion parameters. Deep learning has revolutionized earthquake catalog generation.
Remote Sensing: Land-cover classification, change detection, and geological mapping from satellite or drone imagery.
Reservoir Modeling: Estimating subsurface properties, predicting fluid flow, and optimizing well placement in oil and gas or geothermal reservoirs.

Forward Models and Inverse Problems

Much of geoscience revolves around a fundamental challenge: we can observe data at the surface (seismic waves, gravity anomalies, magnetic fields), but what we really want to know is the model of the subsurface (rock types, layer thicknesses, density distributions). Recovering that model from the data is the inverse problem.

The Forward Problem

Given a model $m$ of the Earth (layer properties, velocities, densities), predict the data $d$ we would observe:

$d = G(m)$

where $G$ is the forward operator (the physics that maps model to data). For a linear system, this simplifies to:

$d = G \cdot m$

where $G$ is a matrix. Forward problems are generally well-posed: given the model, we can compute unique, stable data.
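
As a minimal sketch of the linear case, consider vertical travel times through a stack of flat layers. The setup is an assumption made for illustration: the model $m$ holds layer slownesses (1/velocity), each row of $G$ holds the path length a ray travels in each layer, and the data $d$ are one-way travel times. The velocities, thicknesses, and the function name `forward` are invented.

```python
# Model m: slowness (s/m) of three layers; values are illustrative assumptions.
velocities = [2000.0, 2500.0, 4000.0]          # m/s
m = [1.0 / v for v in velocities]              # slowness = 1 / velocity

# Forward operator G: row i holds the path length (m) that ray i
# travels in each layer on its way down to receiver i.
G = [
    [500.0,   0.0,    0.0],   # receiver at 500 m depth
    [500.0, 500.0,    0.0],   # receiver at 1000 m depth
    [500.0, 500.0, 1000.0],   # receiver at 2000 m depth
]

def forward(G, m):
    """Compute d = G m as a plain matrix-vector product."""
    return [sum(g_ij * m_j for g_ij, m_j in zip(row, m)) for row in G]

d = forward(G, m)   # predicted one-way travel times in seconds
print(d)
```

Given the model, the data follow from a single multiplication, which is what makes the forward direction the easy one.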

The Inverse Problem

Given observed data $d$, find the model $m$ that produced it:

$m = G^{-1}(d)$

Inverse problems are typically ill-posed: the solution may not exist, may not be unique, or may be unstable (small data errors cause large model changes), so $G^{-1}$ is often a formal notation, not an operator we can actually apply. This is where ML can help: it can learn an approximate mapping from data to model from many examples.

Example: In seismic exploration, the forward problem is computing a seismogram from a velocity model (using wave equations). The inverse problem is determining the velocity model from recorded seismograms.
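
The instability mentioned above can be seen in a two-parameter toy problem. The sketch below is an assumption-laden illustration, not a seismic inversion: the matrix, the true model, and the size of the data perturbation are invented, and `solve_2x2` is a hypothetical helper that inverts the system exactly with Cramer's rule.

```python
def solve_2x2(G, d):
    """Solve G m = d for a 2x2 system using Cramer's rule."""
    (a, b), (c, e) = G
    det = a * e - b * c
    return [(d[0] * e - b * d[1]) / det, (a * d[1] - d[0] * c) / det]

# Two nearly identical measurements: the rows of G are almost parallel.
G = [[1.0, 1.0],
     [1.0, 1.001]]

m_true = [2.0, 3.0]
d_clean = [G[0][0] * m_true[0] + G[0][1] * m_true[1],
           G[1][0] * m_true[0] + G[1][1] * m_true[1]]

# Perturb the second datum by 0.01, about 0.2 % of its value.
d_noisy = [d_clean[0], d_clean[1] + 0.01]

m_from_clean = solve_2x2(G, d_clean)
m_from_noisy = solve_2x2(G, d_noisy)
print(m_from_clean)   # close to the true model [2, 3]
print(m_from_noisy)   # roughly [-8, 13]: a tiny data error, a very different model
```

Because the two measurements carry almost the same information, the data error is amplified about a thousand-fold in the recovered model.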

Gravity makes this concrete. Forward: place a buried excess mass at some depth and compute the surface anomaly $\Delta g(x)$; one model gives one data curve, uniquely. Inverse: fit the mass at each trial depth and sweep the depth; a whole band of depths reproduces the same anomaly within its noise, while the required mass changes several-fold. The data alone cannot pin down the body; that ambiguity is exactly why inversion (classical or ML) leans on priors.
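
The depth and mass trade-off can be explored numerically with a point-mass (or, equivalently, buried-sphere) anomaly, $\Delta g(x) = G_N M z / (x^2 + z^2)^{3/2}$, where $G_N$ is the gravitational constant, $M$ the excess mass, and $z$ the depth. The sketch below is a minimal illustration under assumptions: the profile geometry, the true mass and depth, the trial depths, and all function names are invented, and the synthetic data are noise-free.

```python
import math

G_NEWTON = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def kernel(x, depth):
    """Vertical gravity anomaly at offset x per unit mass of a point source."""
    return G_NEWTON * depth / (x * x + depth * depth) ** 1.5

def forward(mass, depth, xs):
    """Forward problem: one (mass, depth) model gives exactly one anomaly curve."""
    return [mass * kernel(x, depth) for x in xs]

def best_mass(observed, depth, xs):
    """Least-squares mass for a fixed trial depth (the problem is linear in mass)."""
    k = [kernel(x, depth) for x in xs]
    return sum(o * ki for o, ki in zip(observed, k)) / sum(ki * ki for ki in k)

def rms(a, b):
    """Root-mean-square difference between two curves."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

# Hypothetical survey: stations every 100 m along a 4 km profile.
xs = [-2000.0 + 100.0 * i for i in range(41)]
true_mass, true_depth = 1.0e10, 1000.0           # kg, m (illustrative values)
observed = forward(true_mass, true_depth, xs)    # noise-free synthetic data

results = {}
for trial_depth in (800.0, 1000.0, 1200.0):
    mass = best_mass(observed, trial_depth, xs)
    misfit = rms(observed, forward(mass, trial_depth, xs))
    results[trial_depth] = (mass, misfit)
    # Misfit is reported as a fraction of the peak anomaly.
    print(trial_depth, f"{mass:.3e} kg", f"{misfit / max(observed):.3f}")
```

Each trial depth gets its own best-fitting mass, and deeper trial bodies need more mass. Whether a given depth counts as an acceptable fit depends on the noise level you assume for the survey: any trial whose misfit falls below that level cannot be ruled out by the data alone.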

Brief Introduction to Geostatistics

Before ML became dominant, geoscientists used geostatistics to make predictions about spatial data. Two key concepts:

Variogram

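A variogram measures how the dissimilarity between measurements grows with their separation distance or, equivalently, how spatial correlation decays. If two measurements are close together, they tend to be similar, so the semivariance is small at short lags. The semivariance at lag $h$ is: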

$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [z(x_i) - z(x_i + h)]^2$

where $z(x_i)$ is the measurement at location $x_i$ and $N(h)$ is the number of pairs separated by distance $h$.
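
The formula translates directly into code for samples on a regular one-dimensional transect. This is a minimal sketch: the porosity values, the 10 m spacing, and the function name `semivariance` are assumptions made for illustration, and the lag is expressed in grid steps.

```python
def semivariance(z, lag):
    """Empirical semivariance for samples on a regular 1D grid.

    `lag` is the separation in grid steps, so the pairs are (z[i], z[i + lag]).
    """
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Hypothetical porosity values (%) sampled every 10 m along a transect.
porosity = [12.0, 13.0, 15.0, 14.0, 18.0, 21.0, 20.0, 24.0]

for lag in (1, 2, 3):
    print(lag * 10, "m:", semivariance(porosity, lag))
```

For this invented transect the semivariance rises with lag (about 3.43, 6.75, and 13.7), which is the signature of nearby samples being more alike than distant ones. Irregularly spaced field data would need pairs binned by distance instead of exact grid lags.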

Kriging

Kriging is a spatial interpolation method that uses the variogram to compute the best linear unbiased prediction at unmeasured locations. It gives both an estimate and an uncertainty (the kriging variance). ML methods like Gaussian Process Regression are closely related to kriging.
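
A small ordinary-kriging example shows where the estimate and the kriging variance come from. It is a sketch under assumptions: the exponential variogram model with unit sill and a 50 m length scale, the three sample locations and values, and the helper names `solve`, `variogram`, and `ordinary_kriging` are all chosen here for illustration. The weights $\lambda_i$ and the Lagrange multiplier $\mu$ solve $\sum_j \lambda_j \gamma(x_i - x_j) + \mu = \gamma(x_i - x_0)$ subject to $\sum_j \lambda_j = 1$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def variogram(h, sill=1.0, length=50.0):
    """Exponential variogram model without nugget (an assumed model)."""
    return sill * (1.0 - math.exp(-abs(h) / length))

def ordinary_kriging(xs, zs, x0):
    """Return (estimate, kriging variance) at location x0."""
    n = len(xs)
    # Kriging matrix: variogram between data points, bordered by the
    # unbiasedness constraint (weights sum to one).
    A = [[variogram(xi - xj) for xj in xs] + [1.0] for xi in xs]
    A.append([1.0] * n + [0.0])
    b = [variogram(xi - x0) for xi in xs] + [1.0]
    solution = solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, zs))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance

xs = [0.0, 40.0, 100.0]    # hypothetical sample locations (m)
zs = [10.0, 14.0, 11.0]    # hypothetical measurements
estimate, variance = ordinary_kriging(xs, zs, 20.0)
print(estimate, variance)
```

With no nugget, the estimate at a sampled location reproduces the measurement and the variance there drops to zero; away from the samples the variance grows, which is the uncertainty map kriging is valued for.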

Geostatistics and ML are complementary. Geostatistics excels when we have strong spatial structure and limited data; ML excels when we have abundant data and complex, nonlinear patterns.

Overview of Neural Networks (Preview)

A neural network is a computational model loosely inspired by the brain. It consists of layers of interconnected "neurons" (nodes) that transform input data into output predictions. We will study neural networks in detail in later chapters, but here is a high-level preview:

Input layer: Receives the raw features (e.g., seismic amplitudes, well-log values).
Hidden layers: Apply weighted sums and nonlinear activation functions to learn complex patterns.
Output layer: Produces the prediction (e.g., rock type, porosity value).

A single neuron computes:

$y = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right)$

where $x_i$ are inputs, $w_i$ are learned weights, $b$ is a bias, and $f$ is a nonlinear activation function (e.g., ReLU, sigmoid).
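
The single-neuron computation is short enough to write out directly. In this minimal sketch the inputs, weights, and bias are arbitrary illustrative numbers, and `neuron`, `relu`, and `sigmoid` are names chosen here; in a trained network the weights and bias would come from the learning procedure.

```python
import math

def relu(a):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, a)

def sigmoid(a):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron(x, w, b, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)

x = [0.5, -1.0, 2.0]      # illustrative inputs
w = [0.8, 0.2, -0.5]      # illustrative weights
b = 0.1                   # illustrative bias

print(neuron(x, w, b, relu))      # weighted sum is about -0.7, so ReLU gives 0.0
print(neuron(x, w, b, sigmoid))   # the same sum mapped into (0, 1)
```

The same weighted sum produces different outputs depending on the activation, which is why the choice of $f$ matters when networks are stacked into layers.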

Types of Machine Learning

Supervised Learning

The algorithm learns from labeled data, that is, input-output pairs. The goal is to learn a function that maps inputs to outputs.

Classification: Predict a discrete category. Example: Given well-log features, classify the rock as sandstone, shale, or limestone.
Regression: Predict a continuous value. Example: Given seismic attributes, predict porosity (a percentage).
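
The regression case can be sketched with the simplest possible model, a least-squares straight line. Everything specific here is an assumption for illustration: the choice of acoustic impedance as the seismic attribute, the five training pairs, and the function name `fit_line`.

```python
# Hypothetical labeled pairs: acoustic impedance (10^6 kg m^-2 s^-1) as input,
# porosity (%) as the known output. Values are invented.
impedance = [6.0, 7.0, 8.0, 9.0, 10.0]
porosity = [28.0, 24.5, 20.0, 16.5, 12.0]

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x

# "Training": learn the input-output mapping from the labeled pairs.
slope, intercept = fit_line(impedance, porosity)

# "Inference": predict porosity for an impedance value not seen in training.
predicted = slope * 8.5 + intercept
print(slope, intercept, predicted)
```

For these invented numbers the fit is a slope of about -4.0 and an intercept of about 52.2, giving a predicted porosity near 18.2 % at an impedance of 8.5. A classifier follows the same pattern, with a category as the output in place of a number.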

Unsupervised Learning

The algorithm learns from unlabeled data, that is, inputs only, with no desired outputs. The goal is to discover hidden structure.

Clustering: Group similar data points. Example: Cluster seismic traces into distinct facies groups.
Dimensionality Reduction: Reduce the number of features while preserving important information. Example: PCA on multi-attribute seismic data.
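
Clustering can be illustrated with k-means (Lloyd's algorithm) on a single attribute. This is a minimal sketch under assumptions: the attribute values, the number of clusters, the starting centers, and the function name `kmeans_1d` are invented, and real facies clustering would use many attributes per trace.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm in one dimension with user-supplied starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its old center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
              for v in values]
    return centers, labels

# Hypothetical, unlabeled trace attribute (e.g., RMS amplitude); invented values.
amplitudes = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0, 9.1, 8.9]
centers, labels = kmeans_1d(amplitudes, centers=[0.0, 5.0, 10.0])
print(centers)
print(labels)
```

No labels were supplied, yet the traces fall into three groups with centers near 1, 5, and 9. Deciding what those groups mean geologically is still the interpreter's job.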

Reinforcement Learning

An agent learns to make decisions by interacting with an environment and receiving rewards or penalties. Example: Optimizing drilling trajectory in real time. (Less common in geosciences but growing.)

Semi-Supervised & Self-Supervised Learning

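Semi-supervised learning is a middle ground: it combines a small amount of labeled data with a large amount of unlabeled data. Self-supervised learning goes one step further and derives its training signal from the unlabeled data itself, for example by predicting withheld parts of the input. Both are very relevant in geoscience, where labeling (e.g., core analysis) is expensive but unlabeled data (e.g., seismic) is abundant.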
