Full-waveform inversion: matching observed to modeled waveforms

Part 8 — Advanced QI Topics

Learning objectives

Explain what FWI solves: misfit minimization with full wave-equation modeling and iterative gradient updates
Recognize the role of the STARTING MODEL in FWI success vs failure
Understand CYCLE SKIPPING: the dominant FWI failure mode
Apply MULTI-SCALE strategy: low-frequency first, progressively refine
Know when FWI is worth running vs when simpler inversion methods (§7.3) suffice

Until §8.4, every inversion in this textbook (§7.3, §8.2) has relied on the CONVOLUTIONAL MODEL: seismic = reflectivity * wavelet. That model is a useful simplification, but it ignores much of the physics of real wave propagation: refraction, diffraction, multiples and other wavefield complexity, attenuation, anisotropy. Full-waveform inversion (FWI) drops the convolutional approximation and fits the observed seismic waveforms with simulations of the FULL WAVE EQUATION. The payoff: HIGH-RESOLUTION velocity models that capture geological structure at much shorter wavelengths than conventional traveltime tomography.

FWI has become standard practice in salt basins (Gulf of Mexico, Brazil pre-salt, West Africa) and complex structural provinces (foothills, sub-thrust), and it is increasingly a routine component of major 3D processing projects. The cost is compute: modern FWI runs on cluster-scale hardware, and a single field-scale project can consume tens of thousands of GPU-hours. The reward is a velocity model that shows features, such as narrow salt canyons, thin shale layers, and fluid-filled fault zones, that conventional velocity-model building cannot resolve.

What FWI solves

FWI is a LEAST-SQUARES OPTIMIZATION problem. Given:

Observed seismic data d_obs (all traces, all times, all offsets)
Source wavelet estimate
A starting velocity model V₀

FWI finds the velocity model V that MINIMIZES the misfit between observed and simulated data:

$$\mathcal{E}(V) = \tfrac{1}{2} \sum_{s,r,t} \bigl(d_\text{obs}(s,r,t) - d_\text{sim}(V; s,r,t)\bigr)^2$$

where $d_\text{sim}(V)$ is computed by solving the wave equation (full finite-difference or finite-element simulation) for the current velocity model V. The sum is over all sources s, all receivers r, all time samples t.
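To make $d_\text{sim}(V)$ and the misfit concrete, here is a minimal sketch reduced to one dimension. It assumes a constant-density acoustic wave equation solved with second-order finite differences, a Ricker source wavelet, a co-located source and receiver near the top of the grid, and p = 0 at both grid ends. The function names, grid and two-layer model are all illustrative assumptions, not a production modeling engine.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz) centred at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def simulate_trace(v, dz, dt, nt, f0=10.0, isrc=2, irec=2):
    """Hypothetical 1D stand-in for d_sim(V): constant-density acoustic
    finite differences (2nd order in time and space). Returns the pressure
    trace recorded at grid index irec."""
    nz = v.size
    assert v.max() * dt / dz <= 1.0, "CFL stability condition violated"
    src = ricker(np.arange(nt) * dt, f0, t0=1.2 / f0)
    c2 = (v * dt / dz) ** 2
    p_prev, p_curr = np.zeros(nz), np.zeros(nz)
    trace = np.empty(nt)
    for it in range(nt):
        lap = np.zeros(nz)
        lap[1:-1] = p_curr[2:] - 2.0 * p_curr[1:-1] + p_curr[:-2]
        p_next = 2.0 * p_curr - p_prev + c2 * lap
        p_next[isrc] += dt ** 2 * src[it] / dz   # point source (discrete delta)
        p_next[0] = p_next[-1] = 0.0             # p = 0 at both grid ends
        p_prev, p_curr = p_curr, p_next
        trace[it] = p_curr[irec]
    return trace

def misfit(d_obs, d_sim):
    """Least-squares misfit E = 1/2 * sum over samples of (d_obs - d_sim)^2."""
    r = d_obs - d_sim
    return 0.5 * float(np.sum(r * r))

# Hypothetical two-layer "true" model and a starting model that is too slow below 800 m
dz, dt, nt = 5.0, 0.001, 1500
z = np.arange(400) * dz
v_true = np.where(z < 800.0, 2000.0, 3000.0)
v_start = np.where(z < 800.0, 2000.0, 2800.0)
d_obs = simulate_trace(v_true, dz, dt, nt)
print(misfit(d_obs, simulate_trace(v_true, dz, dt, nt)))   # prints 0.0
print(misfit(d_obs, simulate_trace(v_start, dz, dt, nt)))  # positive: the reflection differs
```

Field-scale FWI replaces this 1D loop with 3D finite-difference or finite-element solvers run once per source, which is where the cost described in the next paragraph comes from.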

The problem is massive: at field scale there are thousands or more sources, tens of thousands of receivers, and thousands of time samples per trace. d_obs is a multi-terabyte dataset; forward modeling is expensive (every iteration simulates the wavefield for every source, plus an adjoint simulation to compute the gradient); and the velocity model has millions to billions of cells. FWI is among the most computationally demanding inverse problems in applied geophysics.
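The scale can be checked with back-of-envelope arithmetic. The sketch below assumes a hypothetical survey (10,000 shots, 8,000 live channels per shot, 4,000 samples per trace stored as 32-bit floats) and the adjoint-state gradient, which needs one forward and one adjoint wave-equation solve per source; the function name is illustrative.

```python
def fwi_scale(n_sources, channels_per_shot, n_samples, bytes_per_sample=4):
    """Back-of-envelope size of d_obs (bytes) and the number of wave-equation
    solves for one gradient (adjoint-state method: one forward plus one
    adjoint simulation per source; line-search trial models add more
    forward runs, which are not counted here)."""
    data_bytes = n_sources * channels_per_shot * n_samples * bytes_per_sample
    solves_per_gradient = 2 * n_sources
    return data_bytes, solves_per_gradient

# Hypothetical survey: 10,000 shots, 8,000 live channels per shot,
# 4,000 samples per trace (8 s at 2 ms), 32-bit floats
data_bytes, solves = fwi_scale(10_000, 8_000, 4_000)
print(f"{data_bytes / 1e12:.2f} TB of data, {solves} wave-equation solves per gradient")
# prints: 1.28 TB of data, 20000 wave-equation solves per gradient
```

Every one of those solves is a full 3D simulation, and an FWI project needs many gradients per frequency band.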

Exercise — walk the iteration slider through three scenarios

Open in Good starting model mode, iteration = 0. You see: TRUE profile (thick yellow step-curve), STARTING profile (grey dashed, a smoothed version of the truth), CURRENT iteration (magenta, currently identical to the starting profile). The starting model is close to the truth but lacks the sharp layer boundaries.
Slide iteration up to 4. The current profile (now trending toward green) is noticeably closer to the yellow truth. Misfit curve below shows the misfit dropping.
Iteration 10: current profile (bright green) almost matches truth. Layer boundaries are sharp. Misfit has dropped by ∼85%. This is SUCCESSFUL convergence.
Iteration 20: fully converged; current is indistinguishable from truth. Misfit is near zero. This is what a GOOD FWI project produces.
Switch to Bad starting model. Starting profile (grey dashed) is now a LINEAR GRADIENT — very different from the layered truth. Iteration 0: current matches starting. Iteration 4: the profile begins moving but imperfectly. Iteration 10: the profile has STALLED at a wrong intermediate state. Iteration 20: unchanged. This is CYCLE SKIPPING.
Look at the misfit curve for the bad mode: drops initially, then LEVELS OFF at a wrong, non-zero value. The inversion THINKS it has converged (misfit stopped decreasing) but to a WRONG answer. This is why FWI results must always be validated against wells.
Switch to Multi-scale FWI. Starting model is the same bad linear gradient. Iterations 0-5: the current profile moves toward a SMOOTH version of truth (low-frequency FWI recovers long-wavelength structure). Iterations 6+: the profile sharpens into the true layer boundaries (high-frequency FWI adds detail). By iteration 20, converged. This is how multi-scale RESCUES a bad starting model.
Key lesson from flipping between Bad starting model and Multi-scale FWI: the only difference is the FREQUENCY STRATEGY. Both start from the same bad model, but multi-scale succeeds where single-scale fails. This is why modern broadband acquisition (down to 3-5 Hz) matters so much: it enables robust multi-scale FWI.

Cycle skipping: the FWI failure mode

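FWI uses GRADIENT-BASED local optimization: at each iteration, compute the gradient of the misfit with respect to the velocity model (in practice with the adjoint-state method) and take a step DOWNHILL. Local methods such as steepest descent, nonlinear conjugate gradient, and L-BFGS converge to a local minimum near the starting model, not necessarily the global (true) minimum.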

When does the nearest local minimum coincide with the global minimum? When the simulated waveforms are within HALF A PERIOD (half a cycle) of the observed waveforms at the dominant frequency. This is the "capture zone" for gradient-based FWI. If your starting model predicts arrivals shifted by MORE than half a period, you are OUTSIDE the capture zone: the misfit gradient pulls each simulated cycle toward the WRONG observed cycle, and FWI converges to the wrong local minimum. That is cycle skipping.
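Cycle skipping can be reproduced with a one-parameter toy, sketched below under deliberately simplified assumptions: the "model" is only the arrival time tau of a single 10 Hz Ricker wavelet, the misfit is the least-squares objective defined earlier, and Gauss-Newton updates stand in for the gradient-based solvers used in real FWI. The function name and all parameters are hypothetical.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz) centred at t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def invert_arrival_time(tau0, tau_true=1.0, f0=10.0, dt=0.001, n_iter=100, h=1e-5):
    """Toy one-parameter least-squares inversion: recover the arrival time
    tau of a single Ricker wavelet with Gauss-Newton updates."""
    t = np.arange(0.0, 2.0, dt)
    d_obs = ricker(t, f0, tau_true)
    tau = tau0
    for _ in range(n_iter):
        r = d_obs - ricker(t, f0, tau)          # residual d_obs - d_sim
        # J = dr/dtau = -d(d_sim)/dtau, by a central finite difference
        J = -(ricker(t, f0, tau + h) - ricker(t, f0, tau - h)) / (2.0 * h)
        g = np.sum(r * J)                       # dE/dtau for E = 0.5 * sum(r**2)
        H = np.sum(J * J)                       # Gauss-Newton Hessian, always > 0
        tau -= g / H                            # local, gradient-based update
    return tau

for err0 in (0.03, 0.07):   # starting errors inside / outside T/2 = 0.05 s at 10 Hz
    tau = invert_arrival_time(1.0 + err0)
    print(f"start error {err0:.2f} s -> final error {tau - 1.0:+.4f} s")
```

Under these assumptions the start 0.03 s off converges to the true arrival time, while the start 0.07 s off stalls at a side-lobe minimum about one period (roughly 0.09 s) late: the simulated wavelet has locked onto the wrong cycle, and the misfit gradient there is zero.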

Practically: at f = 10 Hz with Vp = 3000 m/s, the wavelength is 300 m and the half-wavelength 150 m, which the wave crosses in half a period: 150/3000 = 0.05 s = 1/(2 × 10 Hz). If the traveltime error in your starting model exceeds 0.05 s, FWI is likely to cycle-skip. You need a DECENT starting velocity model.
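The arithmetic above can be wrapped in a few helper functions. The velocity-error tolerance is an added first-order estimate, assuming a uniform relative velocity error along the whole path so that |Δt/t| ≈ |Δv/v| for t = 2z/v; the function names are hypothetical.

```python
def half_period(f_hz):
    """Largest traveltime error (s) inside the capture zone: T/2 = 1 / (2 f)."""
    return 1.0 / (2.0 * f_hz)

def half_wavelength(vp, f_hz):
    """Half a wavelength (m): lambda / 2 = vp / (2 f)."""
    return vp / (2.0 * f_hz)

def max_velocity_error(f_hz, twt_s):
    """First-order tolerance on a uniform relative velocity error for an
    event at two-way time twt_s, using |dt/t| ~ |dv/v| (t = 2z/v)."""
    return half_period(f_hz) / twt_s

for f in (3.0, 5.0, 10.0):
    print(f"{f:4.0f} Hz: lambda/2 = {half_wavelength(3000.0, f):5.0f} m, "
          f"T/2 = {half_period(f):.3f} s, "
          f"|dv/v| < {100 * max_velocity_error(f, 2.0):.1f}% at 2 s TWT")
```

At 10 Hz and 3000 m/s this reproduces the 150 m half-wavelength and 0.05 s half period; for an event at 2 s two-way time, the uniform velocity error must then stay below about 2.5%, while at 3 Hz the tolerance grows to about 8%.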

Sources of the starting model: (1) traditional NMO-based velocity analysis on CMP gathers (moderate quality, but often adequate for shallow sections); (2) reflection tomography (a detailed smooth model); (3) regional geologic models (long-wavelength constraints); (4) interpolated well velocities; (5) a previous FWI cycle plus manual editing. Modern FWI workflows can spend 30-50% of the project time preparing the starting model.

Multi-scale FWI: the industry standard defense

Cycle skipping is a FREQUENCY-DEPENDENT problem. At low frequencies (long periods), the half-period tolerance is large: 0.1 s at 5 Hz versus 0.025 s at 20 Hz, so a starting model that would cycle-skip at 20 Hz can still be inside the capture zone at 5 Hz. This insight is what makes MULTI-SCALE FWI possible:

Low-frequency stage: filter the data to the lowest usable frequencies (typically the 3-5 Hz band) and run FWI. The LONG-WAVELENGTH velocity structure is recovered without cycle skipping, even from a starting model that would cycle-skip at higher frequencies.
Progressive refinement: expand the frequency band in stages (5-8 Hz, then 8-15 Hz, then 15-25 Hz, etc.). Each stage starts from the converged output of the previous one, which is already accurate enough to lie inside the capture zone at the new, higher frequencies.
High-frequency stage: the final stage runs at the full data bandwidth, adding the fine-scale detail.
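The multi-scale idea can be tried on a one-parameter toy: a single Ricker arrival whose time is the only unknown, fitted by least squares with Gauss-Newton updates. As a simplifying assumption, low-pass filtering is mimicked by swapping in Ricker wavelets with lower peak frequencies (3, then 6, then 10 Hz) rather than filtering recorded data; the names and bands are illustrative.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz) centred at t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def invert_arrival_time(tau0, f0, tau_true=1.0, dt=0.001, n_iter=100, h=1e-5):
    """Gauss-Newton fit of one arrival time using a wavelet of peak frequency f0."""
    t = np.arange(0.0, 2.0, dt)
    d_obs = ricker(t, f0, tau_true)
    tau = tau0
    for _ in range(n_iter):
        r = d_obs - ricker(t, f0, tau)          # residual d_obs - d_sim
        # J = dr/dtau = -d(d_sim)/dtau, by a central finite difference
        J = -(ricker(t, f0, tau + h) - ricker(t, f0, tau - h)) / (2.0 * h)
        tau -= np.sum(r * J) / np.sum(J * J)    # Gauss-Newton step
    return tau

def multiscale(tau0, bands=(3.0, 6.0, 10.0)):
    """Invert band by band, low to high peak frequency; each band starts
    from the previous band's result."""
    tau = tau0
    for f0 in bands:
        tau = invert_arrival_time(tau, f0)
    return tau

single = invert_arrival_time(1.07, f0=10.0)
multi = multiscale(1.07)
print(f"10 Hz only:      final error {single - 1.0:+.4f} s")
print(f"3 -> 6 -> 10 Hz: final error {multi - 1.0:+.4f} s")
```

Starting 0.07 s off, the 10 Hz-only run stalls one cycle late, but the 3 Hz stage (half period about 0.17 s) pulls the arrival inside the capture zone, and the 6 Hz and 10 Hz stages then keep it there.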

This is WHY modern marine 3D acquisition has pushed for more LOW-FREQUENCY CONTENT. Broadband acquisition systems (BroadSeis, IsoMetrix, Broadband Plus) extend the usable bandwidth down into the 3-8 Hz range that robust multi-scale FWI needs. Land acquisition uses vibrators with low-frequency sweeps, or high-output dynamite sources, for similar reasons.

When FWI is worth running

Complex structural settings: salt basins, sub-salt targets, sub-thrust plays. Conventional tomography often cannot resolve the complex velocity structure; FWI delivers detailed velocity models that feed into accurate depth migration.
Near-surface problems: unconsolidated weathering zones, karst, permafrost. FWI can image the shallow complexity that degrades deeper imaging.
High-value targets where imaging matters: billion-barrel subsalt prospects; CO2 storage pilot projects where precise velocity models enable quantitative monitoring.
Velocity-model refinement for QI: traditional tomography gives smooth velocity; FWI adds detail that improves pre-stack migration and downstream QI inversion.

When FWI may not be worth it: (1) simple basin geometry where tomography plus Kirchhoff migration already gives good results; (2) very noisy data, where FWI fits noise as readily as signal; (3) budget-constrained projects where the compute cost exceeds the imaging value. For most modern large projects, FWI is now the default: the question is how many iterations and what frequency bandwidth, not whether to run it.

FWI variants and extensions

Acoustic vs elastic FWI: acoustic assumes only P-waves (simpler, faster, used for velocity-model building). Elastic FWI models P and S waves plus density (more accurate, expensive, used for QI-grade outputs).
Anisotropic FWI: includes Thomsen ε, δ, γ in the forward model. Essential in basins with VTI shales or HTI fractured reservoirs (§8.3).
Viscoacoustic / viscoelastic FWI: includes attenuation (Q). Important in shallow gas-bearing sediments where attenuation is severe.
Envelope-based FWI: minimizes the misfit of the wavefield ENVELOPE rather than of the waveform itself. It is less prone to cycle skipping and is used to build a starting model for conventional FWI.
Optimal-transport FWI: measures waveform misfit with an optimal-transport (Wasserstein) distance instead of the L2 norm. It is more robust to cycle skipping but more expensive.
Time-domain vs frequency-domain: time-domain modeling is more flexible for complex geology; frequency-domain modeling is efficient for narrowband sequential inversion. Modern FWI commonly runs in the time domain, with frequency selection done by filtering the data.
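To see why envelope misfits are less prone to cycle skipping, the sketch below compares the waveform L2 misfit with an envelope L2 misfit for the same 10 Hz Ricker wavelet shifted by increasing amounts. It is a toy comparison under assumed parameters, not an envelope-FWI implementation. The envelope is the magnitude of the analytic signal, computed here with NumPy's FFT (the same recipe scipy.signal.hilbert uses) so that the block needs no SciPy; the helper names are hypothetical.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz) centred at t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def envelope(x):
    """Envelope = |analytic signal|, built in the frequency domain by
    zeroing negative frequencies and doubling positive ones."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def misfits(shift, f0=10.0, dt=0.001, t_true=1.0):
    """Return (waveform L2 misfit, envelope L2 misfit) for a time-shifted wavelet."""
    t = np.arange(0.0, 2.0, dt)
    d_obs = ricker(t, f0, t_true)
    d_sim = ricker(t, f0, t_true + shift)
    e_wave = 0.5 * np.sum((d_obs - d_sim) ** 2)
    e_env = 0.5 * np.sum((envelope(d_obs) - envelope(d_sim)) ** 2)
    return float(e_wave), float(e_env)

for shift in (0.03, 0.05, 0.07, 0.09):
    e_wave, e_env = misfits(shift)
    print(f"shift {shift:.2f} s: waveform misfit {e_wave:7.3f}, envelope misfit {e_env:7.3f}")
```

The waveform misfit is lower at a 0.09 s shift than at 0.07 s (the cycle-skipped side-lobe minimum), whereas the envelope misfit keeps growing with the shift, so its gradient still points back toward zero shift. The price is resolution: the envelope discards phase, which is why envelope-based results serve as a starting point for conventional waveform FWI.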

FWI is the most computationally demanding, but also the most physically complete, inversion in reflection seismology. For velocity-model building, it has become indispensable in complex basins. For QI, it is an emerging but powerful tool that refines the elastic properties used in Part 7 workflows. §8.5 takes a different approach: MACHINE-LEARNING QI. Rather than explicit wave-equation inversion, neural networks learn the mapping from data to properties directly, a paradigm that is rapidly changing what is possible in quantitative seismic interpretation.
