Full-waveform inversion: matching observed to modeled waveforms
Learning objectives
- Explain what FWI solves: the full wave equation, iterative gradient updates
- Recognize the role of the STARTING MODEL in FWI success vs failure
- Understand CYCLE SKIPPING: the dominant FWI failure mode
- Apply MULTI-SCALE strategy: low-frequency first, progressively refine
- Know when FWI is worth running vs when simpler inversion methods (§7.3) suffice
Until §8.4, every inversion in this textbook (§7.3, §8.2) has relied on the CONVOLUTIONAL MODEL: seismic = reflectivity * wavelet. That model is a useful simplification, but it ignores much of the physics of real wave propagation: refraction, diffraction, multiple scattering, attenuation, anisotropy. Full-waveform inversion (FWI) drops the convolutional approximation and fits the observed seismic waveforms with simulations of the FULL WAVE EQUATION. The payoff: HIGH-RESOLUTION velocity models that resolve geological structure at scales up to an order of magnitude finer than conventional tomography.
FWI has become the standard in salt basins (Gulf of Mexico, Brazil pre-salt, West Africa), complex structural provinces (foothills, sub-thrust), and increasingly as a routine component of every major 3D processing project. The cost is compute: modern FWI runs on cluster-scale hardware, with a single field-scale project consuming tens of thousands of GPU-hours. The reward is a velocity model that shows features — narrow salt canyons, thin shale layers, fluid-filled fault zones — that no other method can resolve.
What FWI solves
FWI is a LEAST-SQUARES OPTIMIZATION problem. Given:
- Observed seismic data d_obs (all traces, all times, all offsets)
- Source wavelet estimate
- A starting velocity model V₀
FWI finds the velocity model V that MINIMIZES the misfit between observed and simulated data:
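$$J(V) = \frac{1}{2}\sum_{s}\sum_{r}\sum_{t}\left[d_{\mathrm{syn}}(s, r, t; V) - d_{\mathrm{obs}}(s, r, t)\right]^{2}$$
where d_syn(s, r, t; V), the simulated data, is computed by solving the wave equation (full finite-difference or finite-element simulation) for the current velocity model V. The sum is over all sources s, all receivers r, and all time samples t.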
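The misfit itself is simple to evaluate once the simulated data exist. Below is a minimal NumPy sketch, assuming simulated and observed data are stored as arrays of shape (n_sources, n_receivers, n_samples); the function name `l2_misfit` is hypothetical. For this L2 misfit, the residual it returns is also what FWI back-propagates (time-reversed, injected at the receivers) as the adjoint source when forming the gradient.

```python
import numpy as np

def l2_misfit(d_syn, d_obs):
    """J = 1/2 * sum over sources, receivers, time samples of (d_syn - d_obs)^2.

    d_syn, d_obs: arrays of shape (n_sources, n_receivers, n_samples).
    Returns (J, residual); the residual has the same shape as the data.
    """
    d_syn = np.asarray(d_syn, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)
    if d_syn.shape != d_obs.shape:
        raise ValueError("simulated and observed data must have the same shape")
    residual = d_syn - d_obs
    J = 0.5 * np.sum(residual ** 2)
    return J, residual

# Tiny example: 2 sources, 3 receivers, 4 time samples, constant 0.1 error
rng = np.random.default_rng(0)
d_obs = rng.standard_normal((2, 3, 4))
J, res = l2_misfit(d_obs + 0.1, d_obs)
```

With a uniform error of 0.1 on all 24 samples, J = 0.5 × 24 × 0.01 = 0.12.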
The problem is massive: at field scale, you have thousands of sources, tens of thousands of receivers, tens of thousands of time samples. d_obs is a multi-terabyte dataset; forward modeling is expensive (each iteration simulates the wavefield for every source); the velocity model has millions of voxels. This is one of the most computationally demanding inversions in any field of science.
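To make "forward modeling" concrete, here is a minimal sketch of one per-source simulation: a 1D constant-density acoustic wave equation solved with explicit second-order finite differences. Everything in it is an illustrative assumption: the function names, the 2 km homogeneous 2000 m/s model, the 15 Hz Ricker source, and the fixed (u = 0) end boundaries, which a real code would replace with absorbing boundaries. Field-scale FWI runs a 3D (often anisotropic or elastic) version of this once per source per iteration, plus an adjoint simulation to form the gradient.

```python
import numpy as np

def ricker(t, f_peak, t0):
    """Ricker wavelet with peak frequency f_peak (Hz), centred at time t0 (s)."""
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def fd_acoustic_1d(v, dx, dt, nt, src_ix, rec_ix, src):
    """Solve u_tt = v^2 u_xx + src(t) * delta(x - x_src) with explicit 2nd-order FD.

    v: velocity at each grid node (m/s). Returns the trace recorded at rec_ix.
    Both ends are held at u = 0 (reflecting), so keep the record short.
    """
    courant = np.max(v) * dt / dx
    if courant > 1.0:
        raise ValueError(f"CFL condition violated (Courant number {courant:.2f} > 1)")
    c2 = (v * dt / dx) ** 2
    u_prev = np.zeros_like(v)
    u = np.zeros_like(v)
    trace = np.zeros(nt)
    for n in range(nt):
        trace[n] = u[rec_ix]                          # record u at time n*dt
        lap = np.zeros_like(u)
        lap[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]    # dx^2 * u_xx at interior nodes
        u_next = 2.0 * u - u_prev + c2 * lap          # leapfrog time step
        u_next[src_ix] += dt ** 2 * src[n] / dx       # point source (delta ~ 1/dx)
        u_prev, u = u, u_next
    return trace

# Homogeneous 2000 m/s model, 2 km long, 5 m grid, Courant number 0.5
dx, v0 = 5.0, 2000.0
v = np.full(401, v0)
dt = 0.5 * dx / v0
nt = 560                                   # ~0.7 s, before the end reflections arrive
t = np.arange(nt) * dt
# In 1D the recorded wave is the time integral of the source, so inject the
# time derivative of a 15 Hz Ricker to record a Ricker at the receiver.
src = np.gradient(ricker(t, 15.0, 0.1), dt)
trace = fd_acoustic_1d(v, dx, dt, nt, src_ix=50, rec_ix=250, src=src)
t_peak = t[np.argmax(trace)]               # direct arrival near 0.1 + 1000/2000 = 0.6 s
```

The Courant-number check is the stability (CFL) condition of this explicit scheme; its cost per time step scales with the number of grid cells, which is why fine 3D grids times thousands of sources make FWI so expensive.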
Exercise — walk the iteration slider through three scenarios
- Open the widget in Good starting model mode with iteration = 0. You see: the TRUE profile (thick yellow step-curve), the STARTING profile (grey dashed, a smoothed version of the truth), and the CURRENT iteration (magenta, currently identical to the starting profile). The starting profile is close to the truth but lacks the sharp layer boundaries.
- Slide iteration up to 4. The current profile (now trending toward green) is noticeably closer to the yellow truth. Misfit curve below shows the misfit dropping.
- Iteration 10: current profile (bright green) almost matches truth. Layer boundaries are sharp. Misfit has dropped by ∼85%. This is SUCCESSFUL convergence.
- Iteration 20: fully converged; current is indistinguishable from truth. Misfit is near zero. This is what a GOOD FWI project produces.
- Switch to Bad starting model. Starting profile (grey dashed) is now a LINEAR GRADIENT — very different from the layered truth. Iteration 0: current matches starting. Iteration 4: the profile begins moving but imperfectly. Iteration 10: the profile has STALLED at a wrong intermediate state. Iteration 20: unchanged. This is CYCLE SKIPPING.
- Look at the misfit curve for the bad mode: drops initially, then LEVELS OFF at a wrong, non-zero value. The inversion THINKS it has converged (misfit stopped decreasing) but to a WRONG answer. This is why FWI results must always be validated against wells.
- Switch to Multi-scale FWI. Starting model is the same bad linear gradient. Iterations 0-5: the current profile moves toward a SMOOTH version of truth (low-frequency FWI recovers long-wavelength structure). Iterations 6+: the profile sharpens into the true layer boundaries (high-frequency FWI adds detail). By iteration 20, converged. This is how multi-scale RESCUES a bad starting model.
- Key lesson from flipping between Bad starting model and Multi-scale FWI: the only difference is the FREQUENCY STRATEGY. Both start from the same bad model, but multi-scale succeeds where single-scale fails. This is why modern broadband acquisition (with usable signal down to 3-5 Hz) matters so much: it enables robust multi-scale FWI.
Cycle skipping: the FWI failure mode
FWI uses GRADIENT DESCENT: at each iteration, compute the gradient of the misfit w.r.t. the velocity model, take a step DOWNHILL. Gradient descent converges to the nearest local minimum, not necessarily the global (true) minimum.
When does the nearest local minimum coincide with the global minimum? When the simulated waveforms lie within HALF A PERIOD (half a cycle) of the observed waveforms, i.e., when the traveltime error is less than 1/(2f) at the dominant frequency f. This is the "capture zone" for gradient-based FWI. If your starting model produces waveforms shifted by MORE than half a period, each simulated peak lines up with the wrong observed cycle: you’re OUTSIDE the capture zone, and FWI converges to the wrong local minimum. That’s cycle skipping.
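The capture zone can be checked numerically on the simplest possible case: an observed Ricker wavelet and a simulated copy that differs only by a time shift τ, standing in for a traveltime error. The Ricker wavelet, the function names, and the scan step are illustrative assumptions. The sketch evaluates the L2 misfit as a function of τ and walks outward from τ = 0 until the misfit stops increasing; beyond that point the misfit runs downhill toward a false minimum roughly one period away.

```python
import numpy as np

def ricker(t, f_peak, t0=0.0):
    """Ricker wavelet with peak frequency f_peak (Hz), centred at t0 (s)."""
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def shift_misfit(tau, f_peak, t):
    """L2 misfit between a Ricker at t = 0 (observed) and a copy delayed by tau."""
    dt = t[1] - t[0]
    return 0.5 * np.sum((ricker(t, f_peak, tau) - ricker(t, f_peak)) ** 2) * dt

def capture_half_width(f_peak, t, d_tau=1e-4):
    """Walk outward from tau = 0 until the misfit stops increasing."""
    tau, J = 0.0, 0.0
    while True:
        J_next = shift_misfit(tau + d_tau, f_peak, t)
        if J_next <= J:
            return tau          # misfit turns downhill here: edge of the basin
        tau, J = tau + d_tau, J_next

t = np.arange(-1.0, 1.0, 0.0005)
for f in (5.0, 20.0):
    hw = capture_half_width(f, t)
    print(f"{f:4.0f} Hz: basin half-width {hw * 1000:5.1f} ms, half period {500 / f:5.1f} ms")
```

For a Ricker wavelet the basin half-width works out to about 0.43/f, slightly inside the half-period rule of thumb 1/(2f), and it shrinks in proportion to 1/f: four times narrower at 20 Hz than at 5 Hz.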
Practically: at f = 10 Hz the period is 0.1 s, so the half-period tolerance is 0.05 s. With Vp = 3000 m/s, the wavelength is 300 m and the half-wavelength is 150 m (150/3000 = 0.05 s). If the traveltime error in your starting model exceeds 1/20 second, FWI cycle-skips. You need a DECENT starting velocity model.
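The arithmetic above generalizes to a quick rule-of-thumb calculator. The traveltime tolerance is half a period, 1/(2f). Because a path with traveltime t and relative velocity error δV/V accumulates a traveltime error of roughly t·δV/V (a small-error linearization of t = L/V), the tolerable relative velocity error is about 1/(2ft). The function name and the example values (3000 m/s, a 2 s path) are illustrative assumptions.

```python
def cycle_skip_tolerance(f_hz, v_ms, traveltime_s):
    """Half-period rule of thumb for the FWI capture zone.

    Returns (max traveltime error in s, equivalent distance in m,
    max relative velocity error along a path with the given traveltime).
    """
    half_period = 1.0 / (2.0 * f_hz)                     # seconds
    half_wavelength = v_ms * half_period                 # metres
    max_rel_velocity_error = half_period / traveltime_s  # |dV/V| < 1/(2 f t)
    return half_period, half_wavelength, max_rel_velocity_error

for f in (3.0, 5.0, 10.0, 20.0):
    dt_max, dist, rel = cycle_skip_tolerance(f, v_ms=3000.0, traveltime_s=2.0)
    print(f"{f:4.0f} Hz: {dt_max * 1000:6.1f} ms, {dist:6.1f} m, velocity error < {rel:.1%}")
```

For a 2 s path, the 10 Hz case (0.05 s, 150 m) allows only about a 2.5% velocity error, while at 3 Hz the allowance grows to about 8%.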
Sources of the starting model: (1) traditional NMO-based velocity analysis on CMP gathers (moderate quality, but fine for shallow sections); (2) reflection tomography (detailed smooth model); (3) regional geologic models (long-wavelength constraints); (4) interpolated well velocities; (5) a previous cycle of FWI plus manual editing. Modern FWI workflows spend 30-50% of the project time on preparing the starting model.
Multi-scale FWI: the industry standard defense
Cycle skipping is a FREQUENCY-DEPENDENT problem. At low frequencies (long wavelengths), the half-wavelength tolerance is large — starting models that would cycle-skip at 20 Hz are safely within the capture zone at 5 Hz. This insight is what makes MULTI-SCALE FWI possible:
- Low-frequency stage: filter the data to the lowest usable frequencies (typically 3-5 Hz band). Run FWI. The LONG-WAVELENGTH velocity structure is recovered without cycle skipping even from a poor starting model.
- Progressive refinement: expand the frequency band in stages (5-8 Hz, then 8-15 Hz, then 15-25 Hz, etc.). Each stage starts from the converged output of the previous — which is already close to truth at those frequencies.
- High-frequency stage: the final stage runs at the full data bandwidth, adding the fine-scale detail.
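The stage-by-stage hand-off can be sketched on a toy problem: recover a single time shift (a proxy for a traveltime error) by descending the L2 misfit between a simulated and an observed Ricker wavelet. All of the following are illustrative assumptions: each frequency band is represented by a Ricker wavelet at that band's peak frequency instead of by band-pass filtering field data; the gradient is a finite-difference derivative rather than an adjoint-state computation; and the step rule (move against the gradient by a fixed step, halve the step whenever the misfit fails to drop) stands in for FWI's line search. The start is 40 ms from the truth, beyond the 25 ms half period at 20 Hz but well inside it at 3 Hz.

```python
import numpy as np

t = np.arange(0.0, 1.5, 0.001)        # time axis (s)
TAU_TRUE = 0.5                        # true arrival time (s), illustrative

def ricker(t, f_peak, t0):
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def misfit(tau, f_peak):
    """0.5 * sum (d_syn - d_obs)^2 for a Ricker arriving at tau vs. at TAU_TRUE."""
    return 0.5 * np.sum((ricker(t, f_peak, tau) - ricker(t, f_peak, TAU_TRUE)) ** 2)

def descend(tau, f_peak, max_iter=500, h=1e-5):
    """Local descent: step against the gradient sign; halve the step on failure."""
    step = 0.05 / f_peak              # a tenth of the half period
    J = misfit(tau, f_peak)
    for _ in range(max_iter):
        grad = (misfit(tau + h, f_peak) - misfit(tau - h, f_peak)) / (2 * h)
        trial = tau - step * np.sign(grad)
        J_trial = misfit(trial, f_peak)
        if J_trial < J:
            tau, J = trial, J_trial   # accept the downhill move
        else:
            step *= 0.5               # overshoot or flat: refine the step
            if step < 1e-7:
                break
    return tau

tau0 = TAU_TRUE + 0.040               # starting model: 40 ms traveltime error
single = descend(tau0, 20.0)          # single-scale: full bandwidth from the start
multi = tau0
for f_band in (3.0, 6.0, 12.0, 20.0): # multi-scale: each stage starts from the last
    multi = descend(multi, f_band)
print(f"single-scale error: {(single - TAU_TRUE) * 1000:.1f} ms")
print(f"multi-scale error:  {(multi - TAU_TRUE) * 1000:.1f} ms")
```

The single-scale run settles about one 20 Hz period (roughly 45 ms) from the truth, the toy analogue of a stalled, cycle-skipped inversion. The multi-scale run converges at 3 Hz, and each later stage starts inside its own capture zone. In real FWI the higher bands also add resolution; this one-parameter toy only shows the capture-zone side of the argument.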
This is WHY modern marine 3D acquisition has pushed for LOWER-FREQUENCY CONTENT. Broadband acquisition systems (e.g., BroadSeis, IsoMetrix, Broadband Plus) specifically target the 3-8 Hz band that FWI needs for a robust multi-scale start. On land, vibrators with extended low-frequency sweeps, or high-output dynamite sources, are used for similar reasons.
When FWI is worth running
- Complex structural settings: salt basins, sub-salt targets, sub-thrust plays. Conventional tomography cannot resolve the complex velocity structure. FWI delivers detailed velocity models that feed into accurate depth migration.
- Near-surface problems: unconsolidated weathering zones, karst, permafrost. FWI can image the shallow complexity that degrades deeper imaging.
- High-value targets where imaging matters: billion-barrel subsalt prospects; CO2 storage pilot projects where precise velocity models enable quantitative monitoring.
- Velocity-model refinement for QI: traditional tomography gives smooth velocity; FWI adds detail that improves pre-stack migration and downstream QI inversion.
When FWI may not be worth it: (1) simple basin geometry where tomography + Kirchhoff migration already gives good results; (2) very noisy data where FWI amplifies noise more than signal; (3) budget-constrained projects where the compute cost exceeds the imaging value. For most modern large projects, FWI is now the default: the question is how many iterations and what frequency bandwidth, not whether to run it.
FWI variants and extensions
- Acoustic vs elastic FWI: acoustic assumes only P-waves (simpler, faster, used for velocity-model building). Elastic FWI models P and S waves plus density (more accurate, expensive, used for QI-grade outputs).
- Anisotropic FWI: includes Thomsen ε, δ, γ in the forward model. Essential in basins with VTI shales or HTI fractured reservoirs (§8.3).
- Viscoacoustic / viscoelastic FWI: includes attenuation (Q). Important in shallow gas-bearing sediments where attenuation is severe.
- Envelope-based FWI: minimizes the misfit of the wavefield ENVELOPE rather than the waveform itself. Less cycle-skipping-prone; used as a starter for traditional FWI.
- Optimal-transport FWI: uses an optimal-transport (Wasserstein) distance between waveforms instead of the L2 norm. Much more robust to cycle skipping, but more expensive.
- Time-domain vs frequency-domain: time-domain is more flexible for complex geology; frequency-domain is efficient for narrowband sequential inversion. Modern FWI uses time-domain with frequency selection.
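Why envelope-based misfits resist cycle skipping can be seen on a shifted-wavelet toy: compare the L2 waveform misfit with an L2 envelope misfit between a 20 Hz Ricker wavelet and a time-shifted copy. The envelope here is the magnitude of the analytic signal from `scipy.signal.hilbert`; the wavelet and the two test shifts (about 0.43/f, where the waveform misfit of a Ricker peaks, and about 0.91/f, where it has its false minimum) are illustrative assumptions. Moving from the first shift to the second, the waveform misfit decreases, which is the trap gradient descent falls into, while the envelope misfit keeps increasing.

```python
import numpy as np
from scipy.signal import hilbert

t = np.arange(-0.5, 0.5, 0.001)
F_PEAK = 20.0

def ricker(t, f_peak, t0=0.0):
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def envelope(x):
    """Instantaneous amplitude: magnitude of the analytic signal."""
    return np.abs(hilbert(x))

def waveform_misfit(tau):
    return 0.5 * np.sum((ricker(t, F_PEAK, tau) - ricker(t, F_PEAK)) ** 2)

def envelope_misfit(tau):
    return 0.5 * np.sum((envelope(ricker(t, F_PEAK, tau)) - envelope(ricker(t, F_PEAK))) ** 2)

near, far = 0.43 / F_PEAK, 0.91 / F_PEAK     # L2 misfit barrier and false minimum
print("waveform misfit:", waveform_misfit(near), "->", waveform_misfit(far))
print("envelope misfit:", envelope_misfit(near), "->", envelope_misfit(far))
```

The envelope discards the oscillating phase, so shifted copies no longer "match" one cycle over; the price is lower resolution, which is why envelope inversion is used as a starter for conventional FWI rather than a replacement.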
FWI is the most computationally demanding but physically honest inversion in all of reflection seismology. For velocity-model building, it has become indispensable in complex basins. For QI, it’s an emerging but powerful tool that refines the elastic properties used in Part 7 workflows. §8.5 takes a different approach: MACHINE-LEARNING QI. Rather than explicit wave-equation inversion, neural networks learn the mapping from data to properties directly, a paradigm that’s rapidly changing what’s possible in quantitative seismic interpretation.