The inverse problem, mathematically

Part 6 — Full-Waveform Inversion

Learning objectives

  • State the L2 FWI objective function and identify its inputs
  • Derive the gradient update rule via the adjoint state method
  • Explain cycle skipping and how frequency controls the basin of attraction
  • Describe the full FWI iteration: forward, adjoint, gradient, line search, repeat

Full-waveform inversion (FWI) treats seismic imaging as a nonlinear least-squares optimisation problem. You have observed data $d_{obs}(s, r, t)$ recorded at receivers $r$ from sources $s$. You have a model $m$ (usually velocity, sometimes density or anisotropy) and a wave simulator that produces synthetic data $d_{syn}(m)$. Pick the $m$ that minimises the difference. That is FWI in one sentence. The rest of this section is what that optimisation actually costs and why it is so hard to do without getting lost in local minima.

1. The L2 objective function

$$J(m) = \tfrac{1}{2} \sum_{s,r,t} \bigl[d_{\text{obs}}(s,r,t) - d_{\text{syn}}(s,r,t;\,m)\bigr]^2$$

This is the ordinary squared-error misfit summed over every source, receiver, and time sample. Small $J$ means the synthetic data match the observed data; large $J$ means they do not. FWI is gradient descent on $J(m)$ in the space of all possible velocity models. L2 is the default because it is smooth and its gradient is simple to compute; more robust variants (L1, Huber, correlation-based) are used when the data have outliers or large systematic errors.
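As a minimal sketch of the objective above, assuming the observed and synthetic data are stored as NumPy arrays indexed by (source, receiver, time sample), the misfit is a one-line reduction. The function name `l2_misfit` and the toy arrays are hypothetical and only illustrate the formula.

```python
import numpy as np

def l2_misfit(d_obs, d_syn):
    """Return 0.5 * sum of squared residuals over sources, receivers and time samples."""
    residual = d_obs - d_syn
    return 0.5 * np.sum(residual ** 2)

# Hypothetical toy data: 2 sources, 3 receivers, 4 time samples.
d_obs = np.ones((2, 3, 4))
d_syn = np.zeros((2, 3, 4))

J = l2_misfit(d_obs, d_syn)  # 24 unit residuals, so J = 0.5 * 24 = 12.0
print(J)
```

A perfect model gives `l2_misfit(d_obs, d_obs) == 0.0`; any mismatch makes the value strictly positive.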

2. The gradient and the adjoint state method

To run gradient descent we need $\nabla_m J$, the vector of partial derivatives of $J$ with respect to every pixel of the velocity model. A naive finite-difference approach would perturb each pixel, re-run the simulator, and measure the change in $J$. That costs one forward simulation per pixel, which for a model with $10^6$ pixels is prohibitively expensive. The adjoint state method gets the entire gradient with just two simulations per source, regardless of model size:

  • Forward: Run the wave equation forward from the source wavelet to get the forward wavefield $U_s(x, z, t)$ and the synthetic data $d_{syn}$.
  • Residual: Compute the trace-by-trace residual $r(s, r, t) = d_{syn} - d_{obs}$.
  • Adjoint: Inject the time-reversed residual $r(s, r, t)$ as a source at each receiver and propagate the wave equation backward to get the adjoint wavefield $U_r^\dagger(x, z, t)$.
  • Cross-correlate: The gradient at each pixel is
$$\partial J/\partial m(x,z) \propto -\int U_s(x,z,t)\,U_r^\dagger(x,z,t)\,dt$$

This is a zero-lag cross-correlation between the forward and adjoint wavefields. It is structurally identical to the RTM imaging condition of §5.7; FWI and RTM share the same machinery and differ only in how they interpret the output. RTM outputs an image (reflectivity); FWI outputs a velocity-model correction.
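The two-solve recipe can be checked numerically on a toy problem. The sketch below is not the time-domain scheme described above: it swaps in a small, real-valued, 1D frequency-domain system $A(m)u = q$ with $A(m) = -\partial_x^2 - \omega^2\,\mathrm{diag}(m)$, where the zero-lag cross-correlation reduces to a pointwise product of the forward and adjoint fields. The grid size, frequency, source and receiver positions, and all function names are assumptions made for illustration, and the sign and scaling of the gradient are specific to this parametrisation.

```python
import numpy as np

def system_matrix(m, omega, h=1.0):
    """Discrete operator A(m) = -d2/dx2 - omega^2 * diag(m), zero boundary values."""
    n = m.size
    lap = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h ** 2
    return lap - omega ** 2 * np.diag(m)

def forward(m, omega, q, rec):
    """Forward solve A(m) u = q; return the wavefield and the data at the receivers."""
    u = np.linalg.solve(system_matrix(m, omega), q)
    return u, u[rec]

def misfit(m, omega, q, rec, d_obs):
    """L2 misfit between observed and synthetic receiver data."""
    _, d_syn = forward(m, omega, q, rec)
    return 0.5 * np.sum((d_obs - d_syn) ** 2)

def adjoint_gradient(m, omega, q, rec, d_obs):
    """Full gradient of the L2 misfit from one forward and one adjoint solve."""
    A = system_matrix(m, omega)
    u = np.linalg.solve(A, q)                # forward wavefield
    adj_source = np.zeros_like(u)
    adj_source[rec] = u[rec] - d_obs         # residual d_syn - d_obs injected at receivers
    lam = np.linalg.solve(A.T, adj_source)   # adjoint wavefield
    return omega ** 2 * u * lam              # pointwise product of the two fields

# Hypothetical setup: 40 cells, one source, ten receivers.
n, omega = 40, 0.5
x = np.arange(n)
m_true = 1.0 + 0.1 * np.exp(-(((x - 20) / 4.0) ** 2))
m0 = np.ones(n)
q = np.zeros(n)
q[5] = 1.0
rec = np.arange(10, 40, 3)
_, d_obs = forward(m_true, omega, q, rec)

grad = adjoint_gradient(m0, omega, q, rec, d_obs)

# Naive check: central finite differences, two extra solves per perturbed cell.
check = [8, 20, 33]
eps = 1e-6
fd = []
for i in check:
    dm = np.zeros(n)
    dm[i] = eps
    fd.append((misfit(m0 + dm, omega, q, rec, d_obs)
               - misfit(m0 - dm, omega, q, rec, d_obs)) / (2.0 * eps))
fd = np.array(fd)
max_err = np.max(np.abs(fd - grad[check]))
print("largest adjoint vs finite-difference discrepancy:", max_err)
```

The adjoint route needs two linear solves for all 40 gradient entries, while the finite-difference check needs two solves for each entry it tests. That gap is the whole argument for the adjoint state method once the model has millions of cells.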

3. The cycle-skipping problem

Gradient descent converges to the nearest local minimum of $J(m)$. If the initial model is close to the truth, that local minimum is the global minimum and the answer is right. If the initial model is far from the truth, the local minimum can be a cycle-skipped solution: a model whose synthetic data match the observed data shifted by one or more full wavelet periods. Gradient descent cannot see past the next peak of the misfit, so it never finds the true minimum.

The boundary is set by frequency. If the synthetic and observed traces are misaligned by more than half a wavelet period ($T/2 = 1/(2f)$), the gradient tells you to move in the wrong direction, toward the cycle-skipped minimum instead of the true one. The global basin of $J(m)$, the region where gradient descent converges to truth, therefore has a half-width of about $\Delta t = 1/(2f)$ in traveltime error.

4. The widget

[Interactive figure: FWI cycle-skip demo. Requires JavaScript.]

Simplified 1D FWI. Single reflector at $z = 1000$ m, true velocity $V_{true} = 2000\ \text{m/s}$, so the observed trace is a Ricker wavelet centred at 1.0 s. The synthetic trace for trial velocity $V$ is a Ricker centred at $2z/V$. Left panel shows both traces; right panel shows the L2 misfit $J(V)$ with a dot at the current $V_{guess}$.

Slide $V_{guess}$ with $f = 15\ \text{Hz}$: the landscape has a deep central basin around 2000 m/s with oscillating side-lobes at cycle-skipped velocities. The info strip tells you whether the current guess is inside the global basin, near its edge, or cycle-skipped. Now drop the frequency to 5 Hz: the basin widens dramatically, so you can start further from $V_{true}$ and still converge. Raise it to 45 Hz: the basin shrinks. The half-period $1/(2f)$ is then about 11 ms, which for this geometry corresponds to a velocity error of only about 20 m/s, so a guess 200 m/s off is several cycles skipped. The lowest usable frequency is therefore the single most important number in production FWI: it sets how forgiving the method is of your initial model.
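The widget's misfit landscape can be reproduced offline. The sketch below uses the geometry stated above (reflector at 1000 m, true velocity 2000 m/s, Ricker wavelet) and locates the global basin by walking outward from the minimum until $J$ stops increasing. The 1 ms sampling, the 1 m/s velocity grid, and the function names are assumptions for illustration, not the widget's actual implementation.

```python
import numpy as np

def ricker(t, f, t0):
    """Ricker wavelet with peak frequency f (Hz), centred at time t0 (s)."""
    a = (np.pi * f * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def misfit_curve(f, v_grid, z=1000.0, v_true=2000.0, dt=0.001, t_max=2.0):
    """L2 misfit J(V) between the observed trace and each trial-velocity trace."""
    t = np.arange(0.0, t_max, dt)
    d_obs = ricker(t, f, 2.0 * z / v_true)
    return np.array([0.5 * np.sum((d_obs - ricker(t, f, 2.0 * z / v)) ** 2)
                     for v in v_grid])

def global_basin(v_grid, J):
    """Walk outward from the global minimum while J keeps increasing."""
    i0 = int(np.argmin(J))
    lo = hi = i0
    while lo > 0 and J[lo - 1] > J[lo]:
        lo -= 1
    while hi < len(J) - 1 and J[hi + 1] > J[hi]:
        hi += 1
    return v_grid[lo], v_grid[hi]

v_grid = np.arange(1500.0, 2501.0, 1.0)
basins = {f: global_basin(v_grid, misfit_curve(f, v_grid))
          for f in (5.0, 15.0, 45.0)}

for f, (v_lo, v_hi) in basins.items():
    print(f"{f:4.0f} Hz: global basin from {v_lo:.0f} to {v_hi:.0f} m/s")
```

The printed basin narrows as the frequency rises. Its edges sit slightly inside the half-period estimate because the first trough of a Ricker wavelet arrives a little before half the nominal period.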

5. The FWI iteration in full

  • Initialise with a smooth velocity model from tomography (§5.9), well logs, or prior seismic.
  • Filter the observed data to the lowest available frequency band.
  • Forward: propagate the source wavelet through the current model to get synthetic data.
  • Residual: $r = d_{syn} - d_{obs}$.
  • Adjoint: reverse-propagate $r$ as a receiver-side source.
  • Gradient: zero-lag cross-correlate forward and adjoint wavefields per pixel.
  • Pre-condition the gradient (scale by approximate inverse Hessian, apply masks that zero out regions outside the illumination cone).
  • Line search along the descent direction $-g$ to find the step length $\alpha > 0$ that minimises $J(m - \alpha \cdot g)$.
  • Update: $m \leftarrow m - \alpha \cdot g$.
  • Repeat until convergence (gradient magnitude below threshold, or J stops decreasing).
  • Raise the frequency band, re-filter the observed data to it, and go back to the Forward step. This is multi-scale continuation.
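The loop above can be sketched on the one-parameter problem from the widget (reflector at 1000 m, true velocity 2000 m/s). This is a toy under stated assumptions, not a production scheme: the model is a single velocity, so the gradient comes from a central finite difference instead of an adjoint solve, preconditioning is omitted, and the line search simply tries a handful of halving step lengths. The starting velocity of 1870 m/s and all function names are hypothetical.

```python
import numpy as np

Z, V_TRUE, DT = 1000.0, 2000.0, 0.001   # geometry of the 1D example
T = np.arange(0.0, 2.0, DT)

def ricker(t, f, t0):
    """Ricker wavelet with peak frequency f (Hz), centred at time t0 (s)."""
    a = (np.pi * f * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def misfit(v, f):
    """Forward-model a trial velocity and return the L2 misfit in band f."""
    d_obs = ricker(T, f, 2.0 * Z / V_TRUE)   # observed data in this band
    d_syn = ricker(T, f, 2.0 * Z / v)        # synthetic data for the trial model
    return 0.5 * np.sum((d_obs - d_syn) ** 2)

def gradient(v, f, h=0.1):
    """Central finite difference; affordable only because the model is one number."""
    return (misfit(v + h, f) - misfit(v - h, f)) / (2.0 * h)

def invert(v0, bands, n_iter=40, max_step=20.0):
    """Gradient descent with a crude line search, looping over frequency bands."""
    v = v0
    for f in bands:                          # multi-scale continuation, low to high
        for _ in range(n_iter):
            J = misfit(v, f)
            g = gradient(v, f)
            if g == 0.0:
                break
            # Trial step lengths that move v by max_step / 2^k m/s, k = 0..11.
            alphas = max_step / (abs(g) * 2.0 ** np.arange(12))
            trials = [misfit(v - a * g, f) for a in alphas]
            k = int(np.argmin(trials))
            if trials[k] >= J:               # no trial step lowers J: converged
                break
            v = v - alphas[k] * g            # step against the gradient
    return v

v_start = 1870.0
v_single = invert(v_start, bands=[15.0])        # start directly at 15 Hz
v_multi = invert(v_start, bands=[5.0, 15.0])    # start at 5 Hz, then climb

print(f"15 Hz only : {v_single:.1f} m/s")
print(f"5 -> 15 Hz : {v_multi:.1f} m/s")
```

Starting directly at 15 Hz, the run settles in a cycle-skipped side-lobe well short of 2000 m/s. Starting at 5 Hz and then raising the band reaches the true velocity from the same initial guess.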

A production 3D FWI runs this loop for 50–200 outer iterations across 5–10 frequency bands. Each outer iteration is 2 wave simulations (forward + adjoint) per shot × thousands of shots, plus any extra forward simulations used by the line search. GPU clusters run for days to weeks per frequency band. The final model is worth it: FWI can resolve velocity detail down to roughly half the shortest wavelength in the inverted band, producing images with clarity no ray-based method can match.
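To get a feel for the scale, the simulation count can be tallied directly. The survey size below (5,000 shots, 100 outer iterations in total) is a hypothetical example chosen from within the ranges quoted above, and the count ignores the extra forward runs a line search needs.

```python
def wave_simulations(n_shots, n_iterations, sims_per_shot=2):
    """Count wave simulations: forward + adjoint per shot per outer iteration."""
    return sims_per_shot * n_shots * n_iterations

# Hypothetical survey: 5,000 shots, 100 outer iterations in total.
total = wave_simulations(n_shots=5000, n_iterations=100)
print(f"{total:,} wave simulations")
```

Under these assumed numbers the inversion needs a million 3D wave simulations, which is why the per-simulation cost dominates every design decision.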

6. What can go wrong

  • Cycle skipping: the widget's whole message. Mitigate by starting at low frequency; mitigate further by using an initial model that predicts traveltimes to within half a period at the starting frequency.
  • Local minima from unmodelled physics. If the data contain elastic converted waves and the simulator is acoustic, part of the residual cannot be explained by any acoustic model. FWI tries to match the unmodelled events by tweaking velocity, yielding garbage. Elastic FWI (§6.4) is the answer.
  • Source-wavelet errors. A mismatched source wavelet maps to a systematic velocity bias. Solution: jointly invert for the source wavelet, or use source-independent misfits (correlation coefficient, trace-envelope matching).
  • Noise in the observations. Low-frequency swell, dip lines, 60 Hz hum. FWI tries to fit all of it. Pre-filter aggressively; use robust misfits in noisy bands.
  • Computational cost. Forward + adjoint per shot per iteration per band gets expensive quickly. See §6.3 for source-encoded FWI, which collapses thousands of shots into a few "super-shots".

The one sentence to remember

FWI is gradient descent on ½Σ(d_obs − d_syn(m))², using the adjoint-state method to compute the gradient in two wave simulations per shot. The whole game is avoiding cycle skipping, and the answer is to start at low frequency and climb.

Where this goes next

§6.2 turns the cycle-skipping tradeoff into a concrete workflow: multi-scale frequency continuation, data preconditioning, envelope-FWI, time-domain-windowing strategies, and the family of tricks production FWI uses to stretch the usable frequency band downward.

