Encoded FWI & computational strategies

Part 6 — Full-Waveform Inversion

Learning objectives

  • State the source-encoding identity and the cross-talk mechanism
  • Compare naive, encoded, and mini-batch FWI by cost per iteration and iterations to converge
  • Describe the memory-reduction tricks for storing the forward wavefield
  • Understand why L-BFGS (quasi-Newton) is the default solver

Production FWI is computationally extreme: a full-physics 3D acoustic FWI at 10 Hz on a modest survey (10 km × 10 km × 8 km, $\Delta x = 25\ \text{m}$) involves thousands of shots, each requiring a forward plus an adjoint simulation per iteration, for tens of outer iterations, in each of 5–10 frequency bands. Run the arithmetic and you get days to weeks of GPU cluster time per survey, and that is already after using every computational trick available. This section catalogues the tricks.

1. Source encoding — the central trick

The wave equation is linear in the source: if shot $i$ produces data $d_i$, the combined source $\sum_i c_i s_i$ produces data $\sum_i c_i d_i$. Pick random signs $c_i \in \{-1, +1\}$, build a super-source and a super-data, and run one wave simulation that collectively informs every shot's gradient:

$$s_{\text{enc}} = \sum_i c_i s_i,\quad d_{\text{enc}} = \sum_i c_i d_i,\quad g_{\text{enc}} = \sum_i c_i^2\, g_i + \sum_{i \neq j} c_i c_j\, X_{ij}$$

Because $c_i^2 = 1$, the first term is $\sum_i g_i$, the true FWI gradient we wanted. The second term is cross-talk: contributions $X_{ij}$ from shot $i$'s forward wavefield correlated with shot $j$'s adjoint wavefield. Because $c_i c_j$ (for $i \neq j$) averages to zero over many random encodings, the cross-talk averages out over iterations, leaving only the correct gradient. One simulation per iteration instead of $N_{shots}$: potentially a 100- to 1000-fold speedup, at the cost of ~3× more iterations to convergence because of the added noise.
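The identity can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not an FWI code: the arrays `u` and `lam` are random stand-ins for per-shot forward and adjoint wavefields (hypothetical names), and the gradient is modelled as their pointwise product, so that $X_{ij}$ corresponds to `u[i] * lam[j]`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_model = 16, 200

# Assumption: random vectors stand in for flattened per-shot wavefields.
u = rng.standard_normal((n_shots, n_model))    # forward wavefields
lam = rng.standard_normal((n_shots, n_model))  # adjoint wavefields

# Reference: the naive gradient, one correlation per shot, then summed.
g_true = np.sum(u * lam, axis=0)

def encoded_gradients(n_draws):
    """Return one encoded gradient per random sign vector."""
    c = rng.choice([-1.0, 1.0], size=(n_draws, n_shots))
    u_enc = c @ u          # super-shot forward wavefield, sum_i c_i u_i
    lam_enc = c @ lam      # super-shot adjoint wavefield, sum_j c_j lam_j
    return u_enc * lam_enc  # sum_i g_i plus the cross-talk terms

def relative_error(g):
    return np.linalg.norm(g - g_true) / np.linalg.norm(g_true)

err_single = relative_error(encoded_gradients(1)[0])
err_averaged = relative_error(encoded_gradients(4000).mean(axis=0))
print(f"one encoding: {err_single:.2f}, mean of 4000 encodings: {err_averaged:.3f}")
```

A single encoding is dominated by cross-talk in this toy; only the average over redrawn signs approaches $\sum_i g_i$. In an actual inversion that averaging happens implicitly across iterations rather than inside one.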

2. The widget — cost bars

[Interactive figure: FWI cost demo]

Three horizontal bars show total-simulations-to-convergence for the three strategies:

  • Naive FWI: one sim per shot per iteration. For $N = 1000$ shots and 50 iterations: 50 000 sims.
  • Encoded FWI: one sim total per iteration, but 3× more iterations due to cross-talk: 150 sims total, 333× cheaper than naive for $N = 1000$.
  • Mini-batch FWI: $k$ random shots per iteration, 1.5× more iterations. For $k = 0.1N = 100$: 7500 sims, 6.7× cheaper than naive.

The speedup from encoded FWI grows with $N_{shots}$: at $N = 10\,000$ it is $10\,000 \times 50 / 150 \approx 3333\times$ faster. Production FWI over 10 000 shots that would take years naively finishes in days with encoding.
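The arithmetic behind the three bars can be reproduced in a few lines. This is a minimal sketch using only the multipliers quoted above (3× iterations for encoding, 1.5× for mini-batch, a 10% batch); the function name and its parameters are illustrative and not taken from the widget.

```python
def total_simulations(n_shots, base_iters=50, strategy="naive",
                      batch_fraction=0.1):
    """Simulations to convergence under the assumptions stated in the text."""
    if strategy == "naive":
        return n_shots * base_iters
    if strategy == "encoded":
        return 1 * (3 * base_iters)       # one super-shot, 3x more iterations
    if strategy == "minibatch":
        k = round(batch_fraction * n_shots)
        iters = round(1.5 * base_iters)   # 1.5x more iterations
        return k * iters
    raise ValueError(f"unknown strategy: {strategy}")

for n in (1000, 10_000):
    naive = total_simulations(n, strategy="naive")
    for name in ("naive", "encoded", "minibatch"):
        sims = total_simulations(n, strategy=name)
        print(f"N={n:>6}  {name:<9} {sims:>7} sims  {naive / sims:8.1f}x cheaper")
```

The encoded count does not depend on `n_shots` at all, which is why its advantage grows linearly with survey size while the mini-batch advantage stays fixed at about 6.7×.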

3. When encoded FWI breaks

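  • Irregular acquisition geometries. Encoded FWI assumes every shot in the super-shot is recorded by the same set of receivers; missing or unevenly distributed sources (or receivers that move with the shot, as in towed-streamer surveys) break that assumption, and the cross-talk stops averaging out.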
  • Strong amplitude variations between shots. If one shot is 10× stronger than another, the random-sign sum is dominated by the strong shot and you effectively only invert one shot’s gradient.
  • Locally correlated noise. Noise that is coherent across shots (coherent swell, electrical hum) does not average out under encoding — it stacks additively.
  • Salt imaging. At sharp reflectors, the cross-talk can add coherent artefacts that never fully wash out. Hybrid strategies (encoded outer, naive inner) are used.

4. Memory — storing the forward wavefield

The adjoint-state gradient requires the forward source wavefield $U_s(x, z, t)$ at every grid point and every time step. For a 3D volume with $400 \times 400 \times 320 \times 4000$ samples (single precision), that is ~800 GB per shot. Three standard mitigations:

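  • Checkpointing: save $U_s$ at every $k$-th time step only. To compute the gradient at the skipped time steps, re-propagate forward from the nearest earlier checkpoint. Checkpoint storage drops by a factor of $k$ at the cost of roughly one extra forward simulation (plus a working buffer of $k$ time steps). Griewank's binomial checkpointing gives the optimal schedule, with memory and recomputation both growing only logarithmically in the number of time steps.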
  • Random boundaries: instead of storing the wavefield alongside absorbing boundaries, randomise the velocity in the boundary region so outgoing waves return scrambled instead of being absorbed. Since no energy is lost, the propagation is time-reversible: running the forward wavefield backward in time from its final state (re-deriving $U_s$) reproduces the original wavefield in the interior. Storage drops to essentially zero at the cost of re-propagation.
  • Wavefield reconstruction: solve the wave equation backward in time from the last saved slab (usually just the boundary). Cheaper than re-forward, and accurate enough for L-BFGS-quality gradients.
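The storage figure and the simplest of the three mitigations, uniform every-$k$-th-step checkpointing, can be sketched as follows. This is a toy under stated assumptions: `step` is a hypothetical linear update standing in for one wave-equation time step, and the schedule is the plain uniform one, not Griewank's binomial schedule.

```python
import numpy as np

def wavefield_gb(nx, ny, nz, nt, bytes_per_sample=4):
    """Storage for one shot's full forward wavefield, in GB (1e9 bytes)."""
    return nx * ny * nz * nt * bytes_per_sample / 1e9

def step(u):
    # Assumption: a toy linear update stands in for one wave-equation time step.
    return 0.9 * np.roll(u, 1) + 0.1 * u

def forward_with_checkpoints(u0, n_steps, k):
    """Run forward, keeping only every k-th state (plus the initial one)."""
    checkpoints = {0: u0.copy()}
    u = u0.copy()
    for t in range(1, n_steps + 1):
        u = step(u)
        if t % k == 0:
            checkpoints[t] = u
    return checkpoints

def states_in_reverse(checkpoints, n_steps, k):
    """Yield (t, u_t) for t = n_steps, ..., 0, as the adjoint sweep needs them."""
    t_hi = n_steps
    while t_hi >= 0:
        t_lo = (t_hi // k) * k          # nearest checkpoint at or before t_hi
        segment = [checkpoints[t_lo]]
        for _ in range(t_hi - t_lo):    # re-propagate the skipped steps
            segment.append(step(segment[-1]))
        for t in range(t_hi, t_lo - 1, -1):
            yield t, segment[t - t_lo]
        t_hi = t_lo - 1

print(f"{wavefield_gb(400, 400, 320, 4000):.1f} GB per shot without checkpointing")
```

Each segment is recomputed once, so the total re-propagation is less than one additional forward run, while at most `n_steps // k + 1` checkpoints plus one segment of `k` states are held at any time.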

5. Parallelism

  • Shot-parallel: each shot's simulation is independent; distribute across GPU nodes. Embarrassingly parallel up to $N_{shots}$ workers.
  • Domain-decomposed: split the model grid across GPUs within a node; each GPU handles its slab. Adds inter-GPU communication at slab boundaries but necessary for very large models.
  • Frequency-parallel: multi-scale runs can be pipelined across independent GPU pools, one frequency per pool, with results handed off when ready.
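The shot-parallel pattern is a map over shots followed by a sum. The sketch below shows only that structure: threads stand in for GPU nodes, and `shot_gradient` is a hypothetical placeholder that returns a reproducible random vector rather than running a forward and adjoint simulation. A production code would typically use MPI ranks or a job scheduler instead, which is an assumption not covered by the text.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def shot_gradient(shot_id, n_model=100):
    # Placeholder for one forward + adjoint simulation of a single shot.
    rng = np.random.default_rng(shot_id)
    return rng.standard_normal(n_model)

def total_gradient(shot_ids, n_workers=4):
    """Compute per-shot gradients independently, then reduce by summation."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(shot_gradient, shot_ids))  # order is preserved
    return np.sum(grads, axis=0)

g = total_gradient(range(8))
print(g.shape)
```

Because no shot needs another shot's result, the only communication is the final reduction, which is what makes the scheme embarrassingly parallel.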

6. Hessian, or: why L-BFGS is the default

Pure steepest-descent FWI converges slowly because $J(m)$ has very different curvature along different model directions. Newton's method $m \leftarrow m - H^{-1} g$ fixes this but requires the full Hessian $H = \partial^2 J / \partial m^2$, which has $N_{model}^2$ entries: infeasible to form, store, or invert for $N_{model} \sim 10^8$.

L-BFGS (limited-memory BFGS) approximates $H^{-1}$ from the last $k$ pairs of model and gradient differences ($k = 5$–$20$), giving Newton-like convergence at a storage cost of $2k \cdot N_{model}$ (two vectors per pair). Combined with an approximate diagonal pre-conditioner (Gauss-Newton on the diagonal, or the Hessian of the acquisition illumination), L-BFGS is the default FWI solver in every production package. Convergence is typically 3–5× faster than steepest descent per outer iteration, paying for itself immediately.
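The core of L-BFGS is the two-loop recursion, which applies the approximate inverse Hessian to a gradient using only the stored difference pairs. The sketch below is a minimal illustration under stated assumptions: the objective is a toy quadratic with a diagonal Hessian `A_diag` (hypothetical, chosen so an exact line search is available), not an FWI misfit, and no diagonal pre-conditioner is included.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: approximate H^{-1} g from stored (s, y) pairs."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append((a, rho))
        q -= a * y
    if s_hist:  # scale by the most recent curvature estimate
        q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    for (a, rho), s, y in zip(reversed(alphas), s_hist, y_hist):  # oldest to newest
        b = rho * (y @ q)
        q += (a - b) * s
    return q

def minimise(A_diag, b, memory, tol=1e-6, max_iter=20000):
    """Minimise 0.5 m^T A m - b^T m; memory=0 reduces to steepest descent."""
    m = np.zeros_like(b)
    g = A_diag * m - b
    g0 = np.linalg.norm(g)
    s_hist, y_hist = [], []
    for it in range(1, max_iter + 1):
        p = lbfgs_direction(g, s_hist, y_hist)
        step = (g @ p) / (p @ (A_diag * p))  # exact line search for a quadratic
        m_new = m - step * p
        g_new = A_diag * m_new - b
        if memory > 0:
            s_hist = (s_hist + [m_new - m])[-memory:]
            y_hist = (y_hist + [g_new - g])[-memory:]
        m, g = m_new, g_new
        if np.linalg.norm(g) <= tol * g0:
            return m, it
    return m, max_iter

A_diag = np.logspace(0, 2, 50)   # curvature spanning two orders of magnitude
b = np.ones(50)
_, n_sd = minimise(A_diag, b, memory=0)
m_lbfgs, n_lbfgs = minimise(A_diag, b, memory=10)
print(f"steepest descent: {n_sd} iterations, L-BFGS: {n_lbfgs} iterations")
```

The history holds `memory` pairs of length-$N_{model}$ vectors, which is where the $2k \cdot N_{model}$ storage comes from. The iteration gap on this toy reflects its chosen curvature spread and should not be read as a prediction for a real FWI misfit.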

7. A realistic computational budget

For a 3D deep-water sub-salt project, 10 km × 15 km × 8 km at $\Delta x = 25\ \text{m}$, 8 Hz, 5000 shots:

  • Naive: 2 sims × 5000 shots × 200 iters × 6 bands = 12 million simulations ⇒ ~2 years on a 100-GPU cluster.
  • Encoded: 2 sims × 1 super-shot × 600 iters × 6 bands = 7200 simulations ⇒ ~3 days on same cluster.
  • Mini-batch, $k = 500$: 2 sims × 500 shots × 300 iters × 6 bands = 1.8 M sims ⇒ ~100 days.

Only encoded FWI fits in a reasonable project budget. Every production FWI code supports at least source encoding; most support all three.

The one sentence to remember

Source encoding buys 100–1000× fewer simulations per iteration in exchange for 3× more iterations — a net 30–300× speedup that is the only thing that makes 3D production FWI affordable.

Where this goes next

§6.4 moves from acoustic to elastic/anisotropic physics — what happens when the acoustic wave equation is wrong and converted waves or anisotropy-induced travel-time errors dominate the residual.

