Encoded FWI & computational strategies
Learning objectives
- State the source-encoding identity and the cross-talk mechanism
- Compare naive, encoded, and mini-batch FWI by cost per iteration and iterations to converge
- Describe the memory-reduction tricks for storing the forward wavefield
- Understand why L-BFGS (quasi-Newton) is the default solver
Production FWI is computationally extreme: a full-physics 3D acoustic FWI at 10 Hz on a modest survey (10 km × 10 km × 8 km) takes thousands of shots, each requiring a forward plus an adjoint simulation per iteration, for tens of outer iterations, for 5–10 frequency bands. Run the arithmetic and you get days to weeks of GPU cluster time per survey, and that is already after using every computational trick available. This section catalogues the tricks.
1. Source encoding — the central trick
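The wave equation is linear in the source: if shot $s_i$ produces data $d_i$, the combined source $\sum_i \alpha_i s_i$ produces data $\sum_i \alpha_i d_i$. Pick random signs $\alpha_i \in \{-1, +1\}$, build a super-source $\tilde{s} = \sum_i \alpha_i s_i$ and a super-data $\tilde{d} = \sum_i \alpha_i d_i$, and run one wave simulation that collectively informs every shot's gradient:
$$\tilde{g} = \sum_i \alpha_i^2 \, g_{ii} + \sum_{i \neq j} \alpha_i \alpha_j \, g_{ij},$$
where $g_{ij}$ is the zero-lag correlation of shot $i$'s forward wavefield with shot $j$'s adjoint wavefield.
The first term is $\sum_i g_{ii}$ (because $\alpha_i^2 = 1$), the true FWI gradient we wanted. The second term is cross-talk: contributions from shot $i$'s forward wavefield correlated with shot $j$'s adjoint wavefield. Because $\alpha_i \alpha_j$ averages to zero over many random encodings when $i \neq j$, the cross-talk averages out over iterations, leaving only the correct gradient. One simulation per iteration instead of $N$: potentially a 100- to 1000-fold speedup, at the cost of ~3× more iterations to convergence because of the added noise.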
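The cancellation of cross-talk can be checked numerically on a toy problem. The sketch below is not a wave solver: it assumes each shot is represented by a small random linear operator (a hypothetical stand-in for linearised modelling, with every shot recorded on the same receivers) and a residual vector, so that a shot's gradient is the operator transpose applied to its residual. All names and sizes are illustrative.

```python
import itertools
import random

def true_gradient(operators, residuals):
    """Conventional gradient: sum over shots of G_i^T r_i."""
    n_model = len(operators[0][0])
    grad = [0.0] * n_model
    for G, r in zip(operators, residuals):
        for row, r_d in zip(G, r):
            for k in range(n_model):
                grad[k] += row[k] * r_d
    return grad

def encoded_gradient(operators, residuals, signs):
    """Gradient from ONE encoded super-shot: (sum a_i G_i)^T (sum a_j r_j)."""
    n_data = len(residuals[0])
    n_model = len(operators[0][0])
    G_super = [[sum(a * G[d][k] for a, G in zip(signs, operators))
                for k in range(n_model)] for d in range(n_data)]
    r_super = [sum(a * r[d] for a, r in zip(signs, residuals))
               for d in range(n_data)]
    return [sum(G_super[d][k] * r_super[d] for d in range(n_data))
            for k in range(n_model)]

# Toy setup (all sizes hypothetical): 4 shots, 3 receivers, 5 model cells.
rng = random.Random(0)
n_shots, n_data, n_model = 4, 3, 5
operators = [[[rng.gauss(0, 1) for _ in range(n_model)] for _ in range(n_data)]
             for _ in range(n_shots)]
residuals = [[rng.gauss(0, 1) for _ in range(n_data)] for _ in range(n_shots)]

g_true = true_gradient(operators, residuals)

# A single random encoding gives the true gradient plus cross-talk.
one_signs = [rng.choice((-1, 1)) for _ in range(n_shots)]
one_draw = encoded_gradient(operators, residuals, one_signs)

# Averaging over every possible sign pattern removes the cross-talk,
# because each product a_i * a_j (i != j) is +1 and -1 equally often.
all_patterns = list(itertools.product((-1, 1), repeat=n_shots))
g_mean = [0.0] * n_model
for signs in all_patterns:
    g = encoded_gradient(operators, residuals, signs)
    g_mean = [m + gi / len(all_patterns) for m, gi in zip(g_mean, g)]

crosstalk_one = max(abs(a - b) for a, b in zip(one_draw, g_true))
crosstalk_mean = max(abs(a - b) for a, b in zip(g_mean, g_true))
print(f"cross-talk, single encoding:         {crosstalk_one:.3e}")
print(f"cross-talk, mean over all encodings: {crosstalk_mean:.3e}")
```

In a real inversion nobody enumerates all sign patterns; a fresh random encoding is drawn at each iteration and the averaging happens implicitly as the model updates accumulate.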
2. The widget — cost bars
Three horizontal bars show total-simulations-to-convergence for the three strategies:
- Naive FWI: one sim per shot per iteration. For $N = 1000$ shots and 50 iterations: 50 000 sims.
- Encoded FWI: one sim total per iteration, but 3× more iterations due to cross-talk: 150 sims total, 333× cheaper than naive for $N = 1000$.
- Mini-batch FWI: $B$ random shots per iteration, 1.5× more iterations. For $B = 100$: 7500 sims, 6.7× cheaper than naive.
The speedup from encoded FWI grows linearly with $N$ (under these assumptions it is $N/3$): at $N = 10\,000$ it is about 3300× faster. Production FWI over 10 000 shots that would take years naively finishes in days with encoding.
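The bar lengths follow from a three-line cost model. The sketch below assumes the numbers quoted for the widget: 50 baseline iterations, a 3× iteration penalty for encoding, a 1.5× penalty for mini-batches, and a batch of 100 shots (the batch size is inferred from the 7500-sim figure). The function name and defaults are illustrative.

```python
def sims_to_converge(n_shots, base_iters=50, batch=100):
    """Total simulations to convergence under the widget's simple model."""
    return {
        "naive": n_shots * base_iters,          # every shot, every iteration
        "encoded": 1 * (3 * base_iters),        # one super-shot, 3x the iterations
        "minibatch": batch * base_iters * 3 // 2,  # `batch` shots, 1.5x the iterations
    }

for n_shots in (1000, 10000):
    cost = sims_to_converge(n_shots)
    speedup = cost["naive"] / cost["encoded"]
    print(n_shots, cost, f"encoded speedup = {speedup:.0f}x")
```

Because the encoded cost does not depend on the number of shots, the ratio to naive FWI is simply the shot count divided by the iteration penalty.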
3. When encoded FWI breaks
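- Irregular acquisition geometries. The encoded sum assumes every shot is recorded by the same receivers (a fixed spread), so that the super-data is a valid response to the super-source. With missing or unevenly distributed sources, or receivers that differ from shot to shot, that assumption fails and the cross-talk stops averaging out.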
- Strong amplitude variations between shots. If one shot is 10× stronger than another, the random-sign sum is dominated by the strong shot and you effectively only invert one shot’s gradient.
- Locally correlated noise. Noise that is coherent across shots (coherent swell, electrical hum) does not average out under encoding — it stacks additively.
- Salt imaging. At sharp reflectors, the cross-talk can add coherent artefacts that never fully wash out. Hybrid strategies (encoded outer, naive inner) are used.
4. Memory — storing the forward wavefield
The adjoint-state gradient requires the forward source wavefield at every grid point and every time step. In single precision that is 4 bytes per grid point per time step, which for a 3D volume comes to ~800 GB per shot (about 2 × 10^11 stored samples). Three standard mitigations:
- Checkpointing: save the wavefield at every $k$-th time step only. To compute the gradient at the skipped time steps, re-propagate forward from the nearest checkpoint. Memory drops by roughly a factor of $k$ at the cost of extra forward propagation. Griewank's binomial checkpointing gives the optimal schedule.
- Random boundaries: instead of storing the wavefield that absorbing boundaries would otherwise destroy, randomise the velocity in the boundary layer so outgoing waves return scrambled rather than as coherent reflections. No energy is absorbed, so the propagation is time-reversible: running the wave equation backward from the final time steps re-derives the forward wavefield in the interior. Storage drops to almost nothing (only the last time slices) at the cost of re-propagation.
- Wavefield reconstruction: solve the wave equation backward in time from the last saved slab (usually just the boundary). Cheaper than re-forward, and accurate enough for L-BFGS-quality gradients.
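A minimal sketch of the first mitigation, uniform checkpointing, is shown below. It assumes a toy 1D leapfrog wave stepper in place of a real 3D solver and a fixed checkpoint interval rather than Griewank's binomial schedule; the function names and the constant `c2` are hypothetical.

```python
def step(state, c2=0.25):
    """One leapfrog step of a toy 1D wave equation with fixed ends.

    state is (u_prev, u_curr); c2 = (c*dt/dx)**2 is a hypothetical constant.
    """
    u_prev, u_curr = state
    u_next = [0.0] * len(u_curr)
    for i in range(1, len(u_curr) - 1):
        lap = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * lap
    return (u_curr, u_next)

def forward_with_checkpoints(state0, n_steps, k):
    """Run forward, keeping only every k-th state (plus the initial one)."""
    checkpoints = {0: state0}
    state = state0
    for n in range(1, n_steps + 1):
        state = step(state)
        if n % k == 0:
            checkpoints[n] = state
    return checkpoints

def reverse_sweep(checkpoints, n_steps, k):
    """Yield (n, state_n) for n = n_steps, ..., 0, as an adjoint loop needs.

    Each segment between checkpoints is recomputed once and held briefly.
    """
    n = n_steps
    while n >= 0:
        start = (n // k) * k                 # nearest checkpoint at or before n
        segment = [checkpoints[start]]
        for _ in range(n - start):
            segment.append(step(segment[-1]))
        for j in range(n, start - 1, -1):
            yield j, segment[j - start]
        n = start - 1

# Toy run: 21 grid points, 40 time steps, a checkpoint every 8 steps.
n_grid, n_steps, k = 21, 40, 8
pulse = [0.0] * n_grid
pulse[n_grid // 2] = 1.0
state0 = (list(pulse), list(pulse))

# Reference: store every state, which is what checkpointing avoids.
full = [state0]
for _ in range(n_steps):
    full.append(step(full[-1]))

checkpoints = forward_with_checkpoints(state0, n_steps, k)
recovered = dict(reverse_sweep(checkpoints, n_steps, k))
print(f"stored states: {len(checkpoints)} instead of {len(full)}")
```

In this simple scheme the reverse sweep also holds one recomputed segment of up to k states at a time, so peak memory is the checkpoints plus one segment, and the recomputation adds up to roughly one extra forward run.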
5. Parallelism
- Shot-parallel: each shot's simulation is independent; distribute across GPU nodes. Embarrassingly parallel up to $N$ workers (one per shot).
- Domain-decomposed: split the model grid across GPUs within a node; each GPU handles its slab. Adds inter-GPU communication at slab boundaries but necessary for very large models.
- Frequency-parallel: multi-scale runs can be pipelined across independent GPU pools, one frequency per pool, with results handed off when ready.
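The shot-parallel pattern is a map over shots followed by a sum. The sketch below uses Python threads purely to show the structure; a production code would distribute shots across GPU nodes (for example with MPI), and `shot_gradient` here is a hypothetical placeholder for a forward plus adjoint simulation.

```python
from concurrent.futures import ThreadPoolExecutor

def shot_gradient(shot_id, n_model=4):
    """Hypothetical stand-in for one shot's forward + adjoint simulation."""
    return [float((shot_id + 1) * (k + 1)) for k in range(n_model)]

def stacked_gradient(shot_ids, max_workers=4):
    """Compute per-shot gradients concurrently, then sum them cell by cell."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shot = list(pool.map(shot_gradient, shot_ids))  # order is preserved
    return [sum(column) for column in zip(*per_shot)]

total = stacked_gradient(range(8))
print(total)
```

The only communication is the final reduction, which is why this layer of parallelism scales so easily compared with domain decomposition.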
6. Hessian, or: why L-BFGS is the default
Pure steepest-descent FWI converges slowly because the misfit $J(m)$ has very different curvature along different model directions. Newton's method fixes this but requires the full Hessian $H = \partial^2 J / \partial m^2$, which costs $O(M)$ simulations for $M$ model parameters: infeasible for a 3D model grid.
L-BFGS (limited-memory BFGS) approximates $H^{-1}$ from the last $\ell$ pairs of gradient and model differences, giving Newton-like convergence at storage cost $O(\ell M)$. Combined with an approximate diagonal pre-conditioner (Gauss-Newton on the diagonal, or the Hessian of the acquisition illumination), L-BFGS is the default FWI solver in every production package. Convergence is typically 3–5× faster than steepest descent per outer iteration, paying for itself immediately.
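The inverse-Hessian approximation is never formed explicitly; it is applied to the gradient with the standard two-loop recursion. The sketch below is a plain-Python illustration on short lists, assuming the common scaled-identity initial guess; it omits the line search and the diagonal pre-conditioner, and the function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_direction(grad, pairs):
    """Apply the L-BFGS inverse-Hessian approximation to `grad`.

    pairs: list of (s, y), oldest first, where s is a model difference
    and y the matching gradient difference. The search direction is the
    negative of the returned vector.
    """
    q = list(grad)
    history = []
    for s, y in reversed(pairs):             # first loop: newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        history.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]

    if pairs:                                 # scaled-identity initial guess
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0                           # no history: plain gradient
    r = [gamma * qi for qi in q]

    for (s, y), (alpha, rho) in zip(pairs, reversed(history)):  # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return r

# Check on a toy quadratic with Hessian diag(2, 4): y = H s.
s = [1.0, 2.0]
y = [2.0, 8.0]
print(lbfgs_direction(y, [(s, y)]))   # the secant condition maps y back to s
print(lbfgs_direction([3.0, -1.0], []))
```

Storage is just the retained pairs, two model-sized vectors each, which is where the linear-in-history memory cost comes from.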
7. A realistic computational budget
For a 3D deep-water sub-salt project, 10 km × 15 km × 8 km, 8 Hz, 5000 shots:
- Naive: 2 sims × 5000 shots × 200 iters × 6 bands = 12 million simulations ⇒ ~2 years on a 100-GPU cluster.
- Encoded: 2 sims × 1 super-shot × 600 iters × 6 bands = 7200 simulations ⇒ ~3 days on same cluster.
- Mini-batch ($B = 500$): 2 sims × 500 shots × 300 iters × 6 bands = 1.8 M sims ⇒ ~100 days.
Only encoded FWI fits in a reasonable project budget. Every production FWI code supports at least source encoding; most support all three.
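The simulation counts in this budget are straightforward products and can be reproduced in a few lines. The sketch below uses only the figures quoted above (2 simulations per shot, 6 frequency bands, and the per-strategy shot and iteration counts); it does not model wall-clock time, which depends on per-simulation cost and parallel efficiency not given here.

```python
def total_sims(shots_per_iter, iters, bands=6, sims_per_shot=2):
    """Forward + adjoint simulations summed over iterations and bands."""
    return sims_per_shot * shots_per_iter * iters * bands

budget = {
    "naive": total_sims(5000, 200),
    "encoded": total_sims(1, 600),
    "minibatch": total_sims(500, 300),
}

for name, sims in budget.items():
    ratio = budget["naive"] / sims
    print(f"{name:>9}: {sims:>10,d} sims  ({ratio:,.1f}x fewer than naive)")
```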
Source encoding buys 100–1000× fewer simulations per iteration in exchange for 3× more iterations — a net 30–300× speedup that is the only thing that makes 3D production FWI affordable.
Where this goes next
§6.4 moves from acoustic to elastic/anisotropic physics — what happens when the acoustic wave equation is wrong and converted waves or anisotropy-induced travel-time errors dominate the residual.