ML for FWI gradient acceleration

Part 9 — Machine Learning in Processing

Learning objectives

Identify the three points where ML can accelerate FWI: initialisation, regularisation, surrogate forward
Explain why each kind of ML acceleration changes convergence but preserves physics
Quantify the speed-up achievable with ML-initialised and ML-regularised FWI
Recognise the production caveats: training-distribution match, transferability, QC

Part 9 closes with the current ML frontier in production seismic processing: accelerating FWI. Unlike the previous four sections, where ML was the main algorithm, FWI acceleration uses ML alongside physics. The forward and adjoint wave-equation simulations remain the ground truth; ML reduces either the number of times you have to run them or the number of iterations the inversion needs.

1. Three entry points for ML

Learned initial models. A CNN maps tomographic velocity models (or legacy seismic volumes) to FWI-ready starting models. Training pairs: (tomography output, converged FWI output) from historical projects. The CNN "jumps" the model part-way toward the converged state, reducing the initial misfit.
Learned regularisers. Instead of a hand-tuned TV or smoothness penalty, train a network to recognise "geologically plausible" velocity models and apply it as a soft constraint in the FWI loss. Tends to produce cleaner results with fewer iterations.
Surrogate forward modelling. Train a network to mimic the forward simulator. Inference is 10–100× faster than a real finite-difference simulation, but accuracy degrades for model perturbations far from the training distribution. Used for rapid Monte Carlo uncertainty estimation; not (yet) a full substitute for real FWI.
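
To locate the three entry points, it can help to write them against a generic regularised FWI objective. The notation below is an assumption made for illustration, since this section defines no symbols of its own: m is the velocity model, F the wave-equation forward operator, d_obs the recorded data, G_φ an initialiser network applied to a tomography model m_tomo, R_θ a learned regulariser with weight λ, and F̃_ψ a surrogate network.

```latex
\begin{aligned}
m_0 &= G_\phi(m_{\mathrm{tomo}}) && \text{learned initial model} \\
J(m) &= \tfrac{1}{2}\,\lVert F(m) - d_{\mathrm{obs}} \rVert_2^2 + \lambda\, R_\theta(m) && \text{learned regulariser } R_\theta \\
\tilde{F}_\psi(m) &\approx F(m) && \text{surrogate forward model}
\end{aligned}
```

In this reading, the first two entry points change where the iteration starts and which models it prefers, while the data-misfit term is still evaluated with the real F. Only the third touches the simulator itself.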

2. Convergence with and without ML

Three convergence curves on log-scale misfit vs iteration:

Pure FWI (red). Starting misfit 1.0 (normalised), exponential decay rate 3 % per iteration. Needs ~130 iterations to reach 2 % misfit.
ML-init FWI (yellow). Starting misfit 0.45 (ML starting model is partway to truth). Same decay rate. Reaches 2 % in roughly 100 iterations, about 1.3× fewer than pure FWI.
ML-init + ML-reg (green). Same low start + steeper decay (learned regulariser keeps each iteration's update pointed in a plausible direction). Fastest.

Sliders control ML starting-model quality (0–1) and ML regulariser strength (0–1). The info strip reports iterations-to-target for each and the total speedup factor.
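
The iterations-to-target figures follow from the exponential-decay model. The sketch below is a minimal illustration, not the demo's actual code. It assumes "3 % per iteration" means the misfit is multiplied by 0.97 at each iteration, and the steeper 5 % rate used for the regularised curve is a made-up value chosen only to show the effect. The function and variable names are hypothetical.

```python
import math


def iterations_to_target(start_misfit, target_misfit, decay_rate):
    """Smallest n with start_misfit * (1 - decay_rate)**n <= target_misfit."""
    if start_misfit <= target_misfit:
        return 0
    n = math.log(target_misfit / start_misfit) / math.log(1.0 - decay_rate)
    return math.ceil(n)


TARGET = 0.02  # 2 % normalised misfit, as in the curves above

pure = iterations_to_target(1.0, TARGET, 0.03)
ml_init = iterations_to_target(0.45, TARGET, 0.03)
# Assumed steeper decay for the learned regulariser; not a value from the text.
ml_init_reg = iterations_to_target(0.45, TARGET, 0.05)

speedup_init = pure / ml_init
speedup_init_reg = pure / ml_init_reg
```

Under these assumptions the three counts are 129, 103 and 61 iterations. The better starting model alone buys about 1.25× in this toy model, and most of the remaining gain comes from the steeper decay.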

3. What ML cannot do for FWI

Replace adjoint gradients. The physics-based adjoint gives the exact gradient of the misfit with respect to the model. An ML surrogate gradient is approximate; using it as the only gradient produces biased FWI. Production workflows use real adjoints for the final iterations even if ML accelerates early iterations.
Invent geology. A CNN trained on North Sea datasets cannot produce a correct starting model for a Gulf of Mexico sub-salt project. Domain shift is a hard limit.
Remove cycle-skipping risk. A bad ML initial model can still be cycle-skipped relative to the data. Multi-scale FWI (§6.2) is still required.
Guarantee amplitude fidelity. An ML regulariser may bias amplitudes toward training-distribution statistics; QI-grade FWI still needs post-hoc amplitude calibration.
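
The first point, that an approximate gradient biases the result unless real adjoints finish the job, can be seen in a one-parameter toy problem. This is a sketch under strong assumptions: the "forward operator" is a scalar multiplication, and the ML gradient error is modelled as a constant offset, which is far simpler than any real surrogate. All names and values are hypothetical.

```python
def run_descent(a, d, m0, n_iters, step, n_surrogate, bias):
    """Gradient descent on J(m) = 0.5 * (a*m - d)**2.

    The first n_surrogate iterations use a biased gradient, standing in
    for an approximate ML gradient; the rest use the exact gradient.
    """
    m = m0
    for it in range(n_iters):
        grad = a * (a * m - d)  # exact gradient of the misfit
        if it < n_surrogate:
            grad += bias  # assumed constant error of the surrogate gradient
        m -= step * grad
    return m


a, m_true = 2.0, 1.5
d = a * m_true  # noise-free "observed" data

# Surrogate gradient for every iteration: converges to the wrong model.
m_surrogate_only = run_descent(a, d, 0.0, 50, 0.2, n_surrogate=50, bias=0.2)
# Surrogate gradient early, exact gradient for the final iterations.
m_hybrid = run_descent(a, d, 0.0, 50, 0.2, n_surrogate=25, bias=0.2)
```

Here the surrogate-only run settles at 1.45 instead of the true 1.5, while the hybrid run recovers 1.5 once the exact gradient takes over.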

4. Production deployment patterns

Initialiser only: ML generates the starting model; pure physics FWI from then on. Safe and widely deployed.
Initialiser + learned prior: ML starting model + TV-style learned regulariser for the first half of the iterations, then pure physics regularisation for the final iterations (to avoid amplitude bias). More aggressive; common in research projects.
Full ML FWI: surrogate forward + learned gradient + learned prior. Highly efficient, but requires strong QC and is limited to training-similar projects. Emerging in time-lapse monitoring where baseline FWI has already been done carefully.
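
The second pattern amounts to a schedule on the learned-prior weight. The sketch below assumes a hard switch at the halfway point. A real workflow might taper the weight instead, and the function name and parameters are hypothetical.

```python
def learned_prior_weight(iteration, n_iterations, max_weight=1.0, switch_fraction=0.5):
    """Weight on the learned regulariser at a given 0-based iteration.

    Returns max_weight before the switch point and 0.0 afterwards, so the
    final iterations run without the learned prior.
    """
    switch_at = int(switch_fraction * n_iterations)
    return max_weight if iteration < switch_at else 0.0


# Ten-iteration example: learned prior on for iterations 0-4, off for 5-9.
weights = [learned_prior_weight(i, 10) for i in range(10)]
```

Setting switch_fraction to 0.0 in this sketch reduces it to the initialiser-only pattern, since the learned prior is then never applied.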

5. Quantitative expectations

Typical production numbers (from field reports and vendor benchmarks, circa 2024–2025):

ML starting model alone: 1.5–3× fewer FWI iterations to converge.
ML starting + ML regulariser: 2–5× fewer iterations.
Surrogate forward model (research): 10–100× throughput on Monte Carlo estimation tasks; not currently trusted for final FWI models.
4D monitoring with ML-init FWI: convergence in days instead of weeks, enabling more frequent re-inversions.

6. QC for ML-accelerated FWI

Compare final model to pure-physics FWI. On a subset of projects, run both pure and ML-accelerated FWI to verify they converge to the same model. If they disagree, diagnose whether the ML is biasing the solution.
Well ties on final model. Compare ML-FWI output to wells; require the same tie quality as pure FWI.
Synthetic-vs-recorded forward tests (§6.5). The pure-physics criterion: if the data forward-modelled from the final model match the observed data, the ML acceleration has not harmed the data fit.
Cross-survey generalisation test. Re-run the ML-FWI on a survey not seen in training; verify no systematic bias appears.
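
The first check, comparing the ML-accelerated result against pure-physics FWI, needs some measure of model agreement. A relative RMS difference is one simple option. The sketch below is illustrative only: the velocities are made-up numbers, the 1 % tolerance is a hypothetical threshold rather than an industry standard, and real models would be 3D grids rather than short lists.

```python
import math


def relative_rms_difference(model, reference):
    """RMS of (model - reference) divided by RMS of reference."""
    if len(model) != len(reference):
        raise ValueError("models must be sampled on the same grid")
    diff_sq = sum((a - b) ** 2 for a, b in zip(model, reference))
    ref_sq = sum(b ** 2 for b in reference)
    return math.sqrt(diff_sq / ref_sq)


# Made-up velocities in m/s at four grid points.
pure_fwi = [1500.0, 2000.0, 2500.0, 3000.0]
ml_fwi = [1500.0, 2010.0, 2490.0, 3000.0]

TOLERANCE = 0.01  # hypothetical acceptance threshold (1 %)
mismatch = relative_rms_difference(ml_fwi, pure_fwi)
passes_qc = mismatch <= TOLERANCE
```

A single global number like this can hide a localised bias, so in practice it would sit alongside difference maps and the well-tie and data-fit checks listed above.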

The one sentence to remember

ML accelerates FWI by providing learned starting models, learned regularisers, and (experimentally) surrogate forward operators — producing 2–5× speed-ups without replacing the physics-based adjoint gradients that give FWI its mathematical guarantees.

Part 9 closes here

You have the ML-in-seismic-processing toolkit: where ML fits (§9.1), CNN denoising (§9.2), trace interpolation (§9.3), first-break picking (§9.4), and FWI acceleration (§9.5). The theme across all five: ML is most valuable as a complement to physics, not a replacement — classical methods provide the guarantees, ML provides the speed and accuracy at pattern-recognition tasks. Part 10 brings all nine previous parts together in a series of end-to-end processing capstones that walk through complete projects from raw data to final deliverable.
