Learned propagators for fast forward modelling

Part 8 — Operator learning for seismology

Learning objectives

Recognise the time-stepping operator as the natural target for FNO in seismology
Train a single-step propagator on the 1-D wave equation
Roll the learned propagator out for many timesteps and observe error growth
Distinguish stable from drifting learned propagators
Map the 1-D demo onto 2D/3D production wavefield surrogates

The wave-propagation problem in seismology is a TIME-STEPPING problem. FDTD takes a wavefield at time $t$ and applies a fixed update stencil to get the wavefield at $t + \Delta t$ . Run this update for $N$ timesteps to get a full simulation of duration $N \Delta t$ . For 2D production-grade FDTD on Marmousi-class velocity models with thousands of shots, this is the dominant compute cost in any seismic-imaging or FWI pipeline.

Operator learning offers an alternative: TRAIN a network $\mathcal{P}_{\mathrm{NN}}: u^n \mapsto u^{n+1}$ that learns the time-stepping operator from FDTD-generated examples. Inference is one forward pass per timestep — typically $10\text{–}1000\times$ faster than FDTD per step on GPU. The amortised cost is dominated by training; once trained, the propagator can be applied to any new initial condition or velocity model within the training distribution at near-zero per-step cost.

Architecture choice: FNO is natural

For wavefield time-stepping, FNO (§8.3) is the natural architecture:

Wavefields live on REGULAR GRIDS — exactly FNO's natural domain.
The wave-propagation operator is approximately TRANSLATION-INVARIANT (in homogeneous medium it is exactly translation-invariant; in heterogeneous medium the local stencil is the same shape but with locally-varying coefficients).
Resolution-invariance means train at one grid spacing, evaluate at another — useful when survey grids change between projects.
The time-stepping operator naturally has a SECOND-ORDER recursion: $u^{n+1}$ depends on $u^n$ AND $u^{n-1}$ (because the wave equation is second-order in time). FNO's multi-channel architecture handles this trivially.

For DeepONet, time-stepping is awkward — the trunk would need to take $(x, n)$ inputs and the basis dimension would need to be huge to span all time evolution. FNO is structurally a better fit.

The 1-D wave-equation case (analytic ground truth)

For the homogeneous-medium 1-D wave equation $\partial^2 u / \partial t^2 = c^2 \partial^2 u / \partial x^2$ with Dirichlet BCs on $[0, 1]$ , the EXACT propagator is diagonal in the sine basis:

\hat{u}_k(t + \Delta t) = 2 \cos(c k \pi \Delta t) \cdot \hat{u}_k(t) - \hat{u}_k(t - \Delta t) .

So a 1-layer FNO with two-channel input $(u^n, u^{n-1})$ and learnable coefficients $\alpha_k, \beta_k$ per Fourier mode should converge to

\alpha_k \to 2 \cos(c k \pi \Delta t), \quad \beta_k \to -1 .

This is the cleanest possible test of "can the propagator architecture express what the physics requires". If the FNO can match the analytic propagator, we have a proper time-stepping surrogate; if not, we have a problem.

Try it: train + roll out 200 steps

The widget:

Trains the 32 propagator coefficients ( $\alpha_k, \beta_k$ for $k = 1..16$ ) on random 5-mode initial conditions × ~25 timesteps each, with the analytic snapshot at each time as the training target. ~1-2 s on a modern laptop.
Rolls out the learned propagator from a fixed test IC for 200 timesteps, comparing against the analytic solution at each step.
Reports the rollout RMS error trajectory — the diagnostic for stability under composition.

Four panels:

Initial wavefield $u(x, 0)$ .
Wavefield at $t = 200 \Delta t$ : exact (cyan dashed) vs learned-propagator rollout (orange).
Learned coefficients $\alpha_k$ (orange) and $\beta_k$ (purple) compared with exact values (cyan dots overlay).
Per-step rollout RMS error on log-y. A horizontal line means STABLE; a rising line means DRIFT.

Expected behaviour: because the 1-D wave operator is exactly diagonal in sine basis, the FNO converges to the exact coefficients to machine precision. The rollout error stays at ~1e-6 (float32 noise) for the entire 200 steps — completely stable, no drift. This is the IDEAL CASE for learned propagators.

Why drift happens — and why this case avoids it

The general worry with iterative neural propagators is that small per-step errors COMPOUND. If the learned operator $\mathcal{P}_{\mathrm{NN}}$ has spectral radius slightly greater than the true operator $\mathcal{P}$ , then after $N$ steps the error grows like

\|\mathcal{P}_{\mathrm{NN}}^N u_0 - \mathcal{P}^N u_0\| \sim N \cdot \|\mathcal{P}_{\mathrm{NN}} - \mathcal{P}\| \cdot \|u_0\| \quad \text{(linear regime)}

or worse exponentially in $N$ if the spectral radius of $\mathcal{P}_{\mathrm{NN}}$ exceeds 1 even slightly. Production learned propagators (Lehmann et al 2024 F-FNO for 3D elastic waves; Geneva-Zabaras 2022 implicit neural operators) deal with this via three techniques:

Multi-step rollout training. Train on 5-50-step rollouts rather than single steps, so the loss penalises compounding error directly. Costly per epoch but dramatically improves long-horizon stability.
Spectral-radius regularisation. Add a penalty $\lambda , \max_k |\hat{K}_k|$ to keep the learned operator non-amplifying. Soft analogue of the CFL stability condition.
Implicit time stepping. Rather than learning $u^{n+1} = \mathcal{P}$ , learn an INCREMENT $u^{n+1} = u^n + \Delta t \cdot \mathcal{F}$ {\mathrm{NN}}(u^n) $u^{n + 1} = u^{n} + Δ t \cdot F_{NN} (u^{n})$ where $\mathcal{F}$ is small. Gradient with respect to $u^n$ is dominated by the identity, so even bad $\mathcal{F}$ {\mathrm{NN}} $F_{NN}$ gives stable rollouts. Equivalent to neural-ODE perspectives.

Our 1-D demo dodges drift because the analytic operator IS exactly representable in the chosen architecture. For 2D heterogeneous-medium operators with multi-layer FNO, drift is real and these techniques matter.

Mapping to 2D/3D production

Industry deployment of learned propagators (Caltech, NVIDIA, Saudi Aramco) uses 2D/3D FNOs with multiple layers (~4-8) and channels (~32-128). Training data: thousands of FDTD simulations on diverse velocity models. Compute budget: 1-7 days on multi-GPU rigs. Once trained:

2D wavefield evaluation: $\sim 10$ ms per timestep on a single GPU (vs FDTD's $\sim 100$ ms-1 s)
Backward AD through propagator: cheap, useful for FWI where wave-equation propagator is the inner loop
Out-of-distribution warning: if the test velocity model has structure unlike training data, prediction fails silently. Always cross-check with FDTD on at least one test case.

Lehmann et al 2024 demonstrated 3D elastic wave propagation with the F-FNO (Factorised FNO) running ~50× faster than spectral-element FDTD with $<5%$ error in body-wave amplitudes. This level of speedup makes Bayesian FWI workflows tractable that were previously cost-prohibitive.

What §8.5 will do

§8.5 wraps the operator-learning machinery into an INTERACTIVE PARAMETER EXPLORER. Users adjust velocity-model parameters, source positions, or boundary geometry on sliders and watch the operator network produce wavefields/travel-times in real time — at 60 fps. The amortisation argument from §8.1 made tactile.

References

Lehmann, F., Gatti, F., Bertin, M., Clouteau, D. (2024). 3D elastic wave propagation with a Factorized Fourier Neural Operator (F-FNO). Comput. Methods Appl. Mech. Eng. 420, 116718. Production 3D elastic-wave learned propagator.
Geneva, N., Zabaras, N. (2022). Transformers for modeling physical systems. Neural Networks 146, 272–289. Implicit time-stepping with neural operators.
Alkhalifah, T., Song, C., bin Waheed, U., Hao, Q. (2021). Wavefield solutions from machine learned functions constrained by the Helmholtz equation. Artificial Intelligence in Geosciences 2, 11–19. PINN-based Helmholtz wavefield modelling that informed learned-propagator practice.
Li, Z., Liu-Schiaffini, M., Kovachki, N., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., Anandkumar, A. (2022). Learning chaotic dynamics in dissipative systems. NeurIPS 2022. Markovian neural operators for stable long-horizon rollouts.