When operator learning beats per-instance training
Learning objectives
- Compute the crossover N* from measured costs in the actual browser
- Recognise the regimes where operator learning amortises favourably
- Identify when classical solvers and per-instance PINNs still win
- Anticipate out-of-distribution failures and the data-coverage requirement
- Wrap up Part 8 and look ahead to Part 9 (UQ + hybrid PINN-FWI)
Part 8 has presented operator learning as a paradigm-shifting toolkit. We now confront the honest cost-benefit question: WHEN does it pay off, and when do classical methods (FDTD, FSM, per-instance PINNs) remain the right choice? The answer is a single inequality.
The crossover formula
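Let:
- $C_{\text{pre}}$: cost of pretraining the operator network ONCE.
- $C_{\text{inst}}$: cost of solving ONE problem instance from scratch (e.g., training a per-instance PINN, or running FDTD).
- $C_{\text{inf}}$: cost of ONE operator inference (a single forward pass through the trained network).
- $N$: number of distinct problem instances we want to solve.
Total cost for each strategy:
$$\text{Cost}_{\text{operator}}(N) = C_{\text{pre}} + N\,C_{\text{inf}}, \qquad \text{Cost}_{\text{per-instance}}(N) = N\,C_{\text{inst}}.$$
Operator wins when $C_{\text{pre}} + N\,C_{\text{inf}} < N\,C_{\text{inst}}$, which gives
$$N > N^* = \frac{C_{\text{pre}}}{C_{\text{inst}} - C_{\text{inf}}}, \qquad \text{provided } C_{\text{inst}} > C_{\text{inf}}.$$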
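The crossover condition (pretraining cost plus N inference costs is less than N per-instance costs) can be sketched as a small calculator. This is a minimal illustration only: the names `crossover_n` and `total_costs` and the example costs (10 s pretraining, 0.5 s per-instance solve, 1 ms inference) are hypothetical placeholders, not measurements from this textbook.

```python
import math


def crossover_n(c_pre: float, c_inst: float, c_inf: float) -> float:
    """Break-even instance count N* = c_pre / (c_inst - c_inf).

    Returns math.inf when one inference is no cheaper than a per-instance
    solve, because the operator can then never amortise its pretraining.
    """
    if c_pre < 0 or c_inst < 0 or c_inf < 0:
        raise ValueError("costs must be non-negative")
    saving_per_instance = c_inst - c_inf
    if saving_per_instance <= 0:
        return math.inf
    return c_pre / saving_per_instance


def total_costs(n: int, c_pre: float, c_inst: float, c_inf: float) -> tuple[float, float]:
    """Return (operator_cost, per_instance_cost) for n problem instances."""
    return c_pre + n * c_inf, n * c_inst


# Hypothetical costs in seconds, chosen only to exercise the formula.
n_star = crossover_n(c_pre=10.0, c_inst=0.5, c_inf=0.001)
first_winning_n = math.floor(n_star) + 1  # smallest integer N with N > N*
```

With these placeholder numbers the operator strategy is cheaper from `first_winning_n` instances onward; the guard for `c_inst <= c_inf` covers the case where no crossover exists at all.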
For seismic operators is typically s and is s, so the speedup ratio is enormous. The crossover depends primarily on how cheap the problem already is per-instance vs how expensive pretraining is. Common ranges:
- 1-D toy problems (this textbook): s, s. Crossover at .
- 2-D acoustic FWI: s (PINN per source), hr. Crossover at sources.
- 3-D elastic wave propagation: hr (FDTD shot), week (F-FNO training, Lehmann et al 2024). Crossover at shots.
For full survey-scale FWI projects with thousands of shots and many velocity-model iterations, all three cases sit deep in the operator-wins regime.
Try it: measure your own crossover
The widget runs in three timed phases: (1) pretrain a DeepONet on the §8.5-style 7-parameter heat-equation family, (2) measure inference time over 100 forward passes for tight statistics, (3) fit a small per-instance MLP to ONE specific instance via supervised regression. After all three measurements, the cost-vs-N chart shows where operator learning beats per-instance training for THIS browser, on THIS machine, with THESE problem sizes. The N slider lets you place yourself at any working point and read off the speedup factor.
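The same three-phase measurement can be reproduced outside the browser. The sketch below is a Python stand-in, not the widget's code: `time_phase` is a hypothetical helper, and the three workload functions are dummy loops that should be replaced with real pretraining, inference, and per-instance fitting. Using the median of the 100 inference timings is an assumption made here to damp timer jitter.

```python
import statistics
import time
from typing import Callable


def time_phase(fn: Callable[[], object], repeats: int = 1) -> list[float]:
    """Run fn `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-ins for the three phases; replace with real work.
def pretrain_operator() -> None:
    sum(i * i for i in range(200_000))


def operator_inference() -> None:
    sum(i * i for i in range(200))


def fit_single_instance() -> None:
    sum(i * i for i in range(20_000))


c_pre = time_phase(pretrain_operator)[0]
inference_samples = time_phase(operator_inference, repeats=100)
c_inf = statistics.median(inference_samples)  # median is robust to timer jitter
c_inst = time_phase(fit_single_instance)[0]

if c_inst > c_inf:
    n_star = c_pre / (c_inst - c_inf)
else:
    n_star = float("inf")  # inference is not cheaper, so there is no crossover
```

The resulting `n_star` is specific to the machine and workloads timed, which is the same caveat the widget makes about its own chart.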
Beyond N*: the qualitative arguments
Crossover is the quantitative argument. There are four qualitative arguments that reinforce it:
- Bayesian-friendly forward model. Sampling from a posterior via MCMC needs forward evaluations. With FDTD this is impractical at scale; with an operator surrogate, a few hours of MCMC suffices. This unlocks UNCERTAINTY QUANTIFICATION on FWI results — the central topic of Part 9.
- Differentiable end-to-end. Operator networks are differentiable with respect to their inputs (initial conditions, velocity model). For inverse problems formulated as gradient descent on the velocity model, the operator provides the required gradient with respect to the velocity model via auto-diff. FDTD requires a manual adjoint-state implementation per equation type.
- Real-time interactivity. §8.5's parameter sliders are the example: with FDTD, a designer waits seconds-to-minutes per parameter change; with an operator surrogate, design happens at 60 fps.
- GPU-efficient inference. Operator networks batch many forward passes into a single GPU call. A typical TensorRT or ONNX Runtime deployment of an FNO does 1000 forward passes in 100 ms, far faster than 1000 separate FDTD invocations would manage.
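The first argument in the list above, that MCMC cost is dominated by forward evaluations, can be made concrete with a toy sampler. The sketch below is an assumption-laden illustration: `surrogate_forward` is a hypothetical one-parameter linear model standing in for a trained operator, the prior is flat, and the observation, noise level, and step size are invented for the example. It counts forward calls so that the total sampling cost can be estimated as the call count times the per-call cost.

```python
import math
import random


def surrogate_forward(m: float) -> float:
    """Hypothetical cheap forward model standing in for a trained operator."""
    return 2.0 * m


def metropolis(observed, sigma, n_steps, step_size, m0, seed=0):
    """Random-walk Metropolis over one parameter with a flat prior.

    Returns (samples, n_forward_calls). Each proposal costs one forward call.
    """
    rng = random.Random(seed)
    n_forward_calls = 0

    def log_likelihood(m):
        nonlocal n_forward_calls
        n_forward_calls += 1
        residual = observed - surrogate_forward(m)
        return -0.5 * (residual / sigma) ** 2

    current = m0
    current_ll = log_likelihood(current)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_ll = log_likelihood(proposal)
        # Accept with probability min(1, exp(proposal_ll - current_ll)).
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples, n_forward_calls


samples, n_calls = metropolis(observed=3.0, sigma=0.2, n_steps=5000,
                              step_size=0.2, m0=0.0)
burned_in = samples[1000:]  # discard the initial transient
posterior_mean = sum(burned_in) / len(burned_in)
```

Every step costs exactly one forward evaluation, so the chain above uses `n_steps + 1` calls; multiplying that count by the measured cost of one FDTD run versus one operator inference gives the comparison the bullet describes.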
When operator learning loses
Three scenarios where classical solvers and per-instance PINNs still win:
- One-off problems with N = 1. If you have a single legacy survey to analyse and never need to re-do it, pretraining an operator network is wasted effort. Just run FDTD or train one PINN.
- Out-of-distribution problems. Operator networks trained on Marmousi-class velocity models will not generalise to volcanic basement structures or strong-anisotropy salt domes. The training distribution is the operating envelope; outside it, predictions silently fail. For exotic case studies, classical solvers handle out-of-distribution inputs trivially.
- Verification against classical-solver baselines. Even with a deployed operator surrogate, important production runs typically include at least one FDTD verification for trust calibration. The operator surrogate is the workhorse; FDTD is the safety net.
Hybrid architectures: best of both
Modern production seismic-imaging pipelines often use hybrid architectures that combine operator pretraining with PINN fine-tuning:
- Operator warm-start for PINN. Pretrain an operator surrogate on a velocity-model family. For a NEW velocity model, evaluate the surrogate to get an initial guess, then refine with a per-instance PINN initialised from that guess. The PINN can converge in roughly 10× fewer epochs because it starts near a good answer.
- Operator + classical FWI gradient correction. Use the operator for the wave-equation forward solve in FWI; combine its gradient with a small classical FDTD correction to reduce out-of-distribution errors. Saves of compute.
- Operator + Bayesian UQ. Cheap operator forwards enable HMC/Stein VI over the posterior, with a final classical-FDTD posterior-mean simulation for trust verification.
These are the architectures Part 9 will build on. Operator surrogates are not a replacement for PINNs and FDTD; they are a NEW LAYER in the toolkit, slotted in where amortisation pays off, and bypassed where it does not.
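The warm-start idea from the first hybrid architecture can be illustrated with a deliberately small toy. This is not a PINN: the hypothetical `refine` function runs plain gradient descent on a separable quadratic misfit, and the "surrogate guess" is simply the target with an invented 1% error. The sketch only shows the mechanism, that a refinement started near the answer needs fewer iterations than one started from zero; the size of the saving here is specific to this toy.

```python
def refine(initial, target, curvatures, lr=1.0, tol=1e-8, max_iters=10_000):
    """Gradient descent on J(m) = 0.5 * sum_i k_i * (m_i - target_i)**2.

    Returns (refined_model, iterations_used). A toy stand-in for per-instance
    refinement; it is not a PINN.
    """
    m = list(initial)
    for iteration in range(max_iters):
        grad = [k * (mi - ti) for k, mi, ti in zip(curvatures, m, target)]
        grad_norm = sum(g * g for g in grad) ** 0.5
        if grad_norm < tol:
            return m, iteration
        m = [mi - lr * g for mi, g in zip(m, grad)]
    return m, max_iters


# Hypothetical 3-parameter "model" and curvatures, chosen for illustration.
target = [2.0, -1.0, 0.5]
curvatures = [1.0, 0.5, 0.25]

cold_start = [0.0, 0.0, 0.0]
surrogate_guess = [t * 1.01 for t in target]  # assumed 1% surrogate error

cold_model, cold_iters = refine(cold_start, target, curvatures)
warm_model, warm_iters = refine(surrogate_guess, target, curvatures)
```

Comparing `warm_iters` with `cold_iters` shows the reduction; both runs reach the same refined model, so the surrogate changes the cost of refinement rather than its result.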
Out-of-distribution detection: a critical gap
The single biggest engineering risk of operator-based seismic workflows is SILENT FAILURE on OOD inputs. A network trained on smooth gradient velocity models will produce confident-looking predictions for a velocity model containing a salt dome — and those predictions can be ARBITRARILY WRONG without any in-band warning. This is fundamentally different from FDTD failure modes, which are usually loud (numerical instability, NaN propagation, etc.).
Production OOD detection techniques:
- Likelihood under the training distribution. Compute the likelihood of each new input under a density model of the training distribution. If it falls below a threshold, flag the input and revert to FDTD.
- Model ensembles. Train several operator networks with different seeds; high disagreement on a new input indicates OOD.
- PDE-residual check. Apply the network to predict the traveltime field, then check whether that field satisfies the eikonal equation to within a residual tolerance. If not, fall back to FSM/FDTD.
- Posterior uncertainty (Bayesian Operators). Train operator networks with weight-space uncertainty (BNN, MC-Dropout). High predictive variance signals OOD inputs. Implementation cost is 1.5-2× that of normal training, but the method provides a direct uncertainty signal.
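The PDE-residual check in the list above can be sketched for a 1-D eikonal problem, where the traveltime T must satisfy |dT/dx| · v(x) = 1. The function names, the centred-difference stencil, the 5% tolerance, and the constant-velocity test case are all assumptions made for illustration; a production check would use the residual of whichever equation the surrogate was trained on.

```python
def eikonal_residual_1d(traveltime, velocity, dx):
    """Largest | |dT/dx| * v - 1 | over interior grid points.

    The slope is estimated with centred finite differences, so the two
    boundary points are skipped.
    """
    worst = 0.0
    for i in range(1, len(traveltime) - 1):
        slope = (traveltime[i + 1] - traveltime[i - 1]) / (2.0 * dx)
        worst = max(worst, abs(abs(slope) * velocity[i] - 1.0))
    return worst


def needs_fallback(traveltime, velocity, dx, tol=0.05):
    """True when the prediction violates the eikonal equation by more than tol."""
    return eikonal_residual_1d(traveltime, velocity, dx) > tol


# Hypothetical test case: constant velocity, source at x = 0.
n_points, dx, v_true = 50, 0.1, 2.0
velocity = [v_true] * n_points

# A consistent prediction: T = x / v.
good_prediction = [i * dx / v_true for i in range(n_points)]
# An inconsistent prediction, as if the network assumed v = 3 instead of 2.
bad_prediction = [i * dx / 3.0 for i in range(n_points)]

good_flag = needs_fallback(good_prediction, velocity, dx)
bad_flag = needs_fallback(bad_prediction, velocity, dx)
bad_residual = eikonal_residual_1d(bad_prediction, velocity, dx)
```

The check needs no ground-truth solution, only the predicted field and the input velocity model, which is what makes it usable as an in-line OOD guard before falling back to FSM/FDTD.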
Part 9 will revisit OOD detection in the broader UQ context.
Part 8 wrap-up
Part 8 covered the operator-learning paradigm end-to-end:
- §8.1 Per-instance vs operator framing — the conceptual pivot, with a worked DeepONet on the antiderivative operator demonstrating amortised inference.
- §8.2 DeepONet architecture deep-dive — the branch + trunk decomposition as a learned-basis representation of the operator, visualised on the Poisson BVP.
- §8.3 Fourier Neural Operators (FNO) — spectral-convolution layers, resolution-invariance, and a single-layer FNO that recovered the heat-equation operator to machine precision.
- §8.4 Learned wave-equation propagators — time-stepping with FNO, the eigenvalue-stability constraint that determines whether rollouts stay bounded, and structural-stability tricks (β=-1 freeze, α clamp) that mirror production-code techniques.
- §8.5 Real-time parametric explorers — the amortisation made tactile, with a 7-D heat-operator family explored at 60 fps after 5 seconds of pretraining.
- §8.6 (this section) The crossover analysis: when operator learning amortises, when classical methods still win, and the hybrid architectures that combine the strengths of both.
The reader who completes Part 8 has the toolkit to BUILD an operator surrogate for any new PDE family, EVALUATE the cost-benefit decision honestly, and DEPLOY hybrid PINN-operator architectures in production seismic-imaging workflows. Part 9 takes operator learning to the next step: it hybridises operator surrogates with classical FWI, adds Bayesian uncertainty quantification on top, and confronts the unique challenges of seismic inverse problems at production scale.
References
- Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3(3), 218–229.
- Li, Z., Kovachki, N., Azizzadenesheli, K., et al. (2020). Fourier Neural Operator for Parametric Partial Differential Equations. ICLR 2021. arXiv:2010.08895.
- Lu, L., Meng, X., Cai, S., et al. (2022). A comprehensive and fair comparison of two neural operators. CMAME 393, 114778. Empirical comparison of DeepONet and FNO across multiple problem families.
- Lehmann, F., Gatti, F., Bertin, M., Clouteau, D. (2024). 3D elastic wave propagation with a Factorized Fourier Neural Operator (F-FNO). CMAME 420, 116718. Production-scale operator-based seismic surrogate.
- Hendrycks, D., Gimpel, K. (2017). A baseline for detecting misclassified and out-of-distribution examples in neural networks. ICLR 2017. Foundational OOD-detection paper.