PINN-augmented classical FWI

Part 9 — Hybrid PINN + classical, with uncertainty

Learning objectives

Operationalise §9.1's regulariser role as a concrete loss function
Recognise the U-curve of regularisation strength: under-regularised vs over-regularised
Anneal λ from high to low across iterations as a production strategy
Quantify when annealing beats every fixed λ
Set up §9.3 (learned regularisers replace the hand-crafted Tikhonov)

§9.1 framed PINN-as-regulariser as one of three production roles. §9.2 makes that role concrete: how do you choose $\lambda$? When does the regulariser help vs hurt? And what scheduling strategy works best? The widget below answers all three operationally on the same 2-layer inversion setup.

The hybrid loss

Combine the classical FWI data misfit with a PINN-residual penalty:

$$\mathcal{L}(\theta) = \underbrace{\sum_{s, r} \bigl( T_{\mathrm{pred}}(x_r; x_s; c(\theta)) - t_{\mathrm{obs}}(x_r; x_s) \bigr)^2}_{\mathcal{L}_{\mathrm{data}}} + \lambda \, \underbrace{\frac{1}{N_c} \sum_{k=1}^{N_c} \bigl( |\nabla T_{\mathrm{NN}}(x_k)|^2 \, c(\theta)^2 - 1 \bigr)^2}_{\mathcal{L}_{\mathrm{pinn}}} .$$

For the 2-layer toy problem in this widget, we substitute the eikonal-residual term with a physics-informed depth-trend prior:

$$\mathcal{L}_{\mathrm{pinn}}(v_1, v_2) = \bigl( (v_2 - v_1) - \Delta v_{\mathrm{prior}} \bigr)^2 + 4 \, \max(0, v_1 - v_2)^2 ,$$

where $\Delta v_{\mathrm{prior}} = 1.5$ km/s is the expected velocity contrast, chosen here to match the true model's contrast so that the regulariser is INFORMATIVE on this toy problem. In production this would be an INDEPENDENT geological prior (from regional well logs or tectonic context), not extracted from the truth, but the MACHINERY is the same: the regulariser encodes whatever $\Delta v$ we believe in. The first term is a Tikhonov-style quadratic well around the prior contrast; the second is a one-sided penalty on velocity inversions ($v_1 > v_2$) that encodes the compaction principle for sediments: velocity generally INCREASES with depth. This is the kind of soft prior production FWI codes use. It stands in for the eikonal-residual penalty with an analogous physics-informed term that has a meaningful gradient even at a bad initial guess.
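The toy prior and its gradient are simple enough to write out directly. The following is a minimal sketch, not the widget's actual code: the function names and the two test models (an inverted model at 3.0 / 2.0 km/s and a model at 2.0 / 3.5 km/s) are assumptions chosen for illustration. The gradient follows from differentiating each term of $\mathcal{L}_{\mathrm{pinn}}$ with respect to $v_1$ and $v_2$.

```python
def pinn_prior(v1, v2, dv_prior=1.5):
    """Toy depth-trend prior L_pinn(v1, v2); velocities in km/s."""
    contrast_err = (v2 - v1) - dv_prior   # deviation from the prior contrast
    inversion = max(0.0, v1 - v2)         # nonzero only when v1 > v2
    return contrast_err**2 + 4.0 * inversion**2


def pinn_prior_grad(v1, v2, dv_prior=1.5):
    """Analytic gradient (dL/dv1, dL/dv2) of pinn_prior."""
    contrast_err = (v2 - v1) - dv_prior
    inversion = max(0.0, v1 - v2)
    g1 = -2.0 * contrast_err + 8.0 * inversion
    g2 = 2.0 * contrast_err - 8.0 * inversion
    return g1, g2


# Hypothetical inverted model (v1 > v2): both terms contribute.
print(pinn_prior(3.0, 2.0))        # 10.25
print(pinn_prior_grad(3.0, 2.0))   # (13.0, -13.0)
# Hypothetical model with the prior contrast and no inversion: zero penalty.
print(pinn_prior(2.0, 3.5))        # 0.0
```

At the inverted model the gradient pushes $v_1$ down and $v_2$ up with equal strength, which is the "meaningful gradient at the bad initial guess" the text refers to.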

The U-curve of regularisation strength

Each fixed value of $\lambda$ sends gradient descent on $\mathcal{L}(\theta)$ to a different local minimum:

$\lambda \to 0$ (under-regularised): pure data misfit. With a bad initial guess, gradient descent gets stuck at a local minimum that fits data well but at wrong parameters.
$\lambda$ moderate: regulariser smooths the loss landscape just enough to escape bad basins. Optimum.
$\lambda \to \infty$ (over-regularised): regulariser dominates. Data is barely fit; solution is biased toward whatever the regulariser prefers (here, the prior contrast $\Delta v_{\mathrm{prior}}$ with no velocity inversion).

Plotting parameter recovery error vs $\lambda$ traces out a characteristic U-shape with a clear minimum. The widget runs 6 inversions across $\lambda \in \{0, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ to map this curve in the actual problem.

Annealing: the production strategy

Production FWI codes RARELY use a fixed $\lambda$. Instead they ANNEAL it: high $\lambda$ early to escape bad basins, decaying to low $\lambda$ near the end to fit data precisely. This pattern is everywhere in inverse-problem optimisation:

Simulated annealing: temperature $T$ decays during MCMC.
Frequency continuation in FWI (§6.4): start with low-frequency data (smooth misfit) → high frequency.
Total-variation regularisation in image denoising: TV weight decays as iterations proceed.

The schedule we use here is

$$\lambda(t) = \lambda_0 \exp(-\beta \cdot t / T) ,$$

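where $t$ is the iteration index and $T$ the total number of iterations (not the travel time of the hybrid loss), with $\lambda_0 = 2$, $\beta = 5$, $T = 50$. So $\lambda$ drops from 2.0 to ~0.013 over the run: basin-escape early, data-fit late.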
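The schedule is a one-liner, and evaluating it at a few iterations confirms the quoted end points. This is a small sketch; the function name `lam_schedule` and the argument name `n_iter` (standing for $T$) are assumptions, while the default values are the ones given in the text.

```python
import math


def lam_schedule(t, lam0=2.0, beta=5.0, n_iter=50):
    """Exponentially annealed weight lambda(t) = lam0 * exp(-beta * t / n_iter)."""
    return lam0 * math.exp(-beta * t / n_iter)


for t in (0, 10, 25, 50):
    print(t, round(lam_schedule(t), 4))
# 0 2.0
# 10 0.7358
# 25 0.1642
# 50 0.0135
```

Most of the decay happens in the first half of the run: by iteration 25 the weight is already below a tenth of its starting value.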

Try it

The widget runs 7 inversions sequentially:

λ-sweep: 6 fixed-λ values from 0 to 10. Plot final parameter L² error vs λ on log-log. Mark the optimum λ with a yellow ring.
Annealed: 7th inversion with λ(t) = 2·exp(−5t/T). Compare its final error against the U-curve minimum.

Two panels: the U-curve (with annealed marker overlay) and the dual-axis schedule plot showing $\lambda(t)$ (orange) and data misfit (purple) over the 50 annealed iterations.

Expected behaviour:

U-curve has a clear minimum somewhere in the middle of the λ range. λ=0 (classical) and λ=10 (over-regularised) are both worse than λ ~ 0.01-0.1.
Annealing beats or matches the best fixed-λ point, often by 1.5-3× depending on the problem geometry.
Schedule plot shows λ decaying smoothly while data misfit (right axis, log) drops steadily — the textbook annealing-pattern signature.
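The seven runs can be mimicked offline on a much simpler stand-in. The sketch below is NOT the widget's 2-layer inversion: it assumes a hypothetical 1-D double-well data misfit $(x^2 - 1)^2$ with the truth at $x = 1$, a bad initial guess at $x = -1.2$, and a quadratic prior centred slightly off the truth at $x = 0.8$ so that over-regularisation visibly biases the answer. The step size and all names are likewise assumptions; only the λ values and the annealing schedule come from the text.

```python
import math

X_TRUE, X_PRIOR, X_INIT = 1.0, 0.8, -1.2   # hypothetical toy values
N_ITER, STEP = 50, 0.05                    # iterations and step size (assumed)


def grad_data(x):
    # d/dx of the double-well misfit (x^2 - 1)^2, minima at x = -1 and x = +1
    return 4.0 * x * (x * x - 1.0)


def grad_reg(x):
    # d/dx of the quadratic prior (x - X_PRIOR)^2
    return 2.0 * (x - X_PRIOR)


def invert(lam_of_t):
    """Plain gradient descent on data + lam(t) * prior; returns |x - X_TRUE|."""
    x = X_INIT
    for t in range(N_ITER):
        x -= STEP * (grad_data(x) + lam_of_t(t) * grad_reg(x))
    return abs(x - X_TRUE)


# Six fixed-lambda runs, then one annealed run with lambda(t) = 2 exp(-5 t / T).
lambdas = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
fixed_err = [invert(lambda t, lam=lam: lam) for lam in lambdas]
annealed_err = invert(lambda t: 2.0 * math.exp(-5.0 * t / N_ITER))

best = min(range(len(lambdas)), key=fixed_err.__getitem__)
for lam, err in zip(lambdas, fixed_err):
    print(f"lambda = {lam:<6g} error = {err:.3f}")
print(f"best fixed lambda = {lambdas[best]:g}, annealed error = {annealed_err:.4f}")
```

On this stand-in the small-λ runs should stay trapped in the wrong well near $x = -1$, the large-λ run should be pulled toward the prior at 0.8, and the annealed run should end closer to the truth than any fixed λ. The location of the optimum and the size of the gain are properties of this toy and need not match the widget's numbers.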

Choosing λ_0 and β: the meta-problem

Annealing parameters introduce their own meta-tuning question. Three rules of thumb from production FWI codes:

λ_0 ≈ data misfit / regulariser misfit at the initial guess, so that at iteration 0 the two terms have COMPARABLE magnitude. If your data misfit is ~1 and your regulariser term at init is ~0.5, set $\lambda_0 \approx 2$.
β = ln(λ_initial / λ_final). If you want $\lambda$ to drop from $\lambda_0$ to $\lambda_0 / 100$ over the run, set $\beta = \ln(100) \approx 4.6$.
Decay slower than data convergence. λ should decay slowly enough that data misfit has time to drop while the regulariser still provides escape support. Too fast = under-regularised again; too slow = over-regularised.
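The first two rules of thumb reduce to two short formulas. The helper names below are hypothetical, and the inputs reuse the example numbers from the list above; the third rule is qualitative and is not captured here.

```python
import math


def initial_weight(data_misfit0, reg_misfit0):
    """Rule 1: balance the two loss terms at the initial guess."""
    return data_misfit0 / reg_misfit0


def decay_rate(initial_over_final):
    """Rule 2: beta such that lambda falls by the given factor over the run."""
    return math.log(initial_over_final)


lam0 = initial_weight(1.0, 0.5)   # data misfit ~1, regulariser term ~0.5
beta = decay_rate(100.0)          # lambda_0 -> lambda_0 / 100
print(lam0, round(beta, 2))       # 2.0 4.61
```

If the regulariser term is zero at the initial guess, rule 1 is undefined and $\lambda_0$ has to be chosen another way, for example from a short λ-sweep.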

Real FWI codes also include MULTIPLE regularisers (TV + L2 + smoothness + sparsity) with separate annealing schedules per term. The complexity grows but the principle is the same: blend physics-informed priors with data fit, with weights that adapt to the inversion phase.
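One way to organise several regularisers with separate schedules is a table of per-term settings. The sketch below is purely illustrative: the term names, initial weights and decay rates are invented placeholders, not values taken from any production code, and every term is assumed to follow the same exponential form as $\lambda(t)$ above.

```python
import math

# Hypothetical per-term settings: (initial weight lam0, decay rate beta).
SCHEDULES = {
    "tv": (1.0, 3.0),
    "l2": (0.5, 5.0),
    "smoothness": (2.0, 4.6),
}


def weights(t, n_iter, schedules=SCHEDULES):
    """Weight of each regulariser at iteration t, each on its own exponential decay."""
    return {name: lam0 * math.exp(-beta * t / n_iter)
            for name, (lam0, beta) in schedules.items()}


def total_loss(data_misfit, reg_values, t, n_iter):
    """Data misfit plus the weighted sum of regulariser values at iteration t."""
    w = weights(t, n_iter)
    return data_misfit + sum(w[name] * reg_values[name] for name in w)


# At t = 0 every term carries its initial weight.
print(weights(0, 50))
print(total_loss(1.0, {"tv": 1.0, "l2": 2.0, "smoothness": 0.5}, 0, 50))  # 4.0
```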

What §9.3 will do

The Tikhonov-style regulariser $\bigl((v_2 - v_1) - \Delta v_{\mathrm{prior}}\bigr)^2$ is HAND-CRAFTED: we (the textbook authors) chose this functional form. §9.3 generalises this to LEARNED regularisers: train a neural network on a corpus of plausible velocity models and use its reconstruction loss as the regulariser. This replaces the hand-tuned prior with a data-driven one. §9.4 takes the next step to GENERATIVE priors (VAE, diffusion).

References

Sun, J., Innanen, K.A., Huang, C. (2021). Physics-guided deep learning for seismic inversion with hybrid training and uncertainty analysis. Geophysics 86(3), R303–R317. PINN-augmented FWI on real seismic data.
Aster, R.C., Borchers, B., Thurber, C.H. (2019). Parameter Estimation and Inverse Problems, 3rd ed., Elsevier. Standard reference for regularisation in inverse problems; chapters on Tikhonov, L-curves, and trade-off analysis.
Hansen, P.C. (1992). Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34(4), 561–580. The L-curve method for choosing λ: a related trade-off diagnostic that plots solution norm against residual norm and, unlike the U-curve here, does not need the true model.
Krischer, L., Fichtner, A. (2017). Generalized interferometry for FWI inversion gradient. GJI 209(1), 277–292. Production seismic FWI gradient with multiple regularisers.