Loss-weight sensitivity in FWI-PINN
Learning objectives
- Recognise the multi-term FWI-PINN loss as the §3.2 loss-balance crisis applied to FWI
- See empirically how regularisation weight α shifts the misfit-minimum c₂
- Identify the Goldilocks zone where α is large enough to break cycle-skipping but small enough to honour data
- Connect to PINN-FWI's λ_d/λ_p balance (Wang-Teng-Perdikaris 2021)
The §3.2 widget demonstrated the loss-balance crisis on the harmonic IVP: the weights of the joint loss strongly affect convergence. PINN-FWI inherits this in spades. The full joint loss is a weighted sum of terms,
L = Σ_k λ_k L_k,
with five weights and hence four independent weight ratios. Each weight balances different physics. Get any one wrong and convergence stalls or finds the wrong velocity model.
The simplest 2-term version
To build intuition, this widget studies the simplest version: the classical-FWI data misfit plus a Tikhonov regulariser that pulls the velocity toward a prior,
J_total(c₂; α) = J_d(c₂) + α (c₂ − c_prior)²,
with c_prior = 1.0 (a deliberately wrong prior: the top-layer-velocity guess). Drag the slider over six orders of magnitude and watch the total-misfit minimum shift:
- α very small (1e-7): regulariser is negligible. Total-misfit minimum = data-misfit minimum, which on this 1D problem may be at the cycle-skipped point or 2.3 depending on the basin.
- α very large (1e+2): regulariser dominates. Total-misfit minimum = c_prior = 1.0. The data is ignored.
- α "Goldilocks" (~1e-4): balanced. The regulariser kills the spurious data-misfit local minima but doesn't override the global one. Total-misfit minimum = truth (1.5).
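The two limits in the list above can be made explicit with a short worked equation. This is only a local sketch under stated assumptions: the regulariser is taken to be α (c₂ − c_prior)², and the data misfit is approximated near the truth by a parabola with curvature a > 0 (the symbols a and c_true are introduced here for illustration and are not part of the widget).

```latex
J_d(c_2) \approx a\,(c_2 - c_{\mathrm{true}})^2
\quad\Longrightarrow\quad
\frac{d}{dc_2}\Big[a\,(c_2 - c_{\mathrm{true}})^2 + \alpha\,(c_2 - c_{\mathrm{prior}})^2\Big] = 0
\quad\Longrightarrow\quad
c_2^{*} = \frac{a\,c_{\mathrm{true}} + \alpha\,c_{\mathrm{prior}}}{a + \alpha}.
```

For α ≪ a the minimiser sits at c_true (1.5); for α ≫ a it sits at c_prior (1.0). In between it is a weighted average of the two, so even in the Goldilocks zone the minimum is biased toward the prior by α (c_true − c_prior) / (a + α), which is negligible only while α stays well below the data-misfit curvature.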
Try it
The widget pre-computes the data misfit J_d(c₂) on an 80-sample sweep of c₂ at startup (once, ~5 s). The slider then re-computes the total misfit instantly for any α. Three traces are plotted:
- Orange: the data misfit J_d(c₂), fixed (independent of α).
- Purple: the regulariser α (c₂ − c_prior)², quadratic in c₂, scales with α.
- Cyan: the total misfit J_total = J_d + α (c₂ − c_prior)². The dot marks the argmin.
The cyan dot is what gradient-descent FWI would converge to. As you change α, watch the dot slide between truth=1.5 (small α, when basins are narrow) and prior=1.0 (large α, when the prior dominates). The Goldilocks zone is the narrow range where the dot lands at truth.
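The pre-compute-once, rescale-per-slider-move pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the widget's code: `toy_data_misfit` is a hypothetical oscillatory stand-in for the real wave-equation misfit, the sweep range 0.8 to 2.6 is an assumption, and the regulariser is assumed to be α (c₂ − c_prior)². Only the values 1.5 (truth), 1.0 (prior) and the 80-sample sweep come from the text.

```python
import numpy as np

C_TRUE, C_PRIOR = 1.5, 1.0               # truth and prior quoted in the text
c2 = np.linspace(0.8, 2.6, 80)           # 80-sample sweep; the range is an assumption

def toy_data_misfit(c):
    """Hypothetical stand-in for J_d: zero at C_TRUE, with side-lobe minima."""
    envelope = np.exp(-((c - C_TRUE) / 1.0) ** 2)
    return 1.0 - envelope * np.cos(2.0 * np.pi * (c - C_TRUE) / 0.8)

# Expensive part in a real widget (one forward model per sample); done once.
J_d = toy_data_misfit(c2)
penalty = (c2 - C_PRIOR) ** 2            # regulariser shape, independent of alpha

def argmin_total(alpha):
    """Cheap per-slider-move update: rescale the penalty and take the argmin."""
    J_total = J_d + alpha * penalty
    return c2[np.argmin(J_total)]

for alpha in (1e-7, 1e-4, 1e-1, 1e2):
    print(f"alpha = {alpha:.0e}  ->  argmin c2 = {argmin_total(alpha):.3f}")
```

Because the argmin is taken over the whole grid, this toy always reports the global minimum of the total misfit; it does not model a descent run that starts in, and stays trapped in, a side-lobe basin.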
How production codes pick weights
- Discrepancy principle (Tikhonov 1963; Hanke 1995). Choose α such that ‖d_pred(α) − d_obs‖² ≈ N σ², where σ is the noise standard deviation and N is the number of data samples. The data is fit to its own noise floor, no further.
- L-curve method (Hansen 1992). Plot the regulariser norm against the data-residual norm on log-log axes for a range of α; choose the α at the corner of the resulting "L". Standard for ill-posed inverse problems.
- Generalized cross-validation (GCV; Golub, Heath, Wahba 1979). Pick α to minimise the predicted error on left-out data. Asymptotically optimal under standard noise assumptions.
- Bayesian / hierarchical. Treat α as a hyperparameter to be marginalised. Most rigorous; computationally heaviest.
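The first two rules in the list above can be tried on a small linear problem. The sketch below is an illustration under loudly stated assumptions: the forward operator `G` (a Gaussian blur), the model `m_true`, the noise level `sigma` and the α grid are all hypothetical and unrelated to the FWI widget. It applies the discrepancy principle by picking the grid α whose squared residual is closest to N σ², and it also returns the log-log coordinates needed to draw an L-curve (it does not attempt to locate the corner automatically).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ill-posed linear problem d = G m + noise (Gaussian blur kernel).
n, sigma = 50, 0.05
x = np.linspace(0.0, 1.0, n)
G = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
m_true = np.sin(2.0 * np.pi * x)
d_obs = G @ m_true + sigma * rng.normal(size=n)
m_prior = np.zeros(n)

# One SVD up front; each alpha afterwards is a cheap filter-factor solve.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
beta = U.T @ (d_obs - G @ m_prior)

def tikhonov_solve(alpha):
    """Minimiser of ||G m - d||^2 + alpha * ||m - m_prior||^2."""
    return m_prior + Vt.T @ (s / (s**2 + alpha) * beta)

alphas = np.logspace(-8, 2, 41)
models = [tikhonov_solve(a) for a in alphas]
res_sq = np.array([np.sum((G @ m - d_obs) ** 2) for m in models])
reg_sq = np.array([np.sum((m - m_prior) ** 2) for m in models])

# Discrepancy principle: residual energy should match the noise energy N * sigma^2.
target = n * sigma**2
alpha_dp = alphas[np.argmin(np.abs(res_sq - target))]
print(f"discrepancy-principle alpha ~ {alpha_dp:.1e}")

# L-curve coordinates (log10 of the norms); plot lcurve_reg against lcurve_res.
lcurve_res, lcurve_reg = 0.5 * np.log10(res_sq), 0.5 * np.log10(reg_sq)
```

The squared residual grows monotonically with α, which is what makes the discrepancy principle a one-dimensional root-finding problem; a production code would typically bracket and bisect rather than scan a fixed grid.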
PINN-FWI weights
The PINN-FWI joint loss has four independent weight ratios. The Wang-Teng-Perdikaris 2021 NTK-balance trick from §3.3 generalises directly to this setting: at each epoch, scale each weight λ_i inversely to the recent gradient norm of its loss term. This keeps all loss terms contributing to the gradient at comparable scales, which mitigates the "one term dominates" failure mode that plagues hand-tuned weights.
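A simplified reading of that rule, weights inversely proportional to a running average of each term's gradient norm, might look as follows. This is a sketch, not the update rule from the paper or from §3.3: the two quadratic loss terms, the helper names `balanced_weights` and `grads`, the moving-average constant `decay` and the normalisation to mean 1 are all assumptions chosen for illustration.

```python
import numpy as np

def balanced_weights(grad_norms, eps=1e-12):
    """Weights inversely proportional to gradient norms, normalised to mean 1."""
    inv = 1.0 / (np.asarray(grad_norms, dtype=float) + eps)
    return inv * inv.size / inv.sum()

# Two hypothetical quadratic loss terms whose gradient scales differ by ~10^4.
def grads(theta):
    g_data = np.array([100.0 * (theta[0] - 1.0), 0.0])   # stiff "data" term
    g_phys = np.array([0.0, 0.01 * (theta[1] - 2.0)])    # weak "physics" term
    return [g_data, g_phys]

theta, ema, decay, lr = np.zeros(2), None, 0.9, 1e-3
for epoch in range(200):
    g = grads(theta)
    norms = np.array([np.linalg.norm(gi) for gi in g])
    # Running average of gradient norms ("recent" gradient scale of each term).
    ema = norms if ema is None else decay * ema + (1.0 - decay) * norms
    lam = balanced_weights(ema)                  # refreshed every epoch
    theta = theta - lr * sum(w * gi for w, gi in zip(lam, g))

print("weights:", lam, "weighted norms:", lam * ema)
```

After balancing, the products of weight and averaged gradient norm are equal across terms, so neither term can dominate the update purely through its scale. Normalising the weights to mean 1 keeps the effective learning rate bounded as the gradients shrink.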
The McClenny-Braga-Neto SA-PINN trick (§3.4) further provides per-collocation-point weights — useful when some receiver locations or PDE collocation points are systematically harder than others. Both NTK and SA-PINN have been ported into PINN-FWI by Sun & Alkhalifah and others; see §3.3 / §3.4 for the in-depth treatment.
What §6.9 will do
§6.9 closes Part 6 with the convergence-diagnostics question: how do you know your FWI run has converged for the right reason? Misfit reduction is necessary but not sufficient. Production codes track gradient norm, model-update magnitude, model-residual decay rate, and the model-data residual cross-spectrum. The widget visualises all four for a complete §6.2 inversion.
References
- Tikhonov, A.N., Arsenin, V.Y. (1977). Solutions of Ill-Posed Problems. Wiley.
- Hansen, P.C. (1992). Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 34(4), 561–580. The L-curve weight-selection method.
- Wang, S., Teng, Y., Perdikaris, P. (2021). Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 43(5), A3055–A3081. The NTK-balance §3.3 paper, applied to PINN-FWI weights.
- McClenny, L.D., Braga-Neto, U. (2023). Self-adaptive physics-informed neural networks. Journal of Computational Physics 474, 111722. Per-point adaptive weights, §3.4.