Loss-weight sensitivity in FWI-PINN

Part 6 — Velocity inversion with PINNs

Learning objectives

  • Recognise the multi-term FWI-PINN loss as an instance of the §3.2 loss-balance crisis
  • See empirically how regularisation weight α shifts the misfit-minimum c₂
  • Identify the Goldilocks zone where α is large enough to break cycle-skipping but small enough to honour data
  • Connect to PINN-FWI's λ_d/λ_p balance (Wang-Teng-Perdikaris 2021)

The §3.2 widget demonstrated the loss-balance crisis on the harmonic IVP: the joint loss $\lambda_{\mathrm{ic}} L_{\mathrm{ic}} + L_{\mathrm{pde}}$ has a weight $\lambda_{\mathrm{ic}}$ that strongly affects convergence. PINN-FWI inherits this in spades. The full joint loss is

$$\mathcal{L} = \lambda_d \mathcal{L}_{\mathrm{data}} + \lambda_p \mathcal{L}_{\mathrm{pde}} + \lambda_i \mathcal{L}_{\mathrm{ic}} + \lambda_b \mathcal{L}_{\mathrm{bc}} + \lambda_r \mathcal{L}_{\mathrm{reg}} ,$$

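with five weights and therefore four independent weight ratios, since rescaling all weights by a common positive factor does not move the minimum. Each weight balances different physics. Get any one wrong and convergence stalls or finds the wrong velocity model.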
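To see why only ratios matter, divide the joint loss by $\lambda_d$ (assumed positive here). A positive rescaling leaves the minimiser unchanged, so the same velocity model minimises

$$\frac{\mathcal{L}}{\lambda_d} = \mathcal{L}_{\mathrm{data}} + \frac{\lambda_p}{\lambda_d} \mathcal{L}_{\mathrm{pde}} + \frac{\lambda_i}{\lambda_d} \mathcal{L}_{\mathrm{ic}} + \frac{\lambda_b}{\lambda_d} \mathcal{L}_{\mathrm{bc}} + \frac{\lambda_r}{\lambda_d} \mathcal{L}_{\mathrm{reg}} .$$

Four ratios relative to the reference weight remain, which is the count of independent weight ratios quoted in the PINN-FWI weights section. This is a statement about where the minimum sits; the absolute scale can still interact with the optimiser's step size.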

The simplest 2-term version

To build intuition, this widget studies the simplest version: classical-FWI data misfit + a Tikhonov regulariser pulling the velocity toward a prior.

$$J_{\mathrm{total}}(c_2) = J_{\mathrm{data}}(c_2) + \alpha (c_2 - c_2^{\mathrm{init}})^2 ,$$

with $c_2^{\mathrm{init}} = 1.0$ (a deliberately wrong prior: the top-layer-velocity guess). Drag the $\alpha$ slider over six orders of magnitude and watch the total-misfit minimum shift:

  • α very small (1e-7): regulariser is negligible. Total-misfit minimum = data-misfit minimum, which on this 1D problem may be at the cycle-skipped point $c_2 \approx 0.7$ or 2.3 depending on the basin.
  • α very large (1e+2): regulariser dominates. Total-misfit minimum = $c_2^{\mathrm{init}} = 1.0$. The data is ignored.
  • α "Goldilocks" (~1e-4): balanced. The regulariser kills the spurious data-misfit local minima but doesn't override the global one. Total-misfit minimum = truth (1.5).
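The three regimes above can be reproduced offline with a minimal sketch. Everything here is an assumption made for illustration: `j_data_toy` is a hypothetical stand-in for the widget's data misfit (an oscillation under a Gaussian envelope, zero at the true velocity 1.5, with side minima about 0.8 away), and the grid limits and α values are not taken from the widget. The sketch only shows the mechanics: compute the data misfit once, add the quadratic penalty for each α, and locate the argmin and the surviving local minima.

```python
import numpy as np

C2_TRUE, C2_INIT = 1.5, 1.0  # true velocity and (deliberately wrong) prior


def j_data_toy(c2):
    """Hypothetical data misfit: 0 at C2_TRUE, side minima near 0.7 and 2.3."""
    d = c2 - C2_TRUE
    return 1.0 - np.exp(-d**2 / (2 * 0.8**2)) * np.cos(2 * np.pi * d / 0.8)


def j_total(c2, j_data, alpha):
    """Data misfit plus Tikhonov penalty pulling c2 toward the prior."""
    return j_data + alpha * (c2 - C2_INIT) ** 2


def local_minima(values):
    """Indices of strict interior local minima of a sampled curve."""
    v = np.asarray(values)
    return np.where((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:]))[0] + 1


c2_grid = np.linspace(0.5, 2.5, 80)   # 80-sample sweep, as in the widget
j_data = j_data_toy(c2_grid)          # computed once, reused for every alpha


def argmin_c2(alpha):
    """Grid value of c2 that minimises the total misfit for this alpha."""
    return c2_grid[np.argmin(j_total(c2_grid, j_data, alpha))]


for alpha in (1e-7, 1e-2, 1e0, 1e2):
    total = j_total(c2_grid, j_data, alpha)
    print(f"alpha={alpha:g}  argmin c2={argmin_c2(alpha):.3f}  "
          f"local minima={len(local_minima(total))}")
```

On this toy curve the global argmin sits next to 1.5 for negligible α and next to the prior 1.0 for dominant α. The location and width of any intermediate "Goldilocks" range depend on the scale of the real data misfit, so the α values here should not be read as the widget's.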

Try it

[Interactive figure: FWI loss weights]

The widget pre-computes $J_{\mathrm{data}}(c_2)$ on an 80-sample sweep at startup (once, ~5 s). The slider then re-computes $J_{\mathrm{total}}(c_2)$ instantly for any $\alpha$. Three traces are plotted:

  • Orange: $J_{\mathrm{data}}(c_2)$, fixed.
  • Purple: $\alpha (c_2 - c_2^{\mathrm{init}})^2$, quadratic in $c_2$, scales with $\alpha$.
  • Cyan: $J_{\mathrm{total}}$. The dot marks the argmin.

The cyan dot is what gradient-descent FWI would converge to when started inside that basin. As you change $\alpha$, watch the dot slide between truth = 1.5 (small $\alpha$, when basins are narrow) and prior = 1.0 (large $\alpha$, when the prior dominates). The Goldilocks zone is the narrow range where the dot lands at truth.

How production codes pick weights

  • Discrepancy principle (Morozov 1966; see also Tikhonov 1963; Hanke 1995). Choose $\alpha$ such that $J_{\mathrm{data}}(c^*) \approx \sigma^2 N$ (with $J_{\mathrm{data}}$ taken as the sum of squared residuals), where $\sigma$ is the noise standard deviation and $N$ is the number of data samples. The data is fit to its own noise floor, no further.
  • L-curve method (Hansen 1992). Plot $\log J_{\mathrm{reg}}$ vs $\log J_{\mathrm{data}}$ for a range of $\alpha$; choose the $\alpha$ at the corner of the resulting "L". Standard for ill-posed inverse problems.
  • Generalized cross-validation (GCV; Golub, Heath, Wahba 1979). Pick $\alpha$ to minimise the predicted error on left-out data. Asymptotically optimal for linear problems with white noise; needs no estimate of $\sigma$.
  • Bayesian / hierarchical. Treat $\alpha$ as a hyperparameter to be marginalised. Most rigorous; computationally heaviest.
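As an illustration of the first method in the list, here is a minimal sketch of the discrepancy principle on a linear toy problem. It is not an FWI code: the random matrix `A` stands in for a linearised forward operator, the regulariser is $\alpha \lVert x \rVert^2$ with a zero prior, the noise level `sigma` is assumed known, and all names and sizes are hypothetical. The squared residual of the Tikhonov solution grows monotonically with α, so a bisection in log α finds the value at which the residual equals $\sigma^2 N$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, sigma = 120, 60, 0.5                 # data samples, unknowns, noise std
A = rng.standard_normal((N, M))            # toy linear forward operator
x_true = rng.standard_normal(M)
b = A @ x_true + sigma * rng.standard_normal(N)

# The SVD gives the Tikhonov solution and its residual in closed form.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
beta = U.T @ b
perp = b @ b - beta @ beta                 # part of b outside range(A)


def residual_sq(alpha):
    """||A x_alpha - b||^2 for x_alpha = argmin ||Ax - b||^2 + alpha ||x||^2."""
    return np.sum((alpha / (s**2 + alpha) * beta) ** 2) + perp


def discrepancy_alpha(target, lo=1e-12, hi=1e12, iters=200):
    """Bisection in log(alpha); residual_sq is increasing in alpha."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if residual_sq(mid) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


target = sigma**2 * N                      # noise floor: sigma^2 * N
alpha_star = discrepancy_alpha(target)
x_star = Vt.T @ (s / (s**2 + alpha_star) * beta)   # regularised solution

print(f"alpha* = {alpha_star:.4g}")
print(f"residual = {np.sum((A @ x_star - b) ** 2):.4f}, target = {target:.4f}")
```

For a nonlinear misfit such as FWI there is no closed-form residual, so each trial α costs a full inversion; the bisection logic stays the same.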

PINN-FWI weights

The PINN-FWI joint loss $\mathcal{L} = \lambda_d L_d + \lambda_p L_p + \lambda_i L_i + \lambda_b L_b + \lambda_r L_r$ has 4 independent weight ratios. The Wang-Teng-Perdikaris 2021 gradient-balancing trick (the NTK-balance method of §3.3) generalises directly to this setting: at each epoch, scale each $\lambda$ inversely to the recent gradient norm of its term. This forces all loss terms to contribute to the gradient at comparable scales, eliminating the "one term dominates" failure mode that plagues hand-tuned weights.
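The inverse-gradient-norm rule described above can be sketched in a few lines. This is a simplified illustration, not the exact update rule of the cited paper: the function name `balance_weights`, the exponential moving average with factor `beta`, and the normalisation of the weights to mean 1 are all assumptions made here.

```python
import numpy as np


def balance_weights(grad_norms, ema=None, beta=0.9, eps=1e-12):
    """Return (weights, ema) with weights inversely proportional to the
    exponentially averaged gradient norm of each loss term.

    grad_norms : per-term gradient norms ||grad L_k|| at the current epoch
    ema        : running average from the previous call (None on first call)
    """
    g = np.asarray(grad_norms, dtype=float)
    ema = g.copy() if ema is None else beta * ema + (1.0 - beta) * g
    lam = 1.0 / (ema + eps)
    lam *= lam.size / lam.sum()        # normalise so the weights average to 1
    return lam, ema


# Example: three loss terms whose raw gradients differ by six orders of magnitude.
raw_norms = np.array([1e3, 1.0, 1e-3])
lam, ema = balance_weights(raw_norms)
print("weights:", lam)
print("weighted gradient norms:", lam * raw_norms)   # equal across terms
```

In a training loop the function would be called once per epoch with the current per-term gradient norms, passing the returned `ema` back in so that the weights follow the recent history instead of a single noisy epoch.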

The McClenny-Braga-Neto SA-PINN trick (§3.4) further provides per-collocation-point weights $\gamma_k$, useful when some receiver locations or PDE collocation points are systematically harder than others. Both NTK and SA-PINN have been ported into PINN-FWI by Sun & Alkhalifah and others; see §3.3 / §3.4 for the in-depth treatment.
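The per-point weighting idea can be illustrated with a minimal sketch of the weight update alone. It assumes the simplest mask, a weighted loss $\sum_k \gamma_k r_k^2$ with $\gamma_k \ge 0$, where $r_k$ is the residual at point $k$; the function name `sa_weight_step` and the step size are hypothetical. The network parameters descend this loss while the weights ascend it, so points with persistently large residuals accumulate weight.

```python
import numpy as np


def sa_weight_step(gamma, residuals, lr_gamma=0.1):
    """One gradient-ascent step on the per-point weights gamma_k.

    For the loss sum_k gamma_k * r_k**2 the gradient with respect to gamma_k
    is r_k**2, so ascent increases the weight fastest where the residual is
    largest. The network update (gradient descent on the same loss) is not
    shown here.
    """
    gamma = np.asarray(gamma, dtype=float)
    r = np.asarray(residuals, dtype=float)
    return gamma + lr_gamma * r**2


gamma0 = np.ones(3)
residuals = np.array([0.1, 1.0, 3.0])     # easy, medium, hard point
gamma1 = sa_weight_step(gamma0, residuals)
print(gamma1)                             # the hardest point gains the most weight
```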

What §6.9 will do

§6.9 closes Part 6 with the convergence-diagnostics question: how do you know your FWI run has converged for the right reason? Misfit reduction is necessary but not sufficient. Production codes track gradient norm, model-update magnitude, model-residual decay rate, and the model-data residual cross-spectrum. The widget visualises all four for a complete §6.2 inversion.

References

  • Tikhonov, A.N., Arsenin, V.Y. (1977). Solutions of Ill-Posed Problems. Wiley.
  • Hansen, P.C. (1992). Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 34(4), 561–580. The L-curve weight-selection method.
  • Wang, S., Teng, Y., Perdikaris, P. (2021). Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 43(5), A3055–A3081. The NTK-balance §3.3 paper, applied to PINN-FWI weights.
  • McClenny, L.D., Braga-Neto, U. (2023). Self-adaptive physics-informed neural networks. JCP 474, 111722. Per-point adaptive weights, §3.4.
