1D velocity inversion end-to-end

Part 6, Velocity inversion with PINNs

Learning objectives

Run classical FWI gradient descent on the §6.1 problem, end-to-end
Watch the velocity profile c(x) descend toward c_true over 25 iterations
Use backtracking line search and gradient smoothing as a teaching stand-in for L-BFGS + Tikhonov
Verify empirically that good starting models converge and bad ones cycle-skip
See the data fit improve from a poor-match d_pred to a near-overlay with d_obs

§6.1 showed a single FWI gradient at one starting model. The actual job, recovering the velocity field, is iterative: compute the gradient, take a step downhill, repeat. This section runs that loop on the same 3-layer 1D model and watches the velocity profile descend.

For a clean demonstration we parameterise the unknown velocity field by its 3 LAYER VELOCITIES $(c_1, c_2, c_3)$ via a partition-of-unity built from the same sigmoidal layer boundaries used elsewhere:

c(x) = c_1 \, w_1(x) + c_2 \, w_2(x) + c_3 \, w_3(x), \qquad w_1 + w_2 + w_3 \equiv 1

where $w_1$ peaks in the top layer, $w_2$ in the middle, $w_3$ in the bottom. The Plessix gradient $\partial J/\partial c(x)$ from §6.1 is projected onto this partition via the chain rule:

\frac{\partial J}{\partial c_k} = \int_0^L \frac{\partial J}{\partial c(x)} \cdot w_k(x) \, dx ,

giving a 3-component gradient that the line search descends in 3-d parameter space. This is identical to what production FWI does when using a wavelet-basis or grid-blocked parameterisation, the inverse problem is restricted to the resolvable scales of the data, sidestepping the under-determination that plagues full-grid 1D inversions of single-shot data. §6.6 will return to FULL-grid inversion with multi-parameter coupling.

The classical FWI iteration

Pseudocode for one outer iteration of classical full-waveform inversion:

inputs: (c₁, c₂, c₃) (current layer velocities), d_obs (recorded data) output: (c₁_new, c₂_new, c₃_new)

Build c(x): c(x) ← c₁ w₁(x) + c₂ w₂(x) + c₃ w₃(x)
Forward solve: u ← solve u_tt = c(x)² u_xx + s (FDTD)
Sample receivers: d_pred ← u(x_rec_r, t), for r = 1…6
Residuals: r_r ← d_pred_r − d_obs_r
Misfit: J ← (1/2) Σ_r Σ_n r_r(t_n)² · dt
Adjoint solve: λ ← solve λ_tt = c² λ_xx + Σ_r r_r·δ_rec_r (backwards in time)
Grid gradient: g(x) ← −(2/c³) Σ_t u_tt(x,t) · λ(x,t) · dt
Project to layers: g_k ← ∫ g(x) · w_k(x) dx, for k = 1, 2, 3
Line search: step ← largest η for which J(c − η ĝ) < J(c) (ĝ = g normalised to max |ĝ_k| = 1)
Update: c_k_new ← c_k − step · ĝ_k
Steps 1-6 are exactly the §6.1 widget. Steps 7-9 turn one gradient into a velocity update. The widget below chains 25 of these outer iterations.

Implementation notes

Layer parameterisation. The 3-vector $(c_1, c_2, c_3)$ is the unknown. Reduces 161 grid degrees of freedom to 3. For §6.2 we further freeze $c_1$ and $c_3$ at their (assumed-known) check-shot values and invert only $c_2$ , the cleanest single-unknown demonstration. With multiple unknown layers, single-shot data exhibits inter-parameter trade-offs that need multi-shot illumination to break (§6.6). Production FWI sometimes uses similar freezes when shallow-layer velocities are known from refraction surveys.
Six receivers. Spread along $x \in {0.20, 0.35, 0.50, 0.65, 0.80, 0.95}$ . Each contributes a residual term to $J$ ; the adjoint solve sums their $\delta$ -source contributions. Multi-receiver = multiple constraints on the layer parameters.
Line search. Backtracking: try $\eta = \eta_0$ ; if it does not reduce $J$ , halve and retry, up to 10 times. On success, the next iteration starts from $1.6 \times$ the accepted step (gentle growth). Métivier & Brossier (2016) describe the same scheme in the SEISCOPE toolbox.
Step normalisation. The 3-component layer gradient is normalised to $\max |g_k| = 1$ before the line search. Then $\eta$ has the physical meaning "max layer-velocity perturbation per iteration", typically $\eta_0 = 0.10$ at start.
Velocity clamps. $c_k \in [0.5, 2.5]$ are enforced after each update. Real FWI uses physically motivated bounds (e.g.\ $1500 \le v_p \le 7000$ m/s); the same idea, narrower range.
What "convergence" means here. The widget measures relative-L² of $(c_1, c_2, c_3) - (1.0, 1.5, 1.2)$ vs the truth. A good run gets this to $< 2%$ . A run starting from a bad starting model gets stuck in a cycle-skipping island and stalls at 10-20% error.

Try it

What you should see

Pick $c_2 = 2.0$ and click ▶ Run. The line search takes 25 iterations, in the browser, about 8-15 seconds depending on hardware. Each iteration runs three FDTD solves (forward, adjoint, plus one trial in the line search), so the cost is dominated by FDTD time-stepping, not by anything PINN-related.

Expected behaviour:

Good starting models $(c_2 \in {1.0, 1.2, 1.8, 2.0, 2.1})$ : misfit $J$ drops by $10$ - $100\times$ and the velocity profile converges to within $< 5%$ of $c_{\mathrm{true}}$ . The colour ramp on the velocity panel goes orange (start) → cyan (final), and the cyan curve overlays the gray $c_{\mathrm{true}}$ line.
Marginal starting models $(c_2 \in {0.7\text{--}0.9})$ : convergence is slower, may stall around 10-20% model error. The receiver fit improves but does not close completely.
Cycle-skipped starting models $(c_2 \in {0.6, 2.3\text{--}2.4})$ : the gradient sign is wrong, descent makes things WORSE, the line search fails repeatedly, and the inversion stalls at the wrong local minimum. This is exactly the §6.1 cycle-skipping warning made concrete: the optimiser does not magically find the global minimum; it finds the closest local one.

Cycle skipping is the single biggest practical obstacle in FWI. §6.4 (frequency continuation) and §6.5 (envelope and Wasserstein misfits) build the standard remedies. They both START from this iteration loop, they just change the misfit $J$ or the data filtering before each iteration. The descent step itself is unchanged.

The PINN-FWI version of this exact iteration

Conceptually identical, structurally different. PINN-FWI replaces this entire loop with a single Adam optimiser over $(\theta_u, \theta_m)$ :

for epoch in range(N_epochs): L_data ← Σ (u_NN(x_rec, t_n) − d_obs(x_rec, t_n))² L_pde ← Σ (∂t² u_NN − m_NN(x)² ∂x² u_NN)² L_ic ← Σ (u_NN(x, 0) − u_init(x))² L_bc ← Σ (u_NN(x_b, t))² L ← λ_d L_data + λ_p L_pde + λ_i L_ic + λ_b L_bc grad_θ_u, grad_θ_m ← backprop(L) θ_u ← Adam(θ_u, grad_θ_u) θ_m ← Adam(θ_m, grad_θ_m)

Where the classical version makes 25 outer iterations of (forward + adjoint + line search), the PINN version makes 3000-5000 epochs of (forward+backward through the joint loss). The classical iteration converges in < 30 outer iterations because the gradient is exact; the PINN iteration takes thousands of epochs because the wavefield network $u$ {\mathrm{NN}} $u_{NN}$ has to LEARN to satisfy the PDE while $m$ {\mathrm{NN}} $m_{NN}$ is moving underneath it (the "coupled convergence" weakness of §6.1). On this 1D toy, both eventually reach comparable model error; on Marmousi-class problems the gap widens dramatically in favour of classical FWI.

The PINN-FWI weight-balance question is taken up in §6.8, and §6.5 demonstrates how the misfit landscape itself differs (L² vs envelope), both questions apply identically to classical FWI and to PINN-FWI.

References

Tarantola, A. (1984). Inversion of seismic reflection data in the acoustic approximation. Geophysics 49(8), 1259-1266.
Plessix, R.-E. (2006). A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 167(2), 495-503.
Métivier, L., Brossier, R. (2016). The SEISCOPE optimization toolbox: A large-scale nonlinear optimization library based on reverse communication. Geophysics 81(2), F1-F15. The line-search implementation reference.
Virieux, J., Operto, S. (2009). An overview of full-waveform inversion in exploration geophysics. Geophysics 74(6), WCC1-WCC26.
Pratt, R.G. (1999). Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model. Geophysics 64(3), 888-901. The frequency-domain analogue of the time-domain iteration here.