Source-encoded FWI-PINN

Part 6 — Velocity inversion with PINNs

Learning objectives

  • Understand stochastic source encoding (Krebs 2009)
  • See empirically that encoded inversion converges with ~N× less compute
  • Recognise the noise-vs-compute trade-off (encoded gradients are noisy)
  • Connect to mini-batch SGD: source encoding is FWI's mini-batch trick

Real seismic surveys have hundreds to thousands of shots. Naive FWI runs ONE forward + ONE adjoint solve PER SHOT per outer iteration. With 1000 shots and 30 iterations, that is 60,000 PDE solves total. On a 2D Marmousi-class problem, that is days of CPU time on a small cluster.

Krebs et al. (2009) showed this can be reduced to $\sim 60$ PDE solves total (one forward and one adjoint solve per iteration), a 1000× saving, via STOCHASTIC SOURCE ENCODING. The idea: combine all shots into a single random superposition; run ONE forward + ONE adjoint solve; compute the gradient as if the encoded super-shot were a single shot. The expected gradient over random encoding signs equals the shot-by-shot gradient. The variance of the encoded gradient is the cost paid for the compute saving.

The math

Let $s_k(x, t) = \delta(x - x_k) f_k(t)$ be the source for shot $k$, and let $d_k^{\mathrm{obs}}(t)$ be its recorded data at the receivers. Choose random encoding signs $\varepsilon_k \in \{-1, +1\}$ independently and uniformly, fresh each iteration. Define the encoded super-source and encoded data:

$$s_{\mathrm{enc}}(x, t) = \sum_k \varepsilon_k \, s_k(x, t) , \qquad d_{\mathrm{enc}}^{\mathrm{obs}}(t) = \sum_k \varepsilon_k \, d_k^{\mathrm{obs}}(t) .$$

Run ONE forward solve with $s_{\mathrm{enc}}$ on the current model to get $u_{\mathrm{pred}}^{\mathrm{enc}}$. Compute the residual $r_{\mathrm{enc}} = u_{\mathrm{pred}}^{\mathrm{enc}}|_{\mathrm{rec}} - d_{\mathrm{enc}}^{\mathrm{obs}}$. Run ONE adjoint solve. Apply the Plessix correlation. Result: an unbiased estimate of the shot-by-shot FWI gradient.
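
The encoding step itself is only a signed sum over shots. A minimal sketch in plain Python, assuming the per-shot traces are stored as `d_obs[k][i]` (shot `k`, time sample `i`); the helper names `draw_signs` and `encode` and the toy numbers are illustrative assumptions, not part of any particular FWI code:

```python
import random

def draw_signs(n_shots, rng):
    """Draw one independent, uniform +/-1 encoding sign per shot."""
    return [rng.choice((-1, 1)) for _ in range(n_shots)]

def encode(per_shot, signs):
    """Return sum_k eps_k * per_shot[k], sample by sample."""
    n_samples = len(per_shot[0])
    return [sum(e * trace[i] for e, trace in zip(signs, per_shot))
            for i in range(n_samples)]

# Three shots, four time samples each (toy numbers, assumed for illustration).
d_obs = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 1.0, 0.0, 1.0],
         [3.0, 3.0, 0.0, 0.0]]

rng = random.Random(0)            # fixed seed so the example is repeatable
eps = draw_signs(len(d_obs), rng) # fresh signs would be drawn every iteration
d_enc = encode(d_obs, eps)        # encoded data; the same call encodes sources
```

The same `encode` call would be applied to the source wavelets with the same `eps`, so that the encoded source and the encoded data stay consistent within one iteration.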

Why unbiased? The $\varepsilon_k$ are independent and zero-mean. Cross-terms in the encoded gradient (between shot $k$ and shot $\ell \ne k$) carry $\varepsilon_k \varepsilon_\ell$, which has expectation zero. Diagonal terms ($k = \ell$) carry $\varepsilon_k^2 = 1$ deterministically. Sum over $k$: same as the shot-by-shot diagonal sum.

The cost: the cross-terms ARE present in any single iteration; they form the gradient noise. For $N$ shots this noise is of order $\sqrt{N}$ relative to the signal. This is the same trade-off as mini-batch SGD vs full-batch gradient descent in deep learning.
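
The unbiasedness argument can be checked exactly on a toy problem by enumerating all $2^N$ sign vectors. The sketch below assumes a deliberately simple stand-in for FWI: a scalar model `m`, with predicted data for shot `k` equal to `m * b_k`, where the "unit-model responses" `b_k` are made-up vectors. It is not a wave-equation solver; it only shares the property that the gradient is quadratic in the encoded quantities.

```python
from itertools import product

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def misfit_gradient(m, b, d):
    """d/dm of 0.5 * ||m*b - d||^2, the toy stand-in for one FWI gradient."""
    residual = [m * bi - di for bi, di in zip(b, d)]
    return dot(b, residual)

def encode(per_shot, signs):
    """Return sum_k eps_k * per_shot[k], sample by sample."""
    return [sum(e * v[i] for e, v in zip(signs, per_shot))
            for i in range(len(per_shot[0]))]

# Toy responses b_k for 4 shots, and data generated with m_true = 2.
responses = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
data = [[2.0 * x for x in b] for b in responses]
m = 1.5  # current model estimate

# Reference: sum of the four per-shot gradients.
shot_by_shot = sum(misfit_gradient(m, b, d) for b, d in zip(responses, data))

# Encoded gradient for every one of the 2^4 = 16 possible sign vectors.
encoded = [misfit_gradient(m, encode(responses, s), encode(data, s))
           for s in product((-1, 1), repeat=len(responses))]
mean_encoded = sum(encoded) / len(encoded)
```

Here `mean_encoded` matches `shot_by_shot` up to rounding, while the individual entries of `encoded` differ from one another: that spread is the cross-talk noise.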

Try it

[Interactive figure: Source Encoding. Enable JavaScript to interact.]

The widget runs 4-shot FWI on the §6.2 problem with $c_2$ as the only unknown. Shot-by-shot does 4 forward + 4 adjoint solves per outer iteration. Encoded does 1 forward + 1 adjoint with random ±1 signs flipping each iteration. With 12 outer iterations:

  • Shot-by-shot: 12 × 4 × 2 = 96 PDE solves total. Smooth convergence trace.
  • Encoded: 12 × 1 × 2 = 24 PDE solves total. Noisier trace but reaches truth.
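
The solve counts above follow from one line of arithmetic. A small sketch, where the function name `pde_solves` and its arguments are assumptions made for illustration:

```python
def pde_solves(n_iters, n_shots, n_encodings=None):
    """Total forward + adjoint solves over a whole inversion.

    n_encodings=None means shot-by-shot; otherwise it is the number of
    encoded super-shots simulated per outer iteration.
    """
    simulated_per_iter = n_shots if n_encodings is None else n_encodings
    return n_iters * simulated_per_iter * 2  # one forward + one adjoint each

widget_shot_by_shot = pde_solves(12, 4)                 # 96
widget_encoded = pde_solves(12, 4, n_encodings=1)       # 24
survey_shot_by_shot = pde_solves(30, 1000)              # 60000
survey_encoded = pde_solves(30, 1000, n_encodings=1)    # 60
```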

For 4 shots the saving is 4×. For a 1000-shot real survey the saving in PDE solves is 1000× with a single encoding per iteration, which for a 30-iteration FWI is the difference between days and minutes of compute. Production codes use a more conservative scheme: 4–8 random encodings averaged per outer iteration to reduce noise (still a $125\times$ to $250\times$ saving for 1000 shots).
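
The effect of averaging several encodings per iteration can be seen on a toy model of the cross-talk. The sketch below assumes the encoded gradient can be written as $\sum_{k,\ell} \varepsilon_k \varepsilon_\ell G_{k\ell}$ for a fixed matrix `G` whose diagonal holds the per-shot contributions; the entries of `G`, the trial count and the seed are made up for illustration.

```python
import random
import statistics

# Assumed toy cross-term matrix: diagonal = signal, off-diagonal = cross-talk.
G = [[ 4.0,  1.0, -2.0,  0.5],
     [ 1.0,  3.0,  1.5, -1.0],
     [-2.0,  1.5,  5.0,  2.0],
     [ 0.5, -1.0,  2.0,  2.0]]
true_gradient = sum(G[k][k] for k in range(len(G)))  # 14.0

def encoded_estimate(rng):
    """One encoded gradient: sum_kl eps_k eps_l G_kl with fresh signs."""
    n = len(G)
    eps = [rng.choice((-1, 1)) for _ in range(n)]
    return sum(eps[k] * eps[l] * G[k][l] for k in range(n) for l in range(n))

def averaged_estimate(n_encodings, rng):
    """Average of n_encodings independent encoded gradients."""
    return sum(encoded_estimate(rng) for _ in range(n_encodings)) / n_encodings

def spread(n_encodings, n_trials=2000, seed=0):
    """Empirical standard deviation of the averaged estimate."""
    rng = random.Random(seed)
    samples = [averaged_estimate(n_encodings, rng) for _ in range(n_trials)]
    return statistics.pstdev(samples)

spread_1 = spread(1)   # single encoding per iteration
spread_8 = spread(8)   # eight encodings averaged per iteration
```

Because the encodings are independent, the spread is expected to shrink roughly like $1/\sqrt{K}$ for $K$ averaged encodings, at $K$ times the cost per iteration.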

Variants and refinements

  • Time-shift encoding (Krebs 2009 Appendix). Instead of ±1 binary signs, use random time-shifts $\tau_k$ per shot. This delocalises the cross-talk noise in TIME rather than in amplitude.
  • Frequency-domain encoding (Plessix & Mulder 2008). Encode in the Fourier domain — random phase rotations per shot per frequency. Equivalent to mini-batch in frequency-domain FWI.
  • Plane-wave decomposition. Convert the source line into plane-wave shots; invert plane waves instead of point shots. Naturally encodes via the plane-wave parameter.
  • SGD vs Adam-style averaging. Moghaddam et al. 2013 propose averaging encoded gradients over a window of past iterations — the encoded version of momentum. Convergence is smoother.
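
For the frequency-domain variant, the sign $\varepsilon_k$ becomes a unit-modulus phase factor $e^{i\varphi_k}$, and the cross-terms cancel in expectation for the same reason. A minimal single-frequency sketch: the four-point phase set is an assumption chosen so that the average can be enumerated exactly (a practical scheme would typically draw phases continuously), and the complex matrix `G` is made up.

```python
import cmath
from itertools import product

# Assumed discrete phase set; uniform over it, E[exp(i*phi)] = 0.
PHASES = [0.0, 0.5 * cmath.pi, cmath.pi, 1.5 * cmath.pi]

def encoded_term(G, phases):
    """sum_kl conj(a_k) a_l G_kl with a_k = exp(i * phi_k)."""
    a = [cmath.exp(1j * p) for p in phases]
    n = len(a)
    return sum(a[k].conjugate() * a[l] * G[k][l]
               for k in range(n) for l in range(n))

# Toy cross-term matrix for 3 shots at one frequency (made-up values).
G = [[2.0 + 0.0j, 0.3 - 1.0j, 1.0 + 0.5j],
     [0.3 + 1.0j, 1.0 + 0.0j, 0.0 - 0.7j],
     [1.0 - 0.5j, 0.0 + 0.7j, 3.0 + 0.0j]]

all_choices = list(product(PHASES, repeat=len(G)))     # 4^3 = 64 encodings
mean_term = sum(encoded_term(G, p) for p in all_choices) / len(all_choices)
diagonal_sum = sum(G[k][k] for k in range(len(G)))     # shot-by-shot value
```

Averaged over all 64 phase choices, `mean_term` agrees with `diagonal_sum` up to rounding, mirroring the ±1 argument in the time domain.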

PINN-FWI source encoding

The PINN-FWI version is structurally trivial: the data-fit term $\mathcal{L}_{\mathrm{data}}$ is summed over all shots. Replace it with the encoded version:

$$\mathcal{L}_{\mathrm{data}}^{\mathrm{enc}}(\theta_u) = \Bigl( u_{\mathrm{NN}}|_{\mathrm{rec}, \, t} - \sum_k \varepsilon_k d_k^{\mathrm{obs}} \Bigr)^2 ,$$

where the wavefield network $u_{\mathrm{NN}}$ is now trained against the encoded super-shot data. Auto-diff handles backprop through both networks. Adam updates $\theta_u$ and $\theta_m$ with the noisy encoded gradient. With fresh $\varepsilon_k$ each epoch, the noise averages out. Sun & Alkhalifah have demonstrated this for 2D Marmousi-class PINN-FWI at 5–10× compute savings vs shot-by-shot PINN-FWI.
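
A framework-free sketch of the encoded data-fit term, with fresh signs drawn every epoch. Everything here is an assumption for illustration: `u_pred_rec` stands in for the wavefield network sampled at the receiver/time points (a real PINN would produce it by evaluating the network), the mean over samples is one possible normalisation of the squared misfit, and the numbers are made up.

```python
import random

def encoded_data_loss(u_pred_rec, d_obs, signs):
    """Mean squared misfit between predictions and sign-encoded data.

    u_pred_rec[i]: predicted wavefield at receiver/time sample i.
    d_obs[k][i]:   observed data for shot k at the same sample.
    """
    n = len(u_pred_rec)
    d_enc = [sum(e * d[i] for e, d in zip(signs, d_obs)) for i in range(n)]
    return sum((u - de) ** 2 for u, de in zip(u_pred_rec, d_enc)) / n

d_obs = [[1.0, 2.0], [0.5, -1.0]]   # two shots, two samples (toy values)

rng = random.Random(0)
losses = []
for epoch in range(3):
    signs = [rng.choice((-1, 1)) for _ in d_obs]  # fresh encoding each epoch
    u_pred_rec = [0.0, 0.0]                       # placeholder network output
    losses.append(encoded_data_loss(u_pred_rec, d_obs, signs))
```

In an actual PINN-FWI setup the same `signs` would also have to be used to build the encoded source in the PDE residual term, so that the network is fitted to a physically consistent super-shot.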

What §6.8 will do

§6.8 returns to the loss-balance question of §3.2 in an FWI setting. The PINN-FWI joint loss $\mathcal{L} = \lambda_d \mathcal{L}_{\mathrm{data}} + \lambda_p \mathcal{L}_{\mathrm{pde}} + \ldots$ has multiple weights that strongly affect convergence. The §6.8 widget demonstrates the simplest 2-term version on classical FWI (data misfit + Tikhonov regulariser) so the trade-off is visible in a single 80-sample misfit-landscape sweep. The full PINN-FWI version inherits all the same balancing concerns, which are covered in the §6.8 prose with cross-references to the §3.3 NTK-balance and §3.4 SA-PINN auto-tuning trio.

References

  • Krebs, J.R., Anderson, J.E., Hinkley, D., Neelamani, R., Lee, S., Baumstein, A., Lacasse, M.-D. (2009). Fast full-wavefield seismic inversion using encoded sources. Geophysics 74(6), WCC177–WCC188.
  • Moghaddam, P.P., Keers, H., Herrmann, F.J., Mulder, W.A. (2013). A new optimization approach for source-encoding full-waveform inversion. Geophysics 78(3), R125–R132.
  • Plessix, R.-E., Mulder, W.A. (2008). Source separation in seismic full-waveform inversion using random Krylov methods. SEG Annual Meeting.
