Source-encoded FWI-PINN
Learning objectives
- Understand stochastic source encoding (Krebs 2009)
- See empirically that encoded inversion converges with ~N× less compute
- Recognise the noise-vs-compute trade-off (encoded gradients are noisy)
- Connect to mini-batch SGD: source encoding is FWI's mini-batch trick
Real seismic surveys have hundreds to thousands of shots. Naive FWI runs ONE forward + ONE adjoint solve PER SHOT per outer iteration. With 1000 shots and 30 iterations, that is 60,000 PDE solves total. On a 2D Marmousi-class problem, that is days of CPU time on a small cluster.
Krebs et al. (2009) showed this can be reduced to 60 PDE solves total (30 iterations × 2 solves), a 1000× saving, via STOCHASTIC SOURCE ENCODING. The idea: combine all shots into a single random superposition; run ONE forward + ONE adjoint solve; compute the gradient as if the encoded super-shot were a single shot. The expected gradient over random encoding signs equals the shot-by-shot gradient. The variance of the encoded gradient is the cost paid for the compute saving.
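The solve counts in this argument are simple arithmetic and can be checked in a few lines. The helper name `pde_solves` is hypothetical, and the count assumes exactly one forward and one adjoint solve per shot (or per encoded super-shot) per outer iteration, as stated above.

```python
def pde_solves(n_shots, n_iters, encodings_per_iter=None):
    """Count forward + adjoint solves.

    encodings_per_iter=None means shot-by-shot (one solve pair per shot);
    otherwise it is the number of encoded super-shots per outer iteration.
    """
    pairs_per_iter = n_shots if encodings_per_iter is None else encodings_per_iter
    return 2 * pairs_per_iter * n_iters


naive = pde_solves(n_shots=1000, n_iters=30)
encoded = pde_solves(n_shots=1000, n_iters=30, encodings_per_iter=1)
print(naive, encoded, naive // encoded)
```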
The math
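Let $s_i$ be the source for shot $i$, and $d_i$ be its recorded data at the receivers, for $i = 1, \dots, N$. Choose random encoding signs $\epsilon_i \in \{-1, +1\}$ independently and uniformly, fresh each iteration. Define the encoded super-source and encoded data:
$$s_{\mathrm{enc}} = \sum_{i=1}^{N} \epsilon_i s_i, \qquad d_{\mathrm{enc}} = \sum_{i=1}^{N} \epsilon_i d_i.$$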
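Run ONE forward solve with the encoded super-source $s_{\mathrm{enc}}$ on the current model to get the encoded wavefield $u_{\mathrm{enc}}$. Compute the residual against the encoded data, $r_{\mathrm{enc}} = u_{\mathrm{enc}}|_{\mathrm{receivers}} - d_{\mathrm{enc}}$. Run ONE adjoint solve. Apply the Plessix correlation. Result: an unbiased estimate of the shot-by-shot FWI gradient.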
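The steps above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the linear receiver operator `G`, the element-wise model `m * s`, the sizes, and all function names. A real FWI code would replace `forward` and the `G.T @ r` line with wave-equation forward and adjoint solves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_model, n_rec = 4, 6, 5

G = rng.normal(size=(n_rec, n_model))        # hypothetical linear "propagate to receivers" operator
S = rng.normal(size=(n_shots, n_model))      # row i is the source s_i
m_true = rng.uniform(1.0, 2.0, size=n_model)
D = (S * m_true) @ G.T                       # row i is the recorded data d_i


def forward(m, s):
    """Toy forward solve: predicted receiver data, linear in the source s."""
    return G @ (m * s)


def gradient(m, s, d):
    """Gradient of 0.5 * ||forward(m, s) - d||^2 with respect to m."""
    r = forward(m, s) - d          # residual at the receivers
    return s * (G.T @ r)           # toy adjoint solve, then correlation with the source side


def shot_by_shot_gradient(m):
    return sum(gradient(m, S[i], D[i]) for i in range(n_shots))


def encoded_gradient(m, eps):
    s_enc = eps @ S                # encoded super-source
    d_enc = eps @ D                # encoded data
    return gradient(m, s_enc, d_enc)   # one forward + one adjoint


m = np.ones(n_model)                               # current model estimate
eps = rng.choice([-1.0, 1.0], size=n_shots)        # fresh random signs
g_full = shot_by_shot_gradient(m)
g_enc = encoded_gradient(m, eps)
print(g_full)
print(g_enc)
```

A single `g_enc` generally differs from `g_full`; only its average over sign draws matches.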
Why unbiased? The $\epsilon_i$ are independent and zero-mean. Cross-terms in the encoded gradient (between shot $i$ and shot $j$, $i \neq j$) carry the factor $\epsilon_i \epsilon_j$, which has expectation zero. Diagonal terms ($i = j$) carry $\epsilon_i^2 = 1$ deterministically. Sum over $i$: same as the shot-by-shot diagonal sum.
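Written out, the argument is a short expectation. The notation $g_{ij}$ is introduced here for illustration only: it stands for the correlation of the forward wavefield of shot $i$ with the adjoint wavefield driven by the residual of shot $j$, with $\epsilon_i$ the random sign of shot $i$. The expansion relies on the forward wavefield being linear in the source and the adjoint wavefield being linear in the residual.

```latex
\begin{aligned}
g_{\mathrm{enc}} &= \sum_{i=1}^{N}\sum_{j=1}^{N} \epsilon_i \epsilon_j \, g_{ij}, \\
\mathbb{E}[\epsilon_i \epsilon_j] &= \delta_{ij}
\;\Longrightarrow\;
\mathbb{E}[g_{\mathrm{enc}}] = \sum_{i=1}^{N} g_{ii} = g_{\mathrm{shot\text{-}by\text{-}shot}}.
\end{aligned}
```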
The cost: the cross-terms ARE present in any single iteration; they form the gradient noise. Viewed as noise relative to the signal, this is the same trade-off as mini-batch SGD vs full-batch gradient descent in deep learning: each step is cheaper but noisier, and the noise averages out over iterations.
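Both claims, zero bias and non-zero noise, can be checked exactly for a small number of shots by enumerating every sign pattern. This is a sketch on an assumed toy linear problem (the operator `G`, the sizes, and the helper names are hypothetical), not a wave-equation solver.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_shots, n_model, n_rec = 4, 6, 5
G = rng.normal(size=(n_rec, n_model))                   # hypothetical linear operator
S = rng.normal(size=(n_shots, n_model))                 # sources, one per row
D = (S * rng.uniform(1.0, 2.0, size=n_model)) @ G.T     # recorded data, one shot per row
m = np.ones(n_model)                                    # current model estimate


def gradient(s, d):
    """Toy gradient for one (possibly encoded) shot at the current model m."""
    return s * (G.T @ (G @ (m * s) - d))


g_full = sum(gradient(S[i], D[i]) for i in range(n_shots))

# All 2^4 = 16 sign patterns, each equally likely under uniform random signs.
patterns = np.array(list(itertools.product([-1.0, 1.0], repeat=n_shots)))
g_all = np.array([gradient(eps @ S, eps @ D) for eps in patterns])

g_mean = g_all.mean(axis=0)
rel_noise = np.sqrt(((g_all - g_full) ** 2).sum(axis=1).mean()) / np.linalg.norm(g_full)

print("mean of encoded gradients equals shot-by-shot:", np.allclose(g_mean, g_full))
print("RMS cross-talk relative to gradient norm:", rel_noise)
```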
Try it
The widget runs 4-shot FWI on the §6.2 problem with a single unknown. Shot-by-shot does 4 forward + 4 adjoint solves per outer iteration. Encoded does 1 forward + 1 adjoint, with random ±1 signs redrawn each iteration. With 12 outer iterations:
- Shot-by-shot: 12 × 4 × 2 = 96 PDE solves total. Smooth convergence trace.
- Encoded: 12 × 1 × 2 = 24 PDE solves total. Noisier trace but reaches truth.
For 4 shots the saving is 4×. For 1000-shot real seismic the nominal saving is 1000×, so a 30-iteration FWI that takes 30 hours shot-by-shot drops to minutes. Production codes use a more conservative scheme: 4–8 random encodings averaged per outer iteration to reduce noise (still a 125–250× saving for 1000 shots).
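The effect of averaging several encodings per iteration can be estimated with a small Monte Carlo experiment. The toy linear problem, the trial count, and the helper names below are assumptions for illustration; the point is only that averaging K independent encodings costs 2K solves and shrinks the RMS noise by roughly 1/√K.

```python
import numpy as np

rng = np.random.default_rng(2)
n_shots, n_model, n_rec = 4, 6, 5
G = rng.normal(size=(n_rec, n_model))                   # hypothetical linear operator
S = rng.normal(size=(n_shots, n_model))
D = (S * rng.uniform(1.0, 2.0, size=n_model)) @ G.T
m = np.ones(n_model)


def gradient(s, d):
    return s * (G.T @ (G @ (m * s) - d))


g_full = sum(gradient(S[i], D[i]) for i in range(n_shots))


def averaged_encoded_gradient(k):
    """Average the encoded gradient over k independent sign draws."""
    eps = rng.choice([-1.0, 1.0], size=(k, n_shots))
    return np.mean([gradient(e @ S, e @ D) for e in eps], axis=0)


def rms_error(k, trials=2000):
    errs = [np.sum((averaged_encoded_gradient(k) - g_full) ** 2) for _ in range(trials)]
    return np.sqrt(np.mean(errs))


noise = {k: rms_error(k) for k in (1, 4, 8)}
for k, v in noise.items():
    print(f"K={k}: {2 * k} solves per iteration, noise relative to K=1: {v / noise[1]:.2f}")
```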
Variants and refinements
- Time-shift encoding (Krebs 2009 Appendix). Instead of ±1 binary, use random time-shifts per shot. This delocalises the cross-talk noise in TIME rather than in amplitude.
- Frequency-domain encoding (Plessix & Mulder 2008). Encode in the Fourier domain — random phase rotations per shot per frequency. Equivalent to mini-batch in frequency-domain FWI.
- Plane-wave decomposition. Convert the source line into plane-wave shots; invert plane waves instead of point shots. Naturally encodes via the plane-wave parameter.
- SGD vs Adam-style averaging. Moghaddam et al. 2013 propose averaging encoded gradients over a window of past iterations — the encoded version of momentum. Convergence is smoother.
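What the encoding variants above have in common is that the codes of different shots are uncorrelated, so cross-terms vanish in expectation. A minimal sketch for random phase codes, assuming one uniformly distributed phase per shot per draw (the sizes and variable names are hypothetical, and this checks only the codes, not any inversion scheme from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(4)
n_shots, n_draws = 4, 20000

# One random phase per shot per draw; a draw plays the role of one iteration.
theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_draws, n_shots))
codes = np.exp(1j * theta)

# Sample estimate of E[eps_i * conj(eps_j)]; it should approach the identity matrix.
corr = codes.T @ codes.conj() / n_draws
print(np.round(np.abs(corr), 2))
```

The ±1 signs used in the rest of this section are the special case where every phase is 0 or π.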
PINN-FWI source encoding
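The PINN-FWI version is structurally trivial: the data-fit term is summed over all shots. Replace it with the encoded version:
$$\mathcal{L}_{\mathrm{data}}^{\mathrm{enc}} = \left\| u_\theta\big|_{\mathrm{receivers}} - d_{\mathrm{enc}} \right\|^2, \qquad d_{\mathrm{enc}} = \sum_i \epsilon_i d_i,$$
where the wavefield network $u_\theta$ is now trained against the encoded super-shot data. Auto-diff handles backprop through both networks. Adam updates the two networks' parameters, $\theta$ (wavefield) and $\phi$ (model), with the noisy encoded gradient. With fresh $\epsilon_i$ each epoch, the noise averages out. Sun & Alkhalifah have demonstrated this for 2D Marmousi-class PINN-FWI at 5–10× compute savings vs shot-by-shot PINN-FWI.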
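The swap of data-fit terms can be sketched without a network. In the snippet below the array `P` is a stand-in for per-shot predictions at the receivers, and the encoded prediction is formed as their signed sum; that linearity is an assumption of this sketch, since a real PINN would instead evaluate a wavefield network trained on the super-shot. All names and sizes are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_shots, n_rec, n_t = 4, 5, 8

D = rng.normal(size=(n_shots, n_rec, n_t))   # recorded data d_i at the receivers
P = D + 0.1 * rng.normal(size=D.shape)       # stand-in for per-shot predictions


def shot_by_shot_data_loss():
    """Data-fit term summed over all shots."""
    return np.sum((P - D) ** 2)


def encoded_data_loss(eps):
    """Data-fit term for one encoded super-shot."""
    d_enc = np.tensordot(eps, D, axes=1)     # encoded data, shape (n_rec, n_t)
    u_enc = np.tensordot(eps, P, axes=1)     # stand-in for the network's super-shot prediction
    return np.sum((u_enc - d_enc) ** 2)


# One training epoch: draw fresh signs and evaluate the encoded loss.
eps = rng.choice([-1.0, 1.0], size=n_shots)
loss_one_epoch = encoded_data_loss(eps)

# Averaged over all sign patterns, the encoded loss matches the summed loss.
patterns = itertools.product([-1.0, 1.0], repeat=n_shots)
mean_encoded = np.mean([encoded_data_loss(np.array(p)) for p in patterns])
print(loss_one_epoch, mean_encoded, shot_by_shot_data_loss())
```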
What §6.8 will do
§6.8 returns to the loss-balance question of §3.2 in an FWI setting. The PINN-FWI joint loss has multiple weights that strongly affect convergence. The §6.8 widget demonstrates the simplest 2-term version on classical FWI (data misfit + Tikhonov regulariser) so the trade-off is visible in a single 80-sample misfit-landscape sweep. The full PINN-FWI version inherits all the same balancing concerns — covered in the §6.8 prose with cross-references to the §3.3 NTK-balance and §3.4 SA-PINN auto-tuning trio.
References
- Krebs, J.R., Anderson, J.E., Hinkley, D., Neelamani, R., Lee, S., Baumstein, A., Lacasse, M.-D. (2009). Fast full-wavefield seismic inversion using encoded sources. Geophysics 74(6), WCC177–WCC188.
- Moghaddam, P.P., Keers, H., Herrmann, F.J., Mulder, W.A. (2013). A new optimization approach for source-encoding full-waveform inversion. Geophysics 78(3), R125–R132.
- Plessix, R.-E., Mulder, W.A. (2008). Source separation in seismic full-waveform inversion using random Krylov methods. SEG Annual Meeting.