Multi-scale frequency continuation

Part 6 — Velocity inversion with PINNs

Learning objectives

  • Understand the Bunks 1995 frequency-continuation curriculum
  • See empirically that low-frequency data has a wider cycle-skipping basin
  • Race classical single-frequency FWI against a 3-stage frequency-continuation curriculum
  • Recognise this as the standard cure for cycle skipping in production FWI
  • Connect to the §3.6 PINN curriculum-learning section

§6.2 closed the FWI loop. §6.3 introduced the Marmousi benchmark. Both pointed at the same problem: when the starting model is far from truth, the data-fit gradient cycle-skips. Frequency continuation is the textbook cure. It was introduced for time-domain FWI by Bunks et al. (1995) and developed for frequency-domain FWI by Pratt (1999), and it has since become a standard ingredient of production FWI workflows.

The idea

Cycle skipping happens when the predicted-vs-observed traveltime mismatch exceeds half the source's dominant period (§6.1, §6.5). For a Ricker source with peak frequency $f_0$, descent stays inside the convex basin only while $|\Delta t| < 1/(2 f_0)$. So a LOWER-frequency source (smaller $f_0$, larger $\Delta t$ tolerance) has a WIDER basin of attraction.
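As a quick numerical reading of the half-period rule, the minimal sketch below checks whether a given traveltime mismatch is within tolerance for a few peak frequencies. The function names and the 40 ms mismatch are illustrative assumptions, not part of the course code.

```python
def half_period(f0_hz: float) -> float:
    """Half the dominant period, 1 / (2 f0), in seconds."""
    return 1.0 / (2.0 * f0_hz)


def is_cycle_skipped(dt_s: float, f0_hz: float) -> bool:
    """True when the traveltime mismatch is at or beyond the half-period tolerance."""
    return abs(dt_s) >= half_period(f0_hz)


mismatch_s = 0.040  # hypothetical 40 ms traveltime error
for f0 in (4.0, 11.0, 26.0):
    status = "cycle-skipped" if is_cycle_skipped(mismatch_s, f0) else "inside basin"
    print(f"f0 = {f0:4.1f} Hz  tolerance = {1e3 * half_period(f0):6.1f} ms  {status}")
```

The same 40 ms error is tolerable at the two lower frequencies and cycle-skipped at the highest one, which is the asymmetry the curriculum exploits.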

Bunks 1995 turned this observation into a curriculum:

  • Stage 1 (low-freq). Filter the recorded data to its lowest-frequency content. Run FWI from the smooth starting model. Wide basin → converges to a low-resolution velocity model that explains the low-frequency data.
  • Stage 2 (mid-freq). Filter to mid-band, restart from stage-1's converged model. The stage-2 basin is narrower, but it is still wide enough relative to the stage-1 model error → converges, adds mid-spatial-scale structure.
  • Stage 3 (high-freq). Use full-band data, restart from stage-2's converged model. Stage-3 basin is narrow but the stage-2 model is already inside it → converges to high-resolution truth.

This is exactly the same idea as the §3.6 curriculum-learning section for PINN training (start with $\omega = 2$, then $4$, then $8$, etc.). Bunks et al. (1995) came first; the PINN community rediscovered it in 2021 (Krishnapriyan et al., NeurIPS).

How the widget filters frequency

In production FWI, low-pass filters are applied to the recorded data and the source signature. Here we synthesise data on the fly, so we equivalently change the SOURCE PULSE WIDTH:

  • Stage 1 (low): Ricker with $\sigma = 0.080$ s. Peak frequency $f_0 \approx 4$ Hz, half-period basin $\sim 126$ ms.
  • Stage 2 (mid): $\sigma = 0.030$ s. $f_0 \approx 11$ Hz, half-period basin $\sim 47$ ms.
  • Stage 3 (high): $\sigma = 0.012$ s (same as §6.2). $f_0 \approx 26$ Hz, half-period basin $\sim 19$ ms.
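The three stage values above are consistent with $f_0 = 1/(\pi\sigma)$, which is the peak frequency of a Ricker pulse written as $(1 - 2t^2/\sigma^2)\,e^{-t^2/\sigma^2}$. That parameterisation is an assumption inferred from the listed numbers, since the widget source is not shown here. Under that assumption, a short sketch reproduces the table:

```python
import math


def ricker_peak_frequency(sigma_s: float) -> float:
    """Peak frequency in Hz, ASSUMING r(t) = (1 - 2 (t/sigma)^2) exp(-(t/sigma)^2)."""
    return 1.0 / (math.pi * sigma_s)


def half_period_ms(sigma_s: float) -> float:
    """Half-period tolerance 1 / (2 f0), converted to milliseconds."""
    return 1e3 / (2.0 * ricker_peak_frequency(sigma_s))


for name, sigma in [("low", 0.080), ("mid", 0.030), ("high", 0.012)]:
    f0 = ricker_peak_frequency(sigma)
    print(f"{name:4s} sigma = {sigma:.3f} s  f0 = {f0:5.1f} Hz  "
          f"half-period = {half_period_ms(sigma):5.1f} ms")
```

If the widget uses a different Ricker convention, only the constant in `ricker_peak_frequency` would change; the ordering of the three basins would not.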

Each stage runs 8 FWI iterations (the §6.2 recipe with Plessix gradient, partition-of-unity layer parameterisation, line search, c₁/c₃ frozen). The next stage starts from the previous stage's $(c_1, c_2, c_3)$. Total cost: $3 \times 8 = 24$ outer iterations.

Try it: race

[Interactive figure: Freq Continuation]

Pick $c_2 = 2.15$ to start (just inside the cycle-skipped boundary from §6.1's deep QA). Click ▶ Race. Two inversions run in series:

  • Single-frequency baseline: 25 iterations of §6.2-style FWI with the high-frequency source ($\sigma = 0.012$). Expected behaviour: the line search stalls because the gradient has the wrong sign in the cycle-skipped basin.
  • Frequency continuation: 24 iterations across 3 stages. Expected behaviour: stage 1 (low-freq) walks $c_2$ from its starting value toward 1.5, stage 2 sharpens, stage 3 finishes the job. Final $c_2 \approx 1.5$.

The c₂-trace panel shows both runs. Vertical dashed lines mark stage transitions in the freq-continuation curve. Watch the shape: the curriculum moves quickly during stage 1 (large updates from the wide basin), then refines progressively in stages 2 and 3. The single-frequency curve barely moves: its first iteration takes a small step in the wrong direction and then the line search fails.
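The race can be mimicked on a one-parameter toy problem. The sketch below is not the widget's code. It replaces the velocity model with a single pulse delay, the Plessix adjoint gradient with a finite difference, and the §6.2 line search with plain Armijo backtracking. The Ricker form, the 60 ms starting error, and all names are illustrative assumptions.

```python
import math

DT = 0.002                                   # sample interval in seconds
TIMES = [i * DT for i in range(-300, 301)]   # -0.6 s .. 0.6 s


def ricker(t, sigma):
    """ASSUMED Ricker form: (1 - 2 (t/sigma)^2) exp(-(t/sigma)^2)."""
    x2 = (t / sigma) ** 2
    return (1.0 - 2.0 * x2) * math.exp(-x2)


def misfit(delay, true_delay, sigma):
    """Discrete L2 misfit between a pulse at `delay` and the observed pulse."""
    return 0.5 * DT * sum(
        (ricker(t - delay, sigma) - ricker(t - true_delay, sigma)) ** 2
        for t in TIMES
    )


def descend(delay, true_delay, sigma, n_iter=8, h=1e-4, c=1e-4):
    """Descent on one parameter with an Armijo backtracking line search."""
    for _ in range(n_iter):
        j0 = misfit(delay, true_delay, sigma)
        grad = (misfit(delay + h, true_delay, sigma)
                - misfit(delay - h, true_delay, sigma)) / (2.0 * h)
        if grad == 0.0:
            break
        direction = -math.copysign(1.0, grad)
        step = 0.5 * sigma                   # first trial step: half a pulse width
        while step > 1e-5:
            trial = delay + direction * step
            if misfit(trial, true_delay, sigma) < j0 - c * step * abs(grad):
                delay = trial                # sufficient decrease: accept
                break
            step *= 0.5                      # otherwise shrink the step
    return delay


TRUE_DELAY, START = 0.0, 0.060               # hypothetical 60 ms starting error
STAGES = [0.080, 0.030, 0.012]               # low, mid, high sigma in seconds

# Baseline: all 24 iterations with the high-frequency pulse.
single = descend(START, TRUE_DELAY, STAGES[-1], n_iter=24)

# Continuation: 8 iterations per stage, each stage warm-started from the last.
multi = START
for sigma in STAGES:
    multi = descend(multi, TRUE_DELAY, sigma, n_iter=8)

print(f"single-frequency final delay error: {1e3 * (single - TRUE_DELAY):.1f} ms")
print(f"continuation final delay error:     {1e3 * (multi - TRUE_DELAY):.1f} ms")
```

In this setting the high-frequency run is expected to settle in a spurious local minimum tens of milliseconds from the truth, while the staged run reaches the true delay. Because the toy has one parameter and no noise, the low-frequency stage already finishes the job; the later stages are kept only to mirror the three-stage structure.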

Why it works mathematically

The FWI misfit $J(m) = \tfrac{1}{2} \int |F(m) - d|^2$ is non-convex. Its global minimum is at $m = m_{\mathrm{true}}$. Near the global minimum, $J$ is approximately quadratic, with a Hessian that depends on the data spectrum. The width of the convex basin around $m_{\mathrm{true}}$ scales with $1/\omega_{\mathrm{max}}$, where $\omega_{\mathrm{max}}$ is the highest frequency in the data. Lower $\omega_{\mathrm{max}}$ → wider basin.
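The scaling can be checked numerically on the simplest possible case: the $L^2$ misfit between a Ricker pulse and a time-shifted copy of itself. The sketch below scans the shift outward from zero and reports where the misfit stops increasing, which marks the edge of the basin. The Ricker form and all names are assumptions made for illustration.

```python
import math


def ricker(t, sigma):
    """ASSUMED Ricker form: (1 - 2 (t/sigma)^2) exp(-(t/sigma)^2)."""
    x2 = (t / sigma) ** 2
    return (1.0 - 2.0 * x2) * math.exp(-x2)


def misfit(shift, sigma, samples_per_sigma=20, half_window_sigmas=12):
    """L2 misfit between a pulse and a copy shifted by `shift` seconds."""
    dt = sigma / samples_per_sigma
    n = samples_per_sigma * half_window_sigmas
    total = 0.0
    for i in range(-n, n + 1):
        t = i * dt
        total += (ricker(t - shift, sigma) - ricker(t, sigma)) ** 2
    return 0.5 * dt * total


def basin_half_width(sigma, resolution=0.01):
    """Largest scanned shift (in seconds) before the misfit starts to fall."""
    ds = resolution * sigma
    k = 1
    prev = misfit(0.0, sigma)
    while True:
        cur = misfit(k * ds, sigma)
        if cur <= prev:
            return (k - 1) * ds
        prev = cur
        k += 1


for sigma in (0.080, 0.030, 0.012):
    width = basin_half_width(sigma)
    rule = math.pi * sigma / 2.0             # half-period if f0 = 1 / (pi sigma)
    print(f"sigma = {sigma:.3f} s  basin edge = {1e3 * width:6.1f} ms  "
          f"half-period rule = {1e3 * rule:6.1f} ms")
```

For this pulse shape the measured edge scales linearly with $\sigma$, that is with $1/f_0$ under the assumed relation, and sits somewhat inside the half-period estimate, so the rule is best read as a rule of thumb rather than an exact bound.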

Frequency continuation exploits this directly. By starting with low-$\omega_{\mathrm{max}}$ data, you make the optimisation problem close to convex around your starting model. As you converge and progressively add higher frequencies, you land closer and closer to $m_{\mathrm{true}}$, so that at every stage the basin around truth, although narrower, is still wider than the model error you have left to correct.

This is the same logic as the §3.6 PINN curriculum but applied to the data filtering rather than the loss target. The two ideas compose: production PINN-FWI uses BOTH a frequency-continuation curriculum on the data AND an architectural curriculum on $u_{\mathrm{NN}}$'s representational capacity.

Production refinements (not in this widget)

  • Optimal frequency selection. Sirgue & Pratt (2004) derived a frequency-selection strategy for FWI: each new frequency should start covering spatial scales EXACTLY where the previous one stopped, leaving no gap and no redundant overlap. With octave spacing, the number of frequency doublings is $\log_2(\omega_{\max}/\omega_{\min})$.
  • Per-stage convergence criteria. Production codes do not run a fixed iteration count per stage; they stop on $|\nabla J| < \tau$ or $\Delta J / J < \epsilon$. This widget uses a fixed budget for clarity.
  • Frequency-domain FWI. Pratt 1998, Operto 2007: do FWI directly in the frequency domain, one frequency at a time. Cheaper per stage, more elegant; but harder to integrate with time-domain $u_{tt}$-style adjoints.
  • Time-domain FWI with mute windows. Time-windowed FWI applies the same continuation idea IN TIME (early arrivals first, late arrivals later) rather than in spectrum. It is roughly equivalent to spectral filtering when phase information dominates.
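The first two refinements can be sketched in a few lines. The helper below builds a geometric frequency schedule and a per-stage stopping test. The function names, the default ratio of 2 (octave spacing), and the tolerance values are illustrative assumptions; a geometry-dependent ratio in the spirit of Sirgue & Pratt would replace the fixed 2.

```python
def geometric_schedule(f_min, f_max, ratio=2.0):
    """Frequencies f_min, f_min*ratio, ... with f_max as the final entry."""
    if f_min <= 0 or f_max < f_min or ratio <= 1.0:
        raise ValueError("need 0 < f_min <= f_max and ratio > 1")
    freqs = [f_min]
    while freqs[-1] * ratio < f_max:
        freqs.append(freqs[-1] * ratio)
    if freqs[-1] < f_max:
        freqs.append(f_max)                  # always finish on the full band
    return freqs


def stage_converged(grad_norm, j_prev, j_curr, tau=1e-6, eps=1e-3):
    """Stop a stage on a small gradient OR a small relative misfit decrease."""
    small_gradient = grad_norm < tau
    small_decrease = j_prev > 0 and (j_prev - j_curr) / j_prev < eps
    return small_gradient or small_decrease


schedule = geometric_schedule(4.0, 32.0)
print("stage frequencies (Hz):", schedule)
print("doublings:", len(schedule) - 1)       # equals log2(32 / 4) for ratio 2
```

A production loop would call `stage_converged` after every iteration and advance to the next entry of the schedule once it returns true, instead of spending a fixed budget of 8 iterations per stage.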

What §6.5 will do

Frequency continuation defeats most cycle-skipping cases, but not all: some starting models are SO far from truth that even a 1 Hz Ricker cannot bridge the gap. §6.5 introduces modern alternative FWI misfits: envelope-based (Wu et al. 2014), Wasserstein (Engquist & Froese 2014), and adaptive waveform inversion (Warner & Guasch 2016). Each replaces the $L^2$ misfit with a distance designed to stay convex over much larger data mismatches.

References

  • Bunks, C., Saleck, F.M., Zaleski, S., Chavent, G. (1995). Multiscale seismic waveform inversion. Geophysics 60(5), 1457–1473. The original time-domain frequency-continuation paper.
  • Pratt, R.G. (1999). Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model. Geophysics 64(3), 888–901.
  • Sirgue, L., Pratt, R.G. (2004). Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies. Geophysics 69(1), 231–248. The frequency-selection strategy paper.
  • Brossier, R., Operto, S., Virieux, J. (2009). Seismic imaging of complex onshore structures by 2D elastic frequency-domain full-waveform inversion. Geophysics 74(6), WCC105–WCC118.
  • Krishnapriyan, A. et al. (2021). Characterizing possible failure modes in physics-informed neural networks. NeurIPS. The §3.6 PINN curriculum-learning paper.
