Multi-scale frequency continuation

Part 6 — Velocity inversion with PINNs

Learning objectives

Understand the Bunks 1995 frequency-continuation curriculum
See empirically that low-frequency data has a wider basin of attraction, and so is less prone to cycle skipping
Race classical single-frequency FWI against a 3-stage frequency-continuation curriculum
Recognise this as the standard remedy for cycle skipping in production FWI
Connect to the §3.6 PINN curriculum-learning section

§6.2 closed the FWI loop. §6.3 introduced the Marmousi benchmark. Both pointed at the same problem: when the starting model is far from the truth, the data-fit gradient cycle-skips. Frequency continuation is the textbook cure. Bunks et al. (1995) introduced it for time-domain waveform inversion, the same low-to-high ordering is the natural strategy in frequency-domain FWI (Pratt 1999), and by 2010 it was standard practice in production FWI codes.

The idea

Cycle skipping happens when the predicted-vs-observed traveltime mismatch exceeds half the source's dominant period (§6.1, §6.5). For a Ricker source with peak frequency $f_0$, descent stays inside the convex basin only while, roughly, $|\Delta t| < 1/(2 f_0)$. So a LOWER-frequency source (smaller $f_0$, larger $\Delta t$ tolerance) has a WIDER basin of attraction.
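The half-period rule can be checked in its simplest setting: a single sinusoid of frequency $f_0$ compared with a copy delayed by $\Delta t$. Averaged over one period, the $L^2$ misfit is $1 - \cos(2\pi f_0 \Delta t)$, so its gradient $2\pi f_0 \sin(2\pi f_0 \Delta t)$ points back toward $\Delta t = 0$ only while $|\Delta t| < 1/(2 f_0)$. The sketch below is a monochromatic idealisation (a Ricker pulse is broadband, so its real basin edge differs somewhat), and the function names are ours, not the widget's.

```python
import math


def l2_misfit_vs_shift(f0: float, shift: float) -> float:
    """Period-averaged L2 misfit between sin(2 pi f0 t) and a copy delayed by `shift`.

    Monochromatic idealisation: averaging (a - b)^2 over one period gives
    1 - cos(2 pi f0 shift).
    """
    return 1.0 - math.cos(2.0 * math.pi * f0 * shift)


def misfit_gradient(f0: float, shift: float) -> float:
    """Derivative of l2_misfit_vs_shift with respect to the shift."""
    w = 2.0 * math.pi * f0
    return w * math.sin(w * shift)


def half_period(f0: float) -> float:
    """Half-period rule-of-thumb tolerance 1/(2 f0), in seconds."""
    return 1.0 / (2.0 * f0)


def descent_reduces_shift(f0: float, shift: float) -> bool:
    """True if a small step shift -> shift - eta * gradient moves the shift toward 0."""
    return misfit_gradient(f0, shift) * shift > 0.0


if __name__ == "__main__":
    for f0 in (4.0, 11.0, 26.0):
        print(f"f0 = {f0:4.1f} Hz  half-period = {1e3 * half_period(f0):5.1f} ms  "
              f"30 ms shift descends toward truth: {descent_reduces_shift(f0, 0.030)}")
```

At a 30 ms shift the 4 Hz and 11 Hz cases descend toward the truth and the 26 Hz case does not, matching half-periods of about 125 ms, 45 ms and 19 ms.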

Bunks 1995 turned this observation into a curriculum:

Stage 1 (low-freq). Filter the recorded data to its lowest-frequency content. Run FWI from the smooth starting model. Wide basin → converges to a low-resolution velocity model that explains the low-frequency data.
Stage 2 (mid-freq). Filter to the mid band and restart from stage 1's converged model. The stage-2 basin is narrower, but the remaining model error is now small enough to fit inside it → converges, adds mid-spatial-scale structure.
Stage 3 (high-freq). Use the full-band data and restart from stage 2's converged model. The stage-3 basin is the narrowest, but the stage-2 model already lies inside it → converges to the high-resolution model.

This is the same idea as the §3.6 curriculum-learning section for PINN training (start with $\omega = 2$, then $4$, then $8$, etc.). Bunks 1995 came first; the PINN community rediscovered it in 2021 (Krishnapriyan et al., NeurIPS).

In production FWI, low-pass filters are applied to both the recorded data and the source signature. Here we synthesise data on the fly, so we get the same effect by changing the SOURCE PULSE WIDTH:

Stage 1 (low): Ricker with $\sigma = 0.080$ s. Peak frequency $f_0 \approx 4$ Hz, half-period basin $\sim 126$ ms.
Stage 2 (mid): $\sigma = 0.030$ s. $f_0 \approx 11$ Hz, half-period basin $\sim 47$ ms.
Stage 3 (high): $\sigma = 0.012$ s, the same as §6.2. $f_0 \approx 26.5$ Hz, half-period basin $\sim 19$ ms.
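The three σ values reproduce the quoted peak frequencies and half-periods if the widget parameterises its Ricker pulse as $r(t) = (1 - 2t^2/\sigma^2)\,e^{-t^2/\sigma^2}$, whose amplitude spectrum peaks at $f_0 = 1/(\pi\sigma)$. That convention is our inference from the numbers above, not something the text states, and the helper names are hypothetical.

```python
import math


def ricker(t: float, sigma: float) -> float:
    """Ricker pulse in the assumed convention (1 - 2 t^2/sigma^2) exp(-t^2/sigma^2)."""
    x2 = (t / sigma) ** 2
    return (1.0 - 2.0 * x2) * math.exp(-x2)


def peak_frequency(sigma: float) -> float:
    """Spectral peak for that convention: sigma = 1 / (pi f0)."""
    return 1.0 / (math.pi * sigma)


def half_period_ms(sigma: float) -> float:
    """Half-period cycle-skipping tolerance 1/(2 f0), in milliseconds."""
    return 1e3 / (2.0 * peak_frequency(sigma))


STAGES = {"low": 0.080, "mid": 0.030, "high": 0.012}   # pulse widths from the list above

if __name__ == "__main__":
    for name, sigma in STAGES.items():
        print(f"{name:>4}: sigma = {sigma:.3f} s, f0 = {peak_frequency(sigma):5.2f} Hz, "
              f"half-period = {half_period_ms(sigma):6.1f} ms")
```

It prints peak frequencies of roughly 4.0, 10.6 and 26.5 Hz and half-periods of roughly 126, 47 and 19 ms.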

Each stage runs 8 FWI iterations (the §6.2 recipe: Plessix gradient, partition-of-unity layer parameterisation, line search, c₁/c₃ frozen). The next stage starts from the previous stage's $(c_1, c_2, c_3)$. Total cost: $3 \times 8 = 24$ outer iterations.

Try it: race

Pick $c_2 = 2.15$ to start (just inside the cycle-skipped region mapped in §6.1's deep QA). Click ▶ Race. Two inversions run in series:

Single-frequency baseline: 25 iterations of §6.2-style FWI with the high-frequency source ($\sigma = 0.012$). Expected behaviour: the line search stalls, because the gradient has the wrong sign in the cycle-skipped basin.
Frequency continuation: 24 iterations across 3 stages. Expected behaviour: stage 1 (low-freq) walks $c_2$ from its starting value toward 1.5, stage 2 sharpens, and stage 3 finishes the job. Final $c_2 \approx 1.5$.

The c₂-trace panel shows both runs. Vertical dashed lines mark stage transitions on the frequency-continuation curve. Watch the shape: the curriculum takes large steps during stage 1 (the wide basin gives a reliable descent direction far from truth), then refines progressively in stages 2 and 3. The single-frequency curve barely moves: its first iteration takes a small step in the wrong direction, and then the line search fails.
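The race can be mimicked in a one-parameter toy, a hypothetical stand-in rather than the widget's code. The only unknown is the arrival time $\tau$ of a single Ricker pulse (same assumed convention as above), the "observed" trace is that pulse at $\tau_{\mathrm{true}} = 0.5$ s, and each iteration is a Gauss-Newton update $\tau \leftarrow \tau - g/H$ on $J = \tfrac{1}{2}\sum (p - d)^2$. The §6.2 recipe (adjoint gradient, layered $c$, line search) is swapped for this closed-form update purely for brevity.

```python
import math

# Hypothetical toy: the only model parameter is the arrival time tau of one pulse.
DT = 1e-3                                  # 1 ms sampling (assumed)
T = [i * DT for i in range(1001)]          # 0 .. 1 s
TAU_TRUE = 0.5                             # true arrival time in seconds


def ricker(t, sigma):
    """Ricker pulse, assumed convention (1 - 2 t^2/sigma^2) exp(-t^2/sigma^2)."""
    x2 = (t / sigma) ** 2
    return (1.0 - 2.0 * x2) * math.exp(-x2)


def ricker_dt(t, sigma):
    """Time derivative of ricker(t, sigma)."""
    x2 = (t / sigma) ** 2
    return (2.0 * t / sigma**2) * (2.0 * x2 - 3.0) * math.exp(-x2)


def fwi_stage(tau, sigma, n_iter):
    """Run n_iter Gauss-Newton updates of tau on J = 0.5 * sum (p - d)^2."""
    d = [ricker(t - TAU_TRUE, sigma) for t in T]            # "observed" data
    for _ in range(n_iter):
        p = [ricker(t - tau, sigma) for t in T]              # predicted data
        dp = [-ricker_dt(t - tau, sigma) for t in T]         # dp / dtau
        grad = sum((p_i - d_i) * g_i for p_i, d_i, g_i in zip(p, d, dp))
        hess = sum(g_i * g_i for g_i in dp)                  # Gauss-Newton Hessian
        tau -= grad / hess                                   # DT factors cancel
    return tau


def race(tau0, stages=(0.080, 0.030, 0.012), iters_per_stage=8, baseline_iters=25):
    """High-frequency baseline vs. warm-started low -> mid -> high continuation."""
    single = fwi_stage(tau0, stages[-1], baseline_iters)
    tau = tau0
    for sigma in stages:
        tau = fwi_stage(tau, sigma, iters_per_stage)         # restart from previous stage
    return single, tau


if __name__ == "__main__":
    single, multi = race(TAU_TRUE + 0.025)                   # start 25 ms late
    print(f"single-frequency error: {1e3 * (single - TAU_TRUE):7.3f} ms")
    print(f"continuation error:     {1e3 * (multi - TAU_TRUE):7.3f} ms")
```

Started 25 ms late, the high-frequency baseline drifts further away, toward the neighbouring minimum of $J$ near $2.86\sigma \approx 34$ ms (unlike the widget, it slides into the cycle-skipped minimum rather than stalling), while the continuation run lands on $\tau_{\mathrm{true}}$. In this toy stage 1 alone already recovers the single parameter; in real FWI the low band only delivers a smooth model, and the later stages are what add resolution.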

Why it works mathematically

The FWI misfit $J(m) = \tfrac{1}{2} \int |F(m) - d|^2$ is non-convex. For noise-free data and an exact forward operator $F$, its global minimum is at $m = m_{\mathrm{true}}$. Near that minimum $J$ is approximately quadratic, with a Hessian that depends on the data spectrum. The width of the convex basin around $m_{\mathrm{true}}$ scales with $1/\omega_{\mathrm{max}}$, where $\omega_{\mathrm{max}}$ is the highest frequency in the data: lower $\omega_{\mathrm{max}}$ → wider basin.
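The scaling claim can be checked numerically for a pure time shift. For a wavelet $r$ delayed by $e$, $J(e) = E - C(e)$ with $C$ the autocorrelation, so the basin around $e = 0$ ends at the first local minimum of $C$. The sketch scans for that minimum on a fixed 1 ms grid for the three σ values, again assuming the Ricker convention inferred above. For this wavelet $C \propto (u^4 - 6u^2 + 3)\,e^{-u^2/2}$ with $u = e/\sigma$, whose first minimum is at $u = \sqrt{5 - \sqrt{10}} \approx 1.356$; since every frequency of a Ricker pulse scales as $1/\sigma$, a constant $e_1/\sigma$ is the same statement as basin width $\propto 1/\omega_{\max}$.

```python
import math


def ricker(t, sigma):
    """Ricker pulse, assumed convention (1 - 2 t^2/sigma^2) exp(-t^2/sigma^2)."""
    x2 = (t / sigma) ** 2
    return (1.0 - 2.0 * x2) * math.exp(-x2)


def basin_edge(sigma, dt=1e-3, t_max=0.6):
    """First shift e > 0 at which the L2 misfit J(e) = E - C(e) stops increasing.

    C(e) = dt * sum_i r(t_i) r(t_i - e) is the wavelet autocorrelation, so the
    edge of the basin around e = 0 is the first local minimum of C.
    """
    n = int(round(2 * t_max / dt))
    ts = [-t_max + i * dt for i in range(n + 1)]      # fixed physical time grid
    ref = [ricker(t, sigma) for t in ts]

    def corr(e):
        return dt * sum(a * ricker(t - e, sigma) for a, t in zip(ref, ts))

    step = sigma / 200.0                               # scan resolution
    e, c_prev = 0.0, corr(0.0)
    while True:
        c_next = corr(e + step)
        if c_next > c_prev:                            # C turns upward: basin edge
            return e
        e, c_prev = e + step, c_next


if __name__ == "__main__":
    for sigma in (0.080, 0.030, 0.012):
        e1 = basin_edge(sigma)
        f0 = 1.0 / (math.pi * sigma)                   # peak frequency, same convention
        print(f"sigma = {sigma:.3f} s: edge {1e3 * e1:6.1f} ms = {e1 / sigma:.3f} sigma"
              f" = {e1 * f0:.3f} peak periods")
```

The ratio comes out close to 1.356σ, about 0.43 of a peak period $1/f_0$, for all three widths: the basin scales with the wavelet period, and for this misfit the half-period figure is a slightly generous rule of thumb.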

Frequency continuation exploits this directly. Starting with low-$\omega_{\mathrm{max}}$ data makes the optimisation problem close to convex around the starting model. Each stage shrinks the model error, so that when higher frequencies are added the remaining error is smaller than the next, narrower basin, and the iterates land closer and closer to $m_{\mathrm{true}}$.

This is the same logic as the §3.6 PINN curriculum, applied to the data filtering rather than the loss target. The two ideas compose: a PINN-FWI pipeline can use BOTH a frequency-continuation curriculum on the data AND an architectural curriculum on the representational capacity of $u_{\mathrm{NN}}$.

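Optimal frequency selection. Sirgue & Pratt 2004 derived a frequency-selection strategy for FWI: each new frequency should just cover the wavenumber (spatial-scale) gap left by the previous one. For a flat reflector the selected frequencies form a geometric sequence $f_{n+1} = f_n/\alpha_{\min}$, with $\alpha_{\min} = 1/\sqrt{1 + (h_{\max}/z)^2}$ set by the maximum half-offset $h_{\max}$ and target depth $z$, so the number of stages grows like $\log(\omega_{\max}/\omega_{\min})/\log(1/\alpha_{\min})$. This reduces to $\log_2(\omega_{\max}/\omega_{\min})$ only when each frequency doubles.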
Per-stage convergence criteria. Production codes typically do not run a fixed iteration count per stage; they stop when $|\nabla J| < \tau$ or $\Delta J / J < \epsilon$. This widget uses a fixed budget for clarity.
Frequency-domain FWI. Pratt 1998, Operto 2007: do FWI directly in the frequency domain, one frequency at a time. Cheaper per stage and more elegant, but harder to integrate with time-domain $u_{tt}$-style adjoints.
Time-domain FWI with mute windows. Time-windowed FWI applies the continuation IN TIME (early arrivals first, later arrivals progressively) rather than in the spectrum. It serves the same purpose as spectral filtering, keeping the early iterations inside the basin of attraction, and the two can be combined.
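A minimal sketch of the Sirgue & Pratt (2004) frequency ladder in its simplest setting, assuming a homogeneous background, a flat reflector at depth $z$ and maximum half-offset $h_{\max}$. In that idealisation frequency $f$ images vertical wavenumbers between $\alpha_{\min} k(f)$ and $k(f)$, with $k(f) \propto f$ and $\alpha_{\min} = 1/\sqrt{1 + (h_{\max}/z)^2}$, so gap-free coverage gives $f_{n+1} = f_n/\alpha_{\min}$. The survey numbers and function names are hypothetical.

```python
import math


def alpha_min(max_half_offset, depth):
    """Cosine of the widest half-aperture angle for a flat reflector (homogeneous background)."""
    if max_half_offset <= 0 or depth <= 0:
        raise ValueError("need a positive half-offset and depth")
    r = max_half_offset / depth
    return 1.0 / math.sqrt(1.0 + r * r)


def select_frequencies(f_min, f_max, max_half_offset, depth):
    """Frequency ladder f_{n+1} = f_n / alpha_min, capped at f_max.

    Each f covers vertical wavenumbers in [alpha_min * k(f), k(f)] with k(f)
    proportional to f, so this spacing leaves no gap between consecutive stages.
    """
    if not 0 < f_min <= f_max:
        raise ValueError("need 0 < f_min <= f_max")
    a = alpha_min(max_half_offset, depth)
    freqs = [f_min]
    while freqs[-1] < f_max:
        freqs.append(min(freqs[-1] / a, f_max))
    return freqs


def stage_count(f_min, f_max, max_half_offset, depth):
    """Closed form for the ladder length (valid when f_max/f_min is not an exact power of 1/alpha_min)."""
    a = alpha_min(max_half_offset, depth)
    return 1 + math.ceil(math.log(f_max / f_min) / math.log(1.0 / a))


if __name__ == "__main__":
    # Hypothetical survey: 4 km maximum half-offset, target at 2 km depth, 2-16 Hz band.
    ladder = select_frequencies(2.0, 16.0, max_half_offset=4.0, depth=2.0)
    print([round(f, 2) for f in ladder])
```

With $h_{\max}/z = 2$ the ratio is $\sqrt{5} \approx 2.24$, so a 2 to 16 Hz band needs four frequencies (2, 4.47, 10 and 16 Hz, the last capped at $f_{\max}$); a geometry with $\alpha_{\min} = 1/2$ would give the doubling ladder behind the $\log_2$ rule of thumb.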

What §6.5 will do

Frequency continuation defeats most cycle-skipping cases, but not all: some starting models are SO far from truth that even a 1 Hz Ricker cannot bridge the gap. §6.5 introduces the modern alternative FWI misfits: envelope-based (Wu et al. 2014), Wasserstein (Engquist & Froese 2014), and adaptive waveform inversion (Warner & Guasch 2016). Each replaces the non-convex $L^2$ misfit with a distance that stays convex, or close to it, over a much wider range of traveltime mismatch.

References

Bunks, C., Saleck, F.M., Zaleski, S., Chavent, G. (1995). Multiscale seismic waveform inversion. Geophysics 60(5), 1457–1473. The original time-domain frequency-continuation paper.
Pratt, R.G. (1999). Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model. Geophysics 64(3), 888–901.
Sirgue, L., Pratt, R.G. (2004). Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies. Geophysics 69(1), 231–248. Frequency-selection strategy.
Brossier, R., Operto, S., Virieux, J. (2009). Seismic imaging of complex onshore structures by 2D elastic frequency-domain full-waveform inversion. Geophysics 74(6), WCC105–WCC118.
Krishnapriyan, A. et al. (2021). Characterizing possible failure modes in physics-informed neural networks. NeurIPS. The §3.6 PINN curriculum-learning paper.