Cycle skipping: detection and remedies

Part 6 — Velocity inversion with PINNs

Learning objectives

  • Define cycle skipping in terms of the half-period traveltime threshold
  • Recognise cycle skipping in convergence traces and gradient sign
  • Understand the three modern alternative misfits: envelope, Wasserstein, AWI
  • See empirically that envelope misfit cures cycle skipping at the cost of resolution
  • Know the production-FWI workflow: envelope → L² (multi-stage misfit continuation)

§6.4 cured cycle skipping by changing the DATA (low-pass filtering). This section cures it by changing the MISFIT FUNCTION. Both ideas compose in production: filter the data AND use a robust misfit, layered curriculum-style.

What cycle skipping looks like

The §6.1 widget showed the gradient sign flipping at extreme c₂ values. The §6.2 widget showed the line search stalling. The §6.4 widget showed the freq-continuation rescue. Here is the canonical signature in three pictures:

  • Predicted vs observed seismograms. The two waveforms look very similar in shape but one is shifted in time by more than half the dominant period. Visually: peaks of dpredd_{\mathrm{pred}} align with TROUGHS of dobsd_{\mathrm{obs}}.
  • The L² residual r=dpreddobsr = d_{\mathrm{pred}} - d_{\mathrm{obs}}. Has a "double pulse" shape — positive lobe at the predicted arrival, negative lobe at the observed arrival, both of comparable magnitude. The gradient J/m\partial J/\partial m correlates this residual with the time-reversed adjoint, giving CONSTRUCTIVE cancellation between the two lobes — wrong sign.
  • Convergence trace. JJ stalls at a non-zero plateau. The line search reduces step size to zero and gives up. The model error cctrue|c - c_{\mathrm{true}}| stays at its initial value (or grows).

The mathematical condition

Let T0=1/f0T_0 = 1/f_0 be the dominant period of the source. Let Δt\Delta t be the traveltime mismatch between the predicted and observed arrivals at a receiver. Cycle skipping happens when

Δt>T0/2.|\Delta t| > T_0 / 2 .

For a Ricker pulse with σ\sigma-parameterisation s(t)=(12u2)eu2s(t) = (1 - 2u^2) e^{-u^2}, u=(tt0)/σu = (t-t_0)/\sigma, the dominant frequency is f0=1/(πσ)f_0 = 1/(\pi \sigma) and the period is T0=πσT_0 = \pi \sigma. So the cycle-skipping threshold is

Δt>πσ/2.|\Delta t| > \pi \sigma / 2 .

For our §6.4 widget: σ=0.012\sigma = 0.012 s, so the threshold is 19\sim 19 ms. A 1D problem with travel-time differences exceeding 19 ms across the model is cycle-skipped at single-frequency.

Three modern alternative misfits

The trick: replace the L2L^2 misfit J=12(dpreddobs)2J = \tfrac{1}{2} \int (d_{\mathrm{pred}} - d_{\mathrm{obs}})^2 with something that is SMOOTH in time-shift, so the cycle-skipping non-convexity disappears. Three contenders:

  • Envelope misfit (Wu, Luo, Wu 2014). Replace each seismogram by its envelope env(s)(t)=s(t)2+ϵ2\mathrm{env}(s)(t) = \sqrt{s(t)^2 + \epsilon^2} before computing the residual. The envelope strips the carrier oscillations — peaks and troughs become bumps — so cycle-skipping is impossible at the envelope level. The price: resolution loss; the envelope smooths over the high-frequency content that distinguishes nearby reflectors. Production workflow: invert the envelope first, then refine with L².
  • Wasserstein / optimal-transport misfit (Engquist & Froese 2014; Métivier et al. 2016). Treat the seismograms as probability distributions (after positivity-shift normalisation) and compute the Wasserstein-1 or Wasserstein-2 distance between them. The Wasserstein metric is convex in time-shift by construction, so it cures cycle skipping cleanly. Mathematically elegant; computationally intensive — Métivier 2016's SOT (Sliced Optimal Transport) approximation made it production-tractable.
  • Adaptive Waveform Inversion (AWI) (Warner & Guasch 2016). Instead of comparing dpredd_{\mathrm{pred}} to dobsd_{\mathrm{obs}} directly, find the Wiener filter ww that best matches dpredwd_{\mathrm{pred}} \ast w to dobsd_{\mathrm{obs}}, then minimise the deviation of ww from a δ\delta-function. AWI absorbs time-shift residuals into the filter ww; the misfit only sees what is left. Provably immune to first-order cycle skipping; a workhorse misfit at the major service companies (CGG, BP, etc.).

Try it: misfit landscapes

Cycle Skip RemediesInteractive figure — enable JavaScript to interact.

The widget above computes both misfits as a function of the middle-layer velocity c2c_2 on the §6.2 problem, with c1=1.0c_1 = 1.0 and c3=1.2c_3 = 1.2 fixed at truth. 60 sampled values of c2[0.6,2.4]c_2 \in [0.6, 2.4] each take one FDTD forward solve. Plotted side-by-side on log y-axis, with local minima circled.

Empirical result on this problem (60-sample landscape):

  • L² landscape: typically 10 local minima — one near c2=1.5c_2 = 1.5 (truth) plus 9 spurious minima at multiples of the Ricker period offset. Each spurious minimum is a "cycle-skipping island" where the predicted-vs-observed traveltime mismatch is an integer number of half-periods. Gradient descent converges to whichever island the starting model lies in.
  • Envelope landscape: 1 minimum at c2=1.5c_2 = 1.5. The smoothed-s2s^2 envelope smooth(s2,σ)+ϵ2\sqrt{\mathrm{smooth}(s^2, \sigma) + \epsilon^2} removes the carrier oscillations that produce the L² wiggles. The result is a smooth bowl with a global minimum at truth.

This is the Wu, Luo, Wu 2014 figure-of-merit, made interactive. Their published version uses Hilbert-transform envelopes (computed via FFT) which produce essentially the same convex landscape.

Why we don't run iterative descent in this widget

You might expect a "race two FWI runs" widget like §6.4's. The mathematical claim of the convex landscape is that envelope-FWI gradient descent converges from any starting point. In practice, the simple s2+ϵ2\sqrt{s^2 + \epsilon^2} envelope has a known limitation: the gradient adjoint source

adj_src(t)=spred(t)smooth ⁣(envpenvoenvp,σ)(t)\textrm{adj\_src}(t) = s_{\mathrm{pred}}(t) \cdot \textrm{smooth}\!\left(\frac{\mathrm{env}_p - \mathrm{env}_o}{\mathrm{env}_p}, \sigma\right)(t)

still contains the carrier spred(t)s_{\mathrm{pred}}(t) as a multiplicative factor, so the adjoint wavefield λ(x,t)\lambda(x, t) inherits the same phase oscillations as L²'s. The correlation uttλdt\int u_{tt} , \lambda , dt then suffers the same phase-cancellation issues at the gradient level even though the misfit landscape is convex. The Wu 2014 paper carefully derives the Hilbert-envelope adjoint chain rule (which requires FFT in the backprop step too) to fix this — beyond this widget's in-browser scope.

Production envelope FWI codes use Hilbert envelopes; production frequency-continuation codes use band-pass filtering; production AWI codes use Wiener filter convolution. All three converge from cycle-skipped starting models because their misfits + their adjoint chain rules together preserve the convex landscape all the way to the gradient.

The production workflow: misfit continuation

Real production FWI runs a CASCADE of misfits:

  • First, low-frequency envelope inversion (immune to cycle skipping; resolves only large-scale structure).
  • Then, low-frequency L² inversion (warm-started from envelope; recovers mid-scale structure).
  • Then, frequency continuation through the §6.4 stages (refining successively higher-frequency content).
  • Finally, full-band L² inversion at the highest frequencies (high-resolution detail).

Each stage starts from the previous stage's converged model. This is the standard recipe in CGG's Geovation, Schlumberger's WesternGeco, and BP's reservoir-FWI workflows. The §6.4 frequency-continuation widget builds curriculum on the data dimension; this section's envelope idea builds curriculum on the misfit-function dimension; production stacks both.

What about PINN-FWI?

The same misfit-replacement trick works for PINN-FWI. Replace the LdataL_{\mathrm{data}} term in the joint loss

L=λdataLdata+λpdeLpde+\mathcal{L} = \lambda_{\mathrm{data}} \mathcal{L}_{\mathrm{data}} + \lambda_{\mathrm{pde}} \mathcal{L}_{\mathrm{pde}} + \ldots

by an envelope or Wasserstein version. Auto-diff handles the gradient automatically. Sun & Alkhalifah (2021) and Yang et al. (2023 Geophysics) both report PINN-FWI with envelope or Wasserstein misfits successfully recovering velocity from cycle-skipped starting models that defeated L²-PINN-FWI. The §6.4 frequency-continuation idea also applies — just bandpass-filter the data input to the PINN-FWI joint loss.

What §6.6 will do

§6.6 frees a SECOND layer parameter (the top-layer velocity c1c_1) in addition to the middle-layer c2c_2, and shows the resulting 2D misfit landscape J(c1,c2)J(c_1, c_2). The cross-talk between layer velocities — the inability of single-shot transmission data to independently constrain both — appears as an elongated valley in the heatmap. The same valley structure shows up at much larger scale in production multi-parameter FWI (P-wave velocity vs S-wave velocity vs density); the §6.6 prose covers the full vpv_p/vsv_s/ρ\rho generalisation and its remedies (multi-shot illumination, joint inversion with auxiliary data, impedance reparameterisation).

References

  • Wu, R.-S., Luo, J., Wu, B. (2014). Seismic envelope inversion and modulation signal model. Geophysics 79(3), WA13–WA24.
  • Engquist, B., Froese, B.D. (2014). Application of the Wasserstein metric to seismic signals. Communications in Mathematical Sciences 12(5), 979–988.
  • Métivier, L., Brossier, R., Mérigot, Q., Oudet, E., Virieux, J. (2016). Measuring the misfit between seismograms using an optimal transport distance: application to full waveform inversion. Geophys. J. Int. 205(1), 345–377.
  • Warner, M., Guasch, L. (2016). Adaptive waveform inversion: Theory. Geophysics 81(6), R429–R445.
  • Sun, B., Alkhalifah, T. (2021). The application of optimal transport for FWI in seismic exploration. Geophys. J. Int.
  • Yang, F., Ma, J., Lu, Y., Liu, X. (2023). Physics-informed deep learning for full-waveform inversion via Wasserstein loss. Geophysics.

This page is prerendered for SEO and accessibility. The interactive widgets above hydrate on JavaScript load.