Multi-scale failure modes in pure forward PINN

Part 5 — Forward modelling and where PINNs fall short

Learning objectives

Sweep the spatial frequency k of the source/IC and watch PINN's error grow
Confirm that FDTD's accuracy is k-independent (as long as Δx resolves the wavelength)
Connect this to the §0.9, §2.2, §3.1 spectral-bias diagnosis
Recognise that architectural fixes (Fourier features, SIREN) help but don't close the gap

§5.3 showed FDTD beating PINN on a fixed problem. This section asks: how does the gap scale with the difficulty of the problem? The answer: it widens dramatically as the wavefield gets multi-scale.

The setup

Same 1D wave problem, but with eigenmode IC $u(x, 0) = \sin(k \pi x)$ for $k \in \{1, 2, 3, 4, 6\}$ and zero initial velocity. Analytic solution: $u(x, t) = \sin(k \pi x) \cos(k \pi t)$, which corresponds to unit wave speed. Both temporal and spatial frequency scale linearly with $k$.

FDTD at $N_x = 400$ (well above Nyquist for all $k \le 6$, where Nyquist requires $N_x \geq 2 \cdot 2k = 4k$).
PINN: vanilla 2-32-32-1 tanh MLP (no Fourier features), trained for a fixed 1500 epochs.

Try it
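The FDTD half of the sweep can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the course's reference implementation: the domain is taken as $x \in [0, 1]$ with $u = 0$ at both ends, the wave speed is 1, the run ends at $t = 1$, the Courant number is 0.5 (at exactly 1 the 1D leapfrog scheme reproduces the solution exactly, which would hide the dispersion), and the error is a relative L² norm over the whole space-time grid. The function name `fdtd_rel_l2` and its parameters are hypothetical.

```python
import math


def fdtd_rel_l2(k, nx=400, courant=0.5, t_end=1.0):
    """Leapfrog FDTD for u_tt = u_xx on [0, 1]; returns space-time relative L2 error."""
    dx = 1.0 / nx
    n_steps = round(t_end / (courant * dx))
    dt = t_end / n_steps
    c2 = (dt / dx) ** 2                     # squared Courant number (wave speed assumed 1)

    mode = [math.sin(k * math.pi * i * dx) for i in range(nx + 1)]
    mode[0] = mode[nx] = 0.0                # enforce the Dirichlet ends exactly
    mode_sq = sum(m * m for m in mode)

    u_prev = mode[:]                        # u at t = 0
    u = mode[:]                             # u at t = dt, from a Taylor step with u_t = 0
    for i in range(1, nx):
        lap = u_prev[i + 1] - 2.0 * u_prev[i] + u_prev[i - 1]
        u[i] = u_prev[i] + 0.5 * c2 * lap

    err2 = 0.0
    ref2 = mode_sq                          # t = 0 contributes zero error
    for n in range(1, n_steps + 1):
        # u holds the numerical solution at t = n * dt; compare with the analytic one.
        a = math.cos(k * math.pi * n * dt)
        err2 += sum((ui - a * mi) ** 2 for ui, mi in zip(u, mode))
        ref2 += a * a * mode_sq
        # Advance one leapfrog step (the step after the last comparison is unused).
        u_next = [0.0] * (nx + 1)
        for i in range(1, nx):
            lap = u[i + 1] - 2.0 * u[i] + u[i - 1]
            u_next[i] = 2.0 * u[i] - u_prev[i] + c2 * lap
        u_prev, u = u, u_next
    return math.sqrt(err2 / ref2)


errors = {k: fdtd_rel_l2(k) for k in (1, 2, 3, 4, 6)}
for k, e in errors.items():
    print(f"k = {k}: FDTD relative L2 = {e:.2e}")
```

The absolute values depend on the assumed Courant number, end time and error norm, so they need not match the figures quoted in this section; the trend to look for is a smooth, mild growth with $k$. A NumPy version would vectorise the inner loops, and the PINN half of the sweep would reuse the same analytic solution as its reference.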

What you should observe

FDTD relative-L² grows mildly with $k$ (numerical dispersion is $O(k^2 \Delta x^2)$ for 2nd-order central differences). At $N_x = 400$ this means a few $\times 10^{-5}$ at $k = 1$, rising to ~$10^{-3}$–$10^{-2}$ at $k = 6$: at most about 1%.
PINN relative-L² climbs much more steeply: ~10–20% at $k = 1$, saturating near 100% (random-output level) for $k \ge 2$. The vanilla MLP at 1500 epochs simply cannot fit the high-frequency target: this is spectral bias.
The PINN/FDTD ratio is dominated by the PINN failure: 100× to 1000+× across the range. The spectral-bias problem is the dominant scaling story; FDTD's $O(k^2 \Delta x^2)$ dispersion is a footnote by comparison.
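The $O(k^2 \Delta x^2)$ order follows from the dispersion relation of the second-order central difference. A short derivation, assuming a plane wave $e^{i(\kappa x - \omega t)}$ with wavenumber $\kappa = k\pi$, wave speed $c$, and exact time integration ($\kappa$, $\omega$ and $c$ are notation introduced here for illustration):

$$
\omega_{\text{num}} = \frac{2c}{\Delta x}\,\sin\!\left(\frac{\kappa\,\Delta x}{2}\right)
\quad\Longrightarrow\quad
\frac{\omega_{\text{num}}}{c\,\kappa} = 1 - \frac{(\kappa\,\Delta x)^2}{24} + O\!\left((\kappa\,\Delta x)^4\right)
$$

With $\kappa = k\pi$ the relative phase-speed error is about $(k\pi\,\Delta x)^2 / 24$, so doubling $k$ roughly quadruples it. For a leapfrog time step at a fixed Courant number below 1, the time discretisation changes the constant in front but not the order.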

The diagnosis

This is pathology #2 from §3.1 (spectral bias) reasserting itself in seismic forward-modelling clothes. The NTK analysis of Tancik et al. (2020) predicts that the convergence rate of mode $k$ is proportional to the NTK eigenvalue at frequency $k$; equivalently, the training time needed to fit that mode scales like the inverse of the eigenvalue. These eigenvalues decay rapidly with frequency, so vanilla MLPs simply cannot resolve high-frequency content within any practical training budget.
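The scaling can be made concrete with the linearised (NTK) training dynamics. As a sketch, assume gradient descent with learning rate $\eta$ on a squared loss and a kernel that stays fixed during training, with eigenpairs $(\lambda_i, q_i)$; here $\hat{u}_n$ is the network output at the training points after $n$ iterations and $u$ the target values. All of these symbols are introduced for illustration only.

$$
q_i^\top (\hat{u}_n - u) \approx e^{-\eta \lambda_i n}\, q_i^\top (\hat{u}_0 - u)
\quad\Longrightarrow\quad
n_i(\varepsilon) \approx \frac{\ln(1/\varepsilon)}{\eta\,\lambda_i}
$$

Each eigen-direction of the error decays at its own rate $\eta \lambda_i$, so the number of iterations $n_i(\varepsilon)$ needed to shrink component $i$ by a factor $\varepsilon$ grows like $1/\lambda_i$. High-frequency directions carry small eigenvalues and are therefore fitted last. For a PINN the relevant kernel is built from the PDE-residual and boundary losses (Wang et al., 2022), but the mode-by-mode picture is the same.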

Architectural fixes from Part 2 partially help:

Fourier features (§2.2): $\gamma(x) = [\sin(2\pi B x), \cos(2\pi B x)]$ with random $B \sim \mathcal{N}(0, \sigma^2)$. Choose $\sigma$ to cover the $k$ band. Flattens the convergence curve substantially. Still slower than FDTD by ~100×.
SIREN (§2.3): $\sin(\omega_0 x)$ activations with carefully tuned $\omega_0$. Similar effect.
Multi-scale Fourier (§2.5): combine multiple $\sigma$ values. Best for genuinely multi-scale targets.
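The Fourier-feature mapping in the list above can be sketched in a few lines. This is a minimal illustration, assuming a 2D input $(x, t)$ as in the 2-32-32-1 network and a frequency matrix $B$ that is sampled once and then frozen; the helper names `make_fourier_features` and `make_multiscale_features` are hypothetical. The multi-scale variant here simply concatenates one embedding per $\sigma$, which is a simplification chosen for brevity and may differ from the construction used in §2.5. Since $\sin(k\pi x) = \sin(2\pi \cdot \tfrac{k}{2} \cdot x)$, covering the $k$ band suggests a $\sigma$ on the order of $k_{\max}/2$.

```python
import math
import random


def make_fourier_features(sigma, n_features=32, in_dim=2, seed=0):
    """Return gamma(v) = [sin(2*pi*B v), cos(2*pi*B v)] with B_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # B has shape (n_features, in_dim); it is drawn once and kept fixed.
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(n_features)]

    def gamma(v):
        phases = [2.0 * math.pi * sum(b * c for b, c in zip(row, v)) for row in B]
        return [math.sin(p) for p in phases] + [math.cos(p) for p in phases]

    return gamma


def make_multiscale_features(sigmas, n_features=32, in_dim=2, seed=0):
    """Concatenate one Fourier embedding per sigma (simplifying assumption)."""
    maps = [make_fourier_features(s, n_features, in_dim, seed + i)
            for i, s in enumerate(sigmas)]

    def gamma(v):
        return [f for m in maps for f in m(v)]

    return gamma


gamma = make_fourier_features(sigma=3.0)
features = gamma((0.25, 0.5))      # embed one (x, t) collocation point
print(len(features))               # 2 * n_features values feed the MLP's first layer
```

The embedded vector replaces the raw $(x, t)$ pair as the network input, so the first layer's width changes from 2 to `2 * n_features` (or that times the number of $\sigma$ values in the multi-scale case).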

None of these close the cost gap with FDTD for forward modelling. They open the door to inverse problems where the network represents the unknown velocity field, not the wavefield (Part 6).

References

Tancik, M., et al. (2020). Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS.
Wang, S., Yu, X., Perdikaris, P. (2022). When and why PINNs fail to train: A neural tangent kernel perspective. JCP 449. The NTK analysis of spectral bias in PINN.