Spectral bias and why physics needs it fixed

Neural networks from absolute zero

Learning objectives

See spectral bias in concrete form: a vanilla MLP fits low frequencies first and high frequencies last (or never)
Distinguish inductive bias from optimisation pathology
Recognise the two architectural fixes — Fourier-feature embeddings and SIREN — and what each buys
Understand why this matters more for PINNs than for ordinary regression

You now have everything you need to train a network. There remains one major surprise: a vanilla MLP — no matter how wide, no matter how well-optimised — has a strong inductive preference for low-frequency functions. It learns the low-frequency content of any target first, often quickly, and then plateaus on the high-frequency content. Sometimes it never learns the high frequencies at all, even with thousands of training epochs.

This phenomenon is called spectral bias. It was characterised rigorously around 2018–2020 (Rahaman et al 2019; Basri et al 2019; Tancik et al 2020) and has a clean theoretical explanation in terms of the neural tangent kernel (NTK). Under gradient descent, the part of the target along each eigenfunction of the NTK is learned at a rate set by the corresponding eigenvalue. A vanilla MLP's NTK has eigenvalues that decay roughly as a power law in frequency, so high-frequency components are learned far more slowly. The practical consequence: physics-informed networks built on vanilla MLPs fail at high-frequency wavefields unless we do something about it.
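The NTK argument can be made concrete. In the linearised (NTK) view with squared loss, gradient descent with step size $\eta$ updates the residual $r_t = f_t - y$ as $r_{t+1} = (I - \eta K) r_t$, so the component of $r_0$ along a kernel eigenvector with eigenvalue $\lambda_i$ shrinks as $(1 - \eta \lambda_i)^t$. The sketch below runs that recursion directly. As a simplifying assumption it uses a periodic Gaussian kernel as a stand-in for an MLP's NTK: its eigenvalues decay exponentially rather than as a power law, but it shows the same qualitative behaviour. The grid size, length scale, step count and the frequencies 2 and 24 are arbitrary illustrative choices.

```python
import numpy as np


def periodic_gaussian_kernel(n=256, length_scale=0.02):
    """Gram matrix of a Gaussian kernel on n evenly spaced points of [0, 1), with wrap-around.

    Stand-in for an MLP's NTK (assumption): on a periodic grid its eigenvectors are
    exactly the discrete Fourier modes, and its eigenvalues shrink as frequency grows.
    """
    x = np.arange(n) / n
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, 1.0 - d)  # distance measured around the unit circle
    return x, np.exp(-0.5 * (d / length_scale) ** 2)


def kernel_gradient_descent(K, y, steps, lr):
    """Linearised training dynamics f <- f - lr * K (f - y), starting from f = 0."""
    f = np.zeros_like(y)
    for _ in range(steps):
        f = f - lr * K @ (f - y)
    return f


def remaining_fraction(r0, rt, x, k):
    """Fraction of the initial residual's sin(2*pi*k*x) component that is still left."""
    mode = np.sin(2.0 * np.pi * k * x)
    return abs(rt @ mode) / abs(r0 @ mode)


x, K = periodic_gaussian_kernel()
y = np.sin(2.0 * np.pi * 2 * x) + np.sin(2.0 * np.pi * 24 * x)  # low + high frequency
lr = 1.0 / np.linalg.eigvalsh(K).max()  # lr * lambda_max = 1 keeps every mode stable
f = kernel_gradient_descent(K, y, steps=20, lr=lr)

low_left = remaining_fraction(-y, f - y, x, k=2)    # residual starts at r0 = 0 - y
high_left = remaining_fraction(-y, f - y, x, k=24)
print(f"low-frequency residual left:  {low_left:.2e}")
print(f"high-frequency residual left: {high_left:.2f}")
```

After the same 20 steps, the frequency-2 component of the residual is essentially gone while most of the frequency-24 component remains: the plateau in the loss curve is this slow mode.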

Two architectural fixes

Fourier feature embeddings (Tancik et al 2020). Before passing the input $x$ to the MLP, lift it to the higher-dimensional vector $\gamma(x) = (\sin(2\pi B x), \cos(2\pi B x))$, where $B$ is a fixed (not trained) random matrix with entries sampled i.i.d. from $\mathcal{N}(0, \sigma^2)$. The MLP now has high-frequency primitives to compose, and spectral bias is largely mitigated. The hyperparameter $\sigma$ sets the frequency scale of the embedding: too small and the network still misses the high frequencies; too large and the fit becomes noisy between training points.
SIREN (Sitzmann, Martel, Bergman, Lindell, Wetzstein 2020). Replace every hidden activation with $\sin$, scale the first layer's pre-activation by $\omega_0$ (typically 30), and use the SIREN initialisation. The result is a network in which every hidden layer is natively oscillatory. SIREN is exceptionally good for high-frequency targets but is sensitive to initialisation: with a standard initialisation, training typically stalls or produces poor fits.
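The Fourier-feature embedding is only a few lines of array code. A minimal NumPy sketch, assuming inputs of shape (N, d) and a matrix $B$ of shape (m, d); the function names, the feature count of 32, $\sigma = 10$ and the seed are illustrative choices, not an API from Tancik et al. The output has 2m columns, so the first MLP layer takes 2m inputs instead of d.

```python
import numpy as np


def sample_fourier_matrix(in_dim, num_features, sigma, seed=0):
    """Draw the fixed (untrained) matrix B with i.i.d. N(0, sigma^2) entries."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=(num_features, in_dim))


def fourier_features(x, B):
    """gamma(x) = [sin(2 pi B x), cos(2 pi B x)] for a batch x of shape (N, in_dim)."""
    proj = 2.0 * np.pi * x @ B.T  # (N, num_features)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


x = np.linspace(0.0, 1.0, 200)[:, None]  # 200 points in a 1-D domain
B = sample_fourier_matrix(in_dim=1, num_features=32, sigma=10.0)
gamma = fourier_features(x, B)  # shape (200, 64): this is what the MLP sees
```

Because $B$ is sampled once and then frozen, the same $B$ must be reused at training and evaluation time; a fixed seed is one simple way to guarantee that.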
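The SIREN recipe, in the convention used here (only the first pre-activation is multiplied by $\omega_0$), can be sketched as follows. The weight ranges follow Sitzmann et al.: first-layer weights uniform in $[-1/n, 1/n]$ and later-layer weights uniform in $[-\sqrt{6/n}, \sqrt{6/n}]$, with $n$ the layer's fan-in. The bias range, the linear output layer reusing the later-layer range, and the function names are assumptions for illustration.

```python
import numpy as np


def init_siren(layer_sizes, seed=0):
    """Weights for a SIREN with sizes like [1, 64, 64, 1] (the last layer is linear)."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        if i == 0:
            bound = 1.0 / fan_in  # first layer: U(-1/n, 1/n)
        else:
            bound = np.sqrt(6.0 / fan_in)  # later layers: U(-sqrt(6/n), sqrt(6/n))
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        # Bias range is an assumption (mirrors a common framework default).
        b = rng.uniform(-1.0 / np.sqrt(fan_in), 1.0 / np.sqrt(fan_in), size=fan_out)
        params.append((W, b))
    return params


def siren_forward(params, x, omega_0=30.0):
    """sin activation on every hidden layer; omega_0 scales only the first pre-activation."""
    h = x
    for i, (W, b) in enumerate(params[:-1]):
        z = h @ W + b
        if i == 0:
            z = omega_0 * z  # first layer only: stretch the input frequencies
        h = np.sin(z)
    W_out, b_out = params[-1]
    return h @ W_out + b_out  # linear output layer


params = init_siren([1, 64, 64, 1])
x = np.linspace(-1.0, 1.0, 256)[:, None]
u = siren_forward(params, x)  # shape (256, 1)
```

The $1/n$ first-layer range together with $\omega_0$ lets the first sine layer span several periods over $[-1, 1]$, and the $\sqrt{6/n}$ range keeps later pre-activations in a regime where $\sin$ neither saturates nor collapses to its linear part.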

Try it

In the interactive demo, three networks with the same hidden layers (1 → 64 → 64 → 1) train in parallel: a vanilla MLP, a Fourier-feature MLP, and a SIREN. All three use the same target, data, Adam optimiser, and learning rate. Press Play and watch the loss curves. The vanilla curve typically descends fast at first and then flattens; the Fourier-feature and SIREN curves descend further and capture the high-frequency content of the target. Switch to "Quad-frequency mix" for the most dramatic separation.

Three things to notice

Vanilla loss-curve plateau. The vanilla MLP's loss curve has a knee: a fast initial drop while it learns the low frequencies, then a much slower (or stalled) descent. The fitted curve in the vanilla panel will track the low-frequency envelope of the target but fail to produce the high-frequency wiggles. This is spectral bias made visible.
Fourier-feature loss continues. The Fourier-feature loss curve shows no such knee: it keeps falling because the network is using its high-frequency primitives, and the fit panel shows the high-frequency wiggles emerging.
SIREN can outperform both, especially on the highest-frequency target, but is more sensitive to the learning rate than the other two. If the learning rate is too high it diverges; if it is too low, training barely gets going. Try a learning rate around $5\times 10^{-3}$ for the cleanest run.
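The same diagnosis can be made outside the demo by looking at the residual in the frequency domain: if a fit tracks only the low-frequency envelope, the residual's spectrum is dominated by the missing high frequency. A minimal NumPy sketch, assuming target and prediction are sampled on a uniform grid over $[0, 1)$; the "fit" is a hand-built stand-in for a spectrally biased network, not the output of a trained model, and the frequencies 3 and 40 are arbitrary.

```python
import numpy as np


def residual_spectrum(target, prediction):
    """Amplitude per frequency (cycles per unit length) of the residual on a uniform grid over [0, 1).

    The 2/N scaling recovers sinusoid amplitudes; the DC and Nyquist bins come out doubled.
    """
    residual = target - prediction
    amplitudes = 2.0 * np.abs(np.fft.rfft(residual)) / residual.size
    freqs = np.fft.rfftfreq(residual.size, d=1.0 / residual.size)
    return freqs, amplitudes


n = 512
x = np.arange(n) / n
target = np.sin(2 * np.pi * 3 * x) + 0.5 * np.sin(2 * np.pi * 40 * x)
biased_fit = np.sin(2 * np.pi * 3 * x)  # stand-in: envelope right, wiggles missing

freqs, amps = residual_spectrum(target, biased_fit)
dominant = freqs[np.argmax(amps)]  # the frequency the "network" failed to learn
```

Here the residual carries essentially all of its energy at frequency 40 with amplitude 0.5, and none at frequency 3. Tracking this spectrum during training shows which frequencies are stuck, which the scalar loss curve cannot.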

Why this matters for PINNs

A wavefield in seismic exploration is fundamentally high-frequency: a 30 Hz wavefield propagating through a 4500 m/s medium has a spatial wavelength of $4500/30 = 150$ m. Across a 4 km exploration target, that is roughly 27 cycles ($4000/150 \approx 26.7$). Asking a vanilla MLP to fit such a wavefield is asking it to do exactly what spectral bias prevents. Without a Fourier-feature embedding or a SIREN, the PINN will learn a low-frequency blur and call it a wavefield.
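The wavelength arithmetic is worth making explicit, because it also suggests a rough frequency scale for a Fourier-feature embedding. If $x$ is rescaled so that the 4 km target maps to $[0, 1]$, the wavefield oscillates about 27 times per unit of $x$, and since $\sin(2\pi b x)$ completes $b$ cycles per unit, some entries of $B$ need magnitudes of that order. That last step is a heuristic, not a rule from the papers cited here, and the helper names below are illustrative.

```python
def wavelength_m(velocity_m_s, frequency_hz):
    """Spatial wavelength lambda = v / f, in metres."""
    return velocity_m_s / frequency_hz


def cycles_across(domain_m, velocity_m_s, frequency_hz):
    """Wavelengths across the domain, i.e. cycles per unit once x is rescaled to [0, 1]."""
    return domain_m / wavelength_m(velocity_m_s, frequency_hz)


lam = wavelength_m(4500.0, 30.0)              # 150.0 m
cycles = cycles_across(4000.0, 4500.0, 30.0)  # about 26.7, i.e. roughly 27 cycles
```

Slower layers or higher frequencies shorten the wavelength and raise the cycle count, so the scale should be checked against the slowest velocity and highest frequency of interest.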

This is one of the most consequential architectural choices in any PINN-for-seismology paper. Song-Alkhalifah-Waheed 2021 and Bin Waheed et al 2021 (frequency-domain PINNs for the acoustic Helmholtz equation) lean heavily on Fourier features. Smith-Azizzadenesheli-Ross 2021 (EikoNet, for travel-time fields) get away with vanilla MLPs because travel-time fields are smooth. The right choice is problem-specific, but a vanilla MLP applied to a genuinely high-frequency wavefield produces a low-frequency blur every time.

Beyond Fourier and SIREN

Modern alternatives include hash-grid encodings (Instant-NGP, Müller et al 2022), positional encodings borrowed from NeRF, and learnable Fourier features. Any encoding that hands the network oscillatory or spatially localised features to compose attenuates spectral bias to some degree. We will return to all of these in Part 2 when we systematically survey PINN architectures.
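For comparison with the random embedding, the NeRF positional encoding is a deterministic Fourier feature map with octave-spaced frequencies, $\gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \dots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p))$, applied to each coordinate. A minimal NumPy sketch; the function name and $L = 8$ are illustrative, and the output lists all sines before all cosines rather than interleaving them, which makes no difference to the MLP.

```python
import numpy as np


def positional_encoding(p, num_octaves=8):
    """NeRF-style encoding: sin/cos of 2^k * pi * p for k = 0..num_octaves-1, per coordinate.

    p has shape (N, d); the result has shape (N, 2 * num_octaves * d).
    """
    freqs = np.pi * 2.0 ** np.arange(num_octaves)  # (L,): pi, 2 pi, 4 pi, ...
    angles = p[:, :, None] * freqs                  # (N, d, L)
    angles = angles.reshape(p.shape[0], -1)         # (N, d * L)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


p = np.linspace(-1.0, 1.0, 100)[:, None]
enc = positional_encoding(p)  # shape (100, 16)
```

Compared with Gaussian $B$, the octave ladder covers a fixed, evenly spread range of scales; the highest frequency is set by $L$ rather than by $\sigma$.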

Pause-and-check. (1) Why does the vanilla MLP's loss curve flatten while the Fourier-feature one does not? (2) For a smooth target like $\sin(\pi x)$ alone (no high frequencies), do you expect spectral bias to matter? (3) Can you imagine a target where SIREN would lose to a vanilla MLP? What kind of structure would the target need to have?

References

Rahaman, N., Baratin, A., Arpit, D., et al. (2019). On the spectral bias of neural networks. ICML.
Tancik, M., Srinivasan, P.P., Mildenhall, B., et al. (2020). Fourier features let networks learn high-frequency functions in low-dimensional domains. NeurIPS.
Sitzmann, V., Martel, J.N.P., Bergman, A.W., Lindell, D.B., Wetzstein, G. (2020). Implicit neural representations with periodic activation functions (SIREN). NeurIPS.
Wang, S., Wang, H., Perdikaris, P. (2021). On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks. CMAME 384, 113938.