Fourier feature embeddings

Part 2 — Architectures for PINNs

Learning objectives

State the Fourier-feature recipe γ(x) = [sin(2πBx), cos(2πBx)] with B ~ N(0, σ²)
Recognise the σ Goldilocks zone: too small → still spectral-biased; matched → high-freq content recovered; too large → noise
Connect the recipe to the §0.9 spectral-bias result and the NTK-shaping interpretation behind it
Pick a sensible σ for a target whose frequency content is roughly known

The first architectural fix to the spectral-bias problem (§0.9) was published by Tancik, Srinivasan, Mildenhall and collaborators in 2020 and is, on paper, almost embarrassingly simple. Before passing the input $\mathbf{x}$ to a vanilla MLP, lift it through a fixed random Fourier feature mapping:

$$\gamma(\mathbf{x}) \;=\; \big[\,\sin(2\pi B \mathbf{x}),\ \cos(2\pi B \mathbf{x})\,\big]$$

where $B \in \mathbb{R}^{m \times d}$ is a random matrix with entries drawn independently from $\mathcal{N}(0, \sigma^2)$, and $\sin$ and $\cos$ act elementwise on the $m$-vector $2\pi B \mathbf{x}$. The output of $\gamma$ is therefore $2m$-dimensional. Pass $\gamma(\mathbf{x})$ into a standard MLP exactly as you would a raw input. The matrix $B$ is sampled once at network initialisation and is not trained; only the MLP's weights are trained.
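The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `make_fourier_features`, the seed handling, and the example values `m=64`, `sigma=8.0` are assumptions made here for demonstration.

```python
import numpy as np

def make_fourier_features(d, m, sigma, seed=0):
    """Build gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)] with a fixed random B.

    B has shape (m, d) with entries ~ N(0, sigma^2). It is sampled once
    here and never updated afterwards (it is not a trainable parameter).
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, sigma, size=(m, d))

    def gamma(x):
        # x: (n, d) batch of inputs -> (n, 2m) batch of features
        proj = 2.0 * np.pi * x @ B.T
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

    return gamma, B

# Hypothetical example: 1-D input, 64 random frequencies
gamma, B = make_fourier_features(d=1, m=64, sigma=8.0)
x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
features = gamma(x)
print(features.shape)  # (5, 128): 2m features per input point
```

The array `features` is what the downstream MLP would receive in place of the raw `x`. Because each frequency contributes a sine and a cosine, every feature vector has squared norm exactly $m$, so the embedding does not change the input scale as $\sigma$ varies.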

Why this works (the NTK story, briefly)

Spectral bias arises because a vanilla MLP's neural tangent kernel has eigenvalues that decay rapidly, roughly as a power law, as the frequency of the corresponding eigenfunction increases. Said in plain language: the optimisation gradient pulls toward low-frequency directions much faster than toward high-frequency ones, so the network learns the slow modes first and the fast modes only very slowly, often not at all within a practical training budget. The Fourier embedding modifies the kernel: the cosine and sine basis functions sample a band of frequencies determined by $\sigma$, and the kernel becomes approximately translation-invariant, with a much flatter eigenvalue spectrum within that band. The optimisation gradient now pulls roughly equally on all in-band frequencies, and high-frequency content can be learned at nearly the same rate as low-frequency content.

You can take that paragraph as cookbook fact for now; the proof is in Tancik et al. (2020), §3. The practical consequence is the σ Goldilocks zone described below.
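One piece of the NTK story is easy to check numerically. The inner product of two embedded points depends only on their separation, because $\sin a \sin b + \cos a \cos b = \cos(a - b)$ gives $\gamma(x)^\top \gamma(x') = \sum_j \cos\!\big(2\pi b_j (x - x')\big)$. The sketch below verifies this for a 1-D input. Note that it checks the embedding's own inner-product kernel, not the full NTK of the composed network; the values of `m`, `sigma` and the test points are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 256, 4.0
b = rng.normal(0.0, sigma, size=m)  # 1-D input, so B reduces to m frequencies

def gamma(x):
    # x: (n,) array of scalar inputs -> (n, 2m) features
    proj = 2.0 * np.pi * np.outer(x, b)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

def k(x, xp):
    # Inner product of the two embedded points
    return (gamma(np.array([x])) @ gamma(np.array([xp])).T).item()

# Two pairs with the same separation (0.15) at different locations
k_a = k(0.10, 0.25)
k_b = k(-0.70, -0.55)

# Closed form: sum_j cos(2*pi*b_j*(x - x'))
closed_form = np.cos(2.0 * np.pi * b * 0.15).sum()

print(np.isclose(k_a, k_b), np.isclose(k_a, closed_form))
```

Both comparisons agree to floating-point precision, which is the translation invariance the paragraph above refers to.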

The σ Goldilocks zone

The Fourier embedding's main hyperparameter is $\sigma$ (the feature count $m$ is the other). Its effect is sharp:

σ too small: the $B$ entries are tiny, so $2\pi B \mathbf{x}$ is a small angle; $\sin$ of a small angle is essentially linear in its input and $\cos$ is essentially constant. The network sees little more than a rescaled copy of the raw $\mathbf{x}$ and behaves like a vanilla MLP, so spectral bias survives. This is the failure mode the widget shows clearly.
σ matched (or moderately higher): when the embedding's frequency band covers the full spectral content of the target, the MLP can compose the required Fourier coefficients. The upper edge of the Goldilocks zone is generous: in the widget, both σ = 8 and σ = 32 reach essentially perfect fits, because the target's highest frequency (9 cycles) is well below σ = 32's effective bandwidth.
σ extremely large (orders of magnitude above the target frequency, e.g., σ ~ 1000 for a target with frequencies < 10): adjacent input points produce wildly different $\sin$ / $\cos$ patterns. The network sees something close to noise, because the features at neighbouring points are nearly uncorrelated and the MLP cannot pool signal across them. The widget's defaults stay below this regime. Tancik et al.'s practical experience is that σ between 1× and 2× the target frequency band is the sweet spot.
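The three regimes can be made quantitative by asking how similar the embeddings of two neighbouring inputs are. For a 1-D input, the normalised inner product $\gamma(x)^\top \gamma(x+\delta)/m$ is the sample mean of $\cos(2\pi b_j \delta)$, whose expectation under $b_j \sim \mathcal{N}(0, \sigma^2)$ is $\exp(-2\pi^2 \sigma^2 \delta^2)$. The sketch below compares the two; the spacing `delta = 0.01`, the feature count and the σ values are illustrative assumptions, not settings taken from the widget.

```python
import numpy as np

def similarity(sigma, delta, m=4096, seed=0):
    """Normalised inner product of gamma(x) and gamma(x + delta), 1-D input.

    gamma(x) . gamma(x + delta) / m equals the mean of cos(2*pi*b_j*delta),
    independent of x.
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma, size=m)
    return np.cos(2.0 * np.pi * b * delta).mean()

delta = 0.01  # assumed spacing between two neighbouring sample points
for sigma in (0.5, 8.0, 32.0, 1000.0):
    # Expectation of cos(2*pi*b*delta) for b ~ N(0, sigma^2)
    expected = np.exp(-2.0 * (np.pi * sigma * delta) ** 2)
    measured = similarity(sigma, delta)
    print(f"sigma={sigma:7.1f}  measured={measured:+.3f}  expected={expected:.3f}")
```

At small σ the similarity is close to 1 (neighbouring points look almost identical to the MLP, as with a raw input), while at very large σ it collapses toward 0, which is the "close to noise" regime described above.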

Try it

Four identical Fourier-feature MLPs train side by side, one for each σ in {0.5, 2, 8, 32}. The default target is a sum of cosines at frequencies 1, 4, 9 cycles per [-1, 1]. Press Play and watch what each panel resolves:

σ = 0.5: tracks only the slowest component (frequency 1). The middle and fast components are missing — this is the spectral-bias regime.
σ = 2: catches the medium component (frequency 4) too. The fast 9-cycle wiggle is still missing.
σ = 8: matched to the highest frequency. Clean fit across the full spectrum.
σ = 32: also reaches a clean fit in this experiment, because the target frequencies fit comfortably below σ = 32's effective bandwidth. The "noisy" failure mode appears for σ orders of magnitude above the target — typically σ ≥ 1000 for a target with frequencies < 10.
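For readers without access to the widget, a rough offline proxy is to fit the target by ridge-regularised linear least squares directly on the fixed Fourier features, with no MLP at all. This is a deliberately simplified stand-in: it measures only whether the random features can represent the target, not the MLP training dynamics the panels show, so its per-σ behaviour need not match the widget panel for panel. The target form (equal-amplitude cosines), the feature count, the grid size and the ridge strength are all assumptions made here.

```python
import numpy as np

def target(x):
    # Assumed form of the default target: equal-amplitude cosines with
    # 1, 4 and 9 full cycles across [-1, 1] (domain length 2).
    return sum(np.cos(np.pi * k * x) for k in (1, 4, 9))

def fit_rmse(sigma, m=64, n=256, ridge=1e-6, seed=0):
    """RMSE of a ridge least-squares fit on fixed random Fourier features."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma, size=m)
    x = np.linspace(-1.0, 1.0, n)
    proj = 2.0 * np.pi * np.outer(x, b)
    Phi = np.concatenate([np.sin(proj), np.cos(proj)], axis=1)  # (n, 2m)
    y = target(x)
    # Ridge regression written as an augmented least-squares problem
    A = np.vstack([Phi, np.sqrt(ridge) * np.eye(2 * m)])
    rhs = np.concatenate([y, np.zeros(2 * m)])
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.sqrt(np.mean((Phi @ w - y) ** 2))

errors = {sigma: fit_rmse(sigma) for sigma in (0.5, 2.0, 8.0, 32.0)}
for sigma, err in errors.items():
    print(f"sigma={sigma:5.1f}  train RMSE={err:.3e}")
```

In this proxy the σ = 8 features represent the target far better than the σ = 0.5 features, consistent with the qualitative ordering of the panels.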

How to pick σ in practice

If you know the highest spatial frequency in the target, set $\sigma$ to be of the same order. For a wavefield with maximum spatial frequency $f_{\max}$ (cycles per unit length) and domain length $L$, set $\sigma \approx f_{\max} \cdot L$, i.e. the number of cycles across the domain; this assumes the network input has been normalised to an interval of order unit length, as in the widget. The window is broad: anything from 0.5× to 2× that value usually works. If you do not know $f_{\max}$, scan $\sigma$ over a few orders of magnitude and pick the value with the lowest validation loss. For PINN problems where the target frequency content is implicit (e.g., a wavefield whose frequency depends on the velocity model), the multi-scale architectures of §2.5 sidestep the σ-tuning by combining several scales in one network.
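The two selection strategies above can be written down as small helpers. The function names, the scan range of 0.1 to 1000 and the two-points-per-decade spacing are hypothetical choices for illustration; the example values correspond to the widget's default target (9 cycles across $[-1, 1]$).

```python
import numpy as np

def sigma_from_fmax(f_max, L):
    """Rule of thumb from the text: sigma ~ f_max * L (cycles across the domain)."""
    return f_max * L

def sigma_scan(lo=0.1, hi=1000.0, per_decade=2):
    """Log-spaced sigma candidates for when f_max is unknown (assumed range)."""
    n = int(round(np.log10(hi / lo) * per_decade)) + 1
    return np.logspace(np.log10(lo), np.log10(hi), n)

# Known f_max: 4.5 cycles per unit length on a domain of length 2
sigma0 = sigma_from_fmax(f_max=4.5, L=2.0)
window = (0.5 * sigma0, 2.0 * sigma0)  # the broad 0.5x to 2x window
print(sigma0, window)

# Unknown f_max: train one model per candidate and keep the one with
# the lowest validation loss (the training loop is not shown here)
candidates = sigma_scan()
print(candidates)
```

A coarse log-spaced scan is usually enough for a first pass, since the usable window spans roughly a factor of four in σ.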

Why this matters for seismic PINNs

An exploration-seismic wavefield at 30 Hz in a 4500 m/s medium has a spatial wavelength of 150 m. Across a 4 km target, that is ~27 cycles, or $27/4 \approx 7$ cycles per km. Because the entries of $B$ are frequencies in cycles per unit of the network input, the σ needed to represent this wavefield depends on input normalisation: roughly 7 if $\mathbf{x}$ is fed in km, roughly 27 if the 4 km domain is rescaled to unit length. The exact recipe varies by paper, but the requirement that $\sigma$ scale with the wavefield frequency is universal in modern PINN-FWI work (Bin Waheed et al 2021; Song et al 2021). The Helmholtz formulation we will meet in Part 4 is particularly Fourier-feature-friendly because the frequency is explicit.
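The arithmetic in this example is short enough to check directly. The variable names are illustrative, and the last two lines are an assumption about how the numbers map onto σ: which one applies depends on the units in which the coordinates are fed to the network.

```python
frequency_hz = 30.0
velocity_m_s = 4500.0
domain_m = 4000.0

wavelength_m = velocity_m_s / frequency_hz      # 150 m
cycles_across_domain = domain_m / wavelength_m  # about 26.7
cycles_per_km = 1000.0 / wavelength_m           # about 6.7

# Assumed mapping to sigma, depending on input units:
sigma_if_x_in_km = cycles_per_km                # coordinates passed in km
sigma_if_x_normalised = cycles_across_domain    # 4 km domain rescaled to unit length

print(wavelength_m, round(cycles_across_domain, 1), round(cycles_per_km, 1))
```

Doubling the source frequency doubles both numbers, which is why σ has to track the wavefield frequency.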

Pause-and-check. (1) Switch to the "Chirp" target. Which σ wins? Why? (2) The σ = 0.5 panel produces a fit that looks essentially like a vanilla MLP's. Why is that the failure mode at small σ? (3) For the §1.3 Burgers PDE on $x \in [-1, 1]$, what σ would you start with for the Fourier embedding, and why?

References

Tancik, M., Srinivasan, P.P., Mildenhall, B., et al. (2020). Fourier features let networks learn high-frequency functions in low-dimensional domains. NeurIPS.
Wang, S., Wang, H., Perdikaris, P. (2021). On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks. CMAME 384, 113938.
Mildenhall, B., Srinivasan, P.P., Tancik, M., et al. (2020). NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV.
Rahaman, N., Baratin, A., Arpit, D., et al. (2019). On the spectral bias of neural networks. ICML.