Fourier feature embeddings

Part 2 — Architectures for PINNs

Learning objectives

  • State the Fourier-feature recipe γ(x) = [sin(2πBx), cos(2πBx)] with B ~ N(0, σ²)
  • Recognise the σ Goldilocks zone: too small → still spectral-biased; matched → high-freq content recovered; too large → noise
  • Connect the recipe to the §0.9 spectral-bias result and the NTK-shaping interpretation behind it
  • Pick a sensible σ for a target whose frequency content is roughly known

The first architectural fix to the spectral-bias problem (§0.9) was published by Tancik, Srinivasan, Mildenhall and collaborators in 2020 and is, on paper, almost embarrassingly simple. Before passing the input x to a vanilla MLP, lift it through a fixed, random Fourier-feature mapping:

$$\gamma(\mathbf{x}) \;=\; \big[\,\sin(2\pi B \mathbf{x}),\ \cos(2\pi B \mathbf{x})\,\big]$$

where B ∈ ℝ^{m×d} (d = input dimension, m = number of frequencies) is a random matrix with entries drawn from N(0, σ²). The output of γ is 2m-dimensional. Pass γ(x) into a standard MLP exactly as you would a raw input. The matrix B is sampled once at network initialisation and is not trained; only the MLP's weights are trained.
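A minimal NumPy sketch of the embedding follows. It assumes inputs arrive as a batch of shape (n, d); the helper names `sample_B` and `fourier_features` are illustrative, not taken from Tancik et al.'s code. B is drawn once with a fixed seed and then held constant, matching the "sampled once, never trained" rule.

```python
import numpy as np

def sample_B(m, d, sigma, seed=0):
    """Draw the fixed (untrained) projection matrix B with N(0, sigma^2) entries."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=sigma, size=(m, d))

def fourier_features(x, B):
    """gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)] for a batch of inputs.

    x: array of shape (n, d); B: array of shape (m, d).
    Returns an array of shape (n, 2m).
    """
    proj = 2.0 * np.pi * x @ B.T   # shape (n, m): row i is 2*pi*B x_i
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Example: 1-D inputs on [-1, 1], m = 64 frequencies, sigma = 8
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
B = sample_B(m=64, d=1, sigma=8.0)
gamma = fourier_features(x, B)   # this, not x, is what the MLP receives
print(gamma.shape)
```

In a PINN you would write the same two lines in an autodiff framework (JAX or PyTorch, for example), so that the derivatives of the network output with respect to x needed for the PDE residual pass through the smooth sin/cos layer.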

Why this works (the NTK story, briefly)

Spectral bias arises because the neural tangent kernel (NTK) of a vanilla MLP has a power-law eigenvalue spectrum that decays rapidly with frequency. In plain language: gradient descent reduces the error along low-frequency directions much faster than along high-frequency ones, so the network learns the slow modes first and the fast modes only very slowly, often not at all within a practical training budget. The Fourier embedding reshapes the kernel. Because sin(a)sin(c) + cos(a)cos(c) = cos(a − c), the inner product γ(x)·γ(x′) depends only on x − x′, so the kernel becomes approximately translation-invariant (stationary), and σ sets the band of frequencies over which its eigenvalue spectrum is roughly flat. Gradient descent now pulls roughly equally on all in-band frequencies, and high-frequency content can be learned at about the same rate as low-frequency content.

You can take that paragraph on trust for now; the full argument is in Tancik et al. (2020), §3. The practical consequence is the σ Goldilocks zone described below.
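The stationarity claim can be checked numerically. The sketch below uses a hypothetical NumPy helper, `feature_kernel`, for 1-D inputs that computes the normalised inner product of two embeddings. Note that this is the kernel of the embedding alone, not the full NTK of the MLP on top; in the infinite-width analysis the MLP's NTK depends on the embedded inputs only through this inner product (the embedding norms are constant), which is how it inherits the properties shown here. For B ~ N(0, σ²) the inner product averages to exp(−2π²σ²Δ²) with Δ = x − x′: a Gaussian whose width, about 1/(2πσ), is set by σ.

```python
import numpy as np

def feature_kernel(x1, x2, B):
    """Normalised inner product of 1-D Fourier embeddings.

    Since sin(a)sin(c) + cos(a)cos(c) = cos(a - c), entry (i, k) equals
    (1/m) * sum_j cos(2*pi*b_j*(x1[i] - x2[k])): it depends only on x1[i] - x2[k].
    """
    m = B.size
    p1 = 2.0 * np.pi * np.outer(x1, B)   # shape (len(x1), m)
    p2 = 2.0 * np.pi * np.outer(x2, B)   # shape (len(x2), m)
    g1 = np.concatenate([np.sin(p1), np.cos(p1)], axis=1)
    g2 = np.concatenate([np.sin(p2), np.cos(p2)], axis=1)
    return g1 @ g2.T / m

sigma, m = 8.0, 20000
B = np.random.default_rng(0).normal(0.0, sigma, size=m)   # fixed, untrained frequencies

# Stationarity: shifting both inputs by the same amount leaves the kernel unchanged
x = np.array([0.0, 0.3])
K = feature_kernel(x, x, B)
K_shift = feature_kernel(x + 0.25, x + 0.25, B)
print("max change under shift:", np.max(np.abs(K - K_shift)))

# Bandwidth: compare against the Gaussian limit exp(-2 pi^2 sigma^2 delta^2)
deltas = np.array([0.0, 0.01, 0.02, 0.05])
empirical = feature_kernel(deltas, np.zeros(1), B)[:, 0]
predicted = np.exp(-2.0 * np.pi**2 * sigma**2 * deltas**2)
print(np.round(empirical, 3), np.round(predicted, 3))
```

Rerunning with a larger σ narrows the Gaussian, which is the kernel-side picture of the embedding admitting higher frequencies.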

The σ Goldilocks zone

The key hyperparameter of the Fourier embedding is σ. Its effect falls into three regimes:

  • σ too small: the entries of B are tiny, so 2πBx is a small angle; sin of a small angle is essentially linear in it, and cos is essentially constant (≈ 1). The network therefore sees little more than a linear projection of the raw x and behaves like a vanilla MLP: spectral bias survives. The widget shows this failure mode clearly.
  • σ matched (or higher): when the embedding's frequencies span the full spectral content of the target, anywhere from matched to considerably higher, the MLP can compose the required Fourier coefficients. The upper edge of the Goldilocks zone is generous: in the widget, both σ = 8 and σ = 32 reach essentially perfect fits because the target's highest frequency (9 cycles) lies well within σ = 32's effective bandwidth.
  • σ extremely large (orders of magnitude above the target frequency, e.g., σ ~ 1000 for a target with frequencies below 10): adjacent input points now produce wildly different sin/cos patterns, so their embeddings are nearly uncorrelated. The network sees something close to noise: it can still fit the training points, but it cannot interpolate smoothly between them. The widget's defaults stay below this regime. Tancik et al.'s practical experience is that σ from matched to about 2× the target frequency band is the sweet spot.
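Both failure modes can be diagnosed directly on the embedding. This NumPy sketch (1-D inputs on a grid with spacing 0.01; `fourier_features` and `neighbour_similarity` are illustrative names) checks that a tiny σ reduces γ to a linear map plus a constant, and measures how similar the embeddings of neighbouring grid points are as σ grows. Reusing the same standard-normal draws z for every σ, via B = σz, keeps the comparison fair.

```python
import numpy as np

def fourier_features(x, B):
    """1-D Fourier features: x has shape (n,), B has shape (m,); returns (n, 2m)."""
    proj = 2.0 * np.pi * np.outer(x, B)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

rng = np.random.default_rng(0)
m = 256
x = np.linspace(-1.0, 1.0, 201)   # grid spacing 0.01
z = rng.standard_normal(m)        # shared draws; B = sigma * z

# (a) sigma too small: sin(2*pi*B x) ~ 2*pi*B x (linear) and cos(2*pi*B x) ~ 1 (constant)
B_small = 0.01 * z
g = fourier_features(x, B_small)
lin_err = np.max(np.abs(g[:, :m] - 2.0 * np.pi * np.outer(x, B_small)))
const_err = np.max(np.abs(g[:, m:] - 1.0))
print("deviation from linear / constant:", lin_err, const_err)

# (b) sigma far too large: embeddings of neighbouring grid points become nearly uncorrelated
def neighbour_similarity(sigma):
    g = fourier_features(x, sigma * z)
    # normalised inner product between each grid point and its right-hand neighbour
    return np.mean(np.sum(g[:-1] * g[1:], axis=1) / m)

for sigma in (0.5, 8.0, 1000.0):
    print(f"sigma={sigma:7.1f}  neighbour similarity={neighbour_similarity(sigma):.3f}")
```

A similarity near 1 means neighbours look alike (smooth interpolation is easy); near 0, each training point is effectively an isolated random code, which is the "noise" regime.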

Try it

Interactive figure: Fourier Scale

Four Fourier-feature MLPs, identical except for σ ∈ {0.5, 2, 8, 32}, train side by side. The default target is a sum of cosines at 1, 4 and 9 cycles per [-1, 1]. Press Play and watch what each panel resolves:

  • σ = 0.5: tracks only the slowest component (frequency 1). The middle and fast components are missing; this is the spectral-bias regime.
  • σ = 2: also catches the medium component (frequency 4). The fast 9-cycle wiggle is still missing.
  • σ = 8: matched to the highest frequency. Clean fit across the full spectrum.
  • σ = 32: also reaches a clean fit in this experiment, because the target frequencies lie comfortably within σ = 32's effective bandwidth. The "noisy" failure mode appears only for σ orders of magnitude above the target, typically σ ≥ 1000 for a target with frequencies below 10.

How to pick σ in practice

If you know the highest spatial frequency in the target, set σ to be of the same order. For a wavefield with maximum frequency f_max (in cycles per unit length) on a domain of length L, with the input rescaled so that the domain has unit length, set σ ≈ f_max·L, i.e. the number of cycles of the fastest component across the domain. The window is broad: anything from 0.5× to 2× that value usually works. If you do not know f_max, scan σ over a few orders of magnitude and pick the value with the lowest validation loss. For PINN problems where the target frequency content is implicit (e.g., a wavefield whose frequency depends on the velocity model), the multi-scale architectures of §2.5 sidestep σ-tuning by combining several scales in one network.
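The scan can be sketched cheaply. In the NumPy example below, a stand-in chosen for brevity, only a linear read-out on top of the fixed Fourier features is fitted (closed-form ridge regression) instead of training a full MLP or PINN; it therefore shows which σ can represent the target, not the training-speed effects seen in the widget. The target (1, 4 and 9 cycles across a unit-length domain, so f_max·L = 9), the sample sizes, the ridge parameter `lam` and the helper names are all assumptions.

```python
import numpy as np

def features(x, B):
    """1-D Fourier features scaled by 1/sqrt(m), so each row has unit norm."""
    proj = 2.0 * np.pi * np.outer(x, B)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1) / np.sqrt(B.size)

def target(x):
    # Hypothetical test signal on [0, 1]: 1, 4 and 9 cycles across the domain
    return sum(np.cos(2.0 * np.pi * k * x) for k in (1, 4, 9))

def val_loss(sigma, z, x_tr, y_tr, x_va, y_va, lam=1e-6):
    """Fit a linear read-out on the fixed features (ridge regression); return validation MSE."""
    B = sigma * z
    Phi = features(x_tr, B)
    # Dual-form ridge solve: w = Phi^T (Phi Phi^T + lam I)^{-1} y
    alpha = np.linalg.solve(Phi @ Phi.T + lam * np.eye(len(x_tr)), y_tr)
    w = Phi.T @ alpha
    return np.mean((features(x_va, B) @ w - y_va) ** 2)

rng = np.random.default_rng(0)
x_tr, x_va = rng.uniform(0.0, 1.0, 300), rng.uniform(0.0, 1.0, 300)
y_tr, y_va = target(x_tr), target(x_va)
z = rng.standard_normal(256)   # one set of standard-normal draws, rescaled by each sigma

f_max, L = 9.0, 1.0            # highest frequency (cycles per unit length), domain length
print("rule-of-thumb sigma:", f_max * L)

sigmas = [0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
losses = {s: val_loss(s, z, x_tr, y_tr, x_va, y_va) for s in sigmas}
best = min(losses, key=losses.get)
for s in sigmas:
    print(f"sigma={s:7.1f}  val MSE={losses[s]:.3e}")
print("best sigma on this grid:", best)
```

Replacing `val_loss` with a full training run per σ turns this into the validation scan described above; the log-spaced grid is what "a few orders of magnitude" means in practice.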

Why this matters for seismic PINNs

An exploration-seismic wavefield at 30 Hz in a 4500 m/s medium has a spatial wavelength of 150 m. Across a 4 km target, that is ~27 cycles. To represent it, the Fourier embedding needs σ on the order of the wavefield's spatial frequency measured in cycles per unit of network input: about 27/4 ≈ 7 if the input is in kilometres, or about 27 if the 4 km domain is rescaled to [0, 1]. The exact recipe varies by paper, but the requirement that σ scale with the wavefield frequency is universal in modern PINN-FWI work (Bin Waheed et al. 2021; Song et al. 2021). The Helmholtz formulation we will meet in Part 4 is particularly Fourier-feature-friendly because the frequency is explicit.
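The arithmetic above can be written as a small helper. This is a sketch in plain Python, assuming a single frequency and a constant velocity; `sigma_for_wavefield` and its `input_unit_m` parameter (the physical length mapped to one unit of network input) are hypothetical names introduced here. It makes the normalisation dependence explicit.

```python
def sigma_for_wavefield(freq_hz, velocity_m_s, domain_m, input_unit_m):
    """Rule-of-thumb Fourier-feature scale for a monochromatic wavefield.

    input_unit_m: physical length (metres) corresponding to one unit of network input.
    Returns (wavelength in m, cycles across the domain, sigma in cycles per input unit).
    """
    wavelength = velocity_m_s / freq_hz     # spatial wavelength, lambda = v / f
    cycles = domain_m / wavelength          # cycles across the whole domain
    sigma = input_unit_m / wavelength       # cycles per unit of network input
    return wavelength, cycles, sigma

# Inputs measured in km: 30 Hz, 4500 m/s, 4 km domain
print(sigma_for_wavefield(30.0, 4500.0, 4000.0, input_unit_m=1000.0))
# Inputs rescaled so the 4 km domain maps to [0, 1]
print(sigma_for_wavefield(30.0, 4500.0, 4000.0, input_unit_m=4000.0))
```

With inputs on [-1, 1] instead, one input unit covers 2 km (`input_unit_m=2000.0`), and the same wavefield calls for roughly twice the kilometre-based σ.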

Pause-and-check. (1) Switch to the "Chirp" target. Which σ wins? Why? (2) The σ = 0.5 panel produces a fit that looks essentially like a vanilla MLP's. Why is that the failure mode at small σ? (3) For the §1.3 Burgers PDE on x ∈ [-1, 1], what σ would you start with for the Fourier embedding, and why?

References

  • Tancik, M., Srinivasan, P.P., Mildenhall, B., et al. (2020). Fourier features let networks learn high-frequency functions in low-dimensional domains. NeurIPS.
  • Wang, S., Wang, H., Perdikaris, P. (2021). On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks. CMAME 384, 113938.
  • Mildenhall, B., Srinivasan, P.P., Tancik, M., et al. (2020). NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV.
  • Rahaman, N., Baratin, A., Arpit, D., et al. (2019). On the spectral bias of neural networks. ICML.
