Fourier feature embeddings
Learning objectives
- State the Fourier-feature recipe γ(x) = [sin(2πBx), cos(2πBx)] with B ~ N(0, σ²)
- Recognise the σ Goldilocks zone: too small → still spectral-biased; matched → high-freq content recovered; too large → noise
- Connect the recipe to the §0.9 spectral-bias result and the NTK-shaping interpretation behind it
- Pick a sensible σ for a target whose frequency content is roughly known
The first architectural fix to the spectral-bias problem (§0.9) was published by Tancik, Srinivasan, Mildenhall and collaborators in 2020 and is, on paper, almost embarrassingly simple. Before passing the input to a vanilla MLP, lift it through a fixed random Fourier transformation:
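$$\gamma(x) = \big[\sin(2\pi B x),\ \cos(2\pi B x)\big],$$
where $B \in \mathbb{R}^{m \times d}$ is a random matrix with entries drawn from $\mathcal{N}(0, \sigma^2)$ and $d$ is the input dimension. The output of $\gamma$ is $2m$-dimensional. Pass $\gamma(x)$ into a standard MLP exactly as you would a raw input. The matrix $B$ is sampled once at network initialisation and is not trained; only the MLP's weights are trained.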
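The recipe γ(x) = [sin(2πBx), cos(2πBx)] with B ~ N(0, σ²) fits in a few lines. The sketch below is one possible NumPy version, not the reference code from Tancik et al. It assumes m frequency rows in B, so the embedding has 2m columns. The function names `make_fourier_features` and `gamma` are illustrative only.

```python
import numpy as np

def make_fourier_features(d_in, m, sigma, seed=0):
    """Sample the fixed matrix B (m x d_in) with entries ~ N(0, sigma^2).

    B is drawn once here and never updated; only the downstream MLP trains.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=(m, d_in))

def gamma(x, B):
    """Fourier-feature embedding gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)].

    x: array of shape (n, d_in); returns an array of shape (n, 2m).
    """
    proj = 2.0 * np.pi * x @ B.T          # (n, m) angles 2*pi*B x
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Usage: a 1-D input on [-1, 1] lifted to a 2 * 128 = 256-dim feature vector.
x = np.linspace(-1.0, 1.0, 200)[:, None]   # (200, 1)
B = make_fourier_features(d_in=1, m=128, sigma=8.0)
features = gamma(x, B)
print(features.shape)  # (200, 256)
```

The `features` array is what the MLP receives in place of `x`. B is held fixed and is not an optimiser parameter. The seed is exposed so that the same B can be reused when reloading a trained model.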
Why this works (the NTK story, briefly)
Spectral bias arises because a vanilla MLP's neural tangent kernel (NTK) has a power-law eigenvalue spectrum that decays rapidly with eigenfrequency. In plain language, gradient descent shrinks the error along low-frequency directions much faster than along high-frequency ones. The network therefore learns the slow modes first and the fast modes only very slowly, if at all. The Fourier embedding reshapes the kernel. The sine and cosine features sample a band of frequencies whose width is set by $\sigma$. The resulting kernel becomes approximately translation-invariant (it depends only on $x - y$) and has a much flatter eigenvalue spectrum within that band. Gradient descent now pulls roughly equally on all in-band frequencies, so high-frequency content can be learned at about the same rate as low-frequency content.
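The translation-invariance claim can be checked numerically. The identity sin a sin c + cos a cos c = cos(a − c) gives γ(x)·γ(y) = Σⱼ cos(2π bⱼ·(x − y)), so the embedding's inner product depends only on x − y. For Gaussian B its expected value is exp(−2π²σ²|x − y|²), a Gaussian whose width shrinks like 1/σ. That is the sense in which σ sets the band. This is a minimal sketch of the feature kernel alone, not the full NTK of the MLP stacked on top. The names `feature_kernel` and `stationary_kernel` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, m = 8.0, 512
B = rng.normal(0.0, sigma, size=(m, 1))     # fixed random frequencies

def gamma(x, B):
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def feature_kernel(x, y, B):
    """Inner products of embeddings, normalised by the number of features m."""
    return gamma(x, B) @ gamma(y, B).T / B.shape[0]

def stationary_kernel(delta, B):
    """The same quantity written as a function of delta = x - y only:
    mean_j cos(2*pi * b_j . delta)."""
    return np.cos(2.0 * np.pi * delta @ B.T).mean(axis=-1)

x = np.array([[0.10], [0.55]])
y = np.array([[0.30], [0.75]])   # both pairs have x - y = -0.2
k_diag = np.diag(feature_kernel(x, y, B))   # k(x_i, y_i) for each pair
print(k_diag)                                # equal up to rounding: only x - y matters
print(stationary_kernel(x - y, B))
```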
You can take that paragraph as cookbook fact for now; the proof is in Tancik et al 2020 §3. The practical consequence is the σ Goldilocks zone described below.
The σ Goldilocks zone
The Fourier embedding's key hyperparameter is the scale $\sigma$. Its effect is sharp:
- σ too small: the entries of $B$ are tiny, so $2\pi B x$ is a small angle. Over that range $\sin$ is essentially linear in $x$ and $\cos$ is essentially constant. The network sees almost the raw $x$ and behaves like a vanilla MLP, so spectral bias survives. This is the failure mode the widget shows most clearly.
- σ matched (or higher): the MLP can compose the required Fourier coefficients whenever the embedding's frequencies span the full spectral content of the target, anywhere from matched to well above it. The upper edge of the Goldilocks zone is generous. In the widget, both σ = 8 and σ = 32 reach essentially perfect fits because the target's highest frequency (9 cycles) is well below σ = 32's effective bandwidth.
- σ extremely large (orders of magnitude above the target frequency, e.g. σ ~ 1000 for a target with frequencies below 10): adjacent input points produce nearly uncorrelated $\sin$/$\cos$ patterns. The induced kernel becomes so narrow that each training point says almost nothing about its neighbours. The MLP then interpolates the samples and fills the gaps between them with high-frequency noise. The widget's defaults stay below this regime. Tancik et al.'s practical experience is that a σ between matched and 2× the target frequency band is the sweet spot.
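One way to see the three regimes above is to measure how similar the embeddings of two neighbouring input points are. The sketch uses the normalised inner product mean_j cos(2π bⱼ δ) for a small offset δ and compares it with its Gaussian-B expectation exp(−2π²σ²δ²). The grid spacing (256 points on [-1, 1]) is an assumption, not the widget's actual setting. `adjacent_correlation` is an illustrative name.

```python
import numpy as np

def adjacent_correlation(sigma, spacing, m=1024, seed=0):
    """Normalised similarity of gamma(x) and gamma(x + spacing).

    Uses gamma(x).gamma(y) / m = mean_j cos(2*pi*b_j*(x - y)), so the result
    is 1 for identical embeddings and close to 0 for unrelated ones.
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma, size=m)
    return np.cos(2.0 * np.pi * b * spacing).mean()

spacing = 2.0 / 256   # assumed grid spacing: 256 points on [-1, 1]
for sigma in [0.5, 8.0, 1000.0]:   # too small, in range, extremely large
    empirical = adjacent_correlation(sigma, spacing)
    expected = np.exp(-2.0 * np.pi**2 * sigma**2 * spacing**2)  # Gaussian-B limit
    print(f"sigma={sigma:7.1f}  empirical={empirical:+.3f}  expected={expected:.3f}")
```

At σ = 0.5 neighbouring embeddings are almost identical, which is the near-linear regime. At σ = 1000 they are essentially unrelated, so the correlation sits near zero up to sampling noise of order 1/√m.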
Try it
Four identical Fourier-feature MLPs train side by side, one for each σ in {0.5, 2, 8, 32}. The default target is a sum of cosines at frequencies 1, 4, 9 cycles per [-1, 1]. Press Play and watch what each panel resolves:
- σ = 0.5: tracks only the slowest component (frequency 1). The middle and fast components are missing — this is the spectral-bias regime.
- σ = 2: catches the medium component (frequency 4) too. The fast 9-cycle wiggle is still missing.
- σ = 8: matched to the highest frequency. Clean fit across the full spectrum.
- σ = 32: also reaches a clean fit in this experiment, because the target frequencies fit comfortably below σ = 32's effective bandwidth. The "noisy" failure mode appears for σ orders of magnitude above the target — typically σ ≥ 1000 for a target with frequencies < 10.
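A rough proxy for what each panel can resolve is the spectrum of the Fourier-feature kernel itself, ignoring the MLP on top. The kernel mean_j cos(2π bⱼ δ) with bⱼ ~ N(0, σ²) has the N(0, σ²) density as its spectrum, so its relative weight at frequency f is exp(−f²/(2σ²)). The sketch rests on two assumptions about the widget, not facts about its code. First, "k cycles per [-1, 1]" means k/2 cycles per unit of x. Second, γ acts on raw x.

```python
import numpy as np

# Assumption: target components are k cycles across [-1, 1], i.e. k / 2
# cycles per unit of x, and gamma uses sin/cos(2*pi*B x) on raw x.
target_cycles = np.array([1.0, 4.0, 9.0])
target_freqs = target_cycles / 2.0          # cycles per unit length

def spectral_weight(f, sigma):
    """Relative spectral density of the Gaussian Fourier-feature kernel at
    frequency f (cycles per unit), normalised to 1 at f = 0."""
    return np.exp(-f**2 / (2.0 * sigma**2))

weights = {s: spectral_weight(target_freqs, s) for s in [0.5, 2.0, 8.0, 32.0]}
for s, w in weights.items():
    row = "  ".join(f"{c:.0f} cyc: {v:.1e}" for c, v in zip(target_cycles, w))
    print(f"sigma={s:5.1f}  {row}")
```

Under these assumptions, σ = 0.5 gives the 4- and 9-cycle components weights of roughly 3e-4 and 3e-18. σ = 2 gives about 0.6 at 4 cycles but only about 0.08 at 9 cycles. σ = 8 and σ = 32 keep all three components above 0.8. This ordering lines up with the panel-by-panel behaviour described above. The trained network's NTK is not identical to this kernel, though, so treat the numbers as intuition rather than a prediction.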
How to pick σ in practice
If you know the highest spatial frequency in the target, set $\sigma$ to be of the same order. Take a wavefield with maximum spatial frequency $k_{\max}$ (cycles per unit length) on a domain of length $L$, with the input rescaled to unit length. Then set $\sigma \approx k_{\max} L$, the number of cycles across the domain. The window is broad: anything from 0.5× to 2× usually works. If you do not know $k_{\max}$, scan $\sigma$ over a few orders of magnitude and pick the value with the lowest validation loss. For some PINN problems the target frequency content is only implicit, e.g. a wavefield whose frequency depends on the velocity model. There, the multi-scale architectures of §2.5 sidestep σ-tuning by combining several scales in one network.
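Both routes are sketched below, assuming the input is rescaled linearly to a known normalised length. `sigma_start`, `scan_sigma` and the `val_loss` callable are hypothetical names for illustration. The toy loss in the usage lines is a stand-in, not a real validation curve.

```python
import numpy as np

def sigma_start(k_max, length, normalised_length=1.0):
    """Starting sigma: cycles of the highest spatial frequency per unit of
    normalised input.

    k_max: highest spatial frequency, cycles per physical unit (assumed known).
    length: physical domain length.
    normalised_length: domain length after rescaling (1.0 for [0, 1], 2.0 for [-1, 1]).
    """
    return k_max * length / normalised_length

def scan_sigma(val_loss, sigmas):
    """Return the sigma with the lowest validation loss, plus all losses.

    val_loss is a hypothetical user-supplied callable that trains a
    Fourier-feature model with the given sigma and returns its validation loss.
    """
    losses = [val_loss(s) for s in sigmas]
    return sigmas[int(np.argmin(losses))], losses

# Known k_max: start at the cycles-per-normalised-unit value, then try 0.5x to 2x.
s0 = sigma_start(k_max=4.5, length=2.0, normalised_length=2.0)
window = s0 * np.array([0.5, 1.0, 2.0])

# Unknown k_max: scan a few orders of magnitude on a log grid (0.1 ... 1000).
grid = np.logspace(-1, 3, 9)
best, losses = scan_sigma(lambda s: (np.log10(s) - 1.0) ** 2, grid)  # toy stand-in loss
print(s0, window, best)
```

A logarithmic grid is used for the blind scan because the failure modes are separated by orders of magnitude in σ, not by small additive steps.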
Why this matters for seismic PINNs
An exploration-seismic wavefield at 30 Hz in a 4500 m/s medium has a spatial wavelength of 4500 / 30 = 150 m. Across a 4 km target, that is about 27 cycles. To represent it, the Fourier embedding needs σ on the order of $6.7 \times 4 \times c \approx 27c$ (cycles per km × domain in km × a scale factor $c$ that depends on the input normalisation). The exact recipe varies by paper, but the requirement that σ scale with the wavefield frequency is universal in modern PINN-FWI work (Bin Waheed et al 2021; Song et al 2021). The Helmholtz formulation we will meet in Part 4 is particularly Fourier-feature-friendly because the frequency is explicit.
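The arithmetic in this paragraph can be checked in a few lines. The paragraph leaves the scale factor open, so the two input normalisations below are assumptions chosen to make it concrete. Only the 4 km extent along one axis is considered.

```python
# Numbers from the paragraph: 30 Hz, 4500 m/s medium, 4 km target.
f_hz, v_mps, extent_m = 30.0, 4500.0, 4000.0
wavelength_m = v_mps / f_hz                          # 150 m
cycles_per_km = 1000.0 / wavelength_m                # ~6.67 cycles per km
cycles_across = cycles_per_km * extent_m / 1000.0    # ~26.7, i.e. "~27 cycles"

# Assumed normalisations: the scale factor c is 1 / (normalised domain length).
for name, norm_len in [("[0, 1]", 1.0), ("[-1, 1]", 2.0)]:
    sigma = cycles_across / norm_len                 # cycles per unit of normalised input
    print(f"inputs in {name}: sigma ~ {sigma:.1f}")  # 26.7 and 13.3
```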
Pause-and-check. (1) Switch to the "Chirp" target. Which σ wins? Why? (2) The σ = 0.5 panel produces a fit that looks essentially like a vanilla MLP's. Why is that the failure mode at small σ? (3) For the §1.3 Burgers PDE (on the domain used there), what σ would you start with for the Fourier embedding, and why?
References
- Tancik, M., Srinivasan, P.P., Mildenhall, B., et al. (2020). Fourier features let networks learn high-frequency functions in low-dimensional domains. NeurIPS.
- Wang, S., Wang, H., Perdikaris, P. (2021). On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks. CMAME 384, 113938.
- Mildenhall, B., Srinivasan, P.P., Tancik, M., et al. (2020). NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV.
- Rahaman, N., Baratin, A., Arpit, D., et al. (2019). On the spectral bias of neural networks. ICML.