Causality weighting for time-domain PINNs

Part 3 — Training pathologies and remedies

Learning objectives

  • Diagnose causality violation: late-time residuals collapse before early-time residuals
  • Implement Wang–Sankaran–Perdikaris (2024) causality-aware weighting
  • Verify that causal weighting makes the network fit early times before late times
  • Apply the technique to 1D advection — a canonical seismic-wave proxy

Time-domain PINNs are notorious for cheating on time. Given a PDE $u_t + \mathcal{N}[u] = 0$ on $(x, t) \in \Omega \times [0, T]$ with initial condition (IC) $u(x, 0) = u_0(x)$, the optimiser can find a smooth function that satisfies the PDE near $t = T$ while violating the IC at $t = 0$. The total residual averaged over the $(x, t)$ domain is small, but the solution is wrong: the network has solved an easier problem (a steady state at $t = T$) instead of the actual evolution problem.

This is the causality violation — pathology #4 from §3.1. Wang, Sankaran & Perdikaris (2024) named it and gave a clean fix.

The causality-aware weight

For the spatially averaged squared residual at a fixed time $t$,

$$\mathcal{L}_r(\theta; t) = \frac{1}{N_x} \sum_{i=1}^{N_x} r(x_i, t; \theta)^2, \qquad r(x, t; \theta) = \partial_t u_\theta + \mathcal{N}[u_\theta],$$

where $u_\theta$ is the network, define the causality weight

$$w(t) = \exp\left( -\varepsilon \int_0^t \mathcal{L}_r(\theta; s) \, ds \right),$$

where $\varepsilon > 0$ is a hyperparameter and $\mathcal{L}_r(\theta; s)$ is the residual at time $s$, already averaged over space. The intuition: $w(t)$ is small when the integrated earlier-time residual $\int_0^t \mathcal{L}_r(\theta; s) \, ds$ is large, and rises towards one as that earlier-time residual falls. Late-time collocation points are therefore downweighted until earlier residuals are small.
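A minimal sketch of the weight on a discrete time grid, assuming $N_t$ equally spaced slabs $t_0 < t_1 < \dots$ with spacing $\Delta t$ and a left Riemann sum for the integral, so that $w_i = \exp(-\varepsilon \Delta t \sum_{k<i} \mathcal{L}_r(\theta; t_k))$ and the first slab always has weight 1. The names `causal_weights` and `causal_loss` are hypothetical, not from any library, and plain Python lists stand in for tensors. Dropping the $\Delta t$ factor would only rescale $\varepsilon$.

```python
import math
from typing import List


def causal_weights(slab_residuals: List[float], eps: float, dt: float) -> List[float]:
    """Causality weights w_i = exp(-eps * dt * sum_{k<i} L_r(t_k)).

    slab_residuals[i] is the spatially averaged squared residual at time t_i.
    The integral from 0 to t_i is approximated by a left Riemann sum over the
    strictly earlier slabs, so the first slab always receives weight 1.
    """
    weights = []
    cumulative = 0.0  # running approximation of int_0^{t_i} L_r(s) ds
    for loss in slab_residuals:
        weights.append(math.exp(-eps * cumulative))
        cumulative += loss * dt
    return weights


def causal_loss(slab_residuals: List[float], eps: float, dt: float) -> float:
    """Causally weighted residual loss (1/N_t) * sum_i w_i * L_r(t_i).

    In an autodiff framework the weights would be treated as constants
    (stop-gradient); here they are plain floats, so nothing flows through them.
    """
    w = causal_weights(slab_residuals, eps, dt)
    return sum(wi * li for wi, li in zip(w, slab_residuals)) / len(slab_residuals)


# Hypothetical flat residual profile, as for an untrained network.
w = causal_weights([1.0] * 11, eps=10.0, dt=0.1)
print([round(wi, 3) for wi in w])
```

With a flat residual of 1, $\varepsilon = 10$ and $\Delta t = 0.1$, the weights are $w_i = e^{-i}$: slab 0 gets full weight and slab 10 almost none. Detaching the weights from the gradient is an assumption of this sketch, but a natural one: if gradients flowed through $w_i$, the optimiser could lower the loss by inflating early-time residuals to shrink late-time weights.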

The weight is recomputed each training step. Initially $w(t) \approx 1$ for small $t$ and $w(t) \to 0$ for large $t$; the optimiser sees only the IC region. As the IC region fits, $w(t)$ at later times rises and the optimiser advances forward in time. The network learns the evolution one time slab at a time, following the natural causality of the underlying physics.
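To see the front advance without training a network, here is a deliberately crude caricature (an assumption for illustration, not a PINN and not the method from the paper): each slab's residual is multiplied by $(1 - \eta\, w_i)$ per step, so a slab only makes progress once its causal weight is large. `toy_training` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def causal_weights(slab_residuals: List[float], eps: float, dt: float) -> List[float]:
    """w_i = exp(-eps * dt * sum_{k<i} L_k), a left Riemann sum of the integral."""
    weights, cumulative = [], 0.0
    for loss in slab_residuals:
        weights.append(math.exp(-eps * cumulative))
        cumulative += loss * dt
    return weights


def toy_training(n_t: int = 20, eps: float = 20.0, lr: float = 0.2,
                 steps: int = 200, tol: float = 1e-2) -> Tuple[List[float], List[int]]:
    """Caricature of causal training, NOT a PINN: slab i's residual is multiplied
    by (1 - lr * w_i) each step, so slabs with small weight barely move.

    Returns the final per-slab residuals and, for every step, the number of
    leading slabs whose residual has dropped below tol (the "front").
    """
    dt = 1.0 / n_t
    residuals = [1.0] * n_t  # untrained network: large residual everywhere
    front = []
    for _ in range(steps):
        w = causal_weights(residuals, eps, dt)  # weights recomputed every step
        residuals = [r * (1.0 - lr * wi) for r, wi in zip(residuals, w)]
        fitted = 0
        while fitted < n_t and residuals[fitted] < tol:
            fitted += 1
        front.append(fitted)
    return residuals, front


final, front = toy_training()
print(front[::20])
```

Printing `front[::20]` shows the number of fitted slabs growing from the first slab onward. Because every weight satisfies $w_i \le w_0 = 1$ and the weights decrease with $i$, later slabs can never overtake earlier ones in this toy.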

Try it: causality on 1D advection

The widget races uniform-weighted training (the §3.1 causality-violation pathology) against causality-weighted training on $u_t + u_x = 0$, $u(x, 0) = \sin(\pi x)$, on $x \in [-1, 1]$, $t \in [0, 1]$. The exact solution is $u(x, t) = \sin(\pi (x - t))$.
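A quick check of this setup, and of the pathology described at the top of the section: the exact solution has (numerically) zero PDE residual, but so does the trivial function $u \equiv 0$, which satisfies $u_t + u_x = 0$ exactly. Only the IC term tells them apart, which is why a network that drifts to a trivial late-time state can still report a small residual. Central differences with step `h` stand in for automatic differentiation; all helper names are hypothetical.

```python
import math
from typing import Callable, List

U = Callable[[float, float], float]  # candidate solution u(x, t)


def advection_residual(u: U, x: float, t: float, h: float = 1e-4) -> float:
    """Central-difference estimate of the residual r = u_t + u_x."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    return u_t + u_x


def grid(n_x: int = 201) -> List[float]:
    """Uniform grid on x in [-1, 1], endpoints included."""
    return [-1.0 + 2.0 * i / (n_x - 1) for i in range(n_x)]


def mean_sq_residual(u: U, t: float) -> float:
    """Spatially averaged squared residual L_r(t) at a fixed time t."""
    xs = grid()
    return sum(advection_residual(u, x, t) ** 2 for x in xs) / len(xs)


def ic_error(u: U) -> float:
    """Mean squared mismatch with the IC u(x, 0) = sin(pi x)."""
    xs = grid()
    return sum((u(x, 0.0) - math.sin(math.pi * x)) ** 2 for x in xs) / len(xs)


def exact(x: float, t: float) -> float:
    """Exact solution u(x, t) = sin(pi (x - t))."""
    return math.sin(math.pi * (x - t))


def zero(x: float, t: float) -> float:
    """Trivial function: satisfies u_t + u_x = 0 exactly but violates the IC."""
    return 0.0


for name, u in [("exact", exact), ("zero", zero)]:
    print(name, mean_sq_residual(u, 0.5), ic_error(u))
```

Both candidates print a residual near zero; the IC error of the zero function is about 0.5, the grid mean of $\sin^2(\pi x)$.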

[Interactive figure: CausalityInteractive, uniform vs. causality-weighted training on 1D advection]

What you should observe

  • Uniform weights: the late-time residual sits at or below the early-time residual, which is the causality-violation signature. The network is solving the problem out of time order.
  • Causal weights: both early and late residuals drop by an order of magnitude or more, and, crucially, the early bin drops first. The schedule (right panel) shows $w(t = 0)$ staying at 1 while $w(t = 1)$ starts near zero and rises only as the early-time fit cleans up.
  • The relative-L² improvement on this 1D advection toy is modest because the architecture is small; the per-window residual pattern is what carries over to 2D wave equations, where causality weighting goes from "nice" to "essential". Wang, Sankaran & Perdikaris (2024) demonstrate the benefit on harder benchmarks such as the Allen-Cahn equation.
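The per-window diagnostic from the bullets above can be sketched as follows, assuming a callable candidate $u(x, t)$ and central differences in place of autodiff. `window_residuals` splits $[0, 1]$ into equal windows and reports the mean squared residual in each; `relative_l2` measures the error against the exact solution $\sin(\pi(x - t))$. The candidate $(1 - t)\sin(\pi x)$ is hypothetical, chosen only because its residual is larger early than late, the ordering that signals causality violation.

```python
import math
from typing import Callable, List

U = Callable[[float, float], float]  # candidate solution u(x, t) on [-1, 1] x [0, 1]


def advection_residual(u: U, x: float, t: float, h: float = 1e-4) -> float:
    """Central-difference estimate of r = u_t + u_x (stand-in for autodiff)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    return u_t + u_x


def window_residuals(u: U, n_windows: int = 2, n_x: int = 101, n_t: int = 10) -> List[float]:
    """Mean squared residual in each of n_windows equal time windows of [0, 1],
    ordered from earliest to latest."""
    xs = [-1.0 + 2.0 * i / (n_x - 1) for i in range(n_x)]
    means = []
    for k in range(n_windows):
        t0, t1 = k / n_windows, (k + 1) / n_windows
        # sample times at the midpoints of n_t equal sub-intervals of the window
        ts = [t0 + (t1 - t0) * (j + 0.5) / n_t for j in range(n_t)]
        total = sum(advection_residual(u, x, t) ** 2 for t in ts for x in xs)
        means.append(total / (n_t * n_x))
    return means


def relative_l2(u: U, n_x: int = 101, n_t: int = 51) -> float:
    """Relative L2 error ||u - u*|| / ||u*|| against u*(x, t) = sin(pi (x - t))."""
    num = den = 0.0
    for i in range(n_x):
        x = -1.0 + 2.0 * i / (n_x - 1)
        for j in range(n_t):
            t = j / (n_t - 1)
            ref = math.sin(math.pi * (x - t))
            num += (u(x, t) - ref) ** 2
            den += ref ** 2
    return math.sqrt(num / den)


def decaying(x: float, t: float) -> float:
    """Hypothetical candidate whose residual is larger early than late."""
    return (1.0 - t) * math.sin(math.pi * x)


early, late = window_residuals(decaying)
print("early", early, "late", late, "late <= early:", late <= early)
print("relative L2", relative_l2(decaying))
```

Evaluating `window_residuals` on checkpoints throughout training, rather than once, is what exposes which window collapses first.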

For the wave equation

The advection problem above is the smallest non-trivial test case. On the 2D acoustic wave equation $u_{tt} = c^2 \nabla^2 u$ (the Part 4 setting), causality violation is severe. Without causality weighting the wavefront often forms simultaneously across the entire domain, producing checkerboard artefacts; with causality weighting the wavefront propagates outward from the source, as it should physically. Wang, Sankaran & Perdikaris (2024) demonstrate the benefit of causality weighting on the Allen-Cahn equation; subsequent papers (Diab et al. 2024) apply the same idea to acoustic full-waveform inversion (FWI).

Choosing ε

The hyperparameter $\varepsilon$ controls how aggressively later times are downweighted. Wang et al. (2024) recommend choosing $\varepsilon$ such that $w(T)$ at the end of the time domain is roughly $10^{-2}$ at the start of training. In our advection demo $\varepsilon = 100$ achieves this. Too small and the weighting has no effect; too large and the optimiser cannot escape the $t = 0$ region. The widget exposes the schedule so you can see this directly.
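Because $w(T) = \exp(-\varepsilon I)$ with $I = \int_0^T \mathcal{L}_r(\theta; s)\,ds$ evaluated on the initial network, the $10^{-2}$ target fixes $\varepsilon = \ln(100)/I \approx 4.6/I$; read the other way, $\varepsilon = 100$ corresponds to an initial integrated residual of roughly $0.046$. A minimal sketch, assuming the same left-Riemann discretisation over a per-slab residual list (the function name `eps_for_target` is hypothetical):

```python
import math
from typing import List


def eps_for_target(slab_residuals: List[float], dt: float, w_target: float = 1e-2) -> float:
    """Return eps such that the last slab's causal weight equals w_target.

    The last slab's weight is exp(-eps * dt * sum of all earlier slab residuals),
    so solving exp(-eps * I) = w_target gives eps = -ln(w_target) / I.
    """
    integral = dt * sum(slab_residuals[:-1])  # left Riemann sum up to the last slab
    if integral <= 0.0:
        raise ValueError("integrated residual must be positive to set eps")
    return -math.log(w_target) / integral


# Hypothetical initial profile: flat residual 0.05 on 11 slabs over [0, 1].
print(eps_for_target([0.05] * 11, dt=0.1))
```

The initial residual profile depends on the initialisation, so this yields a starting value for $\varepsilon$ rather than a fixed rule.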

References

  • Wang, S., Sankaran, S., Perdikaris, P. (2024). Respecting causality is all you need for training physics-informed neural networks. Comput. Methods Appl. Mech. Engrg.
  • Mattey, R., Ghosh, S. (2022). A novel sequential method to train physics-informed neural networks. Comput. Methods Appl. Mech. Engrg.
