Causality weighting for time-domain PINNs
Learning objectives
- Diagnose causality violation: late-time residuals collapse before early-time residuals
- Implement Wang–Sankaran–Perdikaris (2024) causality-aware weighting
- Confirm that causal weighting forces the network to learn time-causally
- Apply the technique to 1D advection — a canonical seismic-wave proxy
Time-domain PINNs are notorious for cheating on time. Given a PDE on $t \in [0, T]$ with IC $u(x, 0) = u_0(x)$, the optimiser can find a smooth function that satisfies the PDE near $t = T$ while violating the IC at $t = 0$. The total residual averaged over the domain is small, but the solution is wrong: the network has solved an easier problem (a steady state at $t = T$) instead of the actual evolution problem.
This is the causality violation — pathology #4 from §3.1. Wang, Sankaran & Perdikaris (2024) named it and gave a clean fix.
The causality-aware weight
For a residual at fixed time $t$, define

$$w(t) = \exp\!\left(-\varepsilon \int_0^{t} \mathcal{L}_r(s)\,ds\right),$$

where $\varepsilon > 0$ is a hyperparameter and $\mathcal{L}_r(s)$ is the squared residual at time $s$ averaged over space. The intuition: $w(t)$ is small when the integrated earlier-time residual is large, and rises to one as the earlier-time residual falls. Late-time collocation points are downweighted until earlier residuals are small.
The weight is recomputed each training step. Initially $w(t) \approx 1$ for small $t$ and $w(t) \approx 0$ for large $t$, so the optimiser sees only the IC region. As the IC region fits, $w(t)$ at later times rises and the optimiser advances forward in time. The network learns the evolution one time-slab at a time, following the natural causality of the underlying physics.
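The per-step recomputation described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. It assumes the time axis is split into ordered slabs, that `slab_losses[i]` holds the space-averaged squared residual in slab `i`, and that the weight takes the discrete form $w_i = \exp(-\varepsilon \sum_{k<i} \mathcal{L}_k)$. The function names and the sample numbers are illustrative.

```python
import numpy as np

def causal_weights(slab_losses, eps):
    """Discrete causal weights w_i = exp(-eps * sum_{k<i} L_k).

    slab_losses: 1D sequence, mean squared PDE residual per time slab,
                 ordered from earliest to latest time.
    eps:         causality hyperparameter (> 0).
    """
    slab_losses = np.asarray(slab_losses, dtype=float)
    # Exclusive cumulative sum: slab i only sees losses from slabs 0..i-1.
    earlier = np.concatenate(([0.0], np.cumsum(slab_losses)[:-1]))
    return np.exp(-eps * earlier)

def causal_loss(slab_losses, eps):
    """Weighted residual loss. In an autodiff framework the weights would
    normally be treated as constants (assumption: no gradient through w)."""
    w = causal_weights(slab_losses, eps)
    return float(np.mean(w * np.asarray(slab_losses, dtype=float)))

# Early in training: residuals are large everywhere, so late slabs are masked.
w_start = causal_weights([0.5, 0.5, 0.5, 0.5], eps=5.0)
# Later: the first slabs have been fitted, so the weights open up.
w_later = causal_weights([0.01, 0.01, 0.5, 0.5], eps=5.0)
```

The first slab always has weight 1 because no earlier residual precedes it, which is why the optimiser starts from the IC region.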
Try it: causality on 1D advection
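The widget races uniform-weighted training (§3.1 causality-violation pathology) against causality-weighted training on the 1D advection equation $u_t + c\,u_x = 0$ with initial condition $u(x, 0) = u_0(x)$ on a finite space-time domain. The exact solution is the initial profile transported at speed $c$: $u(x, t) = u_0(x - ct)$.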
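To make the training target concrete, the sketch below evaluates the transported-profile solution $u(x, t) = u_0(x - ct)$ of 1D advection and checks numerically that it satisfies $u_t + c\,u_x = 0$. The Gaussian initial profile, the speed $c = 1$, and the sample points are assumptions chosen for illustration; they are not taken from the widget.

```python
import numpy as np

C = 1.0  # assumed advection speed (illustrative)

def u0(x):
    """Assumed initial profile: a Gaussian bump centred at x = 0."""
    return np.exp(-x**2 / 0.1)

def u_exact(x, t, c=C):
    """Exact advection solution: the initial profile shifted by c*t."""
    return u0(x - c * t)

def advection_residual(u, x, t, c=C, h=1e-4):
    """Central finite-difference estimate of u_t + c*u_x at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    return u_t + c * u_x

x = np.linspace(-1.0, 1.0, 11)
res = advection_residual(u_exact, x, t=0.3)
max_abs_residual = float(np.max(np.abs(res)))
```

The same `advection_residual` helper can be pointed at any callable `u(x, t)`, so it doubles as a quick sanity check on a trained surrogate.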
What you should observe
- Uniform weights: late-time residual sits at or above the early-time residual — the causality-violation signature. The network is solving the problem out of time-order.
- Causal weights: both early and late residuals drop by an order of magnitude or more, and crucially the early bin drops first. The schedule (right panel) shows that $w(0)$ stays at 1.0 while the final-time weight $w(T)$ starts near zero and rises only as the early-time fit cleans up.
- The relative-L² improvement on this 1D advection toy is modest because the architecture is small; the per-window residual story is what scales to 2D wave equations, where causality weighting goes from "nice" to "essential". Wang, Sankaran & Perdikaris (2024) report the corresponding gains on the Allen-Cahn equation.
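The early-versus-late comparison in the bullets above can be computed directly from collocation residuals. The following is a sketch under stated assumptions: `t` and `residual` are hypothetical arrays of collocation times and pointwise PDE residuals, the time domain $[0, t_{\max}]$ is split into equal-width bins, and the synthetic data only stands in for a real training snapshot.

```python
import numpy as np

def binned_residual(t, residual, t_max, n_bins=4):
    """Mean squared residual in equal-width time bins over [0, t_max]."""
    t = np.asarray(t, dtype=float)
    r2 = np.asarray(residual, dtype=float) ** 2
    edges = np.linspace(0.0, t_max, n_bins + 1)
    # np.digitize returns 1-based bin indices; clip so t == t_max
    # lands in the last bin instead of an overflow bin.
    idx = np.clip(np.digitize(t, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=r2, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)  # empty bins report 0.0

# Synthetic snapshot: residual magnitude grows linearly with t.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=2000)
residual = t * rng.standard_normal(2000)
bins = binned_residual(t, residual, t_max=1.0)
early, late = bins[0], bins[-1]
```

Logging `early` and `late` every few hundred steps gives the two curves to compare: which bin drops first is the quantity of interest.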
For the wave equation
The advection problem above is the smallest non-trivial test case. On the 2D acoustic wave equation (the Part 4 setting) causality violation is severe. Without causality weighting the wavefront often forms simultaneously across the entire domain, producing checkerboard artefacts; with causality weighting the wavefront propagates outward from the source, as it should physically. Wang, Sankaran & Perdikaris (2024) demonstrate the benefit of causal weighting on the Allen-Cahn equation; subsequent papers (Diab et al. 2024) apply the same idea to acoustic FWI.
Choosing ε
The hyperparameter $\varepsilon$ controls how aggressively later times are downweighted. Wang et al. (2024) recommend choosing $\varepsilon$ such that the weight at the end of the time domain, $w(T)$, is close to zero at the start of training; the value used in our advection demo is chosen to achieve this. Too small and the weighting has no effect; too large and the optimiser cannot escape the $t \approx 0$ region. The widget exposes the schedule so you can see this directly.
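One way to turn this rule of thumb into a number is to invert the discrete weight formula $w_{\text{end}} = \exp(-\varepsilon \sum_k \mathcal{L}_k)$, where the sum runs over all slabs before the last. Given slab losses measured at initialisation and a target for the final-time weight, $\varepsilon = -\ln(w_{\text{target}}) / \sum_k \mathcal{L}_k$. The sketch below does this; the target of 0.01 and the sample losses are assumptions for illustration, not values from Wang et al. or from the demo.

```python
import numpy as np

def eps_for_target(slab_losses, target_final_weight):
    """Solve exp(-eps * sum(L_k for k < last)) = target for eps."""
    losses = np.asarray(slab_losses, dtype=float)
    if not (0.0 < target_final_weight < 1.0):
        raise ValueError("target_final_weight must lie in (0, 1)")
    earlier_total = losses[:-1].sum()  # losses from all slabs before the last
    if earlier_total <= 0.0:
        raise ValueError("earlier-time losses must be positive")
    return float(-np.log(target_final_weight) / earlier_total)

# Hypothetical per-slab residuals measured at initialisation.
init_losses = [0.8, 0.6, 0.7, 0.9, 0.5]
eps = eps_for_target(init_losses, target_final_weight=0.01)
# Check: plugging eps back in reproduces the target weight.
final_weight = float(np.exp(-eps * sum(init_losses[:-1])))
```

Because the initial residual scale depends on the PDE and the network initialisation, a value of $\varepsilon$ found this way is specific to one setup and is best treated as a starting point.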
References
- Wang, S., Sankaran, S., Perdikaris, P. (2024). Respecting causality for training physics-informed neural networks. Comput. Methods Appl. Mech. Engrg.
- Mattey, R., Ghosh, S. (2022). A novel sequential method to train physics-informed neural networks. Comput. Methods Appl. Mech. Engrg.