Hard-constraint enforcement via reparameterisation

Part 1 — The PINN formulation

Learning objectives

  • Build a network output that satisfies an initial or boundary condition by construction, not by penalty
  • See concretely that hard-constraint enforcement removes the loss-balance crisis from §1.4 entirely — the central practical win, not raw speed
  • Recognise the standard ansatz u_pred(x) = u_BC(x) + ϕ(x) · NN(x) and which boundary conditions it can absorb
  • Identify when soft enforcement is still required (complicated geometry, time-dependent boundaries) and when hard enforcement is the dominant choice

The previous section made the loss-balance crisis vivid. The simplest fix to the crisis is to remove the soft constraint terms entirely and bake the constraint into the network architecture. Then there is no balance to get wrong, and 100% of the optimisation effort goes toward the PDE residual. This is hard-constraint enforcement and it is one of the cleanest tricks in PINN engineering.

The reparameterisation idea

Instead of writing $u_\theta(x) = \mathrm{NN}_\theta(x)$ and hoping the optimiser learns to satisfy the boundary conditions, write

$$u_\theta(x) \;=\; u_{\mathrm{BC}}(x) \;+\; \phi(x)\,\mathrm{NN}_\theta(x),$$

where $u_{\mathrm{BC}}$ is any (smooth) function that satisfies the boundary conditions, and $\phi$ is a (smooth) function that vanishes on the boundary. Since $\phi(x) = 0$ on $\partial\Omega$, we have $u_\theta(x) = u_{\mathrm{BC}}(x)$ on $\partial\Omega$ no matter what the network parameters are. The boundary conditions are identities of the architecture, not goals to be optimised toward.
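A minimal Python sketch of this ansatz, assuming the domain $[0, 1]$ with Dirichlet values 2 at the left end and -1 at the right end. The names `tiny_nn`, `u_bc`, `phi` and `u_theta`, the boundary values, and the small tanh network are all illustrative assumptions, not part of the widget's code. The point is only that the boundary values come out the same for any random parameters.

```python
import math
import random

def tiny_nn(x, params):
    """One-hidden-layer tanh network standing in for NN_theta (illustrative)."""
    return sum(v * math.tanh(w * x + b) for w, b, v in params)

def u_bc(x, left=2.0, right=-1.0):
    """Any smooth function matching the boundary values; here a straight line."""
    return left + (right - left) * x

def phi(x):
    """Smooth function that vanishes on the boundary {0, 1} and nowhere inside."""
    return x * (1.0 - x)

def u_theta(x, params):
    """Hard-constrained output: u_BC(x) + phi(x) * NN(x)."""
    return u_bc(x) + phi(x) * tiny_nn(x, params)

random.seed(0)
for trial in range(3):
    # Fresh random parameters each time: the boundary values never move.
    params = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(8)]
    print(trial, u_theta(0.0, params), u_theta(1.0, params), u_theta(0.5, params))
```

Only the interior value (the last column) changes between trials; the first two columns are fixed by construction.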

The widget's example

For the harmonic oscillator $u'' + \omega^2 u = 0$ with $u(0) = 1$, $u'(0) = 0$, the simplest reparameterisation is

$$u_{\mathrm{hard}}(t) \;=\; 1 \;+\; t^2\,\mathrm{NN}(t).$$

Substitute $t = 0$: $u_{\mathrm{hard}}(0) = 1 + 0 \cdot \mathrm{NN}(0) = 1$, exactly. Differentiate once: $u_{\mathrm{hard}}'(t) = 2t\,\mathrm{NN}(t) + t^2\,\mathrm{NN}'(t)$, so $u_{\mathrm{hard}}'(0) = 0$, exactly. The network never has to learn the initial conditions; they are baked in. Only the PDE residual remains in the loss.
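The two identities can be checked numerically. The sketch below assumes a small tanh network with random weights in place of the widget's network (`nn`, `nn_prime`, `u_hard` and `u_hard_prime` are hypothetical names) and evaluates the product-rule derivative at $t = 0$.

```python
import math
import random

def nn(t, params):
    """Illustrative one-hidden-layer tanh network NN(t)."""
    return sum(v * math.tanh(w * t + b) for w, b, v in params)

def nn_prime(t, params):
    """Analytic NN'(t), using d/dt tanh(w t + b) = w * (1 - tanh^2(w t + b))."""
    return sum(v * w * (1.0 - math.tanh(w * t + b) ** 2) for w, b, v in params)

def u_hard(t, params):
    """u_hard(t) = 1 + t^2 NN(t)."""
    return 1.0 + t * t * nn(t, params)

def u_hard_prime(t, params):
    """Product rule: u_hard'(t) = 2 t NN(t) + t^2 NN'(t)."""
    return 2.0 * t * nn(t, params) + t * t * nn_prime(t, params)

random.seed(1)
params = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(16)]

# Both initial conditions hold for untrained, random weights.
print("u_hard(0)  =", u_hard(0.0, params))
print("u_hard'(0) =", u_hard_prime(0.0, params))
```

The factor $t^2$ (rather than $t$) is what zeroes the first derivative as well as the value at $t = 0$.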

This requires the runtime to differentiate through the reparameterisation, because the residual $R = u_{\mathrm{hard}}'' + \omega^2 u_{\mathrm{hard}}$ involves derivatives of the network multiplied by powers of $t$. Working it out:

$$u_{\mathrm{hard}}''(t) \;=\; 2\,\mathrm{NN}(t) \;+\; 4t\,\mathrm{NN}'(t) \;+\; t^2\,\mathrm{NN}''(t).$$

So $R(t) = 2\,\mathrm{NN}(t) + 4t\,\mathrm{NN}'(t) + t^2\,\mathrm{NN}''(t) + \omega^2\bigl(1 + t^2\,\mathrm{NN}(t)\bigr)$. The widget computes this at each collocation point using the runtime's forwardDerivs, then back-propagates the parameter gradient through backwardDerivs with the appropriate chain-rule coefficients.
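As a rough illustration of the forward half of that computation, the sketch below evaluates $R(t)$ from $\mathrm{NN}$, $\mathrm{NN}'$ and $\mathrm{NN}''$ for a small tanh network and cross-checks it against a finite-difference second derivative of $u_{\mathrm{hard}}$. The helper `nn_with_derivs` is a hypothetical stand-in written for this example; it is not the runtime's forwardDerivs, whose signature is not shown here. The value $\omega = 2$ and the collocation points are also assumptions.

```python
import math
import random

OMEGA = 2.0  # assumed angular frequency, for illustration only

def nn_with_derivs(t, params):
    """Return (NN, NN', NN'') at t for a one-hidden-layer tanh network."""
    n = n1 = n2 = 0.0
    for w, b, v in params:
        th = math.tanh(w * t + b)
        s = 1.0 - th * th                   # derivative of tanh at w t + b
        n += v * th
        n1 += v * w * s
        n2 += v * w * w * (-2.0 * th * s)   # second derivative of tanh, chain rule
    return n, n1, n2

def u_hard(t, params):
    return 1.0 + t * t * nn_with_derivs(t, params)[0]

def residual(t, params, omega=OMEGA):
    """R(t) = 2 NN + 4 t NN' + t^2 NN'' + omega^2 (1 + t^2 NN)."""
    n, n1, n2 = nn_with_derivs(t, params)
    return 2.0 * n + 4.0 * t * n1 + t * t * n2 + omega ** 2 * (1.0 + t * t * n)

def residual_fd(t, params, omega=OMEGA, h=1e-4):
    """Same residual, with u_hard'' from a central second difference."""
    u_dd = (u_hard(t + h, params) - 2.0 * u_hard(t, params) + u_hard(t - h, params)) / (h * h)
    return u_dd + omega ** 2 * u_hard(t, params)

def pde_loss(ts, params):
    """Mean squared residual over collocation points: the whole hard-constraint loss."""
    return sum(residual(t, params) ** 2 for t in ts) / len(ts)

random.seed(2)
params = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(8)]
for t in (0.3, 0.5, 1.5):
    print(t, residual(t, params), residual_fd(t, params))
print("loss:", pde_loss([0.1 * k for k in range(1, 21)], params))
```

At $t = 0$ the formula reduces to $R(0) = 2\,\mathrm{NN}(0) + \omega^2$, so the residual there constrains the network value directly.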

Try it

[Interactive figure: Hard Constraint]

Two networks, identical architecture, identical initial weights, same Adam learning rate. The red curve is the soft-enforced network from §1.4. The green curve is the hard-enforced one. The widget defaults to $\lambda_{\mathrm{IC}} = 0.1$, the kind of poorly-tuned soft setup that breaks PINN training in real problems. Press Play and watch the relative-L2 error trace: the soft network struggles to satisfy the IC and its prediction wanders, while the hard network is unaffected and converges cleanly to within a few percent of the truth.

Now slide $\lambda_{\mathrm{IC}}$ all the way up to $1.0$ and reinitialise. Both networks converge well. That is the substantive point: hard's win is not raw speed (with a well-balanced soft setup the soft network is competitive). Hard's win is robustness to the choice of $\lambda_{\mathrm{IC}}$. The hard architecture removes a hyperparameter that would otherwise need careful tuning, and in production PINN code that is a substantial reliability gain. The harmonic-oscillator pathology you saw in §1.4 (trivial-solution collapse when $\lambda_{\mathrm{IC}}$ is too small) simply does not happen when the IC is an architectural identity.
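A back-of-the-envelope sketch of why the collapse is available to the soft network but not to the hard one. It assumes the soft loss has the common form (mean squared residual) $+\ \lambda_{\mathrm{IC}}\,[(u(0) - 1)^2 + u'(0)^2]$, which may differ in detail from the loss used in §1.4, and it assumes $\omega = 2$. The function names are hypothetical.

```python
OMEGA = 2.0  # assumed angular frequency, for illustration only

def soft_loss_at_zero_solution(lambda_ic):
    """Soft loss when the network outputs u(t) = 0 everywhere."""
    pde_term = 0.0                          # u'' + omega^2 u = 0 holds identically for u = 0
    ic_term = (0.0 - 1.0) ** 2 + 0.0 ** 2   # u(0) = 0 misses the target 1; u'(0) = 0 matches
    return pde_term + lambda_ic * ic_term

def hard_loss_at_zero_network(omega=OMEGA):
    """Hard loss when NN(t) = 0, so that u_hard(t) = 1 everywhere."""
    residual = 0.0 + omega ** 2 * 1.0       # u_hard'' = 0 and omega^2 * u_hard = omega^2
    return residual ** 2

for lam in (0.1, 1.0):
    print(lam, soft_loss_at_zero_solution(lam), hard_loss_at_zero_network())
```

Under these assumptions the trivial solution costs the soft network only $\lambda_{\mathrm{IC}}$ (0.1 at the widget default), whereas the hard network cannot represent $u \equiv 0$ at all, and its nearest analogue, a zero network, costs $\omega^4 = 16$.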

When hard enforcement applies

Hard enforcement works whenever the boundary conditions can be expressed as $u(x) = u_{\mathrm{BC}}(x)$ on a manifold $\partial\Omega$ for which a smooth function $\phi$ vanishing on it is easy to write down. Common cases:

  • 1D ODE on $[0, T]$ with initial value: $u_{\mathrm{hard}}(t) = u_0 + t \cdot \mathrm{NN}(t)$. Higher-order ICs need higher powers of $t$, for example $u_{\mathrm{hard}}(t) = u_0 + v_0 t + t^2\,\mathrm{NN}(t)$ when $u'(0) = v_0$ is also prescribed.
  • 1D PDE on $[a, b]$ with Dirichlet BCs: $u_{\mathrm{hard}}(x) = u_{\mathrm{BC}}(x) + (x - a)(b - x)\,\mathrm{NN}(x)$, where $u_{\mathrm{BC}}$ interpolates the two boundary values. The polynomial $(x - a)(b - x)$ vanishes on the boundary.
  • 2D PDE on a rectangle: more complex but still constructible by combining low-order polynomials in each dimension.
  • Wave equations with periodic BCs: replace the network input $x$ with $\sin(2\pi x / L), \cos(2\pi x / L)$ to make periodicity an architectural identity. This is the trick used in many physics-informed Fourier-feature networks.
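Two of the constructions in the list above can be sketched in a few lines of Python: the periodic input embedding and a vanishing function for a rectangle. The period $L = 3$, the rectangle $[0, 1] \times [0, 2]$, the restriction to homogeneous Dirichlet conditions, and all function names are assumptions made for this example.

```python
import math
import random

def periodic_features(x, L):
    """Map x to (sin, cos) of 2 pi x / L, so any function of them has period L."""
    return math.sin(2.0 * math.pi * x / L), math.cos(2.0 * math.pi * x / L)

def nn_on_features(features, params):
    """Illustrative tanh network that sees only the periodic features."""
    s, c = features
    return sum(v * math.tanh(ws * s + wc * c + b) for ws, wc, b, v in params)

def u_periodic(x, params, L=3.0):
    """Network output with periodicity built into the input layer."""
    return nn_on_features(periodic_features(x, L), params)

def phi_rect(x, y, ax=0.0, bx=1.0, ay=0.0, by=2.0):
    """Product of 1D factors; zero on all four edges of [ax, bx] x [ay, by]."""
    return (x - ax) * (bx - x) * (y - ay) * (by - y)

random.seed(3)
params = [tuple(random.gauss(0.0, 1.0) for _ in range(4)) for _ in range(8)]

# Periodicity holds up to floating-point rounding, whatever the weights are.
print(u_periodic(0.4, params), u_periodic(0.4 + 3.0, params))
# phi_rect is zero on each edge and positive in the interior.
print(phi_rect(0.0, 0.7), phi_rect(1.0, 0.7), phi_rect(0.3, 0.0), phi_rect(0.3, 2.0), phi_rect(0.5, 1.0))
```

For homogeneous Dirichlet conditions on the rectangle, `phi_rect(x, y) * NN(x, y)` is then a valid hard-constrained output; non-zero boundary data would additionally need a $u_{\mathrm{BC}}$ that matches it on every edge.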

When hard enforcement does not apply

  • Complicated geometry (curved boundaries, holes, internal interfaces). Constructing $\phi$ that vanishes exactly on a complicated boundary is often as hard as solving the PDE itself.
  • Time-dependent or implicit boundaries (free surfaces, moving fronts). The geometry of $\partial\Omega$ changes during the solution; the architecture would have to change too.
  • Soft data constraints (sparse noisy observations away from the boundary) cannot be hard-enforced because they are noisy — the constraint is itself a soft fit.

For these cases soft enforcement is the workhorse, and Part 3 introduces the adaptive-weight tools that make it reliable. But whenever the geometry permits, hard enforcement is the cleaner choice. Many production seismic PINNs use a hybrid: hard for the IC and the simple BCs, soft for the data and complex BCs.
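The hybrid pattern can be sketched as a loss with two terms: a residual term evaluated on a hard-constrained output, plus a weighted misfit to observations. The function `hybrid_loss`, the weight `lambda_data`, and the toy oscillator data below are hypothetical and only show the shape of the computation.

```python
import math

def hybrid_loss(u, residual, t_colloc, t_obs, d_obs, lambda_data=1.0):
    """Mean squared PDE residual plus a soft, weighted data-misfit term.

    `u` is assumed to be a hard-constrained output, so no IC/BC term appears.
    """
    pde = sum(residual(t) ** 2 for t in t_colloc) / len(t_colloc)
    data = sum((u(t) - d) ** 2 for t, d in zip(t_obs, d_obs)) / len(t_obs)
    return pde + lambda_data * data

# Toy check with the exact oscillator solution u(t) = cos(omega t), assumed omega = 2.
omega = 2.0
u_exact = lambda t: math.cos(omega * t)
zero_residual = lambda t: 0.0          # the exact solution has zero residual
t_colloc = [0.1 * k for k in range(1, 11)]
t_obs = [0.5, 1.0, 1.5]
clean = [u_exact(t) for t in t_obs]
noisy = [d + e for d, e in zip(clean, (0.05, -0.02, 0.03))]

loss_clean = hybrid_loss(u_exact, zero_residual, t_colloc, t_obs, clean)
loss_noisy = hybrid_loss(u_exact, zero_residual, t_colloc, t_obs, noisy)
print(loss_clean, loss_noisy)
```

With noisy observations even the exact solution leaves a non-zero data term, which is why that part of the problem stays a soft fit.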

Why this matters for seismic PINNs

Seismic forward models on rectangular cubes admit elegant hard-constraint reparameterisations for the initial wavefield (typically zero) and the surface boundary conditions. Modern PINN-FWI papers exploit this routinely. Inverse problems with sparse seismic data add a soft data term on top. So you usually end up with a hybrid network whose architecture enforces the easy constraints and whose loss enforces the rest.

Pause-and-check. (1) For the 1D heat equation $u_t = \alpha u_{xx}$ on $[0, 1]$ with $u(0, t) = 0$, $u(1, t) = 0$, $u(x, 0) = \sin(\pi x)$: write a hard-constraint reparameterisation that satisfies all three conditions exactly. (2) For the harmonic-oscillator setup above, what is the smallest value of $\lambda_{\mathrm{IC}}$ at which the soft network catches up to the hard one? Why does the hard network not care about that threshold? (3) Can hard-constraint ICs be combined with hard-constraint BCs on the same network? What does that look like?

