Hard-constraint enforcement via reparameterisation

Part 1 — The PINN formulation

Learning objectives

Build a network output that satisfies an initial or boundary condition by construction, not by penalty
See concretely that hard-constraint enforcement removes the loss-balance crisis from §1.4 entirely — the central practical win, not raw speed
Recognise the standard ansatz $u_\theta(x) = u_{\mathrm{BC}}(x) + \phi(x)\,\mathrm{NN}_\theta(x)$ and which boundary conditions it can absorb
Identify when soft enforcement is still required (complicated geometry, time-dependent boundaries) and when hard enforcement is the dominant choice

The previous section made the loss-balance crisis vivid. The simplest fix is to remove the soft constraint terms entirely and bake the constraints into the network architecture. There is then no balance to get wrong, and all of the optimisation effort goes toward the PDE residual. This is hard-constraint enforcement, and it is one of the cleanest tricks in PINN engineering.

The reparameterisation idea

Instead of writing $u_\theta(x) = \mathrm{NN}_\theta(x)$ and hoping the optimiser learns to satisfy the boundary conditions, write

$$u_\theta(x) \;=\; u_{\mathrm{BC}}(x) \;+\; \phi(x)\,\mathrm{NN}_\theta(x),$$

where $u_{\mathrm{BC}}$ is any (smooth) function that satisfies the boundary conditions, and $\phi$ is a (smooth) function that vanishes on the boundary. Since $\phi(x) = 0$ on $\partial\Omega$, we have $u_\theta(x) = u_{\mathrm{BC}}(x)$ on $\partial\Omega$ no matter what the network parameters are. The boundary conditions are identities of the architecture, not goals to be optimised toward.
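The claim that the boundary values hold "no matter what the network parameters are" can be checked numerically. The following is a minimal sketch, not the lesson's own implementation: the tiny random tanh network, the interval $[0, 2]$, the boundary values and the helper names `make_tiny_net` and `hard_constrain` are all assumptions chosen for illustration.

```python
import math
import random


def make_tiny_net(hidden=8, seed=0):
    """Random one-hidden-layer tanh network R -> R (illustrative stand-in for NN_theta)."""
    rng = random.Random(seed)
    params = [(rng.uniform(-2, 2), rng.uniform(-1, 1), rng.uniform(-1, 1))
              for _ in range(hidden)]

    def nn(x):
        return sum(v * math.tanh(w * x + bias) for w, bias, v in params)

    return nn


def hard_constrain(u_bc, phi, nn):
    """Return the function u(x) = u_bc(x) + phi(x) * nn(x)."""
    return lambda x: u_bc(x) + phi(x) * nn(x)


# Hypothetical Dirichlet problem on [a, b] with u(a) = ua and u(b) = ub.
a, b, ua, ub = 0.0, 2.0, 1.5, -0.5
u_bc = lambda x: ua + (ub - ua) * (x - a) / (b - a)  # linear interpolant of the boundary data
phi = lambda x: (x - a) * (b - x)                    # zero at x = a and at x = b

# Five different random weight draws: the boundary values never move,
# while the value at the interior midpoint changes with the weights.
boundary_values = []
midpoint_values = []
for seed in range(5):
    u = hard_constrain(u_bc, phi, make_tiny_net(seed=seed))
    boundary_values.append((u(a), u(b)))
    midpoint_values.append(u(0.5 * (a + b)))

print(boundary_values)
print(midpoint_values)
```

Every pair in `boundary_values` reproduces `(ua, ub)`, while `midpoint_values` varies from draw to draw: the network only controls the interior.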

For the harmonic oscillator $u'' + \omega^2 u = 0$ with $u(0)=1$, $u'(0)=0$, the simplest reparameterisation is

$$u_{\mathrm{hard}}(t) \;=\; 1 \;+\; t^2\,\mathrm{NN}(t).$$

Substitute $t = 0$: $u_{\mathrm{hard}}(0) = 1 + 0 \cdot \mathrm{NN}(0) = 1$, exactly. Differentiate once: $u_{\mathrm{hard}}'(t) = 2t\,\mathrm{NN}(t) + t^2\,\mathrm{NN}'(t)$, so $u_{\mathrm{hard}}'(0) = 0$, exactly. The network never has to learn the initial conditions; they are baked in. Only the PDE residual remains in the loss.
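As a quick sanity check of the two identities above, the sketch below evaluates $u_{\mathrm{hard}}(0)$ and $u_{\mathrm{hard}}'(0)$ for several randomly initialised networks. The one-hidden-layer tanh network and the helper names are assumptions made for illustration; the derivative $\mathrm{NN}'$ is written out analytically so that no autodiff library is needed.

```python
import math
import random


def make_net_with_derivative(hidden=8, seed=0):
    """Random tanh network NN(t) together with its analytic derivative NN'(t)."""
    rng = random.Random(seed)
    params = [(rng.uniform(-2, 2), rng.uniform(-1, 1), rng.uniform(-1, 1))
              for _ in range(hidden)]

    def nn(t):
        return sum(v * math.tanh(w * t + bias) for w, bias, v in params)

    def dnn(t):
        # d/dt tanh(w t + bias) = w * (1 - tanh^2(w t + bias))
        return sum(v * w * (1.0 - math.tanh(w * t + bias) ** 2) for w, bias, v in params)

    return nn, dnn


def u_hard(t, nn):
    """u_hard(t) = 1 + t^2 * NN(t)."""
    return 1.0 + t ** 2 * nn(t)


def du_hard(t, nn, dnn):
    """Product rule: u_hard'(t) = 2 t NN(t) + t^2 NN'(t)."""
    return 2.0 * t * nn(t) + t ** 2 * dnn(t)


# The initial conditions hold for every random draw of the weights.
ic_values = []
for seed in range(5):
    nn, dnn = make_net_with_derivative(seed=seed)
    ic_values.append((u_hard(0.0, nn), du_hard(0.0, nn, dnn)))

print(ic_values)
```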

This requires the runtime to differentiate through the reparameterisation, because the residual $R = u_{\mathrm{hard}}'' + \omega^2 u_{\mathrm{hard}}$ involves derivatives of the network multiplied by powers of $t$. Working it out:

$$u_{\mathrm{hard}}''(t) \;=\; 2\,\mathrm{NN}(t) \;+\; 4t\,\mathrm{NN}'(t) \;+\; t^2\,\mathrm{NN}''(t).$$

So $R(t) = 2\,\mathrm{NN}(t) + 4t\,\mathrm{NN}'(t) + t^2\,\mathrm{NN}''(t) + \omega^2\bigl(1 + t^2\,\mathrm{NN}(t)\bigr)$. The widget computes this at each collocation point using the runtime's forwardDerivs, then back-propagates the parameter gradient through backwardDerivs with the appropriate chain-rule coefficients.
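The residual formula can be cross-checked against a finite-difference second derivative of $u_{\mathrm{hard}}$ itself. This is only a sketch under stated assumptions: it does not reproduce the widget's forwardDerivs or backwardDerivs routines, whose internals are not shown here. The tiny tanh network, the value $\omega = 2$ and the function names are hypothetical.

```python
import math
import random

OMEGA = 2.0  # hypothetical angular frequency, chosen only for this check


def make_net(hidden=8, seed=0):
    """Return a function t -> (NN, NN', NN'') for a random one-hidden-layer tanh network."""
    rng = random.Random(seed)
    params = [(rng.uniform(-2, 2), rng.uniform(-1, 1), rng.uniform(-1, 1))
              for _ in range(hidden)]

    def derivs(t):
        n0 = n1 = n2 = 0.0
        for w, bias, v in params:
            s = math.tanh(w * t + bias)
            ds = 1.0 - s * s                   # derivative of tanh with respect to its argument
            n0 += v * s
            n1 += v * w * ds
            n2 += v * w * w * (-2.0 * s * ds)  # second derivative of tanh, chain rule applied
        return n0, n1, n2

    return derivs


def residual(t, derivs, omega=OMEGA):
    """R(t) = 2 NN + 4 t NN' + t^2 NN'' + omega^2 (1 + t^2 NN)."""
    n0, n1, n2 = derivs(t)
    u = 1.0 + t * t * n0
    u_tt = 2.0 * n0 + 4.0 * t * n1 + t * t * n2
    return u_tt + omega ** 2 * u


def residual_fd(t, derivs, omega=OMEGA, h=1e-4):
    """Same residual, with u_hard'' approximated by a central finite difference."""
    u = lambda s: 1.0 + s * s * derivs(s)[0]
    u_tt = (u(t + h) - 2.0 * u(t) + u(t - h)) / (h * h)
    return u_tt + omega ** 2 * u(t)


derivs = make_net(seed=1)
ts = [0.0, 0.3, 0.7, 1.2]
max_gap = max(abs(residual(t, derivs) - residual_fd(t, derivs)) for t in ts)
print(max_gap)
```

One consequence worth noticing: at $t = 0$ the formula reduces to $R(0) = 2\,\mathrm{NN}(0) + \omega^2$, so driving the residual to zero there pushes $\mathrm{NN}(0)$ toward $-\omega^2/2$, which is the leading Taylor coefficient of $(\cos\omega t - 1)/t^2$.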

Try it

Two networks, identical architecture, identical initial weights, same Adam learning rate. The red curve is the soft-enforced network from §1.4. The green curve is the hard-enforced one. The widget defaults to $\lambda_{\mathrm{IC}} = 0.1$ — the kind of poorly-tuned soft setup that breaks PINN training in real problems. Press Play and watch the relative-L2 error trace: the soft network struggles to satisfy the IC and its prediction wanders, while the hard network is unaffected and converges cleanly to within a few percent of the truth.

Now slide $\lambda_{\mathrm{IC}}$ all the way up to $1.0$ and reinitialise. Both networks converge well. That is the substantive point: the advantage of hard enforcement is not raw speed (with a well-balanced soft setup the soft network is competitive) but robustness to the choice of $\lambda_{\mathrm{IC}}$. The hard architecture removes a hyperparameter that would otherwise need careful tuning, and in production PINN code that is a substantial reliability gain. The harmonic-oscillator pathology you saw in §1.4 (trivial-solution collapse when $\lambda_{\mathrm{IC}}$ is too small) simply does not happen when the IC is an architectural identity.
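A small worked calculation shows why the trivial solution is tempting for the soft network. It assumes, without confirmation from §1.4, that the soft loss has the form $\text{mean}(R^2) + \lambda_{\mathrm{IC}}\,[(u(0)-1)^2 + u'(0)^2]$; the function names and the value $\omega = 2$ are likewise hypothetical.

```python
def soft_loss_of_zero_solution(lam_ic):
    """Assumed soft loss, evaluated at the trivial candidate u(t) = 0 for all t."""
    residual_term = 0.0                            # u'' + omega^2 u = 0 when u is identically zero
    ic_term = (0.0 - 1.0) ** 2 + (0.0 - 0.0) ** 2  # misses u(0) = 1, matches u'(0) = 0
    return residual_term + lam_ic * ic_term


def hard_loss_of_zero_network(omega):
    """Residual-only loss of u_hard = 1 + t^2 * NN(t) when NN is identically zero."""
    # u_hard = 1 everywhere, so u_hard'' = 0 and the residual equals omega^2 at every point.
    return (omega ** 2) ** 2


soft_costs = {lam: soft_loss_of_zero_solution(lam) for lam in (0.01, 0.1, 1.0)}
hard_cost = hard_loss_of_zero_network(omega=2.0)
print(soft_costs, hard_cost)
```

Under this assumed loss the trivial solution costs exactly $\lambda_{\mathrm{IC}}$, so a small weight makes it a cheap resting place. With the hard ansatz, $u \equiv 0$ is not representable at all, and the "do nothing" network $\mathrm{NN} \equiv 0$ gives $u \equiv 1$ with a residual of $\omega^2$ everywhere, which the optimiser has every incentive to leave.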

When hard enforcement applies

Hard enforcement works whenever the boundary conditions can be expressed as $u(x) = u_{\mathrm{BC}}(x)$ on a manifold $\partial\Omega$ for which a smooth function $\phi$ vanishing on it is easy to write down. Common cases:

1D ODE on $[0, T]$ with initial value: $u_{\mathrm{hard}}(t) = u_0 + t \cdot \mathrm{NN}(t)$. Higher-order ICs need higher powers of $t$.
1D PDE on $[a, b]$ with Dirichlet BCs: $u_{\mathrm{hard}}(x) = u_{\mathrm{BC}}(x) + (x - a)(b - x)\,\mathrm{NN}(x)$. The polynomial $(x-a)(b-x)$ vanishes on the boundary.
2D PDE on a rectangle: more complex but still constructible by combining low-order polynomials in each dimension.
Wave equations with periodic BCs: replace the network input $x$ with $\sin(2\pi x / L), \cos(2\pi x / L)$ to make periodicity an architectural identity — the trick used in many physics-informed Fourier-feature networks.
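Two of the cases listed above can be sketched in a few lines: a $\phi$ for a rectangle built as a product of 1D polynomials, and a periodic input embedding. This is one possible construction under stated assumptions, not a prescribed one; the rectangle bounds, the period, the two-input tanh network and all function names are hypothetical.

```python
import math
import random


def phi_rectangle(x, y, ax, bx, ay, by):
    """Product of 1D bubble polynomials: zero on all four edges of [ax, bx] x [ay, by]."""
    return (x - ax) * (bx - x) * (y - ay) * (by - y)


def periodic_features(x, period):
    """Map x to (sin, cos) of 2*pi*x/period; both features repeat with the given period."""
    angle = 2.0 * math.pi * x / period
    return math.sin(angle), math.cos(angle)


def make_two_input_net(hidden=8, seed=0):
    """Random tanh network taking the (sin, cos) feature pair as input."""
    rng = random.Random(seed)
    params = [(rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(-1, 1), rng.uniform(-1, 1))
              for _ in range(hidden)]

    def nn(s, c):
        return sum(v * math.tanh(ws * s + wc * c + bias) for ws, wc, bias, v in params)

    return nn


def periodic_net(x, period, nn):
    """Network output that is periodic in x by construction."""
    return nn(*periodic_features(x, period))


# phi vanishes on each edge of the rectangle [0, 1] x [0, 2] and is non-zero inside.
edge_values = [
    phi_rectangle(0.0, 0.3, 0.0, 1.0, 0.0, 2.0),  # left edge
    phi_rectangle(1.0, 0.3, 0.0, 1.0, 0.0, 2.0),  # right edge
    phi_rectangle(0.5, 0.0, 0.0, 1.0, 0.0, 2.0),  # bottom edge
    phi_rectangle(0.5, 2.0, 0.0, 1.0, 0.0, 2.0),  # top edge
]
interior_value = phi_rectangle(0.5, 1.0, 0.0, 1.0, 0.0, 2.0)

# The embedded network returns the same value at x and x + period, up to rounding.
nn = make_two_input_net(seed=3)
period = 3.0
periodic_gap = abs(periodic_net(0.4, period, nn) - periodic_net(0.4 + period, period, nn))
print(edge_values, interior_value, periodic_gap)
```

Note that the periodic case does not use the $u_{\mathrm{BC}} + \phi\,\mathrm{NN}$ form at all: the constraint is absorbed on the input side rather than the output side.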

When hard enforcement does not apply

Complicated geometry (curved boundaries, holes, internal interfaces). Constructing $\phi$ that vanishes exactly on a complicated boundary is often as hard as solving the PDE itself.
Time-dependent or implicit boundaries (free surfaces, moving fronts). The geometry of $\partial\Omega$ changes during the solution; the architecture would have to change too.
Soft data constraints (sparse noisy observations away from the boundary) cannot be hard-enforced because they are noisy — the constraint is itself a soft fit.

For these cases soft enforcement is the workhorse, and Part 3 introduces the adaptive-weight tools that make it reliable. But whenever the geometry permits, hard enforcement is the cleaner choice. Many production seismic PINNs use a hybrid: hard for the IC and the simple BCs, soft for the data and complex BCs.
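The shape of such a hybrid loss can be sketched as follows. It assumes the IC and simple BCs are already absorbed by the architecture, so only a PDE residual term and a soft data-misfit term remain; the function names, the weight `lam_data` and the numbers are hypothetical and are not taken from any particular seismic code.

```python
def mean_square(values):
    """Mean of squared entries."""
    return sum(v * v for v in values) / len(values)


def hybrid_loss(pde_residuals, predictions, observations, lam_data):
    """PDE residual term plus a weighted soft data term; no IC or BC penalty terms."""
    misfit = [p - o for p, o in zip(predictions, observations)]
    return mean_square(pde_residuals) + lam_data * mean_square(misfit)


# Toy numbers: residuals at two collocation points, predictions at two sensors.
loss = hybrid_loss(
    pde_residuals=[0.1, -0.2],
    predictions=[1.0, 2.0],
    observations=[1.1, 1.8],
    lam_data=0.5,
)
print(loss)
```

The loss still contains one weight, `lam_data`, so the balancing problem is reduced rather than eliminated; the hard-enforced constraints simply no longer compete for it.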

Why this matters for seismic PINNs

Seismic forward models on rectangular cubes admit elegant hard-constraint reparameterisations for the initial wavefield (typically zero) and the surface boundary conditions. Modern PINN-FWI papers exploit this routinely. Inverse problems with sparse seismic data add a soft data term on top. So you usually end up with a hybrid network whose architecture enforces the easy constraints and whose loss enforces the rest.

Pause-and-check. (1) For the 1D heat equation $u_t = \alpha u_{xx}$ on $[0, 1]$ with $u(0, t) = 0$, $u(1, t) = 0$, $u(x, 0) = \sin(\pi x)$: write a hard-constraint reparameterisation that satisfies all three conditions exactly. (2) For the harmonic-oscillator setup above, what is the smallest value of $\lambda_{\mathrm{IC}}$ at which the soft network catches up to the hard one? Why does the hard network not care about that threshold? (3) Can hard-constraint ICs be combined with hard-constraint BCs on the same network? What does that look like?
