Boundary and initial conditions: soft enforcement

Part 1 — The PINN formulation

Learning objectives

Distinguish hard and soft constraint enforcement; recognise that §1.2 and §1.3 use only soft enforcement
See concretely that the relative weights of soft-enforced loss terms control which constraints the optimiser cares about
Match three loss-weight regimes to their outcomes: weak IC → trivial solution; weak PDE → IC-overfit; balanced → correct solution
Be primed for §1.5’s alternative (hard-constraint reparameterisation) and Part 3’s adaptive-weight schemes

Every PINN we have trained so far in Part 1 enforced its initial and boundary conditions softly — by adding $\lambda \cdot \textrm{(constraint violation)}^2$ as a loss term. That is the simplest possible enforcement strategy. It is also the source of one of the three central training pathologies that Part 3 dedicates a whole section to.

Hard vs soft enforcement

For a constraint $\mathcal{B}[u] = 0$ on the boundary, you have two design choices:

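Soft enforcement: add $\lambda_b \cdot \frac{1}{N_b} \sum_{i=1}^{N_b} \big(\mathcal{B}[u](x_i)\big)^2$ to the loss, where $x_1, \dots, x_{N_b}$ are points sampled on the boundary. The optimiser tries to make this term small along with the data and PDE terms. The constraint is therefore satisfied only approximately, and only to the extent that training drives this term towards zero.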
Hard enforcement: choose a network architecture that automatically satisfies the constraint by construction, so the constraint loss term is identically zero and disappears from the optimisation. The constraint is satisfied exactly, no matter what the network parameters are. We will see this in §1.5.
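Here is a minimal NumPy sketch of the two options for a 1-D Dirichlet problem on $[0, 1]$ with $u(0) = u_{\text{left}}$ and $u(1) = u_{\text{right}}$. The tiny tanh MLP, the helper names (`init_mlp`, `mlp`, `soft_bc_loss`, `hard_u`) and the boundary values are illustrative assumptions, not code from this course. The hard version uses one common ansatz, $u(x) = u_{\text{left}}(1-x) + u_{\text{right}}\,x + x(1-x)\,N_\theta(x)$. The construction in §1.5 may differ.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a small tanh MLP (hypothetical helper)."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Evaluate the MLP on inputs x of shape (N, 1); returns shape (N, 1)."""
    h = x
    for W, bias in params[:-1]:
        h = np.tanh(h @ W + bias)
    W, bias = params[-1]
    return h @ W + bias

# Dirichlet data on [0, 1], values chosen for illustration only.
u_left, u_right = 1.0, -0.5
x_bc = np.array([[0.0], [1.0]])
g_bc = np.array([[u_left], [u_right]])

def soft_bc_loss(params, lam_b=1.0):
    """Soft enforcement: lam_b times the mean squared boundary violation."""
    return lam_b * float(np.mean((mlp(params, x_bc) - g_bc) ** 2))

def hard_u(params, x):
    """Hard enforcement: x * (1 - x) vanishes at both ends, so this ansatz
    equals u_left at x = 0 and u_right at x = 1 whatever the network outputs."""
    return u_left * (1.0 - x) + u_right * x + x * (1.0 - x) * mlp(params, x)

params = init_mlp([1, 16, 16, 1], np.random.default_rng(0))
print("soft BC loss at random init:", soft_bc_loss(params))       # generally > 0
print("hard ansatz at x = 0, 1:", hard_u(params, x_bc).ravel())   # always [u_left, u_right]
```

With the hard version, the boundary term can be dropped from the loss entirely. Only the PDE term (and any data term) is left for the optimiser to balance.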

Soft enforcement is more flexible and more general. Hard enforcement is more reliable and faster to train when it applies. The choice depends on the constraint: simple Dirichlet conditions on a rectangular domain accept hard enforcement easily; complicated geometry, time-dependent boundaries, or implicit boundaries (free surfaces) often force soft enforcement.

The loss-balance crisis

With soft enforcement, the relative weights $\lambda_d, \lambda_b, \lambda_p$ (data, boundary, PDE) are structural hyperparameters. Set them poorly and the optimiser concentrates on whichever weighted term dominates, neglecting the others. Both extremes are common in practice:

If $\lambda_p \gg \lambda_b$: the optimiser neglects the boundary terms and converges to a function that satisfies the PDE but not the boundary conditions. For most PDEs this is a wrong solution. Many functions satisfy a homogeneous PDE locally, and it is the boundary condition that selects the right one. The trivial solution $u = 0$ usually wins, because it makes a homogeneous residual vanish identically and is easy for the optimiser to reach.
If $\lambda_b \gg \lambda_p$: the optimiser neglects the PDE term and converges to whatever shape interpolates the boundary data. Almost any smooth function can fit a sparse set of boundary points, so the interior prediction is meaningless.
The "right" balance depends on the problem. There is no single rule; it has to be tuned, and often the network capacity, the geometry, and the PDE all interact with the choice.
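The following deliberately tiny caricature runs plain gradient descent on a weighted sum of two terms. One scalar, `th_b`, is pulled towards a "boundary" target. A second scalar, `th_p`, is pulled towards a "PDE" target. Both terms can reach zero at the same time, as in a well-posed PINN; only the weights differ. In a real PINN the terms share network parameters, so treat this as an illustration of step-size limits rather than a model of PINN training. The function name, starting point, learning rate and step budget are hypothetical.

```python
def gradient_descent(lam_b, lam_p, lr=0.5, steps=200):
    """Plain gradient descent on L = lam_b*(th_b - 1)**2 + lam_p*th_p**2.

    th_b stands in for boundary-fitting parameters (target 1) and th_p for
    PDE-fitting ones (target 0). Starting from (th_b, th_p) = (0, 1), return
    the unweighted violations left after a fixed budget of steps.
    """
    th_b, th_p = 0.0, 1.0
    for _ in range(steps):
        g_b = 2.0 * lam_b * (th_b - 1.0)  # dL/d(th_b)
        g_p = 2.0 * lam_p * th_p          # dL/d(th_p)
        th_b -= lr * g_b
        th_p -= lr * g_p
    return abs(th_b - 1.0), abs(th_p)

# Gradient descent on lam * theta**2 is stable only for lr < 1 / lam, so the
# largest weight caps the step size that every term has to share.
for lam_b, lam_p in [(0.01, 1.0), (1.0, 1.0), (1.0, 0.01)]:
    viol_b, viol_p = gradient_descent(lam_b, lam_p)
    print(f"lam_b={lam_b:<5} lam_p={lam_p:<5} boundary={viol_b:.3f} pde={viol_p:.3f}")
```

With balanced weights both violations vanish. With either weight at 0.01, the weakly weighted term is still about 13% unresolved after 200 steps ($0.99^{200} \approx 0.134$). Raising `lr` past 1.0 to speed it up would make the strongly weighted term diverge.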

This is the loss-balance crisis, and it is the single biggest reason beginner PINNs fail to converge. Part 3 (§3.2–3.4) will introduce automatic and adaptive schemes that handle the balance for you: NTK-balanced weighting (Wang, Yu and Perdikaris 2022), gradient-norm balancing, self-adaptive weights, residual-based attention (RBA), and so on. For now, the goal is to see the failure modes so you recognise them later.

Try it

The widget below targets a simple harmonic oscillator with $\omega = 2$:

$$u''(t) + \omega^2 u(t) = 0,\quad u(0) = 1,\quad u'(0) = 0,\quad t \in [0, 4].$$

The true solution is $u(t) = \cos(\omega t)$. The loss has two soft-enforced term groups, each with its own weight: $L_{\mathrm{IC}}$ (initial value and initial velocity combined) and $L_{\mathrm{PDE}}$. Drag the $\lambda$ sliders or click the three preset buttons.
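The two loss groups can be written down directly. This sketch makes two assumptions about the widget. First, $L_{\mathrm{IC}}$ is the plain sum of the two squared initial-condition violations. Second, $L_{\mathrm{PDE}}$ is a mean over 201 evenly spaced collocation points on $[0, 4]$. The widget's exact weighting and point count may differ. To run with NumPy alone, the sketch uses central finite differences on Python callables in place of automatic differentiation. All names are hypothetical.

```python
import numpy as np

OMEGA = 2.0
T = np.linspace(0.0, 4.0, 201)  # collocation points (assumed count)
H = 1e-3                        # finite-difference step (assumed)

def d1(u, t):
    """Central-difference first derivative of a callable u."""
    return (u(t + H) - u(t - H)) / (2.0 * H)

def d2(u, t):
    """Central-difference second derivative of a callable u."""
    return (u(t + H) - 2.0 * u(t) + u(t - H)) / H**2

def loss_ic(u):
    """Initial value + initial velocity violations, summed as squares."""
    return float((u(0.0) - 1.0) ** 2 + (d1(u, 0.0) - 0.0) ** 2)

def loss_pde(u):
    """Mean squared residual of u'' + OMEGA**2 * u at the collocation points."""
    r = d2(u, T) + OMEGA**2 * u(T)
    return float(np.mean(r**2))

truth = lambda t: np.cos(OMEGA * t)
zero = lambda t: np.zeros_like(np.asarray(t, dtype=float))

lam_ic, lam_pde = 0.01, 1.0  # the "lam_IC << lam_PDE" preset
for name, u in [("truth", truth), ("zero", zero)]:
    total = lam_ic * loss_ic(u) + lam_pde * loss_pde(u)
    print(f"{name}: L_IC={loss_ic(u):.3g} L_PDE={loss_pde(u):.3g} total={total:.3g}")
```

Under these weights the zero function costs only 0.01, while the true cosine costs essentially nothing. The cosine is still the global minimiser. The zero function is simply a cheap, easy-to-reach point, and a weak IC term does little to push the optimiser away from it.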

Three regimes to walk through

Click preset: $\lambda_{\mathrm{IC}} \ll \lambda_{\mathrm{PDE}}$, then Play. The IC weight is 0.01 and the PDE weight is 1.0. The relative L2 error (yellow trace) settles near 1. That is exactly the score of the zero function, since $\|0 - u\|_2 / \|u\|_2 = 1$. The prediction has indeed converged towards $u(t) \approx 0$, the trivial solution; the IC term is too weak to drag the network away from it.
Click preset: balanced. Both weights are 1.0. Within ~3000 epochs the relative L2 error drops below 1%. The prediction sweeps onto the cosine curve and stays.
Click preset: $\lambda_{\mathrm{PDE}} \ll \lambda_{\mathrm{IC}}$. Now the PDE weight is 0.01 and the IC weight is 1.0. The IC anchor at $t = 0$ is enforced cleanly, but the late-time prediction is wildly wrong: violating the residual costs the optimiser almost nothing. The relative L2 error stays high, often above 0.5.
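The three presets can also be compared using hand-computed losses for a few candidate shapes. This sketch assumes $L_{\mathrm{IC}} = (u(0) - 1)^2 + u'(0)^2$ and takes $L_{\mathrm{PDE}}$ as the mean squared residual $u'' + \omega^2 u$ over $[0, 4]$. The constant $u = 1$ is a hypothetical stand-in for an "IC-overfit" shape; it is not what the widget actually converges to.

```python
OMEGA = 2.0

# Hand-computed (L_IC, L_PDE) for three candidate functions, assuming
# L_IC = (u(0) - 1)**2 + u'(0)**2 and L_PDE = mean of (u'' + OMEGA**2 * u)**2.
CANDIDATES = {
    "truth u = cos(2t)": (0.0, 0.0),    # satisfies IC and PDE exactly
    "zero u = 0": (1.0, 0.0),           # misses u(0) = 1, zero residual
    "constant u = 1": (0.0, OMEGA**4),  # meets both ICs, residual OMEGA**2 everywhere
}

# The three widget presets as (lam_IC, lam_PDE).
PRESETS = {
    "lam_IC << lam_PDE": (0.01, 1.0),
    "balanced": (1.0, 1.0),
    "lam_PDE << lam_IC": (1.0, 0.01),
}

def total_loss(lam_ic, lam_pde, l_ic, l_pde):
    """Weighted soft-constraint loss lam_IC * L_IC + lam_PDE * L_PDE."""
    return lam_ic * l_ic + lam_pde * l_pde

for preset, (lam_ic, lam_pde) in PRESETS.items():
    costs = {name: round(total_loss(lam_ic, lam_pde, *ls), 4)
             for name, ls in CANDIDATES.items()}
    print(f"{preset:<18} {costs}")
```

In every preset, the true cosine is the only zero-cost candidate of the three. What the weights change is how cheap the wrong shapes look. The zero function costs 0.01 under the first preset and the IC-satisfying constant costs 0.16 under the third, compared with 1 and 16 under the balanced preset.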

The relative-L2 trace, the no-cheat metric

The yellow trace in the loss panel is the relative L2 error $\|u_\theta - u\|_2 / \|u\|_2$ of the prediction $u_\theta(t)$ against the true $u(t) = \cos(\omega t)$, evaluated at the collocation points. Unlike the loss components, it cannot be gamed, because it directly measures how close the network is to the target. The price is that it needs the true solution, so it is available only for benchmark problems like this one. A small training loss together with a large relative L2 error means the network is solving the wrong optimisation problem, which is exactly what loss imbalance causes. When all the traces in the panel drop together and stay aligned, training is healthy.
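Here is a minimal sketch of the metric. It assumes the standard definition $\|u_\theta - u\|_2 / \|u\|_2$ over a shared set of sample points; the widget's own implementation is not shown in this section. The `relative_l2` helper and the 201-point grid are hypothetical.

```python
import numpy as np

def relative_l2(pred, true):
    """||pred - true||_2 / ||true||_2 over the same sample points."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))

t = np.linspace(0.0, 4.0, 201)
u_true = np.cos(2.0 * t)

print(relative_l2(np.zeros_like(t), u_true))  # trivial prediction u = 0
print(relative_l2(u_true, u_true))            # perfect prediction
print(relative_l2(0.9 * u_true, u_true))      # 10% amplitude error
```

The trivial prediction scores exactly 1 whatever the target. That is why a relative L2 pinned near 1 is a quick signature of collapse to $u = 0$.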

Why this matters for seismic PINNs

For a wave-equation PINN with three or four loss terms (data, IC, BC, multiple PDE residual components for vector wave physics), the loss-balance crisis is the dominant cause of training failure. Even before we get to seismic specifics in Part 4, recognising the failure mode is half the battle. The other half is the toolbox in Part 3 that automates the rebalancing. Soft enforcement remains the workhorse — but only with the rebalancing tools applied.

Pause-and-check. (1) Why does the relative L2 error give different information than the training loss? Construct an example where one is small and the other is large. (2) On the $\lambda_{\mathrm{IC}} \ll \lambda_{\mathrm{PDE}}$ preset, why does the network converge to $u = 0$ specifically and not, say, a phase-shifted cosine? (3) For the simple oscillator above, what would a hard-constrained architecture look like that satisfies $u(0) = 1$ and $u'(0) = 0$ by construction?

References

Raissi, M., Perdikaris, P., Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707.
Wang, S., Teng, Y., Perdikaris, P. (2021). Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 43(5), A3055–A3081.
Sun, L., Gao, H., Pan, S., Wang, J.-X. (2020). Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Comput. Methods Appl. Mech. Engrg. 361, 112732.
Cuomo, S., Di Cola, V.S., Giampaolo, F., et al. (2022). Scientific machine learning through physics-informed neural networks: Where we are and what's next. J. Sci. Comput. 92(3), 88.