Bayesian PINNs and ensemble PINNs

Part 9 — Hybrid PINN + classical, with uncertainty

Learning objectives

Recognise the deep-ensembles recipe as the most practical Bayesian-deep-learning approximation
Train an N=8 ensemble of PINN replicas with different random initial weights
Identify regions where ensemble disagreement signals high uncertainty
Distinguish 'epistemic' uncertainty (model disagrees with itself) from 'aleatoric' (data noise)
Survey the alternatives: BBB, MC dropout, HMC, SVGD

Up to §9.4 we treated the PINN as a deterministic function approximator: train it, get a single prediction. Production seismic-imaging workflows need MORE than that — they need an UNCERTAINTY MAP that flags where the prediction is reliable and where it is not. §9.5 introduces the machinery: Bayesian PINNs and ensemble PINNs.

The two flavours of uncertainty

Epistemic uncertainty — the model is unsure because it has not seen enough data. Reduces with more data. Quantified by ensemble disagreement / Bayesian posterior width.
Aleatoric uncertainty — irreducible noise in the data itself. Does NOT reduce with more data. Quantified by output-noise variance.

For seismic FWI: aleatoric is the random noise on travel-time picks; epistemic is the uncertainty in the velocity model where data is sparse. The goal of §9.5-§9.6 is to produce maps for BOTH.
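The split between the two flavours can be made concrete with a small sketch. It assumes, hypothetically, that each replica predicts both a mean and a noise variance at every point (the mixture view used in the deep-ensembles paper); the function name `decompose_uncertainty` and the toy numbers are illustrative only and are not part of the course widget.

```python
import numpy as np

def decompose_uncertainty(means, noise_vars):
    """Split predictive variance into epistemic and aleatoric parts.

    means, noise_vars: arrays of shape (n_replicas, n_points).
    Assumption: each replica outputs a mean and a noise variance.
    """
    means = np.asarray(means, dtype=float)
    noise_vars = np.asarray(noise_vars, dtype=float)
    epistemic = means.var(axis=0)        # disagreement between replica means
    aleatoric = noise_vars.mean(axis=0)  # average predicted data-noise variance
    return epistemic, aleatoric, epistemic + aleatoric

# Toy numbers: two replicas, two points. The replicas agree at the first
# point and disagree at the second; predicted noise is the same everywhere.
means = [[1.0, 2.0], [1.0, 4.0]]
noise_vars = [[0.04, 0.04], [0.04, 0.04]]
epistemic, aleatoric, total = decompose_uncertainty(means, noise_vars)
```

In this toy case the epistemic part is zero where the replicas agree and dominates where they disagree, while the aleatoric part stays flat.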

Deep ensembles — the simplest Bayesian-DL approximation

Lakshminarayanan, Pritzel, Blundell (2017) introduced "deep ensembles": train $N$ neural network replicas with different random initial weights on the same data and loss; combine their predictions. Mean is the central estimate; std is the uncertainty:

\hat{u}_{\mathrm{mean}}(x) = \frac{1}{N} \sum_{i=1}^{N} u_i(x; \theta_i), \qquad \hat{u}_{\mathrm{std}}(x) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl(u_i(x; \theta_i) - \hat{u}_{\mathrm{mean}}(x)\bigr)^2} .
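As a minimal sketch of the two formulas above, the NumPy snippet below computes the pointwise ensemble mean and standard deviation from a stack of replica predictions. The replicas here are hypothetical stand-ins (a sine plus a small linear offset), not trained PINNs, and `ensemble_stats` is an illustrative name.

```python
import numpy as np

def ensemble_stats(preds):
    """Pointwise ensemble mean and std.

    preds: array of shape (N, n_points), one row per replica.
    """
    preds = np.asarray(preds, dtype=float)
    mean = preds.mean(axis=0)
    std = preds.std(axis=0, ddof=0)  # ddof=0 gives the 1/N normalisation above
    return mean, std

x = np.linspace(0.0, 1.0, 5)
# Hypothetical replicas: the same sine with offsets -0.1x, 0 and +0.1x.
preds = np.stack([np.sin(2.0 * np.pi * x) + 0.1 * i * x for i in (-1, 0, 1)])
u_mean, u_std = ensemble_stats(preds)
```

Note the `ddof=0` choice: it matches the $1/N$ in the formula. With small $N$, some practitioners prefer `ddof=1` (the unbiased sample variance), which gives slightly wider bands.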

Strictly speaking, deep ensembles are NOT a proper Bayesian posterior: no explicit prior is specified, and the replicas are independent point estimates rather than posterior samples. Empirically, though, they often match or beat approximate Bayesian methods (variational inference, MC dropout) on UQ benchmarks, at a fraction of the implementation effort. For PINNs specifically, an ensemble of N = 8 to 32 replicas with random inits is a practical sweet spot.

Why ensembles work despite not being Bayesian

The intuition: each random init places the network in a different basin of the loss landscape, so the optimiser converges to a different local minimum for each replica. The CHARACTERISTICS the replicas agree on (data fit, physics constraint) are enforced by the loss. The CHARACTERISTICS where they disagree are unconstrained, and those are exactly the regions where genuine uncertainty exists. Wilson-Izmailov 2020 argued that ensembling in this way approximates Bayesian marginalisation over several modes of the loss landscape, the MULTIMODALITY that single-mode VI methods miss.

Try it: 5-replica ensemble PINN

The widget trains 5 PINN replicas, each fitting the function $u(x) = \sin(2\pi x)$ on $x \in [0, 1]$ from 5 noisy data points plus a Helmholtz-residual constraint:

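\mathcal{L}(\theta_i) = \underbrace{\sum_{d=1}^{5} (u_i(x_d) - u_d)^2}_{\text{data}} + \lambda_{\mathrm{pde}} \, \underbrace{\frac{1}{N_c} \sum_k (u_i''(x_k) + (2\pi)^2 u_i(x_k))^2}_{\text{PDE residual}} + \lambda_{\mathrm{bc}} \, \underbrace{u_i(0)^2}_{\text{BC}} .

Here $(x_d, u_d)$ are the 5 noisy data points and $x_k$ are $N_c$ collocation points in $[0, 1]$. The target $\sin(2\pi x)$ satisfies $u'' + (2\pi)^2 u = 0$ with $u(0) = 0$, so the PDE and BC terms both vanish at the true solution.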
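The loss can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the widget's implementation: the data locations, noise level (0.05), collocation grid, and the one-hidden-layer tanh network with an analytic second derivative are all hypothetical choices, as are the names `pinn_loss` and `make_tanh_net`.

```python
import numpy as np

K2 = (2.0 * np.pi) ** 2  # squared wavenumber in u'' + (2*pi)^2 u = 0

def pinn_loss(u, u_xx, x_data, u_data, x_col, lam_pde=1.0, lam_bc=1.0):
    """Data + PDE-residual + BC loss for one replica.

    u, u_xx: callables returning u(x) and u''(x) for a 1-D array x.
    """
    data = np.sum((u(x_data) - u_data) ** 2)
    pde = np.mean((u_xx(x_col) + K2 * u(x_col)) ** 2)  # mean = (1/N_c) * sum
    bc = u(np.array([0.0]))[0] ** 2
    return data + lam_pde * pde + lam_bc * bc, (data, pde, bc)

def make_tanh_net(w, b, a):
    """Hypothetical replica: u(x) = sum_j a_j * tanh(w_j * x + b_j)."""
    def u(x):
        return np.tanh(np.outer(x, w) + b) @ a
    def u_xx(x):
        t = np.tanh(np.outer(x, w) + b)
        # d^2/dx^2 tanh(w x + b) = -2 w^2 tanh(z) (1 - tanh(z)^2)
        return (-2.0 * t * (1.0 - t**2) * w**2) @ a
    return u, u_xx

rng = np.random.default_rng(0)
x_data = np.linspace(0.1, 0.9, 5)            # assumed data locations
noise = 0.05 * rng.standard_normal(5)        # assumed noise level
u_data = np.sin(2.0 * np.pi * x_data) + noise
x_col = np.linspace(0.0, 1.0, 64)            # assumed collocation grid

# At the true solution only the data term (the noise) is left.
exact_u = lambda x: np.sin(2.0 * np.pi * x)
exact_uxx = lambda x: -K2 * np.sin(2.0 * np.pi * x)
loss_exact, parts_exact = pinn_loss(exact_u, exact_uxx, x_data, u_data, x_col)

# A randomly initialised replica, before any training.
u_net, uxx_net = make_tanh_net(rng.standard_normal(16),
                               rng.standard_normal(16),
                               rng.standard_normal(16) / 4.0)
loss_init, parts_init = pinn_loss(u_net, uxx_net, x_data, u_data, x_col)
```

Passing `u` and `u_xx` as callables keeps the loss independent of how the derivative is obtained; in an autodiff framework the second derivative would come from the framework rather than a hand-written formula.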

Two panels:

Ensemble u(x) + 2σ band. Truth in yellow (dashed), ensemble mean in orange (bold), 2σ band in light orange. Individual replicas as faint orange lines. Data points as cyan dots. The 2σ band should narrow at data points and widen between them.
Pointwise σ(x) profile. Plotted directly: where is the ensemble most confident, where is it least? Vertical cyan dashed lines mark the data x-locations. Production: this profile flags WHICH x-regions need more measurements to reduce uncertainty.

Expected behaviour: σ(x) shows clear minima at the data locations (where all 5 replicas anchor to the observed values) and maxima between them (where physics + BC are the only constraints, and replicas explore different consistent solutions). The summary reports the ratio σ(gap) / σ(at-data) — typically 2-5× for well-trained ensembles.
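One way the σ(gap) / σ(at-data) summary could be computed is sketched below. How the widget defines "gap" is not stated here, so this sketch assumes the gap values are taken at the midpoints between adjacent data locations; `gap_ratio` is a hypothetical helper and the σ profile is synthetic, chosen only so that the expected ratio is known.

```python
import numpy as np

def gap_ratio(x, sigma, x_data):
    """Mean sigma at midpoints between data points over mean sigma at data points.

    Assumption: 'gap' means the midpoint between adjacent data locations.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x_data = np.sort(np.asarray(x_data, dtype=float))

    def nearest(points):
        # index of the grid point closest to each requested location
        return np.abs(x[:, None] - points[None, :]).argmin(axis=0)

    midpoints = 0.5 * (x_data[:-1] + x_data[1:])
    return sigma[nearest(midpoints)].mean() / sigma[nearest(x_data)].mean()

x = np.linspace(0.0, 1.0, 101)
x_data = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
# Synthetic profile: 0.02 at the data locations, 0.06 halfway between them.
sigma = 0.02 + 0.04 * np.sin(np.pi * (x - 0.1) / 0.2) ** 2
ratio = gap_ratio(x, sigma, x_data)
```

For this synthetic profile the ratio is 3, inside the 2-5× range quoted above. On a real ensemble the value depends on the loss weights and on how strongly the physics term constrains the gaps.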

Other Bayesian-PINN approaches

Deep ensembles are the simplest and most empirically robust UQ method. Four alternatives appear in the PINN literature:

Bayes-by-backprop (BBB) (Blundell et al 2015). Each weight has a learned mean μ and variance σ²; weights are sampled at every forward pass and a KL term regularises the weight distribution toward the prior, as in a VAE. The parameter count doubles and the per-step cost is roughly 2× that of a deterministic network. Captures epistemic uncertainty cleanly. PINN variants in Yang-Meng-Karniadakis 2021.
Monte-Carlo dropout (Gal-Ghahramani 2016). Train with dropout; at inference, KEEP dropout active and sample N predictions. Approximates a Bayesian posterior under specific theoretical conditions. Cheap (1× train, N× inference), but the uncertainty estimates tend to be too small (overconfident).
Hamiltonian Monte Carlo (HMC) applied to network weights (Neal 1995, modern revival in Izmailov-Vikram-Hoffman-Wilson 2021). Gold-standard posterior sampling but extremely expensive: typically 50-1000× slower than ensemble training, and only feasible on small networks. A PINN variant underlies the §9.6 production UQ workflow.
Stein variational gradient descent (SVGD) (Liu-Wang 2016). Train N "particles" (independent networks) jointly with a kernelised gradient that pushes them to span the posterior. Combines deep-ensemble robustness with Bayesian theoretical grounding. HypoSVI (§7.5) uses SVGD for hypocentre posterior.
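To make the "kernelised gradient" in SVGD concrete, here is a minimal one-dimensional sketch. It is a toy: the particles are scalars rather than network weights, the target is an assumed Gaussian with mean 2 and standard deviation 0.5, and the fixed RBF bandwidth `h`, step size, and iteration count are illustrative choices rather than recommended settings.

```python
import numpy as np

def svgd_step(x, score, step=0.05, h=0.5):
    """One SVGD update for 1-D particles x with an RBF kernel of bandwidth h.

    score(x) returns d/dx log p(x) of the target density.
    """
    diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2.0 * h**2))     # k(x_j, x_i)
    grad_k = -diff / h**2 * k               # d k(x_j, x_i) / d x_j: repulsion
    # Average over j: kernel-smoothed score (attraction) plus repulsion.
    phi = (k * score(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi

mu, sd = 2.0, 0.5                            # assumed toy target N(mu, sd^2)
score = lambda x: -(x - mu) / sd**2

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 1.0, size=50)   # start far from the target
for _ in range(3000):
    particles = svgd_step(particles, score)
```

The first term pulls every particle toward high-probability regions, as plain gradient ascent would; the second term pushes particles apart, which keeps them from collapsing onto a single mode. Dropping the repulsion term would reduce this to N copies of the same point estimate.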

What §9.6 will do

§9.6 is the capstone: combine (a) generative prior on velocity (§9.4) + (b) ensemble PINN forward model (§9.5) + (c) data with picking noise → posterior sampling on velocity model. The output is a velocity model AND an uncertainty map at every depth, which is what a production seismic-interpretation team actually needs.

References

Lakshminarayanan, B., Pritzel, A., Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. NeurIPS 2017. The deep-ensembles paper that established empirical SOTA.
Wilson, A.G., Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. NeurIPS 2020. Why deep ensembles work despite not being Bayesian.
Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D. (2015). Weight uncertainty in neural networks. ICML 2015. Bayes-by-Backprop.
Yang, L., Meng, X., Karniadakis, G.E. (2021). B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 425, 109913. The Bayesian-PINN paper using HMC and variational inference.
Liu, Q., Wang, D. (2016). Stein variational gradient descent: A general purpose Bayesian inference algorithm. NeurIPS 2016. SVGD foundations.
Izmailov, P., Vikram, S., Hoffman, M.D., Wilson, A.G. (2021). What are Bayesian neural network posteriors really like? ICML 2021. HMC at scale on neural-net posteriors.