Learned regularisers and ML priors

Part 9 — Hybrid PINN + classical, with uncertainty

Learning objectives

Replace the hand-crafted regulariser of §9.2 with a learned one
Train a small autoencoder on a corpus of plausible velocity models
Use the AE reconstruction loss as the regulariser term
Visualise the learned 2-D bottleneck space
Recognise where learned priors beat hand-crafted ones (and vice-versa)

§9.2 used a hand-crafted Tikhonov-style regulariser: penalise velocity inversions and pull toward a prescribed contrast. That works when the prior structure is simple enough to write down in closed form. For real-world velocity models (folded sediments, salt domes, faulted blocks), hand-crafted priors quickly run out of expressivity. §9.3 replaces them with LEARNED priors: train a small neural network on a corpus of plausible velocity models and use its reconstruction error as the regulariser.

Autoencoder reconstruction loss as a prior

An AUTOENCODER (AE) is a neural network with a bottleneck. The encoder $E_\theta : \mathcal{P} \to \mathcal{Z}$ maps a parameter vector $p \in \mathcal{P}$ to a low-dimensional code $z \in \mathcal{Z}$. The decoder $D_\phi : \mathcal{Z} \to \mathcal{P}$ maps the code back. Training minimises the reconstruction error

$$\mathcal{L}_{\mathrm{AE}}(\theta, \phi) = \mathbb{E}_{p \sim \mathcal{D}} \bigl[ \| D_\phi(E_\theta(p)) - p \|^2 \bigr]$$

over a corpus $\mathcal{D}$ of plausible parameter triples. After training, the reconstruction error $\| AE(p) - p \|^2$, with $AE(p) = D_\phi(E_\theta(p))$, acts as a soft membership function: low for $p$ close to the corpus $\mathcal{D}$, typically higher away from it. In this sense the AE has implicitly learned the manifold of plausible models.
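
The training objective above can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the widget's implementation: the toy corpus (points on the surface $(u_0, u_1, u_0 u_1)$), the hidden width `H`, the learning rate, and the use of one Tanh hidden layer per side (the widget uses two) are all hypothetical choices made to keep the hand-written backpropagation short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus in R^3: points on a smooth 2-D surface.
u = rng.uniform(-1.0, 1.0, size=(200, 2))
corpus = np.column_stack([u[:, 0], u[:, 1], u[:, 0] * u[:, 1]])

H, K = 8, 2  # hidden width and bottleneck size (assumed values)
shapes = [(3, H), (H, K), (K, H), (H, 3)]
W = [rng.normal(0.0, 0.5, size=s) for s in shapes]
b = [np.zeros(s[1]) for s in shapes]

def forward(x):
    h1 = np.tanh(x @ W[0] + b[0])   # encoder hidden layer
    z = h1 @ W[1] + b[1]            # bottleneck code
    h2 = np.tanh(z @ W[2] + b[2])   # decoder hidden layer
    y = h2 @ W[3] + b[3]            # reconstruction
    return h1, z, h2, y

def recon_loss(x):
    """Mean squared reconstruction error ||AE(p) - p||^2 over the rows of x."""
    y = forward(x)[3]
    return float(np.mean(np.sum((y - x) ** 2, axis=1)))

def train_step(x, lr):
    """One gradient step on the batch-mean reconstruction error (manual backprop)."""
    h1, z, h2, y = forward(x)
    dy = 2.0 * (y - x) / x.shape[0]
    da2 = (dy @ W[3].T) * (1.0 - h2 ** 2)   # through decoder tanh
    dz = da2 @ W[2].T
    da1 = (dz @ W[1].T) * (1.0 - h1 ** 2)   # through encoder tanh
    inputs = [x, h1, z, h2]
    deltas = [da1, dz, da2, dy]
    for i in range(4):
        W[i] -= lr * inputs[i].T @ deltas[i]
        b[i] -= lr * deltas[i].sum(axis=0)

loss_before = recon_loss(corpus)
for epoch in range(300):
    order = rng.permutation(len(corpus))
    for start in range(0, len(corpus), 32):   # mini-batches of 32
        train_step(corpus[order[start:start + 32]], lr=0.05)
loss_after = recon_loss(corpus)

print(f"reconstruction loss: {loss_before:.4f} -> {loss_after:.4f}")
```

After training, `recon_loss` evaluated on a single candidate model plays the role of the soft membership function described above.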

For the inversion, we use this as the regulariser. Here $\theta$ denotes the trainable parameters of the inversion, not the encoder weights of the training loss above; the AE is pretrained and held fixed.

$$\mathcal{L}(\theta) = \underbrace{\sum_{s, r} (T_{\mathrm{pred}} - t_{\mathrm{obs}})^2}_{\mathcal{L}_{\mathrm{data}}} + \lambda \, \underbrace{\| AE(p(\theta)) - p(\theta) \|^2}_{\mathcal{L}_{\mathrm{AE}}} .$$

Gradient descent on $\theta$ produces parameters $p$ that fit the data and lie close to the corpus manifold.
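
How the two terms interact can be seen in a deliberately simplified sketch. Everything here is a stand-in: the "autoencoder" is an orthogonal projection onto the line $v_2 = 2 v_1$ (so its gradient is analytic), the forward model is a hypothetical single linear measurement `A . p` rather than a traveltime solver, and only the $(v_1, v_2)$ slice is inverted. The point is only to show the regulariser selecting, among models that fit the data equally well, the one on the manifold.

```python
# Stand-in "autoencoder": orthogonal projection onto the line v2 = 2*v1.
# (Assumption: replaces the trained nonlinear AE so the gradient is analytic.)
D = (1.0 / 5.0 ** 0.5, 2.0 / 5.0 ** 0.5)   # unit vector along the manifold

def ae(p):
    s = p[0] * D[0] + p[1] * D[1]
    return (s * D[0], s * D[1])

A = (1.0, 1.0)        # hypothetical linear forward model: T_pred = A . p
TRUTH = (1.5, 3.0)
T_OBS = A[0] * TRUTH[0] + A[1] * TRUTH[1]   # one noise-free observation

def loss_and_grad(p, lam):
    r = A[0] * p[0] + A[1] * p[1] - T_OBS   # data residual
    q = ae(p)
    e = (q[0] - p[0], q[1] - p[1])          # AE(p) - p
    loss = r * r + lam * (e[0] ** 2 + e[1] ** 2)
    # For an orthogonal projector, grad ||AE(p) - p||^2 = -2 (AE(p) - p).
    grad = (2.0 * r * A[0] - 2.0 * lam * e[0],
            2.0 * r * A[1] - 2.0 * lam * e[1])
    return loss, grad

def invert(p0, lam, lr=0.1, steps=300):
    """Plain gradient descent on L_data + lam * L_AE."""
    p = p0
    for _ in range(steps):
        _, g = loss_and_grad(p, lam)
        p = (p[0] - lr * g[0], p[1] - lr * g[1])
    return p

p_plain = invert((2.0, 2.0), lam=0.0)   # data term only
p_ae = invert((2.0, 2.0), lam=1.5)      # data term + AE regulariser
print("lambda = 0.0:", p_plain)
print("lambda = 1.5:", p_ae)
```

With these stand-ins the single datum cannot distinguish $v_1$ from $v_2$, so the unregularised run stops at (2.25, 2.25), which fits the datum exactly but lies off the manifold. With $\lambda = 1.5$ the run reaches the truth (1.5, 3.0), where both terms vanish.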

Why this often beats hand-crafted

The §9.2 hand-crafted regulariser encoded one specific assumption — depth-trend with $\Delta v_{\mathrm{prior}} = 1.5$ km/s. If the true structure deviates from this assumption, the regulariser fights the data. The AE has no such bias: it captures whatever structure the corpus contains, including:

Multi-modal priors. Two sediment-basin classes with different compaction trends; AE handles both as separate clusters in the bottleneck. Hand-crafted priors typically choose one mode.
Non-linear correlations. Real velocity-density correlations are non-linear (Gardner relation, lithology effects). An AE can learn the actual surface from the corpus; hand-crafted priors typically assume linearity.
Higher-dimensional structure. For full 2-D / 3-D velocity models, the prior is a function on millions of pixels. Hand-crafted priors capture only generic properties (such as smoothness) at that scale; AE / generative priors can learn corpus-specific structure.

Try it

The widget runs a complete pipeline:

Generate corpus. 200 random parameter triples $(z_{\mathrm{int}}, v_1, v_2)$ with $z_{\mathrm{int}} \sim U(0.3, 0.7)$, $v_1 \sim U(1.0, 2.0)$, and $v_2 \sim \mathcal{N}(2 v_1, 0.18^2)$ clamped to $[v_1 + 0.2, 4.0]$. The structural relation $v_2 \approx 2 v_1$ is the "compaction trend" the AE should learn. It places the truth $(v_1 = 1.5, v_2 = 3.0)$ on the corpus manifold, so the AE reconstruction error is near its minimum at the truth.
Pretrain AE. Encoder R³ → R² and decoder R² → R³, each with two hidden Tanh layers. 500 epochs of mini-batch SGD (~5-10 s in browser).
Visualise bottleneck. Encode all 200 corpus triples; scatter-plot in the 2-D bottleneck space. The AE has learned a 2-D manifold capturing the corpus's effective degrees of freedom.
Run 3 inversions. Same bad initial guess $(0.3, 2.0, 2.0)$ as §9.1/§9.2. Compare classical (λ=0), hand-crafted (§9.2 prior, λ=1), AE-regularised (this section, λ=1.5).
Compare trajectories. Plot all three trajectories in the $(v_1, v_2)$ slice with the corpus dots overlaid as the "prior cloud". Final parameter L² errors quantify which method recovered truth most accurately.
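
The corpus-generation step of the pipeline above follows directly from the stated distributions. The sketch below uses only the Python standard library; the function name `make_corpus` and the seed are hypothetical, and the least-squares slope check at the end is an added sanity test, not part of the widget.

```python
import random

def make_corpus(n=200, seed=0):
    """Draw n triples (z_int, v1, v2) from the distributions stated above."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n):
        z_int = rng.uniform(0.3, 0.7)
        v1 = rng.uniform(1.0, 2.0)
        v2 = rng.gauss(2.0 * v1, 0.18)        # compaction trend v2 ~ 2*v1
        v2 = min(max(v2, v1 + 0.2), 4.0)      # clamp to [v1 + 0.2, 4.0]
        corpus.append((z_int, v1, v2))
    return corpus

corpus = make_corpus()

# Sanity check: least-squares slope of v2 against v1 (line through the origin).
slope = sum(v1 * v2 for _, v1, v2 in corpus) / sum(v1 * v1 for _, v1, _ in corpus)
print(f"{len(corpus)} triples, fitted v2/v1 slope = {slope:.3f}")
```

The fitted slope should come out close to 2, slightly reduced by the upper clamp at 4.0, which trims the distribution when $v_1$ is near 2.0.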

Two panels: the $(v_1, v_2)$ slice with corpus + truth + init + 3 trajectories, and the AE bottleneck space showing the encoded corpus.

Expected behaviour: the AE-regularised inversion typically matches or beats the hand-crafted prior for this 2-layer toy problem. The AE has implicitly learned the corpus trend $v_2 \approx 2 v_1$, which at the truth agrees with the $\Delta v_{\mathrm{prior}} = 1.5$ km/s contrast that the hand-crafted prior encodes. Unlike the hand-crafted form, the AE also generalises to richer corpora where hand-crafted forms fail.

Caveats and limitations

Corpus quality matters. The AE only encodes what's in the corpus. If the corpus is biased (e.g., all from one sedimentary basin), the prior will be biased the same way and may reject perfectly valid out-of-distribution velocity models.
Bottleneck dimension is a hyperparameter. Too small = under-fit corpus (high reconstruction error even on training data). Too large = over-fit (AE behaves like identity, no useful prior). For the 2-layer toy problem we use K=2; for full 2-D Marmousi-class velocity models, K=64-256 is typical.
Reconstruction loss ≠ likelihood. The AE reconstruction error is NOT a proper probability density. For Bayesian inference (§9.5-§9.6) we typically need a proper density or its score, which the §9.4 generative-prior architectures (VAE, diffusion) provide.
Mode collapse. AEs can fail to capture all corpus modes. If your corpus has two distinct sedimentary regimes, a small AE may collapse them to a single mode in the bottleneck. Diagnostic: check reconstruction error per corpus point; high-error points likely belong to a missed mode.
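
The per-point diagnostic for missed modes can be sketched as follows. This is an illustration only: `project_onto_trend` is a stand-in for a trained AE that has captured just the $v_2 = 2 v_1$ regime, the second regime with slope 1.2 is invented for the example, and the cutoff of 50 times the median error is a hypothetical threshold that would need tuning on a real corpus.

```python
import random
import statistics

def per_point_errors(corpus, reconstruct):
    """Squared reconstruction error ||AE(p) - p||^2 for each corpus point."""
    errors = []
    for p in corpus:
        q = reconstruct(p)
        errors.append(sum((qi - pi) ** 2 for qi, pi in zip(q, p)))
    return errors

def flag_outliers(errors, factor=50.0):
    """Indices whose error exceeds factor * median (hypothetical threshold)."""
    cutoff = factor * statistics.median(errors)
    return [i for i, e in enumerate(errors) if e > cutoff]

def project_onto_trend(p):
    """Stand-in AE: orthogonal projection of (v1, v2) onto the line v2 = 2*v1."""
    s = (p[0] + 2.0 * p[1]) / 5.0
    return (s, 2.0 * s)

rng = random.Random(0)
corpus = []
for i in range(100):
    v1 = rng.uniform(1.0, 2.0)
    slope = 2.0 if i < 90 else 1.2   # last 10 points: a second, invented regime
    corpus.append((v1, slope * v1 + rng.gauss(0.0, 0.05)))

errors = per_point_errors(corpus, project_onto_trend)
flagged = flag_outliers(errors)
print("flagged corpus indices:", flagged)
```

Points from the second regime sit far from the learned line, so their errors stand out against the bulk of the corpus; in practice the flagged points would then be inspected or the AE retrained with more capacity.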

What §9.4 will do

§9.4 generalises beyond AEs to GENERATIVE priors. Variational autoencoders (VAE) provide a proper likelihood $p_\phi(p) = \int p_\phi(p|z) \, p(z) \, dz$. Denoising-diffusion models provide a score function $\nabla_p \log p(p)$ that can be used as a regulariser gradient directly. For 2-D / 3-D velocity models with thousands of parameters, generative priors are the production state of the art.
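
One way to read "regulariser gradient" is sketched below. It assumes the regulariser is taken to be the negative log-prior, which is the standard MAP reading but not something this section confirms about §9.4. The step size $\eta$ and iteration index $k$ are notation introduced here.

```latex
R(p) = -\log p(p),
\qquad
p_{k+1} = p_k - \eta \Bigl[ \nabla_p \mathcal{L}_{\mathrm{data}}(p_k)
          - \lambda \, \nabla_p \log p(p_k) \Bigr]
```

Under that assumption the score replaces the gradient of the AE reconstruction term in the update, and no explicit density evaluation is needed.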

References

Hinton, G.E., Salakhutdinov, R.R. (2006). Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507. The autoencoder paper that started the modern interest.
Bora, A., Jalal, A., Price, E., Dimakis, A.G. (2017). Compressed sensing using generative models. ICML 2017. Showed that generative priors yield reconstructions superior to hand-crafted regularisers in compressed sensing — the inverse-problem analogue.
Mosser, L., Dubrule, O., Blunt, M.J. (2020). Stochastic seismic waveform inversion using generative adversarial networks as a geological prior. Math. Geosci. 52, 53–79. Generative-prior FWI on real seismic data.
Asgharzadeh, M., Sansò, S., Fomel, S. (2023). Deep prior-driven inversion of seismic data. Geophysics 88(5), R547–R558. Deep-image-prior method for FWI regularisation.