Generative priors for velocity models

Part 9 — Hybrid PINN + classical, with uncertainty

Learning objectives

Recognise the conceptual jump from AE-as-regulariser to AE-as-sampler
Train an AE with a latent-space prior penalty so latents concentrate near the origin
Draw new plausible velocity models by sampling z ~ N(0, σ²·I) and decoding
Identify the bridge to full VAE / diffusion priors used in production
Set up §9.5 (Bayesian inference needs a proper prior density, and generative priors can supply one)

§9.3's autoencoder gave us a regulariser: the reconstruction loss $\| AE(p) - p \|^2$ acts as a soft membership function on the corpus manifold. §9.4 takes the next step: it turns the AE into a SAMPLER that can generate NEW plausible velocity models on demand. This unlocks two things: a proper probabilistic prior for Bayesian inversion (§9.5-§9.6), and continuous parameter exploration along the prior manifold.

The latent-prior trick

A vanilla AE learns to reconstruct its corpus, but nothing constrains where the latent codes $z = E(p)$ sit in $\mathbb{R}^K$. Decoding a RANDOM latent $z_{\mathrm{new}} \sim p(z)$, for some chosen prior $p(z)$, may therefore produce garbage: $z_{\mathrm{new}}$ is likely to fall in a region the decoder was never trained on.

The fix is to train the AE so its latent codes are CONCENTRATED in a known region. Add a penalty during training:

\mathcal{L}(\theta_E, \theta_D) = \| D(E(p)) - p \|^2 + \beta \, \| E(p) \|^2 ,

where $\beta > 0$ pushes the latent codes toward the origin. After training, encoded corpus codes have empirical mean $\mu_z \approx 0$ and empirical std $\sigma_z$. Drawing $z_{\mathrm{new}} \sim \mathcal{N}(0, \sigma_z^2 \cdot I)$ and decoding now gives samples that LOOK LIKE corpus models, because the decoder has seen latent codes in this region throughout training.
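The training loop and the sampling step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the widget's implementation: the toy corpus of (v₁, v₂) pairs with v₂ ≈ 2v₁, the linear encoder/decoder, the 1-D latent, and the names `make_toy_corpus`, `train_latent_penalised_ae` and `sample_models` are all hypothetical.

```python
import math
import random


def make_toy_corpus(n=200, seed=0):
    """Hypothetical toy corpus: (v1, v2) pairs in km/s with v2 ≈ 2 * v1."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n):
        v1 = rng.uniform(1.5, 2.5)
        corpus.append((v1, 2.0 * v1 + rng.gauss(0.0, 0.05)))
    return corpus


def train_latent_penalised_ae(corpus, beta=0.05, lr=0.05, epochs=1000):
    """Linear AE with a 1-D latent: z = e . (p - m), p_hat = m + w * z.

    Minimises the mean of ||p_hat - p||^2 + beta * z^2 by full-batch gradient descent.
    """
    n = len(corpus)
    m = [sum(p[i] for p in corpus) / n for i in range(2)]  # corpus mean, used for centring
    e = [0.1, 0.1]  # encoder weights
    w = [0.1, 0.1]  # decoder weights
    for _ in range(epochs):
        ge, gw = [0.0, 0.0], [0.0, 0.0]
        for p in corpus:
            x = [p[0] - m[0], p[1] - m[1]]
            z = e[0] * x[0] + e[1] * x[1]
            r = [w[0] * z - x[0], w[1] * z - x[1]]  # reconstruction residual
            dz = 2.0 * (w[0] * r[0] + w[1] * r[1]) + 2.0 * beta * z  # dLoss/dz
            for i in range(2):
                gw[i] += 2.0 * r[i] * z
                ge[i] += dz * x[i]
        for i in range(2):
            w[i] -= lr * gw[i] / n
            e[i] -= lr * ge[i] / n
    return m, e, w


def encode(p, m, e):
    return e[0] * (p[0] - m[0]) + e[1] * (p[1] - m[1])


def decode(z, m, w):
    return (m[0] + w[0] * z, m[1] + w[1] * z)


def sample_models(corpus, m, e, w, n_samples=100, seed=1):
    """Draw z_new ~ N(0, sigma_z^2) with sigma_z the empirical latent std, then decode."""
    zs = [encode(p, m, e) for p in corpus]
    mu_z = sum(zs) / len(zs)
    sigma_z = math.sqrt(sum((z - mu_z) ** 2 for z in zs) / len(zs))
    rng = random.Random(seed)
    samples = [decode(rng.gauss(0.0, sigma_z), m, w) for _ in range(n_samples)]
    return samples, mu_z, sigma_z


corpus = make_toy_corpus()
m, e, w = train_latent_penalised_ae(corpus)
samples, mu_z, sigma_z = sample_models(corpus, m, e, w)
trend_error = sum(abs(v2 - 2.0 * v1) for v1, v2 in samples) / len(samples)
print(f"mu_z = {mu_z:.1e}, sigma_z = {sigma_z:.3f}, mean |v2 - 2 v1| = {trend_error:.4f}")
```

In this linear toy, centring the inputs already makes $\mu_z = 0$ exactly. An unconstrained decoder can also partly absorb the $\beta \|z\|^2$ penalty by shrinking the encoder while growing the decoder weights. That scale ambiguity is one reason to sample with the empirical $\sigma_z$ rather than assume unit variance.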

This is the simplest non-trivial generative model. Two important refinements lead to production architectures:

Variational AE (VAE). The encoder outputs $(\mu, \log \sigma^2)$ instead of a single point $z$. Latent samples are drawn as $z = \mu + \sigma \cdot \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ (the reparameterisation trick). The KL divergence $\mathrm{KL}\bigl(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\bigr)$ replaces the $\beta \|z\|^2$ penalty. Result: a proper probabilistic generative model whose density $p_\phi(p)$ (lower-bounded by the training objective, the ELBO) is usable in Bayesian inference. Kingma-Welling 2013.
Diffusion models. Train a network to PREDICT the noise added at each level of a forward Gaussian-noise process. Up to a level-dependent scale factor, the predicted noise is an estimate of the score function $\nabla_p \log p_t(p)$ of the noised distribution $p_t$, which approaches the corpus score $\nabla_p \log p(p)$ at low noise levels; this score can be used directly as a regulariser gradient. Typically more expressive than VAEs for high-dimensional images and 2-D velocity models. Ho-Jain-Abbeel 2020 DDPM, Song-Ermon 2019 score matching.
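The core ingredients of these two refinements fit in a few lines. The sketch below, in plain Python with hypothetical function names, shows the reparameterisation trick, the closed-form KL term for a diagonal-Gaussian encoder, and a DDPM-style forward noising step together with the standard conversion from a noise prediction to a score estimate, $\nabla \log p_t \approx -\hat{\varepsilon} / \sqrt{1 - \bar{\alpha}_t}$. Networks and training loops are deliberately omitted; $\bar{\alpha}_t$ (`alpha_bar`) is the cumulative signal fraction of an assumed noise schedule.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """VAE reparameterisation: z = mu + sigma * eps with eps ~ N(0, I).

    The randomness sits in eps, so z is a differentiable function of (mu, log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed form KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def ddpm_forward(x0, alpha_bar, rng):
    """DDPM forward noising: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * n for a, n in zip(x0, eps)]
    return xt, eps


def score_from_noise(eps_pred, alpha_bar):
    """Score estimate implied by a noise prediction: -eps_pred / sqrt(1 - alpha_bar)."""
    scale = math.sqrt(1.0 - alpha_bar)
    return [-n / scale for n in eps_pred]


rng = random.Random(0)
# Empirical check of the reparameterisation with mu = 1, sigma = 2
z_samples = [reparameterise([1.0], [math.log(4.0)], rng)[0] for _ in range(20000)]
z_mean = sum(z_samples) / len(z_samples)
z_var = sum((z - z_mean) ** 2 for z in z_samples) / len(z_samples)

# With the true noise in place of a network prediction, this is the exact
# score of q(x_t | x_0) = N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I).
x0 = [1.0, -2.0, 0.5]
xt, eps = ddpm_forward(x0, 0.3, rng)
score = score_from_noise(eps, 0.3)
print(z_mean, z_var, score)
```

A trained noise predictor replaces `eps` with its output, and the same conversion then yields an estimate of the score of the noised marginal rather than of the conditional.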

Try it: generative-AE playground

The widget pretrains a regularised AE ($\beta = 0.05$) on the same 200-model corpus from §9.3 for 700 epochs. After pretraining:

Sampling: 100 random latent codes are drawn from $\mathcal{N}(0, \sigma_{\mathrm{emp}}^2 \cdot I)$, where $\sigma_{\mathrm{emp}}$ is the empirical std of the corpus latents. Each code is decoded to a parameter triple, and the decoded samples are plotted alongside the corpus to show that the GENERATED samples follow the same v₂ ≈ 2v₁ trend.
Latent-space exploration: two sliders (z₁, z₂) let you traverse the latent manifold. Every position decodes INSTANTLY to a velocity profile c(z). Move the sliders — the velocity profile updates continuously. This is the central capability of generative priors: bounded, on-manifold parameter exploration.

Three panels: the (v₁, v₂) parameter space with corpus, generated samples and the slider position; the latent space with the encoded corpus and the slider position; the decoded velocity profile c(z) for the current sliders.

Expected behaviour: the generated samples (orange dots) overlap the corpus (cyan) in the $(v_1, v_2)$ plane, demonstrating that the AE-as-sampler reproduces the training distribution. Sliding through latent space traces out smooth, plausible velocity profiles — the prior manifold made interactive.

Why this matters for Bayesian FWI

Bayesian FWI computes a POSTERIOR over velocity models given seismic data:

p(c \mid \mathrm{data}) \propto p(\mathrm{data} \mid c) \, p(c) ,

where $p(c)$ is the prior over velocity models. Hand-crafted priors (Tikhonov, total variation) provide a closed-form $p(c)$ (up to normalisation) but limited expressivity. Generative priors provide a learned, expressive $p(c)$ via the decoder, a noise model $p(c \mid z)$ centred on the decoded model $D(z)$, and the latent prior $p(z)$:

p(c) = \int p(c \mid z) \, p(z) \, dz \approx \frac{1}{N} \sum_{i=1}^{N} p(c \mid z_i), \quad z_i \sim p(z) .
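For a deterministic decoder, $p(c \mid z)$ is a point mass, so this estimator needs a noise model. The sketch below assumes (hypothetically) $p(c \mid z) = \mathcal{N}(c;\, D(z),\, s^2 I)$ and $p(z) = \mathcal{N}(0, I)$, and evaluates the Monte Carlo average in log space with log-sum-exp to avoid underflow. `toy_decoder` is an illustrative linear decoder, chosen so the estimate can be checked against the exact linear-Gaussian answer $p(c) = \mathcal{N}(c;\, m,\, w w^\top + s^2 I)$.

```python
import math
import random


def toy_decoder(z):
    """Hypothetical linear decoder D(z) = m + w * z with m = (2, 4), w = (0.5, 1.0)."""
    return [2.0 + 0.5 * z[0], 4.0 + 1.0 * z[0]]


def log_gaussian(c, mean, s):
    """log N(c; mean, s^2 I)."""
    k = len(c)
    sq = sum((ci - mi) ** 2 for ci, mi in zip(c, mean))
    return -0.5 * sq / s ** 2 - k * math.log(s) - 0.5 * k * math.log(2.0 * math.pi)


def log_prior_mc(c, decoder, latent_dim, s, n_samples=20000, seed=0):
    """Monte Carlo estimate of log p(c) = log (1/N) sum_i N(c; D(z_i), s^2 I), z_i ~ N(0, I)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        logs.append(log_gaussian(c, decoder(z), s))
    top = max(logs)  # log-sum-exp: factor out the largest term before exponentiating
    return top + math.log(sum(math.exp(lg - top) for lg in logs) / n_samples)


s = 0.3
c_on = [2.25, 4.5]   # on the line v2 = 2 v1 traced out by the decoder
c_off = [2.25, 3.0]  # violates the corpus trend
lp_on = log_prior_mc(c_on, toy_decoder, 1, s)
lp_off = log_prior_mc(c_off, toy_decoder, 1, s)
print(f"log p(c_on) = {lp_on:.3f}, log p(c_off) = {lp_off:.3f}")
```

Sampling $z_i$ from the prior is the simplest estimator, but its variance grows quickly with latent dimension and small $s$; importance sampling from an encoder-based proposal is a common remedy.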

Sampling from the posterior $p(c \mid \mathrm{data})$ via MCMC then becomes far more tractable: the chain runs over the low-dimensional latent $z$, targeting $p(\mathrm{data} \mid D(z))\, p(z)$, and each step proposes a new velocity model by perturbing $z$ and decoding. Every proposal lies on the prior manifold, which typically improves acceptance rates substantially compared with proposing in the high-dimensional ambient space of $c$.
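A random-walk Metropolis chain over the latent variable makes this concrete. Everything in the sketch is hypothetical: a 1-D latent with prior $\mathcal{N}(0, 1)$, a linear `toy_decoder` whose outputs all satisfy $v_2 = 2 v_1$, a one-datum `toy_forward` model $v_1 + v_2$ with Gaussian noise of standard deviation `sigma_d`, and an untuned step size and chain length. Real FWI would replace `toy_forward` with a wave-equation solve.

```python
import math
import random


def toy_decoder(z):
    """Hypothetical decoder: 1-D latent -> (v1, v2); every output satisfies v2 = 2 * v1."""
    return [2.0 + 0.5 * z, 4.0 + 1.0 * z]


def toy_forward(c):
    """Hypothetical forward model producing a single datum: v1 + v2."""
    return c[0] + c[1]


def log_posterior_z(z, d_obs, sigma_d):
    """log p(d | D(z)) + log p(z) up to a constant, with Gaussian noise and z ~ N(0, 1)."""
    resid = d_obs - toy_forward(toy_decoder(z))
    return -0.5 * (resid / sigma_d) ** 2 - 0.5 * z * z


def latent_metropolis(d_obs, sigma_d, n_steps=20000, step=0.5, seed=0):
    """Random-walk Metropolis in latent space; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    z = 0.0
    lp = log_posterior_z(z, d_obs, sigma_d)
    chain, accepted = [], 0
    for _ in range(n_steps):
        z_new = z + step * rng.gauss(0.0, 1.0)  # symmetric proposal in latent space
        lp_new = log_posterior_z(z_new, d_obs, sigma_d)
        if math.log(1.0 - rng.random()) < lp_new - lp:  # accept with prob min(1, ratio)
            z, lp = z_new, lp_new
            accepted += 1
        chain.append(z)
    return chain, accepted / n_steps


chain, acc = latent_metropolis(d_obs=6.9, sigma_d=0.5)
kept = chain[2000:]  # discard burn-in
post_mean_z = sum(kept) / len(kept)
decoded = [toy_decoder(z) for z in kept]  # posterior samples of the velocity model
print(f"acceptance = {acc:.2f}, posterior mean z = {post_mean_z:.3f}")
```

Because the toy problem is linear-Gaussian, the posterior of $z$ is Gaussian with mean $1.5(d - 6)/(\sigma_d^2 + 2.25)$ (0.54 for these settings), which gives a check on the chain. Every decoded state lies exactly on the decoder's manifold.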

Production examples

Mosser et al 2020 trained a GAN on Marmousi-style velocity models, then used the generator as the prior for stochastic FWI. Posterior samples were drawn by perturbing the GAN latent code while satisfying the data likelihood.
Liu et al 2023 use diffusion priors for posterior FWI, leveraging the score function for gradient-based posterior sampling. State of the art on synthetic and real-data tests.
Asgharzadeh et al 2023 use deep image priors (UNet trained on the data itself, no external corpus) — a degenerate case where the "generative prior" is implicit in the network architecture.

Limitations

Out-of-distribution velocity models: if the true subsurface contains structures (e.g., salt domes) absent from the training corpus, the generative prior assigns them near-zero probability, effectively REJECTING them, and the posterior cannot recover them. Production codes use diverse, geographically broad corpora to mitigate this.
Mode collapse: GANs in particular often miss corpus modes. VAEs tend to capture all modes but may be biased toward the centroid. Diffusion models cover modes well but cost more compute.
Latent-space geometry: Euclidean distance in z need not reflect the distance between decoded models, because the decoder stretches and bends the latent space non-uniformly. Linear interpolation in z does NOT correspond to linear interpolation in c(x). Geodesics on the latent manifold, and any inference step that relies on latent distances, require the Riemannian metric the decoder induces (Arvanitidis et al 2018).
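A small numerical check shows why straight lines in $z$ are not straight lines in model space. The sketch uses a hypothetical nonlinear `toy_decoder` (2-D latent to $(v_1, v_2)$) and compares the decoded latent midpoint with the model-space midpoint, the decoded path length with the chord length, and the local stretch factor $\|J(z)\,d\|$ at two latent points, where $J$ is the decoder Jacobian that defines the pullback metric $J^\top J$ of a deterministic decoder.

```python
import math


def toy_decoder(z):
    """Hypothetical nonlinear decoder: 2-D latent -> (v1, v2) in km/s."""
    t = math.tanh(z[0])
    return [2.0 + 0.5 * t, 4.0 + t + 0.2 * z[1] ** 2]


def decoded_path_length(za, zb, n=1000):
    """Model-space length of the decoded straight latent segment za -> zb (polyline estimate)."""
    length, prev = 0.0, toy_decoder(za)
    for k in range(1, n + 1):
        s = k / n
        cur = toy_decoder([a + s * (b - a) for a, b in zip(za, zb)])
        length += math.dist(prev, cur)
        prev = cur
    return length


def local_stretch(z, direction, h=1e-6):
    """Finite-difference estimate of ||J(z) d||: model-space speed per unit latent step along d."""
    c0 = toy_decoder(z)
    c1 = toy_decoder([zi + h * di for zi, di in zip(z, direction)])
    return math.dist(c0, c1) / h


za, zb = [-2.0, -1.0], [2.0, 1.0]
ca, cb = toy_decoder(za), toy_decoder(zb)
mid_decoded = toy_decoder([(a + b) / 2 for a, b in zip(za, zb)])  # decode the latent midpoint
mid_linear = [(a + b) / 2 for a, b in zip(ca, cb)]                 # midpoint in model space
print("midpoint gap:", math.dist(mid_decoded, mid_linear))
print("path length:", decoded_path_length(za, zb), "chord:", math.dist(ca, cb))
print("stretch at z1=0:", local_stretch([0.0, 0.0], [1.0, 0.0]),
      "at z1=2:", local_stretch([2.0, 0.0], [1.0, 0.0]))
```

Equal latent steps produce very different model-space steps here (the stretch factor drops by more than an order of magnitude where `tanh` saturates), which is exactly the non-uniformity the Riemannian treatment accounts for.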

What §9.5 will do

§9.5 introduces UNCERTAINTY in the PINN itself: weight-space Bayesian PINNs and ensemble-based UQ. We have priors over velocity models (§9.4); now we add priors over PINN parameters and produce posterior samples. §9.6 combines all of it: uncertainty in data, uncertainty in PINN, generative prior on velocity, posterior FWI inversion.

References

Kingma, D.P., Welling, M. (2013). Auto-encoding variational Bayes. ICLR 2014. arXiv:1312.6114. The VAE paper.
Ho, J., Jain, A., Abbeel, P. (2020). Denoising diffusion probabilistic models. NeurIPS 2020. The DDPM paper that made diffusion models competitive for high-quality image generation.
Song, Y., Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. NeurIPS 2019. Score-matching foundations of diffusion priors.
Mosser, L., Dubrule, O., Blunt, M.J. (2020). Stochastic seismic waveform inversion using generative adversarial networks as a geological prior. Math. Geosci. 52, 53–79.
Liu, Z., Yang, Y., Quan, Y., Yang, Y., Wang, B. (2023). Diffusion-prior-based seismic full waveform inversion. arXiv:2306.10094. Production diffusion-prior FWI.
Arvanitidis, G., Hansen, L.K., Hauberg, S. (2018). Latent space oddity: on the curvature of deep generative models. ICLR 2018. Riemannian geometry of VAE latent spaces.