Generative priors for velocity models

Part 9 — Hybrid PINN + classical, with uncertainty

Learning objectives

  • Recognise the conceptual jump from AE-as-regulariser to AE-as-sampler
  • Train an AE with a latent-space prior penalty so latents concentrate near origin
  • Draw new plausible velocity models by sampling z ~ N(0, σ²·I) and decoding
  • Identify the bridge to full VAE / diffusion priors used in production
  • Set up §9.5 (Bayesian PINNs need a proper likelihood — generative priors deliver one)

§9.3's autoencoder gave us a regulariser: the reconstruction loss $\|\mathrm{AE}(p) - p\|^2$ acts as a soft membership function on the corpus manifold. §9.4 takes the next step: turn the AE into a SAMPLER that can generate NEW plausible velocity models on demand. This unlocks two things: a proper probabilistic prior for Bayesian inversion (§9.5-§9.6), and continuous parameter exploration along the prior manifold.

The latent-prior trick

A vanilla AE learns to reconstruct its corpus, but the latent codes $z = E(p)$ can sit anywhere in $\mathbb{R}^K$. This means decoding a RANDOM latent $z_{\mathrm{new}} \sim p(z)$ for some chosen prior $p(z)$ may produce garbage: the random $z_{\mathrm{new}}$ likely falls in a region the decoder was never trained on.

The fix is to train the AE so its latent codes are CONCENTRATED in a known region. Add a penalty during training:

$$\mathcal{L}(\theta_E, \theta_D) = \| D(E(p)) - p \|^2 + \beta \, \| E(p) \|^2 ,$$

where $\beta > 0$ pushes the latent codes toward the origin. After training, encoded corpus codes have empirical mean $\mu_z \approx 0$ and empirical std $\sigma_z$. Drawing $z_{\mathrm{new}} \sim \mathcal{N}(0, \sigma_z^2 \cdot I)$ and decoding now gives samples that LOOK LIKE corpus models, because the decoder has seen latent codes in this region throughout training.
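The penalised loss and the sampling step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the widget's implementation: the encoder and decoder are plain linear maps with hand-written gradients, the corpus is synthetic, and the third parameter `d`, the numeric ranges, the learning rate and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical corpus: 200 parameter triples (v1, v2, d) with v2 ≈ 2*v1.
# The third parameter d and all numeric ranges are assumptions for illustration.
n = 200
v1 = rng.uniform(1.5, 2.5, n)
v2 = 2.0 * v1 + rng.normal(0.0, 0.05, n)
d = rng.uniform(0.3, 0.7, n)
corpus = np.stack([v1, v2, d], axis=1)

# Standardise so the three parameters have comparable scale.
mean, std = corpus.mean(axis=0), corpus.std(axis=0)
x = (corpus - mean) / std

K, beta, lr, epochs = 2, 0.05, 0.02, 700
W_e = rng.normal(0.0, 0.1, (3, K))  # linear encoder: z = x @ W_e
W_d = rng.normal(0.0, 0.1, (K, 3))  # linear decoder: x_hat = z @ W_d


def loss_and_grads(x, W_e, W_d, beta):
    """Corpus mean of ||D(E(p)) - p||^2 + beta * ||E(p)||^2, with gradients."""
    z = x @ W_e
    r = z @ W_d - x
    loss = np.mean(np.sum(r**2, axis=1)) + beta * np.mean(np.sum(z**2, axis=1))
    g_z = 2.0 * (r @ W_d.T + beta * z) / len(x)  # dL/dz
    g_W_e = x.T @ g_z                            # dL/dW_e
    g_W_d = 2.0 * z.T @ r / len(x)               # dL/dW_d
    return loss, g_W_e, g_W_d


history = []
for _ in range(epochs):
    loss, g_W_e, g_W_d = loss_and_grads(x, W_e, W_d, beta)
    history.append(loss)
    W_e -= lr * g_W_e
    W_d -= lr * g_W_d

# Empirical latent statistics of the encoded corpus.
z_corpus = x @ W_e
mu_z = z_corpus.mean(axis=0)
sigma_z = z_corpus.std()  # one scalar std, matching N(0, sigma_z^2 * I)

# Draw new latents and decode them back to physical units.
z_new = rng.normal(0.0, sigma_z, size=(100, K))
generated = (z_new @ W_d) * std + mean
```

In this toy setting the decoded samples in `generated` inherit the v₂ ≈ 2v₁ trend of the corpus, because the two-dimensional latent space can only represent the dominant directions of variation. A real velocity-model AE would replace the linear maps with neural networks and an automatic-differentiation framework.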

This is the simplest non-trivial generative model. Two important refinements lead to production architectures:

  • Variational AE (VAE). The encoder outputs $(\mu, \log \sigma^2)$ instead of a single point $z$. Latent samples are drawn as $z = \mu + \sigma \cdot \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ (the reparameterisation trick). The KL divergence $\mathrm{KL}\bigl(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\bigr)$ replaces the $\beta \|z\|^2$ penalty. Result: a proper probabilistic generative model with a likelihood $p_\phi(p)$ usable in Bayesian inference (Kingma and Welling 2013).
  • Diffusion models. Train a network to PREDICT the noise added at each level of a forward Gaussian-noise process. The score function $\nabla_p \log p(p)$ is recovered as a side effect and can be used directly as a regulariser gradient. Far more expressive than VAE for high-dimensional images / 2-D velocity models (Ho, Jain and Abbeel 2020, DDPM; Song and Ermon 2019, score matching).
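For the VAE refinement, the two ingredients named above (the reparameterised draw and the KL term against a standard normal) have simple closed forms. The sketch below assumes a diagonal Gaussian encoder output; the function names and the example values of `mu` and `log_var` are hypothetical placeholders for what a trained encoder would return.

```python
import numpy as np


def reparameterise(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])  # placeholder encoder mean
log_var = np.zeros(2)       # placeholder encoder log-variance (sigma = 1)

z = reparameterise(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # 0.5 * (1 + 1) = 1.0
```

With unit variance the KL term reduces to $\tfrac{1}{2}\|\mu\|^2$, which shows how it plays the same role as the $\beta \|z\|^2$ penalty while also controlling the spread of each latent code.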

Try it: generative-AE playground

[Interactive figure: Generative Prior]

The widget pretrains a regularised AE ($\beta = 0.05$) on the same 200-model corpus from §9.3 for 700 epochs. After pretraining:

  • Sampling: 100 random latent codes are drawn from $\mathcal{N}(0, \sigma_{\mathrm{emp}}^2 \cdot I)$, where $\sigma_{\mathrm{emp}}$ is the empirical std of corpus latents. Each is decoded to a parameter triple and plotted alongside the corpus to demonstrate that the GENERATED samples follow the same v₂ ≈ 2v₁ trend.
  • Latent-space exploration: two sliders (z₁, z₂) let you traverse the latent manifold. Every position decodes INSTANTLY to a velocity profile c(z), where the argument z of c denotes depth rather than the latent code. Moving the sliders updates the velocity profile continuously. This is the central capability of generative priors: bounded, on-manifold parameter exploration.

Three panels: (v_1, v_2) parameter space with corpus + generated samples + slider position; the latent space with corpus encoded + slider position; the decoded velocity profile c(z) for the current sliders.

Expected behaviour: the generated samples (orange dots) overlap the corpus (cyan) in the $(v_1, v_2)$ plane, demonstrating that the AE-as-sampler reproduces the training distribution. Sliding through latent space traces out smooth, plausible velocity profiles: the prior manifold made interactive.

Why this matters for Bayesian FWI

Bayesian FWI computes a POSTERIOR over velocity models given seismic data:

$$p(c \mid \mathrm{data}) \propto p(\mathrm{data} \mid c) \, p(c) ,$$

where $p(c)$ is the prior over velocity models. Hand-crafted priors (Tikhonov, total variation) provide closed-form $p(c)$ but limited expressivity. Generative priors provide a learned, expressive $p(c)$ via the decoder + latent prior:

$$p(c) = \int p(c \mid z) \, p(z) \, dz \approx \frac{1}{N} \sum_{i=1}^{N} p(c \mid z_i), \quad z_i \sim p(z) .$$
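The Monte Carlo estimate above can be checked on a toy problem where the integral is known exactly. The sketch assumes a one-dimensional linear decoder $D(z) = a z$, a Gaussian decoder likelihood $p(c \mid z) = \mathcal{N}(c; D(z), \tau^2)$ and a standard-normal latent prior; `a`, `tau` and the function names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup (assumed): decoder D(z) = a * z, Gaussian decoder likelihood
# p(c | z) = N(c; D(z), tau^2), latent prior p(z) = N(0, 1).
a, tau = 1.0, 0.5


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def prior_density_mc(c, n_samples, rng):
    """Monte Carlo estimate p(c) ≈ (1/N) * sum_i p(c | z_i), z_i ~ p(z)."""
    z = rng.standard_normal(n_samples)
    return np.mean(gaussian_pdf(c, a * z, tau))


c_query = 0.5
p_mc = prior_density_mc(c_query, 20_000, rng)

# For this linear-Gaussian toy the integral has a closed form,
# c ~ N(0, a^2 + tau^2), which serves as a reference value.
p_exact = gaussian_pdf(c_query, 0.0, np.sqrt(a**2 + tau**2))
```

For a nonlinear decoder no closed form exists and the sample average is the only available estimate. Its variance grows quickly with the dimension of $c$, which is one reason posterior sampling is usually carried out directly in latent space.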

Sampling from the posterior $p(c \mid \mathrm{data})$ via MCMC then becomes computationally tractable: each Markov-chain step proposes a new velocity model by perturbing $z$ and decoding. The proposal automatically lies on the prior manifold, dramatically improving acceptance rates compared to proposing in the high-dimensional ambient space.
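A latent-space sampler of this kind can be sketched with random-walk Metropolis. Everything concrete here is an assumption made for illustration: the linear decoder (`c0`, `B`), the linear forward operator `G` standing in for a wave-equation solve, the noise level and the step size. The latent prior is taken to be $\mathcal{N}(0, I)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear decoder: latent z (K = 2) -> parameter triple c = (v1, v2, d).
c0 = np.array([2.0, 4.0, 1.0])
B = np.array([[0.3, 0.0], [0.6, 0.0], [0.0, 0.3]])
# Hypothetical linear forward operator standing in for the wave-equation solve.
G = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 2.0]])
noise_std = 0.2


def decode(z):
    return c0 + B @ z


def log_posterior(z, data):
    """log p(data | D(z)) + log p(z) up to a constant, with p(z) = N(0, I)."""
    misfit = data - G @ decode(z)
    return -0.5 * misfit @ misfit / noise_std**2 - 0.5 * z @ z


# Synthetic observation generated from a known latent code.
z_true = np.array([0.8, -0.5])
data = G @ decode(z_true) + noise_std * rng.standard_normal(2)

# Random-walk Metropolis: perturb z, decode, accept or reject.
n_steps, step = 20_000, 0.25
z = np.zeros(2)
log_p = log_posterior(z, data)
chain = np.empty((n_steps, 2))
n_accept = 0
for i in range(n_steps):
    z_prop = z + step * rng.standard_normal(2)
    log_p_prop = log_posterior(z_prop, data)
    if np.log(rng.uniform()) < log_p_prop - log_p:
        z, log_p = z_prop, log_p_prop
        n_accept += 1
    chain[i] = z

posterior_z = chain[2_000:]           # discard burn-in
posterior_c = c0 + posterior_z @ B.T  # every sample is a decoded model
acceptance_rate = n_accept / n_steps
```

Every row of `posterior_c` is a decoded model, so the chain never leaves the range of the decoder. Because this toy is linear-Gaussian, the chain mean can be compared against the analytic posterior mean as a sanity check; a real FWI likelihood would require a wave-equation solve per step.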

Production examples

  • Mosser et al 2020 trained a GAN on Marmousi-style velocity models, then used the generator as the prior for stochastic FWI. Posterior samples were drawn by perturbing the GAN latent code while satisfying the data likelihood.
  • Liu et al 2023 use diffusion priors for posterior FWI, leveraging the score function for gradient-based posterior sampling. State of the art on synthetic and real-data tests.
  • Asgharzadeh et al 2023 use deep image priors (UNet trained on the data itself, no external corpus) — a degenerate case where the "generative prior" is implicit in the network architecture.

Limitations

  • Out-of-distribution velocity models: if the true subsurface contains structures (e.g., salt domes) absent from the training corpus, the generative prior will assign them very low probability and effectively REJECT them. Production codes use diverse, geographically broad corpora to mitigate this.
  • Mode collapse: GANs in particular often miss corpus modes. VAEs tend to capture all modes but may be biased toward the centroid. Diffusion models cover modes well but cost more compute.
  • Latent-space topology: the latent space is not Euclidean in any meaningful sense. Linear interpolation in z does NOT correspond to linear interpolation in c(x). Geodesics on the latent manifold require Riemannian-metric machinery for proper Bayesian inference (Arvanitidis et al 2018).
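The interpolation caveat in the last bullet is easy to see with any nonlinear decoder. The sketch below uses a hypothetical two-parameter `tanh` decoder (the weights `W` and the offset are invented for illustration) and compares decoding the latent midpoint with averaging the two decoded models.

```python
import numpy as np

# Hypothetical nonlinear decoder: latent z (K = 2) -> two velocities (v1, v2).
W = np.array([[1.0, 0.5], [0.2, 1.5]])


def decode(z):
    return 3.0 + np.tanh(W @ z)


z_a = np.array([-2.0, 0.0])
z_b = np.array([2.0, 1.0])

# Decode the midpoint of the two latent codes.
decoded_midpoint = decode(0.5 * (z_a + z_b))
# Average the two decoded models instead.
midpoint_of_decoded = 0.5 * (decode(z_a) + decode(z_b))

# Largest discrepancy between the two notions of "halfway".
gap = np.abs(decoded_midpoint - midpoint_of_decoded).max()
```

A non-zero `gap` shows that a straight line in latent space maps to a curved path in model space, which is why distances and step sizes measured in $z$ need care when interpreted physically.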

What §9.5 will do

§9.5 introduces UNCERTAINTY in the PINN itself: weight-space Bayesian PINNs and ensemble-based UQ. We have priors over velocity models (§9.4); now we add priors over PINN parameters and produce posterior samples. §9.6 combines all of it: uncertainty in data, uncertainty in PINN, generative prior on velocity, posterior FWI inversion.

References

  • Kingma, D.P., Welling, M. (2013). Auto-encoding variational Bayes. ICLR 2014. arXiv:1312.6114. The VAE paper.
  • Ho, J., Jain, A., Abbeel, P. (2020). Denoising diffusion probabilistic models. NeurIPS 2020. The DDPM paper that brought diffusion models to image generation.
  • Song, Y., Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. NeurIPS 2019. Score-matching foundations of diffusion priors.
  • Mosser, L., Dubrule, O., Blunt, M.J. (2020). Stochastic seismic waveform inversion using generative adversarial networks as a geological prior. Math. Geosci. 52, 53–79.
  • Liu, Z., Yang, Y., Quan, Y., Yang, Y., Wang, B. (2023). Diffusion-prior-based seismic full waveform inversion. arXiv:2306.10094. Production diffusion-prior FWI.
  • Arvanitidis, G., Hansen, L.K., Hauberg, S. (2018). Latent space oddity: on the curvature of deep generative models. ICLR 2018. Riemannian geometry of VAE latent spaces.
