The forward problem vs the inverse problem

Part 1, The PINN formulation

Learning objectives

Distinguish forward problems (given coefficients, find u) from inverse problems (given observations of u, find coefficients)
See concretely that the same PINN architecture handles both, with the inverse problem promoting unknown coefficients to additional trainable scalars
Compute the gradient of the PINN loss with respect to a non-network parameter such as a PDE coefficient
Anticipate why this generalises directly to PINN-FWI: instead of one scalar k, we have a velocity field v(x) and the same gradient mechanism recovers it

So far in Part 1, every PINN has been solving a forward problem: given the PDE coefficients (ν in Burgers, ω in the harmonic oscillator, k in the ODE), find the function u that satisfies the equation. That is one of the two PINN problem types. The other is the inverse problem: given (sparse, noisy) observations of u, find both u and the unknown PDE coefficients that explain the observations.

Inverse problems are where PINNs earn their place in seismology. Full-waveform inversion (FWI), velocity-model building and microseismic source localisation are all inverse problems: the unknown is a coefficient field or source term of the PDE, and the data are sparse, noisy, and limited to the surface. Classical FWI computes the required gradients with the adjoint-state method. PINN-FWI instead promotes the unknown coefficients to extra trainable parameters, optimised jointly with the network weights through the same gradient descent.

The inverse problem on the simplest possible PDE

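Same ODE as §1.2: $u'(t) = -k\,u(t)$, $u(0) = 1$. But now $k$ is unknown, and we have access to six noisy observations $y_i$ of $u(t)$ at times $t_i \in [0, T]$. The PINN treats $k$ as a trainable scalar parameter alongside the network weights $\theta$. The loss has three terms, a data misfit over the $N_d$ observations, an initial-condition penalty, and the mean squared ODE residual over $N_c$ collocation points $t_j$: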

$$L = \lambda_d\,\frac{1}{N_d}\sum_i (u_\theta(t_i) - y_i)^2 \;+\; \lambda_{\mathrm{IC}}\,(u_\theta(0) - 1)^2 \;+\; \lambda_p\,\frac{1}{N_c}\sum_j (u_\theta'(t_j) + k\,u_\theta(t_j))^2.$$

Note where $k$ appears: only in the PDE residual term. The data term contains no $k$, so the data alone can be matched by infinitely many $(k, u_\theta)$ pairs: any decay rate, paired with a network that interpolates the points. The PDE residual is what selects among them. Without it, the inverse problem cannot be solved; with it, the optimiser has to pick $(k, u_\theta)$ such that they jointly satisfy the data and the dynamics.
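The three loss terms can be assembled in a few lines. The sketch below is illustrative only: it replaces the network with a hypothetical one-parameter stand-in $u(t) = e^{-a t}$ whose derivative is known in closed form (a real PINN would obtain $u_\theta'$ by automatic differentiation), uses noise-free observations, and assumes $T = 2$ and unit loss weights. None of these choices come from the widget.

```python
import math

def u_model(t, a):
    """Hypothetical stand-in for the network u_theta: a one-parameter decay."""
    return math.exp(-a * t)

def du_model(t, a):
    """Exact time derivative of the stand-in (autodiff would supply this for a network)."""
    return -a * math.exp(-a * t)

def loss_terms(a, k, t_data, y_data, t_colloc):
    """Return the unweighted (data, initial-condition, PDE) terms."""
    data = sum((u_model(t, a) - y) ** 2 for t, y in zip(t_data, y_data)) / len(t_data)
    ic = (u_model(0.0, a) - 1.0) ** 2
    pde = sum((du_model(t, a) + k * u_model(t, a)) ** 2 for t in t_colloc) / len(t_colloc)
    return data, ic, pde

def total_loss(a, k, t_data, y_data, t_colloc, lam_d=1.0, lam_ic=1.0, lam_p=1.0):
    """Weighted sum of the three terms; the weights are assumed values."""
    data, ic, pde = loss_terms(a, k, t_data, y_data, t_colloc)
    return lam_d * data + lam_ic * ic + lam_p * pde

# Six noise-free observations of exp(-t) (k_true = 1) and 21 collocation points on [0, 2].
t_data = [0.2, 0.5, 0.9, 1.3, 1.7, 2.0]
y_data = [math.exp(-t) for t in t_data]
t_colloc = [0.1 * j for j in range(21)]

# Same fitted curve (a = 1), two different candidate values of k.
d1, ic1, p1 = loss_terms(1.0, 1.0, t_data, y_data, t_colloc)
d2, ic2, p2 = loss_terms(1.0, 0.3, t_data, y_data, t_colloc)
print("data terms equal:", d1 == d2)
print("PDE term, k = 1.0:", p1)
print("PDE term, k = 0.3:", p2)
```

The data term is identical for both values of $k$, while the PDE term vanishes only when $k$ matches the decay rate of the fitted curve.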

The gradient with respect to k

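The gradient of the loss with respect to $k$ is straightforward. Write $R(t) = u_\theta'(t) + k\,u_\theta(t)$ for the ODE residual and $L_{\mathrm{PDE}} = \frac{1}{N_c}\sum_j R(t_j)^2$ for the unweighted residual term. Only $L_{\mathrm{PDE}}$ depends on $k$, and $\partial R / \partial k = u_\theta$: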

$$\frac{\partial L_{\mathrm{PDE}}}{\partial k} = \frac{2}{N_c}\sum_j R(t_j)\,\frac{\partial R}{\partial k}\Big|_{t_j} = \frac{2}{N_c}\sum_j R(t_j)\,u_\theta(t_j).$$

This is a single scalar, computed at the same time as the network parameter gradients; the full gradient $\partial L / \partial k$ is this quantity multiplied by $\lambda_p$. The widget runs an Adam update on $k$ in parallel with the Adam update on the network weights, with separate learning rates. That is the only addition to the forward-problem PINN of §1.2.
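A quick way to convince yourself of the formula is to compare it with a finite-difference estimate. The sketch below again uses a hypothetical stand-in $u(t) = e^{-a t}$ in place of the network, with assumed values for $a$, $k$ and the collocation grid.

```python
import math

def residual(t, a, k):
    """R(t) = u'(t) + k u(t) for the hypothetical stand-in u(t) = exp(-a t)."""
    u = math.exp(-a * t)
    return -a * u + k * u

def pde_loss(a, k, t_colloc):
    """Unweighted PDE term: mean squared residual over the collocation points."""
    return sum(residual(t, a, k) ** 2 for t in t_colloc) / len(t_colloc)

def dpde_dk(a, k, t_colloc):
    """Analytic gradient: (2 / N_c) * sum_j R(t_j) * u(t_j)."""
    total = sum(residual(t, a, k) * math.exp(-a * t) for t in t_colloc)
    return 2.0 * total / len(t_colloc)

t_colloc = [0.1 * j for j in range(21)]   # assumed grid on [0, 2]
a, k, eps = 0.8, 0.3, 1e-6                # assumed test values

analytic = dpde_dk(a, k, t_colloc)
numeric = (pde_loss(a, k + eps, t_colloc) - pde_loss(a, k - eps, t_colloc)) / (2 * eps)
print("analytic:", analytic)
print("central difference:", numeric)
```

Because the loss is quadratic in $k$, the central difference agrees with the analytic value up to round-off. The gradient is negative here because $k$ is below the decay rate of the curve, so a descent step increases $k$.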

Try it

Six noisy data points generated with $k_{\mathrm{true}} = 1.0$. Initial guess $k_{\mathrm{init}} = 0.3$ (the violet dashed curve in the figure starts well above the truth). Press Play and watch the violet dashed curve descend toward the gray truth as the PDE residual pulls $k_{\mathrm{est}}$ toward the right value. Within ~500 epochs the violet and gray curves coincide, with $k_{\mathrm{est}} \approx 1.0$.
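If you want to reproduce the spirit of this experiment offline, the joint update can be sketched in pure Python. This is not the widget's code. It assumes a hypothetical one-parameter stand-in $u(t) = e^{-a t}$ instead of a neural network (so the initial condition holds automatically), noise-free data, hand-derived gradients instead of automatic differentiation, and arbitrary learning rates and epoch count. What it keeps is the structure: two Adam optimisers, one for the model parameter and one for $k$, stepping in parallel.

```python
import math

class Adam:
    """Minimal scalar Adam optimiser with the usual default moment decay rates."""
    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0
        self.v = 0.0
        self.n = 0

    def step(self, x, g):
        self.n += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.n)
        v_hat = self.v / (1 - self.b2 ** self.n)
        return x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def grads(a, k, t_data, y_data, t_colloc, lam_p):
    """Gradients of L = data + lam_p * PDE with respect to (a, k)."""
    ga, gk = 0.0, 0.0
    for t, y in zip(t_data, y_data):
        u = math.exp(-a * t)
        ga += 2.0 * (u - y) * (-t * u) / len(t_data)   # data term, depends on a only
    for t in t_colloc:
        u = math.exp(-a * t)
        r = (k - a) * u                                # R = u' + k u
        dr_da = -u - (k - a) * t * u                   # dR/da
        ga += lam_p * 2.0 * r * dr_da / len(t_colloc)
        gk += lam_p * 2.0 * r * u / len(t_colloc)      # dR/dk = u
    return ga, gk

def train(k_init=0.3, a_init=0.3, lam_p=1.0, epochs=5000):
    """Jointly update the stand-in parameter a and the coefficient k."""
    t_data = [0.2, 0.5, 0.9, 1.3, 1.7, 2.0]            # six observation times (assumed)
    y_data = [math.exp(-t) for t in t_data]            # noise-free data, k_true = 1
    t_colloc = [0.1 * j for j in range(21)]            # collocation points on [0, 2]
    opt_a, opt_k = Adam(lr=0.01), Adam(lr=0.02)        # separate learning rates
    a, k = a_init, k_init
    for _ in range(epochs):
        ga, gk = grads(a, k, t_data, y_data, t_colloc, lam_p)
        a = opt_a.step(a, ga)
        k = opt_k.step(k, gk)
    return a, k

a_est, k_est = train()
print("a_est:", a_est, "k_est:", k_est)
```

The data term moves the curve toward the observations, and the PDE term then drags $k$ after it; neither parameter is told the true value directly.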

What happens if you turn off the PDE term

If you set $\lambda_p = 0$ (in effect removing the PDE term through a manual hack), the inverse problem becomes ill-posed: any $k$ paired with a sufficiently flexible network can match the six noisy points. The optimiser will still fit some network output to the data, but nothing in the loss constrains $k$, so its final value reflects the initialisation and optimiser settings rather than the observations. With the PDE term active, the optimisation converges to a value close to the true $k$ from a wide range of starting guesses; with noisy data, expect close agreement rather than exact equality.
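The same conclusion follows in one line from the loss. With $R(t) = u_\theta'(t) + k\,u_\theta(t)$ denoting the ODE residual, $k$ enters the loss only through the term weighted by $\lambda_p$, so

$$\frac{\partial L}{\partial k} = \frac{2\,\lambda_p}{N_c}\sum_j R(t_j)\,u_\theta(t_j), \qquad \text{which is identically } 0 \text{ when } \lambda_p = 0.$$

A gradient-based optimiser then receives no signal for $k$ at all, whatever the data say.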

The bridge to seismic inverse problems

This same recipe scales up directly to the central inverse problem of seismology: full-waveform inversion. Replace the scalar $k$ with a spatial velocity field $v(\mathbf{x})$. Replace the ODE residual with the wave-equation residual $\partial^2 u / \partial t^2 - v(\mathbf{x})^2 \nabla^2 u = 0$. Replace six points with surface seismograms. The architecture jointly trains a wavefield network $u_\theta(\mathbf{x}, t)$ and a velocity-model network $v_\phi(\mathbf{x})$. The PDE residual is the constraint that ties the two networks together. Part 6 dedicates seven sections to this; §6.2 is the 1D version of the inverse problem you are looking at right now.
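To make the coupling concrete, here is a minimal 1D sketch of the wave-equation residual. It is an illustration under stated assumptions, not the Part 6 implementation: the wavefield is a hypothetical analytic stand-in $u(x, t) = \sin(x - c\,t)$ with $c = 2$, the velocity model is a plain function of $x$, and derivatives are taken by central finite differences where a PINN would use automatic differentiation.

```python
import math

def wavefield(x, t, c=2.0):
    """Hypothetical stand-in for u_theta(x, t): a right-going wave with true speed c."""
    return math.sin(x - c * t)

def wave_residual(u, v, x, t, h=1e-3):
    """Central-difference estimate of u_tt - v(x)^2 * u_xx at the point (x, t)."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - v(x) ** 2 * u_xx

def mean_sq_residual(u, v, points):
    """Mean squared residual over a set of (x, t) collocation points."""
    return sum(wave_residual(u, v, x, t) ** 2 for x, t in points) / len(points)

# Assumed 10 x 10 grid of collocation points.
points = [(0.3 * i, 0.2 * j) for i in range(10) for j in range(10)]

good = mean_sq_residual(wavefield, lambda x: 2.0, points)   # velocity matches the wave
bad = mean_sq_residual(wavefield, lambda x: 1.5, points)    # velocity too slow
print("residual, v = 2.0:", good)
print("residual, v = 1.5:", bad)
```

The residual is close to zero only when the candidate velocity is consistent with the wavefield, which is the signal a PINN-FWI optimiser would use to update $v_\phi$.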

Why this is hard, and why it works anyway

Inverse PDE problems are ill-posed in the classical sense: small changes in the data can correspond to large changes in the inferred coefficient. PINNs are not a magic remedy; they inherit this ill-posedness. What they offer is a single optimisation framework where the coefficient and the solution are recovered together, the data and the physics are weighted within the same loss, and the entire computation is differentiable end-to-end. Whether the recovered coefficient is correct depends on the same conditions classical inversion needs (sufficient data, well-chosen regularisation, sensible initialisation). PINNs do not eliminate those concerns; they give you a unified machinery to address them.

Pause-and-check. (1) Increase the noise (slide $\sigma$ up to 0.05) and reinit. Does the recovered $k$ still equal $k_{\mathrm{true}}$? Within how many percent? (2) Bump $k_{\mathrm{true}}$ to 2.5 and reinit (without changing $k_{\mathrm{init}}$). Does the optimiser still recover the correct value? (3) For a real seismic inverse problem with $v(\mathbf{x})$ varying in space, what plays the role of the scalar $k$ here, and what plays the role of the six noisy data points?

References

Raissi, M., Perdikaris, P., Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear PDEs. J. Comput. Phys. 378, 686-707.
Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
Plessix, R.-E. (2006). A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 167, 495-503.
Virieux, J., Operto, S. (2009). An overview of full-waveform inversion in exploration geophysics. Geophysics 74(6), WCC1-WCC26.