1D Burgers’ equation, end to end

Part 1 — The PINN formulation

Learning objectives

Solve the canonical PINN test case of Raissi et al. (2017) live in your browser: 1D viscous Burgers with the standard ν = 0.01/π
Read the heatmap of u(x, t) and identify the shock formation near t ≈ 0.4 and its diffusion thereafter
Match the loss-component traces (IC, BC, PDE) to the fit you see and recognise the typical training signature of a working PINN
Have a working baseline that §1.4 (loss-weight balance) and §1.5 (hard constraints) will improve on

The 1D viscous Burgers equation is the standard test case for PINNs because it is the smallest PDE that exhibits both nonlinear advection and diffusion, and because its solution develops a shock-like steep gradient that a network fitted only to the initial and boundary values, with no PDE residual in the loss, has no way to recover. Raissi, Perdikaris and Karniadakis used it as their headline demo in the 2017 preprint and the 2019 JCP paper. We will use the exact same test case.

The problem

$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2}, \qquad x \in [-1, 1],\ t \in [0, 1].$$

$$u(x, 0) = -\sin(\pi x), \qquad u(-1, t) = u(1, t) = 0.$$

The viscosity is $\nu = 0.01 / \pi \approx 3.18 \times 10^{-3}$, small enough that the nonlinear advection term $u\,\partial u / \partial x$ steepens the initial sine into a near-vertical front near $x = 0$ before viscosity catches up. This front is what makes Burgers a stress test: a conventional numerical solver has to resolve the nearly discontinuous gradient, and a network trained by backpropagation has to fit it without spurious oscillations.
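A rough estimate of when the front steepens can be worked out by hand. The sketch below assumes the inviscid limit $\nu = 0$, which is not the problem solved here, and follows the characteristics launched from the initial condition; $x_0$ (the starting position of a characteristic) and $t_b$ (the breaking time) are symbols introduced only for this estimate.

```latex
% Inviscid characteristics: u is constant along x(t) = x_0 + u_0(x_0)\,t
u_0(x_0) = -\sin(\pi x_0), \qquad x(t) = x_0 - \sin(\pi x_0)\,t

% Neighbouring characteristics cross where \partial x / \partial x_0 = 0
\frac{\partial x}{\partial x_0} = 1 + u_0'(x_0)\,t = 1 - \pi \cos(\pi x_0)\,t

% This vanishes first where \cos(\pi x_0) is largest, i.e. at x_0 = 0
t_b = \frac{1}{\pi} \approx 0.32
```

With $\nu = 0.01/\pi$ the solution stays smooth, so $t_b$ is only a time scale: the front builds up at $x = 0$ around this time and is well developed shortly afterwards.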

The PINN loss

Define the residual at any interior point $(x, t)$:

$$R(x, t) = u_t(x, t) + u(x, t)\,u_x(x, t) - \nu\,u_{xx}(x, t).$$
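The residual is plain arithmetic once the four quantities $u$, $u_x$, $u_t$, $u_{xx}$ are known at a point. A minimal Python sketch follows; the function names and the trial function $u = x/(1+t)$ are illustrative choices, not part of the widget.

```python
import math

NU = 0.01 / math.pi  # viscosity from the problem statement


def burgers_residual(u, u_x, u_t, u_xx, nu=NU):
    """R = u_t + u * u_x - nu * u_xx at a single point."""
    return u_t + u * u_x - nu * u_xx


def trial_derivs(x, t):
    """Hypothetical trial function u(x, t) = x / (1 + t) and its derivatives.

    It satisfies u_t + u u_x = 0 and has u_xx = 0, so the residual should
    vanish for any viscosity. It does not satisfy the IC or BC of the
    problem; it is only a sanity check of the arithmetic.
    """
    u = x / (1.0 + t)
    u_x = 1.0 / (1.0 + t)
    u_t = -x / (1.0 + t) ** 2
    u_xx = 0.0
    return u, u_x, u_t, u_xx


r = burgers_residual(*trial_derivs(0.3, 0.5))  # zero up to rounding error
```

In a PINN the four inputs come from differentiating the network rather than from a closed form, but the residual formula is the same.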

The PINN loss is then a weighted sum of three mean-squared errors:

$$L = \lambda_{\mathrm{IC}}\,\frac{1}{N_{\mathrm{IC}}}\sum_i \bigl(u(x_i, 0) + \sin(\pi x_i)\bigr)^2 \;+\; \lambda_{\mathrm{BC}}\,\frac{1}{N_{\mathrm{BC}}}\sum_j \bigl(u(\pm 1, t_j)\bigr)^2 \;+\; \lambda_{\mathrm{PDE}}\,\frac{1}{N_C}\sum_k R(x_k, t_k)^2.$$
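The weighted sum can be sketched in a few lines of Python. The function below takes already-evaluated network outputs and residuals as plain lists; the name `pinn_loss`, the dictionary keys, and the default weights of 1 are assumptions made for illustration, not the widget's settings.

```python
import math


def mse(values):
    """Mean of squared entries."""
    return sum(v * v for v in values) / len(values)


def pinn_loss(u_ic, x_ic, u_bc, residuals,
              lam_ic=1.0, lam_bc=1.0, lam_pde=1.0):
    """Weighted PINN loss from precomputed network outputs.

    u_ic      : predictions u(x_i, 0) at the IC points x_ic
    u_bc      : predictions u(+-1, t_j) at the BC points (target is 0)
    residuals : PDE residuals R(x_k, t_k) at the collocation points
    """
    # IC target is -sin(pi x), so the error is u + sin(pi x).
    ic_err = [u + math.sin(math.pi * x) for u, x in zip(u_ic, x_ic)]
    parts = {"ic": mse(ic_err), "bc": mse(u_bc), "pde": mse(residuals)}
    total = (lam_ic * parts["ic"]
             + lam_bc * parts["bc"]
             + lam_pde * parts["pde"])
    return total, parts


# A network that outputs zero everywhere: BC and PDE terms vanish and the
# IC term is the mean of sin^2(pi x) over the 50 evenly spaced points (0.49).
x_ic = [-1.0 + 2.0 * k / 49 for k in range(50)]
total, parts = pinn_loss([0.0] * 50, x_ic, [0.0] * 50, [0.0] * 80)
```

Returning the per-component values alongside the total is what makes it possible to plot separate IC, BC and PDE traces.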

The collocation points $(x_k, t_k)$ are sampled quasi-randomly across the interior. The widget uses 50 IC points (evenly spaced along $t = 0$), 50 BC points (25 each on $x = \pm 1$), and 80 interior collocation points. The architecture is 2 → 32 → 32 → 32 → 1 with Tanh hidden activations.
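The point counts and network size above can be reproduced in a short sketch. Two things here are assumptions rather than facts about the widget: the quasi-random sampler is taken to be a Halton sequence (bases 2 and 3), and the BC points are taken to be evenly spaced in $t$. All function names are hypothetical.

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def linspace(a, b, n):
    """n evenly spaced values from a to b inclusive."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]


def sample_points(n_ic=50, n_bc=50, n_col=80):
    """IC, BC and interior collocation points as lists of (x, t) pairs."""
    ic = [(x, 0.0) for x in linspace(-1.0, 1.0, n_ic)]
    ts = linspace(0.0, 1.0, n_bc // 2)  # assumed: evenly spaced in t
    bc = [(-1.0, t) for t in ts] + [(1.0, t) for t in ts]
    # Assumed sampler: Halton points mapped to (-1, 1) x (0, 1).
    # Starting at i = 1 keeps every point strictly inside the domain.
    col = [(-1.0 + 2.0 * radical_inverse(i, 2), radical_inverse(i, 3))
           for i in range(1, n_col + 1)]
    return ic, bc, col


def count_params(layers=(2, 32, 32, 32, 1)):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layers, layers[1:]))


ic, bc, col = sample_points()
n_params = count_params()  # 96 + 1056 + 1056 + 33 = 2241
```

With 180 training points and 2241 parameters the network has far more parameters than constraints, so the PDE residual is what restricts the solution between the sampled points.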

Try it

Press Play. The heatmap shows $u(x, t)$ over the full domain; cool blue is negative, warm orange is positive. The snapshot panel below renders five time slices of the predicted solution. The loss panel at the bottom shows the total loss and the per-component traces.

What to watch for

The IC and BC losses drop fast. Within ~50 epochs the bottom edge of the heatmap shows a clean $-\sin(\pi x)$ profile and the left and right edges sit near zero. These are easy targets — they are just regression on point values.
The PDE-residual loss takes longer. The blue trace lags the green and violet traces by a couple of orders of magnitude in the early phase. This is normal PINN training. §1.4 will explore the loss-weight rebalancing tricks invented to address it.
The shock forms. After roughly 600 epochs (when run live in your browser) the $t = 0.4$ snapshot steepens into a near-vertical front at $x = 0$. To the left of the front ($x < 0$) the solution is positive, to the right ($x > 0$) it is negative, and it crosses zero at the centre with a very steep but finite slope, limited by the viscosity.
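The shock spreads. Past $t \approx 0.5$, advection no longer sharpens the front faster than viscosity smooths it: the amplitude decays and the front gradually rounds off. By $t = 1$ the front is lower in amplitude and somewhat less steep than at its sharpest, though it remains a pronounced feature.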

The runtime, in one paragraph

For each interior collocation point $(x_k, t_k)$ the widget calls forwardDerivs([x_k, t_k]). The runtime returns $u$, $\partial u/\partial x$, $\partial u/\partial t$, and $\partial^2 u/\partial x^2$ via forward-mode AD. The widget computes the residual $R = u_t + u u_x - \nu u_{xx}$ as plain arithmetic. For the unweighted PDE term $\frac{1}{N}\sum_k R_k^2$ with $N = N_C$, the chain rule then gives, at each point, $\partial L/\partial u = 2 R u_x / N$, $\partial L/\partial u_x = 2 R u / N$, $\partial L/\partial u_t = 2 R / N$, and $\partial L/\partial u_{xx} = -2 R \nu / N$; a weight $\lambda_{\mathrm{PDE}}$ simply multiplies each of these. These are passed to backwardDerivs, which propagates the loss gradient back to the network parameters via the same augmented-forward graph. The widget accumulates these contributions across all collocation points and adds the IC and BC term gradients before taking an Adam step. That is the entire algorithm.
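The forward half of this paragraph can be sketched in pure Python. The code below is not the widget's runtime: `init_mlp`, `forward_derivs` and `residual_seeds` are hypothetical stand-ins, the uniform weight initialisation is an assumption, and the backward pass to the parameters is omitted. It propagates $u$, $u_x$, $u_t$, $u_{xx}$ through a Tanh MLP layer by layer and then forms the four chain-rule seeds for the unweighted PDE term.

```python
import math
import random


def init_mlp(layers=(2, 32, 32, 32, 1), seed=0):
    """Random weights (assumed uniform init) and zero biases per layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layers, layers[1:]):
        s = 1.0 / math.sqrt(n_in)
        W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
        params.append((W, [0.0] * n_out))
    return params


def matvec(W, v):
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def forward_derivs(params, x, t):
    """Return u, u_x, u_t, u_xx at (x, t) by forward propagation."""
    z, z_x, z_t, z_xx = [x, t], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
    for k, (W, b) in enumerate(params):
        # Affine map: derivatives pass through W unchanged (bias drops out).
        a = [ai + bi for ai, bi in zip(matvec(W, z), b)]
        a_x, a_t, a_xx = matvec(W, z_x), matvec(W, z_t), matvec(W, z_xx)
        if k == len(params) - 1:  # linear output layer
            return a[0], a_x[0], a_t[0], a_xx[0]
        h = [math.tanh(ai) for ai in a]
        d1 = [1.0 - hi * hi for hi in h]              # tanh'
        d2 = [-2.0 * hi * d for hi, d in zip(h, d1)]  # tanh''
        z = h
        z_x = [d * ax for d, ax in zip(d1, a_x)]
        z_t = [d * at for d, at in zip(d1, a_t)]
        z_xx = [dd * ax * ax + d * axx
                for dd, d, ax, axx in zip(d2, d1, a_x, a_xx)]


def residual_seeds(u, u_x, u_t, u_xx, nu, n):
    """Residual R and dL/d(u, u_x, u_t, u_xx) for L = (1/n) * sum R^2."""
    r = u_t + u * u_x - nu * u_xx
    seeds = {"u": 2.0 * r * u_x / n, "u_x": 2.0 * r * u / n,
             "u_t": 2.0 * r / n, "u_xx": -2.0 * r * nu / n}
    return r, seeds


params = init_mlp()
u, u_x, u_t, u_xx = forward_derivs(params, 0.0, 0.4)
r, seeds = residual_seeds(u, u_x, u_t, u_xx, nu=0.01 / math.pi, n=80)
```

Carrying the derivatives forward alongside the activations means the residual needs no nested differentiation: one pass yields everything $R$ depends on, and the seeds tell the backward pass how much each of the four outputs matters.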

Observations and follow-ups

You will notice that the PDE-residual loss does not drop to zero; it typically plateaus at $\sim 10^{-3}$ to $10^{-4}$ for the canonical setup. This is the cost of representing the steep shock with a smooth Tanh network: the residual is small almost everywhere except in a thin region around $x = 0$, $t \approx 0.4$, where the network does not perfectly resolve the steep gradient. §1.4 explores how to rebalance the loss weights to push the PDE residual lower; §1.5 introduces hard-constraint architectures that automatically satisfy the IC and BC, freeing all of the network capacity for the residual.

Pause-and-check. (1) The IC loss is initialised at roughly 0.4 — a typical MSE between a random network and $-\sin(\pi x)$ . The PDE loss is much smaller initially. Why is that, and is the small initial PDE loss good news? (2) Increase ν (drag the viscosity slider to 0.05 and reset). What happens to the shock? Why? (3) Decrease ν toward 0.001 and try again. Why does the network train more slowly?

References

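Raissi, M., Perdikaris, P., Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707.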
Raissi, M., Perdikaris, P., Karniadakis, G.E. (2017). Physics informed deep learning (Part I): Data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561.
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E. (2021). DeepXDE: A deep learning library for solving differential equations. SIAM Review 63(1), 208–228.
Wang, S., Yu, X., Perdikaris, P. (2022). When and why PINNs fail to train: A neural tangent kernel perspective. J. Comput. Phys. 449, 110768.