Picking an architecture: a decision tree
Learning objectives
- Synthesise the architectural choices of §2.1–§2.5 into a one-page decision tree
- Recognise that the right choice depends on four orthogonal properties of the problem: frequency content, PDE order, geometry, data availability
- Pick a starting architecture for any new PINN problem from first principles, then iterate based on training observations
- Have a navigational capstone for Part 2 that closes the architectural conversation
Part 2 has surveyed the architecture vocabulary of modern PINNs: vanilla MLP (§2.1), Fourier features (§2.2), SIREN (§2.3), hard-constrained ansatz (§2.4), and multi-scale Fourier (§2.5). Each one fixes a specific problem; each one fails on others. This closing section synthesises the menu into an interactive decision tree. The widget below asks four orthogonal questions about your PINN problem and suggests an architecture from first principles.
The four questions
- What is the spatial frequency content of the target? Smooth, a single high frequency, multi-scale, or unknown: each picks a different backbone (vanilla MLP, SIREN or Fourier features, multi-scale Fourier features, or a broad multi-scale Fourier embedding, respectively).
- What is the highest-order derivative in the PDE residual? First-order is gentle; second-order rules out ReLU, whose second derivative is zero almost everywhere; higher-order forces smooth saturating activations and a smaller learning rate.
- What does the boundary look like? Rectangular geometry → hard-constraint reparameterisation; complicated geometry → soft enforcement plus the Part 3 weight-balancing toolkit.
- What data do you have, and what are you solving for? Forward, sparse-inverse, and data-rich-inverse problems each need a different loss-weighting pattern.
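The four questions can be read as a small lookup, one table per question. The sketch below is a hypothetical illustration of that reading, not the widget's actual code: the function name `recommend`, the answer keys, and the loss-term labels are all assumptions introduced here. It only lists which loss terms are present for each problem type; the relative weighting is left out, since that is the subject of Part 3.

```python
# Hypothetical sketch of the four-question lookup; not the widget's implementation.

# Q1: frequency content of the target -> backbone.
BACKBONE_BY_FREQUENCY = {
    "smooth": "vanilla MLP (§2.1)",
    "single_high": "SIREN (§2.3) or Fourier features (§2.2)",
    "multi_scale": "multi-scale Fourier features (§2.5)",
    "unknown": "broad multi-scale Fourier features (§2.5)",
}

# Q3: boundary geometry -> boundary-condition enforcement.
BC_BY_GEOMETRY = {
    "rectangular": "hard-constraint ansatz (§2.4)",
    "complicated": "soft penalty plus Part 3 weight balancing",
}

# Q4: problem type -> loss terms that must be present (weights not specified here).
LOSS_TERMS_BY_PROBLEM = {
    "forward": ("pde_residual", "boundary_and_initial"),
    "sparse_inverse": ("pde_residual", "boundary_and_initial", "data_misfit"),
    "data_rich_inverse": ("pde_residual", "boundary_and_initial", "data_misfit"),
}


def activation_for_order(pde_order):
    """Q2: highest derivative order in the PDE residual -> activation constraint."""
    if pde_order < 1:
        raise ValueError("pde_order must be at least 1")
    if pde_order == 1:
        return "no strong constraint"
    if pde_order == 2:
        return "smooth, not ReLU (e.g. tanh or sine)"
    return "smooth saturating (e.g. tanh), with a smaller learning rate"


def recommend(frequency, pde_order, geometry, problem):
    """Combine the four answers into one recommendation, treating them as independent."""
    return {
        "backbone": BACKBONE_BY_FREQUENCY[frequency],
        "activation": activation_for_order(pde_order),
        "bc_enforcement": BC_BY_GEOMETRY[geometry],
        "loss_terms": LOSS_TERMS_BY_PROBLEM[problem],
    }


if __name__ == "__main__":
    rec = recommend("multi_scale", 2, "rectangular", "forward")
    for category, choice in rec.items():
        print(f"{category}: {choice}")
```

Treating each answer as an independent table lookup keeps the sketch short, at the cost of ignoring any interaction between the questions.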
Try it
The widget has two tabs:
- Decision tree: walk through the four questions for a problem you care about. The widget synthesises a recommendation across four categories — backbone, activation, BC enforcement, loss structure — with a one-line rationale and a cross-reference to the section that develops each recommendation in depth. Click any chip to revise an earlier answer.
- Playground: pick a target and race two architectures live, side by side. The decision tree gives you the recommended architecture; the playground lets you verify it by training that architecture against a strawman on the same data. Seeing the two fits diverge side by side is what builds intuition. Suggested experiments are built in; click any one of them to set up the matchup.
The intended workflow: ask the decision tree for a recommendation, then jump to the playground and race the recommendation against the architecture you would have picked by intuition. If they tie, the intuition was right; if they differ, the playground shows you why.
The architecture decisions are not independent
The widget treats the four questions as orthogonal because that is the simplest mental model. In practice they interact:
- A high-frequency target (Q1) tends to come with a second-order PDE residual (Q2): the wave equation is the canonical example. The combination calls for SIREN or Fourier features (Q1) and a smooth activation whose second derivative does not vanish (Q2), which rules out ReLU.
- A complicated boundary (Q3) makes hard-constraint enforcement infeasible, which then puts more pressure on the loss-balance machinery — Part 3 §3.3 and §3.4 become essential, not optional.
- An inverse problem (Q4) with sparse data on a high-frequency wavefield is the hardest combination. Multi-scale Fourier (or SIREN) plus soft enforcement plus carefully tuned loss weights is the typical recipe; Parts 3 and 6 cover the engineering.
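The Q2 constraint in the first interaction above can be checked numerically. The following is a minimal sketch, assuming a central finite difference as a stand-in for automatic differentiation and a single activation as a stand-in for a full network; the helper name `second_derivative` and the sample points are illustrative choices. Away from its kink at zero, ReLU is linear, so its second derivative vanishes and a second-order residual built from it carries no curvature signal, whereas tanh and sine do not have this problem.

```python
import math


def relu(x):
    """Rectified linear unit: piecewise linear with a kink at x = 0."""
    return max(0.0, x)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    # Sample points chosen away from the ReLU kink at x = 0.
    points = [-1.5, -0.5, 0.5, 1.5]
    for name, f in [("relu", relu), ("tanh", math.tanh), ("sin", math.sin)]:
        values = [second_derivative(f, x) for x in points]
        print(name, ["%.4f" % v for v in values])
```

The same argument extends to a ReLU network of any depth, because a composition of piecewise-linear maps is still piecewise linear.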
Three rules that override the decision tree
- Start simple. Even when the decision tree says "multi-scale Fourier", a vanilla MLP is the fastest baseline to get running. Train it for 1000 epochs and see what it gets wrong; let those failures drive the architecture choice for the next iteration.
- Validate with a known-solution test. Whatever architecture you pick, validate it first on a simplified version of your problem with a known analytic solution. The relative L2 error against the known answer is the most trustworthy metric, because a small training loss does not guarantee a small solution error.
- The architecture is the second question, not the first. The first question is always: is the loss formulation correct? Is the residual computation right? Are the BC and IC terms in the loss? An expensive architecture will not save a wrong loss.
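The known-solution test in the second rule reduces to one number. A minimal sketch of the relative L2 error follows, assuming the prediction and the analytic solution are sampled on the same set of points; the function name `relative_l2_error` and the toy target sin(pi x) are illustrative choices, not part of the course material.

```python
import math


def relative_l2_error(predicted, exact):
    """Return ||predicted - exact||_2 / ||exact||_2 over matching sample points."""
    if len(predicted) != len(exact):
        raise ValueError("predicted and exact must have the same length")
    numerator = sum((p - e) ** 2 for p, e in zip(predicted, exact))
    denominator = sum(e ** 2 for e in exact)
    if denominator == 0.0:
        raise ValueError("exact solution is identically zero on the sample points")
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    # Toy check: the analytic solution u(x) = sin(pi x) on a uniform grid,
    # and a stand-in "prediction" that overshoots it by 1% everywhere.
    grid = [i / 100.0 for i in range(101)]
    exact = [math.sin(math.pi * x) for x in grid]
    predicted = [1.01 * u for u in exact]
    print(relative_l2_error(predicted, exact))
```

Normalising by the norm of the exact solution makes the number comparable across problems with different amplitudes, which a raw mean-squared error is not.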
What you now know about architecture
You can pick a starting architecture for any 1D or 2D PINN problem from first principles. You can defend the choice in one sentence per architectural decision. You can recognise when training failure indicates an architecture mismatch (versus a loss-balance issue, which is Part 3, or a sampling issue, also Part 3). Part 3 turns to the training pathologies that even the right architecture cannot escape on its own — and the modern toolbox of fixes that has emerged since 2021.
Expertise checkpoint — end of Part 2
You should now be able to:
- Map a PINN problem onto the four orthogonal axes (frequency, PDE order, geometry, data availability) and pick an architecture from first principles for each combination.
- Defend choosing Tanh over ReLU on PDE-residual grounds.
- Defend choosing Fourier features over SIREN, or vice versa, on a specific problem.
- Construct the hard-constraint ansatz for a two-point Dirichlet BVP on any interval [a, b].
- Recognise when a problem is multi-scale and choose multi-scale Fourier features over a single-scale embedding.
- Critique the architecture choices in any current PINN paper from arXiv on first-principles grounds.
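As a self-check for the hard-constraint item in the list above, here is one common construction, sketched under the assumption of Dirichlet data u(a) = u_a and u(b) = u_b: a linear lift that matches both boundary values, plus the network output multiplied by a factor (x - a)(b - x) that vanishes at both ends. The name `make_hard_constrained` is hypothetical, and `net` stands for any callable mapping x to a number; §2.4 may use a different but equivalent form.

```python
import math


def make_hard_constrained(net, a, b, u_a, u_b):
    """Wrap `net` so that the result satisfies u(a) = u_a and u(b) = u_b exactly."""
    if not b > a:
        raise ValueError("the interval requires b > a")

    def u(x):
        # Linear lift: interpolates the two boundary values.
        lift = u_a + (u_b - u_a) * (x - a) / (b - a)
        # Vanishing factor: zero at x = a and x = b, positive in between.
        vanishing = (x - a) * (b - x)
        return lift + vanishing * net(x)

    return u


if __name__ == "__main__":
    # Any function can stand in for an untrained network here.
    def stand_in_net(x):
        return math.sin(3.0 * x) + 2.0

    u = make_hard_constrained(stand_in_net, a=-1.0, b=2.0, u_a=0.5, u_b=-3.0)
    print(u(-1.0), u(2.0))  # boundary values are met regardless of the network
```

Because the boundary values hold for any network output, the boundary term drops out of the loss and training only has to balance the PDE residual.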
Pause-and-check. (1) Use the widget to pick an architecture for a 2D acoustic-wave forward problem on a rectangular domain with 30 Hz frequency content and zero initial wavefield. Does the recommendation match what you would have chosen by intuition? (2) Same problem but now an inverse problem (recover v(x, z) from sparse surface data). What changes in the recommendation, and why? (3) Pick a PINN paper from the recent literature (e.g. 2024 Geophysics) and check whether its architecture matches what the decision tree would have recommended for its problem. If not, what extra information justifies the deviation?
References
- Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L. (2021). Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440.
- Cuomo, S., Di Cola, V.S., Giampaolo, F., et al. (2022). Scientific machine learning through physics-informed neural networks: Where we are and what is next. J. Sci. Comput. 92(3), 88.
- Wang, S., Wang, H., Perdikaris, P. (2021). On the eigenvector bias of Fourier feature networks. Comput. Methods Appl. Mech. Eng. 384, 113938.
- Sitzmann, V., Martel, J.N.P., Bergman, A.W., Lindell, D.B., Wetzstein, G. (2020). Implicit neural representations with periodic activation functions (SIREN). NeurIPS.