Hamiltonian Monte Carlo intuition
Learning objectives
- State HMC's central idea: introduce auxiliary momentum and evolve via Hamiltonian dynamics conserving H = U + K
- Identify the LEAPFROG integrator and its symplectic, time-reversible properties
- Write the HMC Metropolis acceptance ratio that corrects for leapfrog discretisation error
- Recognise NUTS (Hoffman & Gelman 2014) as the auto-tuning HMC variant powering Stan / PyMC / NumPyro
- Diagnose HMC pathologies: divergent transitions, low E-BFMI, funnel geometry
Random-walk Metropolis (§7.3) takes axis-blind diffusive steps; Gibbs (§7.4) takes axis-aligned conditional steps. Both struggle on POSTERIORS WITH STRONG CORRELATION and HIGH DIMENSION because they explore by random walk — covering distance in n steps. HAMILTONIAN MONTE CARLO (Duane et al. 1987, brought to statistics by Neal 1996/2011) uses GRADIENT INFORMATION of the log-posterior to take coherent, long-range moves that follow the posterior's geometry. The result: orders of magnitude better mixing on the hard problems that matter.
The physics analogy
HMC treats Bayesian sampling as a problem in classical mechanics:
- Parameters play the role of POSITION.
- An auxiliary MOMENTUM variable is introduced (independent of θ), with where is a "mass matrix" (usually the identity initially).
- The negative log-posterior is the POTENTIAL ENERGY.
- The KINETIC ENERGY is .
- The HAMILTONIAN governs the dynamics.
The joint distribution is . Marginalising out p recovers π(θ). Hamilton's equations:
Under EXACT Hamiltonian dynamics, H is CONSERVED (the trajectory moves on a level set of H). A trajectory that starts at and evolves for time reaches some with the SAME H. Crucially, the trajectory traces a path that follows the local GEOMETRY of the posterior — naturally moving along contours of equal density.
The leapfrog integrator
Hamilton's equations have no closed-form solution in general; we discretise. The LEAPFROG INTEGRATOR steps in three sub-steps of size :
- Half-step in momentum:
- Full step in position:
- Half-step in momentum:
Iterate this L times to advance the trajectory by . Leapfrog is
- Symplectic: preserves the canonical volume in space exactly.
- Time-reversible: reversing the momentum at the end and running L more steps returns exactly to .
- 2nd-order accurate: error per step is ; total error after T/ε steps is .
These properties are essential for HMC's correctness — they ensure the proposal distribution is symmetric and the trajectory stays close to the true Hamiltonian flow.
The HMC iteration
- Sample new momentum (this resets the kinetic energy each iteration).
- Compute .
- Simulate L leapfrog steps to reach .
- Negate the momentum (to enforce time-reversibility): . (For the acceptance ratio it doesn't matter; the kinetic energy depends only on |p|.)
- Compute .
- Accept with probability . If accepted, set . Else reject and stay at the old θ.
Under exact Hamiltonian dynamics, and the acceptance probability is 1. Leapfrog introduces discretisation error of size ; the Metropolis-Hastings correction recovers detailed balance with . The marginal on θ is exactly π.
The benefit: trajectories follow geometry
Because leapfrog uses the GRADIENT of log π, each step is biased toward higher-density regions while the momentum maintains long-range coherence. The trajectory looks like a smooth arc along the contours of the posterior — exactly what you want for an elongated, correlated target. Random-walk Metropolis on the same target zig-zags in tiny steps, taking steps to traverse a length where HMC takes O(1) trajectories.
The trade-off: each HMC iteration requires L gradient evaluations of log π. For a typical L = 30, that's 30× the per-iteration cost of one Metropolis step — but the effective sample size per iteration is often 100–1000× higher on correlated targets.
Tuning ε and L
ε too small → trajectory barely moves → small effective step size → wasted gradient evaluations. ε too large → leapfrog error blows up → low acceptance rate → wasted compute. L too small → momentum is too quickly reset → behaviour reverts to random walk. L too large → trajectory loops back on itself (U-turn) → wasted compute.
The MODERN auto-tuning algorithm: NUTS (Hoffman & Gelman 2014). NUTS doubles the trajectory length adaptively until detecting a U-turn (where the trajectory starts curving back on itself); the trajectory is then truncated. Combined with dual-averaging adaptation of ε during burn-in, NUTS makes HMC nearly tuning-free. Stan, PyMC, and NumPyro all default to NUTS.
Pathologies and diagnostics
- Divergent transitions: when leapfrog error explodes (often in regions of extreme curvature), the chain's trajectory goes to infinity. Stan reports divergences as warnings. Many divergences = serious problem with the parameterisation (e.g., funnel geometry in hierarchical models). Re-parameterise (e.g., Matthew Hoffman's non-centred parameterisation for hierarchical models).
- E-BFMI (Energy-Bayesian Fraction of Missing Information): a diagnostic for whether the momentum-resampling step is mixing well across energy levels. E-BFMI < 0.3 indicates problems.
- Funnel geometry: hierarchical models with prior and create a funnel that HMC struggles with at the narrow neck. Non-centred parameterisation (sampling ) flattens the funnel.
Try it
- Start with ρ = 0.92, ε = 0.08, L = 30 (the defaults). Click "1 HMC iter" several times. Watch the green arc — the leapfrog trajectory. Each iteration draws a fresh momentum and proposes the trajectory's endpoint. The trajectory FOLLOWS the contour ellipse rather than ignoring it.
- Click "Run 100 HMC iter" once. Then click "Run 100 MH iter". Both samplers have had 100 iterations on the same target. Look at the cloud of points: HMC's blue dots are spread across the ellipse; MH's red dots are clustered near the starting point.
- Compare the empirical correlations in the readout. HMC's empirical correlation should approach the true ρ = 0.92 within 100 iterations; MH's typically requires many more iterations.
- Drag ε to 0.30 (too big). Run 100 HMC iter. Acceptance rate drops dramatically — leapfrog error has blown up. Drag ε back to 0.01 (too small) and rerun: acceptance is high (~99%) but the trajectory barely moves.
- Drag L to 5. Run 100. Trajectories are short. Performance is closer to random-walk. Drag L to 80. Trajectories may loop back (U-turn waste). The Hoffman-Gelman NUTS algorithm picks an L for you per iteration that avoids both pathologies.
A Stan model reports 50 divergent transitions out of 4000 post-warmup iterations. The user's first response should be:
What you now know
HMC reformulates Bayesian sampling as Hamiltonian dynamics: position θ + momentum p, kinetic + potential energy, leapfrog integrator. Trajectories follow the geometry; Metropolis correction maintains exactness. NUTS (Hoffman-Gelman 2014) auto-tunes ε and L, making HMC the default sampler for nontrivial Bayesian models in Stan / PyMC / NumPyro. Divergent transitions and funnel geometry are the chief pathologies; non-centred parameterisation is the workhorse remedy. §§7.6 (posterior predictive checks) and 7.7 (Bayes factors / WAIC) develop the diagnostic and comparison tools that wrap modern Bayesian inference.
References
- Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D. (1987). "Hybrid Monte Carlo." Physics Letters B 195(2), 216–222. (The original physics paper.)
- Neal, R.M. (2011). "MCMC using Hamiltonian dynamics." In Handbook of Markov Chain Monte Carlo, Brooks et al. (eds.), pp. 113–162. CRC. (The standard reference; brought HMC to statistics.)
- Hoffman, M.D., Gelman, A. (2014). "The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo." JMLR 15(1), 1593–1623. (NUTS.)
- Betancourt, M. (2017). "A conceptual introduction to Hamiltonian Monte Carlo." arXiv:1701.02434. (The clearest pedagogical treatment.)
- Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A. (2017). "Stan: A probabilistic programming language." J. Statistical Software 76(1). (Stan, the HMC workhorse.)