Implicit methods, stiff systems, symplectic integration
When explicit methods fail — backward Euler, BDF, DIRK as A-stable alternatives; exponential integrators as the family of choice for selective SSMs; Hamiltonian mechanics and symplectic integration as the natural geometric language for hidden-state dynamics.
On this page
- 6.1 Why explicit methods fail: stiffness
- 6.2 Implicit methods: backward Euler, BDF, DIRK
- 6.3 Exponential integrators revisited
- 6.4 Hamiltonian mechanics, energy conservation, and the symplectic structure
- 6.5 Symplectic integrators
- 6.6 Toward selective SSMs as Hamiltonian flows
- 6.7 What’s next
- 6.8 Exercises
- Exercise 6.1 (computation)
- Exercise 6.2 (computation)
- Exercise 6.3 (computation + code)
- Exercise 6.4 (theory) — solution in §6.9
- Exercise 6.5 (theory) — solution in §6.9
- Exercise 6.6 (theory) — solution in §6.9
- 6.9 Full solutions to theory exercises
- Solution to Exercise 6.4
- Solution to Exercise 6.5
- Solution to Exercise 6.6
- 6.10 Companion code
Implicit methods, stiff systems, symplectic integration
6.1 Why explicit methods fail: stiffness
Chapter 5 ended with the Dahlquist barrier: no explicit Runge–Kutta method is A-stable. The barrier matters because of a phenomenon called stiffness — when the dynamics matrix has eigenvalues with very different magnitudes.
A linear ODE is stiff if has at least one eigenvalue with for a time-scale much smaller than the simulation horizon, and the eigenvalue is paired with a long-lived (small-) mode that one actually wants to track. Stiffness is therefore relative: a system is stiff with respect to a given horizon and given accuracy goal, not in absolute terms. The Robertson chemical-kinetics problem (‘s spanning ten orders of magnitude) is the canonical textbook example; van der Pol at parameter is a popular ML-adjacent stand-in because it generates a nonlinear oscillation whose slow and fast phases differ by .
For an explicit method, the step size is bounded by the fastest eigenvalue: (forward Euler) or some other small constant times (RK4). If the slow eigenvalue we want to track is , the integrator takes times more steps than the slow time scale would suggest. For Robertson this ratio is . The simulation never finishes.
For an SSM, stiffness arises naturally for input-dependent dynamics. Mamba’s selective scan generates an effective at each token; for inputs that produce eigenvalues with very different magnitudes, the time-discretized recurrence is solving a stiff problem at each step. If the discretization is not A-stable, the recurrence either blows up or requires a step size — at which point you are running the model in an absurd regime. This is the structural reason Mamba-3 switched to exponential-trapezoidal (§4.5), which is A-stable in but explicit in .
6.2 Implicit methods: backward Euler, BDF, DIRK
The fix for stiffness is to abandon explicit methods. Backward Euler is the simplest implicit method:
Solving for requires either a linear solve (for linear ) or a Newton iteration (for nonlinear ). The pay-off is the stability function
which has for every with — backward Euler is A-stable. It is also L-stable: as , so fast modes get damped aggressively as desired on stiff problems.
(Backward Euler is L-stable.) Backward Euler’s stability function satisfies for every with , and as .
The proof is one line each: when , giving . The limit is immediate.
Backward Euler is first-order accurate. Higher-order implicit methods come in several families.
BDF (Backward Differentiation Formula, th-order) is a multi-step method that approximates by a -point backward difference and equates it to . BDF1 = backward Euler; BDF2 is second-order and L-stable; BDF for are conditionally stable but widely used in stiff ODE codes. BDF7+ is unstable. The order limit is the second Dahlquist barrier, a structural fact about multi-step methods that we will state without proof: no -stable linear multi-step method has order higher than 2.
Diagonally implicit RK (DIRK) sits between explicit RK and fully implicit RK. The matrix is lower triangular with non-zero diagonal, so each stage requires solving an independent nonlinear system (cheaper than the coupled system of fully implicit RK, but more expensive than the trivial explicit evaluations). Singly-DIRK (SDIRK) further constrains all diagonal entries to be equal, so the Newton Jacobian factorizes once per step. The Crank–Nicolson scheme is a 2-stage SDIRK of order 2.
Fully implicit RK — the most expensive family — includes Gauss–Legendre methods, which we will return to in §6.5 as the optimal symplectic methods.
The computational trade-off for implicit methods: each step costs (linear solve for the Jacobian factor) or (Newton iteration for nonlinear ). For SSMs with this is acceptable; for in the thousands it begins to dominate. The Chapter 4 exponential-trapezoidal scheme has the same A-stability as backward Euler but only costs one matrix-exponential per timestep (vs one Newton solve), and the matrix exponential of a low-rank or diagonalized is cheap. Mamba-3’s specific tactical advantage over a backward-Euler implementation comes down to this cost asymmetry.
6.3 Exponential integrators revisited
The exp-trapezoidal scheme of §4.5 is the simplest member of the exponential integrator family. Recall the general idea: given , treat the linear part exactly via and approximate only the forcing integral. The -function family provides building blocks for arbitrarily high order.
Three commonly used schemes:
- exp-Euler: . Treats the input as piecewise constant (like ZOH); first-order for forced systems, exact on autonomous problems.
- exp-midpoint: uses the input midpoint instead of ; second-order for symmetric input perturbations.
- exp-trapezoidal (§4.5): linear interpolation of the input; second-order for inputs.
- ETDRK4 Hochbruck & Ostermann (2010) : a four-stage exponential RK method that achieves fourth order. The Mamba-3 paper does not (yet) use ETDRK4; its second-order exp-trapezoidal is the empirical sweet spot.
All exponential integrators inherit A-stability from . They sidestep the Dahlquist barrier because their stability function is not a polynomial — it is times a rational-in- correction factor — and the polynomial constraint of explicit RK methods does not apply.
The implementation cost: one matrix exponential plus one or two -function evaluations per step. For a low-rank (which selective SSMs typically have, since the structured- pattern of Chapter 7 carries forward), the matrix exponential is cheap. For dense general exponentials cost via Padé / scaling-and-squaring, the same as a Newton solve — so the choice between exp-trap and backward Euler often comes down to whether you want autonomous-exactness (exp-trap wins) or L-stability damping of high-frequency modes (backward Euler wins).
6.4 Hamiltonian mechanics, energy conservation, and the symplectic structure
The remainder of this chapter introduces the language of Hamiltonian systems and symplectic integration. This is where the direct-transfer hook from continuum mechanics applies most cleanly. The exposition is brief and standalone; for the full theory see Hairer–Lubich–Wanner Hairer & Wanner (1996) Volume II Chapter VI.
A Hamiltonian system on phase space is governed by a scalar function — the Hamiltonian, intended to represent total energy — through Hamilton’s equations:
The canonical example is the harmonic oscillator , giving , . The pendulum is ; vortex-dynamics flows are higher-dimensional but structurally identical.
Three properties make Hamiltonian systems geometrically special.
Energy conservation. Along any trajectory , . The Hamiltonian is a constant of motion.
Phase-space volume preservation (Liouville’s theorem). The flow has Jacobian determinant 1 everywhere: . Volumes in phase space are preserved exactly under the dynamics.
Symplectic structure. Define the symplectic 2-form . The flow is a symplectic transformation: it preserves in the sense that where denotes the pullback. Symplecticity is a strict refinement of volume preservation — every symplectic transformation is volume-preserving, but not vice versa.
The reason these properties matter for numerics is that standard order-optimized integrators destroy them. A fourth-order RK method simulates the harmonic oscillator with local error per step — but it does not exactly preserve energy. Over steps, the energy error accumulates as per period times the number of periods — leading to linear-in-time energy drift that eventually destroys the trajectory’s qualitative behavior. RK4 on a pendulum will, over 1000 periods, gradually inflate the orbit and unbound the system. RK4 on a planetary orbit will (eventually) eject the planet. These are not numerical instabilities in the Chapter 5 sense — the method is A-stable for the small-amplitude regime — they are geometric errors: the wrong qualitative dynamics.
6.5 Symplectic integrators
A symplectic integrator is a numerical map that preserves the symplectic 2-form: . Symplectic integrators do not, in general, preserve exactly — but they preserve a nearby modified Hamiltonian for a -th order symplectic method. As a consequence, the energy along the discrete trajectory oscillates within a bounded band of width around its initial value — not drifting linearly with , but staying near a level set of . This is the backward error analysis of geometric integrators, and it is the practical reason symplectic methods are mandatory for long-horizon Hamiltonian simulations.
Three canonical schemes.
Symplectic Euler (1st order). For a separable Hamiltonian :
Update momentum first using position-derived force; then update position using the updated momentum. The asymmetry — using in the second step rather than — is what makes the scheme symplectic. A symmetric version updates first, then ; both are valid.
Störmer–Verlet (2nd order, symmetric). A symmetric version of symplectic Euler that does a half-step in momentum, full step in position, half-step in momentum:
This is the algorithm used in essentially every molecular-dynamics simulation. Second-order accurate, symplectic, time-reversible (the same algorithm run with exactly reverses the trajectory). The energy error is bounded by over arbitrarily long horizons.
Gauss–Legendre IRK (order , A-stable, symplectic). The -stage Gauss–Legendre Runge–Kutta method has nodes at the Gauss–Legendre quadrature points and is the unique -stage RK method of order . The 1-stage method is the implicit midpoint rule (order 2, symplectic, A-stable). The 2-stage method is order 4. Gauss–Legendre IRK is the unique family that is simultaneously A-stable, symplectic, and of optimal order — making it the gold standard for high-accuracy long-horizon Hamiltonian simulation.
(Modified Hamiltonian for symplectic methods.) For a symplectic integrator of order applied to a Hamiltonian system, there exists a modified Hamiltonian such that the discrete trajectory exactly preserves to within exponentially small terms over exponentially long times. The original therefore oscillates within a bounded band along the discrete trajectory.
The proof — Hairer–Lubich–Wanner Chapter IX — uses backward error analysis to expand the modified Hamiltonian as an asymptotic series in and shows the series can be truncated with controlled remainder. The takeaway for practice: a -th order symplectic integrator is exponentially better than a -th order non-symplectic one on Hamiltonian problems, even though the local truncation error of both is . Local error is misleading for long-horizon problems; geometric structure is what matters.
6.6 Toward selective SSMs as Hamiltonian flows
The C1 niche pilot proposes the following research program. The hidden-state dynamics of a selective SSM can sometimes be written as a Hamiltonian flow on the hidden-state manifold; when they can, applying a symplectic integrator should reduce long-horizon error in a way that order-optimized integrators cannot match. The first step is to characterize when this Hamiltonian structure exists.
For a linear-time-invariant SSM , the dynamics are Hamiltonian iff there exists a symmetric positive-definite such that . Equivalently, is similar (via ) to a skew-symmetric matrix. The quadratic Hamiltonian is then , and the symplectic 2-form is (in the standard symplectic-flat coordinates). For HiPPO-LegS the eigenvalues are all real-negative, not purely imaginary, so HiPPO-LegS dynamics are dissipative, not Hamiltonian. Symplectic methods do not apply. This is consistent with the empirical observation that ZOH and bilinear discretizations of HiPPO-LegS perform well — dissipation handles the long-horizon stability story.
For a selective SSM with input-dependent , the relevant question becomes: does have a Hamiltonian decomposition for some range of inputs, even if not for all? The C1 pilot’s empirical hypothesis is that during certain training phases the learned matrices drift toward eigenvalue spectra that are closer to purely imaginary than to the LHP, and that this drift is associated with the long-horizon recall artifacts observed in Mamba’s failure modes Anonymous (2025) . If this hypothesis holds, then symplectic discretization of selective SSMs should reduce those artifacts. The Chapter 17 pilot integration develops this thread; this chapter has supplied the necessary integration-theory vocabulary.
6.7 What’s next
This chapter completes the foundations layer of the book (Part I). The next part — Chapters 7–10 — introduces structured SSMs (S4, S4D, S5, Mamba-1/2, Mamba-3) as discretizations of the continuous systems developed here. Chapter 7 sets up HiPPO theory; Chapter 8 derives the S4 / S4D / S5 architectures and shows where bilinear discretization shines; Chapter 9 introduces selective scans (Mamba-1/2); Chapter 10 brings the exp-trapezoidal scheme of §4.5 to its home in Mamba-3.
The C1 pilot integration in Chapter 17 returns to this chapter’s symplectic story, applying the modified-Hamiltonian framework to learned selective dynamics. Readers focused on the SSM applications can skip to Chapter 7 now and consult §6.5–§6.6 when the symplectic discussion comes back.
6.8 Exercises
Six problems. Short/numerical (6.1–6.3) have inline collapsible solutions; long/proof exercises (6.4–6.6) have full worked solutions in §6.9.
Exercise 6.1 (computation)
Verify that backward Euler’s stability function is by directly applying the scheme to the test problem and solving for .
Solution
Backward Euler: . Rearranging, , so . Setting , the stability function is . ∎
Exercise 6.2 (computation)
For the harmonic oscillator , (so ), compute one step of symplectic Euler (“p first” variant) starting from at . What are , and what is ?
Solution
Symplectic Euler ( first): , then .
Starting from : . Then . So .
Energy: . Initial energy was . The energy is not exactly preserved (it changed by ), but it is bounded — over many steps it will oscillate, not drift.
Exercise 6.3 (computation + code)
Run companions/ch06/jax/symplectic_demo.py. Verify that classical RK4 produces linear-in-time energy drift on the harmonic oscillator while Störmer–Verlet’s energy stays in a bounded band. What is the drift rate (energy per period) of RK4 at ?
Solution
The companion’s main output shows two energy-vs-time traces over 100 periods. RK4 drifts approximately linearly; the drift rate is roughly per period at — consistent with local error compounded over steps per period. Störmer–Verlet’s energy stays within a band of width , with no secular drift. The contrast — linear drift vs bounded oscillation — is the key takeaway of §6.5.
Exercise 6.4 (theory) — solution in §6.9
Prove the L-stability of backward Euler (Theorem 6.1): for , and as .
Exercise 6.5 (theory) — solution in §6.9
Show that symplectic Euler is symplectic. That is, prove that the Jacobian has determinant exactly 1 for the harmonic oscillator. (Then note: for separable Hamiltonians, the same argument applies via the chain rule.)
Exercise 6.6 (theory) — solution in §6.9
Show that for a linear-time-invariant ODE with skew-symmetric (), the function is a constant of motion. Use this to give a sufficient condition (in terms of only) for an SSM to be Hamiltonian.
6.9 Full solutions to theory exercises
Solution to Exercise 6.4
Write with . Then . Since , , so . Therefore . ∎
For the L-stability limit: as with bounded, , so . ∎
Solution to Exercise 6.5
For the harmonic oscillator , , symplectic Euler ( first) gives
The Jacobian is
The determinant: . ∎
So the discrete map preserves the symplectic 2-form exactly. This is the algebraic content of “symplectic Euler is symplectic” — the determinant being exactly , not .
For a general separable Hamiltonian , symplectic Euler gives and . The Jacobian factorizes as
each of which has determinant 1 (lower- and upper-triangular with 1s on the diagonal). The product determinant is therefore 1. ∎
This “shear-decomposition” argument generalizes to all symplectic integrators that update and in alternating steps: each individual sub-step is a shear, with unit determinant. Verlet, leapfrog, Yoshida composition methods — all are products of unit-determinant shears.
Solution to Exercise 6.6
For with , compute
since by skew-symmetry. So — the squared norm is exactly preserved. ∎
Sufficient condition for an SSM to be Hamiltonian. Any SSM whose continuous dynamics matrix is skew-symmetric is Hamiltonian with Hamiltonian . The eigenvalues of a real skew-symmetric matrix are purely imaginary (or zero), so this regime is exactly the purely oscillatory corner of the SSM landscape — the opposite of the LHP-stable corner where HiPPO and S4 live. A more general sufficient condition: has a Hamiltonian structure (in the matrix-Lie-algebra sense) — equivalently, there exists a symmetric positive-definite with . The C1 pilot’s empirical project is to identify selective-SSM regimes where this condition holds approximately, and to apply symplectic discretization there.
6.10 Companion code
Two JAX companions and two Julia companions for Chapter 6.
JAX (companions/ch06/jax/):
stiff_demo.py— van der Pol oscillator at (mildly stiff); compares explicit RK4 (blows up at large ) against backward Euler (stable at every ); emitsstiff_blowup.png.symplectic_demo.py— harmonic oscillator + pendulum; runs classical RK4 vs Störmer–Verlet over 100 periods; emitsenergy_drift.pngandphase_portrait.png.
Julia (companions/ch06/julia/):
implicit_methods.jl— backward Euler, BDF2, and DIRK from-scratch implementations on a stiff test problem; uses onlyLinearAlgebra+Printf.symplectic_methods.jl— symplectic Euler, Störmer–Verlet, and the 2-stage Gauss–Legendre IRK method on a Hamiltonian test; emits empirical order-of-accuracy table.Project.toml/Manifest.toml.
To run from the repo root:
# JAX
PYTHONPATH=. python companions/ch06/jax/stiff_demo.py
PYTHONPATH=. python companions/ch06/jax/symplectic_demo.py
# Julia (lightweight — no DifferentialEquations.jl dependency)
julia --project=companions/ch06/julia companions/ch06/julia/implicit_methods.jl
julia --project=companions/ch06/julia companions/ch06/julia/symplectic_methods.jl
All figures emit to public/figures/ch06/.