Linear ODEs as state-space systems
Continuous-time linear systems, the matrix exponential as fundamental solution, the damped oscillator as canonical example, and the eigenvalue classification of phase portraits.
On this page
- 1.1 What is a state-space system?
- Why this object
- 1.2 The matrix exponential and fundamental solutions
- Eigenvalue structure determines everything
- 1.3 The damped harmonic oscillator
- 1.4 Coupled systems and the Jacobian
- The Jacobian beyond linear systems
- 1.5 Phase portraits and the eigenvalue classification
- 1.6 A preview of frequency response
- 1.7 What’s next
- 1.8 Exercises
- Exercise 1.1 (computation)
- Exercise 1.2 (computation)
- Exercise 1.3 (computation)
- Exercise 1.4 (computation)
- Exercise 1.5 (theory) — solution in §1.9
- Exercise 1.6 (theory) — solution in §1.9
- Exercise 1.7 (theory) — solution in §1.9
- 1.9 Full solutions to theory exercises
- Solution to Exercise 1.5
- Solution to Exercise 1.6
- Solution to Exercise 1.7
- 1.10 Companion code
Linear ODEs as state-space systems
1.1 What is a state-space system?
A linear state-space system is the pair of equations
where is the state at time , is the input signal, is the output, and the four matrices are real-valued with shapes , , , . The first equation is the state equation — a first-order linear ODE describing how the state evolves. The second is the output equation — a static linear readout. The integer is the state dimension; it controls how much history the model can carry forward.
The pair is “linear” because both equations are linear in and separately. It is “state-space” because summarizes everything the system needs from the past in order to predict the future given future inputs — a property called the Markov property of state. Given the present state and the input from now onward, the system’s entire future is determined.
A few conventions before going further. We treat as a column vector and apply on the left: , not . We absorb the feedthrough term into the output equation only — it appears nowhere in the state dynamics — and we will often set in examples to keep notation light. When the input is absent (), the state equation reduces to
the homogeneous linear ODE, whose solution is the focus of §1.2.
Why this object
Three reasons the state-space form is the right starting point for a book on sequence-model architectures:
- It separates dynamics from input. Whatever the input signal looks like, the dynamics matrix alone determines the system’s long-run behavior — its stability, oscillation frequencies, and decay rates. Most of the analysis in Chapter 2 looks only at .
- It generalizes immediately. Replacing the scalar derivative with a finite difference yields a discrete recurrence (Chapter 4); replacing with admits oscillatory modes (Chapter 6, Chapter 10); replacing the constant with an input-dependent yields the selective-SSM family (Chapter 9). Each generalization preserves the state-space skeleton.
- It’s the form in which continuous-time models are discretized. Every SSM derivation in the literature starts from a continuous state equation and applies a numerical integration scheme to get a discrete recurrence. If you don’t know the continuous object, the discrete one is opaque.
1.2 The matrix exponential and fundamental solutions
The homogeneous ODE with initial condition has a unique solution for every — and that solution is
where is the matrix exponential of , defined by the same series as the scalar exponential:
For any square matrix , the matrix exponential is
The series converges absolutely for every (the entrywise -norm of the partial sums is bounded by ), so is always well-defined.
The fundamental solution property — that solves the homogeneous ODE — is verified by differentiating the series term-by-term and using . The resulting derivative is , which matches . Existence and uniqueness for the inhomogeneous case () follow the variation of parameters formula
an integral that recurs (in discretized form) as the convolution kernel of every LTI SSM in Chapter 8.
Eigenvalue structure determines everything
The matrix exponential’s behavior is governed entirely by the eigenvalues of . If is diagonalizable, with , then
So acts on the eigenbasis as independent scalar exponentials. The real parts of control decay or growth: means ; means blow-up. The imaginary parts control oscillation frequency. Three regimes recur throughout the book:
- All eigenvalues with negative real part — the system is asymptotically stable: every trajectory as . This is the regime in which a recurrence “forgets” its initial state, and it is the design target for the matrices of S4, Mamba, and friends.
- At least one eigenvalue with positive real part — the system is unstable: some initial conditions blow up. A trained SSM whose drifts into this regime is the dominant numerical failure mode (Chapter 5).
- Purely imaginary eigenvalues — the system is marginally stable and oscillates without decay. These are the energy-preserving modes of Hamiltonian systems and are the natural home of symplectic discretization (Chapter 6).
When is not diagonalizable — when it has repeated eigenvalues with deficient eigenspaces — the matrix exponential picks up polynomial-in- factors on top of the terms. The Jordan normal form (Chapter 3, §3.1) gives the precise statement. In practice almost every arising from random initialization or from physical systems is diagonalizable, and the polynomial-factor case shows up only at codimension-one boundaries in parameter space.
1.3 The damped harmonic oscillator
The canonical small example. A unit mass on a spring with stiffness and damping coefficient , displacement , evolves according to Newton’s second law,
This is a second-order scalar ODE, not first-order, so it does not yet fit the state-space template. We lift it by treating velocity as an extra state coordinate: let . Then with
This lifting trick — introduce auxiliary states to reduce ODE order to one — is the same procedure by which every higher-order ODE becomes a first-order state-space system, and it foreshadows how implicit higher-order schemes (Chapter 6) work internally.
The eigenvalues of are the roots of the characteristic polynomial :
Three sub-cases follow from the sign of the discriminant :
- Overdamped (): two real negative eigenvalues. Both modes decay monotonically; the displacement approaches zero without oscillation.
- Critically damped (): a repeated negative eigenvalue . The matrix is in this case not diagonalizable, and the solution contains a term.
- Underdamped (): a complex-conjugate eigenpair , with . Trajectories spiral toward the origin: oscillation at angular frequency with envelope decaying as .
The total energy is a useful diagnostic. Its time derivative is , i.e. energy decreases monotonically whenever . Both stationary and oscillating trajectories satisfy this. The companion damped_oscillator.py simulates the underdamped regime and plots alongside the trajectory; the energy curve gives an at-a-glance check that the numerical integration preserves the dissipation structure of the continuous system.
1.4 Coupled systems and the Jacobian
The damped oscillator is a 2-dimensional toy. Real systems — and trained SSMs — live in much higher dimensions, and they typically arise from coupling many simpler subsystems. The bookkeeping is straightforward once you accept that the state vector stacks all subsystem states and encodes both intra-subsystem dynamics and inter-subsystem coupling.
A useful larger example is a ring of coupled oscillators: identical damped oscillators arranged on a circle, each coupled to its two nearest neighbors by springs of stiffness . The displacements and velocities stack into a state vector . The matrix has a block-diagonal structure for the per-oscillator dynamics, plus an off-diagonal coupling pattern that links each oscillator’s velocity equation to its neighbors’ positions:
with indices taken mod . The bracketed term is the discrete Laplacian of around the ring; it’s the same discretization of you would use in a finite-difference method for the wave equation.
The Jacobian matrix — which for a linear system is just — has eigenvalues that are easy to compute analytically because of the ring’s circulant structure. They come in complex-conjugate pairs
(for the underdamped regime). Each indexes a standing wave mode on the ring with wavenumber ; the mode is uniform translation (no spatial structure), (when is even) is the maximally-oscillating mode where adjacent oscillators move in antiphase. The companion coupled_oscillators.py constructs for and visualizes the eigenvalues in the complex plane.
The key qualitative lesson: once you have the right state-space lift, even a system with rich spatial structure reduces to “compute eigenvalues of , classify by real and imaginary parts.” The eigenvalue distribution of a trained SSM’s — pre-training, mid-training, and converged — is one of the most informative diagnostics you can compute (Chapter 2, §2.2; Chapter 5).
The Jacobian beyond linear systems
For a nonlinear system , the Jacobian is the matrix of partial derivatives at . The linearization around a fixed point (where ) describes how small perturbations evolve. The eigenvalues of then determine local stability of by the same classification as in §1.2.
Trained recurrent networks — including SSMs in inference mode — are formally nonlinear (the state update is state = f(state, input) where f includes the input-dependent matrices). The linearization-around-a-state gives a local state-space system at each time step, and tracking how that local varies along a trajectory is the foundation of the Lyapunov-exponent analysis in Chapter 2.
1.5 Phase portraits and the eigenvalue classification
A phase portrait is the geometric picture of all trajectories of an autonomous system in state space, drawn as oriented curves. For 2-dimensional linear systems on , the topology of the portrait is determined entirely by the trace and the determinant — equivalently, by the eigenvalues — and the standard classification gives six qualitative behaviors:
| Region in -plane | Eigenvalues | Phase portrait |
|---|---|---|
| , , | Real negative | Stable node |
| , , | Real positive | Unstable node |
| , , | Complex conjugate, | Stable spiral |
| , , | Complex conjugate, | Unstable spiral |
| , | Pure imaginary | Center (closed orbits) |
| Real, opposite signs | Saddle |
The damped oscillator’s three sub-cases (over/critical/under-damped) realize three of these: stable node, stable degenerate node, stable spiral. The undamped limit slides the eigenvalues onto the imaginary axis, turning the spiral into a center; this is the Hamiltonian limit where energy is conserved exactly.
For higher-dimensional systems the topology is richer (think: 3-D systems can have spirals around line-shaped invariant manifolds), but the building blocks are the same: each pair of complex-conjugate eigenvalues contributes a 2-D spiral subspace, each real eigenvalue contributes a 1-D direction of decay or growth. This decomposition by invariant subspaces — equivalently, by the Jordan blocks of — is the geometric content of the matrix exponential’s eigenvalue structure (§1.2).
1.6 A preview of frequency response
The variation-of-parameters formula
expresses the state as a convolution of the input with the kernel . The output (taking ) is therefore
The function for (and zero otherwise) is the impulse response of the system: the output you would observe if the input were a Dirac delta .
Taking the Laplace transform turns convolution into multiplication. With and similarly for , you get
where is the transfer function. The eigenvalues of — already shown to govern — appear as the poles of , since blows up exactly when equals an eigenvalue of . The poles’ location in the complex -plane mirrors the eigenvalue classification of §1.5: left-half-plane poles ⇔ stable system; right-half-plane poles ⇔ unstable; imaginary-axis poles ⇔ marginally stable / oscillatory.
This connection is the bridge to Chapter 8, where you’ll see that the S4 layer’s convolutional view is literally a numerical evaluation of the impulse response at uniformly spaced time points, and to Chapter 11, where the spectral structure of long convolution kernels (Hyena, RetNet) is analyzed via the same transfer-function framing.
1.7 What’s next
The next two chapters develop the analytical machinery that this chapter has only sketched. Chapter 2 makes the eigenvalue-stability story rigorous (Lyapunov, A-stability, BIBO) and adds the QR-based computational method for tracking eigenvalue trajectories during training. Chapter 3 introduces the structured-matrix vocabulary (Toeplitz, semiseparable, Vandermonde, Cauchy) needed to discuss the kernel constructions of S4 and friends. Chapters 4–6 then take the continuous state-space system here and discretize it — first crudely (ZOH, bilinear), then with full numerical-analysis machinery (Butcher tableau, A-stability), then via implicit and structure-preserving methods (Gauss–Legendre, symplectic integrators), which is where the active research pilot’s anchor (the C1 pilot) lives.
You can also profitably jump ahead: Chapter 8’s LTI SSM presentation reuses everything in this chapter and is self-contained for readers who want to see the SSM application before working through the full math foundation.
1.8 Exercises
Seven problems mixing computation and theory. Short/numerical exercises (1–4) have inline collapsible solutions; long/proof exercises (5–7) have full worked solutions in §1.9.
Exercise 1.1 (computation)
Compute the matrix exponential of by hand, term-by-term, and verify it equals the rotation matrix .
Solution
Compute powers: , so , , and the pattern repeats with period 4. Splitting the series by parity:
which expands to . ∎
Exercise 1.2 (computation)
For the damped oscillator with and , classify the damping regime and find the eigenvalues of .
Solution
Discriminant: , so the system is underdamped. Eigenvalues: . The trajectory spirals toward the origin with envelope decay rate and angular frequency .
Exercise 1.3 (computation)
Lift the third-order ODE into a 3-dimensional state-space system . Write and explicitly.
Solution
Let . Then , and from the ODE , so
The bottom row encodes the ODE’s coefficient signature; the upper-triangular ones implement the bookkeeping . This is the companion form of the ODE.
Exercise 1.4 (computation)
For the ring of identical damped oscillators with , , (no damping, symmetric coupling), use the formula in §1.4 to find the eigenvalues of analytically. Confirm they lie on the imaginary axis (since ) and verify by running companions/ch01/jax/coupled_oscillators.py with .
Solution
With , for . Numerically:
- :
- :
- :
- : same as by symmetry ⇒
All 8 eigenvalues are pure imaginary (centers), as expected for an undamped conservative system. The numerical verification reproduces these (modulo double-precision noise).
Exercise 1.5 (theory) — solution in §1.9
Prove that the matrix-exponential series converges absolutely (entrywise) for every square matrix , with bound where is the Frobenius norm.
Exercise 1.6 (theory) — solution in §1.9
Show that if is skew-symmetric (), then is orthogonal for every (that is, ). Use this to give a geometric reason why the rotation matrix of Exercise 1.1 is orthogonal.
Exercise 1.7 (theory) — solution in §1.9
Prove the variation of parameters formula: the unique solution of with is
1.9 Full solutions to theory exercises
Solution to Exercise 1.5
Let denote the Frobenius norm, which satisfies the sub-multiplicative property . Then by induction. Therefore
The partial sums are entrywise bounded by terms of a convergent scalar series, so each matrix entry is Cauchy in and converges. The limit is what we call . The bound follows by passing to the limit. ∎
Solution to Exercise 1.6
We have and . Let . Then
since by skew-symmetry. So for all , i.e. is orthogonal.
Geometric interpretation: the rotation matrix of Exercise 1.1 was generated by , which is skew-symmetric. So is orthogonal — which we already knew, but this derivation pins down why: orthogonality follows structurally from skew-symmetry of the generator, not from any property of the cos/sin functions. ∎
Solution to Exercise 1.7
Define by the formula on the right-hand side. We verify (a) the initial condition and (b) the ODE.
Initial condition. At the integral vanishes (zero-length interval) and , so . ✓
ODE. Differentiate using the product rule on the integral (Leibniz’s rule for differentiating under the integral sign):
Uniqueness follows from the classical Picard–Lindelöf theorem applied to the right-hand side , which is Lipschitz in uniformly in with constant (see Hairer–Nørsett–Wanner Hairer et al. (1993) for the standard proof). ∎
1.10 Companion code
Three runnable companions in companions/ch01/jax/:
damped_oscillator.py— simulates the underdamped oscillator, plots energy decay (Figure 1.1)coupled_oscillators.py— constructs the ring-Laplacian state matrix, plots eigenvalue spectrum (Figure 1.2)matrix_exponential.py— comparesscipy.linalg.expmagainst truncated series sums; the truncated series catastrophically diverges for matrices with large spectral radius (auxiliary, used by Exercise 1.5)
All companions import their plotting style from companions/_shared/plot_utils.py. To run from the repo root:
PYTHONPATH=. python companions/ch01/jax/damped_oscillator.py
PYTHONPATH=. python companions/ch01/jax/coupled_oscillators.py
PYTHONPATH=. python companions/ch01/jax/matrix_exponential.py
Figures are written to public/figures/ch01/ and referenced from the <Figure> components above.