Part I · Foundations Week 1 Published

Linear ODEs as state-space systems

Continuous-time linear systems, the matrix exponential as fundamental solution, the damped oscillator as canonical example, and the eigenvalue classification of phase portraits.

On this page
  1. 1.1 What is a state-space system?
  2. Why this object
  3. 1.2 The matrix exponential and fundamental solutions
  4. Eigenvalue structure determines everything
  5. 1.3 The damped harmonic oscillator
  6. 1.4 Coupled systems and the Jacobian
  7. The Jacobian beyond linear systems
  8. 1.5 Phase portraits and the eigenvalue classification
  9. 1.6 A preview of frequency response
  10. 1.7 What’s next
  11. 1.8 Exercises
  12. Exercise 1.1 (computation)
  13. Exercise 1.2 (computation)
  14. Exercise 1.3 (computation)
  15. Exercise 1.4 (computation)
  16. Exercise 1.5 (theory) — solution in §1.9
  17. Exercise 1.6 (theory) — solution in §1.9
  18. Exercise 1.7 (theory) — solution in §1.9
  19. 1.9 Full solutions to theory exercises
  20. Solution to Exercise 1.5
  21. Solution to Exercise 1.6
  22. Solution to Exercise 1.7
  23. 1.10 Companion code

Linear ODEs as state-space systems

1.1 What is a state-space system?

A linear state-space system is the pair of equations

ddth(t)=Ah(t)+Bu(t),y(t)=Ch(t)+Du(t),\ddt \statevec(t) = \statemat \statevec(t) + \inputmat u(t), \qquad y(t) = \outputmat \statevec(t) + \feedmat u(t),

where h(t)RN\statevec(t) \in \R^N is the state at time t0t \ge 0, u(t)RDu(t) \in \R^D is the input signal, y(t)RMy(t) \in \R^M is the output, and the four matrices (A,B,C,D)(\statemat, \inputmat, \outputmat, \feedmat) are real-valued with shapes N×NN \times N, N×DN \times D, M×NM \times N, M×DM \times D. The first equation is the state equation — a first-order linear ODE describing how the state evolves. The second is the output equation — a static linear readout. The integer NN is the state dimension; it controls how much history the model can carry forward.

The pair is “linear” because both equations are linear in h\statevec and uu separately. It is “state-space” because h(t)\statevec(t) summarizes everything the system needs from the past in order to predict the future given future inputs — a property called the Markov property of state. Given the present state and the input from now onward, the system’s entire future is determined.

A few conventions before going further. We treat h(t)\statevec(t) as a column vector and apply A\statemat on the left: Ah\statemat \statevec, not hA\statevec \statemat. We absorb the feedthrough term Du(t)\feedmat u(t) into the output equation only — it appears nowhere in the state dynamics — and we will often set D=0\feedmat = 0 in examples to keep notation light. When the input is absent (u0u \equiv 0), the state equation reduces to

ddth(t)=Ah(t),\ddt \statevec(t) = \statemat \statevec(t),

the homogeneous linear ODE, whose solution is the focus of §1.2.

Why this object

Three reasons the state-space form is the right starting point for a book on sequence-model architectures:

  1. It separates dynamics from input. Whatever the input signal looks like, the dynamics matrix A\statemat alone determines the system’s long-run behavior — its stability, oscillation frequencies, and decay rates. Most of the analysis in Chapter 2 looks only at A\statemat.
  2. It generalizes immediately. Replacing the scalar derivative ddt\ddt with a finite difference yields a discrete recurrence (Chapter 4); replacing R\R with C\C admits oscillatory modes (Chapter 6, Chapter 10); replacing the constant A\statemat with an input-dependent A(ut)\statemat(u_t) yields the selective-SSM family (Chapter 9). Each generalization preserves the state-space skeleton.
  3. It’s the form in which continuous-time models are discretized. Every SSM derivation in the literature starts from a continuous state equation and applies a numerical integration scheme to get a discrete recurrence. If you don’t know the continuous object, the discrete one is opaque.

1.2 The matrix exponential and fundamental solutions

The homogeneous ODE ddth=Ah\ddt \statevec = \statemat \statevec with initial condition h(0)=h0\statevec(0) = \statevec_0 has a unique solution for every h0RN\statevec_0 \in \R^N — and that solution is

h(t)=eAth0,\statevec(t) = e^{\statemat t} \statevec_0,

where eAte^{\statemat t} is the matrix exponential of At\statemat t, defined by the same series as the scalar exponential:

.

For any square matrix MRN×NM \in \R^{N \times N}, the matrix exponential is

eM:=k=0Mkk!=I+M+12M2+16M3+e^M := \sum_{k=0}^{\infty} \frac{M^k}{k!} = I + M + \tfrac{1}{2} M^2 + \tfrac{1}{6} M^3 + \cdots

The series converges absolutely for every MM (the entrywise 1\ell^1-norm of the partial sums is bounded by eMe^{\norm{M}}), so eMe^M is always well-defined.

The fundamental solution property — that teAth0t \mapsto e^{\statemat t} \statevec_0 solves the homogeneous ODE — is verified by differentiating the series term-by-term and using ddt(At)kk!=Aktk1(k1)!\frac{d}{dt} \frac{(\statemat t)^k}{k!} = \frac{\statemat^k t^{k-1}}{(k-1)!}. The resulting derivative is AeAth0\statemat \cdot e^{\statemat t} \statevec_0, which matches Ah(t)\statemat \statevec(t). Existence and uniqueness for the inhomogeneous case (u≢0u \not\equiv 0) follow the variation of parameters formula

h(t)=eAth0+0teA(ts)Bu(s)ds,\statevec(t) = e^{\statemat t} \statevec_0 + \int_0^t e^{\statemat (t-s)} \inputmat u(s) \, ds,

an integral that recurs (in discretized form) as the convolution kernel of every LTI SSM in Chapter 8.

Eigenvalue structure determines everything

The matrix exponential’s behavior is governed entirely by the eigenvalues of A\statemat. If A\statemat is diagonalizable, A=VΛV1\statemat = V \Lambda V^{-1} with Λ=diag(λ1,,λN)\Lambda = \diag(\lambda_1, \ldots, \lambda_N), then

eAt=VeΛtV1=Vdiag(eλ1t,,eλNt)V1.e^{\statemat t} = V \, e^{\Lambda t} \, V^{-1} = V \, \diag(e^{\lambda_1 t}, \ldots, e^{\lambda_N t}) \, V^{-1}.

So eAte^{\statemat t} acts on the eigenbasis as independent scalar exponentials. The real parts of λi\lambda_i control decay or growth: (λi)<0\Re(\lambda_i) < 0 means eλit0e^{\lambda_i t} \to 0; (λi)>0\Re(\lambda_i) > 0 means blow-up. The imaginary parts control oscillation frequency. Three regimes recur throughout the book:

  1. All eigenvalues with negative real part — the system is asymptotically stable: every trajectory h(t)0\statevec(t) \to 0 as tt \to \infty. This is the regime in which a recurrence “forgets” its initial state, and it is the design target for the A\statemat matrices of S4, Mamba, and friends.
  2. At least one eigenvalue with positive real part — the system is unstable: some initial conditions blow up. A trained SSM whose A\statemat drifts into this regime is the dominant numerical failure mode (Chapter 5).
  3. Purely imaginary eigenvalues — the system is marginally stable and oscillates without decay. These are the energy-preserving modes of Hamiltonian systems and are the natural home of symplectic discretization (Chapter 6).

When A\statemat is not diagonalizable — when it has repeated eigenvalues with deficient eigenspaces — the matrix exponential picks up polynomial-in-tt factors on top of the eλite^{\lambda_i t} terms. The Jordan normal form (Chapter 3, §3.1) gives the precise statement. In practice almost every A\statemat arising from random initialization or from physical systems is diagonalizable, and the polynomial-factor case shows up only at codimension-one boundaries in parameter space.

1.3 The damped harmonic oscillator

The canonical small example. A unit mass on a spring with stiffness k>0k > 0 and damping coefficient c>0c > 0, displacement q(t)q(t), evolves according to Newton’s second law,

q¨(t)+cq˙(t)+kq(t)=0.\ddot q(t) + c \, \dot q(t) + k \, q(t) = 0.

This is a second-order scalar ODE, not first-order, so it does not yet fit the state-space template. We lift it by treating velocity as an extra state coordinate: let h(t)=(q(t),q˙(t))R2\statevec(t) = (q(t), \dot q(t))^\top \in \R^2. Then ddth=Ah\ddt \statevec = \statemat \statevec with

A=(01kc).\statemat = \begin{pmatrix} 0 & 1 \\ -k & -c \end{pmatrix}.

This lifting trick — introduce auxiliary states to reduce ODE order to one — is the same procedure by which every higher-order ODE becomes a first-order state-space system, and it foreshadows how implicit higher-order schemes (Chapter 6) work internally.

The eigenvalues of A\statemat are the roots of the characteristic polynomial λ2+cλ+k=0\lambda^2 + c \lambda + k = 0:

λ±=c2±12c24k.\lambda_\pm = -\frac{c}{2} \pm \frac{1}{2}\sqrt{c^2 - 4k}.

Three sub-cases follow from the sign of the discriminant c24kc^2 - 4k:

  • Overdamped (c2>4kc^2 > 4k): two real negative eigenvalues. Both modes decay monotonically; the displacement approaches zero without oscillation.
  • Critically damped (c2=4kc^2 = 4k): a repeated negative eigenvalue λ=c/2\lambda = -c/2. The matrix A\statemat is in this case not diagonalizable, and the solution contains a teλtt \cdot e^{\lambda t} term.
  • Underdamped (c2<4kc^2 < 4k): a complex-conjugate eigenpair λ±=c/2±iω\lambda_\pm = -c/2 \pm i\omega, with ω=124kc2\omega = \tfrac{1}{2}\sqrt{4k - c^2}. Trajectories spiral toward the origin: oscillation at angular frequency ω\omega with envelope decaying as ect/2e^{-ct/2}.

The total energy E(t)=12(q˙2+kq2)E(t) = \tfrac{1}{2}(\dot q^2 + k q^2) is a useful diagnostic. Its time derivative is E˙=cq˙20\dot E = -c \dot q^2 \le 0, i.e. energy decreases monotonically whenever q˙0\dot q \ne 0. Both stationary and oscillating trajectories satisfy this. The companion damped_oscillator.py simulates the underdamped regime and plots E(t)E(t) alongside the trajectory; the energy curve gives an at-a-glance check that the numerical integration preserves the dissipation structure of the continuous system.

Energy decay of the damped harmonic oscillator over time, plotted on linear and log scales.
Energy E(t) of the underdamped oscillator (k=4, c=0.2) decaying exponentially in time. Left: linear scale, showing the envelope shape. Right: log scale, where the constant slope confirms the exponential decay rate of c/2 in the energy norm. Produced by companions/ch01/jax/damped_oscillator.py.

1.4 Coupled systems and the Jacobian

The damped oscillator is a 2-dimensional toy. Real systems — and trained SSMs — live in much higher dimensions, and they typically arise from coupling many simpler subsystems. The bookkeeping is straightforward once you accept that the state vector h\statevec stacks all subsystem states and A\statemat encodes both intra-subsystem dynamics and inter-subsystem coupling.

A useful larger example is a ring of coupled oscillators: nn identical damped oscillators arranged on a circle, each coupled to its two nearest neighbors by springs of stiffness κ\kappa. The displacements q1,,qnq_1, \ldots, q_n and velocities q˙1,,q˙n\dot q_1, \ldots, \dot q_n stack into a state vector hR2n\statevec \in \R^{2n}. The A\statemat matrix has a 2×22 \times 2 block-diagonal structure for the per-oscillator dynamics, plus an off-diagonal coupling pattern that links each oscillator’s velocity equation to its neighbors’ positions:

q¨i=kqicq˙i+κ(qi12qi+qi+1),i=1,,n,\ddot q_i = -k q_i - c \dot q_i + \kappa (q_{i-1} - 2 q_i + q_{i+1}), \qquad i = 1, \ldots, n,

with indices taken mod nn. The bracketed term qi12qi+qi+1q_{i-1} - 2 q_i + q_{i+1} is the discrete Laplacian of qq around the ring; it’s the same discretization of 2/x2\partial^2/\partial x^2 you would use in a finite-difference method for the wave equation.

The 2n×2n2n \times 2n Jacobian matrix J\jacobian — which for a linear system is just A\statemat — has eigenvalues that are easy to compute analytically because of the ring’s circulant structure. They come in complex-conjugate pairs

λ±(m)=c2±ik+2κ(1cos(2πm/n))c2/4,m=0,1,,n1,\lambda_\pm^{(m)} = -\frac{c}{2} \pm i \sqrt{k + 2\kappa(1 - \cos(2\pi m/n)) - c^2/4}, \qquad m = 0, 1, \ldots, n-1,

(for the underdamped regime). Each mm indexes a standing wave mode on the ring with wavenumber 2πm/n2\pi m / n; the m=0m=0 mode is uniform translation (no spatial structure), m=n/2m = n/2 (when nn is even) is the maximally-oscillating mode where adjacent oscillators move in antiphase. The companion coupled_oscillators.py constructs A\statemat for n=8n = 8 and visualizes the eigenvalues in the complex plane.

Eigenvalues of the coupled-oscillator Jacobian in the complex plane, showing two arcs corresponding to the underdamped modes.
Eigenvalues of the ring-of-oscillators Jacobian for n=8, k=4, c=0.2, κ=1. All 16 eigenvalues sit in the left half-plane (asymptotically stable) and arrange into two arcs corresponding to the two underdamped families. The arc curvature reflects the dispersion relation ω(m) of the spatial Laplacian. Produced by companions/ch01/jax/coupled_oscillators.py.

The key qualitative lesson: once you have the right state-space lift, even a system with rich spatial structure reduces to “compute eigenvalues of A\statemat, classify by real and imaginary parts.” The eigenvalue distribution of a trained SSM’s A\statemat — pre-training, mid-training, and converged — is one of the most informative diagnostics you can compute (Chapter 2, §2.2; Chapter 5).

The Jacobian beyond linear systems

For a nonlinear system ddth=f(h)\ddt \statevec = f(\statevec), the Jacobian J(h)=f/h\jacobian(\statevec) = \partial f / \partial \statevec is the matrix of partial derivatives at h\statevec. The linearization ddtδh=J(h)δh\ddt \delta\statevec = \jacobian(\statevec^*) \delta\statevec around a fixed point h\statevec^* (where f(h)=0f(\statevec^*) = 0) describes how small perturbations δh\delta\statevec evolve. The eigenvalues of J(h)\jacobian(\statevec^*) then determine local stability of h\statevec^* by the same classification as in §1.2.

Trained recurrent networks — including SSMs in inference mode — are formally nonlinear (the state update is state = f(state, input) where f includes the input-dependent matrices). The linearization-around-a-state gives a local state-space system at each time step, and tracking how that local A\statemat varies along a trajectory is the foundation of the Lyapunov-exponent analysis in Chapter 2.

1.5 Phase portraits and the eigenvalue classification

A phase portrait is the geometric picture of all trajectories of an autonomous system in state space, drawn as oriented curves. For 2-dimensional linear systems ddth=Ah\ddt \statevec = \statemat \statevec on R2\R^2, the topology of the portrait is determined entirely by the trace tr(A)\tr(\statemat) and the determinant det(A)\det(\statemat) — equivalently, by the eigenvalues — and the standard classification gives six qualitative behaviors:

Region in (tr,det)(\tr, \det)-planeEigenvaluesPhase portrait
det>0\det > 0, tr24det>0\tr^2 - 4\det > 0, tr<0\tr < 0Real negativeStable node
det>0\det > 0, tr24det>0\tr^2 - 4\det > 0, tr>0\tr > 0Real positiveUnstable node
det>0\det > 0, tr24det<0\tr^2 - 4\det < 0, tr<0\tr < 0Complex conjugate, <0\Re < 0Stable spiral
det>0\det > 0, tr24det<0\tr^2 - 4\det < 0, tr>0\tr > 0Complex conjugate, >0\Re > 0Unstable spiral
det>0\det > 0, tr=0\tr = 0Pure imaginaryCenter (closed orbits)
det<0\det < 0Real, opposite signsSaddle

The damped oscillator’s three sub-cases (over/critical/under-damped) realize three of these: stable node, stable degenerate node, stable spiral. The undamped limit c0c \to 0 slides the eigenvalues onto the imaginary axis, turning the spiral into a center; this is the Hamiltonian limit where energy is conserved exactly.

For higher-dimensional systems the topology is richer (think: 3-D systems can have spirals around line-shaped invariant manifolds), but the building blocks are the same: each pair of complex-conjugate eigenvalues contributes a 2-D spiral subspace, each real eigenvalue contributes a 1-D direction of decay or growth. This decomposition by invariant subspaces — equivalently, by the Jordan blocks of A\statemat — is the geometric content of the matrix exponential’s eigenvalue structure (§1.2).

1.6 A preview of frequency response

The variation-of-parameters formula

h(t)=eAth0+0teA(ts)Bu(s)ds\statevec(t) = e^{\statemat t} \statevec_0 + \int_0^t e^{\statemat (t-s)} \inputmat u(s) \, ds

expresses the state as a convolution of the input with the kernel K(t):=eAtBK(t) := e^{\statemat t} \inputmat. The output y(t)=Ch(t)y(t) = \outputmat \statevec(t) (taking D=0\feedmat = 0) is therefore

y(t)=CeAth0+0tCeA(ts)Bu(s)ds.y(t) = \outputmat e^{\statemat t} \statevec_0 + \int_0^t \outputmat e^{\statemat (t-s)} \inputmat \, u(s) \, ds.

The function h(t):=CeAtBh(t) := \outputmat e^{\statemat t} \inputmat for t0t \ge 0 (and zero otherwise) is the impulse response of the system: the output you would observe if the input were a Dirac delta u(t)=δ(t)u(t) = \delta(t).

Taking the Laplace transform turns convolution into multiplication. With y^(s):=0y(t)estdt\widehat y(s) := \int_0^\infty y(t) e^{-st} dt and similarly for u^\widehat u, you get

y^(s)=H(s)u^(s)+(initial-condition terms),\widehat y(s) = H(s) \, \widehat u(s) + \text{(initial-condition terms)},

where H(s):=C(sIA)1BH(s) := \outputmat (s I - \statemat)^{-1} \inputmat is the transfer function. The eigenvalues of A\statemat — already shown to govern eAte^{\statemat t} — appear as the poles of H(s)H(s), since (sIA)1(sI - \statemat)^{-1} blows up exactly when ss equals an eigenvalue of A\statemat. The poles’ location in the complex ss-plane mirrors the eigenvalue classification of §1.5: left-half-plane poles ⇔ stable system; right-half-plane poles ⇔ unstable; imaginary-axis poles ⇔ marginally stable / oscillatory.

This connection is the bridge to Chapter 8, where you’ll see that the S4 layer’s convolutional view is literally a numerical evaluation of the impulse response h(t)h(t) at uniformly spaced time points, and to Chapter 11, where the spectral structure of long convolution kernels (Hyena, RetNet) is analyzed via the same transfer-function framing.

1.7 What’s next

The next two chapters develop the analytical machinery that this chapter has only sketched. Chapter 2 makes the eigenvalue-stability story rigorous (Lyapunov, A-stability, BIBO) and adds the QR-based computational method for tracking eigenvalue trajectories during training. Chapter 3 introduces the structured-matrix vocabulary (Toeplitz, semiseparable, Vandermonde, Cauchy) needed to discuss the kernel constructions of S4 and friends. Chapters 4–6 then take the continuous state-space system here and discretize it — first crudely (ZOH, bilinear), then with full numerical-analysis machinery (Butcher tableau, A-stability), then via implicit and structure-preserving methods (Gauss–Legendre, symplectic integrators), which is where the active research pilot’s anchor (the C1 pilot) lives.

You can also profitably jump ahead: Chapter 8’s LTI SSM presentation reuses everything in this chapter and is self-contained for readers who want to see the SSM application before working through the full math foundation.

1.8 Exercises

Seven problems mixing computation and theory. Short/numerical exercises (1–4) have inline collapsible solutions; long/proof exercises (5–7) have full worked solutions in §1.9.

Exercise 1.1 (computation)

Compute the matrix exponential of A=(0110)\statemat = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} by hand, term-by-term, and verify it equals the rotation matrix R(t)=(costsintsintcost)R(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.

Solution

Compute powers: A2=(1001)=I\statemat^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I, so A3=A\statemat^3 = -\statemat, A4=I\statemat^4 = I, and the pattern repeats with period 4. Splitting the series by parity:

eAt=k(At)kk!=Ik(1)kt2k(2k)!+Ak(1)kt2k+1(2k+1)!=Icost+Asint,e^{\statemat t} = \sum_{k} \frac{(\statemat t)^k}{k!} = I \sum_{k} \frac{(-1)^k t^{2k}}{(2k)!} + \statemat \sum_k \frac{(-1)^k t^{2k+1}}{(2k+1)!} = I \cos t + \statemat \sin t,

which expands to (costsintsintcost)=R(t)\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} = R(t). ∎

Exercise 1.2 (computation)

For the damped oscillator q¨+cq˙+kq=0\ddot q + c\dot q + k q = 0 with k=1k = 1 and c=0.5c = 0.5, classify the damping regime and find the eigenvalues of A\statemat.

Solution

Discriminant: c24k=0.254=3.75<0c^2 - 4k = 0.25 - 4 = -3.75 < 0, so the system is underdamped. Eigenvalues: λ±=0.25±i123.750.25±0.968i\lambda_\pm = -0.25 \pm i \tfrac{1}{2}\sqrt{3.75} \approx -0.25 \pm 0.968 i. The trajectory spirals toward the origin with envelope decay rate 0.250.25 and angular frequency 0.968\approx 0.968.

Exercise 1.3 (computation)

Lift the third-order ODE q+2q+q=u(t)q''' + 2 q'' + q = u(t) into a 3-dimensional state-space system ddth=Ah+Bu\ddt \statevec = \statemat \statevec + \inputmat u. Write A\statemat and B\inputmat explicitly.

Solution

Let h=(q,q,q)\statevec = (q, q', q'')^\top. Then h˙=(q,q,q)\dot \statevec = (q', q'', q''')^\top, and from the ODE q=2qq+uq''' = -2 q'' - q + u, so

A=(010001102),B=(001).\statemat = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & -2 \end{pmatrix}, \qquad \inputmat = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

The bottom row encodes the ODE’s coefficient signature; the upper-triangular ones implement the bookkeeping qq,qqq' \mapsto q', q'' \mapsto q''. This is the companion form of the ODE.

Exercise 1.4 (computation)

For the ring of n=4n = 4 identical damped oscillators with k=1k = 1, c=0c = 0, κ=0.5\kappa = 0.5 (no damping, symmetric coupling), use the formula in §1.4 to find the eigenvalues of A\statemat analytically. Confirm they lie on the imaginary axis (since c=0c = 0) and verify by running companions/ch01/jax/coupled_oscillators.py with n=4n = 4.

Solution

With c=0c = 0, λ±(m)=±ik+2κ(1cos(2πm/n))\lambda_\pm^{(m)} = \pm i\sqrt{k + 2\kappa(1 - \cos(2\pi m/n))} for m=0,1,2,3m = 0, 1, 2, 3. Numerically:

  • m=0m = 0: λ±=±i1=±i\lambda_\pm = \pm i\sqrt{1} = \pm i
  • m=1m = 1: λ±=±i1+20.51=±i2\lambda_\pm = \pm i\sqrt{1 + 2 \cdot 0.5 \cdot 1} = \pm i\sqrt{2}
  • m=2m = 2: λ±=±i1+20.52=±i3\lambda_\pm = \pm i\sqrt{1 + 2 \cdot 0.5 \cdot 2} = \pm i\sqrt{3}
  • m=3m = 3: same as m=1m = 1 by symmetry ⇒ ±i2\pm i\sqrt{2}

All 8 eigenvalues are pure imaginary (centers), as expected for an undamped conservative system. The numerical verification reproduces these (modulo double-precision noise).

Exercise 1.5 (theory) — solution in §1.9

Prove that the matrix-exponential series kMk/k!\sum_k M^k/k! converges absolutely (entrywise) for every square matrix MM, with bound eMFeMF\norm{e^M}_F \le e^{\norm{M}_F} where F\norm{\cdot}_F is the Frobenius norm.

Exercise 1.6 (theory) — solution in §1.9

Show that if A\statemat is skew-symmetric (A=A\statemat^\top = -\statemat), then eAte^{\statemat t} is orthogonal for every tRt \in \R (that is, (eAt)eAt=I(e^{\statemat t})^\top e^{\statemat t} = I). Use this to give a geometric reason why the rotation matrix of Exercise 1.1 is orthogonal.

Exercise 1.7 (theory) — solution in §1.9

Prove the variation of parameters formula: the unique solution of ddth=Ah+Bu(t)\ddt \statevec = \statemat \statevec + \inputmat u(t) with h(0)=h0\statevec(0) = \statevec_0 is

h(t)=eAth0+0teA(ts)Bu(s)ds.\statevec(t) = e^{\statemat t} \statevec_0 + \int_0^t e^{\statemat (t-s)} \inputmat u(s) \, ds.

1.9 Full solutions to theory exercises

Solution to Exercise 1.5

Let F\norm{\cdot}_F denote the Frobenius norm, which satisfies the sub-multiplicative property ABFAFBF\norm{AB}_F \le \norm{A}_F \norm{B}_F. Then MkFMFk\norm{M^k}_F \le \norm{M}_F^k by induction. Therefore

k=0KMkk!Fk=0KMkFk!k=0KMFkk!eMFas K.\left\| \sum_{k=0}^{K} \frac{M^k}{k!} \right\|_F \le \sum_{k=0}^K \frac{\norm{M^k}_F}{k!} \le \sum_{k=0}^K \frac{\norm{M}_F^k}{k!} \to e^{\norm{M}_F} \quad \text{as } K \to \infty.

The partial sums are entrywise bounded by terms of a convergent scalar series, so each matrix entry is Cauchy in KK and converges. The limit is what we call eMe^M. The bound eMFeMF\norm{e^M}_F \le e^{\norm{M}_F} follows by passing to the limit. ∎

Solution to Exercise 1.6

We have ddteAt=AeAt\ddt e^{\statemat t} = \statemat e^{\statemat t} and ddt(eAt)=(eAt)A\ddt (e^{\statemat t})^\top = (e^{\statemat t})^\top \statemat^\top. Let Q(t):=(eAt)eAtQ(t) := (e^{\statemat t})^\top e^{\statemat t}. Then

ddtQ=(eAt)AeAt+(eAt)AeAt=(eAt)(A+A)eAt=0,\ddt Q = (e^{\statemat t})^\top \statemat^\top e^{\statemat t} + (e^{\statemat t})^\top \statemat e^{\statemat t} = (e^{\statemat t})^\top (\statemat^\top + \statemat) e^{\statemat t} = 0,

since A+A=0\statemat^\top + \statemat = 0 by skew-symmetry. So Q(t)=Q(0)=IQ(t) = Q(0) = I for all tt, i.e. eAte^{\statemat t} is orthogonal.

Geometric interpretation: the rotation matrix R(t)R(t) of Exercise 1.1 was generated by A=(0110)\statemat = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, which is skew-symmetric. So R(t)R(t) is orthogonal — which we already knew, but this derivation pins down why: orthogonality follows structurally from skew-symmetry of the generator, not from any property of the cos/sin functions. ∎

Solution to Exercise 1.7

Define h(t)\statevec(t) by the formula on the right-hand side. We verify (a) the initial condition and (b) the ODE.

Initial condition. At t=0t = 0 the integral vanishes (zero-length interval) and eA0=Ie^{\statemat \cdot 0} = I, so h(0)=h0\statevec(0) = \statevec_0. ✓

ODE. Differentiate using the product rule on the integral (Leibniz’s rule for differentiating under the integral sign):

ddth(t)=ddt[eAth0]+ddt0teA(ts)Bu(s)ds=AeAth0+[eA(tt)Bu(t)+0tteA(ts)Bu(s)ds]=AeAth0+Bu(t)+A0teA(ts)Bu(s)ds=A(eAth0+0teA(ts)Bu(s)ds)+Bu(t)=Ah(t)+Bu(t).\begin{aligned} \ddt \statevec(t) &= \ddt\left[ e^{\statemat t} \statevec_0 \right] + \ddt \int_0^t e^{\statemat (t-s)} \inputmat u(s) ds \\ &= \statemat e^{\statemat t} \statevec_0 + \left[ e^{\statemat(t-t)} \inputmat u(t) + \int_0^t \frac{\partial}{\partial t} e^{\statemat(t-s)} \inputmat u(s) ds \right] \\ &= \statemat e^{\statemat t} \statevec_0 + \inputmat u(t) + \statemat \int_0^t e^{\statemat(t-s)} \inputmat u(s) ds \\ &= \statemat \left( e^{\statemat t} \statevec_0 + \int_0^t e^{\statemat(t-s)} \inputmat u(s) ds \right) + \inputmat u(t) \\ &= \statemat \statevec(t) + \inputmat u(t). \quad \checkmark \end{aligned}

Uniqueness follows from the classical Picard–Lindelöf theorem applied to the right-hand side f(h,t)=Ah+Bu(t)f(\statevec, t) = \statemat \statevec + \inputmat u(t), which is Lipschitz in h\statevec uniformly in tt with constant A\norm{\statemat} (see Hairer–Nørsett–Wanner Hairer et al. (1993) for the standard proof). ∎

1.10 Companion code

Three runnable companions in companions/ch01/jax/:

  • damped_oscillator.py — simulates the underdamped oscillator, plots energy decay (Figure 1.1)
  • coupled_oscillators.py — constructs the ring-Laplacian state matrix, plots eigenvalue spectrum (Figure 1.2)
  • matrix_exponential.py — compares scipy.linalg.expm against truncated series sums; the truncated series catastrophically diverges for matrices with large spectral radius (auxiliary, used by Exercise 1.5)

All companions import their plotting style from companions/_shared/plot_utils.py. To run from the repo root:

PYTHONPATH=. python companions/ch01/jax/damped_oscillator.py
PYTHONPATH=. python companions/ch01/jax/coupled_oscillators.py
PYTHONPATH=. python companions/ch01/jax/matrix_exponential.py

Figures are written to public/figures/ch01/ and referenced from the <Figure> components above.