State-Space Models vs RNNs

Short Definition

State-Space Models (SSMs) and Recurrent Neural Networks (RNNs) are both architectures designed to process sequential data, but they differ in how they represent and propagate temporal information.

RNNs rely on nonlinear recurrent hidden states updated step-by-step, while modern neural state-space models use structured linear dynamical systems combined with neural parameterization to capture long-range dependencies more efficiently.

Definition

Sequential models aim to learn patterns over ordered data such as:

  • language
  • time series
  • audio
  • control signals

Two major families are:

Recurrent Neural Networks

[
h_t = f(W_h h_{t-1} + W_x x_t)
]

where:

  • (h_t) = hidden state
  • (x_t) = input
  • (f) = nonlinear activation

RNNs propagate information through a nonlinear recurrence.
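The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable model; the dimensions, the random weights, and the choice of tanh as the nonlinearity (f) are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, chosen for the sketch (not from the text)
d_h, d_x, T = 4, 3, 5

W_h = rng.normal(scale=0.5, size=(d_h, d_h))  # recurrent weights
W_x = rng.normal(scale=0.5, size=(d_h, d_x))  # input weights
xs = rng.normal(size=(T, d_x))                # input sequence x_1 .. x_T

h = np.zeros(d_h)
for x_t in xs:
    # h_t = f(W_h h_{t-1} + W_x x_t), with f = tanh
    h = np.tanh(W_h @ h + W_x @ x_t)

print(h.shape)  # (4,)
```

Note that each step depends on the previous hidden state, so the loop cannot be parallelized across time.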

State-Space Models

State-space models originate from control theory and dynamical systems:

[
h_t = A h_{t-1} + B x_t
]

[
y_t = C h_t
]

Where:

  • (h_t) = latent state
  • (A) = transition matrix
  • (B) = input projection
  • (C) = output projection

Modern neural SSMs extend this with learned parameters and structured kernels.
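The linear recurrence and output map can be sketched the same way. Again the dimensions and the toy choice of a stable diagonal (A) are assumptions for illustration; real neural SSMs learn structured parameterizations of these matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, d_y, T = 4, 3, 2, 5

A = 0.9 * np.eye(d_h)             # transition matrix (toy stable choice)
B = rng.normal(size=(d_h, d_x))   # input projection
C = rng.normal(size=(d_y, d_h))   # output projection
xs = rng.normal(size=(T, d_x))

h = np.zeros(d_h)
ys = []
for x_t in xs:
    h = A @ h + B @ x_t           # h_t = A h_{t-1} + B x_t
    ys.append(C @ h)              # y_t = C h_t

ys = np.stack(ys)
print(ys.shape)  # (5, 2)
```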

Core Conceptual Difference

RNNs:

  • Nonlinear recurrence
  • Sequential computation
  • Hidden state learned implicitly

SSMs:

  • Linear dynamical system core
  • Structured temporal kernels
  • Parallelizable computation

The key difference is how sequence memory is represented and computed.

Minimal Conceptual Illustration

RNN

x1   x2   x3   x4
 ↓    ↓    ↓    ↓
h1 → h2 → h3 → h4   (recurrence)

Information flows step-by-step.

State-Space Model

x → linear dynamical system → outputs

Temporal influence encoded in system dynamics rather than nonlinear recurrence.
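Because the dynamics are linear, the SSM output can be unrolled into a convolution with kernel entries (C A^k B), which is what makes parallel computation possible. The sketch below (a single-input, single-output toy system with assumed sizes) checks that the recurrent and convolutional views agree:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, T = 4, 6

A = 0.8 * np.eye(d_h) + 0.05 * rng.normal(size=(d_h, d_h))
B = rng.normal(size=(d_h, 1))
C = rng.normal(size=(1, d_h))
xs = rng.normal(size=T)

# Recurrent view: step through time
h = np.zeros(d_h)
y_rec = []
for x_t in xs:
    h = A @ h + (B * x_t).ravel()
    y_rec.append(float(C @ h))

# Convolutional view: y_t = sum_k (C A^k B) x_{t-k}
K = np.array([float(C @ np.linalg.matrix_power(A, k) @ B) for k in range(T)])
y_conv = [float(np.dot(K[:t + 1], xs[t::-1])) for t in range(T)]

print(np.allclose(y_rec, y_conv))  # True
```

The kernel K can be precomputed once, after which all outputs follow from a single convolution over the input, with no sequential dependency.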

Computational Differences

| Property              | RNNs          | State-Space Models |
|-----------------------|---------------|--------------------|
| Sequential dependency | High          | Lower              |
| Parallelization       | Limited       | Often possible     |
| Long-range memory     | Difficult     | Designed for it    |
| Stability analysis    | Hard          | Well studied       |
| Theoretical grounding | Deep learning | Control theory     |

Long-Range Dependency Handling

RNNs struggle with long contexts due to:

  • vanishing gradients
  • exploding gradients

SSMs instead propagate temporal information through structured linear dynamics. Ignoring inputs, the state evolves as:

[
h_t = A^t h_0
]

This enables long-range signal propagation without repeated nonlinear transformations.

Recent models such as S4 exploit this property.
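The equation above shows why memory is governed by the eigenvalues of (A): eigenvalues near 1 preserve the initial state over long horizons, while smaller eigenvalues forget it quickly. The toy diagonal matrices below are assumptions chosen to make this visible.

```python
import numpy as np

h0 = np.ones(3)

A_slow = 0.99 * np.eye(3)  # eigenvalues just below 1: slow decay, long memory
A_fast = 0.50 * np.eye(3)  # eigenvalues well below 1: rapid forgetting

for t in (1, 10, 100):
    slow = np.linalg.norm(np.linalg.matrix_power(A_slow, t) @ h0)
    fast = np.linalg.norm(np.linalg.matrix_power(A_fast, t) @ h0)
    print(f"t={t:3d}  slow-decay norm={slow:.4f}  fast-decay norm={fast:.2e}")
```

After 100 steps the slow-decay system still retains a sizeable fraction of the initial state, while the fast-decay system has essentially forgotten it; structured SSMs like S4 parameterize (A) to sit in the long-memory regime.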

Modern Neural State-Space Models

Recent architectures include:

  • S4 (Structured State Space sequence model)
  • Mamba
  • S5

These models combine:

  • linear dynamical systems
  • efficient convolution kernels
  • neural parameterization

They aim to compete with Transformers for sequence modeling.

Comparison to Transformers

| Architecture | Key mechanism           |
|--------------|-------------------------|
| RNN          | recurrent hidden state  |
| Transformer  | attention               |
| SSM          | linear dynamical system |

SSMs provide:

  • linear-time sequence processing
  • long-context capability
  • memory-efficient inference

Strengths of RNNs

RNNs are:

  • simple
  • expressive nonlinear models
  • well-understood in many domains

Gated variants such as LSTM and GRU improve training stability.

Strengths of State-Space Models

SSMs provide:

  • strong theoretical foundation
  • efficient long-range modeling
  • structured temporal memory
  • stable gradient propagation

They also enable efficient sequence processing for extremely long contexts.

Limitations

RNN limitations

  • gradient instability
  • limited parallelism
  • weak long-range memory

SSM limitations

  • harder architecture design
  • complex parameterization
  • fewer mature training recipes

However, modern research is rapidly improving them.

Alignment Perspective

Sequence architectures influence:

  • interpretability
  • training stability
  • long-context reasoning

Architectures with stable dynamics may reduce training instability and make temporal behavior easier to interpret.

Summary

State-Space Models and RNNs are two approaches to sequence modeling.

RNNs rely on nonlinear recurrent hidden states updated sequentially.

State-space models use structured dynamical systems to propagate information through time.

Modern neural SSMs combine control theory with deep learning to achieve scalable long-context modeling.

Related Concepts