Expanding Window Sampling

Short Definition

Expanding window sampling trains models on an ever-growing set of historical data.

Definition

Expanding window sampling is a time-aware data selection strategy in which the training set begins with an initial historical window and progressively incorporates new data over time without discarding older observations. After each expansion, the model is evaluated on data from the interval immediately following the training window.

This approach emphasizes long-term accumulation of knowledge.

Why It Matters

When historical data remains relevant, discarding older observations can waste valuable information. Expanding window sampling preserves long-term patterns while still adapting to new data, making it suitable for relatively stable or slowly drifting environments.

It balances stability with gradual adaptation.

How Expanding Window Sampling Works

A typical workflow:

  1. Select an initial training period
  2. Train the model on all data up to time T
  3. Evaluate on the next future interval
  4. Extend the training window to include newly observed data
  5. Repeat across time

The training set only grows.

Minimal Conceptual Example

# conceptual expanding window (assumes `data` is a table with a `time` column)
train = data[data.time <= T]                                # all history up to T
test = data[(data.time > T) & (data.time <= T + horizon)]   # next interval only
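The five-step workflow above can be sketched as a full walk-forward loop. This is a minimal illustration on synthetic data; the column names (`time`, `y`) and the mean-value "model" are placeholders, not part of any particular library.

```python
import numpy as np
import pandas as pd

# Synthetic time-ordered data; `time` and `y` are placeholder names.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "time": np.arange(100),
    "y": rng.normal(size=100).cumsum(),
})

initial_end = 30   # step 1: initial training period ends here
horizon = 10       # step 3: length of each evaluation interval

errors = []
for T in range(initial_end, 100 - horizon + 1, horizon):
    train = data[data.time <= T]                               # step 2: all data up to T
    test = data[(data.time > T) & (data.time <= T + horizon)]  # step 3: next interval
    pred = train.y.mean()                                      # stand-in for a real model fit
    errors.append(np.mean(np.abs(test.y - pred)))
    # step 4 happens implicitly: the next iteration's `train` includes this `test` period

print(f"evaluated {len(errors)} expanding-window folds")
```

Note that nothing is ever dropped from `train`; each fold simply moves the cutoff `T` forward.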

Expanding Window vs Rolling Window

  • Expanding window: accumulates all past data
  • Rolling window: retains only the most recent data

Expanding windows favor stability; rolling windows favor recency.
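The contrast comes down to where the training slice starts. A minimal sketch, assuming a time-ordered sequence of observations:

```python
observations = list(range(20))  # stand-in for time-ordered data
T = 15                          # current cutoff (exclusive end of training)
window = 5                      # rolling-window length

expanding_train = observations[:T]          # everything seen so far
rolling_train = observations[T - window:T]  # only the most recent `window` points

print(len(expanding_train), len(rolling_train))  # 15 5
```

The rolling slice is always a suffix of the expanding one; the strategies differ only in how much older history is kept.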

Choosing Expanding Windows

Expanding window sampling is appropriate when:

  • concept drift is weak or gradual
  • long-term historical patterns remain predictive
  • data volume is manageable
  • interpretability and stability are priorities

It may underperform when drift is rapid.

Common Pitfalls

  • allowing stale data to dominate learning
  • ignoring changing feature semantics over time
  • leaking future information via preprocessing
  • failing to re-evaluate window strategy as conditions change
  • assuming more data always improves performance

Accumulation is not adaptation.
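The preprocessing-leakage pitfall deserves a concrete sketch: statistics such as a scaler's mean and standard deviation must be computed from the training window only, then reused on the evaluation interval. A minimal example with NumPy:

```python
import numpy as np

series = np.arange(20, dtype=float)  # toy time-ordered feature
T = 12                               # training window covers indices [0, T)

train, test = series[:T], series[T:]

# Correct: fit normalization statistics on the training window only...
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma    # ...and reuse them on future data.

# Leaky (wrong): computing series.mean() over the full array would let
# future values influence the statistics applied to the training window.
```

The same rule applies to any fitted transform (imputation, encoding, feature selection): refit it inside each expanded training window, never on the full dataset.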

Relationship to Time-Series Validation

Expanding window sampling defines how training data grows across time-series validation folds. It is commonly used in walk-forward validation setups.
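As one concrete instance, scikit-learn's TimeSeriesSplit produces expanding training windows by default (setting its max_train_size parameter instead caps the window, turning it into a rolling scheme):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)

# Default TimeSeriesSplit: each fold's training set extends the previous one.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print(f"train size {len(train_idx):2d} -> test {test_idx.tolist()}")
```

Each successive fold trains on all earlier indices and tests on the next contiguous block, which is exactly the expanding-window pattern described above.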

Relationship to Rolling Retraining

Expanding window sampling corresponds to cumulative rolling retraining, where each retraining step incorporates all previously seen data.

Relationship to Generalization

Expanding window sampling estimates generalization under the assumption that future data resembles a mixture of past distributions. It may obscure sudden shifts or regime changes.

Related Concepts