Short Definition
Expanding window sampling trains models on an ever-growing set of historical data.
Definition
Expanding window sampling is a time-aware data selection strategy in which the training set begins with an initial historical window and progressively incorporates new data over time without discarding older observations. Evaluation is performed on future data following each training window expansion.
This approach emphasizes long-term accumulation of knowledge.
Why It Matters
When historical data remains relevant, discarding older observations can waste valuable information. Expanding window sampling preserves long-term patterns while still adapting to new data, making it suitable for relatively stable or slowly drifting environments.
It balances stability with gradual adaptation.
How Expanding Window Sampling Works
A typical workflow:
- Select an initial training period
- Train the model on all data up to time T
- Evaluate on the next future interval
- Extend the training window to include newly observed data
- Repeat across time
The training set only grows.
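The steps above can be sketched as a small generator over a time-ordered dataset; `initial`, `horizon`, and `n` are illustrative parameter names, not from the original text.

```python
# Sketch of the workflow above: yield expanding train/test index
# ranges over n time-ordered observations.

def expanding_window_splits(n, initial, horizon):
    """Yield (train_indices, test_indices) pairs; the training
    range always starts at 0 and grows by `horizon` each step."""
    end = initial
    while end + horizon <= n:
        train = range(0, end)              # all data up to time T
        test = range(end, end + horizon)   # the next future interval
        yield train, test
        end += horizon                     # extend the training window

splits = list(expanding_window_splits(n=10, initial=4, horizon=2))
# Every successive training range begins at index 0 and only grows.
```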
Minimal Conceptual Example
```python
# conceptual expanding window: train on everything up to time T,
# evaluate on the next interval of length `horizon`
train = data[data.time <= T]
test = data[(data.time > T) & (data.time <= T + horizon)]
```
Expanding Window vs Rolling Window
- Expanding window: accumulates all past data
- Rolling window: retains only the most recent data
Expanding windows favor stability; rolling windows favor recency.
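The difference between the two selection rules can be shown on a time-ordered list; `T` (the cutoff) and `window` (the rolling size) are illustrative names.

```python
# Expanding vs rolling selection on a time-ordered sequence.

def expanding_train(data, T):
    """Expanding window: keep everything up to the cutoff T."""
    return data[:T]

def rolling_train(data, T, window):
    """Rolling window: keep only the most recent `window` points."""
    return data[max(0, T - window):T]

data = list(range(12))                # stand-in for time-ordered observations
print(expanding_train(data, 8))       # [0, 1, 2, 3, 4, 5, 6, 7]
print(rolling_train(data, 8, 3))      # [5, 6, 7]
```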
Choosing Expanding Windows
Expanding window sampling is appropriate when:
- concept drift is weak or gradual
- long-term historical patterns remain predictive
- data volume is manageable
- interpretability and stability are priorities
It may underperform when drift is rapid.
Common Pitfalls
- allowing stale data to dominate learning
- ignoring changing feature semantics over time
- leaking future information via preprocessing
- failing to re-evaluate window strategy as conditions change
- assuming more data always improves performance
Accumulation is not adaptation.
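One of the pitfalls above, leaking future information via preprocessing, can be made concrete. The sketch below fits standardization statistics on the training window only and then applies them to the test window; computing the mean over all data would leak future information. The function name is illustrative.

```python
# Fit preprocessing statistics on the training window only,
# then apply them unchanged to the test window.

def standardize_split(train, test):
    """Scale both windows using statistics from the training window."""
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    std = var ** 0.5 or 1.0          # guard against a constant window
    scale = lambda xs: [(x - mean) / std for x in xs]
    return scale(train), scale(test)

train, test = [1.0, 2.0, 3.0, 4.0], [10.0, 11.0]
train_scaled, test_scaled = standardize_split(train, test)
# test_scaled is far from zero: the training statistics do not
# "know" about the future values, which is exactly the point.
```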
Relationship to Time-Series Validation
Expanding window sampling defines how training data grows across time-series validation folds. It is commonly used in walk-forward validation setups.
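As a concrete instance, scikit-learn's `TimeSeriesSplit` produces walk-forward folds whose training sets expand by default: each fold's training indices contain all earlier observations. A minimal sketch with a toy array:

```python
# Walk-forward validation folds with expanding training windows.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # time-ordered toy features
tscv = TimeSeriesSplit(n_splits=3)

for train_idx, test_idx in tscv.split(X):
    # every training fold starts at index 0 and only grows
    print(train_idx, test_idx)
```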
Relationship to Rolling Retraining
Expanding window sampling corresponds to a cumulative form of rolling retraining, where each retraining step incorporates all previously seen data rather than only a recent slice.
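A cumulative retraining loop can be sketched with a deliberately trivial "model" (a trailing mean, chosen only for illustration) that is refit on all data seen so far and scored on the next interval:

```python
# Cumulative retraining sketch: refit on everything seen so far,
# score on the next interval, then absorb that interval and repeat.

def walk_forward(series, initial, horizon):
    """Return the mean absolute error on each future interval."""
    errors = []
    end = initial
    while end + horizon <= len(series):
        train = series[:end]                  # all previously seen data
        prediction = sum(train) / len(train)  # "retrain" the toy model
        future = series[end:end + horizon]
        errors.append(sum(abs(y - prediction) for y in future) / horizon)
        end += horizon                        # absorb the new interval
    return errors

errs = walk_forward([1, 2, 3, 4, 5, 6, 7, 8], initial=4, horizon=2)
```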
Relationship to Generalization
Expanding window sampling estimates generalization under the assumption that future data resembles a mixture of past distributions. It may obscure sudden shifts or regime changes.