Feature Versioning

Short Definition

Feature versioning tracks and manages changes to feature definitions over time.

Definition

Feature versioning is the practice of assigning explicit versions to machine learning features so that changes in definition, computation logic, data sources, or aggregation windows are recorded and reproducible. Each version represents a stable contract for how a feature is computed and served.

Feature versioning turns features into auditable, reproducible artifacts.

Why It Matters

Unversioned features change silently. When feature logic evolves without tracking, models become irreproducible, evaluations become incomparable, and production behavior becomes unpredictable.

Feature versioning prevents:

  • training–serving skew
  • irreproducible experiments
  • silent performance regressions
  • ambiguous model comparisons

It is essential for trustworthy ML systems.

What Triggers a New Feature Version

A new version is typically required when:

  • computation logic changes
  • aggregation windows or time cutoffs change
  • upstream data sources change
  • feature availability constraints change
  • bug fixes alter numerical values
  • schema or units change

Small changes can have large downstream effects.

Feature Versioning vs Model Versioning

  • Feature versioning: tracks inputs to models
  • Model versioning: tracks model parameters and architecture

Both are required for full reproducibility.

How Feature Versioning Is Used

Common workflows include:

  1. Define a feature with versioned metadata
  2. Train models against a fixed feature version
  3. Evaluate and deploy using the same version
  4. Introduce new versions for improvements
  5. Compare models across feature versions explicitly

Versions make comparisons meaningful.

Minimal Conceptual Example

# conceptual feature reference
feature = "user_avg_spend:v3"

Relationship to Feature Stores

Feature stores often implement feature versioning by:

  • maintaining immutable feature definitions
  • tracking lineage and dependencies
  • enforcing point-in-time correctness per version
  • enabling safe rollbacks

Feature stores operationalize versioning at scale.

Relationship to Feature Availability

Changing a feature’s availability (e.g., real-time vs batch) requires a new version. Versioning ensures models do not unknowingly depend on unavailable or future-derived features.

Common Pitfalls

  • overwriting feature logic without version increments
  • comparing models trained on different feature versions
  • failing to document semantic changes
  • mixing feature versions within a single model
  • assuming backward compatibility without validation

Version numbers are meaningless without discipline.

Relationship to Reproducibility

Feature versioning is a prerequisite for reproducibility. Without it, results cannot be reliably recreated—even with the same code and data snapshots.

Relationship to Generalization

Untracked feature changes can masquerade as generalization improvements or regressions. Versioning isolates the effect of feature evolution from true model learning.

Related Concepts

  • Data & Distribution
  • Feature Stores
  • Feature Availability
  • Causal Feature Engineering
  • Training–Serving Skew
  • Reproducibility in ML
  • Evaluation Protocols