Feature Availability

Short Definition

Feature availability describes which input features are accessible at the moment a prediction is made.

Definition

Feature availability refers to the set of features that are legitimately known and computable at prediction time. It accounts for temporal constraints, system latency, data dependencies, and operational realities that determine whether a feature can be used without leaking future information.

A feature that exists in the dataset is not necessarily available at inference time.

Why It Matters

Using unavailable features leads to data leakage, inflated offline performance, and brittle models that fail in production. Many real-world failures occur because features were engineered or selected without respecting when and how information becomes available.

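Respecting feature availability enforces causal validity: a prediction may depend only on information that existed before the outcome it predicts.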

What Determines Feature Availability

Availability is governed by:

  • event time vs processing time
  • label latency and verification delays
  • upstream system dependencies
  • aggregation windows and update frequency
  • real-time vs batch computation constraints
  • access permissions and privacy rules

Availability is a systems property, not just a data property.
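The gap between event time and processing time can be illustrated with a small sketch. It assumes a hypothetical record schema with two timestamps (`event_time` and `processed_time`); the names `Record` and `available_records` are made up for illustration and are not taken from any particular feature store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """One observed value with both timestamps (hypothetical schema)."""
    value: float
    event_time: datetime      # when the event happened
    processed_time: datetime  # when the value became readable by the serving system


def available_records(records, prediction_time):
    """Keep only records the serving system could read at prediction_time."""
    return [r for r in records if r.processed_time <= prediction_time]


t0 = datetime(2024, 1, 10, 12, 0)
records = [
    Record(20.0, t0 - timedelta(hours=30), t0 - timedelta(hours=6)),
    # Happened one hour before t0, but a batch job loads it 11 hours after t0.
    Record(35.0, t0 - timedelta(hours=1), t0 + timedelta(hours=11)),
]

visible = available_records(records, t0)             # filters on processing time
naive = [r for r in records if r.event_time <= t0]   # filters on event time only
```

Filtering on event time alone keeps both records, although only the first could have been read at `t0`. The second record exists in the historical data but was not available.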

Feature Availability vs Feature Existence

  • Feature existence: present in historical data
  • Feature availability: accessible at prediction time

Conflating the two introduces leakage.

Minimal Conceptual Example

# Invalid: the window extends past the prediction time, so the value is not yet known
feature = average(spend_over_next_30_days)

# Valid: the window ends at the prediction time
feature = average(spend_up_to_now)
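
The conceptual example above can be made concrete with a few lines of Python. This is a minimal sketch on invented data: the `spend` rows, the `mean_spend` helper, and the chosen prediction day are all assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical spend history for one customer: (day, amount).
spend = [
    (date(2024, 3, 1), 10.0),
    (date(2024, 3, 5), 30.0),
    (date(2024, 3, 20), 50.0),
    (date(2024, 4, 2), 90.0),
]


def mean_spend(rows, start, end):
    """Average amount over the half-open window [start, end); None if empty."""
    amounts = [amount for day, amount in rows if start <= day < end]
    return sum(amounts) / len(amounts) if amounts else None


prediction_day = date(2024, 3, 10)

# Valid: the window ends at the prediction day.
past_avg = mean_spend(spend, date.min, prediction_day)

# Invalid: the window starts at the prediction day and looks 30 days ahead.
future_avg = mean_spend(spend, prediction_day, prediction_day + timedelta(days=30))
```

Both values can be computed from the historical table, which is exactly why the invalid one is easy to introduce by accident. Only `past_avg` could have been computed on the prediction day itself.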

Feature Availability in Training and Evaluation

Training and evaluation must:

  • restrict features to those available at prediction time
  • compute features using the same availability rules
  • exclude backfilled or future-derived values
  • align feature pipelines across offline and online settings

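A mismatch between offline and online feature computation invalidates the evaluation, because the offline metrics then describe a model that cannot actually be served.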
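
One common way to apply these rules when building a training set is a point-in-time lookup: for each training example, take the most recent feature value that had already been published at that example's prediction time. The sketch below assumes feature snapshots are stored as `(available_at, value)` pairs sorted by time; the function name `latest_available` and the data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def latest_available(snapshots, prediction_time):
    """Return the most recent value with available_at <= prediction_time.

    `snapshots` is a list of (available_at, value) pairs sorted by available_at.
    Returns None when no snapshot had been published yet.
    """
    times = [t for t, _ in snapshots]
    idx = bisect_right(times, prediction_time)
    return snapshots[idx - 1][1] if idx else None


# Hypothetical weekly snapshots of a single feature.
snapshots = [
    (datetime(2024, 1, 1), 0.10),
    (datetime(2024, 1, 8), 0.25),
    (datetime(2024, 1, 15), 0.40),
]

prediction_times = [
    datetime(2023, 12, 30),  # before any snapshot exists
    datetime(2024, 1, 8),    # exactly at a snapshot's publication time
    datetime(2024, 1, 10),   # between two snapshots
]
training_values = [latest_available(snapshots, t) for t in prediction_times]
```

Using the same lookup in both the offline pipeline and the serving path is one way to keep the two aligned. Joining on the latest snapshot regardless of time would silently hand every training example the value 0.40.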

Relationship to Temporal Leakage

Violating feature availability is a primary cause of:

  • temporal feature leakage
  • processing-time leakage
  • validation leakage in time-dependent tasks

Availability errors are often subtle and pervasive.

Feature Availability and Model Design

Feature availability can influence:

  • model complexity and latency
  • achievable performance ceilings
  • retraining cadence
  • monitoring and alerting design

Sometimes the “best” feature cannot be used safely.

Common Pitfalls

  • selecting features based on offline correlation alone
  • using aggregates that span beyond prediction cutoff
  • assuming real-time access to batch-computed features
  • training on backfilled data without recording when each value actually became available
  • failing to document feature availability assumptions

Availability must be explicit and documented.
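
One lightweight way to make availability explicit is to record each feature's assumptions in code and check them against what the model can tolerate. The sketch below is illustrative only: `FeatureSpec`, its fields, the feature names, and the lag values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FeatureSpec:
    """Documented availability assumptions for one feature (hypothetical schema)."""
    name: str
    computed: str              # "online" or "batch"
    worst_case_lag: timedelta  # longest delay between an event and a readable value


def too_stale(specs, max_lag):
    """Return names of features whose documented lag exceeds what the model tolerates."""
    return [s.name for s in specs if s.worst_case_lag > max_lag]


specs = [
    FeatureSpec("cart_size", "online", timedelta(seconds=2)),
    FeatureSpec("spend_30d_avg", "batch", timedelta(hours=24)),
    FeatureSpec("chargeback_flag", "batch", timedelta(days=45)),
]

# A model served in real time that tolerates at most one hour of staleness.
flagged = too_stale(specs, max_lag=timedelta(hours=1))
```

A check like this could run in code review or CI, so that a batch-computed feature cannot be added to a real-time model without someone acknowledging the lag.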

Relationship to Generalization

Models that rely on unavailable features appear to generalize well offline but collapse when deployed. Respecting feature availability yields more conservative but trustworthy generalization estimates.

Relationship to Evaluation Protocols

Evaluation protocols must enforce feature availability rules. Allowing unavailable features during validation or testing constitutes evaluation leakage.
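
A protocol can enforce this mechanically if every feature value carries the time at which it became available. The following sketch assumes a hypothetical row layout (a `prediction_time` plus an `available_at` mapping per row); the function name and the data are invented for illustration.

```python
from datetime import datetime


def availability_violations(rows):
    """Return (row_index, feature_name) pairs where a feature became available
    only after the row's prediction time."""
    violations = []
    for i, row in enumerate(rows):
        for name, available_at in row["available_at"].items():
            if available_at > row["prediction_time"]:
                violations.append((i, name))
    return violations


eval_rows = [
    {
        "prediction_time": datetime(2024, 5, 1, 9, 0),
        "available_at": {
            "spend_7d": datetime(2024, 5, 1, 2, 0),
            "account_age": datetime(2024, 4, 1, 0, 0),
        },
    },
    {
        "prediction_time": datetime(2024, 5, 2, 9, 0),
        "available_at": {
            "spend_7d": datetime(2024, 5, 2, 2, 0),
            # Backfilled after the fact: not known when the prediction was made.
            "refund_issued": datetime(2024, 5, 20, 0, 0),
        },
    },
]

violations = availability_violations(eval_rows)
if violations:
    print("evaluation leakage:", violations)
```

Running such a check before computing any metric turns an availability violation into a visible failure instead of an inflated score.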

Related Concepts