Short Definition
Feature availability describes which input features are accessible at the moment a prediction is made.
Definition
Feature availability refers to the set of features that are legitimately known and computable at prediction time. It accounts for temporal constraints, system latency, data dependencies, and operational realities that determine whether a feature can be used without leaking future information.
A feature that exists in the dataset is not necessarily available at inference time.
Why It Matters
Using unavailable features leads to data leakage, inflated offline performance, and brittle models that fail in production. Many real-world failures occur because features were engineered or selected without respecting when and how information becomes available.
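Respecting feature availability enforces causal validity: a prediction may depend only on information that existed, and was accessible, when the prediction was made.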
What Determines Feature Availability
Availability is governed by:
- event time vs processing time
- label latency and verification delays
- upstream system dependencies
- aggregation windows and update frequency
- real-time vs batch computation constraints
- access permissions and privacy rules
Availability is a systems property, not just a data property.
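As a minimal sketch of how event time, processing delay, and update frequency combine, the Python below computes the earliest moment a feature value derived from an event can be read. The function names, the fixed processing delay, and the batch schedule anchored to an arbitrary epoch are illustrative assumptions, not a standard API.

```python
from datetime import datetime, timedelta

# Hypothetical helpers; the batch schedule anchored at EPOCH is an assumption.
EPOCH = datetime(2024, 1, 1)


def available_time(event_time, processing_delay, batch_interval=None):
    """Earliest time a feature value derived from `event_time` can be read."""
    ready = event_time + processing_delay  # processing time, not event time
    if batch_interval is None:
        return ready  # streaming: readable as soon as processing finishes
    # Batch: readable only at the first scheduled run at or after `ready`.
    runs = -(-(ready - EPOCH) // batch_interval)  # ceiling division of timedeltas
    return EPOCH + runs * batch_interval


def is_available(event_time, prediction_time, processing_delay, batch_interval=None):
    """True if the value derived from `event_time` is readable at `prediction_time`."""
    return available_time(event_time, processing_delay, batch_interval) <= prediction_time


event = datetime(2024, 1, 5, 10)
prediction = datetime(2024, 1, 5, 18)
print(is_available(event, prediction, timedelta(hours=2), timedelta(days=1)))  # False
print(is_available(event, prediction, timedelta(hours=2)))                      # True
```

With a two-hour processing delay and a daily batch run at midnight, an event at 10:00 on 5 January is not readable until midnight on 6 January, so a prediction at 18:00 the same day cannot use it; a streaming pipeline with the same delay could.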
Feature Availability vs Feature Existence
- Feature existence: present in historical data
- Feature availability: accessible at prediction time
Conflating the two introduces leakage.
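To make the distinction concrete, here is a hypothetical historical row in which each feature value carries the time it was recorded. Both features exist in the row, but only one was recorded before the prediction time. The row layout, field names, and values are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical historical row: each value is stored with the time it was
# recorded in the source system, not the time of the event it describes.
row = {
    "prediction_time": datetime(2024, 3, 1, 9, 0),
    "features": {
        "account_age_days": (412, datetime(2024, 3, 1, 0, 0)),
        "chargeback_flag": (1, datetime(2024, 3, 20, 0, 0)),  # verified weeks later
    },
}


def existing_features(row):
    """Features present in the historical record (existence)."""
    return set(row["features"])


def available_features(row):
    """Features whose value was recorded by prediction time (availability)."""
    cutoff = row["prediction_time"]
    return {
        name
        for name, (_value, recorded_at) in row["features"].items()
        if recorded_at <= cutoff
    }


print(sorted(existing_features(row)))   # ['account_age_days', 'chargeback_flag']
print(sorted(available_features(row)))  # ['account_age_days']
```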
Minimal Conceptual Example
# invalid (unavailable at prediction time)
feature = average(spend_over_next_30_days)

# valid (available at prediction time)
feature = average(spend_up_to_now)
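A runnable version of the conceptual example, assuming a hypothetical transaction log of (date, amount) pairs and reading "average spend" as the mean transaction amount. Because only dates are recorded, the valid feature conservatively excludes the cutoff day itself.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical transaction log of (date, amount) pairs.
transactions = [
    (date(2024, 1, 10), 40.0),
    (date(2024, 1, 25), 60.0),
    (date(2024, 2, 5), 500.0),
    (date(2024, 2, 20), 300.0),
]


def spend_over_next_30_days(txns, cutoff):
    # INVALID: averages transactions in [cutoff, cutoff + 30 days),
    # none of which have happened when the prediction is made.
    return mean(amount for day, amount in txns if cutoff <= day < cutoff + timedelta(days=30))


def spend_up_to_now(txns, cutoff):
    # VALID: averages only transactions strictly before the cutoff date.
    return mean(amount for day, amount in txns if day < cutoff)


cutoff = date(2024, 2, 1)
print(spend_over_next_30_days(transactions, cutoff))  # 400.0
print(spend_up_to_now(transactions, cutoff))          # 50.0
```

Offline, both functions run without error on the same historical table; only the second can be computed at serving time.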
Feature Availability in Training and Evaluation
Training and evaluation must:
- restrict features to those available at prediction time
- compute features using the same availability rules
- exclude backfilled or future-derived values
- align feature pipelines across offline and online settings
Any mismatch between offline and online availability rules invalidates the evaluation.
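One common way to apply availability rules offline is a point-in-time (as-of) lookup: for each training example, take the latest feature value that was already stored at prediction time. The sketch below assumes each record has both an event time and an ingestion time (hypothetical field names) and that ingestion time is a fair proxy for when serving could read the value. It shows how a backfilled record leaks when the lookup is keyed on event time instead.

```python
from datetime import datetime

# Hypothetical feature history for one entity. `event_time` is the moment the
# value describes; `ingested_at` is when it was written to the feature store.
history = [
    {"event_time": datetime(2024, 1, 1), "ingested_at": datetime(2024, 1, 1), "value": 10},
    {"event_time": datetime(2024, 1, 8), "ingested_at": datetime(2024, 1, 8), "value": 12},
    # Backfilled correction: describes Jan 9 but was only written on Jan 20.
    {"event_time": datetime(2024, 1, 9), "ingested_at": datetime(2024, 1, 20), "value": 99},
]


def as_of(history, prediction_time, time_field):
    """Latest value whose `time_field` timestamp is at or before `prediction_time`."""
    eligible = [r for r in history if r[time_field] <= prediction_time]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r[time_field])["value"]


prediction_time = datetime(2024, 1, 10)
print(as_of(history, prediction_time, "event_time"))   # 99 (leaks the backfill)
print(as_of(history, prediction_time, "ingested_at"))  # 12 (what serving would have seen)
```

Keyed on `event_time`, the 10 January example sees the value 99, which was not written until 20 January; keyed on `ingested_at`, it sees 12.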
Relationship to Temporal Leakage
Violating feature availability is a primary cause of:
- temporal feature leakage
- processing-time leakage
- validation leakage in time-dependent tasks
Availability errors are often subtle and pervasive.
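A simplified way to tell the first two failure modes apart, assuming each feature value records both the time of the event it describes and the time it finished processing: a value describing an event after the prediction is temporal feature leakage, while a value describing an earlier event that was processed only afterwards is processing-time leakage. The function and its labels are hypothetical, not a standard taxonomy.

```python
from datetime import datetime


def classify_leakage(prediction_time, event_time, processed_at):
    """Simplified, hypothetical classification of one feature value."""
    if event_time > prediction_time:
        return "temporal feature leakage"  # the value describes the future
    if processed_at > prediction_time:
        return "processing-time leakage"   # past event, not yet processed
    return None                            # available at prediction time


p = datetime(2024, 5, 1)
print(classify_leakage(p, datetime(2024, 5, 3), datetime(2024, 5, 3)))    # temporal feature leakage
print(classify_leakage(p, datetime(2024, 4, 30), datetime(2024, 5, 2)))   # processing-time leakage
print(classify_leakage(p, datetime(2024, 4, 29), datetime(2024, 4, 30)))  # None
```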
Feature Availability and Model Design
Feature availability can influence:
- model complexity and latency
- achievable performance ceilings
- retraining cadence
- monitoring and alerting design
Sometimes the “best” feature cannot be used safely.
Common Pitfalls
- selecting features based on offline correlation alone
- using aggregates whose windows extend beyond the prediction cutoff
- assuming real-time access to batch-computed features
- training with backfilled data without adjusting timelines
- failing to document feature availability assumptions
Availability must be explicit and documented.
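Documentation can also be made checkable. The sketch below records availability assumptions in a hypothetical `FeatureSpec` dataclass and flags two of the pitfalls above: aggregation windows that end after the cutoff, and batch-computed features documented as if they were always fresh. The fields and rules are assumptions; a real feature catalog would carry more.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical, explicit record of one feature's availability assumptions."""
    name: str
    mode: str                 # "batch" or "realtime"
    window_end: timedelta     # window end relative to the cutoff (positive = after it)
    max_staleness: timedelta  # how old the served value may be at prediction time


def check_spec(spec):
    """Return readable problems with a spec (an empty list if none are found)."""
    problems = []
    if spec.window_end > timedelta(0):
        problems.append(f"{spec.name}: aggregation window extends past the prediction cutoff")
    if spec.mode == "batch" and spec.max_staleness == timedelta(0):
        problems.append(f"{spec.name}: batch feature documented with zero staleness")
    if spec.mode not in ("batch", "realtime"):
        problems.append(f"{spec.name}: unknown computation mode {spec.mode!r}")
    return problems


specs = [
    FeatureSpec("spend_30d", "batch", timedelta(0), timedelta(days=1)),
    FeatureSpec("spend_next_30d", "batch", timedelta(days=30), timedelta(days=1)),
    FeatureSpec("session_clicks", "batch", timedelta(0), timedelta(0)),
]

for spec in specs:
    for problem in check_spec(spec):
        print(problem)
```

Running such a check in CI keeps the documented assumptions and the features actually used from drifting apart.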
Relationship to Generalization
Models that rely on unavailable features appear to generalize well offline but collapse when deployed. Respecting feature availability yields more conservative but trustworthy generalization estimates.
Relationship to Evaluation Protocols
Evaluation protocols must enforce feature availability rules. Allowing unavailable features during validation or testing constitutes evaluation leakage.
Related Concepts
- Data & Distribution
- Temporal Feature Leakage
- Processing-Time Leakage
- Event-Time Sampling
- Label Latency
- Time-Aware Sampling
- Evaluation Protocols