Short Definition
Metric drift occurs when the meaning, reliability, or relevance of an evaluation metric changes over time, causing the metric to no longer reflect true performance or outcomes.
Definition
Metric drift refers to the gradual or abrupt breakdown of the relationship between a metric and the underlying objective it was intended to measure. As data distributions, user behavior, system policies, or operational contexts evolve, metrics that were once valid indicators may become misleading.
Metrics age—even if models do not.
Why It Matters
Organizations often monitor model health using a fixed set of metrics. When metric drift goes unnoticed, systems may appear stable or improving while real-world performance silently degrades.
Stable metrics do not guarantee stable outcomes.
Common Causes of Metric Drift
Metric drift can be caused by:
- distribution shift in input data
- changes in user or environment behavior
- feedback loops induced by the model
- evolving business objectives
- changes in labeling practices
- policy or threshold adjustments
Metrics depend on context.
Metric Drift vs Data Drift
- Data drift: changes in input data distributions
- Metric drift: changes in what the metric signifies
Data drift can cause metric drift—but metric drift can occur without obvious data drift.
Metric Drift vs Model Drift
- Model drift refers to degradation in model performance
- Metric drift refers to degradation in metric validity
Metrics can drift even if model behavior is unchanged.
Minimal Conceptual Illustration
Same Metric Value → Different Real-World Meaning: a 5% click-through rate may reflect genuine user interest in one period and clickbait-driven engagement in another, even though the dashboard shows an identical number.
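A toy simulation makes this concrete. The agreement rates and skill level below are hypothetical; the point is that the measured (proxy) score stays flat while alignment with the true outcome decays:

```python
import random

random.seed(0)

def simulate(proxy_outcome_agreement, n=10_000):
    """Score a fixed classifier against a proxy label and the true outcome.

    The classifier's skill against the proxy is held constant at 90%;
    only the proxy's agreement with the true outcome changes.
    """
    proxy_hits = true_hits = 0
    for _ in range(n):
        true = random.random() < 0.5
        proxy = true if random.random() < proxy_outcome_agreement else not true
        pred = proxy if random.random() < 0.9 else not proxy
        proxy_hits += pred == proxy
        true_hits += pred == true
    return proxy_hits / n, true_hits / n

# Period 1: the proxy still tracks the true outcome closely
proxy_acc_1, true_acc_1 = simulate(proxy_outcome_agreement=0.95)
# Period 2: the proxy-outcome link has weakened (metric drift)
proxy_acc_2, true_acc_2 = simulate(proxy_outcome_agreement=0.65)

# The dashboard metric (proxy accuracy) barely moves between periods,
# while accuracy on the true outcome drops sharply.
```

The monitored number is stable in both periods; only a separate check against the true outcome reveals the drift.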
Proxy Metrics and Drift
Proxy metrics are especially vulnerable to drift because they rely on assumed correlations with true outcomes. Over time, these correlations may weaken or reverse.
Proxy drift is often invisible until audited.
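One way to make such drift visible is a periodic audit of the proxy-outcome correlation over time windows. A minimal stdlib-only sketch, using synthetic data in which the relationship weakens halfway through (all names and numbers are illustrative):

```python
import random
import statistics

random.seed(1)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient (stdlib only)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rolling_proxy_audit(proxy, outcome, window=100):
    """Proxy-outcome correlation per non-overlapping time window."""
    return [
        pearson(proxy[i:i + window], outcome[i:i + window])
        for i in range(0, len(proxy) - window + 1, window)
    ]

# Synthetic history: the proxy tracks the outcome tightly at first,
# then the relationship degrades (noise grows tenfold).
outcome = [random.gauss(0, 1) for _ in range(600)]
proxy = [y + random.gauss(0, 0.2 if i < 300 else 2.0)
         for i, y in enumerate(outcome)]

correlations = rolling_proxy_audit(proxy, outcome)
# Early windows show high correlation; late windows expose the decay.
```

In practice the "outcome" series is whatever delayed ground truth is available (purchases, retention, audited labels), which is why this check is usually run as a scheduled audit rather than a live dashboard.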
Detection Signals
Warning signs of metric drift include:
- divergence between related metrics
- improving offline metrics while real-world outcomes stagnate
- calibration deterioration without accuracy change
- threshold instability
- inconsistent performance across subgroups
When metrics disagree, investigate.
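The "calibration deterioration without accuracy change" signal can be checked directly. A minimal sketch comparing expected calibration error (ECE) for two hypothetical scoring periods that produce identical accuracy:

```python
def accuracy(preds, labels):
    """Fraction of threshold-0.5 decisions that match the label."""
    return sum((p >= 0.5) == bool(y) for p, y in zip(preds, labels)) / len(labels)

def expected_calibration_error(preds, labels, n_bins=10):
    """Weighted mean |confidence - observed frequency| over probability bins."""
    n, total = len(preds), 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        bucket = [(p, y) for p, y in zip(preds, labels)
                  if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if bucket:
            conf = sum(p for p, _ in bucket) / len(bucket)
            freq = sum(y for _, y in bucket) / len(bucket)
            total += len(bucket) / n * abs(conf - freq)
    return total

labels = [1] * 80 + [0] * 20          # 80% of cases are truly positive

# Period 1: well-calibrated scores (confidence matches the hit rate)
preds_1 = [0.8] * 100
# Period 2: same decisions, but overconfident scores
preds_2 = [0.99] * 100

# Accuracy is identical (0.8 in both periods), yet ECE jumps from
# 0.0 to roughly 0.19 - a drift signal that accuracy alone misses.
```

Tracking a complementary pair like (accuracy, ECE) is a concrete instance of the "divergence between related metrics" signal above.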
Relationship to Goodhart’s Law
Goodhart's Law ("when a measure becomes a target, it ceases to be a good measure") compounds metric drift: once a metric is optimized directly, behavior adapts to the metric rather than to the goal it proxies, and the more aggressively the metric is optimized, the faster it tends to lose meaning.
Optimization speeds up decay.
Impact on Evaluation and Decision-Making
Metric drift can lead to:
- incorrect model comparisons
- premature or harmful deployments
- delayed detection of failures
- misaligned retraining decisions
- erosion of trust in evaluation systems
Decisions inherit metric flaws.
Mitigation Strategies
Effective mitigation includes:
- periodic metric review and retirement
- validating metrics against long-term outcomes
- using complementary metric sets
- auditing proxy–outcome alignment
- stress testing metrics under shift
- maintaining evaluation governance
Metrics require lifecycle management.
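"Stress testing metrics under shift" can be as simple as reweighting evaluation records to simulate a changed population mix. A minimal sketch with hypothetical subgroups A and B:

```python
def weighted_accuracy(records, group_weights):
    """Accuracy with each record weighted by its subgroup's weight,
    simulating a different subgroup mix than the evaluation set's."""
    weights = [group_weights.get(r["group"], 1.0) for r in records]
    hits = sum(w * r["correct"] for r, w in zip(records, weights))
    return hits / sum(weights)

# Hypothetical evaluation set: the model is strong on group A (90%)
# and weaker on group B (60%); the current mix is 50/50.
records = (
    [{"group": "A", "correct": 1}] * 90 + [{"group": "A", "correct": 0}] * 10
    + [{"group": "B", "correct": 1}] * 60 + [{"group": "B", "correct": 0}] * 40
)

baseline = weighted_accuracy(records, {"A": 1.0, "B": 1.0})  # 0.75 today
shifted = weighted_accuracy(records, {"A": 1.0, "B": 3.0})   # mix tilts toward B
# A metric that looks healthy under today's mix can degrade sharply
# under a plausible future mix - a cheap pre-deployment stress test.
```

Sweeping the weights over a range of plausible future mixes shows how sensitive the headline metric is to a shift that leaves the metric's definition untouched.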
Relationship to Evaluation Governance
Evaluation governance assigns ownership for:
- monitoring metric validity
- approving metric changes
- documenting assumptions
- responding to drift signals
Governance turns drift into a managed risk.
Common Pitfalls
- assuming metric definitions are timeless
- monitoring metrics without interpretation
- ignoring subgroup-level metric drift
- tying incentives to drifting metrics
- reacting only after failures occur
Metrics should be questioned, not trusted.
Summary Characteristics
| Aspect | Metric Drift |
|---|---|
| What changes | Metric meaning |
| Visibility | Often low |
| Speed | Gradual or sudden |
| Risk | Silent failure |
| Mitigation | Governance and auditing |
Related Concepts
- Generalization & Evaluation
- Proxy Metrics
- Goodhart’s Law (ML Context)
- Metric Gaming
- Outcome-Aware Evaluation
- Evaluation Governance
- Distribution Shift
- Calibration Drift