Counterfactual Logging

Short Definition

Counterfactual logging is the practice of recording not only the action taken by a model, but also the alternative actions that could have been taken and their associated contexts.

Definition

Counterfactual logging captures information needed to reason about what would have happened if a different decision had been made. Instead of logging only the chosen prediction or action, systems log additional metadata—such as action probabilities, candidate rankings, or randomized alternatives—enabling causal and off-policy evaluation.

Without counterfactuals, causality cannot be reconstructed.

Why It Matters

Once a model is deployed, its decisions influence which outcomes are observed. Outcomes for actions not taken remain unobserved, creating selection bias. Counterfactual logging provides the data foundation needed to estimate causal effects, evaluate alternative policies, and mitigate feedback-loop bias.

You cannot evaluate what you never observe.

What Is Logged

Depending on the system, counterfactual logs may include:

  • the action taken
  • alternative actions considered
  • action probabilities or propensities
  • model scores or rankings
  • context features at decision time
  • timestamps and policy version

Logs encode decision uncertainty.
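As a minimal sketch, the fields above could be captured in a record like the following (all names here are illustrative, not a standard schema):

```python
from dataclasses import dataclass, field
import time

@dataclass
class DecisionLog:
    """One counterfactual log record (field names are illustrative)."""
    context: dict            # features available at decision time
    candidates: list         # all actions considered, taken and untaken
    propensities: dict       # action -> probability under the logging policy
    action: str              # the action actually taken
    policy_version: str      # identifies the policy that made the decision
    timestamp: float = field(default_factory=time.time)

# hypothetical example record
record = DecisionLog(
    context={"user_segment": "new", "hour": 14},
    candidates=["A", "B", "C"],
    propensities={"A": 0.7, "B": 0.2, "C": 0.1},
    action="A",
    policy_version="ranker-v3",
)
```

Storing the full propensity map, not just the chosen action's probability, keeps later estimators free to reason about any candidate.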

Minimal Conceptual Illustration


Context → {Action A (taken), Action B (not taken), Action C (not taken)}
          probabilities logged for all three actions, not only the one taken

Relationship to Causal Evaluation

Counterfactual logging enables causal evaluation by supporting:

  • inverse propensity scoring
  • off-policy evaluation
  • counterfactual risk estimation
  • unbiased comparison of policies

Causal claims require counterfactual data.
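The first of these, inverse propensity scoring, can be sketched in a few lines. This is a toy illustration under assumed log fields (`context`, `action`, `propensity`, `reward`), not a production estimator:

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring: estimate the average reward the
    target policy would have earned on the logged contexts."""
    total = 0.0
    for entry in logs:
        # reweight by P(action | target) / P(action | logging policy)
        weight = target_policy(entry["context"], entry["action"]) / entry["propensity"]
        total += weight * entry["reward"]
    return total / len(logs)

# toy logs from a uniform logging policy over two actions
logs = [
    {"context": {}, "action": "A", "propensity": 0.5, "reward": 1.0},
    {"context": {}, "action": "B", "propensity": 0.5, "reward": 0.0},
]

# target policy that always picks action "A"
always_a = lambda ctx, a: 1.0 if a == "A" else 0.0

est = ips_estimate(logs, always_a)  # → 1.0
```

Note that the estimator is only usable because the logging policy recorded nonzero propensities for every action it took.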

Relationship to Feedback Loops

Feedback loops censor outcomes for actions not taken. Counterfactual logging helps expose this censoring by preserving information about suppressed alternatives.

Logs break feedback opacity.

Role in Online vs Offline Evaluation

Offline datasets typically lack counterfactual information. Online systems that log propensities or randomized actions can later support offline causal analysis without rerunning experiments.

Logging turns online actions into offline evidence.

Use in Policy Evaluation

Counterfactual logs allow teams to:

  • evaluate new models without deployment
  • simulate alternative thresholds or policies
  • compare ranking strategies
  • audit past decisions retrospectively

Decisions can be re-evaluated safely.
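For instance, simulating alternative thresholds can be done by reweighting logged outcomes with the same inverse-propensity idea. A hedged sketch, assuming each log entry carries a model `score`, the logged `action`, its `propensity`, and an observed `reward`:

```python
def threshold_value(logs, threshold):
    """Estimate, via inverse propensity weighting, the average reward of
    a deterministic policy: act when the logged score exceeds threshold."""
    total = 0.0
    for e in logs:
        intended = "act" if e["score"] > threshold else "skip"
        if intended == e["action"]:           # target agrees with logged action
            total += e["reward"] / e["propensity"]
    return total / len(logs)

# toy logs (values are made up for illustration)
logs = [
    {"score": 0.9, "action": "act",  "propensity": 0.8, "reward": 1.0},
    {"score": 0.4, "action": "act",  "propensity": 0.3, "reward": 0.5},
    {"score": 0.2, "action": "skip", "propensity": 0.7, "reward": 0.2},
]

v_low = threshold_value(logs, 0.3)   # a looser acting threshold
v_high = threshold_value(logs, 0.5)  # a stricter acting threshold
```

The same logged traffic yields value estimates for both thresholds, so the choice between them can be audited without deploying either.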

Requirements and Constraints

Effective counterfactual logging requires:

  • some degree of randomness or exploration
  • stable policy identifiers
  • careful data storage and privacy handling
  • sufficient coverage of alternative actions

Purely deterministic policies assign zero probability to untaken actions, so their logs cannot support unbiased counterfactual estimates.

Risks and Limitations

  • increased system complexity
  • higher logging and storage costs
  • incomplete coverage of action space
  • sensitivity to logging errors
  • misuse without causal expertise

Bad logs create false confidence.

Common Pitfalls

  • logging scores without action probabilities
  • changing policies without version tracking
  • assuming counterfactual validity without exploration
  • ignoring bias introduced by partial logging
  • treating logged alternatives as true outcomes

Counterfactuals are estimates, not facts.

Relationship to Outcome-Aware Evaluation

Outcome-aware evaluation asks whether outcomes improved. Counterfactual logging enables attribution—determining whether improvements were caused by the model or by external factors.

Outcomes need explanations.

Role in Evaluation Governance

Governance should define:

  • when counterfactual logging is required
  • minimum logging standards
  • audit procedures for logged data
  • acceptable use cases and limitations

Causal evidence requires disciplined logging.

Summary Characteristics

Aspect             Counterfactual Logging
Purpose            Enable causal inference
Data captured      Taken + untaken actions
Dependency         Exploration or randomness
Evaluation role    Foundational
Complexity         High

Related Concepts

  • Generalization & Evaluation
  • Causal Evaluation
  • Feedback Loops
  • Outcome-Aware Evaluation
  • Online vs Offline Evaluation
  • Off-Policy Evaluation
  • Exploration vs Exploitation
  • Model Update Policies