Short Definition
Counterfactual logging is the practice of recording not only the action taken by a model, but also the alternative actions that could have been taken and their associated contexts.
Definition
Counterfactual logging captures information needed to reason about what would have happened if a different decision had been made. Instead of logging only the chosen prediction or action, systems log additional metadata—such as action probabilities, candidate rankings, or randomized alternatives—enabling causal and off-policy evaluation.
Without counterfactual information, causal effects cannot be reconstructed from logs alone.
Why It Matters
Once a model is deployed, its decisions influence which outcomes are observed. Outcomes for actions not taken remain unobserved, creating selection bias. Counterfactual logging provides the data foundation needed to estimate causal effects, evaluate alternative policies, and mitigate feedback-loop bias.
You cannot evaluate what you never observe.
What Is Logged
Depending on the system, counterfactual logs may include:
- the action taken
- alternative actions considered
- action probabilities or propensities
- model scores or rankings
- context features at decision time
- timestamps and policy version
Logs encode decision uncertainty.
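The fields listed above can be gathered into a single record per decision. The following is a minimal sketch, not a standard schema: the class name `LoggedDecision`, every field name, and the sample values are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class LoggedDecision:
    """One counterfactual log record (hypothetical schema)."""
    timestamp: str                  # decision time, ISO 8601
    policy_version: str             # identifier of the policy that acted
    context: Dict[str, float]       # features available at decision time
    candidates: List[str]           # all actions that were considered
    scores: Dict[str, float]        # model score per candidate
    propensities: Dict[str, float]  # probability of choosing each candidate
    action_taken: str               # the action actually executed

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; none of them come from a real system.
record = LoggedDecision(
    timestamp="2024-01-01T12:00:00Z",
    policy_version="ranker-v3",
    context={"user_tenure_days": 42.0},
    candidates=["A", "B", "C"],
    scores={"A": 0.9, "B": 0.4, "C": 0.1},
    propensities={"A": 0.8, "B": 0.1, "C": 0.1},
    action_taken="A",
)
line = record.to_json()
```

Scores and propensities are stored separately because a score alone does not say how likely the action was to be chosen.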
Minimal Conceptual Illustration
Context → {Action A (taken), Action B (not taken), Action C (not taken)}
(the selection probability of each action is logged alongside it)
Relationship to Causal Evaluation
Counterfactual logging enables causal evaluation by supporting:
- inverse propensity scoring
- off-policy evaluation
- counterfactual risk estimation
- comparison of policies that is unbiased when propensities are logged correctly and cover the actions being compared
Causal claims require counterfactual data.
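As a worked form of the first item, inverse propensity scoring can be written as below. The notation is introduced here and is not part of the original text: each log entry is assumed to hold a context $x_i$, the action taken $a_i$, the observed reward $r_i$, and the logged propensity $p_i = \pi_0(a_i \mid x_i)$ of the logging policy $\pi_0$; $\pi$ is the policy being evaluated.

```latex
\hat{V}_{\mathrm{IPS}}(\pi)
  = \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(a_i \mid x_i)}{p_i} \, r_i
```

The estimate is unbiased only if the logged propensities are correct and $\pi_0(a \mid x) > 0$ wherever $\pi(a \mid x) > 0$. Small propensities produce large weights, so the variance can be high even when the estimate is unbiased.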
Relationship to Feedback Loops
Feedback loops censor outcomes for actions not taken. Counterfactual logging helps expose this censoring by preserving information about suppressed alternatives.
Logs break feedback opacity.
Role in Online vs Offline Evaluation
Offline evaluation on historical data typically sees only the outcomes of actions the deployed policy chose. Online systems that log propensities or randomized actions can later perform offline causal analysis without rerunning experiments.
Logging turns online actions into offline evidence.
Use in Policy Evaluation
Counterfactual logs allow teams to:
- estimate the performance of new models before deployment
- simulate alternative thresholds or policies
- compare ranking strategies
- audit past decisions retrospectively
Decisions can be re-evaluated safely.
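A minimal sketch of such a retrospective evaluation follows, using inverse propensity weighting: each logged reward is reweighted by how much more (or less) likely the candidate policy would have been to take the logged action. The log layout, the function names `ips_value` and `always`, and the toy data are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical log entry: (context, action_taken, logged_propensity, reward)
LogEntry = Tuple[Dict[str, float], str, float, float]
TargetProb = Callable[[Dict[str, float], str], float]


def ips_value(logs: List[LogEntry], target_prob: TargetProb) -> float:
    """Estimate the average reward a candidate policy would have earned."""
    if not logs:
        raise ValueError("need at least one log entry")
    total = 0.0
    for context, action, propensity, reward in logs:
        if propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        # Importance weight: candidate probability over logging probability.
        weight = target_prob(context, action) / propensity
        total += weight * reward
    return total / len(logs)


def always(action_name: str) -> TargetProb:
    """Deterministic candidate policy that always picks `action_name`."""
    return lambda context, action: 1.0 if action == action_name else 0.0


# Toy logs from a logging policy that picked A or B uniformly at random.
logs: List[LogEntry] = [
    ({"x": 1.0}, "A", 0.5, 1.0),
    ({"x": 2.0}, "B", 0.5, 0.0),
    ({"x": 3.0}, "A", 0.5, 1.0),
    ({"x": 4.0}, "B", 0.5, 1.0),
]

value_always_a = ips_value(logs, always("A"))  # 1.0 on this toy data
value_always_b = ips_value(logs, always("B"))  # 0.5 on this toy data
```

Neither candidate policy was ever deployed, yet both receive an estimate from the same logs. With so few entries the numbers are only illustrative; real estimates need enough data and enough exploration to keep the weights stable.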
Requirements and Constraints
Effective counterfactual logging requires:
- some degree of randomness or exploration
- stable policy identifiers
- careful data storage and privacy handling
- sufficient coverage of alternative actions
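Purely deterministic systems can record which alternatives were considered, but every propensity is 0 or 1, so their logs cannot support off-policy estimates for actions that were never taken.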
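One common way to meet the randomness requirement is epsilon-greedy selection, which is swapped in here purely as an example of an exploration scheme. The sketch below assumes a hypothetical `epsilon_greedy` helper that returns the chosen action together with the propensity of every candidate, so both can be logged.

```python
import random
from typing import Dict, Tuple


def epsilon_greedy(scores: Dict[str, float],
                   epsilon: float,
                   rng: random.Random) -> Tuple[str, Dict[str, float]]:
    """Pick an action and return it with the propensity of each candidate."""
    if not scores:
        raise ValueError("need at least one candidate action")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    best = max(scores, key=scores.get)
    n = len(scores)
    # Every action gets epsilon / n; the greedy action gets the remaining mass.
    propensities = {action: epsilon / n for action in scores}
    propensities[best] += 1.0 - epsilon
    actions = list(propensities)
    weights = [propensities[action] for action in actions]
    chosen = rng.choices(actions, weights=weights, k=1)[0]
    return chosen, propensities


rng = random.Random(0)  # seeded only to make the example repeatable
action, propensities = epsilon_greedy({"A": 0.9, "B": 0.4, "C": 0.1}, 0.3, rng)
```

With `epsilon` above zero, every candidate has a non-zero chance of being taken, which is what later reweighting relies on. Setting `epsilon` to zero reduces this to the deterministic case.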
Risks and Limitations
- increased system complexity
- higher logging and storage costs
- incomplete coverage of action space
- sensitivity to logging errors
- misuse without causal expertise
Bad logs create false confidence.
Common Pitfalls
- logging scores without action probabilities
- changing policies without version tracking
- assuming counterfactual validity without exploration
- ignoring bias introduced by partial logging
- treating logged alternatives as true outcomes
Counterfactuals are estimates, not facts.
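Several of these pitfalls can be caught by checking records before they are used. The sketch below is one possible set of checks, not an established standard: the function name `check_record` and the record keys are hypothetical, and the sum-to-one check assumes that every candidate action is logged.

```python
import math
from typing import Any, Dict, List


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return the problems found in one log record (empty list if none)."""
    problems: List[str] = []
    if not record.get("policy_version"):
        problems.append("missing policy_version")
    propensities = record.get("propensities")
    if not propensities:
        # Scores without propensities cannot be used for reweighting.
        problems.append("missing propensities")
        return problems
    if record.get("action_taken") not in propensities:
        problems.append("action_taken has no logged propensity")
    values = list(propensities.values())
    if any(not 0.0 <= p <= 1.0 for p in values):
        problems.append("propensity outside [0, 1]")
    # Assumes full logging of candidates; skip this check for partial logs.
    if not math.isclose(sum(values), 1.0, abs_tol=1e-6):
        problems.append("propensities do not sum to 1")
    if any(p in (0.0, 1.0) for p in values):
        problems.append("no exploration: some propensity is 0 or 1")
    return problems


good = {"policy_version": "v1", "action_taken": "A",
        "propensities": {"A": 0.8, "B": 0.2}}
scores_only = {"action_taken": "A", "scores": {"A": 0.9}}
deterministic = {"policy_version": "v1", "action_taken": "A",
                 "propensities": {"A": 1.0, "B": 0.0}}

reports = [check_record(r) for r in (good, scores_only, deterministic)]
```

A record that passes these checks is well-formed, not necessarily correct: a propensity that was logged wrongly but plausibly will still pass.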
Relationship to Outcome-Aware Evaluation
Outcome-aware evaluation asks whether outcomes improved. Counterfactual logging enables attribution—determining whether improvements were caused by the model or by external factors.
Outcomes need explanations.
Role in Evaluation Governance
Governance should define:
- when counterfactual logging is required
- minimum logging standards
- audit procedures for logged data
- acceptable use cases and limitations
Causal evidence requires disciplined logging.
Summary Characteristics
| Aspect | Counterfactual Logging |
|---|---|
| Purpose | Enable causal inference |
| Data captured | Taken + untaken actions |
| Dependency | Exploration or randomness |
| Evaluation role | Foundational |
| Complexity | High |
Related Concepts
- Generalization & Evaluation
- Causal Evaluation
- Feedback Loops
- Outcome-Aware Evaluation
- Online vs Offline Evaluation
- Off-Policy Evaluation
- Exploration vs Exploitation
- Model Update Policies