Short Definition
Event-time sampling selects and splits data based on when events actually occurred, not when they were processed or recorded.
Definition
Event-time sampling is a time-aware data sampling strategy that uses the true occurrence time of events (event time) as the basis for training and evaluation splits. This contrasts with processing-time or ingestion-time sampling, which relies on when data was logged, received, or stored.
Event-time sampling preserves causal ordering in systems with delays, retries, or asynchronous pipelines.
Why It Matters
In many real-world systems—such as streaming platforms, fraud detection, telemetry, and user interaction logs—events may be processed long after they occur. Sampling by processing time can introduce future information into the past, leading to subtle but severe data leakage.
Event-time sampling ensures models only learn from information that would have been available at prediction time.
Event Time vs Processing Time
- Event time: when the event actually happened
- Processing time: when the system observed or logged the event
These timestamps often differ due to latency, batching, or failures.
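A toy snippet makes the distinction concrete; the offline-tap scenario and timestamps below are illustrative, not from any particular system:

```python
from datetime import datetime, timedelta

# Hypothetical click event: tapped on a phone while offline (event time),
# uploaded to the server once connectivity returned (processing time).
event_time = datetime(2024, 3, 1, 9, 0, 0)        # when the tap happened
processing_time = datetime(2024, 3, 1, 9, 45, 0)  # when the log arrived

# The gap between the two is the ingestion delay.
delay = processing_time - event_time
print(delay)  # 0:45:00
```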
How Event-Time Sampling Works
A typical workflow:
- Identify the event-time timestamp for each observation
- Sort data strictly by event time
- Define training and evaluation windows using event time
- Exclude events whose labels would not yet be available
- Train and evaluate models accordingly
Label availability must be considered explicitly.
Minimal Conceptual Example
# conceptual event-time split
train = data[data.event_time < T]
test = data[(data.event_time >= T) & (data.event_time < T_next)]
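A runnable version of the same split, written with plain Python records; the dates and the boundaries `T` and `T_next` are arbitrary values chosen for the sketch:

```python
from datetime import datetime

# Toy records carrying an event-time timestamp
data = [
    {"event_time": datetime(2024, 1, 5), "value": 1},
    {"event_time": datetime(2024, 1, 20), "value": 2},
    {"event_time": datetime(2024, 2, 3), "value": 3},
    {"event_time": datetime(2024, 2, 25), "value": 4},
]

T = datetime(2024, 2, 1)       # train/test boundary
T_next = datetime(2024, 3, 1)  # end of the evaluation window

# Everything strictly before T trains; the [T, T_next) window evaluates.
train = [r for r in data if r["event_time"] < T]
test = [r for r in data if T <= r["event_time"] < T_next]

print(len(train), len(test))  # 2 2
```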
Common Use Cases
Event-time sampling is essential in:
- streaming and real-time systems
- fraud and anomaly detection
- recommendation systems with delayed feedback
- IoT and sensor networks
- log-based modeling with ingestion delays
Any system with asynchronous data benefits from event-time reasoning.
Event-Time Sampling vs Time-Aware Sampling
- Time-aware sampling: any strategy that respects temporal ordering, regardless of which timestamp is used
- Event-time sampling: specifically enforces ordering by the true occurrence time of events
Event-time sampling is a stricter and more realistic temporal constraint.
Relationship to Label Latency
Event-time sampling must account for label latency—the delay between an event and the availability of its ground truth label. Ignoring label latency can reintroduce leakage even with correct event-time splits.
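A minimal sketch of a latency-aware training filter, assuming a fixed seven-day label latency; the latency value and field names are illustrative:

```python
from datetime import datetime, timedelta

label_latency = timedelta(days=7)  # assumed delay until ground truth arrives
T = datetime(2024, 2, 1)           # training cutoff

events = [
    {"event_time": datetime(2024, 1, 10)},  # label matured before T
    {"event_time": datetime(2024, 1, 28)},  # label not yet available at T
]

# Only events whose labels would have existed by the cutoff may enter training
trainable = [e for e in events if e["event_time"] + label_latency <= T]
print(len(trainable))  # 1
```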
Common Pitfalls
- defaulting to processing-time timestamps
- mixing event-time and processing-time features
- ignoring late-arriving events
- training on labels that would not yet exist
- inconsistent handling of time zones or clocks
Temporal leakage is often invisible in offline metrics; it typically surfaces only as degraded performance after deployment.
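The first pitfall can be demonstrated directly: when arrival order reverses occurrence order, a processing-time sort places a later event before an earlier one, so any split taken along that order can train on the future. A toy illustration with two fabricated events:

```python
from datetime import datetime

# Event "a" happened first but arrived late; "b" happened later but arrived first.
a = {"id": "a",
     "event_time": datetime(2024, 1, 1),
     "processing_time": datetime(2024, 1, 10)}
b = {"id": "b",
     "event_time": datetime(2024, 1, 5),
     "processing_time": datetime(2024, 1, 6)}

events = [a, b]

# Processing-time order reverses the true causal order of the two events.
by_processing = sorted(events, key=lambda e: e["processing_time"])
by_event = sorted(events, key=lambda e: e["event_time"])

print([e["id"] for e in by_processing])  # ['b', 'a']
print([e["id"] for e in by_event])       # ['a', 'b']
```

A split that keeps the earlier half of `by_processing` for training would include `b`, an event that occurred after `a`, while holding `a` out for evaluation.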
Relationship to Generalization
Event-time sampling produces generalization estimates that reflect real operational constraints. Models evaluated this way are more likely to behave reliably when deployed in time-dependent environments.
Relationship to Rolling Retraining
Rolling retraining pipelines should be driven by event-time windows rather than ingestion time to avoid training on information from the future relative to prediction time.
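A minimal sketch of generating retraining windows driven purely by event time; the window length, step size, and date range are illustrative:

```python
from datetime import datetime, timedelta

window = timedelta(days=30)  # length of each training window
step = timedelta(days=7)     # how far each retraining advances
start = datetime(2024, 1, 1)
end = datetime(2024, 3, 1)

# Enumerate (window_start, window_end) pairs along the event-time axis.
windows = []
t = start
while t + window <= end:
    windows.append((t, t + window))
    t += step

print(len(windows))  # 5
```

Each window is then filled by filtering records on `event_time`, never on ingestion time, so a model retrained for a given window sees only events that had actually occurred by its end.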
Related Concepts
- Data & Distribution
- Time-Aware Sampling
- Forward-Chaining Splits
- Rolling Window Sampling
- Label Latency
- Temporal Data Leakage
- Evaluation Protocols