Short Definition
Event-time sampling selects and splits data based on when events actually occurred, not when they were processed or recorded.
Definition
Event-time sampling is a time-aware data sampling strategy that uses the true occurrence time of events (event time) as the basis for training and evaluation splits. This contrasts with processing-time or ingestion-time sampling, which relies on when data was logged, received, or stored.
Event-time sampling preserves causal ordering in systems with delays, retries, or asynchronous pipelines.
Why It Matters
In many real-world systems—such as streaming platforms, fraud detection, telemetry, and user interaction logs—events may be processed long after they occur. Sampling by processing time can introduce future information into the past, leading to subtle but severe data leakage.
Event-time sampling ensures models only learn from information that would have been available at prediction time.
Event Time vs Processing Time
- Event time: when the event actually happened
- Processing time: when the system observed or logged the event
These timestamps often differ due to latency, batching, or failures.
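A toy snippet makes the distinction concrete; the offline-tap scenario and timestamps below are illustrative, not from any particular system:

```python
from datetime import datetime, timedelta

# Hypothetical click event: tapped on a phone while offline (event time),
# uploaded to the server once connectivity returned (processing time).
event_time = datetime(2024, 3, 1, 9, 0, 0)        # when the tap happened
processing_time = datetime(2024, 3, 1, 9, 45, 0)  # when the log arrived

# The gap between the two is the ingestion delay.
delay = processing_time - event_time
print(delay)  # 0:45:00
```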
How Event-Time Sampling Works
A typical workflow:
- Identify the event-time timestamp for each observation
- Sort data strictly by event time
- Define training and evaluation windows using event time
- Exclude events whose labels would not yet be available
- Train and evaluate models accordingly
Label availability must be considered explicitly.
Minimal Conceptual Example
# conceptual event-time split
train = data[data.event_time < T]
test = data[(data.event_time >= T) & (data.event_time < T_next)]
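A runnable version of the same split, written with plain Python records; the dates and the boundaries `T` and `T_next` are arbitrary values chosen for the sketch:

```python
from datetime import datetime

# Toy records carrying an event-time timestamp
data = [
    {"event_time": datetime(2024, 1, 5), "value": 1},
    {"event_time": datetime(2024, 1, 20), "value": 2},
    {"event_time": datetime(2024, 2, 3), "value": 3},
    {"event_time": datetime(2024, 2, 25), "value": 4},
]

T = datetime(2024, 2, 1)       # train/test boundary
T_next = datetime(2024, 3, 1)  # end of the evaluation window

# Everything strictly before T trains; the [T, T_next) window evaluates.
train = [r for r in data if r["event_time"] < T]
test = [r for r in data if T <= r["event_time"] < T_next]

print(len(train), len(test))  # 2 2
```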
Common Use Cases
Event-time sampling is essential in:
- streaming and real-time systems
- fraud and anomaly detection
- recommendation systems with delayed feedback
- IoT and sensor networks
- log-based modeling with ingestion delays
Any system with asynchronous data benefits from event-time reasoning.
Event-Time Sampling vs Time-Aware Sampling
- Time-aware sampling: any strategy that respects temporal ordering, regardless of which timestamp is used
- Event-time sampling: specifically enforces ordering by the true occurrence time of events
Event-time sampling is a stricter and more realistic temporal constraint.
Relationship to Label Latency
Event-time sampling must account for label latency—the delay between an event and the availability of its ground truth label. Ignoring label latency can reintroduce leakage even with correct event-time splits.
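A minimal sketch of a latency-aware training filter, assuming a fixed seven-day label latency; the latency value and field names are illustrative:

```python
from datetime import datetime, timedelta

label_latency = timedelta(days=7)  # assumed delay until ground truth arrives
T = datetime(2024, 2, 1)           # training cutoff

events = [
    {"event_time": datetime(2024, 1, 10)},  # label matured before T
    {"event_time": datetime(2024, 1, 28)},  # label not yet available at T
]

# Only events whose labels would have existed by the cutoff may enter training
trainable = [e for e in events if e["event_time"] + label_latency <= T]
print(len(trainable))  # 1
```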
Common Pitfalls
- defaulting to processing-time timestamps
- mixing event-time and processing-time features
- ignoring late-arriving events
- training on labels that would not yet exist
- inconsistent handling of time zones or clocks
Temporal leakage is often invisible in offline metrics; it typically surfaces only as degraded performance after deployment.
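The first pitfall can be demonstrated directly: when arrival order reverses occurrence order, a processing-time sort places a later event before an earlier one, so any split taken along that order can train on the future. A toy illustration with two fabricated events:

```python
from datetime import datetime

# Event "a" happened first but arrived late; "b" happened later but arrived first.
a = {"id": "a",
     "event_time": datetime(2024, 1, 1),
     "processing_time": datetime(2024, 1, 10)}
b = {"id": "b",
     "event_time": datetime(2024, 1, 5),
     "processing_time": datetime(2024, 1, 6)}

events = [a, b]

# Processing-time order reverses the true causal order of the two events.
by_processing = sorted(events, key=lambda e: e["processing_time"])
by_event = sorted(events, key=lambda e: e["event_time"])

print([e["id"] for e in by_processing])  # ['b', 'a']
print([e["id"] for e in by_event])       # ['a', 'b']
```

A split that keeps the earlier half of `by_processing` for training would include `b`, an event that occurred after `a`, while holding `a` out for evaluation.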
Relationship to Generalization
Event-time sampling produces generalization estimates that reflect real operational constraints. Models evaluated this way are more likely to behave reliably when deployed in time-dependent environments.
Relationship to Rolling Retraining
Rolling retraining pipelines should be driven by event-time windows rather than ingestion time to avoid training on information from the future relative to prediction time.
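A minimal sketch of generating retraining windows driven purely by event time; the window length, step size, and date range are illustrative:

```python
from datetime import datetime, timedelta

window = timedelta(days=30)  # length of each training window
step = timedelta(days=7)     # how far each retraining advances
start = datetime(2024, 1, 1)
end = datetime(2024, 3, 1)

# Enumerate (window_start, window_end) pairs along the event-time axis.
windows = []
t = start
while t + window <= end:
    windows.append((t, t + window))
    t += step

print(len(windows))  # 5
```

Each window is then filled by filtering records on `event_time`, never on ingestion time, so a model retrained for a given window sees only events that had actually occurred by its end.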
Related Concepts
- Data & Distribution
- Time-Aware Sampling
- Forward-Chaining Splits
- Rolling Window Sampling
- Label Latency
- Temporal Data Leakage
- Evaluation Protocols