Short Definition
Early exit networks are neural architectures that allow inputs to terminate computation at intermediate layers when sufficient confidence is reached.
Definition
Early exit networks augment a standard deep model with intermediate classifiers (“exits”) placed at selected depths. During inference, an input may exit early if an intermediate prediction meets a predefined criterion, reducing average computation while preserving accuracy on easy inputs.
Computation stops when it’s good enough.
Why It Matters
Deep models waste compute on easy cases. Early exit networks:
- reduce average inference latency
- lower energy and serving cost
- help meet real-time latency constraints
- adapt computation to input difficulty
Not all inputs deserve full depth.
Core Mechanism
Early exit networks introduce:
- intermediate prediction heads
- a confidence or uncertainty criterion
- a policy for exiting vs continuing
Exit decisions control depth.
Minimal Conceptual Illustration
Input → Layer 1 → Exit?
↓ no
Layer 2 → Exit?
↓ no
Layer 3 → Final Output
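The control flow above can be sketched in plain Python. This is a minimal illustration, not a real network: the `layers` and `heads` arguments are stand-in functions for actual network modules and prediction heads, and the confidence test is a max-softmax threshold (one of several possible criteria).

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of raw scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run layers in order; after each, query its exit head.
    Return as soon as the top-class probability clears the
    threshold, or at the final layer regardless."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if max(probs) >= threshold or depth == len(layers):
            return probs, depth  # prediction plus the depth actually used
```

A confident intermediate head triggers an exit at depth 1, while a low-confidence input falls through to the final layer.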
Exit Criteria
Common exit conditions include:
- prediction confidence threshold
- entropy below a cutoff
- margin between top classes
- learned halting signal
Confidence governs stopping.
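The first three criteria can each be written as a small predicate over the exit head's softmax output. The cutoff values below are placeholders to be calibrated, not recommendations:

```python
import math

def confidence_exit(probs, tau=0.9):
    # exit if the top-class probability is high enough
    return max(probs) >= tau

def entropy_exit(probs, max_entropy=0.5):
    # exit if predictive entropy (in nats) is below the cutoff
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h <= max_entropy

def margin_exit(probs, min_margin=0.5):
    # exit if the gap between the top two classes is wide enough
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1] >= min_margin
```

A learned halting signal, by contrast, replaces these hand-set rules with a trained scalar output per exit.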
Training Strategies
Joint Training
All exits are trained simultaneously using shared representations.
Progressive Training
Exits are added or activated gradually to stabilize learning.
Loss Weighting
Intermediate exits receive weighted losses to balance supervision between shallow and deep exits.
Training shapes exit behavior.
Relationship to Adaptive Computation Depth
Early exit networks are a concrete implementation of adaptive computation depth, using discrete depth checkpoints rather than continuous halting.
Depth adapts in steps.
Optimization Dynamics
- shallow exits receive frequent supervision
- deeper layers focus on harder inputs
- gradients are depth-dependent
Learning exposure is stratified.
Inference Behavior
At inference time:
- easy inputs exit early
- difficult inputs traverse deeper layers
- average latency drops
- worst-case latency remains unchanged
Efficiency improves on average.
Generalization Considerations
Early exit networks may:
- generalize well on in-distribution easy cases
- misjudge difficulty under distribution shift
- exit too early on unfamiliar inputs
Difficulty is distribution-dependent.
Evaluation Metrics
Early exit networks require compute-aware evaluation:
- accuracy vs average compute
- exit depth distribution
- tail latency (p95 / p99)
- performance under OOD inputs
Accuracy alone is insufficient.
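Two of these metrics, exit depth distribution and average compute, are easy to extract from logged exit depths. A sketch, assuming one unit of cost per layer and using a simple index-based tail estimate:

```python
from collections import Counter

def exit_statistics(exit_depths, layer_cost=1.0):
    """Summarize exit behavior: fraction of inputs per exit depth,
    average compute, and a p95 depth as a tail-latency proxy."""
    n = len(exit_depths)
    dist = {d: c / n for d, c in sorted(Counter(exit_depths).items())}
    avg_compute = sum(exit_depths) / n * layer_cost
    p95_depth = sorted(exit_depths)[min(n - 1, int(0.95 * n))]
    return dist, avg_compute, p95_depth
```

For a workload where 90% of inputs exit at depth 1 and 10% run all 3 layers, average compute is 1.2 layer-costs, but the p95 depth is still 3: the average improves while the tail does not.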
Failure Modes
Common failure modes include:
- premature exiting
- shallow overconfidence
- degraded deep-layer performance
- unstable exit thresholds
Early stopping can be wrong.
Practical Design Guidelines
- warm up without early exits
- calibrate exit confidence carefully
- monitor exit rates per depth
- evaluate under realistic traffic
- reassess thresholds after deployment
Exit policies need tuning.
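Calibrating exit confidence can be framed as a search over held-out data: choose the lowest threshold whose accepted predictions still meet an accuracy target. This is a simplified single-exit sketch; in practice each exit would get its own swept threshold.

```python
def calibrate_threshold(confidences, correct, target_accuracy=0.95):
    """Return the smallest threshold such that predictions at or
    above it hit the target accuracy on validation data; falls
    back to 1.0 (never exit early) if no threshold qualifies."""
    best = 1.0
    for tau in sorted(set(confidences)):
        taken = [ok for c, ok in zip(confidences, correct) if c >= tau]
        if taken and sum(taken) / len(taken) >= target_accuracy:
            best = min(best, tau)
    return best
```

Because the qualifying threshold depends on the validation distribution, it should be re-checked under realistic traffic and after deployment, as the guidelines above note.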
Common Pitfalls
- assuming early exits always save compute
- ignoring tail latency
- freezing thresholds prematurely
- evaluating only average accuracy
- neglecting robustness under shift
Efficiency must be validated.
Summary Characteristics
| Aspect | Early Exit Networks |
|---|---|
| Conditional dimension | Depth |
| Compute savings | Average-case |
| Latency variability | Increased |
| Training complexity | Moderate–High |
| Deployment relevance | High |
Related Concepts
- Architecture & Representation
- Adaptive Computation Depth
- Conditional Computation
- Sparse Inference Optimization
- Compute-Aware Evaluation
- Routing Entropy