Short Definition
Early exit networks are neural architectures that allow inputs to terminate computation at intermediate layers when sufficient confidence is reached.
Definition
Early exit networks augment a standard deep model with intermediate classifiers (“exits”) placed at selected depths. During inference, an input may exit early if an intermediate prediction meets a predefined criterion, reducing average computation while preserving accuracy on easy inputs.
Computation stops when it’s good enough.
Why It Matters
Deep models waste compute on easy cases. Early exit networks:
- reduce average inference latency
- lower energy and serving cost
- help meet real-time latency constraints
- adapt computation to input difficulty
Not all inputs deserve full depth.
Core Mechanism
Early exit networks introduce:
- intermediate prediction heads
- a confidence or uncertainty criterion
- a policy for exiting vs continuing
Exit decisions control depth.
Minimal Conceptual Illustration
Input → Layer 1 → Exit?
↓ no
Layer 2 → Exit?
↓ no
Layer 3 → Final Output
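The control flow above can be sketched in plain Python. This is a minimal illustration, not a real network: the `layers` and `heads` arguments are stand-in functions for actual network modules and prediction heads, and the confidence test is a max-softmax threshold (one of several possible criteria).

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of raw scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run layers in order; after each, query its exit head.
    Return as soon as the top-class probability clears the
    threshold, or at the final layer regardless."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if max(probs) >= threshold or depth == len(layers):
            return probs, depth  # prediction plus the depth actually used
```

A confident intermediate head triggers an exit at depth 1, while a low-confidence input falls through to the final layer.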
Exit Criteria
Common exit conditions include:
- prediction confidence threshold
- entropy below a cutoff
- margin between top classes
- learned halting signal
Confidence governs stopping.
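The first three criteria can each be written as a small predicate over the exit head's softmax output. The cutoff values below are placeholders to be calibrated, not recommendations:

```python
import math

def confidence_exit(probs, tau=0.9):
    # exit if the top-class probability is high enough
    return max(probs) >= tau

def entropy_exit(probs, max_entropy=0.5):
    # exit if predictive entropy (in nats) is below the cutoff
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h <= max_entropy

def margin_exit(probs, min_margin=0.5):
    # exit if the gap between the top two classes is wide enough
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1] >= min_margin
```

A learned halting signal, by contrast, replaces these hand-set rules with a trained scalar output per exit.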
Training Strategies
Joint Training
All exits are trained simultaneously using shared representations.
Progressive Training
Exits are added or activated gradually to stabilize learning.
Loss Weighting
Intermediate exits receive weighted losses to balance supervision between shallow and deep exits.
Training shapes exit behavior.
Relationship to Adaptive Computation Depth
Early exit networks are a concrete implementation of adaptive computation depth, using discrete depth checkpoints rather than continuous halting.
Depth adapts in steps.
Optimization Dynamics
- shallow exits receive frequent supervision
- deeper layers focus on harder inputs
- gradients are depth-dependent
Learning exposure is stratified.
Inference Behavior
At inference time:
- easy inputs exit early
- difficult inputs traverse deeper layers
- average latency drops
- worst-case latency remains unchanged
Efficiency improves on average.
Generalization Considerations
Early exit networks may:
- generalize well on in-distribution easy cases
- misjudge difficulty under distribution shift
- exit too early on unfamiliar inputs
Difficulty is distribution-dependent.
Evaluation Metrics
Early exit networks require compute-aware evaluation:
- accuracy vs average compute
- exit depth distribution
- tail latency (p95 / p99)
- performance under OOD inputs
Accuracy alone is insufficient.
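Two of these metrics, exit depth distribution and average compute, are easy to extract from logged exit depths. A sketch, assuming one unit of cost per layer and using a simple index-based tail estimate:

```python
from collections import Counter

def exit_statistics(exit_depths, layer_cost=1.0):
    """Summarize exit behavior: fraction of inputs per exit depth,
    average compute, and a p95 depth as a tail-latency proxy."""
    n = len(exit_depths)
    dist = {d: c / n for d, c in sorted(Counter(exit_depths).items())}
    avg_compute = sum(exit_depths) / n * layer_cost
    p95_depth = sorted(exit_depths)[min(n - 1, int(0.95 * n))]
    return dist, avg_compute, p95_depth
```

For a workload where 90% of inputs exit at depth 1 and 10% run all 3 layers, average compute is 1.2 layer-costs, but the p95 depth is still 3: the average improves while the tail does not.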
Failure Modes
Common failure modes include:
- premature exiting
- shallow overconfidence
- degraded deep-layer performance
- unstable exit thresholds
Early stopping can be wrong.
Practical Design Guidelines
- warm up without early exits
- calibrate exit confidence carefully
- monitor exit rates per depth
- evaluate under realistic traffic
- reassess thresholds after deployment
Exit policies need tuning.
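Calibrating exit confidence can be framed as a search over held-out data: choose the lowest threshold whose accepted predictions still meet an accuracy target. This is a simplified single-exit sketch; in practice each exit would get its own swept threshold.

```python
def calibrate_threshold(confidences, correct, target_accuracy=0.95):
    """Return the smallest threshold such that predictions at or
    above it hit the target accuracy on validation data; falls
    back to 1.0 (never exit early) if no threshold qualifies."""
    best = 1.0
    for tau in sorted(set(confidences)):
        taken = [ok for c, ok in zip(confidences, correct) if c >= tau]
        if taken and sum(taken) / len(taken) >= target_accuracy:
            best = min(best, tau)
    return best
```

Because the qualifying threshold depends on the validation distribution, it should be re-checked under realistic traffic and after deployment, as the guidelines above note.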
Common Pitfalls
- assuming early exits always save compute
- ignoring tail latency
- freezing thresholds prematurely
- evaluating only average accuracy
- neglecting robustness under shift
Efficiency must be validated.
Summary Characteristics
| Aspect | Early Exit Networks |
|---|---|
| Conditional dimension | Depth |
| Compute savings | Average-case |
| Latency variability | Increased |
| Training complexity | Moderate–High |
| Deployment relevance | High |
Related Concepts
- Architecture & Representation
- Adaptive Computation Depth
- Conditional Computation
- Sparse Inference Optimization
- Compute-Aware Evaluation
- Routing Entropy