Early Exit Networks

Short Definition

Early exit networks are neural architectures that allow inputs to terminate computation at intermediate layers when sufficient confidence is reached.

Definition

Early exit networks augment a standard deep model with intermediate classifiers (“exits”) placed at selected depths. During inference, an input may exit early if an intermediate prediction meets a predefined criterion, reducing average computation while preserving accuracy on easy inputs.

Computation stops when it’s good enough.

Why It Matters

Deep models waste compute on easy cases. Early exit networks:

  • reduce average inference latency
  • lower energy and serving cost
  • help meet real-time latency budgets
  • adapt computation to input difficulty

Not all inputs deserve full depth.

Core Mechanism

Early exit networks introduce:

  • intermediate prediction heads
  • a confidence or uncertainty criterion
  • a policy for exiting vs continuing

Exit decisions control depth.

Minimal Conceptual Illustration


Input → Layer 1 → Exit? ── yes → prediction
             no ↓
          Layer 2 → Exit? ── yes → prediction
             no ↓
          Layer 3 → Final Output
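The cascade above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `layers` and `heads` are placeholder callables standing in for real network blocks and exit classifiers, and the exit rule is a simple top-class confidence threshold.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run layers in order; return at the first exit head whose
    top-class probability clears `threshold`.

    Returns (probs, exit_depth), where exit_depth is 1-indexed.
    """
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)                     # backbone computation
        probs = softmax(head(h))         # intermediate prediction
        if max(probs) >= threshold:
            return probs, depth          # early exit
    return probs, depth                  # fell through to the final exit
```

A lower `threshold` makes more inputs exit early (cheaper, riskier); a higher one pushes more inputs to the final layer.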

Exit Criteria

Common exit conditions include:

  • prediction confidence threshold
  • entropy below a cutoff
  • margin between top classes
  • learned halting signal

Confidence governs stopping.
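The first three criteria above can each be written as a small predicate over the exit head's probability vector (a sketch; threshold names `tau`, `cutoff`, and `delta` are illustrative):

```python
import math

def passes_confidence(probs, tau):
    """Exit if the top-class probability clears the threshold."""
    return max(probs) >= tau

def passes_entropy(probs, cutoff):
    """Exit if predictive entropy (in nats) falls below the cutoff."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy <= cutoff

def passes_margin(probs, delta):
    """Exit if the gap between the top two class probabilities is wide enough."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return (top1 - top2) >= delta
```

A learned halting signal replaces these hand-set thresholds with a trained module, at the cost of extra parameters and training complexity.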

Training Strategies

Joint Training

All exits are trained simultaneously using shared representations.

Progressive Training

Exits are added or activated gradually to stabilize learning.

Loss Weighting

Intermediate exits receive weighted losses to balance shallow and deep learning.

Training shapes exit behavior.
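The loss-weighting idea can be sketched as a weighted sum of per-exit losses (assuming cross-entropy at every exit; the weight vector is a design choice, not a prescribed value):

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class, clamped for stability."""
    return -math.log(max(probs[label], 1e-12))

def multi_exit_loss(exit_probs, label, weights):
    """Weighted sum of per-exit cross-entropy losses.

    `exit_probs` holds one probability vector per exit (shallow to deep);
    `weights` balances shallow vs deep supervision.
    """
    assert len(exit_probs) == len(weights)
    return sum(w * cross_entropy(probs, label)
               for w, probs in zip(weights, exit_probs))
```

Upweighting deep exits biases the backbone toward final accuracy; upweighting shallow exits strengthens early predictions but can distort deep representations.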

Relationship to Adaptive Computation Depth

Early exit networks are a concrete implementation of adaptive computation depth, using discrete depth checkpoints rather than continuous halting.

Depth adapts in steps.

Optimization Dynamics

  • shallow exits receive frequent supervision
  • deeper layers focus on harder inputs
  • gradients are depth-dependent

Learning exposure is stratified.

Inference Behavior

At inference time:

  • easy inputs exit early
  • difficult inputs traverse deeper layers
  • average latency drops
  • worst-case latency remains unchanged

Efficiency improves on average.

Generalization Considerations

Early exit networks may:

  • generalize well on in-distribution easy cases
  • misjudge difficulty under distribution shift
  • exit too early on unfamiliar inputs

Difficulty is distribution-dependent.

Evaluation Metrics

Early exit networks require compute-aware evaluation:

  • accuracy vs average compute
  • exit depth distribution
  • tail latency (p95 / p99)
  • performance under OOD inputs

Accuracy alone is insufficient.
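Given per-example records of where each input exited and whether it was correct, the first two metrics above reduce to simple aggregates (a sketch; the record format `(exit_depth, correct)` is illustrative):

```python
from collections import Counter

def exit_statistics(records, num_exits):
    """Summarize (exit_depth, correct) records into compute-aware metrics:
    overall accuracy, mean exit depth (a compute proxy), and the
    fraction of inputs leaving at each exit."""
    depths = [d for d, _ in records]
    accuracy = sum(c for _, c in records) / len(records)
    mean_depth = sum(depths) / len(depths)
    counts = Counter(depths)
    distribution = {d: counts.get(d, 0) / len(records)
                    for d in range(1, num_exits + 1)}
    return {"accuracy": accuracy,
            "mean_exit_depth": mean_depth,
            "exit_distribution": distribution}
```

Tail latency and OOD behavior need separate measurement: mean exit depth hides the slow tail, and these statistics should be recomputed on shifted data.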

Failure Modes

Common failure modes include:

  • premature exiting
  • shallow overconfidence
  • degraded deep-layer performance
  • unstable exit thresholds

Early stopping can be wrong.

Practical Design Guidelines

  • warm up without early exits
  • calibrate exit confidence carefully
  • monitor exit rates per depth
  • evaluate under realistic traffic
  • reassess thresholds after deployment

Exit policies need tuning.
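One simple way to calibrate an exit threshold on held-out data: sweep candidate thresholds and keep the most permissive one whose exited subset still meets an accuracy floor. This is a sketch of one possible calibration procedure, not a standard algorithm; names like `acc_floor` are illustrative.

```python
def calibrate_threshold(confidences, correct, acc_floor, grid=None):
    """Pick the lowest confidence threshold whose exited subset still
    meets the accuracy floor.  Lower thresholds mean more early exits,
    so iterating the grid in ascending order maximizes compute savings
    subject to the accuracy constraint.  Returns None if no threshold
    satisfies the floor."""
    if grid is None:
        grid = sorted(set(confidences))
    for tau in grid:
        exited = [c for conf, c in zip(confidences, correct) if conf >= tau]
        if exited and sum(exited) / len(exited) >= acc_floor:
            return tau
    return None
```

Per the guidelines above, such thresholds should be recomputed after deployment as traffic drifts.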

Common Pitfalls

  • assuming early exits always save compute
  • ignoring tail latency
  • freezing thresholds prematurely
  • evaluating only average accuracy
  • neglecting robustness under shift

Efficiency must be validated.

Summary Characteristics

  • Conditional dimension: Depth
  • Compute savings: Average-case
  • Latency variability: Increased
  • Training complexity: Moderate–High
  • Deployment relevance: High

Related Concepts

  • Architecture & Representation
  • Adaptive Computation Depth
  • Conditional Computation
  • Sparse Inference Optimization
  • Compute-Aware Evaluation
  • Routing Entropy