Training Instability

Short Definition

Training instability occurs when learning becomes erratic or diverges.

Definition

Training instability refers to situations where a neural network’s training behavior becomes unpredictable, oscillatory, or divergent. Symptoms include exploding loss, NaNs, wildly fluctuating metrics, or failure to converge.

Instability is most often caused by poorly chosen hyperparameters, especially an overly large learning rate, or by numerical precision errors.

Why It Matters

Unstable training:

  • wastes computational resources
  • prevents convergence
  • produces unreliable models
  • obscures underlying problems

Understanding instability helps diagnose optimization failures early.

How It Works (Conceptually)

  • Large or poorly scaled gradients cause overshooting
  • Learning rates set too high magnify every update
  • Poor initialization amplifies early updates
  • Numerical precision errors accumulate into infinities or NaNs

Fundamentally, instability reflects a mismatch between the magnitude of each update and the local curvature of the loss landscape.
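
This mismatch can be reproduced in a few lines. For the quadratic loss 0.5 * curvature * w**2, plain gradient descent is stable only while the learning rate stays below 2 / curvature. The sketch below (with assumed values curvature=10 and a starting weight of 1.0) shows the same update rule converging or exploding depending on which side of that threshold the learning rate falls:

Python
def gradient_descent(lr, curvature=10.0, w=1.0, steps=10):
    """Minimize the quadratic loss 0.5 * curvature * w**2 with plain gradient descent."""
    for _ in range(steps):
        w -= lr * curvature * w        # gradient of the quadratic loss is curvature * w
    return w

print(gradient_descent(lr=0.05))   # below 2 / curvature = 0.2: w shrinks toward 0
print(gradient_descent(lr=0.25))   # above the threshold: |w| grows about 1.5x per step and diverges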

Minimal Python Example

Python
import math

if math.isnan(loss):     # idiomatic NaN check; `loss != loss` also works, since NaN never equals itself
    stop_training()      # placeholder hook: halt the run or roll back to a checkpoint
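
A NaN check alone misses slower failures such as a steadily exploding loss. A slightly fuller monitoring sketch, where train_step is a hypothetical callable that runs one optimization step and returns the scalar loss, and the 1e4 divergence factor is an assumed threshold, might look like this:

Python
import math

def monitored_training(train_step, max_steps=10_000, divergence_factor=1e4):
    """Run train_step repeatedly, stopping on NaN/Inf or runaway loss growth."""
    initial_loss = None
    for step in range(max_steps):
        loss = train_step()                      # hypothetical: one update, returns a non-negative float
        if initial_loss is None:
            initial_loss = loss
        if math.isnan(loss) or math.isinf(loss):
            raise RuntimeError(f"Numerical failure at step {step}: loss={loss}")
        if loss > divergence_factor * initial_loss:
            raise RuntimeError(f"Loss diverging at step {step}: {loss:.3g} vs initial {initial_loss:.3g}")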


Common Pitfalls

  • Increasing the learning rate to “speed things up” without checking stability
  • Ignoring gradient explosion (a gradient clipping sketch follows this list)
  • Failing to monitor loss values
  • Assuming instability will resolve itself
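
One standard remedy for ignored gradient explosions is gradient clipping (see Related Concepts). A minimal sketch, assuming a PyTorch model, optimizer, and computed loss already exist, rescales the global gradient norm before each update; the max_norm=1.0 default here is an assumed value, not a universal recommendation:

Python
import torch

def clipped_step(model, optimizer, loss, max_norm=1.0):
    """Backpropagate, cap the global gradient norm, then apply the optimizer update."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # rescales gradients if their norm exceeds max_norm
    optimizer.step()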

Related Concepts

  • Optimization
  • Learning Rate
  • Gradient Clipping
  • Training Dynamics
  • Convergence