Short Definition
Training instability occurs when learning becomes erratic or diverges.
Definition
Training instability refers to situations where a neural network’s training behavior becomes unpredictable, oscillatory, or divergent. Symptoms include exploding loss, NaNs, wildly fluctuating metrics, or failure to converge.
Instability is often caused by poor hyperparameter choices or numerical issues.
Why It Matters
Unstable training:
- wastes computational resources
- prevents convergence
- produces unreliable models
- obscures underlying problems
Understanding instability helps diagnose optimization failures early.
How It Works (Conceptually)
- Learning rates set too high cause updates to overshoot minima
- Large or poorly scaled gradients amplify that overshooting
- Poor weight initialization magnifies early updates
- Limited numerical precision lets rounding and overflow errors accumulate
At root, instability reflects a mismatch between update magnitude and the local curvature of the loss landscape.
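The overshooting idea can be seen in a minimal sketch: gradient descent on f(x) = x², where the update is x ← x − lr·2x. The function `run` and the step counts here are illustrative choices, not from any library. With a small step the iterate decays toward the minimum; once the effective multiplier |1 − 2·lr| exceeds 1, every step grows the iterate and training diverges.

```python
def run(lr, steps=10, x=1.0):
    """Gradient descent on f(x) = x^2, whose gradient is 2x."""
    for _ in range(steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2*lr)
    return x

print(run(0.1))  # stable: |1 - 2*0.1| = 0.8, x shrinks to ~0.107
print(run(1.5))  # unstable: |1 - 2*1.5| = 2, x explodes to 1024.0
```

The same mechanism operates in deep networks, except the "safe" step size varies across parameters and over time, which is why a fixed learning rate that once worked can suddenly destabilize training.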
Minimal Python Example
```python
if loss != loss:  # NaN check: NaN is the only value not equal to itself
    stop_training()
```
Common Pitfalls
- Increasing learning rate to “speed things up”
- Ignoring gradient explosion
- Failing to monitor loss values
- Assuming instability will resolve itself
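One standard guard against gradient explosion is gradient clipping (see Related Concepts). A pure-Python sketch, assuming gradients are available as a flat list of floats and using a hypothetical `clip_gradients` helper:

```python
import math

def clip_gradients(grads, max_norm=1.0):
    # Rescale the gradient vector if its global L2 norm exceeds max_norm,
    # preserving its direction while bounding the update magnitude.
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

print(clip_gradients([3.0, 4.0]))  # norm 5 is rescaled down to norm 1
```

In practice, frameworks provide this directly (e.g. PyTorch's `torch.nn.utils.clip_grad_norm_`), and it is applied between the backward pass and the optimizer step.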
Related Concepts
- Optimization
- Learning Rate
- Gradient Clipping
- Training Dynamics
- Convergence