Short Definition
Gradient noise refers to the randomness in gradient estimates that arises from mini-batch sampling during training.
Definition
Gradient noise arises when gradients are computed on a mini-batch of data rather than the full dataset. Under standard uniform sampling, each mini-batch gradient is an unbiased but noisy estimate of the full-batch gradient, and the noise reflects the variability introduced by sampling different mini-batches.
Gradient noise is not purely harmful; it can help optimization escape poor solutions and can improve generalization.
Why It Matters
Gradient noise:
- affects convergence speed
- influences training stability
- helps escape saddle points
- impacts generalization behavior
Understanding gradient noise helps explain why smaller batch sizes sometimes outperform larger ones.
How It Works (Conceptually)
- Each mini-batch provides an approximate gradient
- Different batches yield different gradients
- Updates fluctuate around the true gradient direction
- Noise level decreases as batch size increases
For typical uniform mini-batch sampling, the variance of the gradient estimate falls roughly in proportion to 1 / batch size, as the sketch below illustrates.
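The following sketch makes this concrete under assumed conditions: a toy mean-squared-error linear model and illustrative helper names, none of which come from a specific library. It measures how far mini-batch gradients scatter around the full-batch gradient for several batch sizes.
Python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))                     # toy inputs
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # toy targets
w = np.zeros(d)                                 # current parameters

def batch_gradient(indices):
    """Mean-squared-error gradient over the rows in indices."""
    error = X[indices] @ w - y[indices]
    return 2 * X[indices].T @ error / len(indices)

full_grad = batch_gradient(np.arange(n))        # exact full-batch gradient

for batch_size in (8, 64, 512):
    grads = np.array([
        batch_gradient(rng.choice(n, size=batch_size, replace=False))
        for _ in range(200)
    ])
    # Mean squared deviation from the full-batch gradient: the noise level.
    noise = np.mean(np.sum((grads - full_grad) ** 2, axis=1))
    print(f"batch_size={batch_size:4d}  noise={noise:.4f}")
The printed noise level should shrink roughly by a factor of 8 each time the batch size grows by a factor of 8, consistent with the 1 / batch size relationship above.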
Minimal Python Example
Python
gradient = estimate_gradient(mini_batch)  # noisy estimate of the full-batch gradient
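A runnable version of the same idea is sketched below, assuming numpy and a toy linear-regression loss; estimate_gradient and the dataset are illustrative stand-ins, not part of any specific framework.
Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))              # toy inputs
y = X @ np.array([1.0, -2.0, 0.5])           # targets from a known linear model
w = np.zeros(3)                              # current parameters

def estimate_gradient(batch_indices):
    """Mean-squared-error gradient over one mini-batch of row indices."""
    error = X[batch_indices] @ w - y[batch_indices]
    return 2 * X[batch_indices].T @ error / len(batch_indices)

mini_batch = rng.choice(len(X), size=32, replace=False)
gradient = estimate_gradient(mini_batch)               # noisy estimate
full_gradient = estimate_gradient(np.arange(len(X)))   # exact full-batch gradient
print("mini-batch gradient:", np.round(gradient, 3))
print("full-batch gradient:", np.round(full_gradient, 3))
Running this with different random mini-batches shows the estimate fluctuating around the full-batch gradient, which is the gradient noise described above.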
Common Pitfalls
- Assuming noise is always bad
- Using batch sizes so small that training becomes unstable
- Ignoring noise when tuning learning rates
- Confusing gradient noise with noise in the data itself (e.g., label noise)
Related Concepts
- Mini-Batch Gradient Descent
- Batch Size
- Optimization
- Training Dynamics
- Generalization