Gradient Noise

Short Definition

Gradient noise is the randomness in gradient estimates introduced by training on mini-batches instead of the full dataset.

Definition

Gradient noise arises when gradients are computed using a subset of data rather than the full dataset. This noise reflects variability in gradient estimates caused by sampling different mini-batches.
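
In symbols (a standard formulation consistent with the definition above; the notation is not taken from the source): a mini-batch B yields the estimate

LaTeX
g_B(\theta) = \frac{1}{|B|} \sum_{i \in B} \nabla_\theta \ell(x_i; \theta), \qquad \varepsilon_B(\theta) = g_B(\theta) - \nabla_\theta L(\theta)

where L(\theta) is the loss averaged over the full dataset and \varepsilon_B is the gradient noise. Averaged over all possible mini-batches, the noise term is zero, so the mini-batch gradient is an unbiased estimate of the full gradient.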

Gradient noise is not purely harmful; it often helps optimization escape poor solutions and improves generalization.

Why It Matters

Gradient noise:

  • affects convergence speed
  • influences training stability
  • helps escape saddle points
  • impacts generalization behavior

Understanding gradient noise helps explain why smaller batch sizes sometimes generalize better than larger ones.

How It Works (Conceptually)

  • Each mini-batch provides an approximate gradient
  • Different batches yield different gradients
  • Updates fluctuate around the true gradient direction
  • Noise level decreases as batch size increases

The mini-batch gradient is unbiased on average, and its variance is roughly inversely proportional to the batch size: quadrupling the batch size cuts the noise variance to about a quarter.
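
This 1/B behavior can be checked empirically. Below is a minimal sketch (the linear model, data, and batch sizes are invented for illustration): mini-batch gradients of a fixed model are compared against the full-batch gradient, and their spread shrinks roughly in proportion to 1/batch size.

Python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=10_000)
w = np.zeros(3)  # evaluate gradients at a fixed parameter value

def grad(idx):
    """Mean-squared-error gradient over the rows indexed by idx."""
    err = X[idx] @ w - y[idx]
    return 2 * X[idx].T @ err / len(idx)

full = grad(np.arange(len(X)))  # reference: the full-batch gradient
for b in (8, 64, 512):
    estimates = [grad(rng.choice(len(X), size=b)) for _ in range(200)]
    noise_var = np.mean([np.sum((g - full) ** 2) for g in estimates])
    print(f"batch size {b:3d}: mean squared deviation {noise_var:.3f}")
# each 8x increase in batch size shrinks the deviation roughly 8x (~1/B)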

Minimal Python Example

Python
gradient = estimate_gradient(mini_batch)  # unbiased but noisy estimate of the full-batch gradient
weights -= learning_rate * gradient       # each update fluctuates around the true descent direction
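
For a self-contained version (a sketch only: the dataset, the estimate_gradient implementation, and the hyperparameters are illustrative choices, not a prescribed recipe):

Python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.0, size=1_000)  # toy 1-D dataset
theta = 0.0                                        # single model parameter

def estimate_gradient(mini_batch, theta):
    """Gradient of the mean squared error (theta - x)^2 over a mini-batch."""
    return 2.0 * np.mean(theta - mini_batch)

for step in range(200):
    mini_batch = rng.choice(data, size=32)           # random subset of the data
    gradient = estimate_gradient(mini_batch, theta)  # noisy estimate
    theta -= 0.05 * gradient                         # SGD update

print(theta)  # close to 3.0, despite every individual update being noisy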

Common Pitfalls

  • Assuming noise is always bad
  • Using batch sizes so small that training becomes unstable
  • Ignoring noise when tuning learning rates (see the sketch after this list)
  • Confusing gradient noise (sampling variability in the estimate) with noise in the data or labels
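
On the learning-rate pitfall: one widely used heuristic is the linear scaling rule, which raises the learning rate in proportion to the batch size because larger batches carry less gradient noise. It is a rule of thumb, not a guarantee. A minimal sketch (the function name and numbers are illustrative):

Python
def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    # Linear scaling heuristic: a larger batch means less gradient noise,
    # which often tolerates a proportionally larger step size.
    return base_lr * new_batch_size / base_batch_size

# A recipe tuned at lr=0.1 with batch size 32, rerun with batch size 256:
print(scaled_learning_rate(0.1, 32, 256))  # 0.8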

Related Concepts

  • Mini-Batch Gradient Descent
  • Batch Size
  • Optimization
  • Training Dynamics
  • Generalization