Short Definition
Gradient noise refers to the randomness in gradient estimates that arises from mini-batch sampling during training.
Definition
Gradient noise arises when gradients are computed on a mini-batch of data rather than the full dataset. Under standard uniform sampling, each mini-batch gradient is an unbiased but noisy estimate of the full-batch gradient, and the noise reflects the variability introduced by sampling different mini-batches.
Gradient noise is not purely harmful; it can help optimization escape poor solutions and can improve generalization.
Why It Matters
Gradient noise:
- affects convergence speed
- influences training stability
- helps escape saddle points
- impacts generalization behavior
Understanding gradient noise helps explain why smaller batch sizes sometimes outperform larger ones.
How It Works (Conceptually)
- Each mini-batch provides an approximate gradient
- Different batches yield different gradients
- Updates fluctuate around the true gradient direction
- Noise level decreases as batch size increases
For typical uniform mini-batch sampling, the variance of the gradient estimate falls roughly in proportion to 1 / batch size, as the sketch below illustrates.
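The following sketch makes this concrete under assumed conditions: a toy mean-squared-error linear model and illustrative helper names, none of which come from a specific library. It measures how far mini-batch gradients scatter around the full-batch gradient for several batch sizes.
Python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))                     # toy inputs
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # toy targets
w = np.zeros(d)                                 # current parameters

def batch_gradient(indices):
    """Mean-squared-error gradient over the rows in indices."""
    error = X[indices] @ w - y[indices]
    return 2 * X[indices].T @ error / len(indices)

full_grad = batch_gradient(np.arange(n))        # exact full-batch gradient

for batch_size in (8, 64, 512):
    grads = np.array([
        batch_gradient(rng.choice(n, size=batch_size, replace=False))
        for _ in range(200)
    ])
    # Mean squared deviation from the full-batch gradient: the noise level.
    noise = np.mean(np.sum((grads - full_grad) ** 2, axis=1))
    print(f"batch_size={batch_size:4d}  noise={noise:.4f}")
The printed noise level should shrink roughly by a factor of 8 each time the batch size grows by a factor of 8, consistent with the 1 / batch size relationship above.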
Minimal Python Example
Python
gradient = estimate_gradient(mini_batch)  # noisy estimate of the full-batch gradient
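A runnable version of the same idea is sketched below, assuming numpy and a toy linear-regression loss; estimate_gradient and the dataset are illustrative stand-ins, not part of any specific framework.
Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))              # toy inputs
y = X @ np.array([1.0, -2.0, 0.5])           # targets from a known linear model
w = np.zeros(3)                              # current parameters

def estimate_gradient(batch_indices):
    """Mean-squared-error gradient over one mini-batch of row indices."""
    error = X[batch_indices] @ w - y[batch_indices]
    return 2 * X[batch_indices].T @ error / len(batch_indices)

mini_batch = rng.choice(len(X), size=32, replace=False)
gradient = estimate_gradient(mini_batch)               # noisy estimate
full_gradient = estimate_gradient(np.arange(len(X)))   # exact full-batch gradient
print("mini-batch gradient:", np.round(gradient, 3))
print("full-batch gradient:", np.round(full_gradient, 3))
Running this with different random mini-batches shows the estimate fluctuating around the full-batch gradient, which is the gradient noise described above.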
Common Pitfalls
- Assuming noise is always bad
- Using batch sizes so small that training becomes unstable
- Ignoring noise when tuning learning rates
- Confusing gradient noise with noise in the data itself (e.g., label noise)
Related Concepts
- Mini-Batch Gradient Descent
- Batch Size
- Optimization
- Training Dynamics
- Generalization