Learning Rate Scaling

Short Definition

Learning rate scaling adjusts the learning rate based on batch size.

Definition

Learning rate scaling is the practice of modifying the learning rate when the batch size changes, in order to preserve similar training dynamics. As batch size increases and gradient noise decreases, the learning rate often needs to be increased to maintain effective learning.

Proper scaling is essential for stable and efficient large-batch training.

Why It Matters

Without learning rate scaling:

  • training may become slow or stall
  • convergence may degrade
  • large-batch training can fail entirely

Learning rate scaling enables models to train efficiently across different batch sizes.

How It Works (Conceptually)

  • Larger batches → lower gradient noise (lower-variance gradient estimates)
  • The learning rate is increased to compensate for the reduced noise
  • Scaling rules (e.g. the linear or square-root rule) approximate equivalent updates at the new batch size

The goal is to keep the effective size of each parameter update comparable across batch sizes.
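
As an illustrative sketch (not an official recipe from any library), the snippet below contrasts the two most common heuristics: the linear scaling rule, typically associated with SGD, and the more conservative square-root rule, sometimes preferred for adaptive optimizers. The reference values (learning rate 0.1 at batch size 256) are assumptions chosen for the example.

import math

def linear_scaled_lr(base_lr, base_batch_size, batch_size):
    # Linear rule: the learning rate grows in proportion to the batch size.
    return base_lr * (batch_size / base_batch_size)

def sqrt_scaled_lr(base_lr, base_batch_size, batch_size):
    # Square-root rule: a more conservative adjustment.
    return base_lr * math.sqrt(batch_size / base_batch_size)

print(linear_scaled_lr(0.1, 256, 1024))  # 0.4
print(sqrt_scaled_lr(0.1, 256, 1024))    # 0.2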

Minimal Python Example

base_lr, base_batch_size, batch_size = 0.1, 256, 1024  # assumed example values
scaled_lr = base_lr * (batch_size / base_batch_size)   # linear scaling rule -> 0.4
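
Assuming a PyTorch training setup (an illustrative assumption; any framework works the same way), the scaled value is then simply passed to the optimizer:

import torch

scaled_lr = 0.1 * (1024 / 256)   # scaled rate from the linear rule above
model = torch.nn.Linear(10, 1)   # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)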

Common Pitfalls

  • Blindly applying scaling rules far outside the batch-size range where they were validated
  • Ignoring optimizer-specific behavior (adaptive optimizers such as Adam often call for different scaling than SGD)
  • Failing to warm up the learning rate (see the warmup sketch after this list)
  • Assuming that scaling alone guarantees good generalization
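
Warmup deserves a concrete sketch. A common mitigation (a generic pattern, not tied to any specific library) is to ramp the learning rate linearly from near zero up to the scaled target over the first few thousand steps; the step count and target rate below are assumed example values.

def warmup_lr(step, target_lr, warmup_steps=1000):
    # Ramp linearly from ~0 to target_lr during the first warmup_steps
    # updates, then hold the scaled rate constant.
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

print(warmup_lr(0, 0.4))     # ~0.0004 at the first step
print(warmup_lr(999, 0.4))   # 0.4 once warmup completes
print(warmup_lr(5000, 0.4))  # stays at 0.4 afterwards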

Related Concepts

  • Learning Rate
  • Batch Size
  • Large-Batch Training
  • Optimization
  • Training Dynamics