Short Definition
Learning rate scaling adjusts the learning rate based on batch size.
Definition
Learning rate scaling is the practice of modifying the learning rate when the batch size changes, in order to preserve similar training dynamics. As batch size increases and gradient noise decreases, the learning rate often needs to be increased to maintain effective learning.
Proper scaling is essential for stable and efficient large-batch training.
Why It Matters
Without learning rate scaling:
- training may become slow or stall
- convergence may degrade
- large-batch training can fail entirely
Learning rate scaling enables models to train efficiently across different batch sizes.
How It Works (Conceptually)
- Larger batches → lower gradient noise
- Learning rate increased to compensate
- Scaling rules (e.g. the linear scaling rule: multiply the learning rate by the same factor as the batch size) approximate equivalent updates
The goal is to maintain comparable update magnitudes.
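The first bullet can be illustrated directly: averaging more per-sample gradients reduces the noise in the batch gradient roughly as 1/sqrt(batch size). A quick stand-alone simulation (the function name and values here are illustrative, not from any library):

```python
import random
import statistics

def batch_gradient_std(batch_size, trials=2000, seed=0):
    """Std of the mean of `batch_size` noisy per-sample gradient draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(1.0, 1.0) for _ in range(batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

small = batch_gradient_std(8)    # noisier estimate
large = batch_gradient_std(128)  # 16x larger batch -> ~4x less noise
print(small / large)             # close to sqrt(16) = 4
```

Because the noise shrinks while the step size stays fixed, a larger batch leaves headroom to take bigger steps, which is what the scaling rules exploit.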
Minimal Python Example
# Linear scaling rule: grow the learning rate in proportion to batch size
scaled_lr = base_lr * (batch_size / base_batch_size)
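The linear rule above matches SGD-style training; for adaptive optimizers such as Adam, a square-root rule is sometimes used instead. A hedged sketch of both (the function and the numeric values are illustrative):

```python
def scale_lr(base_lr, base_batch_size, batch_size, rule="linear"):
    """Scale a learning rate for a new batch size under a chosen rule."""
    ratio = batch_size / base_batch_size
    if rule == "linear":   # common default for SGD-style training
        return base_lr * ratio
    if rule == "sqrt":     # sometimes preferred for adaptive optimizers
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule}")

print(scale_lr(0.1, 256, 1024, "linear"))  # 4x batch -> 4x learning rate
print(scale_lr(0.1, 256, 1024, "sqrt"))    # 4x batch -> 2x learning rate
```

Which rule works better is an empirical question; neither is guaranteed to transfer across models or optimizers.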
Common Pitfalls
- Applying scaling rules blindly; linear scaling typically breaks down beyond some batch size
- Ignoring optimizer-specific behavior; adaptive optimizers (e.g. Adam) may call for different rules
- Failing to warm up the learning rate, which can destabilize early large-batch training
- Assuming scaling alone guarantees good generalization
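The warm-up pitfall is usually addressed by ramping from a small learning rate up to the scaled target over the first steps. A minimal linear-warmup schedule might look like this (the function name, step counts, and values are illustrative):

```python
def warmup_lr(step, target_lr, warmup_steps=500):
    """Linearly ramp the learning rate from ~0 to target_lr over warmup_steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps

# Ramp to a scaled target of 0.4 over 4 steps, then hold
schedule = [warmup_lr(s, target_lr=0.4, warmup_steps=4) for s in range(6)]
print([round(lr, 6) for lr in schedule])  # [0.1, 0.2, 0.3, 0.4, 0.4, 0.4]
```

In practice the ramp length is a hyperparameter; after warm-up this schedule is often composed with a decay schedule.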
Related Concepts
- Learning Rate
- Batch Size
- Large-Batch Training
- Optimization
- Training Dynamics