Learning Rate Scaling

Short Definition

Learning rate scaling adjusts the learning rate based on batch size.

Definition

Learning rate scaling is the practice of modifying the learning rate when the batch size changes, in order to preserve similar training dynamics. As batch size increases and gradient noise decreases, the learning rate often needs to be increased to maintain effective learning.

Proper scaling is essential for stable and efficient large-batch training.

Why It Matters

Without learning rate scaling:

  • training may become slow or stall
  • convergence may degrade
  • large-batch training can fail entirely

Learning rate scaling enables models to train efficiently across different batch sizes.

How It Works (Conceptually)

  • Larger batches → lower gradient noise (lower-variance gradient estimates)
  • The learning rate is increased to compensate for the reduced noise
  • Scaling rules (e.g. the linear or square-root rule) approximate equivalent updates at the new batch size

The goal is to keep the effective size of each parameter update comparable across batch sizes.
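
As an illustrative sketch (not an official recipe from any library), the snippet below contrasts the two most common heuristics: the linear scaling rule, typically associated with SGD, and the more conservative square-root rule, sometimes preferred for adaptive optimizers. The reference values (learning rate 0.1 at batch size 256) are assumptions chosen for the example.

import math

def linear_scaled_lr(base_lr, base_batch_size, batch_size):
    # Linear rule: the learning rate grows in proportion to the batch size.
    return base_lr * (batch_size / base_batch_size)

def sqrt_scaled_lr(base_lr, base_batch_size, batch_size):
    # Square-root rule: a more conservative adjustment.
    return base_lr * math.sqrt(batch_size / base_batch_size)

print(linear_scaled_lr(0.1, 256, 1024))  # 0.4
print(sqrt_scaled_lr(0.1, 256, 1024))    # 0.2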

Minimal Python Example

base_lr, base_batch_size, batch_size = 0.1, 256, 1024  # assumed example values
scaled_lr = base_lr * (batch_size / base_batch_size)   # linear scaling rule -> 0.4
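
Assuming a PyTorch training setup (an illustrative assumption; any framework works the same way), the scaled value is then simply passed to the optimizer:

import torch

scaled_lr = 0.1 * (1024 / 256)   # scaled rate from the linear rule above
model = torch.nn.Linear(10, 1)   # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)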

Common Pitfalls

  • Blindly applying scaling rules far outside the batch-size range where they were validated
  • Ignoring optimizer-specific behavior (adaptive optimizers such as Adam often call for different scaling than SGD)
  • Failing to warm up the learning rate (see the warmup sketch after this list)
  • Assuming that scaling alone guarantees good generalization
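
Warmup deserves a concrete sketch. A common mitigation (a generic pattern, not tied to any specific library) is to ramp the learning rate linearly from near zero up to the scaled target over the first few thousand steps; the step count and target rate below are assumed example values.

def warmup_lr(step, target_lr, warmup_steps=1000):
    # Ramp linearly from ~0 to target_lr during the first warmup_steps
    # updates, then hold the scaled rate constant.
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

print(warmup_lr(0, 0.4))     # ~0.0004 at the first step
print(warmup_lr(999, 0.4))   # 0.4 once warmup completes
print(warmup_lr(5000, 0.4))  # stays at 0.4 afterwards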

Related Concepts

  • Learning Rate
  • Batch Size
  • Large-Batch Training
  • Optimization
  • Training Dynamics