Learning Rate Schedules

Short Definition

Learning rate schedules change the learning rate during training.

Definition

A learning rate schedule adjusts the step size used by gradient descent as training progresses. Early in training, larger learning rates help the model make rapid progress, while later smaller learning rates allow fine-grained adjustments.

This dynamic approach often leads to faster convergence and better final performance than using a fixed learning rate.

Why It Matters

A single learning rate is rarely optimal across all training phases: a rate large enough for rapid early progress tends to overshoot near a minimum, while a rate small enough for fine convergence makes early training needlessly slow.

How It Works (Conceptually)

  • Start with a relatively large learning rate
  • Gradually reduce it over time
  • Stabilize convergence near minima
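The three steps above can be sketched as a simple exponential decay schedule. The starting rate and decay factor here are illustrative assumptions, not recommended values:

```python
def exponential_decay(initial_lr, decay, epoch):
    """Learning rate after `epoch` epochs of multiplicative decay."""
    return initial_lr * (decay ** epoch)

initial_lr = 0.1  # relatively large starting rate
decay = 0.9       # shrink the rate by 10% each epoch

for epoch in range(5):
    lr = exponential_decay(initial_lr, decay, epoch)
    print(f"epoch {epoch}: lr = {lr:.4f}")
```

Each epoch the rate shrinks by a constant factor, so early updates are large and later updates become progressively finer.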

Minimal Python Example

learning_rate = 0.1
decay = 0.95
learning_rate *= decay  # apply once per epoch (exponential decay)
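Multiplicative decay is only one option. Two other common schedules, sketched here with illustrative hyperparameters, are step decay (drop the rate at fixed intervals) and cosine annealing (smoothly decrease toward a floor):

```python
import math

def step_decay(initial_lr, drop, epochs_per_drop, epoch):
    """Multiply the rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

def cosine_annealing(initial_lr, min_lr, epoch, total_epochs):
    """Smoothly anneal from initial_lr down to min_lr over total_epochs."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + cos)
```

Step decay produces a staircase-shaped curve, while cosine annealing decays slowly at first, faster in the middle, and slowly again at the end.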

Common Pitfalls

  • Decaying too aggressively, which stalls training before the model has converged
  • Never decaying at all, which can cause oscillation around a minimum
  • Blindly copying a schedule tuned for a different model, dataset, or batch size
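The first pitfall is easy to see numerically. With an illustrative starting rate of 0.1, an aggressive per-epoch multiplier effectively freezes training within 20 epochs, while a gentler one leaves room for continued progress:

```python
initial_lr = 0.1

# Aggressive decay: halve the rate every epoch.
aggressive_lr = initial_lr * (0.5 ** 20)   # vanishingly small after 20 epochs

# Gentle decay: shrink by 5% every epoch.
gentle_lr = initial_lr * (0.95 ** 20)      # still a usable step size

print(f"aggressive: {aggressive_lr:.2e}")
print(f"gentle:     {gentle_lr:.4f}")
```

The exact numbers depend on the task; the point is that the decay factor compounds, so small differences per epoch become orders of magnitude over a full run.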

Related Concepts

  • Learning Rate
  • Gradient Descent
  • Optimization