Short Definition
Learning rate schedules change the learning rate during training.
Definition
A learning rate schedule adjusts the step size used by gradient descent as training progresses. Early in training, larger learning rates help the model make rapid progress, while later smaller learning rates allow fine-grained adjustments.
This dynamic approach often leads to faster convergence and better final performance than using a fixed learning rate.
Why It Matters
A single fixed learning rate is rarely optimal across all training phases: a rate large enough for fast early progress tends to overshoot near a minimum, while a rate small enough to converge stably makes early training needlessly slow.
How It Works (Conceptually)
- Start with a relatively large learning rate
- Gradually reduce it over time
- Stabilize convergence near minima
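The steps above can be sketched as a simple exponential decay schedule; the function name and the initial-rate and decay values below are illustrative assumptions, not prescribed by this text:

```python
def exponential_decay(initial_lr: float, decay: float, epoch: int) -> float:
    """Learning rate after `epoch` epochs of multiplicative decay."""
    return initial_lr * decay ** epoch

# Large steps early, progressively smaller steps later (values are illustrative).
for epoch in (0, 10, 50):
    print(epoch, exponential_decay(0.1, 0.95, epoch))
```

Any monotonically decreasing function of the epoch count works the same way; exponential decay is just one common, easy-to-tune choice.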
Minimal Python Example
learning_rate *= decay  # decay is a factor in (0, 1); applied once per epoch
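A fuller, runnable sketch of that one-liner, applying the decay once per epoch inside a training loop (the epoch count and hyperparameter values are assumptions for illustration):

```python
learning_rate = 0.1  # initial step size (illustrative)
decay = 0.9          # multiplicative decay factor per epoch (illustrative)

for epoch in range(5):
    # ... one epoch of gradient-descent updates would use `learning_rate` here ...
    learning_rate *= decay  # shrink the step size for the next epoch

print(f"final learning rate: {learning_rate:.6f}")
```

After 5 epochs the rate is 0.1 × 0.9⁵ ≈ 0.059, so later updates are noticeably finer-grained than the first ones.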
Common Pitfalls
- Decaying too aggressively, so the learning rate becomes too small before the model has converged
- Never decaying at all, leaving steps too large to settle near a minimum
- Blindly copying schedules tuned for a different model, dataset, or batch size
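To see why overly aggressive decay is a pitfall, compare how quickly the learning rate collapses under two decay factors (both values are illustrative, not recommendations):

```python
aggressive, gentle = 0.5, 0.99  # per-epoch decay factors (illustrative)
lr_a = lr_g = 0.1               # same starting learning rate

for _ in range(20):
    lr_a *= aggressive
    lr_g *= gentle

# After 20 epochs the aggressive schedule has shrunk the rate by ~6 orders
# of magnitude, effectively freezing training; the gentle one barely moved.
print(f"aggressive: {lr_a:.2e}, gentle: {lr_g:.4f}")
```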
Related Concepts
- Learning Rate
- Gradient Descent
- Optimization