Short Definition
Large-batch training computes each gradient update from a very large number of training examples, often thousands or more at once.
Definition
Large-batch training refers to training neural networks with batch sizes that are a significant fraction of the dataset or extremely large in absolute terms. This approach reduces gradient noise and enables high hardware utilization but introduces optimization and generalization challenges.
Large-batch training is common in distributed and industrial-scale systems.
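To make the scale concrete, the short sketch below uses illustrative numbers only (the dataset size is roughly that of the ImageNet-1k training split) to show how few optimizer updates one epoch provides once the batch size grows into the thousands.
Python
dataset_size = 1_281_167               # roughly the ImageNet-1k training set
for batch_size in (256, 8192, 32768):
    steps_per_epoch = dataset_size // batch_size
    print(batch_size, steps_per_epoch)  # 256 -> 5004 updates per epoch, 32768 -> 39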
Why It Matters
Large batches:
- reduce gradient variance (see the variance sketch below)
- improve hardware efficiency
- allow faster wall-clock training
However, they may lead to:
- convergence to sharp minima
- poorer generalization
- reduced robustness
Understanding these trade-offs is critical in large-scale training.
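As a rough illustration of the variance point, the sketch below treats per-example gradients as noisy scalars (purely synthetic numbers, not real gradients) and measures how the noise in the mini-batch average shrinks as the batch grows.
Python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-example "gradients": noisy scalars standing in for real gradients.
per_example_grads = rng.normal(loc=1.0, scale=5.0, size=1_000_000)

def batch_gradient_std(batch_size, trials=500):
    """Empirical std of the mini-batch gradient estimate at a given batch size."""
    samples = rng.choice(per_example_grads, size=(trials, batch_size))
    return samples.mean(axis=1).std()

for b in (32, 512, 8192):
    print(b, round(float(batch_gradient_std(b)), 4))
# The std shrinks roughly like 1/sqrt(batch_size): 16x the batch, ~4x less noise.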
How It Works (Conceptually)
- Each gradient is averaged over many samples
- Updates are smoother and less noisy
- Fewer parameter updates occur per epoch
- The learning rate must usually be rescaled (see the linear-scaling sketch below)
Large-batch training changes training dynamics fundamentally.
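One widely cited heuristic for the learning-rate adjustment is the linear scaling rule with warmup, popularized by Goyal et al. (2017) for large-batch ImageNet training. The sketch below is a minimal version; the base configuration (lr 0.1 at batch 256) and the warmup length are placeholder values, not recommendations.
Python
def scaled_lr(base_lr, base_batch, batch_size, step, warmup_steps=500):
    """Linear-scaling heuristic with a short warmup for large-batch SGD."""
    target = base_lr * batch_size / base_batch     # scale lr with the batch size
    if step < warmup_steps:
        return target * (step + 1) / warmup_steps  # ramp up to avoid early divergence
    return target

# Example: a reference setup of lr=0.1 at batch 256, scaled up to batch 8192.
print(scaled_lr(0.1, 256, 8192, step=0))       # tiny warmup value (~0.0064)
print(scaled_lr(0.1, 256, 8192, step=1000))    # 3.2 after warmup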
Minimal Python Example
Python
batch_size = 8192  # an example "large" batch; common small-batch baselines use 32-256
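Building on that single constant, here is a slightly fuller sketch of a large-batch training loop, assuming PyTorch is available; the synthetic dataset, linear model, and hyperparameters are placeholders chosen so the snippet runs on its own.
Python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data stands in for a real dataset so the sketch is self-contained.
features = torch.randn(65_536, 128)
labels = torch.randint(0, 10, (65_536,))
dataset = TensorDataset(features, labels)

batch_size = 8192                        # large batch
base_lr, base_batch = 0.1, 256           # assumed small-batch reference setup
lr = base_lr * batch_size / base_batch   # linear-scaling heuristic (see above)

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)
model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

for x, y in loader:                      # one epoch: only 8 updates at this batch size
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()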
Common Pitfalls
- Assuming faster training implies better models
- Not rescaling the learning rate when the batch size grows
- Running into memory constraints (gradient accumulation, sketched below, is one workaround)
- Ignoring generalization degradation
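When a single device cannot hold the full batch, gradient accumulation is a common workaround: several smaller micro-batches are processed before one optimizer step, giving the same effective batch size. The sketch below assumes PyTorch; the model, data, and sizes are placeholders for illustration.
Python
import torch

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

micro_batch, accum_steps = 256, 32       # effective batch = 256 * 32 = 8192

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 128)    # stand-in micro-batch
    y = torch.randint(0, 10, (micro_batch,))
    loss = loss_fn(model(x), y) / accum_steps  # average over the effective batch
    loss.backward()                      # gradients accumulate in .grad
optimizer.step()                         # one update from the effective large batch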
Related Concepts
- Batch Size
- Gradient Noise
- Learning Rate Scaling
- Optimization
- Generalization