Large-Batch Training

Short Definition

Large-batch training computes each gradient update from an unusually large number of training examples per batch.

Definition

Large-batch training refers to training neural networks with batch sizes that are a significant fraction of the dataset or extremely large in absolute terms. This approach reduces gradient noise and enables high hardware utilization but introduces optimization and generalization challenges.

Large-batch training is common in distributed and industrial-scale systems.

Why It Matters

Large batches:

  • reduce the variance of gradient estimates (see the sketch below)
  • improve hardware efficiency
  • allow faster wall-clock training

However, they may lead to:

  • sharp minima
  • poorer generalization
  • reduced robustness

Understanding these trade-offs is critical in large-scale training.
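
To make the first point above concrete, here is a minimal NumPy sketch (using simulated per-example gradients rather than a real model, with illustrative noise values) showing that the variance of a mini-batch gradient estimate shrinks roughly in proportion to 1/batch_size:

Python
import numpy as np

rng = np.random.default_rng(0)

def gradient_estimate_variance(batch_size, n_trials=1000):
    # Simulate per-example gradients as noisy scalars around a true value of 1.0,
    # then measure how much the mini-batch average varies across repeated trials.
    estimates = rng.normal(loc=1.0, scale=5.0, size=(n_trials, batch_size)).mean(axis=1)
    return estimates.var()

for b in (32, 512, 8192):
    print(f"batch size {b:>5}: variance of gradient estimate ≈ {gradient_estimate_variance(b):.6f}")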

How It Works (Conceptually)

  • Gradients are computed over many samples
  • Updates are smoother and more deterministic
  • Fewer parameter updates per epoch
  • Learning rate must often be adjusted, e.g. rescaled with the batch size (see the sketch below)

Large-batch training changes training dynamics fundamentally.
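
One widely used heuristic (not a guarantee) is the linear scaling rule: scale the base learning rate in proportion to the batch size, usually combined with a warmup period. A minimal sketch, where base_lr, base_batch_size, and warmup_steps are illustrative placeholder values:

Python
def scaled_lr(step, batch_size, base_lr=0.1, base_batch_size=256, warmup_steps=500):
    # Linear scaling rule: the target learning rate grows with the batch size.
    target_lr = base_lr * batch_size / base_batch_size
    if step < warmup_steps:
        # Linear warmup: ramp from near zero up to the target over the first updates.
        return target_lr * (step + 1) / warmup_steps
    return target_lr

print(scaled_lr(step=10, batch_size=8192))    # still warming up
print(scaled_lr(step=1000, batch_size=8192))  # fully scaled: 0.1 * 8192 / 256 = 3.2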

Minimal Python Example

Python
batch_size = 8192  # example large batch
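
For context, a hedged sketch of how such a batch size could be used in an ordinary PyTorch training loop; the synthetic data, model, and learning rate below are placeholders chosen only for illustration:

Python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

batch_size = 8192  # example large batch

# Placeholder synthetic dataset and model, purely for illustration.
X = torch.randn(100_000, 64)
y = torch.randint(0, 10, (100_000,))
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

model = nn.Linear(64, 10)
# Learning rate scaled linearly from an assumed reference batch size of 256.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1 * batch_size / 256)
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()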

Common Pitfalls

  • Assuming faster training implies better models
  • Not adjusting learning rates
  • Running into memory constraints (gradient accumulation, sketched after this list, is one workaround)
  • Ignoring generalization degradation
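
When a single device cannot hold the full batch, gradient accumulation is a common way to keep the effective batch size large. A minimal sketch with illustrative micro-batch and accumulation sizes and placeholder data:

Python
import torch
from torch import nn

# Effective batch of 8192 built from 16 micro-batches of 512 (illustrative sizes).
micro_batch_size, accumulation_steps = 512, 16

model = nn.Linear(64, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
for _ in range(accumulation_steps):
    inputs = torch.randn(micro_batch_size, 64)            # placeholder micro-batch
    targets = torch.randint(0, 10, (micro_batch_size,))
    loss = loss_fn(model(inputs), targets) / accumulation_steps  # average over the effective batch
    loss.backward()                                        # gradients accumulate in .grad
optimizer.step()                                           # single update for the large effective batch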

Related Concepts

  • Batch Size
  • Gradient Noise
  • Learning Rate Scaling
  • Optimization
  • Generalization