Large-Batch Training

Short Definition

Large-batch training computes each gradient update from an unusually large number of training examples per batch.

Definition

Large-batch training refers to training neural networks with batch sizes that are a significant fraction of the dataset or extremely large in absolute terms. This approach reduces gradient noise and enables high hardware utilization but introduces optimization and generalization challenges.

Large-batch training is common in distributed and industrial-scale systems.

Why It Matters

Large batches:

  • reduce the variance of gradient estimates (see the sketch below)
  • improve hardware efficiency
  • allow faster wall-clock training

However, they may lead to:

  • sharp minima
  • poorer generalization
  • reduced robustness

Understanding these trade-offs is critical in large-scale training.
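
To make the first point above concrete, here is a minimal NumPy sketch (using simulated per-example gradients rather than a real model, with illustrative noise values) showing that the variance of a mini-batch gradient estimate shrinks roughly in proportion to 1/batch_size:

Python
import numpy as np

rng = np.random.default_rng(0)

def gradient_estimate_variance(batch_size, n_trials=1000):
    # Simulate per-example gradients as noisy scalars around a true value of 1.0,
    # then measure how much the mini-batch average varies across repeated trials.
    estimates = rng.normal(loc=1.0, scale=5.0, size=(n_trials, batch_size)).mean(axis=1)
    return estimates.var()

for b in (32, 512, 8192):
    print(f"batch size {b:>5}: variance of gradient estimate ≈ {gradient_estimate_variance(b):.6f}")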

How It Works (Conceptually)

  • Gradients are computed over many samples
  • Updates are smoother and more deterministic
  • Fewer parameter updates per epoch
  • Learning rate must often be adjusted, e.g. rescaled with the batch size (see the sketch below)

Large-batch training changes training dynamics fundamentally.
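
One widely used heuristic (not a guarantee) is the linear scaling rule: scale the base learning rate in proportion to the batch size, usually combined with a warmup period. A minimal sketch, where base_lr, base_batch_size, and warmup_steps are illustrative placeholder values:

Python
def scaled_lr(step, batch_size, base_lr=0.1, base_batch_size=256, warmup_steps=500):
    # Linear scaling rule: the target learning rate grows with the batch size.
    target_lr = base_lr * batch_size / base_batch_size
    if step < warmup_steps:
        # Linear warmup: ramp from near zero up to the target over the first updates.
        return target_lr * (step + 1) / warmup_steps
    return target_lr

print(scaled_lr(step=10, batch_size=8192))    # still warming up
print(scaled_lr(step=1000, batch_size=8192))  # fully scaled: 0.1 * 8192 / 256 = 3.2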

Minimal Python Example

Python
batch_size = 8192  # example large batch
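
For context, a hedged sketch of how such a batch size could be used in an ordinary PyTorch training loop; the synthetic data, model, and learning rate below are placeholders chosen only for illustration:

Python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

batch_size = 8192  # example large batch

# Placeholder synthetic dataset and model, purely for illustration.
X = torch.randn(100_000, 64)
y = torch.randint(0, 10, (100_000,))
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

model = nn.Linear(64, 10)
# Learning rate scaled linearly from an assumed reference batch size of 256.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1 * batch_size / 256)
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()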

Common Pitfalls

  • Assuming faster training implies better models
  • Not adjusting learning rates
  • Running into memory constraints (gradient accumulation, sketched after this list, is one workaround)
  • Ignoring generalization degradation
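
When a single device cannot hold the full batch, gradient accumulation is a common way to keep the effective batch size large. A minimal sketch with illustrative micro-batch and accumulation sizes and placeholder data:

Python
import torch
from torch import nn

# Effective batch of 8192 built from 16 micro-batches of 512 (illustrative sizes).
micro_batch_size, accumulation_steps = 512, 16

model = nn.Linear(64, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
for _ in range(accumulation_steps):
    inputs = torch.randn(micro_batch_size, 64)            # placeholder micro-batch
    targets = torch.randint(0, 10, (micro_batch_size,))
    loss = loss_fn(model(inputs), targets) / accumulation_steps  # average over the effective batch
    loss.backward()                                        # gradients accumulate in .grad
optimizer.step()                                           # single update for the large effective batch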

Related Concepts

  • Batch Size
  • Gradient Noise
  • Learning Rate Scaling
  • Optimization
  • Generalization