Short Definition
Batch size is the number of training samples processed before a model update.
Definition
Batch size defines how many training examples are used to compute a single gradient update during training. Instead of updating model parameters after every individual sample (fully stochastic updates) or only after processing the entire dataset (full-batch updates), training is typically performed on mini-batches of samples.
Batch size directly affects gradient estimation, training stability, computational efficiency, and generalization behavior.
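For example, with 10,000 training samples and a batch size of 32, one pass over the data (an epoch) consists of roughly 10,000 / 32, i.e. about 313 gradient updates.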
Why It Matters
Batch size influences:
- the noisiness of gradient updates
- convergence speed
- memory usage
- generalization performance
Choosing an inappropriate batch size can lead to unstable training, slow convergence, or poor generalization, even when the model and optimizer are otherwise correct.
How It Works (Conceptually)
- A batch of samples is selected from the training data
- The loss is computed over the batch
- Gradients are averaged across the batch
- Parameters are updated once per batch
Smaller batches produce noisier but more frequent updates, while larger batches produce smoother but less frequent updates.
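This effect can be checked numerically. The sketch below is illustrative and not part of the original entry: it uses NumPy with made-up linear regression data to measure how much a batch-averaged gradient fluctuates around the full-dataset gradient at different batch sizes.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise
X = rng.normal(size=1000)
y = 3.0 * X + rng.normal(scale=0.5, size=1000)
w = 0.0  # current parameter value

def gradient(xb, yb, w):
    # Gradient of 0.5 * mean((w*x - y)^2) with respect to w
    return np.mean((w * xb - yb) * xb)

full_grad = gradient(X, y, w)

for batch_size in (1, 8, 32, 256):
    # Spread of the batch gradient around the full-dataset gradient
    grads = []
    for _ in range(500):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        grads.append(gradient(X[idx], y[idx], w))
    spread = np.std(np.array(grads) - full_grad)
    print(f"batch_size={batch_size:4d}  gradient std dev ~ {spread:.3f}")
The measured spread shrinks roughly in proportion to 1 / sqrt(batch_size), which is the usual statistical picture behind the statement above.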
Minimal Python Example
batch_size = 32

for batch in get_batches(data, batch_size):  # iterate over mini-batches
    loss = compute_loss(batch)               # loss averaged over the batch
    update_parameters(loss)                  # one parameter update per batch
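The loop above uses placeholder functions. For something that runs end to end, the following sketch fills them in with a concrete case: mini-batch gradient descent for simple linear regression in NumPy. The data, model, and learning rate are illustrative assumptions, not part of the original example.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2x + 1 plus noise
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=1000)

w, b = 0.0, 0.0   # parameters
lr = 0.1          # learning rate (assumed value)
batch_size = 32

for epoch in range(5):
    # Shuffle once per epoch, then walk through the data in batches
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]

        # Prediction error on this batch
        err = (w * xb + b) - yb

        # Gradients of 0.5 * mean squared error, averaged across the batch
        grad_w = np.mean(err * xb)
        grad_b = np.mean(err)

        # One parameter update per batch
        w -= lr * grad_w
        b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w=2, b=1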
Common Pitfalls
- Assuming larger batch sizes are always better
- Exceeding memory limits with large batches
- Ignoring the interaction between batch size and learning rate (see the sketch after this list)
- Comparing results across different batch sizes without adjustment
- Treating batch size as purely a performance parameter
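On the learning-rate interaction: a widely used heuristic (an assumption here, not something the entry prescribes) is to scale the learning rate with the batch size, for example linearly relative to a base configuration, and then re-tune. A minimal sketch:
base_lr = 0.1          # tuned at the base batch size (assumed values)
base_batch_size = 32

def scaled_lr(batch_size):
    # Linear scaling heuristic: grow the learning rate in proportion to the batch size.
    # A rule of thumb only; large batches often also need warmup and separate tuning.
    return base_lr * batch_size / base_batch_size

for bs in (32, 64, 256, 1024):
    print(f"batch_size={bs:5d} -> suggested learning rate {scaled_lr(bs):.3f}")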
Related Concepts
- Optimization
- Optimizers
- Gradient Descent
- Learning Rate
- Training Dynamics
- Generalization
- Hyperparameters