Batch Size

Short Definition

Batch size is the number of training samples processed before a model update.

Definition

Batch size defines how many training examples are used to compute a single gradient update during training. Instead of updating model parameters after every individual sample or after the entire dataset, training is typically performed on batches of samples.

Batch size directly affects gradient estimation, training stability, computational efficiency, and generalization behavior.

Why It Matters

Batch size influences:

  • the noisiness of gradient updates
  • convergence speed
  • memory usage
  • generalization performance

Choosing an inappropriate batch size can lead to unstable training, slow convergence, or poor generalization, even when the model and optimizer are otherwise correct.

How It Works (Conceptually)

  • A batch of samples is selected from the training data
  • The loss is computed over the batch
  • Gradients are averaged across the batch
  • Parameters are updated once per batch

Smaller batches produce noisier but more frequent updates, while larger batches produce smoother but less frequent updates.
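This trade-off can be simulated directly. The sketch below (an illustration, not part of any training library) stands in for per-sample gradients with noisy observations of a known true value, then measures how much the batch-averaged gradient estimate fluctuates at different batch sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-sample "gradients": noisy observations of a true gradient of 1.0
per_sample_grads = 1.0 + rng.normal(0.0, 1.0, size=100_000)

noise = {}
for batch_size in (1, 32, 1024):
    n_batches = len(per_sample_grads) // batch_size
    # Average per-sample gradients within each batch (one update's gradient estimate)
    batch_grads = per_sample_grads[: n_batches * batch_size]
    batch_grads = batch_grads.reshape(n_batches, batch_size).mean(axis=1)
    noise[batch_size] = batch_grads.std()  # spread of the gradient estimate

print(noise)  # spread shrinks roughly as 1 / sqrt(batch_size)
```

Note the cost of the smoother estimate: at batch size 1024 the same data budget yields roughly a thousand times fewer updates than at batch size 1.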

Minimal Python Example

Python
batch_size = 32
for batch in get_batches(data, batch_size):
    loss = compute_loss(batch)    # average loss over the batch
    update_parameters(loss)       # one parameter update per batch
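The helpers above are placeholders. A self-contained version of the same loop, using NumPy and a toy linear-regression problem (the data, the single weight `w`, and the `get_batches` implementation are illustrative assumptions):

```python
import numpy as np

def get_batches(data, batch_size, rng):
    # Shuffle indices, then yield successive slices of batch_size samples
    X, y = data
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start : start + batch_size]
        yield X[sel], y[sel]

# Toy data: y = 2x + noise; the model is a single weight w
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=256)

w = 0.0
lr = 0.1
for epoch in range(20):
    for Xb, yb in get_batches((X, y), batch_size=32, rng=rng):
        pred = w * Xb[:, 0]
        grad = 2.0 * np.mean((pred - yb) * Xb[:, 0])  # gradient averaged over the batch
        w -= lr * grad                                # one update per batch

print(w)  # close to the true slope of 2.0
```

With 256 samples and batch size 32, each epoch performs 8 parameter updates rather than 256 (per-sample) or 1 (full-batch).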

Common Pitfalls

  • Assuming larger batch sizes are always better
  • Exceeding memory limits with large batches
  • Ignoring interaction between batch size and learning rate
  • Comparing results across batch sizes without adjusting the learning rate or other hyperparameters
  • Treating batch size as purely a performance parameter
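One common heuristic for the batch-size/learning-rate interaction is the linear scaling rule: when the batch size is multiplied by k, multiply the learning rate by k as well. A minimal sketch (the base values below are arbitrary, and the rule is a starting point rather than a guarantee):

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=32):
    # Linear scaling rule: learning rate grows in proportion to batch size
    return base_lr * batch_size / base_batch

print(scaled_lr(32))   # baseline learning rate
print(scaled_lr(256))  # 8x the batch -> 8x the learning rate
```

In practice this heuristic is often combined with a warmup phase at large batch sizes, and it breaks down beyond some problem-dependent batch size.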

Related Concepts

  • Optimization
  • Optimizers
  • Gradient Descent
  • Learning Rate
  • Training Dynamics
  • Generalization
  • Hyperparameters