Short Definition
Batch size is the number of training samples processed before a model update.
Definition
Batch size defines how many training examples are used to compute a single gradient update during training. Instead of updating model parameters after every individual sample (fully stochastic updates) or only after processing the entire dataset (full-batch updates), training is typically performed on mini-batches of samples.
Batch size directly affects gradient estimation, training stability, computational efficiency, and generalization behavior.
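For example, with 10,000 training samples and a batch size of 32, one pass over the data (an epoch) consists of roughly 10,000 / 32, i.e. about 313 gradient updates.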
Why It Matters
Batch size influences:
- the noisiness of gradient updates
- convergence speed
- memory usage
- generalization performance
Choosing an inappropriate batch size can lead to unstable training, slow convergence, or poor generalization, even when the model and optimizer are otherwise correct.
How It Works (Conceptually)
- A batch of samples is selected from the training data
- The loss is computed over the batch
- Gradients are averaged across the batch
- Parameters are updated once per batch
Smaller batches produce noisier but more frequent updates, while larger batches produce smoother but less frequent updates.
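This effect can be checked numerically. The sketch below is illustrative and not part of the original entry: it uses NumPy with made-up linear regression data to measure how much a batch-averaged gradient fluctuates around the full-dataset gradient at different batch sizes.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise
X = rng.normal(size=1000)
y = 3.0 * X + rng.normal(scale=0.5, size=1000)
w = 0.0  # current parameter value

def gradient(xb, yb, w):
    # Gradient of 0.5 * mean((w*x - y)^2) with respect to w
    return np.mean((w * xb - yb) * xb)

full_grad = gradient(X, y, w)

for batch_size in (1, 8, 32, 256):
    # Spread of the batch gradient around the full-dataset gradient
    grads = []
    for _ in range(500):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        grads.append(gradient(X[idx], y[idx], w))
    spread = np.std(np.array(grads) - full_grad)
    print(f"batch_size={batch_size:4d}  gradient std dev ~ {spread:.3f}")
The measured spread shrinks roughly in proportion to 1 / sqrt(batch_size), which is the usual statistical picture behind the statement above.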
Minimal Python Example
batch_size = 32

for batch in get_batches(data, batch_size):  # iterate over mini-batches
    loss = compute_loss(batch)               # loss averaged over the batch
    update_parameters(loss)                  # one parameter update per batch
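The loop above uses placeholder functions. For something that runs end to end, the following sketch fills them in with a concrete case: mini-batch gradient descent for simple linear regression in NumPy. The data, model, and learning rate are illustrative assumptions, not part of the original example.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2x + 1 plus noise
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=1000)

w, b = 0.0, 0.0   # parameters
lr = 0.1          # learning rate (assumed value)
batch_size = 32

for epoch in range(5):
    # Shuffle once per epoch, then walk through the data in batches
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]

        # Prediction error on this batch
        err = (w * xb + b) - yb

        # Gradients of 0.5 * mean squared error, averaged across the batch
        grad_w = np.mean(err * xb)
        grad_b = np.mean(err)

        # One parameter update per batch
        w -= lr * grad_w
        b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w=2, b=1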
Common Pitfalls
- Assuming larger batch sizes are always better
- Exceeding memory limits with large batches
- Ignoring the interaction between batch size and learning rate (see the sketch after this list)
- Comparing results across different batch sizes without adjustment
- Treating batch size as purely a performance parameter
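On the learning-rate interaction: a widely used heuristic (an assumption here, not something the entry prescribes) is to scale the learning rate with the batch size, for example linearly relative to a base configuration, and then re-tune. A minimal sketch:
base_lr = 0.1          # tuned at the base batch size (assumed values)
base_batch_size = 32

def scaled_lr(batch_size):
    # Linear scaling heuristic: grow the learning rate in proportion to the batch size.
    # A rule of thumb only; large batches often also need warmup and separate tuning.
    return base_lr * batch_size / base_batch_size

for bs in (32, 64, 256, 1024):
    print(f"batch_size={bs:5d} -> suggested learning rate {scaled_lr(bs):.3f}")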
Related Concepts
- Optimization
- Optimizers
- Gradient Descent
- Learning Rate
- Training Dynamics
- Generalization
- Hyperparameters