Mini-Batch Gradient Descent

Short Definition

Mini-batch gradient descent updates model parameters using small batches of data.

Definition

Mini-batch gradient descent is an optimization strategy where gradients are computed using a subset (batch) of the training data rather than a single example or the entire dataset. It represents a practical compromise between stochastic gradient descent and full-batch gradient descent.

This approach is the standard method used in modern neural network training.
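
As a concrete sketch of "gradient from a subset," the snippet below estimates the gradient of a mean-squared-error loss for a simple linear model from a mini-batch and from the full dataset. NumPy, the synthetic data, and the `mse_gradient` helper are all illustrative choices, not part of the definition above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = 2x + noise (illustrative).
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

def mse_gradient(w, X, y):
    """Gradient of mean squared error for the linear model y_hat = X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

w = np.zeros(1)

# Full-batch gradient: computed over all 1000 examples.
full_grad = mse_gradient(w, X, y)

# Mini-batch gradient: same formula, but on a random subset of 32 examples.
idx = rng.choice(len(y), size=32, replace=False)
mini_grad = mse_gradient(w, X[idx], y[idx])

# The mini-batch gradient is a noisy but unbiased estimate of the full gradient.
print(full_grad, mini_grad)
```

The two estimates differ, but point in the same direction on average, which is what makes updating from small batches work.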

Why It Matters

Mini-batch gradient descent:

  • balances computational efficiency and gradient stability
  • enables parallel computation on GPUs
  • introduces beneficial gradient noise
  • scales well to large datasets

Without mini-batches, training deep neural networks at scale would be impractical.

How It Works (Conceptually)

  • Training data is split into batches
  • Each batch produces one gradient estimate
  • Parameters are updated after each batch
  • Multiple batches form one training epoch

The batch size controls the trade-off between noise and stability.
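
The steps above can be sketched with a simple batching helper. The function name `batches` and the per-epoch shuffling are illustrative conventions, not prescribed by the text:

```python
import numpy as np

def batches(data, batch_size, rng=None):
    """Yield successive mini-batches; one full pass over the data is an epoch."""
    indices = np.arange(len(data))
    if rng is not None:
        rng.shuffle(indices)  # shuffle once per epoch so batches differ across epochs
    for start in range(0, len(data), batch_size):
        yield data[indices[start:start + batch_size]]

data = np.arange(10)
epoch = list(batches(data, batch_size=4))
# 10 examples with batch_size=4 -> batches of 4, 4, and 2 (last batch is smaller)
print([len(b) for b in epoch])  # [4, 4, 2]
```

Each yielded batch would produce one gradient estimate and one parameter update; iterating through all of them completes one epoch.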

Minimal Python Example

Python
for batch in batches(data, batch_size):
    loss = compute_loss(batch)           # forward pass on one mini-batch
    gradients = compute_gradients(loss)  # backward pass
    update_parameters(gradients)         # one parameter update per batch

Common Pitfalls

  • Confusing mini-batch training with single-example stochastic training
  • Using batch sizes that are too small or too large
  • Ignoring batch size–learning rate interaction
  • Comparing results across different batch sizes unfairly
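
One widely used heuristic for the batch size–learning rate interaction is linear scaling: when the batch size grows by a factor k, the learning rate is scaled by the same factor. This is a rule of thumb rather than a guarantee, and the base values below are purely illustrative:

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    """Linear scaling heuristic: learning rate grows in proportion to batch size."""
    return base_lr * (batch_size / base_batch_size)

# Illustrative base values: learning rate 0.1 tuned at batch size 256.
print(scaled_learning_rate(0.1, 256, 1024))  # 0.4
```

In practice the scaled rate is usually combined with a warmup period, and the heuristic breaks down for very large batch sizes.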

Related Concepts

  • Batch Size
  • Gradient Descent
  • Optimization
  • Training Dynamics
  • Gradient Noise