Mini-Batch Gradient Descent

Short Definition

Mini-batch gradient descent updates model parameters using small batches of data.

Definition

Mini-batch gradient descent is an optimization strategy where gradients are computed using a subset (batch) of the training data rather than a single example or the entire dataset. It represents a practical compromise between stochastic gradient descent and full-batch gradient descent.

This approach is the standard method used in modern neural network training.
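
As a concrete sketch of "gradient from a subset," the snippet below estimates the gradient of a mean-squared-error loss for a simple linear model from a mini-batch and from the full dataset. NumPy, the synthetic data, and the `mse_gradient` helper are all illustrative choices, not part of the definition above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = 2x + noise (illustrative).
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

def mse_gradient(w, X, y):
    """Gradient of mean squared error for the linear model y_hat = X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

w = np.zeros(1)

# Full-batch gradient: computed over all 1000 examples.
full_grad = mse_gradient(w, X, y)

# Mini-batch gradient: same formula, but on a random subset of 32 examples.
idx = rng.choice(len(y), size=32, replace=False)
mini_grad = mse_gradient(w, X[idx], y[idx])

# The mini-batch gradient is a noisy but unbiased estimate of the full gradient.
print(full_grad, mini_grad)
```

The two estimates differ, but point in the same direction on average, which is what makes updating from small batches work.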

Why It Matters

Mini-batch gradient descent:

  • balances computational efficiency and gradient stability
  • enables parallel computation on GPUs
  • introduces beneficial gradient noise
  • scales well to large datasets

Without mini-batches, training deep neural networks at scale would be impractical.

How It Works (Conceptually)

  • Training data is split into batches
  • Each batch produces one gradient estimate
  • Parameters are updated after each batch
  • Multiple batches form one training epoch

The batch size controls the trade-off between noise and stability.
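
The steps above can be sketched with a simple batching helper. The function name `batches` and the per-epoch shuffling are illustrative conventions, not prescribed by the text:

```python
import numpy as np

def batches(data, batch_size, rng=None):
    """Yield successive mini-batches; one full pass over the data is an epoch."""
    indices = np.arange(len(data))
    if rng is not None:
        rng.shuffle(indices)  # shuffle once per epoch so batches differ across epochs
    for start in range(0, len(data), batch_size):
        yield data[indices[start:start + batch_size]]

data = np.arange(10)
epoch = list(batches(data, batch_size=4))
# 10 examples with batch_size=4 -> batches of 4, 4, and 2 (last batch is smaller)
print([len(b) for b in epoch])  # [4, 4, 2]
```

Each yielded batch would produce one gradient estimate and one parameter update; iterating through all of them completes one epoch.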

Minimal Python Example

Python
for batch in batches(data, batch_size):
    loss = compute_loss(batch)           # forward pass on one mini-batch
    gradients = compute_gradients(loss)  # backward pass
    update_parameters(gradients)         # one parameter update per batch

Common Pitfalls

  • Confusing mini-batch training with single-example stochastic training
  • Using batch sizes that are too small or too large
  • Ignoring batch size–learning rate interaction
  • Comparing results across different batch sizes unfairly
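
One widely used heuristic for the batch size–learning rate interaction is linear scaling: when the batch size grows by a factor k, the learning rate is scaled by the same factor. This is a rule of thumb rather than a guarantee, and the base values below are purely illustrative:

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    """Linear scaling heuristic: learning rate grows in proportion to batch size."""
    return base_lr * (batch_size / base_batch_size)

# Illustrative base values: learning rate 0.1 tuned at batch size 256.
print(scaled_learning_rate(0.1, 256, 1024))  # 0.4
```

In practice the scaled rate is usually combined with a warmup period, and the heuristic breaks down for very large batch sizes.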

Related Concepts

  • Batch Size
  • Gradient Descent
  • Optimization
  • Training Dynamics
  • Gradient Noise