Short Definition
Mini-batch gradient descent updates model parameters using gradients computed on small batches of training data.
Definition
Mini-batch gradient descent is an optimization strategy where gradients are computed on a small subset (a mini-batch) of the training data rather than on a single example or the entire dataset. It is a practical compromise between stochastic gradient descent, which uses one example per update, and full-batch gradient descent, which uses all of them.
This approach is the standard method for training modern neural networks.
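The three regimes can be compared directly on a toy problem. The following sketch uses NumPy and a hypothetical linear-regression setup (data, weights, and batch size of 32 are all illustrative) to compute the same mean-squared-error gradient on one example, a mini-batch, and the full dataset:

```python
import numpy as np

# Illustrative data: 1000 examples, 3 features, known weights plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

def mse_gradient(w, X_sub, y_sub):
    """Gradient of mean squared error on a subset of the data."""
    residual = X_sub @ w - y_sub
    return 2 * X_sub.T @ residual / len(y_sub)

w = np.zeros(3)
g_full = mse_gradient(w, X, y)            # full-batch: all 1000 examples
g_sgd = mse_gradient(w, X[:1], y[:1])     # stochastic: a single example
g_mini = mse_gradient(w, X[:32], y[:32])  # mini-batch: a subset of 32
```

All three are estimates of the same underlying gradient; the mini-batch version averages over enough examples to reduce noise while remaining cheap to compute.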
Why It Matters
Mini-batch gradient descent:
- balances computational efficiency and gradient stability
- enables parallel computation on GPUs
- introduces beneficial gradient noise
- scales well to large datasets
Without mini-batches, training deep neural networks efficiently would be impractical.
How It Works (Conceptually)
- Training data is shuffled and split into batches
- Each batch produces one gradient estimate
- Parameters are updated after each batch
- Multiple batches form one training epoch
The batch size controls the trade-off between noise and stability.
Minimal Python Example
```python
for batch in batches(data, batch_size):
    loss = compute_loss(batch)
    update_parameters(loss)
```
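That loop can be made fully runnable. The sketch below trains a linear model with mean squared error; the `batches` helper, learning rate, batch size, and epoch count are all illustrative choices, not a prescribed recipe:

```python
import numpy as np

# Illustrative data: 256 examples, 2 features, known weights plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
y = X @ np.array([1.5, -0.5]) + 0.05 * rng.normal(size=256)

def batches(X, y, batch_size, rng):
    """Yield shuffled mini-batches covering the dataset once (one epoch)."""
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

w = np.zeros(2)
lr = 0.1
for epoch in range(20):
    for X_b, y_b in batches(X, y, batch_size=32, rng=rng):
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(y_b)
        w -= lr * grad  # one parameter update per mini-batch
```

Note that the data is reshuffled every epoch, so each pass visits the examples in a different batch order; this is standard practice and helps decorrelate consecutive gradient estimates.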
Common Pitfalls
- Confusing mini-batch training with pure (single-example) stochastic gradient descent
- Using batch sizes that are too small (noisy, unstable gradients) or too large (memory pressure, weaker generalization)
- Ignoring batch size–learning rate interaction
- Comparing results across different batch sizes unfairly
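On the batch size–learning rate interaction: one widely used heuristic is the linear scaling rule, which scales the learning rate in proportion to the batch size. A minimal sketch, where the base values are illustrative and the rule itself is a heuristic rather than a guarantee:

```python
# Linear scaling rule: lr grows proportionally with batch size.
# Base values below are illustrative, not recommended defaults.
BASE_LR = 0.1
BASE_BATCH_SIZE = 32

def scaled_lr(batch_size, base_lr=BASE_LR, base_batch_size=BASE_BATCH_SIZE):
    """Return a learning rate scaled linearly with batch size."""
    return base_lr * batch_size / base_batch_size
```

Under this rule, doubling the batch size doubles the learning rate, which roughly compensates for the lower gradient noise of larger batches; in practice the rule breaks down at very large batch sizes.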
Related Concepts
- Batch Size
- Gradient Descent
- Optimization
- Training Dynamics
- Gradient Noise