Short Definition
Batch normalization normalizes activations using batch statistics.
Definition
Batch normalization standardizes layer activations using the mean and variance of each mini-batch. This reduces shifts in the distribution of internal activations (often called internal covariate shift) during training and stabilizes gradient flow.
In addition to normalization, batch normalization introduces learnable scale and shift parameters, allowing the network to retain expressive power.
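The point about expressive power can be illustrated concretely: if the learned scale equals the batch standard deviation and the learned shift equals the batch mean, the scale-and-shift step exactly undoes normalization. A minimal sketch (the names gamma and beta are the conventional parameter names, not taken from the text above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
mean, std = x.mean(), x.std()
x_hat = (x - mean) / std      # normalized activations: zero mean, unit variance
gamma, beta = std, mean       # learnable parameters, here chosen to invert the step
y = gamma * x_hat + beta      # recovers the original activations exactly
assert np.allclose(y, x)
```

In practice gamma and beta are learned by gradient descent, so the network can choose any point between fully normalized and fully restored activations.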
Why It Matters
Batch normalization enables faster, more stable training and allows higher learning rates without divergence.
How It Works (Conceptually)
- Normalize activations per batch
- Apply learned scale and shift
- Reduce sensitivity to initialization
Minimal Python Example
normalized = (x - batch_mean) / (batch_std + eps)  # small eps avoids division by zero
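The one-liner above can be expanded into a runnable forward pass for a (batch, features) array, normalizing each feature over the batch axis and then applying the learned scale and shift. This is a minimal sketch; the epsilon value and the parameter names gamma and beta are conventional assumptions:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    batch_mean = x.mean(axis=0)    # per-feature mean over the mini-batch
    batch_var = x.var(axis=0)      # per-feature variance over the mini-batch
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)  # normalize
    return gamma * x_hat + beta    # learned scale and shift

x = np.random.randn(32, 4) * 5.0 + 3.0        # a batch far from zero mean / unit variance
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With gamma = 1 and beta = 0, each output feature is approximately
# zero-mean and unit-variance over the batch.
```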
Common Pitfalls
- Using very small batch sizes, which makes the batch statistics noisy and unreliable
- Normalizing with batch statistics at inference instead of running averages accumulated during training
- Confusing batch normalization with input (data) normalization, which is applied once to the raw inputs
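The inference pitfall in the list above comes from the train/inference distinction: during training, each batch is normalized with its own statistics while a running average is accumulated; at inference, the running averages are used instead. A sketch of that bookkeeping, assuming an exponential moving average with a hypothetical momentum of 0.1:

```python
import numpy as np

rng = np.random.default_rng(0)
running_mean, running_var = np.zeros(4), np.ones(4)
momentum = 0.1  # assumed update rate for the running statistics

for _ in range(200):  # simulated training steps
    batch = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
    # Update running statistics from each training batch.
    running_mean = (1 - momentum) * running_mean + momentum * batch.mean(axis=0)
    running_var = (1 - momentum) * running_var + momentum * batch.var(axis=0)

# Inference: a new sample is normalized with the running statistics,
# never with statistics computed from the sample itself.
sample = rng.normal(loc=3.0, scale=2.0, size=(1, 4))
normalized = (sample - running_mean) / np.sqrt(running_var + 1e-5)
```

Deep learning frameworks handle this switch automatically via a training/evaluation mode flag, which is why forgetting to set that flag is a common source of the inference-time error.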
Related Concepts
- Normalization
- Training Stability
- Deep Networks