Batch Normalization

Short Definition

Batch normalization standardizes a layer's activations using the mean and variance of the current mini-batch.

Definition

Batch normalization standardizes each layer's activations using the mean and variance computed over the current mini-batch. This reduces internal covariate shift (the change in the distribution of a layer's inputs as earlier layers update during training) and stabilizes gradient flow.

In addition to normalization, batch normalization introduces learnable scale (gamma) and shift (beta) parameters, so the network can rescale, or even undo, the normalization where that helps, retaining its expressive power.

Why It Matters

Batch normalization allows higher learning rates, speeds up convergence, and makes training more stable and less sensitive to weight initialization.

How It Works (Conceptually)

  • Compute the mean and variance of each feature over the mini-batch
  • Normalize activations to zero mean and unit variance
  • Apply a learned scale and shift to the normalized values
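The steps above can be sketched in NumPy. This is a minimal illustration of the training-time forward pass; the function name `batch_norm_train` and the shapes are illustrative, not a library API:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Standardize a batch of activations, then apply learned scale/shift.

    x: (batch_size, num_features); gamma, beta: (num_features,).
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # per-feature mean is ~0, std is ~1
print(out.std(axis=0))
```

With gamma = 1 and beta = 0 this reduces to pure standardization; in a real layer, gamma and beta are trained by gradient descent along with the other weights.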

Minimal Python Example

eps = 1e-5  # small constant to avoid division by zero
normalized = (x - batch_mean) / (batch_std + eps)

Common Pitfalls

  • Using very small batch sizes, which make the batch statistics noisy
  • Using batch statistics instead of running statistics at inference
  • Confusing it with input (data) normalization, which is applied once to the dataset
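The inference pitfall has a standard remedy: track running estimates of the mean and variance during training and use those, not the current batch's statistics, at test time. A minimal sketch, assuming a momentum-style exponential moving average (the class name `SimpleBatchNorm` and the momentum value are illustrative, not a library API):

```python
import numpy as np

class SimpleBatchNorm:
    """Toy 1-D batch norm that keeps running statistics for inference."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.gamma = np.ones(num_features)         # learned scale
        self.beta = np.zeros(num_features)         # learned shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.eps = eps
        self.momentum = momentum

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # exponential moving average of the batch statistics
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # inference: use accumulated statistics, not the batch's
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = SimpleBatchNorm(4)
rng = np.random.default_rng(0)
for _ in range(100):  # "training" passes update the running stats
    bn(rng.normal(2.0, 3.0, size=(64, 4)), training=True)
single = rng.normal(2.0, 3.0, size=(1, 4))
out = bn(single, training=False)  # well-defined even for batch size 1
```

Because inference uses the stored statistics, the output no longer depends on which other examples happen to share the batch, and single-example prediction works.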

Related Concepts

  • Normalization
  • Training Stability
  • Deep Networks