Hyperparameters

Short Definition

Hyperparameters are configuration values, set outside of training, that control how a neural network is structured and trained.

Definition

Hyperparameters are settings chosen before or outside the training process that influence model architecture, optimization behavior, and training dynamics. Unlike model parameters (weights and biases), hyperparameters are not learned from data but are set by the practitioner.

Examples include learning rate, batch size, number of layers, and regularization strength.

Why It Matters

Hyperparameters strongly affect:

  • convergence speed
  • training stability
  • generalization performance
  • resource usage

Poor hyperparameter choices can prevent learning entirely or lead to misleading results, even when the model and data are otherwise sound.
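A minimal sketch of this effect, using an assumed toy objective f(w) = w², whose gradient is 2w: the same gradient descent loop converges or diverges depending only on the learning rate.

Python
def descend(lr, steps=20, w=1.0):
    # Gradient descent on f(w) = w**2; the gradient is 2*w.
    for _ in range(steps):
        w -= lr * 2 * w  # the update rule is fixed; lr decides its fate
    return w

print(descend(lr=0.1))  # shrinks toward 0: training converges
print(descend(lr=1.1))  # grows in magnitude: training diverges

With lr = 0.1 each step multiplies w by 0.8; with lr = 1.1 each step multiplies it by -1.2, so the iterate oscillates and blows up even though the model and loss are otherwise sound.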

How It Works (Conceptually)

  • Hyperparameters define the learning environment
  • Training algorithms operate within these constraints
  • Gradients update parameters, but hyperparameters shape how those updates behave
  • Changing hyperparameters changes the trajectory of training

Hyperparameters indirectly control how a model explores the loss landscape.
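The separation above can be made concrete with a hypothetical toy task: fitting y = 3x by stochastic gradient descent. The hyperparameters (learning_rate, batch_size, epochs) are fixed before the loop and define its environment; only the parameter w is updated by gradients.

Python
import random

random.seed(0)
data = [(x, 3.0 * x) for x in [i / 10 for i in range(1, 11)]]

# Hyperparameters: chosen before training, never updated by it.
learning_rate, batch_size, epochs = 0.5, 2, 40

w = 0.0  # model parameter: learned from data
for _ in range(epochs):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of mean squared error 0.5 * (w*x - y)**2 w.r.t. w
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad  # update shaped by lr and batch size

print(round(w, 3))  # approaches 3.0

Changing learning_rate or batch_size here does not change what is learned (the target is still w = 3) but changes the trajectory: how large and how noisy each step through the loss landscape is.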

Minimal Python Example

Python
learning_rate = 0.01  # step size for each gradient update
batch_size = 32       # number of examples per gradient estimate
epochs = 50           # full passes over the training data


Common Pitfalls

  • Treating hyperparameters as fixed defaults
  • Tuning hyperparameters on the test set
  • Over-optimizing hyperparameters instead of fixing data issues
  • Assuming more tuning always leads to better generalization
  • Ignoring interactions between hyperparameters
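The second pitfall has a standard remedy: select hyperparameters on a validation split and touch the test set exactly once. A hedged sketch, where evaluate() is a hypothetical stand-in for training a model and scoring it on the named split:

Python
def evaluate(lr, split):
    # Hypothetical score surface: peaks at lr = 0.1,
    # with a small gap between validation and test.
    offsets = {"val": 0.00, "test": 0.02}
    return 1.0 - abs(lr - 0.1) - offsets[split]

candidates = [0.001, 0.01, 0.1, 1.0]

# Choose hyperparameters using the validation split only.
best_lr = max(candidates, key=lambda lr: evaluate(lr, "val"))
print("chosen lr:", best_lr)

# The test split is used once, after selection, to report performance.
print("final score:", evaluate(best_lr, "test"))

Selecting best_lr by test-set score instead would leak information into the choice and inflate the reported result.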

Related Concepts

  • Optimization
  • Optimizers
  • Learning Rate
  • Batch Size
  • Training Dynamics
  • Generalization
  • Hyperparameter Optimization