Short Definition
Hyperparameters are configuration values that control how a neural network is trained.
Definition
Hyperparameters are settings chosen before or outside the training process that influence model architecture, optimization behavior, and training dynamics. Unlike model parameters (weights and biases), hyperparameters are not learned from data but are set by the practitioner.
Examples include learning rate, batch size, number of layers, and regularization strength.
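The parameter/hyperparameter distinction can be made concrete with a minimal gradient-descent sketch (the one-weight model and values here are illustrative): the weight is a parameter updated from data, while the learning rate is a hyperparameter fixed in advance and never touched by the update rule.

```python
# Minimal sketch: fit w to minimize (w * x - y)^2 for one data point.
# w is a *parameter* (learned from data); learning_rate is a
# *hyperparameter* (chosen by the practitioner, never updated).

learning_rate = 0.1   # hyperparameter: set before training
w = 0.0               # parameter: learned during training
x, y = 2.0, 6.0       # single training example (optimal w is 3)

for _ in range(100):
    grad = 2 * (w * x - y) * x   # gradient of squared error w.r.t. w
    w -= learning_rate * grad    # only the parameter changes

print(round(w, 3))  # converges near 3.0
```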
Why It Matters
Hyperparameters strongly affect:
- convergence speed
- training stability
- generalization performance
- resource usage
Poor hyperparameter choices can prevent learning entirely or lead to misleading results, even when the model and data are otherwise sound.
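As a small illustration of "prevent learning entirely": on the same toy one-weight problem, a learning rate that is too large makes each update overshoot so that the error grows instead of shrinking (the specific values are illustrative, not a tuning recommendation).

```python
def train(learning_rate, steps=20):
    """Gradient descent on (w*x - y)^2 for one data point; returns final w."""
    w, x, y = 0.0, 2.0, 6.0
    for _ in range(steps):
        w -= learning_rate * 2 * (w * x - y) * x
    return w

good = train(0.1)   # converges toward the optimum w = 3
bad = train(0.5)    # overshoots: each update magnifies the error
print(abs(good - 3.0) < 1e-3, abs(bad) > 1e3)
```

The model and data are identical in both calls; only the hyperparameter differs, yet one run learns and the other diverges.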
How It Works (Conceptually)
- Hyperparameters define the learning environment
- Training algorithms operate within these constraints
- Gradients update parameters, but hyperparameters shape how those updates behave
- Changing hyperparameters changes the trajectory of training
Hyperparameters indirectly control how a model explores the loss landscape.
Minimal Python Example
```python
learning_rate = 0.01
batch_size = 32
epochs = 50
```
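These values typically feed directly into the training loop; a minimal sketch of how they might be consumed (the dataset size here is an assumption for illustration):

```python
learning_rate = 0.01
batch_size = 32
epochs = 50

n_examples = 10_000                        # illustrative dataset size
steps_per_epoch = n_examples // batch_size # how many updates per pass
total_updates = steps_per_epoch * epochs   # total parameter updates

print(steps_per_epoch, total_updates)  # 312 updates per epoch, 15600 total
```

Note how batch size and epochs jointly determine the total number of parameter updates, one example of the hyperparameter interactions discussed below.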
Common Pitfalls
- Treating hyperparameters as fixed defaults
- Tuning hyperparameters on the test set
- Over-optimizing hyperparameters instead of fixing data issues
- Assuming more tuning always leads to better generalization
- Ignoring interactions between hyperparameters
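The second pitfall, tuning on the test set, is avoided by selecting hyperparameters on a separate validation split and touching the test split only once. A minimal sketch (the candidate values and the stand-in scoring function are illustrative):

```python
import random

random.seed(0)
data = list(range(100))
random.shuffle(data)

# Three-way split: tune on val, report once on test.
train, val, test = data[:60], data[60:80], data[80:]

def score(lr, split):
    """Stand-in for 'train with lr, evaluate on split' (illustrative)."""
    return -abs(lr - 0.01) - 0.001 * len(split)

candidates = [0.1, 0.01, 0.001]
best_lr = max(candidates, key=lambda lr: score(lr, val))  # tune on val only
final_score = score(best_lr, test)                        # test used exactly once
print(best_lr)  # selects 0.01 under this toy score
```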
Related Concepts
- Optimization
- Optimizers
- Learning Rate
- Batch Size
- Training Dynamics
- Generalization
- Hyperparameter Optimization