Cross-Validation

Short Definition

Cross-validation evaluates model performance by training and testing on multiple data splits.

Definition

Cross-validation is a resampling technique used to assess how well a model generalizes to unseen data. The dataset is divided into multiple subsets (folds), and the model is trained and evaluated repeatedly so that each subset is used for validation at least once.

Cross-validation provides a more robust estimate of performance than a single train/test split.

Why It Matters

Performance estimates from a single split can be noisy or misleading. Cross-validation:

  • reduces dependence on a particular data split
  • provides more stable performance estimates
  • helps compare models and hyperparameters fairly

It is especially useful when datasets are small.

How It Works (Conceptually)

  • Split the dataset into k folds of roughly equal size
  • Train the model on k−1 folds
  • Validate on the remaining held-out fold
  • Repeat k times, so each fold serves as the validation set once
  • Aggregate (typically average) the scores across runs

The aggregated score estimates the model's expected generalization performance on unseen data.

Minimal Python Example

scores = []
for fold in folds:
    model = train(data_except(fold))      # fit on the other k−1 folds
    scores.append(evaluate(model, fold))  # score on the held-out fold
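The pseudocode above can be expanded into a self-contained, runnable sketch. The `k_fold_indices` helper and the mean-predictor "model" are illustrative assumptions, not a standard API; in practice any estimator and metric can be substituted.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # toy dataset

scores = []
for train_idx, val_idx in k_fold_indices(len(data), k=3):
    # "Train": fit a mean predictor on the k-1 training folds
    mean = sum(data[i] for i in train_idx) / len(train_idx)
    # "Validate": mean squared error on the held-out fold
    mse = sum((data[i] - mean) ** 2 for i in val_idx) / len(val_idx)
    scores.append(mse)

avg_mse = sum(scores) / len(scores)  # aggregate across the k runs
```

With scikit-learn available, this loop is typically replaced by `sklearn.model_selection.cross_val_score`, which handles splitting, fitting, and scoring in one call.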


Common Pitfalls

  • Applying shuffled cross-validation to time-series data, where folds must respect temporal order
  • Data leakage across folds, e.g. preprocessing fitted on the full dataset before splitting
  • Treating cross-validation scores as exact rather than as noisy estimates
  • High computational cost, since the model is trained k times
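The leakage pitfall is especially subtle: any statistic computed on the full dataset before splitting (a normalization mean, selected features, and so on) carries information from the validation fold into training. A hypothetical sketch of the mechanism:

```python
# Hypothetical illustration of leakage through preprocessing.
data = [1.0, 2.0, 3.0, 100.0]  # last point plays the validation fold
train, val = data[:3], data[3:]

# Leaky: centering statistic computed on ALL data, validation fold included
leaky_mean = sum(data) / len(data)   # 26.5 -- pulled up by the val point

# Correct: statistic computed on the training folds only
safe_mean = sum(train) / len(train)  # 2.0

leaky_val = val[0] - leaky_mean  # 73.5: the val point looks less extreme
safe_val = val[0] - safe_mean    # 98.0: its true novelty is preserved
```

In scikit-learn, chaining preprocessing and model in a `Pipeline` and cross-validating the pipeline as a whole avoids this, because the preprocessing is refit on each training fold.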

Related Concepts

  • Generalization
  • Evaluation Metrics
  • Train/Test Split
  • Hyperparameter Optimization
  • Data Leakage