Calibration

Short Definition

Calibration measures how well model confidence aligns with actual correctness.

Definition

Calibration describes the relationship between a model’s predicted confidence and the observed frequency of correct predictions. A model is well calibrated if, among all predictions made with confidence p, a fraction of approximately p is correct.

Calibration is distinct from accuracy and focuses on probability reliability rather than classification performance.
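
As a concrete illustration of the definition (the numbers here are purely hypothetical): among 1,000 predictions all made with roughly 0.70 confidence, a well-calibrated model would get about 700 of them right.

Python
# hypothetical numbers: 1,000 predictions, each made with ~0.70 confidence
n_predictions = 1000
n_correct = 705  # observed number of correct predictions (hypothetical)
observed_accuracy = n_correct / n_predictions  # 0.705, close to the stated 0.70
print(f"confidence ~0.70, observed accuracy {observed_accuracy:.3f}")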

Why It Matters

A highly accurate model can still be poorly calibrated. Poorly calibrated predictions are systematically overconfident or underconfident, which can lead to systematic errors in downstream decisions.

Calibration is essential in applications where probabilities are directly used for decisions, ranking, or risk assessment.

How It Works (Conceptually)

  • Predictions are grouped by confidence level
  • Observed accuracy is compared to predicted confidence
  • Deviations indicate miscalibration
  • Calibration methods adjust probability outputs

Calibration evaluates the trustworthiness of the model’s probabilities, not correctness alone.
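
The steps above can be sketched directly in NumPy: predictions are grouped into confidence bins, each bin’s observed accuracy is compared to its average confidence, and the weighted gaps are summed into an Expected Calibration Error. The function name, bin count, and toy data below are illustrative assumptions, not a standard implementation.

Python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Group predictions by confidence and compare per-bin accuracy to confidence."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_confidence = confidences[in_bin].mean()  # average predicted confidence
        bin_accuracy = correct[in_bin].mean()        # observed fraction correct
        ece += in_bin.mean() * abs(bin_accuracy - bin_confidence)
    return ece

# toy data (hypothetical): a model whose stated confidences are roughly honest
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = rng.random(1000) < conf  # correct with probability equal to the confidence
print(f"ECE: {expected_calibration_error(conf, correct):.3f}")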

Minimal Python Example

Python
# conceptual check: stated confidence should roughly match observed accuracy
predicted_confidence = 0.80   # average confidence for a group of predictions (hypothetical)
observed_accuracy = 0.78      # fraction of those predictions that were correct (hypothetical)
model_is_calibrated = abs(predicted_confidence - observed_accuracy) < 0.05
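
Going one step beyond this conceptual check, one possible end-to-end sketch uses scikit-learn: calibration_curve compares predicted confidence to observed frequency per bin, and CalibratedClassifierCV adjusts a model’s probability outputs with isotonic regression. The synthetic dataset and the choice of GaussianNB here are illustrative assumptions, not a prescribed setup.

Python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uncalibrated model: naive Bayes probabilities are often overconfident.
raw = GaussianNB().fit(X_train, y_train)
raw_probs = raw.predict_proba(X_test)[:, 1]

# Calibrated wrapper: isotonic regression remaps the probability outputs.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
cal_probs = calibrated.predict_proba(X_test)[:, 1]

# Compare predicted confidence to observed frequency in each bin.
for name, probs in [("raw", raw_probs), ("calibrated", cal_probs)]:
    frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
    gap = np.abs(frac_pos - mean_pred).mean()
    print(f"{name}: mean |accuracy - confidence| per bin = {gap:.3f}")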

Common Pitfalls

  • Confusing calibration with accuracy
  • Ignoring calibration during evaluation
  • Assuming softmax probabilities are calibrated
  • Evaluating calibration on training data

Related Concepts

  • Model Confidence
  • Reliability Diagrams
  • Expected Calibration Error (ECE)
  • Uncertainty Estimation
  • Evaluation Metrics