Short Definition
Reliability diagrams visualize model calibration across confidence levels.
Definition
A reliability diagram plots predicted confidence against observed accuracy by grouping predictions into confidence bins. Each bin compares the average confidence of predictions to the fraction of correct predictions in that bin.
Reliability diagrams provide a visual diagnostic of calibration quality.
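The per-bin quantities described above can be written out explicitly. The notation here is an assumption chosen for illustration, not taken from the text: $B_m$ is the set of predictions whose confidence falls in the $m$-th bin, $\hat{p}_i$ is the confidence of prediction $i$, $\hat{y}_i$ is the predicted label, and $y_i$ is the true label.

```latex
\mathrm{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \mathbf{1}\left[\hat{y}_i = y_i\right],
\qquad
\mathrm{conf}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \hat{p}_i
```

Under this notation, a perfectly calibrated model has $\mathrm{acc}(B_m) = \mathrm{conf}(B_m)$ for every non-empty bin.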
Why It Matters
Numerical metrics alone can hide calibration patterns. Reliability diagrams reveal where a model is overconfident or underconfident and how miscalibration varies across confidence ranges.
They are among the most intuitive tools for judging whether a model's predicted probabilities can be trusted.
How It Works (Conceptually)
- Predictions are divided into confidence bins
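- Accuracy and mean confidence are computed for each bin
- Mean confidence (x-axis) is plotted against accuracy (y-axis)
- Deviations from the diagonal indicate miscalibration: points below it mark overconfidence, points above it mark underconfidence
Perfect calibration lies along the diagonal line, where accuracy equals confidence in every bin.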
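The binning steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `reliability_bins`, the equal-width binning scheme, and the toy data are all assumptions made for the example.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins (hypothetical helper).

    Returns one (mean_confidence, accuracy, count) tuple per non-empty bin,
    ordered from lowest to highest confidence.
    """
    # Per bin: [sum of confidences, number correct, number of predictions]
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are left-closed intervals; conf == 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        totals[idx][0] += conf
        totals[idx][1] += int(ok)
        totals[idx][2] += 1
    return [
        (conf_sum / count, hits / count, count)
        for conf_sum, hits, count in totals
        if count > 0  # empty bins have no defined accuracy, so skip them
    ]


# Toy data for illustration only: confidence and correctness of six predictions.
confidences = [0.95, 0.92, 0.91, 0.65, 0.62, 0.35]
correct = [1, 1, 0, 1, 0, 0]

for mean_conf, acc, count in reliability_bins(confidences, correct, n_bins=5):
    gap = mean_conf - acc  # positive gap: overconfident; negative: underconfident
    print(f"confidence={mean_conf:.2f} accuracy={acc:.2f} n={count} gap={gap:+.2f}")
```

Returning the count alongside each bin keeps sample size visible, which matters when deciding how much weight to give a point on the diagram.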
Minimal Python Example
Python
import matplotlib.pyplot as plt
plt.plot(confidence_bins, accuracy_per_bin, marker="o")  # x: mean confidence per bin, y: accuracy per bin
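A fuller, self-contained version of the line above might look like the following sketch. It assumes matplotlib is installed; the helper name `plot_reliability` is hypothetical, and the per-bin values are made up purely to show the shape of the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_reliability(mean_confidence, accuracy):
    """Draw a reliability diagram from per-bin values (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(4, 4))
    # Dashed diagonal: where a perfectly calibrated model would fall.
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Perfect calibration")
    # One point per bin: mean confidence on x, observed accuracy on y.
    ax.plot(mean_confidence, accuracy, marker="o", label="Model")
    ax.set_xlabel("Mean confidence per bin")
    ax.set_ylabel("Accuracy per bin")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.legend(loc="upper left")
    return fig, ax


# Made-up per-bin values, for illustration only.
confidence_bins = [0.15, 0.35, 0.55, 0.75, 0.95]
accuracy_per_bin = [0.20, 0.35, 0.50, 0.65, 0.80]

fig, ax = plot_reliability(confidence_bins, accuracy_per_bin)
```

To view the result, save it with `fig.savefig(...)`. In this toy data the high-confidence points fall below the diagonal, which is the visual signature of overconfidence.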
Common Pitfalls
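- Using too few bins (which hides variation across confidence ranges) or too many (which leaves few samples per bin)
- Ignoring sample size per bin
- Overinterpreting noisy bins that contain only a handful of predictions
- Treating diagrams as quantitative metrics rather than pairing them with a summary such as Expected Calibration Error (ECE)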
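One way to guard against the sample-size pitfalls above is to attach an uncertainty estimate to each bin before reading anything into it. The sketch below uses the binomial standard error of the bin accuracy. The function names and the `min_count` and `max_se` thresholds are assumptions chosen for illustration, not standard values.

```python
import math


def bin_standard_error(accuracy, count):
    """Binomial standard error of a bin's observed accuracy."""
    return math.sqrt(accuracy * (1.0 - accuracy) / count)


def flag_noisy_bins(bins, min_count=30, max_se=0.1):
    """Return bins whose accuracy is too uncertain to interpret (hypothetical helper).

    `bins` is a list of (mean_confidence, accuracy, count) tuples.
    The count check is needed because the standard error collapses to zero
    when a bin's accuracy is exactly 0 or 1, however few samples it holds.
    """
    return [
        b for b in bins
        if b[2] < min_count or bin_standard_error(b[1], b[2]) > max_se
    ]


# Made-up bins for illustration: (mean confidence, accuracy, sample count).
bins = [(0.55, 0.50, 400), (0.75, 0.70, 20), (0.95, 1.00, 5)]
for mean_conf, acc, count in flag_noisy_bins(bins):
    print(f"bin at confidence {mean_conf:.2f} has only {count} samples; treat with caution")
```

Drawing the standard error as error bars on the diagram, or switching to bins that each hold the same number of predictions, are common alternatives to simply flagging sparse bins.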
Related Concepts
- Calibration
- Model Confidence
- Expected Calibration Error (ECE)
- Evaluation Metrics
- Uncertainty Estimation