Short Definition
Expected Calibration Error (ECE) measures how far predicted confidences deviate from actual accuracy.
Definition
Expected Calibration Error (ECE) is a scalar metric that quantifies model miscalibration by comparing predicted confidence with observed accuracy across confidence bins. Predictions are grouped into bins by confidence, and ECE is the average absolute difference between each bin's mean confidence and its accuracy, weighted by the fraction of samples in the bin.
ECE summarizes calibration quality into a single number.
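For example, with 10 equal-width bins, a prediction whose top-class probability is 0.87 falls into the [0.8, 0.9) bin. A minimal sketch of this grouping step, assuming NumPy and illustrative variable names:

import numpy as np

confidences = np.array([0.55, 0.87, 0.91, 0.62, 0.99])  # top-class probabilities
n_bins = 10

# Map each confidence to an equal-width bin index in 0..n_bins-1
bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
print(bin_ids)  # [5 8 9 6 9]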
Why It Matters
A model can be highly accurate yet poorly calibrated. ECE reveals whether predicted probabilities can be trusted as probabilities.
ECE is widely used to evaluate confidence reliability in classification models, especially when probabilities are used for decision-making or ranking.
How It Works (Conceptually)
- Predictions are divided into confidence bins
- For each bin, average confidence and accuracy are computed
- The absolute difference is calculated per bin
- Differences are averaged, weighted by bin size
Lower ECE indicates better calibration.
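A short sketch of these steps, computing the per-bin statistics that also feed a reliability diagram; it assumes a NumPy array of top-class confidences and a 0/1 array marking which predictions were correct (all names are illustrative):

import numpy as np

def per_bin_stats(confidences, correct, n_bins=10):
    # Return (bin index, mean confidence, accuracy, count) for each non-empty bin
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    stats = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            stats.append((b, confidences[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return stats

The weighted average of |accuracy − mean confidence| over these entries, with weights proportional to the counts, is the ECE.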
Mathematical Definition
ECE = Σ_i (|B_i| / N) × |acc(B_i) − conf(B_i)|
Where:
- B_i is the set of predictions falling in confidence bin i, and |B_i| is the number of samples in it
- acc(B_i) is the accuracy in bin i
- conf(B_i) is the average confidence in bin i
- N is the total number of samples
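A small worked example with illustrative numbers: suppose N = 100 predictions fall into two occupied bins.

Bin 1: 60 samples, conf(B_1) = 0.70, acc(B_1) = 0.65 → (60/100) × |0.65 − 0.70| = 0.030
Bin 2: 40 samples, conf(B_2) = 0.90, acc(B_2) = 0.80 → (40/100) × |0.80 − 0.90| = 0.040
ECE = 0.030 + 0.040 = 0.07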
Minimal Python Example
# Per-bin update: weight the confidence/accuracy gap by the bin's share of samples
ece += (bin_size / total_samples) * abs(bin_accuracy - bin_confidence)
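That line is the per-bin update inside a loop over bins. A fuller, self-contained sketch, assuming NumPy arrays of top-class confidences and 0/1 correctness indicators (the function and variable names are illustrative, not a standard API):

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Equal-width binning of top-class confidences, then the weighted average gap
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    total_samples = len(confidences)
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        bin_size = mask.sum()
        if bin_size == 0:
            continue  # empty bins carry zero weight
        bin_confidence = confidences[mask].mean()
        bin_accuracy = correct[mask].mean()
        ece += (bin_size / total_samples) * abs(bin_accuracy - bin_confidence)
    return ece

print(expected_calibration_error([0.95, 0.85, 0.65, 0.90, 0.75], [1, 1, 0, 0, 1]))  # ≈ 0.38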
Common Pitfalls
- Treating ECE as a performance metric: a model can have low ECE yet low accuracy, since ECE measures calibration, not correctness
- Ignoring sensitivity to bin count and binning strategy (a simple robustness check is sketched after this list)
- Comparing ECE values across different datasets
- Using ECE without visual diagnostics such as reliability diagrams
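One way to address the bin-count pitfall above is to recompute ECE under several bin counts (and, ideally, alternative binning strategies such as equal-mass bins) and report the spread rather than a single value. A sketch that reuses the expected_calibration_error function from the Minimal Python Example, on synthetic data (illustrative only):

import numpy as np

# Synthetic, roughly calibrated toy data: each label is correct with probability equal to its confidence
rng = np.random.default_rng(0)
confidences = rng.uniform(0.5, 1.0, size=1000)
correct = rng.binomial(1, confidences)

# ECE can shift noticeably with the number of bins, so report several values rather than one
for n_bins in (5, 10, 15, 20):
    print(n_bins, expected_calibration_error(confidences, correct, n_bins=n_bins))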
Related Concepts
- Calibration
- Reliability Diagrams
- Model Confidence
- Evaluation Metrics
- Uncertainty Estimation