Short Definition
The F1 score balances precision and recall into a single metric.
Definition
The F1 score is the harmonic mean of precision and recall. It provides a single value that reflects both the correctness of positive predictions and the model’s ability to find all positive cases.
The F1 score is commonly used when precision and recall are both important and when class imbalance is present.
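As a concrete sketch, the three quantities can be computed directly from confusion-matrix counts. The counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical counts from a binary classifier's confusion matrix.
tp, fp, fn = 8, 2, 4  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # correctness of positive predictions: 8/10
recall = tp / (tp + fn)     # coverage of actual positives: 8/12

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Note that F1 depends only on these three counts; true negatives never enter the formula, which is one reason F1 behaves differently from accuracy.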
Why It Matters
Accuracy can be misleading on imbalanced datasets. The F1 score offers a more informative measure by penalizing extreme imbalances between precision and recall.
A high F1 score indicates that a model achieves a good balance between false positives and false negatives.
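A small illustration of the accuracy pitfall, using a hypothetical test set with 1% positives and a degenerate model that predicts the negative class for every sample:

```python
# Hypothetical imbalanced test set: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10
# Degenerate model: predicts "negative" for everything.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

# Define precision/recall/F1 as 0.0 when their denominators are zero.
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy, f1)  # accuracy looks excellent, F1 exposes the failure
```

Accuracy is 0.99 even though the model finds no positives at all; F1 is 0.0.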
How It Works (Conceptually)
- Precision measures the fraction of positive predictions that are correct
- Recall measures the fraction of actual positives that are found
- The harmonic mean penalizes extreme values: if either metric is near zero, F1 is near zero
- A high F1 therefore requires both metrics to be reasonably high
The F1 score discourages models that optimize only one metric.
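The penalty from the harmonic mean can be seen by comparing it with the arithmetic mean for a hypothetical lopsided model:

```python
precision, recall = 0.95, 0.10  # hypothetical lopsided model

# Arithmetic mean rewards one inflated metric; harmonic mean does not.
arithmetic = (precision + recall) / 2
harmonic = 2 * precision * recall / (precision + recall)

print(round(arithmetic, 3), round(harmonic, 3))
```

The arithmetic mean (0.525) looks moderate, while the harmonic mean (about 0.181) stays close to the weaker metric, which is exactly the behavior that discourages optimizing only one of the two.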
Mathematical Definition
F1 = 2 × (Precision × Recall) / (Precision + Recall)
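Plugging in illustrative values (precision 0.75, recall 0.60, chosen only as an example):

```python
precision, recall = 0.75, 0.60  # illustrative values

# F1 = 2 * (0.75 * 0.60) / (0.75 + 0.60) = 0.9 / 1.35
f1 = 2 * (precision * recall) / (precision + recall)

print(round(f1, 3))  # 0.667
```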
Minimal Python Example
# precision and recall are floats in [0, 1], computed elsewhere.
# Guard against division by zero when both are 0.
denominator = precision + recall
f1 = 2 * (precision * recall) / denominator if denominator > 0 else 0.0
Common Pitfalls
- Using F1 without understanding precision and recall
- Assuming F1 is universally optimal
- Ignoring task-specific cost asymmetry
- Comparing F1 scores across different datasets
Related Concepts
- Precision
- Recall
- Evaluation Metrics
- Confusion Matrix
- Class Imbalance
- Decision Thresholding