Short Definition
Class imbalance occurs when some classes appear much more frequently than others.
Definition
Class imbalance refers to situations where the distribution of target classes is uneven, often with one class dominating the dataset. This is common in real-world problems such as fraud detection, medical diagnosis, and anomaly detection.
Imbalanced data can cause models to favor majority classes while neglecting minority classes.
Why It Matters
Standard evaluation metrics like accuracy can be misleading under class imbalance. A model that always predicts the majority class can score high accuracy while detecting none of the rare but important cases.
Addressing class imbalance is essential for fair and reliable model evaluation.
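To make the accuracy problem concrete, here is a minimal sketch in plain Python. The labels are hypothetical toy data (990 negatives, 10 positives), and the "model" is a stand-in that always predicts the majority class; neither comes from a real dataset.

```python
# Hypothetical toy labels: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A trivial stand-in "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
minority_recall = true_positives / actual_positives

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.99, minority recall=0.00
```

Accuracy looks excellent here even though not a single positive case is detected, which is why per-class metrics such as recall are usually reported alongside it.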
How It Works (Conceptually)
- The model sees more examples of majority classes
- Loss minimization favors frequent patterns
- Minority class errors contribute less to the objective
- The model may ignore rare classes entirely
In short, class imbalance skews the training objective toward the majority classes.
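One common way to counter the effect described above is to weight each class inversely to its frequency, so that every class contributes equally to the objective in total. The sketch below uses the heuristic n_samples / (n_classes * class_count); the function name `balanced_class_weights` and the toy labels are illustrative assumptions, not part of any particular library.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical toy labels: 90 majority (0) and 10 minority (1) examples.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)

# Total weight per class = number of examples * per-example weight.
total_weight = {cls: Counter(y)[cls] * w for cls, w in weights.items()}
print(weights)
print(total_weight)
```

With these weights, each minority example counts nine times as much as a majority example, and both classes carry the same total weight in a weighted loss.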
Minimal Python Example
Python
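from collections import Counter
class_counts = Counter(y)  # y: iterable of class labels
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())  # majority count / minority count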
Common Pitfalls
- Using accuracy as the primary metric
- Ignoring minority class performance
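- Oversampling without proper validation (for example, oversampling before the train/validation split, which leaks duplicated minority examples into the evaluation data)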
- Confusing imbalance with label noise
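The oversampling pitfall can be avoided by splitting the data first and resampling only the training portion. The following is a minimal sketch under stated assumptions: `random_oversample` is a hypothetical helper written for this example, the data is a toy list of 20 rows, and the split is a simple fixed slice rather than a stratified or randomized one.

```python
import random
from collections import Counter

def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, count in counts.items():
        # Rows belonging to this class, used as the pool to draw duplicates from.
        pool = [x for x, lab in zip(features, labels) if lab == cls]
        extra = rng.choices(pool, k=target - count)
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y

# Hypothetical toy data: 20 rows, 4 of them in the minority class (label 1).
X = list(range(20))
y = [1 if i % 5 == 0 else 0 for i in range(20)]

# Split first, then oversample the training part only.
X_train, y_train = X[:15], y[:15]
X_val, y_val = X[15:], y[15:]
X_train_bal, y_train_bal = random_oversample(X_train, y_train)

print(Counter(y_train), Counter(y_train_bal), Counter(y_val))
```

Because the validation rows are set aside before any duplication, they keep the original class distribution and never appear in the resampled training set.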
Related Concepts
- Precision
- Recall
- F1 Score
- Evaluation Metrics
- Sampling Bias