Robustness Metrics
Short Definition
Robustness metrics quantify how resistant a model is to adversarial or worst-case perturbations.
Definition
Robustness metrics are evaluation measures designed to assess a model’s performance under adversarial conditions, corrupted inputs, or constrained worst-case scenarios. Unlike standard metrics computed on clean test data, robustness metrics explicitly account for intentional perturbations or stress conditions.
They operationalize the concept of robustness by making failure measurable.
Why It Matters
High accuracy on clean data does not imply robustness. Robustness metrics reveal whether a model maintains acceptable performance when inputs are adversarially manipulated or perturbed in small but deliberately harmful ways.
They are essential for:
- comparing defensive techniques
- validating robustness claims
- deploying models in security- or safety-sensitive settings
- avoiding misleading evaluations
Without robustness metrics, robustness remains anecdotal.
What Robustness Metrics Measure
Robustness metrics typically evaluate one or more of the following:
- performance under adversarial attacks
- sensitivity to input perturbations
- worst-case error within a perturbation budget
- degradation relative to clean performance
Each metric defines robustness relative to a specific threat model.
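A threat model can be pinned down explicitly in code. The sketch below is a hypothetical specification (the field names and the epsilon value are illustrative assumptions, not a standard API):

```python
from dataclasses import dataclass

# Hypothetical sketch: a threat model fixes the attack assumptions
# that give a robustness metric its meaning.
@dataclass(frozen=True)
class ThreatModel:
    norm: str        # perturbation norm, e.g. "linf" or "l2"
    epsilon: float   # perturbation budget
    access: str      # attacker knowledge: "white-box" or "black-box"

# A common image-classification setting: l_inf budget of 8/255,
# full gradient access for the attacker.
tm = ThreatModel(norm="linf", epsilon=8 / 255, access="white-box")
```

Two robustness numbers are only comparable when they were computed under the same such specification.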
Common Types of Robustness Metrics
- Robust Accuracy: accuracy measured on adversarially perturbed inputs
- Worst-Case Loss: maximum loss within a bounded perturbation set
- Certified Robustness Bounds: guarantees that predictions remain unchanged within a defined region
- Attack Success Rate: fraction of inputs (typically those correctly classified before the attack) that an attack causes to be misclassified
- Perturbation Sensitivity: minimum perturbation required to change a prediction
Different metrics emphasize different failure modes.
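To make one of these metrics concrete, consider perturbation sensitivity for a toy linear classifier. For a linear model, the minimum l_inf perturbation that flips a prediction has a closed form, |w·x + b| / ||w||_1; the weights and input below are illustrative assumptions:

```python
import numpy as np

# Toy linear classifier (weights are illustrative assumptions).
w = np.array([2.0, -1.0])
b = 0.5

def margin(x):
    # Signed score; its sign is the predicted class.
    return x @ w + b

# Perturbation sensitivity: for a linear model, the minimum l_inf
# perturbation that changes the prediction is |w.x + b| / ||w||_1.
x = np.array([1.0, 1.0])
eps_min = abs(margin(x)) / np.sum(np.abs(w))   # 1.5 / 3 = 0.5

# Sanity check: perturbing each coordinate by eps_min against the
# margin lands exactly on the decision boundary.
x_flip = x - eps_min * np.sign(margin(x)) * np.sign(w)
print(eps_min, margin(x_flip))
```

Small values of eps_min indicate a brittle prediction; averaging (or taking the minimum of) eps_min over a test set yields a sensitivity-style robustness score. For non-linear models no closed form exists, and the minimum perturbation must be estimated by an attack.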
How Robustness Metrics Work (Conceptually)
- Define a threat model (attack type, budget, assumptions)
- Generate adversarial or perturbed inputs
- Evaluate model behavior under those conditions
- Aggregate results into a robustness score
Robustness metrics are only meaningful when the threat model is explicit.
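The four steps above can be sketched end to end for a toy linear model, where the worst-case l_inf perturbation is known in closed form (the model, data, and budget are illustrative assumptions; for deep networks the perturbed inputs would instead come from an attack such as PGD):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary linear classifier (weights are illustrative assumptions).
w = np.array([1.0, -2.0])
b = 0.0

def predict(x):
    return np.sign(x @ w + b)

# Step 1: define the threat model -- l_inf budget (assumed value).
eps = 0.3

# Step 2: generate worst-case perturbed inputs. For a linear model,
# the exact l_inf worst case is x - eps * y * sign(w).
X = rng.normal(size=(200, 2))
y = predict(X)  # use the model's own labels, so clean accuracy is 1.0
X_adv = X - eps * y[:, None] * np.sign(w)[None, :]

# Steps 3 and 4: evaluate under attack and aggregate into a score.
clean_accuracy = np.mean(predict(X) == y)
robust_accuracy = np.mean(predict(X_adv) == y)
print(clean_accuracy, robust_accuracy)
```

The gap between clean_accuracy and robust_accuracy is itself a useful metric: it quantifies degradation under the stated threat model, and it is meaningless without the accompanying value of eps.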
Minimal Conceptual Example
robust_accuracy = correct_predictions_under_attack / total_samples
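The formula above can be made runnable with a short NumPy sketch (the labels and under-attack predictions are illustrative assumptions):

```python
import numpy as np

# Illustrative data: true labels and the model's predictions on
# adversarially perturbed versions of the same inputs.
y_true              = np.array([0, 1, 1, 0, 1, 0])
y_pred_under_attack = np.array([0, 0, 1, 0, 0, 1])

correct_predictions_under_attack = np.sum(y_pred_under_attack == y_true)
total_samples = len(y_true)

robust_accuracy = correct_predictions_under_attack / total_samples
print(robust_accuracy)  # 3 correct out of 6 -> 0.5
```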
Common Pitfalls
- Reporting robustness without specifying the attack or budget
- Comparing robustness metrics across incompatible threat models
- Assuming robustness to one attack implies general robustness
- Treating robustness as a single scalar property
Robustness is always relative, not absolute.
Relationship to Other Metrics
Robustness metrics complement, but do not replace, standard evaluation metrics. A model can be accurate but brittle, or robust but less accurate on clean data.
Trade-offs between accuracy and robustness are common and must be evaluated explicitly.
Related Concepts
- Model Robustness
- Adversarial Attacks (Overview)
- Adversarial Examples
- White-Box Attacks
- Black-Box Attacks
- Transferability of Adversarial Examples
- Generalization