Robustness Metrics

Short Definition

Robustness metrics quantify how resistant a model is to adversarial or worst-case perturbations.

Definition

Robustness metrics are evaluation measures designed to assess a model's performance under adversarial conditions, corrupted inputs, or worst-case scenarios within a constrained perturbation set. Unlike standard metrics computed on clean test data, robustness metrics explicitly account for intentional perturbations or stress conditions.

They operationalize the concept of robustness by making failure measurable.

Why It Matters

High accuracy on clean data does not imply robustness. Robustness metrics reveal whether a model maintains acceptable performance when inputs are adversarially manipulated or slightly altered in harmful ways.

They are essential for:

  • comparing defensive techniques
  • validating robustness claims
  • deploying models in security- or safety-sensitive settings
  • avoiding misleading evaluations

Without robustness metrics, robustness remains anecdotal.

What Robustness Metrics Measure

Robustness metrics typically evaluate one or more of the following:

  • performance under adversarial attacks
  • sensitivity to input perturbations
  • worst-case error within a perturbation budget
  • degradation relative to clean performance

Each metric defines robustness relative to a specific threat model.

Common Types of Robustness Metrics

  • Robust Accuracy: accuracy measured on adversarially perturbed inputs
  • Worst-Case Loss: maximum loss within a bounded perturbation set
  • Certified Robustness Bounds: guarantees that predictions remain unchanged within a defined region
  • Attack Success Rate: fraction of correctly classified inputs that an attack causes to be misclassified
  • Perturbation Sensitivity: minimum perturbation required to change a prediction

Different metrics emphasize different failure modes.
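As one concrete instance, perturbation sensitivity can be estimated for a linear classifier, where the worst-case L∞ perturbation has a closed form (push each coordinate by eps against the sign of the weight). This is a minimal sketch with made-up toy weights and a candidate budget grid, not a general attack:

```python
import numpy as np

def min_flip_perturbation(w, b, x, eps_grid):
    """Smallest L-inf budget (from a candidate grid) at which the
    worst-case perturbation flips a linear classifier's prediction."""
    clean_label = np.sign(w @ x + b)
    for eps in eps_grid:
        # For a linear model, the worst-case L-inf perturbation is
        # eps * sign(w), pushed against the current prediction.
        x_adv = x - clean_label * eps * np.sign(w)
        if np.sign(w @ x_adv + b) != clean_label:
            return eps
    return None  # prediction not flipped within any tested budget

# Toy example: clean score is 1.0, which flips once 3 * eps > 1.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([3.0, 1.0])
eps = min_flip_perturbation(w, b, x, eps_grid=[0.1, 0.2, 0.3, 0.4, 0.5])
```

A smaller returned eps means a more sensitive (less robust) prediction; for nonlinear models the same quantity is usually estimated with iterative attacks rather than a closed form.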

How Robustness Metrics Work (Conceptually)

  • Define a threat model (attack type, budget, assumptions)
  • Generate adversarial or perturbed inputs
  • Evaluate model behavior under those conditions
  • Aggregate results into a robustness score

Robustness metrics are only meaningful when the threat model is explicit.
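The steps above can be sketched end to end. This is a minimal illustration, not a real evaluation: the "model" is an arbitrary fixed linear classifier, the threat model is random L∞ noise (a weak stand-in for an actual optimization-based attack such as PGD), and all numbers are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in "model": a fixed random linear classifier over 4 features.
W = rng.normal(size=(2, 4))

def predict(X):
    return np.argmax(X @ W.T, axis=1)

def robust_accuracy(X, y, eps, n_trials=20):
    """Fraction of inputs whose prediction survives every sampled
    perturbation inside an L-inf ball of radius eps. Random sampling
    only upper-bounds true robust accuracy; a real evaluation would
    use a stronger attack."""
    survives = predict(X) == y          # must also be correct on clean input
    for _ in range(n_trials):
        # Step 2: generate perturbed inputs within the budget.
        delta = rng.uniform(-eps, eps, size=X.shape)
        # Step 3: evaluate model behavior under the perturbation.
        survives &= predict(X + delta) == y
    # Step 4: aggregate into a single robustness score.
    return survives.mean()

X = rng.normal(size=(100, 4))
y = predict(X)  # label by the model itself, so clean accuracy is 1.0
print(robust_accuracy(X, y, eps=0.0))  # 1.0: zero budget recovers clean accuracy
```

Note how the score is meaningless without the accompanying threat model: the same function returns different numbers for different eps, and random noise is a much weaker adversary than a gradient-based attack.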

Minimal Conceptual Example

robust_accuracy = correct_predictions_under_attack / total_samples
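The formula above can be made concrete with hypothetical evaluation counts (all numbers below are illustrative, not from any real experiment), alongside the closely related attack success rate and degradation measures:

```python
# Hypothetical counts from one evaluation run under a fixed threat model.
total_samples = 1000
correct_clean = 940                      # clean accuracy: 0.94
correct_predictions_under_attack = 610   # survivors after the attack

robust_accuracy = correct_predictions_under_attack / total_samples

# Attack success rate: of the inputs the model got right on clean data,
# what fraction did the attack manage to flip?
attack_success_rate = (correct_clean - correct_predictions_under_attack) / correct_clean

# Degradation: how much accuracy was lost relative to clean performance.
degradation = (correct_clean - correct_predictions_under_attack) / total_samples
```

The three quantities describe the same run from different angles; reporting robust accuracy alone hides how much of the clean accuracy the attack destroyed.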

Common Pitfalls

  • Reporting robustness without specifying the attack or budget
  • Comparing robustness metrics across incompatible threat models
  • Assuming robustness to one attack implies general robustness
  • Treating robustness as a single scalar property

Robustness is always relative, not absolute.

Relationship to Other Metrics

Robustness metrics complement—but do not replace—standard evaluation metrics. A model can be accurate but brittle, or robust but less accurate on clean data.

Trade-offs between accuracy and robustness are common and must be evaluated explicitly.
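A toy comparison with made-up accuracy numbers illustrates why the trade-off must be reported explicitly rather than collapsed into a single score:

```python
# Hypothetical clean/robust accuracies for two models (illustrative only).
models = {
    "standard":    {"clean": 0.95, "robust": 0.10},
    "adv_trained": {"clean": 0.88, "robust": 0.52},
}

# Neither model dominates: the ranking flips depending on which metric
# you optimize for, so both numbers must be reported together.
best_clean = max(models, key=lambda m: models[m]["clean"])
best_robust = max(models, key=lambda m: models[m]["robust"])
```

Which model is "better" depends entirely on the deployment setting and the threat model under which the robust accuracy was measured.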

Related Concepts

  • Model Robustness
  • Adversarial Attacks (Overview)
  • Adversarial Examples
  • White-Box Attacks
  • Black-Box Attacks
  • Transferability of Adversarial Examples
  • Generalization