Short Definition
Targeted attacks are adversarial attacks designed to force a model to predict a specific incorrect output.
Definition
Targeted attacks are a class of adversarial attacks in which the attacker’s goal is not merely to cause an incorrect prediction but to steer the model toward a specific, attacker-chosen class or output. The attacker explicitly specifies the desired misclassification.
Because the attacker must land on one specific class rather than any incorrect one, these attacks demand finer control over the model’s behavior than untargeted attacks and represent a stronger, more deliberate form of adversarial manipulation.
Why It Matters
Targeted attacks demonstrate that models can be manipulated in precise and predictable ways. This has serious implications for systems where specific outcomes carry meaning, such as:
- identity recognition
- authorization systems
- content moderation
- medical or legal classification
A model vulnerable to targeted attacks may be coerced into producing strategically chosen errors.
How Targeted Attacks Work (Conceptually)
- The attacker selects a target class different from the true label
- An objective is defined to increase the model’s confidence in that target
- Perturbations are optimized to move the input toward the target decision region
- The attack succeeds when the model predicts the chosen target
The attack objective is directional, not merely disruptive.
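The four steps above can be sketched end to end. This is a minimal illustration on a toy linear softmax classifier in NumPy, not a production attack: the model, function names, step size, and iteration count are all assumptions chosen for clarity. The loop repeatedly descends the cross-entropy loss measured against the chosen target class, which pushes the input into the target’s decision region.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_attack(W, x, target, eps=0.1, steps=50):
    """Iterative targeted attack on a toy linear classifier (logits = W @ x).

    Each step takes a signed gradient step (FGSM-style) that DECREASES the
    cross-entropy loss toward `target`, i.e. the objective is directional:
    it moves the input toward the target class, not merely away from the
    true one.
    """
    onehot = np.eye(W.shape[0])[target]
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv)
        grad = W.T @ (p - onehot)        # gradient of targeted loss w.r.t. input
        x_adv = x_adv - eps * np.sign(grad)  # step toward the target region
    return x_adv

# Toy 3-class model: force the prediction to class 2, whatever the clean label.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
x_adv = targeted_attack(W, x, target=2)
print("clean prediction:", np.argmax(W @ x))
print("adversarial prediction:", np.argmax(W @ x_adv))
```

The attack succeeds when the final prediction equals the chosen target; with this toy model and a generous perturbation budget, a few dozen steps suffice.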
Targeted vs Untargeted Attacks
- Targeted attacks: aim for a specific wrong outcome
- Untargeted attacks: aim for any wrong outcome
Targeted attacks are generally harder to execute but more revealing of model vulnerabilities.
Minimal Conceptual Example
# conceptual targeted objective
minimize loss(model(input + perturbation), target_class)
This objective explicitly pushes the prediction toward a chosen target.
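The contrast with the untargeted objective can be made concrete. In this sketch (again a toy linear softmax model in NumPy; all names and the step size are illustrative), the untargeted step *ascends* the loss on the true label, while the targeted step *descends* the loss on the chosen target class; the two differ in both the sign of the step and the label the loss is measured against.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad(W, x, label):
    """Gradient of cross-entropy loss w.r.t. the input, for logits = W @ x."""
    onehot = np.eye(W.shape[0])[label]
    return W.T @ (softmax(W @ x) - onehot)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
true_label = int(np.argmax(W @ x))
target_class = (true_label + 1) % 4  # any class other than the true one
eps = 0.001                          # small step, for illustration only

# Untargeted: ascend the loss on the TRUE label (any wrong class will do).
x_untargeted = x + eps * np.sign(ce_grad(W, x, true_label))

# Targeted: descend the loss on the CHOSEN target class (directional objective).
x_targeted = x - eps * np.sign(ce_grad(W, x, target_class))
```

One step is enough to see the direction of each objective: the untargeted step lowers the model’s confidence in the true label, while the targeted step raises its confidence in the target class.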
Common Characteristics
- Require defining a target output
- Often rely on gradient-based optimization
- More constrained than untargeted attacks
- Can produce semantically meaningful misclassifications
Targeted attacks probe how finely an attacker can control where an input lands relative to the model’s decision boundaries.
Common Pitfalls
- Assuming targeted attacks are unrealistic or impractical
- Evaluating robustness only against untargeted attacks
- Ignoring high-confidence targeted failures
- Treating targeted robustness as binary
Targeted robustness is relative to the attacker’s goals.
Related Concepts
- Model Robustness
- Adversarial Attacks (Overview)
- Untargeted Attacks
- White-Box Attacks
- Black-Box Attacks
- Adversarial Examples