Model Interpretability

Short Definition

Model interpretability is the practice of explaining why a model makes the predictions it does.

Definition

Model interpretability focuses on understanding how a model's inputs influence its outputs. Rather than treating models as black boxes, interpretability techniques aim to reveal which features matter, which decision paths are taken, or how sensitive predictions are to changes in the input.

This is especially critical in high-stakes or regulated applications, such as medical diagnosis or credit scoring.

Why It Matters

Interpretable models build trust with stakeholders, make debugging easier, and can surface spurious correlations before they cause failures in production.

How It Works (Conceptually)

  • Analyze weights and activations
  • Attribute importance to inputs
  • Examine decision behavior
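The second step above, attributing importance to inputs, can be sketched with permutation importance: shuffle one feature's values and measure how much the model's error grows. The "model" and data below are hypothetical stand-ins chosen for illustration, not a specific library API.

```python
import random

# Hypothetical toy "model": a fixed linear function of two features,
# standing in for any trained predictor.
def model(x):
    return 3.0 * x[0] + 0.1 * x[1]

# Toy dataset; targets come from the model itself, so baseline error is zero.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [model(x) for x in X]

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def permutation_importance(feature):
    """How much the error grows when one feature's values are shuffled."""
    col = [x[feature] for x in X]
    random.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return mse([model(x) for x in X_perm], y) - mse([model(x) for x in X], y)

# Feature 0 (weight 3.0) should score far higher than feature 1 (weight 0.1).
scores = [permutation_importance(f) for f in range(2)]
```

Because shuffling feature 0 destroys most of the signal while shuffling feature 1 barely matters, the scores mirror the weights without ever inspecting the model's internals.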

Minimal Python Example

weights = [0.8, -1.5, 0.1]              # hypothetical learned linear-model weights
importance = [abs(w) for w in weights]  # weight magnitude as a rough importance proxy

Common Pitfalls

  • Treating explanations as exact truths
  • Ignoring uncertainty
  • Applying interpretability only after failure
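The second pitfall is easy to demonstrate: any attribution method that relies on random sampling (permutation-based importance is a common case) gives a different score on every run, so a single number should be reported with its spread. A minimal sketch, again using a hypothetical linear model:

```python
import random
import statistics

# Hypothetical model and toy data, assumed for illustration.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1]

random.seed(1)
X = [[random.random(), random.random()] for _ in range(50)]
y = [model(x) for x in X]

def importance_once(feature):
    """One permutation-importance estimate for a single feature."""
    col = [x[feature] for x in X]
    random.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return sum((model(xp) - t) ** 2 for xp, t in zip(X_perm, y)) / len(y)

# Twenty estimates for the same feature: the score is noisy, not exact.
runs = [importance_once(0) for _ in range(20)]
mean, spread = statistics.mean(runs), statistics.stdev(runs)
```

A nonzero spread across repeated runs is a reminder that the explanation is an estimate, not an exact truth about the model.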

Related Concepts

  • Feature Attribution
  • Sparse Networks
  • Debugging Models