Short Definition
Quantization reduces the numerical precision of model parameters.
Definition
Quantization converts high-precision values (such as 32-bit floats) into lower-precision representations (such as 8-bit integers). This reduces memory usage and improves inference speed, especially on constrained hardware.
Quantization is commonly applied after training (post-training quantization), but can also be simulated during training (quantization-aware training), which typically preserves more accuracy.
Why It Matters
Lower-precision weights shrink model size, reduce memory bandwidth, and allow inference to run on fast integer arithmetic, enabling deployment on edge devices, mobile CPUs, and dedicated accelerators.
How It Works (Conceptually)
- Map continuous values to discrete levels
- Trade precision for efficiency
- Apply a scaling factor so the discrete levels span the original value range, preserving accuracy
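The steps above can be sketched as symmetric int8 quantization in plain Python; the function names, the example weights, and the single per-tensor scale are illustrative assumptions, not a fixed scheme:

```python
def quantize(weights, num_bits=8):
    # Map continuous floats to discrete integer levels in [-qmax, qmax]
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scale = max(abs(w) for w in weights) / qmax  # one scale per tensor (assumption)
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; rounding bounds the error by scale / 2
    return [level * scale for level in q]

weights = [0.1, -0.52, 0.33, 0.87]
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

Rounding each value to the nearest level keeps the worst-case error at half a quantization step, which is why the scale is chosen to cover the tensor's full value range.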
Minimal Python Example
Python
scale = 127 / max(abs(w) for w in weights)  # given a list `weights`, map its range onto int8
quantized = [round(w * scale) for w in weights]  # round to nearest, rather than truncating
Common Pitfalls
- Quantizing too early, before the float model and training pipeline are stable
- Ignoring accuracy degradation: always validate the quantized model against the float baseline
- Targeting hardware that lacks efficient low-precision kernels, so the expected speedup never materializes
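One way to catch the accuracy pitfall early is to measure round-trip error before deploying; `int8_round_trip` and the sample activations below are hypothetical, a sketch of the check rather than a full evaluation:

```python
def int8_round_trip(values):
    # Quantize to int8 levels and immediately dequantize
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) * scale for v in values]

activations = [0.91, -0.44, 0.05, 0.67]  # hypothetical layer outputs
approx = int8_round_trip(activations)
max_err = max(abs(a - b) for a, b in zip(activations, approx))
# Compare max_err against an accuracy budget; if it is large relative
# to typical values, revisit the scaling scheme before deployment
```

In practice this spot check is a first filter: end-to-end metrics on a validation set remain the real test for degradation.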
Related Concepts
- Model Compression
- Deployment
- Efficiency