Short Definition
Compute–data trade-offs describe how model performance depends on the balance between computational resources and the amount of training data.
Definition
Compute–data trade-offs capture the principle that increasing model size or training compute alone is insufficient for optimal performance unless accompanied by sufficient data. For a given compute budget, there exists an optimal balance between model capacity and dataset size that maximizes learning efficiency.
More compute without data wastes capacity.
Why It Matters
Modern neural networks are trained under finite budgets. Understanding compute–data trade-offs helps:
- allocate resources efficiently
- avoid overfitting large models on small datasets
- plan data collection strategies
- predict diminishing returns from scaling
Scaling is constrained by balance.
Core Idea
For a fixed compute budget, performance improves when:
- models are not too small for the data
- data is not too scarce for the model
- compute is distributed optimally between parameters and samples
Balance beats extremes.
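The balance point can be sketched numerically. The snippet below uses the common approximation that training compute C ≈ 6·N·D (N parameters, D tokens) and the roughly 20-tokens-per-parameter ratio reported in the Chinchilla work as an illustrative default; the function name and constants are assumptions for this sketch, not a prescription.

```python
def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Split a FLOPs budget between parameters N and training tokens D.

    Uses the common approximation C ≈ 6 * N * D, plus the illustrative
    Chinchilla-style heuristic that the optimal token count is roughly
    a fixed multiple of the parameter count (D = k * N).
    """
    # With D = k * N and C = 6 * N * D = 6k * N^2, solve for N:
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: splitting a 1e21 FLOPs budget
n, d = compute_optimal_split(1e21)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```

Doubling the budget raises both N and D by √2, not just one of them, which is the core of the trade-off.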
Minimal Conceptual Illustration
```text
Performance ↑
│              ●
│          ●
│     ●
│ ●
└──────────────────→ Data (for fixed compute)
```
Compute-Limited vs Data-Limited Regimes
Compute-Limited
- insufficient compute to train large models
- under-optimized parameters
- performance improves with more compute
Data-Limited
- model capacity exceeds data diversity
- overfitting dominates
- performance improves with more data
Knowing the regime guides scaling.
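A rough regime check can be read off the loss curves themselves. This is a minimal heuristic sketch, assuming access to per-evaluation train and validation losses; the function name and threshold are hypothetical.

```python
def diagnose_regime(train_losses, val_losses, gap_tol=0.1):
    """Heuristic regime check from loss curves (illustrative threshold).

    - Validation loss stalled with a large train/val gap -> data-limited
      (overfitting dominates; more data should help).
    - Validation loss still falling -> likely compute-limited
      (more training should help).
    """
    gap = val_losses[-1] - train_losses[-1]
    still_improving = val_losses[-1] < min(val_losses[:-1])
    if gap > gap_tol and not still_improving:
        return "data-limited"
    if still_improving:
        return "compute-limited"
    return "balanced/unclear"

print(diagnose_regime([2.0, 1.5, 1.0], [2.1, 1.7, 1.4]))  # compute-limited
print(diagnose_regime([2.0, 1.0, 0.3], [2.1, 1.8, 1.9]))  # data-limited
```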
Relationship to Architecture Scaling Laws
Scaling laws show that:
- larger models require more data to realize gains
- insufficient data flattens scaling curves
- compute-optimal scaling depends jointly on model size and dataset size
Architecture shapes the curve.
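The joint dependence on model size and dataset size is often summarized by a parametric loss of the form L(N, D) = E + A/N^α + B/D^β. The sketch below uses the constants fitted in Hoffmann et al. (2022) purely for illustration; the flattening of gains when data is held fixed is the point, not the exact numbers.

```python
def scaling_loss(n_params, n_tokens,
                 E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss L(N, D) = E + A/N^a + B/D^b.

    Constants are the fits reported in Hoffmann et al. (2022),
    used here only as an illustration.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Growing the model while data is held fixed: gains flatten,
# because the B/D^beta term becomes the floor.
for n in [1e8, 1e9, 1e10]:
    print(f"N={n:.0e}: loss={scaling_loss(n, 1e10):.3f}")
```

Each 10x in parameters buys less once the data term dominates, which is exactly the flattened scaling curve described above.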
Model Size Implications
- small models saturate early even with large data
- large models underperform without sufficient data
- intermediate models often yield best compute efficiency
Bigger is not always better.
Data Quality vs Quantity
Trade-offs depend not only on data volume but also on:

- label quality
- diversity
- noise levels
- distribution alignment
Bad data scales poorly.
Compute Allocation Strategies
Effective compute use includes:
- scaling batch size appropriately
- choosing optimal training duration
- early stopping in data-limited regimes
- curriculum or data filtering strategies
Compute must be spent wisely.
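Early stopping in a data-limited regime can be sketched as a patience loop: stop spending compute once validation loss stops improving. This is a minimal sketch; `val_loss_fn` is a hypothetical stand-in for a real train-then-evaluate step.

```python
def train_with_early_stopping(val_loss_fn, max_steps=100, patience=5):
    """Stop once validation loss has not improved for `patience` steps.

    `val_loss_fn(step)` stands in for one training step followed by a
    validation pass (hypothetical interface for this sketch).
    """
    best, best_step, waited = float("inf"), 0, 0
    for step in range(max_steps):
        loss = val_loss_fn(step)
        if loss < best:
            best, best_step, waited = loss, step, 0
        else:
            waited += 1
            if waited >= patience:
                break  # data-limited: more compute no longer helps
    return best_step, best

# Toy validation curve: improves until step 30, then overfits
step, loss = train_with_early_stopping(lambda s: abs(s - 30) / 30 + 0.5)
print(step, loss)
```

The compute saved after the stopping point is exactly the compute the trade-off says is being wasted.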
Impact on Generalization
Models trained in balanced compute–data regimes tend to:
- learn more robust features
- generalize better
- reduce memorization
- improve transferability
Generalization reflects balance.
Failure Modes
Ignoring compute–data trade-offs can lead to:
- wasted compute
- brittle overfitted models
- misleading benchmark gains
- inflated confidence under shift
Scaling blindly amplifies error.
Practical Implications
Compute–data trade-offs inform:
- experiment design
- infrastructure planning
- dataset investment decisions
- architecture choice
Strategy beats brute force.
Common Pitfalls
- scaling parameters without scaling data
- assuming compute alone guarantees progress
- ignoring data diversity
- extrapolating scaling laws outside observed regimes
- optimizing benchmarks instead of outcomes
Efficiency is contextual.
Summary Characteristics
| Aspect | Compute–Data Trade-offs |
|---|---|
| Nature | Empirical |
| Key variables | Compute, data, capacity |
| Optimal point | Exists for fixed budget |
| Risk if ignored | Inefficiency |
| Relevance | Foundational |
Related Concepts
- Architecture & Representation
- Architecture Scaling Laws
- Model Capacity
- Feature Learning
- Generalization
- Efficient Architectures
- Benchmark Performance vs Real-World Performance