
The Bitter Lesson Was About Data, Not Compute
369
In a data-constrained era, the real lever isn’t more GPUs but better data and architectures that maximize each token’s value.
Research and analysis of neural scaling laws — the empirical relationships between model size, dataset size, compute budget, and performance (e.g., Chinchilla scaling, Kaplan laws, compute-optimal training).

In a data-constrained era, the real lever isn’t more GPUs but better data and architectures that maximize each token’s value.