Opinion
A New AI Trend: Chinchilla (70B) Greatly Outperforms GPT-3 (175B) and Gopher (280B)
DeepMind has found the secret to cheaply scale large language models.
DeepMind’s latest paper dismantles the tired trend of building larger and larger models to improve performance.
The company has identified a key factor in scaling large language models that no one had properly applied before. OpenAI, Google, Microsoft, Nvidia, Facebook, and even DeepMind themselves, all big tech companies committed to building powerful language models, have been doing it wrong: making models larger is neither the best nor the most efficient approach.
The practice of increasing model size as the primary route to better performance was established in 2020 by Kaplan and colleagues at OpenAI. They found a power-law relationship between model size and performance and concluded that, as more compute budget becomes available for training, most of it should go toward making models bigger.
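To make that relationship concrete, here is a minimal sketch of the power law, assuming the approximate fit Kaplan et al. (2020) reported for loss as a function of non-embedding parameter count; the function name and the printed examples are illustrative, not taken from the paper.

```python
# A minimal sketch of the power law Kaplan et al. (2020) fit between model
# size and test loss, roughly L(N) ~ (N_c / N) ** alpha_N.
# The constants are the approximate values quoted in that paper; the function
# name and the example models below are illustrative only.

def kaplan_loss_from_params(n_params: float,
                            n_c: float = 8.8e13,    # fitted constant (approx.)
                            alpha_n: float = 0.076  # fitted exponent (approx.)
                            ) -> float:
    """Predicted cross-entropy loss given non-embedding parameter count."""
    return (n_c / n_params) ** alpha_n

# Under this fit, loss keeps falling (slowly) as parameters grow, which is why
# the paper recommended spending most of any extra training budget on model size.
for name, n in [("GPT-3", 175e9), ("Gopher", 280e9), ("MT-NLG", 530e9)]:
    print(f"{name} ({n/1e9:.0f}B params): predicted loss ≈ {kaplan_loss_from_params(n):.2f}")
```

The gentle exponent is the crux: each further drop in loss demands a large multiple of parameters, and under this law, size is where the budget supposedly pays off most.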
That’s why we’ve seen ever-larger models being released every few months since 2020: GPT-3 (175B), LaMDA (137B), Jurassic-1 (178B), Megatron-Turing NLG (530B), Gopher (280B), and that’s just the dense models. As predicted by Kaplan’s law, these models are significantly better than the previous generation (GPT-2, BERT), just not as good as they could be.