Opinion

A New AI Trend: Chinchilla (70B) Greatly Outperforms GPT-3 (175B) and Gopher (280B)

DeepMind has found the secret to cheaply scaling large language models.

Alberto Romero
Towards Data Science
11 min read · Apr 11, 2022

Photo by ArtHead on Shutterstock

DeepMind’s latest paper dismantles the tired trend of building larger and larger models to improve performance.

The company has found a key aspect of scaling large language models that no one has applied before. OpenAI, Google, Microsoft, Nvidia, Facebook, and even DeepMind itself (all big tech companies committed to building powerful language models) have been doing it wrong: making models larger is neither the best nor the most efficient approach.

The practice of increasing model size as a proxy for increasing performance was established in 2020 by Kaplan et al. at OpenAI. They found a power-law relationship between those variables and concluded that, as more compute budget becomes available to train models, most of it should be allocated to making them bigger.
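
To make the shape of that power law concrete, here is a minimal sketch of the size-versus-loss relationship, assuming the approximate constants reported by Kaplan et al. (N_c ≈ 8.8e13 non-embedding parameters, α_N ≈ 0.076); the constants and the helper name are illustrative assumptions, not the paper's actual code.

# Kaplan et al. (2020) scaling law for loss as a function of model size alone,
# L(N) = (N_c / N) ** alpha_N, assuming data and compute are not the bottleneck.
# The constants below are approximate values from the paper; treat them as assumptions.

def loss_from_params(n_params: float,
                     n_c: float = 8.8e13,
                     alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy loss (in nats) for a model with n_params parameters."""
    return (n_c / n_params) ** alpha_n

# Doubling the parameter count shaves only a few percent off the predicted loss,
# which is why this law pushed labs toward ever-larger models.
for n in (1.5e9, 175e9, 280e9):  # roughly GPT-2, GPT-3, Gopher
    print(f"N = {n:.1e} params -> predicted loss ~ {loss_from_params(n):.2f}")

Under this relationship, going from GPT-3's 175B parameters to Gopher's 280B buys only a marginal improvement in predicted loss, the kind of diminishing return that motivated the trend described next.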

That’s why we’ve seen ever-larger models being released every few months since 2020: GPT-3 (175B), LaMDA (137B), Jurassic-1 (178B), Megatron-Turing NLG (530B), Gopher (280B) — and that’s just the dense models. As predicted by Kaplan’s law, these models are significantly better than the previous generation (GPT-2, BERT), just not as good as they…


