Opinion

DeepMind’s latest paper dismantles the tired trend of building larger and larger models to improve performance.
The company has identified a key aspect of scaling large language models that no one had properly applied before. OpenAI, Google, Microsoft, Nvidia, Facebook, and even DeepMind itself – all big tech companies committed to creating powerful language models – have been doing it wrong: Making models larger is neither the best nor the most efficient approach.
Increasing model size as a proxy for increasing performance was established in 2020 by Kaplan and colleagues at OpenAI. They found a power-law relationship between model size and performance and concluded that, as more budget becomes available to train models, the majority should be allocated to making them bigger.
That’s why we’ve seen ever-larger models being released every few months since 2020: GPT-3 (175B), LaMDA (137B), Jurassic-1 (178B), Megatron-Turing NLG (530B), Gopher (280B) – and that’s just the dense models. As predicted by Kaplan’s law, these models are significantly better than the previous generation (GPT-2, BERT), just not as good as they could’ve been.
They drew the wrong conclusion: that model size alone carried the responsibility for improving the models. They missed another key factor: data.
DeepMind’s findings will define language model scaling in the future
In a new paper ("Training Compute-Optimal Large Language Models" by Hoffmann et al.), DeepMind researchers revisited Kaplan’s conclusions and found that scaling the number of training tokens (that is, the amount of text data the model is fed) is as important as scaling model size.
Given a fixed compute budget, researchers should allocate it in similar proportions to increase model size and number of training tokens to reach the compute-optimal model (measured by minimal training loss). "For every doubling of model size the number of training tokens should also be doubled." This implies that a smaller model can vastly outperform a larger – but suboptimal – model if trained on a significantly higher number of tokens.
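As a back-of-the-envelope illustration (my own sketch, not code from the paper), here's what that rule implies under the standard approximation that training compute is C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens, together with Chinchilla's roughly 20-tokens-per-parameter ratio:

```python
# Sketch of the compute-optimal scaling rule (my own, not the paper's code).
# Standard approximation: training FLOPs C ~= 6 * N * D,
# where N = parameters and D = training tokens.

def compute_optimal(c_flops, tokens_per_param=20.0):
    """Split a FLOPs budget so N and D grow in equal proportion.

    tokens_per_param ~= 20 matches Chinchilla (1.4T tokens / 70B params).
    """
    n = (c_flops / (6 * tokens_per_param)) ** 0.5  # parameters
    d = tokens_per_param * n                       # tokens
    return n, d

# Chinchilla's approximate budget: 6 * 70e9 * 1.4e12 ~= 5.9e23 FLOPs
n, d = compute_optimal(5.9e23)
print(f"{n / 1e9:.0f}B params, {d / 1e12:.1f}T tokens")  # ~70B, ~1.4T

# Doubling N requires 4x the compute, which also doubles D:
n2, d2 = compute_optimal(4 * 5.9e23)  # -> ~140B params, ~2.8T tokens
```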
And they proved it. The star of the new paper is Chinchilla, a 70B-parameter model 4 times smaller than the previous leader in language AI, Gopher (also built by DeepMind), but trained on 4 times more data. Researchers found that Chinchilla "uniformly and significantly" outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG across a large set of language benchmarks.
The conclusion is clear: Current large language models are "significantly undertrained," which is a consequence of blindly following the scaling hypothesis – making models larger isn’t the only way toward improved performance.
And not only that. Because Chinchilla is smaller, inference and fine-tuning cost less, easing the use of these models for smaller companies or universities that may not have the budget or latest-generation hardware to run larger models. "The benefits of a more optimally trained smaller model, therefore, extend beyond the immediate benefits of its improved performance."
Compute-optimal large language models
Compute budget is usually the limiting factor: it's known in advance and independent of the other variables. Model size and number of training tokens are ultimately determined by how much money the company can spend on hardware. To study how these variables affect performance, DeepMind's researchers posed this question: "Given a fixed FLOPs budget, how should one trade-off model size and the number of training tokens?"
As stated above, models like GPT-3, Gopher, and MT-NLG follow the scaling laws devised by Kaplan (Table 1). To give a concrete example: if the compute budget increases by a factor of 10, Kaplan's law predicts optimal performance when model size is increased by 5.5x and the number of training tokens by 1.8x.

Kaplan and colleagues arrived at this conclusion because they fixed the number of training tokens in their analysis. That assumption prevented them from finding DeepMind's answer: that model size and the number of tokens should increase in parallel – each by roughly 3.16x (that is, √10) for every 10x increase in compute.
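To make the contrast concrete, here's the same 10x example under both prescriptions (my own arithmetic, not the paper's code; the exponents are approximate):

```python
# Comparing the two scaling prescriptions for a 10x compute increase
# (my arithmetic; exponents approximate those reported for each paper).
budget_factor = 10

# Kaplan et al. (2020): most of the extra compute goes into model size.
kaplan_size = budget_factor ** 0.74    # ~5.5x more parameters
kaplan_tokens = budget_factor ** 0.26  # ~1.8x more tokens

# Hoffmann et al. (2022): split the increase evenly between size and data.
chinchilla_size = budget_factor ** 0.5    # ~3.16x more parameters
chinchilla_tokens = budget_factor ** 0.5  # ~3.16x more tokens
```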
To study the relationship between compute budget, model size, and number of training tokens, researchers used three approaches (see section 3 of the paper for a more detailed explanation):
- Fixed model size: They defined a family of model sizes (70M-16B) and varied the number of training tokens (4 variations) for each model. Then they determined the optimal combination for each compute budget. Using this approach, a compute-optimal model trained with the same amount of compute as Gopher would have 67B params and 1.5T tokens.
- IsoFLOP curves: They fixed the compute budget (9 variations ranging from 6×10¹⁸ to 3×10²¹ FLOPs) and varied model size (the number of tokens follows automatically from the budget). Using this approach, a compute-optimal model trained with the same amount of compute as Gopher would have 63B params and 1.4T tokens.
- Fitting a parametric loss function: Using the results from approaches 1 and 2, they modeled the losses as a parametric function of model size and number of tokens (see the sketch after this list). Using this approach, a compute-optimal model trained with the same amount of compute as Gopher would have 40B params.
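For intuition, here's a minimal sketch of what such a fit enables (the functional form is the paper's; the constants are its published fit as I recall them, and the grid search is my own illustration):

```python
import numpy as np

# Parametric loss from Hoffmann et al.: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the paper's published fit, quoted from memory.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, d_tokens):
    return E + A / n_params**alpha + B / d_tokens**beta

# Given a fixed budget C ~= 6 * N * D, find the N that minimizes predicted loss.
C = 5.76e23                          # roughly Gopher's training budget
n_grid = np.logspace(9, 12, 2000)    # candidate sizes: 1B to 1T parameters
d_grid = C / (6 * n_grid)            # tokens implied by the budget
best = np.argmin(loss(n_grid, d_grid))
# Lands in the tens of billions of parameters – far below Gopher's 280B.
print(f"{n_grid[best] / 1e9:.0f}B params, {d_grid[best] / 1e12:.1f}T tokens")
```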
In total, they evaluated over 400 models, ranging from 70M to 16B parameters and from 5B to 500B training tokens. All three approaches yielded similar predictions for optimal model size and number of training tokens – significantly different from Kaplan's.
These findings suggest that models from the current generation are "considerably over-sized, given their respective compute budgets" (figure 1).

As shown in table 3 (first approach), a 175B model (GPT-3-sized) should be trained with a compute budget of 3.85×10²⁴ FLOPs and on 3.7T tokens (more than 10 times what OpenAI used for GPT-3 175B). A 280B model (Gopher-sized) should be trained with 9.90×10²⁴ FLOPs and on 5.9T tokens (20 times what DeepMind used for Gopher).
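As a sanity check of my own (not from the paper), these figures line up with the common approximation that training compute is C ≈ 6·N·D:

```python
# My sanity check: the table's budgets match C ~= 6 * N * D.
print(6 * 175e9 * 3.7e12)  # ~3.9e24 FLOPs, vs the table's 3.85e24
print(6 * 280e9 * 5.9e12)  # ~9.9e24 FLOPs, vs the table's 9.90e24
```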

They took the conservative estimates (approaches 1 and 2) to determine the size and number of training tokens for a compute-optimal model trained on the budget they had used for Gopher. Chinchilla is the resulting model: 70B parameters trained on 1.4T tokens (4x smaller than Gopher and trained on 4x more data). Chinchilla outperformed Gopher – and all other previous language models – "uniformly and significantly."
They proved their hypothesis: Increasing the number of training tokens at the same rate as model size provides the best results, other things being equal.
Results comparison: Chinchilla vs Gopher & Co
Saying that Chinchilla outperformed Gopher feels like an understatement when we look at the results for each benchmark. To avoid overloading the article with graphs, I'll show below only the results for Massive Multitask Language Understanding (MMLU) and BIG-bench (which together amount to 80% of the tasks) and for ethics-related benchmarks – which always deserve special scrutiny. (See section 4 of the paper for a detailed analysis that includes reading, commonsense, and Q&A benchmarks.)
MMLU & BIG-bench
Chinchilla set new SOTA scores on both benchmarks: 67.6% average accuracy on MMLU and 65.1% on BIG-bench, versus Gopher's 60% and 54.4%, respectively (figures 2, 3). On MMLU, Chinchilla even surpassed the 63.4% mark that experts had predicted would be the SOTA in June 2023. No one was expecting such an improvement so soon.


Chinchilla uniformly outperforms previous LLMs across other benchmarks like commonsense reasoning and reading comprehension, undoubtedly claiming the throne of language AI.
However, its dominance was short-lived. Chinchilla was surpassed just a week after its release by Google's latest model, PaLM (at 540B parameters, now the largest and most performant language model). This continuous back-and-forth between companies illustrates the fast pace of the field. Google didn't fully apply DeepMind's findings when building PaLM because they were testing a different approach. (Expect a new article soon on PaLM!)
Gender bias and toxicity
Because Chinchilla shares its dataset and architecture with Gopher, it's expected to show similar behavior regarding bias and toxicity. It shows some improvements over Gopher on the Winogender benchmark of gender and occupation bias (table 7), but not equally across groups.

In the PerspectiveAPI toxicity benchmark, Chinchilla and Gopher show similar results: "The large majority of generated samples are classified as non-toxic, and the difference between the models is negligible." This also implies that training a model on more data doesn't necessarily make it more toxic.
Hypothesis: How could they further improve Chinchilla's performance?
DeepMind found a new relationship between compute budget, model size, and number of training tokens. But those aren’t the only parameters that affect performance and efficiency.
One key problem when training large models is finding the optimal hyperparameters (HPs). Current language models are so big that companies can only afford to train them once: Searching for the best set of HPs is infeasible. Researchers often have to make difficult assumptions – often wrong ones – to set them.
Recently, Microsoft and OpenAI studied a new type of parameterization (μP) that scales well across different-size models of the same family. The optimal HPs for a smaller model can be transferred to the larger model, yielding considerably better results.
DeepMind's paper mentions previous work on hyperparameter tuning but not this particular paper, which came out just a few weeks earlier. Combining the compute-optimal paradigm with μP would presumably yield even better results for any large language model.
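To give an idea of what combining them could look like, here's a rough sketch based on Microsoft's open-source mup package (API names quoted from its README as I recall them; treat this as illustrative, not DeepMind's or OpenAI's code):

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam  # Microsoft's muP package

def make_model(width):
    # Toy network; the output layer must be muP-aware for HP transfer to hold.
    return nn.Sequential(nn.Linear(256, width), nn.ReLU(), MuReadout(width, 10))

base = make_model(width=64)      # small proxy model used for the HP sweep
model = make_model(width=4096)   # the large model we actually want to train
set_base_shapes(model, base)     # tell muP how widths scale between the two

# Under muP, the learning rate tuned on the cheap proxy transfers directly:
optimizer = MuAdam(model.parameters(), lr=3e-4)  # lr found by sweeping the proxy
```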
Another improvement could be a retrieval mechanism. RETRO matched GPT-3's performance across tasks despite being 25 times smaller. Its retrieval abilities let the model query a huge database (3T tokens) in real time, analogous to how we search the internet.
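As a toy illustration of the idea (not RETRO's actual architecture, which uses frozen BERT embeddings and an approximate nearest-neighbor index over its database), retrieval amounts to embedding a query and pulling the most similar chunks from a pre-embedded corpus to condition generation:

```python
import numpy as np

# Toy nearest-neighbor retrieval over a pre-embedded text corpus (illustrative).
rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 128))             # stand-in chunk embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalize for cosine sim

def retrieve(query_emb, k=5):
    """Return the indices of the k corpus chunks most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = corpus @ q                   # cosine similarities
    return np.argsort(scores)[-k:][::-1]  # top-k, best first

neighbors = retrieve(rng.standard_normal(128))
# The retrieved chunks would then be fed to the language model as extra context.
```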
Finally, if we wanted to go the last mile, an alignment technique could improve results not only on language benchmarks but in real-world situations. OpenAI used alignment methods to turn GPT-3 into InstructGPT, with notable gains in how well it follows users' intent. However, AI alignment is extremely complex, and InstructGPT doesn't seem to improve over previous models in safety or toxicity.
If a company combined all these features into one model, they’d create the best overall model possible with what we know today about large language models.
Four critical reflections from Chinchilla
A new trend
Chinchilla's performance is impressive not just for the magnitude of the improvement, but because the model is smaller than every SOTA large language model developed in the last two years. Instead of focusing on making models larger – an approach many AI experts have criticized – companies and researchers should focus on optimizing the resources and parameters they already have; otherwise, they're wasting their money.
Performance-wise and efficiency-wise, Chinchilla is a breakthrough.
Chinchilla’s performance is no longer the best in the field, as Google’s PaLM has achieved SOTA results in many benchmarks. However, Chinchilla’s main influence doesn’t lie in being the best model out there but in being extremely good while breaking the pattern of making models larger and larger.
The consequences of this will define the future of the field. First, companies should recognize that model size isn't the only variable that matters for performance, but one of many. Second, it may temper the public hype around ever-larger models – often taken as a sign that we're approaching AGI faster than we really are. Finally, it may help reduce the environmental impact of large models and lower the barriers to entry for smaller companies that can't keep up with Big Tech.
This last point brings me to the second reflection.
Limited reproducibility
Despite being smaller than other models, it's still infeasible for most companies and universities to train or study models like Chinchilla. Calling a 70B model "small" should make anyone realize how problematic this is. Most entities with the required human resources (researchers who can get the most out of studying these models) don't have the financial depth to carry out the necessary experiments. Because of that, current AI is being built on fragile foundations, driven by a few big companies that define the direction of the science.
But there’s another limiting factor unrelated to money.
DeepMind will most likely not release Chinchilla. Neither will Google release PaLM, nor OpenAI DALL·E – at least while they're relevant. These models are often published only as a means to signal who is advancing the state of the art, without the intention of letting others use them for research. To its credit, DeepMind is one of the AI companies that has made the greatest efforts to advance science by allowing others to build on its discoveries (it made AlphaFold's predictions freely available), but the tendency to show off still dominates the field.
DeepMind is trying to reverse a damaging trend by building a model that's better and smaller at the same time. But given that Chinchilla is still a huge model, we should realize how far we remain from democratizing a technology that will redefine our future. If we keep moving in a direction in which a few control the resources for scientific inquiry, the direction of research, and the resulting breakthroughs, creating AGI will not be worth it.
Data audit
Current models are undertrained (or oversized). To build compute-optimal models, companies will need larger datasets than those they currently use. Large, high-quality text datasets will be in high demand in the near future.
Emily M. Bender, a professor of linguistics at the University of Washington, criticized Google's approach to PaLM because 780B tokens (the amount of data used to train the model) is too much to be well documented, which makes the model "too big to deploy safely." Chinchilla was trained on nearly twice as many tokens. If we extrapolate Bender's criticism (which would depend on the process DeepMind followed to train the model), we can conclude that Chinchilla is also not safe enough to deploy.
To make models better while keeping them smaller, they need more data. But using more data makes the models less safe. We face a hard choice between making models larger (putting them increasingly out of reach for most players in the field while increasing their carbon footprint) and training them on more tokens (making data audits harder and the models less safe). Saying Chinchilla is better overall because it's smaller now seems a far-fetched statement.
The alternative is always to put more focus on other lines of research that don't involve training huge models on huge datasets. However, because Big Tech has the money to fund the research lines it wants, only those lines produce results – not because the others won't work, but because they aren't being explored as thoroughly.
Inherent bias
It seems that no matter how much researchers optimize models for performance or efficiency, they can't reach acceptable levels of bias and toxicity. Transformer-based large language models may be inherently subject to these issues, regardless of model size, dataset size, hyperparameter quality, or compute budget.
We won’t solve the ethical issues of language models simply by making them better at performance benchmarks.