As I have spent more time around all kinds of very impressive AI researchers at Stanford, I have noticed something peculiar about the way these people react to breakthroughs in their own field. It always seems like there’s a voice in their heads telling them never to get too excited. Machine learning models can do competitive coding? Okay. They can write proofs? That’s fun. When I was in high school, I used to hear about these milestones and become giddy with hope for the future. But once I became an insider, I noticed that the people who study AI don’t really celebrate it all that much.
I suppose you can’t expect anyone to get that excited about incremental improvements to models they spend their whole lives exploring and thinking about. But I have a sneaking suspicion that something else is going on.
Since the 1950s, humanity has pretty much oscillated between being 5 years away from artificial general intelligence (AGI) and being 100 years away from it. To borrow from Hemingway, progress in AI happens "gradually, then suddenly." Some new technology changes how we approach AI, we apply it to everything we can think of, and then we’re stuck for 10–20 years. Rinse and repeat. This cycle gave rise to two historical "AI winters" in the latter half of the 20th century: periods when little funding or attention went into AI.
Researchers quietly fear that, despite our immense progress in recent years, we will ultimately find ourselves in another winter. There is some reason to think this based on the type of progress being made. Most of the new ideas behind today’s powerful models are actually based on work done before deep learning became mainstream*. For example, reinforcement learning and convolutional neural networks, two very hot topics today, were developed in the 1960s and 1980s, respectively. These methods only became practical with modern hardware, modern datasets, and backpropagation, popularized in 1986. So it’s easy to see, from a research perspective, why producing genuinely new innovations will be harder than the past 15 years of reviving old ideas with new resources have been.
But I have a different perspective, because I really am still that kid in high school whose heart starts racing at the thought of computers that can think like humans. I don’t actually think that we need to innovate so much further. We will be carried forward by the same macro trends that have brought us to this moment: more data, and cheaper, faster compute.
Pre-training on unlabeled data gives models access to exponentially more information, and as the research matures, pre-training methods will extract more leverage from that information. As the internet spreads to the furthest corners of the world, data will continue to accumulate, and this trend will only accelerate. To retain information from all this data, we need a lot of parameters. To train a lot of parameters on a lot of data, we need to do a lot of matrix multiplication. Luckily, Moore’s law is arguably one of the most robust trends in recent history: computation will continue to become cheaper, more accessible, and faster. More data, more compute, better models.
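To make that arithmetic concrete, here is a rough sketch (not from the article) using the common rule of thumb that training a dense model costs about 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. The model size, token count, and hardware throughput below are made-up illustrative numbers, not measurements:

```python
# Back-of-the-envelope sketch: why "more data, more compute" is mostly a
# scaling question. Uses the ~6 * N * D approximation for training FLOPs;
# all concrete numbers below are illustrative assumptions.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough total training compute via the ~6 * N * D approximation."""
    return 6 * n_params * n_tokens

def training_days(total_flops: float, flops_per_second: float,
                  utilization: float = 0.4) -> float:
    """Wall-clock days given sustained throughput and hardware utilization."""
    return total_flops / (flops_per_second * utilization) / 86_400

if __name__ == "__main__":
    n_params = 1e9   # a hypothetical 1B-parameter model
    n_tokens = 2e10  # trained on a hypothetical 20B tokens
    cluster = 1e15   # assumed sustained throughput: ~1 PFLOP/s of accelerators

    flops = training_flops(n_params, n_tokens)
    print(f"~{flops:.1e} FLOPs, ~{training_days(flops, cluster):.1f} days")

    # Cheaper, faster compute shifts the whole curve: double the throughput
    # and the same run finishes in half the time, or trains on twice the data.
    print(f"with 2x compute: ~{training_days(flops, 2 * cluster):.1f} days")
```

The point of the sketch is just that nothing in it requires a new idea: hold the recipe fixed, and bigger N, bigger D, or a bigger cluster each move the result on their own.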
I don’t mean to suggest that we will have AGI tomorrow, or in the next x years, or anything like that. I only mean to say that there is reason to be truly excited about the achievements of machine learning models: we no longer have to rely on the wits of scientists and hobbyists for progress in machine learning; all we have to rely on is people using the internet and computers continuing to be produced. We have entered an invincible summer of AI, upper bounded by human-like general intelligence.
*The idea of attention in neural networks is a very important exception to this trend.
If you liked this article, please consider giving a clap and a follow!