The most advanced generalist network to date
The deep learning field is progressing rapidly, and the latest work from DeepMind is a good example. Their Gato model learns to play Atari games, generate realistic text, process images, control robotic arms, and more, all with the same neural network. Inspired by large-scale language models, DeepMind applied a similar approach but extended it beyond the realm of text outputs.

How Gato works
This new AGI (short for Artificial General Intelligence) works as a multi-modal, multi-task, multi-embodiment network, meaning that the same network (i.e. a single architecture with a single set of weights) can perform all tasks, even though they involve inherently different kinds of inputs and outputs.
While DeepMind’s preprint presenting Gato is not very detailed, it makes clear that the model is strongly rooted in the transformers used for natural language processing and text generation. However, it is trained not only on text but also on images (already handled by models like DALL·E), torques acting on robotic arms, button presses from computer game playing, etc. Essentially, then, Gato handles all these kinds of inputs together and decides from context whether to output intelligible text (for example, to chat, summarize, or translate), torques (for the actuators of a robotic arm), button presses (to play games), etc.
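To make the core idea concrete, here is a minimal, purely illustrative sketch (not DeepMind’s actual code) of how heterogeneous inputs could be serialized into one flat sequence of integer tokens, so that a single decoder-only transformer can model them all. The vocabulary split, the number of bins, and the mu-law parameters below are my assumptions for illustration only:

```python
# Illustrative sketch: flatten text, continuous torques, and discrete
# button presses into one shared token vocabulary. All constants here
# (TEXT_VOCAB, NUM_BINS, mu) are assumed values, not Gato's real ones.
import math

TEXT_VOCAB = 32_000  # assumed size of the text-token range
NUM_BINS = 1024      # assumed number of bins for continuous values

def mulaw_bin(x: float, mu: float = 100.0, m: float = 1.0) -> int:
    """Discretize a continuous value (e.g. a joint torque) into one of
    NUM_BINS bins using a mu-law-style companding transform."""
    y = math.copysign(math.log(1 + mu * min(abs(x) / m, 1.0)) / math.log(1 + mu), x)
    return int((y + 1) / 2 * (NUM_BINS - 1))  # map [-1, 1] -> [0, NUM_BINS - 1]

def tokenize_step(text_ids, torques, buttons):
    """Flatten one multimodal step into a single token sequence.
    Continuous and discrete action tokens are offset past the text range
    so every modality lives in a disjoint slice of one vocabulary."""
    seq = list(text_ids)                                   # text tokens as-is
    seq += [TEXT_VOCAB + mulaw_bin(t) for t in torques]    # continuous values
    seq += [TEXT_VOCAB + NUM_BINS + b for b in buttons]    # discrete buttons
    return seq

tokens = tokenize_step(text_ids=[17, 288], torques=[0.03, -0.5], buttons=[4])
print(tokens)  # one flat sequence a single transformer can be trained on
```

Once every modality maps into disjoint slices of one vocabulary like this, "deciding from context" what to output reduces to ordinary next-token prediction over that shared sequence.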
Gato thus demonstrates the versatility of transformer-based architectures for machine learning and shows how they can be adapted to a variety of tasks. In the last decade we have seen surprising applications of neural networks specialized for playing games, translating text, captioning images, etc. But Gato is general enough to perform all these tasks by itself, using a single set of weights and a relatively simple architecture. This contrasts with specialized networks that require multiple modules, integrated in a problem-specific way, in order to work together.
Moreover, and impressively, Gato is not even close to being the largest neural network we have seen! With "only" 1.2 billion weights, it is comparable in size to OpenAI’s GPT-2 language model, i.e. over two orders of magnitude smaller than GPT-3 (with 175 billion weights) and other modern language-processing networks.
The results on Gato also support previous findings that training on data of different natures leads to better learning of the information supplied, much as humans learn about the world from multiple simultaneous sources of information. This idea fits squarely within one of the most interesting trends in machine learning in recent years: multimodality, the capacity to handle and integrate various types of data.
On the potential of AGIs: towards true AI?
I never really liked the term Artificial Intelligence. I used to think that nothing could beat the human brain. However…
The potential behind emerging AGIs is much more interesting, and certainly more powerful, than what we had just one year ago. These models can solve a variety of complex tasks with essentially a single piece of software, making them very versatile. If one such model, advanced by, say, a decade from now, were run inside robot-like hardware with means of locomotion and appropriate input and output peripherals, we could well be taking solid steps toward creating true artificial beings with real Artificial Intelligence. After all, our brains are, in a way, very intricate neural networks that connect and integrate sensory information to produce our actions. In principle, nothing prevents this data processing from happening in silico rather than organically.
Just three years ago I absolutely wouldn’t have said any of this, especially not that AI could someday be real. Now I’m not so sure, and the community sentiment is similar: current estimates suggest we could have machine-based systems with the same general-purpose reasoning and problem-solving abilities as humans by 2030. The projected year was around 2200 just two years ago, and has been steadily decreasing:
When will the first weakly general AI system be devised, tested, and publicly known of?
Although these are just blind predictions with no solid modeling behind them, the trend does reflect the giant steps the field is taking. I no longer find it far-fetched that a single robot could play chess with you one day and Scrabble the next, water your plants when you aren’t home (even making its own decisions based on weather forecasts and how your plants look), intelligently summarize the news for you, cook your meals, and, why not, even help you develop your ideas. Generalist AI could get here sooner than we think.
Key reads
DeepMind’s preprint on Gato at arXiv:
An overview at DeepMind’s website:
About multimodality in Machine Learning:
Some of my articles on using GPT-3 and VQGAN-CLIP, with which I’ve experimented extensively, focusing on development for the web:
How this "artificial dreaming" program works, and how you can create your own artwork with it
Web-Based Chatbot Project, Module 2: GPT-3-generated responses assisted with a database for…
Devising tests to measure GPT-3’s knowledge of the basic sciences
GPT-3-like models with extended training could be the future 24/7 tutors for biology students
www.lucianoabriata.com I write and photograph about everything that lies within my broad sphere of interests: nature, science, technology, programming, etc. Become a Medium member to access all of its stories (affiliate links of the platform, from which I get small revenues at no cost to you) and subscribe to get my new stories by email. To consult about small jobs, check my services page here. You can contact me here.