
5 Must-Know AI Concepts In 2021

Here's what you don't want to miss.


Photo by Douglas Sanchez on Unsplash

Should AI mimic human intelligence by copying our biology? Or is our psychobiological nature irrelevant to AI in the same way bird biology is irrelevant to aerospace engineering?

That’s a question people in the field have been pondering since its inception. We want to build intelligent systems, and we humans are arguably the only truly intelligent species. Isn’t it logical to look to ourselves for inspiration? However, because the building blocks of AI are so different from biological elementary pieces, shouldn’t we set humans aside and follow the path our research leads us down?

No one knows what the future of AI will hold. What we do know is that deep learning is getting closer to human-like cognition. Maybe humans aren’t so special in terms of intelligence, but evolution gave us some unique features we’d better take into account when creating AI systems. We’ve evolved for millennia in this environment, adapting slowly to the unchanging laws of nature. Why not shortcut that process by simulating the polished mechanisms it produced?

In this article, I’ll talk about five examples that are currently at the forefront of AI research. Each is at least loosely based on some aspect of human cognitive functions. These concepts will be central in the following years, so let’s keep an eye on them.


The Transformer – Human attention

It wasn’t long ago that recurrence-based architectures dominated natural language processing (NLP). If you faced an NLP problem – translation, speech-to-text, generative tasks – you used either gated recurrent units (GRUs) or long short-term memory (LSTM) networks. These two architectures were designed to handle sequential input data. For instance, the system could take an English sentence and process each successive word to build up a Spanish translation.

One of the main drawbacks of these models was the vanishing gradient problem. Because the information was processed sequentially, by the time the system was about to output the first Spanish word, the first English word was barely remembered. To solve this defect, researchers introduced attention mechanisms in 2014. By mimicking cognitive attention, neural networks could weigh the influence of context. No more information loss.

In 2017, Google’s AI team published the seminal paper Attention Is All You Need. It said: Attention mechanisms are powerful enough to solve language tasks. We don’t need recurrence and we don’t need sequential processing. They had invented the famous transformer architecture. The way transformers have influenced the deep learning landscape is only comparable to the disruption of CNNs in computer vision (CV) in 2012 when Hinton’s team won the ImageNet challenge.

Transformers work by processing all the words (tokens) of a sentence in parallel and learning the contextual relationships between them. In contrast to LSTMs, transformers don’t process data sequentially, so training times are much shorter. Transformers are the go-to architecture for any NLP task nowadays. Even CV scientists have started to apply transformers to image and video problems. Not even convolution will survive attention.
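To make that parallel processing concrete, here’s a minimal NumPy sketch of scaled dot-product self-attention, the operation at the heart of the transformer. It’s an illustration only: the shapes are toy-sized, and it omits multiple heads, masking, layer normalization, and positional encodings.

```python
# Minimal scaled dot-product self-attention (illustrative sketch, not a full transformer)
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq, Wk, Wv: learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values for every token at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how much each token attends to every other token
    weights = softmax(scores, axis=-1)         # attention weights, computed for all tokens in parallel
    return weights @ V                         # context-aware representation of each token

# Toy usage: 4 tokens, embedding size 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```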

From 2017 to 2021, researchers have kept developing the transformer, aiming to solve various shortcomings and improve performance. GPT-3 – which was built on the original transformer architecture – can’t look past its context window, which makes it effectively memoryless; the Transformer-XL addresses this by letting the system learn dependencies beyond a fixed-length context. The Reformer tackles the prohibitive costs of training: it improves efficiency and reduces training times while achieving state-of-the-art performance.

Some of the most notable applications of the transformer in recent years are multitasking AIs such as Google’s BERT, OpenAI’s GPT family – among which GPT-3 is the undisputed star – and Wu Dao 2.0, which holds the record for the largest neural network. The transformer is also the core algorithm behind the new generation of chatbots – Meena, BlenderBot 2.0, and LaMDA. It has even set foot in the world of biology: a few days ago, DeepMind announced they had released the code and database of AlphaFold 2, a model that could help us reach a deeper understanding of how protein folding works.


Self-supervised training – Human learning

Supervised deep learning systems have dominated the AI landscape since 2012. These systems learn from labeled data to classify new instances into the learned classes. We put a lot of resources into labeling the training examples to facilitate learning. However, these pattern-matching systems learn nothing like we do.

Reinforcement learning better resembles the way we learn. These systems live in a constrained virtual world in which they can take a limited set of actions to achieve a reward. A few months ago, DeepMind researchers published a paper arguing that "reward is enough" to achieve general artificial intelligence. However, not everything people do is meant to optimize a reward in the sense that reinforcement learning agents do. That’s not to mention the complexity of our world, the number of possible actions available at each instant, or the complexity and nuance of what we want or need.

For the above reasons, researchers have recently taken a greater interest in the paradigm of unsupervised – or self-supervised, as Yann LeCun likes to call it – learning. He argues we learn similarly to these systems (at least compared with the other paradigms). Humans learn a lot by observing and perceiving the world. That’s what self-supervised learning is about.

"[Self-supervised learning] is the idea of learning to represent the world before learning a task. __ This is what babies and animals do. […] Once we have good representations of the world, learning a task requires few trials and few samples."

  • Supervised learning systems learn to find patterns in data without caring about the world.
  • Reinforcement learning systems learn to optimize rewards without caring about the world.
  • Self-supervised learning systems need to represent the world to understand how things relate to one another.

These systems can learn the hidden parts of an input from its visible parts. For instance, if you were to feed half a sentence to a self-supervised system, it could predict the missing words. To do that, it needs a deeper knowledge of the relationships between things (which isn’t to say it understands the world in the same sense we do).
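A hedged example of this idea in practice is the masked-language-modeling setup popularized by BERT, shown here through the Hugging Face transformers library (assuming it and the bert-base-uncased checkpoint are available): part of the sentence is hidden, and the model predicts it from the visible context.

```python
# Masked-word prediction: the model fills in the hidden token from the visible context
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The cat chased the [MASK] around the house."):
    print(prediction["token_str"], round(prediction["score"], 3))
```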

The need for huge amounts of labeled data (supervised learning) and countless simulations (reinforcement learning) is a hindrance. Self-supervised learning aims to solve both problems. These systems learn without being explicitly told what they have to learn. No classes. No tasks.

Some important successes of self-supervised learning are related to the transformer architecture. For instance, BERT or GPT-3 have proved extremely successful in language generation tasks. Self-supervised systems are now state-of-the-art in many NLP domains. A notable drawback of these systems is their inability to handle continuous input such as images or audio.

"The next revolution in AI will not be supervised, nor purely reinforced."

– Yann LeCun

Prompt programming – Human communication

Low-code and no-code initiatives appeared a few decades ago as a reaction to the increasingly large skill gap in the coding world. The technical ability to write good code and handle tasks at different points in the design-production pipeline was expensive. As software products got more complex, so did the programming languages. No-code aims to close this gap for non-technical business people. It’s an approach that bypasses coding to make the results accessible to anyone.

Knowing how to code today is arguably as important as speaking English was a few years ago. You either knew it or you were missing a lot: job opportunities, books and articles, papers, and other technical work. In the future, the percentage of smart houses – domotics – will increase, and technical software skills may be as important then as knowing how to fix a pipe or a broken light is now.

At the intersection of no-code initiatives and the future of AI, we have prompt programming. GPT-3 is the best-known AI system that uses prompts. OpenAI released the API last year, and people soon recognized the uniqueness of prompting. It was something different; neither talking to a human nor programming in the formal sense. Prompt programming, as Gwern calls it, can be understood as a new form of programming. It isn’t as superficial as no-code, because we communicate with the system – we program it – in natural language. And it isn’t as highly technical as programming in C or Python.

GPT-3 caught the attention of researchers and developers, and many were motivated to find its shortcomings. Some found that GPT-3 failed where it should have succeeded. However, Gwern proved them wrong. He argued we should approach GPT-3 as if we were programming it in English: we have to do it right, because not just anything works. He repeated the tests, tweaking the prompts, and succeeded in getting GPT-3 to do the tasks correctly. He said:

"[Prompting] is a rather different way of using a DL [deep learning] model, and it’s better to think of it as a new kind of programming, where the prompt is now a "program" which programs GPT-3 to do new things."

GPT-3 sparked the possibility of programming a system by writing in English. The system could understand our intentions and translate them to the computer in a way it could interpret them without uncertainty.
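To ground the idea, here’s a sketch of what a prompt-programming call looked like with the OpenAI completion API around the time of writing (the engine name, parameters, and placeholder key are assumptions, and the client interface has changed in later versions).

```python
# Prompt programming: the "program" is plain English sent to the model
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

prompt = (
    "Summarize the following paragraph in one sentence.\n\n"
    "Paragraph: Transformers process all the tokens of a sentence in parallel "
    "and learn the contextual relationships between them.\n\n"
    "Summary:"
)

response = openai.Completion.create(
    engine="davinci",   # base GPT-3 engine name used at the time
    prompt=prompt,      # the English "program"
    max_tokens=60,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```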

A month ago, Microsoft – who partnered with OpenAI last year – and GitHub released GitHub Copilot. The system, powered by a descendant of GPT-3 called Codex, was created to be a powerful code autocomplete. Microsoft saw the potential of GPT-3 for creating code: it could understand English and transform it into well-written, functional programs. Copilot can, among other things, read a comment that describes a function in English, interpret it, and write the function.
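The exchange looks roughly like this: the developer writes the comment, and the assistant proposes the body. The completion below is a hand-written illustration of that workflow, not actual Copilot output.

```python
# Return the n-th Fibonacci number using an iterative loop  <- what the developer types
def fibonacci(n: int) -> int:                                # <- the kind of completion the assistant proposes
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```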

GPT-3 and GitHub Copilot combine the promises of no-code and the potential of prompt programming into a new era that will allow non-technical people access to the world of coding.

The main advantage of prompt programming and the reason why it’ll be successful is that we humans have evolved to communicate in natural language, not in formal languages. English has a series of rules that we intuitively know. We learn to speak correctly way before we understand the rules we’re using. We don’t invent the rules and then stick to them. We discover the rules we’re already following.

Writing Python or C is different. We call them languages, but they’re distinct from English in significant ways. Computers need unambiguous commands that leave no room for interpretation. Programming languages have strict syntax rules that can’t be broken, or the program won’t run. There are no shortcuts here. Without prompt programming, if you want to communicate with a computer, you have to learn its language. Even high-level languages such as Python require a notable degree of technical expertise that most people don’t have.

Prompt programming is the future of coding: We’ll be able to program most things in natural language. There will be intermediate systems tackling the translation between our inexact, nuanced, and context-filled thoughts and the formal set of instructions computers need to work.


Multimodality – Human perception

Until very recently, deep learning systems were designed to tackle unimodal problems. If you wanted to achieve state-of-the-art performance in machine translation, you trained your system with English-Spanish pairs of text data. If you wanted to beat the ImageNet challenge, your system had to be the best at object recognition, and nothing else. NLP systems and CV systems were distinct and unmixable.

Now, taking inspiration from neuroscience in a quest to simulate our perceptual mechanisms, researchers are focused on creating AI systems that learn from different types of data. Instead of dividing the systems by their area of expertise, why not make them combine data from visual and language sources? There’s info in text. There’s info in images. But there’s also info at the intersection of both. This new trend of multimodal systems is what Google and the BAAI did this year with MUM and Wu Dao 2.0, respectively. It’s a step forward in trying to make artificial systems resemble the human brain.

We’ve evolved in a multimodal world. Events and objects around us produce different kinds of information: electromagnetic, mechanical, chemical… For instance, an apple has color, form, texture, taste, smell… That’s why our brain is multisensory. We have a set of perceptual systems that capture part of the multimodal nature of the world (other living forms have different perceptual systems that allow them to perceive modes we’re biologically unaware of). What’s more interesting is that the brain integrates info from the perceptual channels in a single representation of reality.

That’s where imbuing AI with this capability pays off. If giving a text-image pair to a model allows it to represent the world more accurately, it could be more precise in its predictions or actions and adapt better to the environment. That’s today’s definition of intelligence: "The ability to understand and adapt to the environment by using inherited abilities and learned knowledge."

A robot with the artificial equivalent of eyes, ears, and hands and GPT-3 as the brain would be much more powerful than any current AI. The brain is where all the processing occurs, but the data that’s processed matters as well. Future AI systems will have sensors, controllers, and actuators, interconnected in such a way that information processing is fast, accurate, and abundant.

The focus is still on software-centered virtual systems, but some research groups have already integrated text and image data successfully. How these networks should combine both types of information remains a mystery (it isn’t perfectly understood in humans, either), but the attempts so far have been successful. DALL·E, CLIP, MUM, UC², and Wu Dao 2.0 are living proof that it’s possible.
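As a hedged illustration of what a multimodal model can do, here’s a short sketch using CLIP through the Hugging Face transformers library (the checkpoint name and the local image path are assumptions): the model embeds an image and several captions in the same space and scores how well each caption matches the picture.

```python
# Image-text matching with CLIP: one shared representation space for both modalities
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("apple.jpg")  # placeholder path to any local image
captions = ["a photo of an apple", "a photo of a cat", "a photo of a car"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # similarity scores turned into probabilities
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.3f}")
```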


Multitasking and task transfer – Human versatility

Supervised and reinforcement learning systems are bad multitaskers. Even systems like AlphaZero, which are designed to learn different tasks, have to unlearn and relearn for each one. However, self-supervised systems are inherently better at this. The reason is they’re trained in a task-agnostic manner. Because these systems aren’t explicitly told what to learn from the input data, they can be applied to different tasks without changing the parameters. That’s the case with GPT-3.

One of the most potent features of GPT-3 is its capacity to handle different tasks with the same set of weights. The system doesn’t change internally to do machine translation, question answering, or creative fiction writing. The system was trained in an unsupervised way on a huge portion of the internet’s text data, but it didn’t know how it would use what it had learned. With the help of prompt programming, a user can condition GPT-3 to solve a given task. For the record, GPT-3 achieved state-of-the-art results in several tasks it was never trained on. That’s the power of multitasking and task transfer.

Multitasking systems can apply the same input to different tasks. For instance, if I feed the word ‘cat’ to the system, I could ask it to find the Spanish translation ‘gato’, I could ask it to show me the image of a cat, or I could ask it to write an essay about why cats are so weird. Different tasks for the same input.

This idea is often combined with few-shot learning. Supervised deep learning systems are trained and tested on a pre-selected set of classes. If a CV system has learned to classify car, plane, and ship images, it will only do well when tested on those three classes. In few-shot (or zero-shot/one-shot) learning settings, the system is tested against new classes – without weight updating.

One example would be to show the system three images of a bike at test time and then ask it to classify car, plane, ship, and bike images as usual. That’s few-shot because we’ve shown it only three examples of what a bike is. A system that has learned how to learn (such as GPT-3) should be able to perform well in these extreme situations. GPT-3 proved that it’s possible, and its performance rivals that of supervised systems.

If we combine multitasking and few-shot settings, we can build a system capable of solving tasks it hasn’t been trained on. In this case, instead of showing the system new classes at test time, we’d ask it to carry out new tasks. In the case of a few-shot setting, we’d show it a few examples of how the task is done. And, without internally learning anything new, the system would now be conditioned to solve the new task.

For instance, let’s take a system trained in a huge text corpus. In a one-shot task transfer setting, we could write: "I love you -> Te quiero. I hate you -> ____." We are implicitly asking the system to translate a sentence from English to Spanish (a task it hasn’t been trained on) by showing it a single example (one-shot setting).
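Here’s a sketch of how that one-shot prompt could be assembled and sent to a large language model, mirroring the 2021-era OpenAI completion API used above (the engine name and parameters are assumptions, and the printed translation is only the expected kind of output, not a recorded result).

```python
# One-shot task transfer: a single demonstration, then the new input
import openai

prompt = (
    "I love you -> Te quiero.\n"
    "I hate you ->"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,       # one English-Spanish example conditions the model to translate
    max_tokens=10,
    temperature=0.0,     # keep the short completion as deterministic as possible
    stop=["\n"],         # stop at the end of the translated line
)
print(response.choices[0].text.strip())  # expected: something like "Te odio."
```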

If we think about it, we humans can do this. We’re meta-learners. We don’t just learn to do tasks, but we know how to learn to do new tasks. If I see someone sweeping a room, I’d know how to do it right away. I’d understand that the movement of the broom has to be directionally consistent to clean the floor and I’d try to coordinate hands and feet to make the transitions smooth. We don’t only learn when someone trains us. We learn by observation. That’s what a few-shot task transfer is. And AI systems are starting to get better at it.


Subscribe to my free weekly newsletter Minds of Tomorrow for more content, news, insights, and reflections on Artificial Intelligence!

Also, feel free to comment and reach out on LinkedIn or Twitter! 🙂


Recommended reading

5 Deep Learning Trends Leading Artificial Intelligence to the Next Stage

GPT-3 – A Complete Overview

