Why a major AI Revolution is coming, but it’s not what you think — AAAI 2020

From the AAAI 2020 Conference: Turing Award winners Hinton, LeCun, and Bengio aim to make machines reason

Jeff Chen
Towards Data Science

--

From left to right, Yann LeCun, Geoff Hinton, Yoshua Bengio, Francesca Rossi, and Daniel Kahneman engage in a heated discussion of whether computers are actually ‘thinking’

You already know that Deep Learning is good at vision, translation, playing games, and other tasks. But Neural Networks don’t “learn” the way humans do; they’re just really good at fast pattern matching. Today’s research mainly focuses on bigger models, larger datasets, and more complicated loss functions. But the next revolution is likely to be more fundamental. Let’s take a look at two approaches: adding logic with Stacked Capsule Autoencoders, and Self-Supervised Learning at scale.

All three Turing Award winners agree on Deep Learning’s problems

Yann LeCun presented the top 3 challenges to Deep Learning:

  1. Learning with fewer labeled samples
  2. Learning to reason
  3. Learning to plan complex action sequences

Geoff Hinton — “It’s about the problems with CNNs and why they’re rubbish”:

  1. CNNs are not good at dealing with rotation or scaling
  2. CNNs do not understand images in terms of objects and their parts
  3. CNNs are brittle against adversarial examples

Yoshua Bengio — “Neural Networks need to develop consciousness”:

  1. Should generalize faster from fewer examples
  2. Learn better models from the world, like common sense
  3. Get better at “System 2” thinking (slower, methodical reasoning, as opposed to fast recognition)

This about sums up what most AI scientists already know: Deep Learning is really good at narrow, pattern-based tasks such as object or speech recognition. The challenges are to make AI 1) learn from fewer examples, 2) be more robust to trivial adversarial attacks, 3) be able to reason, and 4) plan more complex action sequences.

Yann LeCun articulates three challenges for Deep Learning

Hinton proposes Stacked Capsule Autoencoders, which inject priors into the structure of the Neural Net

The key insight here is that with Convolutional Neural Networks, we simply added a little bit of structure (the convolution) to a Fully Connected Neural Network, and it helped the network with image recognition tremendously.
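To see just how little structure that is, here’s a quick sketch (PyTorch is my choice here, not anything from the talk) comparing a fully connected layer to a convolutional layer on the same 32×32 image:

```python
# How much structure the convolution prior adds: same 32x32 RGB input,
# same 16 output feature maps, wildly different parameter counts.
import torch.nn as nn

fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)          # every pixel connects to every output
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # 16 shared 3x3 filters

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))    # ~50 million weights
print(count(conv))  # 448 weights -- locality and weight sharing do the rest
```

That tiny prior (nearby pixels matter, and the same filter works everywhere) is what made CNNs dominate vision. Capsules push the same trick further.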

Hinton now changes the basic neuron structure to a “Capsule,” which consists of 1) a logistic unit to recognize a shape, 2) a matrix for the pose, and 3) a vector for other properties such as deformation, velocity, and color. The added structure should help the Neural Network recognize and store information about shapes, poses, and other properties. The full NeurIPS paper is here.
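As a mental model, here’s a toy sketch of the state a single capsule carries. This is just an illustration of the three components, not Hinton’s actual implementation:

```python
# Toy illustration of a capsule's three components (not the real SCAE code).
from dataclasses import dataclass
import numpy as np

@dataclass
class Capsule:
    presence: float         # logistic unit: probability the shape is present
    pose: np.ndarray        # pose matrix, e.g. a 3x3 affine transform
    properties: np.ndarray  # everything else: deformation, velocity, color, ...

nose = Capsule(presence=0.97, pose=np.eye(3), properties=np.zeros(8))
```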

As shown below, the embeddings show clear separation among the 10 digit classes in MNIST after purely unsupervised training. This means we only need 10 labeled samples (one for each digit) to name the clusters and classify every digit.

Imagine a baby playing with blocks numbered 1–10… after a while, the baby should be able to group similar numbers together (regardless of pose, deformation, etc.) even though she doesn’t know what the numbers mean. That’s what’s happening here.

Hinton shows unsupervised learning of MNIST digits and the clear separation of classes in the generated embeddings.
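Here’s a rough sketch of why 10 labels are enough, using scikit-learn, with synthetic blobs standing in for the capsule autoencoder’s embeddings:

```python
# Cluster the (unsupervised) embeddings, then name each cluster with a
# single labeled example. The blobs below stand in for real embeddings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = rng.normal(size=(10, 16)) * 10                   # 10 "digit" blobs
embeddings = np.vstack([c + rng.normal(size=(100, 16)) for c in centers])

kmeans = KMeans(n_clusters=10, n_init=10).fit(embeddings)  # no labels used

# one labeled example per digit is enough to name every cluster
cluster_to_digit = {kmeans.predict(centers[d].reshape(1, -1))[0]: d for d in range(10)}
predictions = [cluster_to_digit[c] for c in kmeans.labels_]
```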

Hinton is trying to teach the computer to learn relationships, which leads to logic

Hinton’s Stacked Capsule Autoencoders are an example of building knowledge structure into a Neural Network, which enables the network to reason. In the case of MNIST, the network recognizes how a digit is composed of many parts.

Taking this a step further, you can imagine representing other relationships this way, like capabilities (can walk, can fly, etc.) and features (has head, has wings, etc.). From there, it’s easy to see how a computer could reason that an animal is a bird because it has wings and can fly.
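In toy form, that inference might look like the snippet below (the attributes are made up for illustration):

```python
# Once capabilities and parts are explicit, classification becomes a rule.
animal = {"has_wings": True, "can_fly": True, "has_head": True}

def is_bird(a):
    return a.get("has_wings", False) and a.get("can_fly", False)

print(is_bird(animal))  # True
```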

There are other ways to combine logic with Neural Networks; the jury is still out on the best way to do it.

Tesla’s Autopilot is another example of combining Neural Networks with logic systems. In that case, Neural Nets are only used for object recognition; a set of manually written rules then gives directions to the car, for example: “A pedestrian is walking in front of the car, apply brakes and come to a stop.”
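The general pattern looks something like the sketch below. The detector stub is hypothetical, not Tesla’s actual system:

```python
# Hybrid pattern: a neural net does perception, hand-written rules decide.
def detect_objects(camera_frame):
    # stand-in for a trained detector returning labels found in the frame
    return ["pedestrian"]

def decide(objects):
    if "pedestrian" in objects:
        return "apply brakes and come to a stop"
    if "stop_sign" in objects:
        return "slow down and stop at the line"
    return "continue"

print(decide(detect_objects(camera_frame=None)))  # apply brakes and come to a stop
```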

The biggest differences are 1) Hinton’s relationship system is trainable using gradient descent, while pure symbolic systems are not, and 2) vector representations in trained systems can measure similarity, while symbolic systems cannot. For example: “Humans are similar to apes because they have two legs, a rounded face, and are about the same height.”
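Here’s what that similarity looks like in toy form, with made-up feature vectors (legs, face roundness, height in meters):

```python
# Learned vectors support graded similarity; symbolic rules are all-or-nothing.
import numpy as np

human = np.array([2.0, 0.9, 1.7])   # legs, face roundness, height (made up)
ape   = np.array([2.0, 0.8, 1.5])
fish  = np.array([0.0, 0.2, 0.3])

cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine(human, ape))   # close to 1.0: similar
print(cosine(human, fish))  # much lower
```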

You can imagine other ways of combining logic and Neural Networks, like integrating symbolic reasoning inside a Neural Network, or a cascade system where logic starts out fuzzy and then gets firmed up by symbolic rules. But it’s definitely not clear which method is best.

Yann LeCun proposes Self-Supervised Learning at scale

What if we skipped the fancy footwork with capsules? Could the network still learn these relationships on its own? That’s what Yann LeCun is thinking.

Babies learn this way. They first observe the world, and they learn about gravity by watching objects fall. So if you show a 10-month-old baby an object suspended in mid-air, she will be very surprised: having observed the world, she has learned about gravity and expects the object to fall.

Yann’s thesis is that if we can get a Neural Network to do video prediction, then we have a way of letting the computer learn all sorts of relationships about the world by itself, without injecting specific relationships into the network structure.

But predicting future video frames is hard because the future is genuinely uncertain, so the network tends to average all possible futures into one blurry image. Yann thinks energy-based models may make this work soon.
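For the curious, here’s a minimal sketch of the self-supervised setup (PyTorch assumed, random tensors standing in for video). Frame t+1 is the free training signal for frame t, and a plain MSE loss is exactly what produces the blurry average of possible futures:

```python
# Self-supervised next-frame prediction. With MSE, the net learns the
# *average* of possible futures -- the blurriness problem described above.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

frames      = torch.rand(8, 3, 64, 64)  # stand-in video: frame t
next_frames = torch.rand(8, 3, 64, 64)  # frame t+1: the "label" comes for free

loss = nn.functional.mse_loss(model(frames), next_frames)
opt.zero_grad(); loss.backward(); opt.step()
```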

An AI Revolution is coming

Most of the discussion around AI today focuses on making bigger models and more complicated loss functions. But the inventors of Neural Networks are leading research that attacks the core shortcomings of Neural Nets.

While it’s unclear which method will lead to the next breakthrough, I’m very optimistic that some of this research will work and cause a dramatic increase in AI capability. Users of AI would benefit from paying close attention and integrating the breakthroughs as quickly as possible.

--


AI engineer and company builder. Founded Joyride (acquired by Google). Current projects: thisisjeffchen.com