GANs with Attention

A little bit of background:

A Generative Adversarial Network (GAN) pits two networks against each other: a Generator and a Discriminator. The popular example used to explain GANs involves counterfeit money. The Generator tries to create an image that looks like a dollar bill, and the Discriminator tries to tell that counterfeit apart from the image of a real dollar bill. After training, the Generator has learned to create images that look very much like a dollar bill, while the Discriminator has learned to spot all but the best counterfeits.
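The adversarial setup can be sketched as a pair of opposed objectives. The toy `generator`, `discriminator`, and losses below are illustrative stand-ins, not any particular GAN implementation: the Discriminator is rewarded for scoring the real image high and the counterfeit low, while the Generator is rewarded for fooling it.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # hypothetical toy Generator: maps random noise to a "counterfeit" vector
    return np.tanh(z)

def discriminator(x):
    # hypothetical toy Discriminator: squashes a sum into a (0, 1) "realness" score
    return 1.0 / (1.0 + np.exp(-x.sum()))

real = np.ones(4)                     # stand-in for the real dollar image
fake = generator(rng.normal(size=4))  # the Generator's counterfeit

# Opposed objectives: D minimizes d_loss (catch the fake), G minimizes g_loss (fool D)
d_loss = -np.log(discriminator(real)) - np.log(1.0 - discriminator(fake))
g_loss = -np.log(discriminator(fake))
```

In a real GAN, both losses would drive gradient updates to the two networks; here they only show the shape of the competition.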


That is a one-off process: the Generator creates a counterfeit, the Discriminator tries to identify it, and both are scored on their performance. There is no back-and-forth over that particular image, no chance to revise it; the counterfeit is simply discarded.

A similar approach, used in reinforcement learning, is the Actor-Critic model. The Actor, which would normally only receive a ‘score’ at the end of its performance, is instead ‘scored’ along the way by a Critic. The Critic learns to predict which actions affect the final score, and the Actor learns, via the Critic’s moment-to-moment scoring, to perform the actions that improve it. This is still a one-off process: the Actor is unable to ‘film that scene again’ in response to criticism.
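A minimal sketch of that scoring arrangement, using made-up names and a two-action toy rather than a full Actor-Critic implementation: the Critic nudges its per-action estimates toward observed final rewards, and the Actor then follows whichever action the Critic currently scores highest.

```python
# Hypothetical toy: two actions, with the reward only revealed at the end.
values = {"left": 0.0, "right": 0.0}  # the Critic's per-action value estimates
alpha = 0.5                           # learning rate (assumed)

def final_reward(action):
    # only "right" pays off at the end of the episode
    return 1.0 if action == "right" else 0.0

# Critic training: nudge each estimate toward the observed final score
for action in ("left", "right", "right"):
    values[action] += alpha * (final_reward(action) - values[action])

def actor(values):
    # the Actor performs whichever action the Critic scores highest
    return max(values, key=values.get)
```

After those three episodes, `actor(values)` picks "right": the Critic's running estimate now stands in for the delayed final score.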

So, What About Revisions?

Another metaphor, to describe a better approach: an Author produces their dollar-image (or block of text, or string of actions,…) and sends it to the Editor. The Editor then marks that dollar-image in the places where it finds mistakes, and sends it back to the Author. The Author focuses their attention on those mistakes, and corrects them, sending the new draft back to the Editor… Repeat this cycle, until the Editor finds no mistakes. The Author network learns to take an image along with an attention-filter, and apply changes to the regions that are highlighted. The Editor learns to take an image and its target, and highlight regions that should change.
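The revise-until-no-marks cycle can be sketched directly. Everything here is a toy stand-in (a real Author and Editor would be learned networks): the Editor marks entries that still miss the target, and the Author nudges only the marked entries.

```python
import numpy as np

def editor(draft, target, tol=0.05):
    # highlight the regions where the draft still misses the target
    return np.abs(draft - target) > tol

def author(draft, attention, target, step=0.5):
    # revise only the highlighted regions, moving them toward the target
    revised = draft.copy()
    revised[attention] += step * (target - draft)[attention]
    return revised

target = np.array([0.0, 1.0, 0.5, 0.25])
draft = np.zeros(4)

rounds = 0
marks = editor(draft, target)
while marks.any():           # repeat until the Editor finds no mistakes
    draft = author(draft, marks, target)
    marks = editor(draft, target)
    rounds += 1
```

With these numbers the loop settles after five revisions; the stopping condition is exactly the Editor running out of highlights.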

This Author-Editor model enables valuable new behaviors. Once the Author has been trained to adapt an image according to an Editor’s marks, a human can also correct the Author by selecting the areas they would like to change. This user input could direct different Authors to focus their attention on different areas of an image: a user could select some areas of an image for a ‘Winter-to-Spring Author’, while selecting other areas of that same image for a ‘Real-Life-to-Chibi Author’. Each Author makes modifications according to its own attention. You can be the Editor.

More than That…

The Editor is trained to highlight mistakes. So, if the Author-Editor model is trained on text, the Editor has an immediate application: correcting mistakes in what humans write! (like Grammarly…)

Also, Author-Editor networks can collaborate with humans in real time. For example, you might draw an animal — a fox? — which is passed to the Editor, who tries to identify regions that should be modified. The Editor may highlight areas of your fox-drawing where the lines could be smoothed, or where their position, proportions, or overall texture could be adjusted.

The Editor can make mistakes! It might think that you are drawing a cat, and highlight a few regions improperly. You can review these highlighted regions and replace the Editor’s highlighting with your own. Those regions then go to the Author, which focuses its attention on them and makes changes. You review the Author’s updated draft, and can make your own changes. Then, send it to the Editor… Repeat, until you are satisfied.

Selecting Alternatives:

This Author-Editor model also makes it possible for people to co-create content with the help of multiple Authors and Editors. The Authors’ attention can be applied to different areas, or to the same area. If each Author is active in a different region (like the Winter-to-Spring and Real-Life-to-Chibi Authors mentioned earlier), their drafts can be combined. When the Authors’ regions of attention overlap, however, each Author provides an alternative, and you select which alternative is applied.

(You might have three painting-style Authors, and highlight the same region of an image, to see what they each suggest for that one region. Then, you could apply each Author’s work in different parts of your highlighted region, with an opacity filter. Choose between them, piecemeal!)
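Combining Authors by region can be sketched with boolean masks and an opacity blend. The two one-note ‘Authors’ below are hypothetical stand-ins for the Winter-to-Spring and Real-Life-to-Chibi Authors:

```python
import numpy as np

image = np.full((4, 4), 0.5)  # a flat grey stand-in image

# Hypothetical one-note Authors: each applies its single edit everywhere
def winter_to_spring(img):
    return np.clip(img + 0.3, 0.0, 1.0)

def real_life_to_chibi(img):
    return np.clip(img * 0.5, 0.0, 1.0)

# You, as Editor, decide where each Author may act
mask_a = np.zeros((4, 4), dtype=bool); mask_a[:2] = True  # top half
mask_b = np.zeros((4, 4), dtype=bool); mask_b[2:] = True  # bottom half

out = image.copy()
out[mask_a] = winter_to_spring(image)[mask_a]
out[mask_b] = real_life_to_chibi(image)[mask_b]

# Where masks overlap, the alternatives can instead be blended piecemeal
# with an opacity filter:
#   out[overlap] = alpha * a(image)[overlap] + (1 - alpha) * b(image)[overlap]
```

Non-overlapping masks simply partition the image between the Authors; the commented opacity line is how the "choose between them, piecemeal" step would look.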

Having multiple Editors provides a similar benefit — each Editor highlights the regions that it thinks should be modified, and you choose between their suggested highlights. You could also overrule the Editors, protecting the things you’d like to keep unchanged!

Training on Revision Histories:

An Author-Editor GAN differs from existing GAN architectures by training the Author on an attention field generated by the Editor, and by revising many times to produce a final output. In this way, the Author-Editor model is similar to recurrent NNs with attention: the revision history can be ‘unrolled’ like an RNN’s action history, and each revision focuses attention on newly relevant areas. Revision differs from recurrence, though — an RNN describes a sequence, like choosing paths in a maze, while Author-Editor revisions describe an equilibrium, a settling-place where the Editor stops highlighting changes. Recurrent NNs cannot contain revisions, but revision can contain recurrence: an Author could write out the sequence of choices in a maze, writing and re-writing that sequence until the Editor is satisfied. :]
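The contrast can be sketched as two loop shapes (toy functions with assumed names, not real networks): recurrence consumes a fixed sequence of inputs, while revision iterates the same operator until the Editor stops asking for changes.

```python
# Recurrence: a fixed number of steps, each consuming the next input in a sequence
def rnn_unroll(h, xs, step):
    for x in xs:
        h = step(h, x)
    return h

# Revision: apply the same operator until a fixed point (the Editor is satisfied)
def revise_until_stable(draft, revise, editor_ok, max_rounds=100):
    for _ in range(max_rounds):
        if editor_ok(draft):
            break
        draft = revise(draft)
    return draft

# Toy demonstration: nudge a number halfway toward 1.0 until it is close enough
settled = revise_until_stable(
    0.0,
    revise=lambda d: d + 0.5 * (1.0 - d),
    editor_ok=lambda d: abs(1.0 - d) < 0.01,
)
```

The unrolled loop's length is fixed by its input sequence; the revision loop's length is decided by the equilibrium condition itself.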