GANs and Inefficient Mappings
How GANs tie themselves in knots and why that impairs both training and quality

A warning to mobile users: this article has some chunky gifs in it.
Generative Adversarial Networks (GANs) are being hailed as the Next Big Thing™️ in generative art, and with good reason. New technology has always been a driving factor in art — from the invention of paints to the camera to Photoshop — and GANs are a natural next step. For instance, consider the following images, published in a 2017 paper by Elgammal et al.

If you’re unfamiliar with GANs, this article includes a succinct overview of the training process. In short, GANs take random noise as input and (if training goes well) produce output that is indiscernible from the real data, where the real data can be practically anything (a set of abstract paintings, photos of celebrity faces, handwritten digits, etc.).
It is a well-documented problem in GAN literature, as in variational autoencoders before it, that the input values often have no clear relationship with the output. As I mentioned before, GANs accept random noise (canonically 100 random, normally-distributed values) as input, and each of these random numbers can be thought of as a lever of control for the output. Ideally, each lever would correspond to one feature — in the case of generating faces, there would be one lever for smile vs. frown, one for skin colour, one for hair length, and so on. This is rarely the case, which makes using GANs for art something of a crapshoot. As a visualization of this problem, consider the following animation:

Here, I trained a GAN on the MNIST handwritten digit dataset using a latent space of 16 dimensions. I generated a random sample using this GAN, then illustrated how the output changes as one of the input values is adjusted while the rest are fixed in place. As you can see, neither lever in question changes the output in a way that a human might find intuitive or useful; the first lever controls whether the digit is a 7 or a 9 as well as stroke angle, while the second lever controls whether it’s a 7 or a 9 as well as stroke thickness. Stupid, right? One can imagine what an ideal tool for generating “handwritten” digits might look like: the first lever controls which digit to generate, from zero to nine; the second lever controls thickness of the stroke; the third controls stroke angle; the fourth, loopiness… you get the idea. Instead, we see several of these traits being controlled by a single lever, and one of these traits being controlled by multiple levers. Imagine how frustrating it would be if Photoshop’s rotate tool also rotated an image’s hues through the colour wheel!
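If you’d like to produce this kind of lever sweep with a model of your own, here is a rough sketch: fix a random latent vector, then vary one dimension at a time while holding the others in place. It assumes a trained generator with a 16-dimensional latent space that takes a batch of latent vectors and returns images; the `generator` callable below is a placeholder for whatever model you have on hand, not the exact code behind the figures above.

```python
import numpy as np
import matplotlib.pyplot as plt

LATENT_DIM = 16  # matches the 16-dimensional latent space described above

def sweep_lever(generator, lever, steps=9, low=-3.0, high=3.0, seed=0):
    """Vary one latent dimension (a "lever") while holding the others fixed."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(LATENT_DIM)       # a single random latent sample
    values = np.linspace(low, high, steps)    # sweep values for the chosen lever
    frames = []
    for v in values:
        z_swept = z.copy()
        z_swept[lever] = v
        frames.append(generator(z_swept[np.newaxis, :])[0])  # assumed batch-in, batch-out interface
    return frames

# Example usage (uncomment once you have a trained generator):
# frames = sweep_lever(generator, lever=0)
# fig, axes = plt.subplots(1, len(frames), figsize=(2 * len(frames), 2))
# for ax, frame in zip(axes, frames):
#     ax.imshow(np.squeeze(frame), cmap="gray")
#     ax.axis("off")
# plt.show()
```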
The obvious problem here is that this makes for an inefficient and downright confusing interface for image generation. However, there is another, less obvious problem: the twisted and complex relationship between input and output also impedes training and limits the overall quality of the output.
Problem One: Spiral
As I explain in this article, GANs are essentially a tool for modelling some data distribution, be it the normal distribution or the distribution of human faces. The GAN, therefore, is a transformation or mapping from some latent space to some sample space. This point is often overlooked as students of GANs dive headlong into high-dimensional problems like image generation. Here, I intend to demonstrate the inefficient-mapping problem using simple, 2-dimensional examples, the first of which is illustrated here:

This is a fairly straightforward function: it maps the x-axis of the input space to the position along the spiral in the sample space (angle and radius), and the y-axis to the lateral position within the spiral. For the purposes of visualization, the x-axis is also mapped to hue and the y-axis to value (colourful vs. black). To further clarify this function, consider the following animation:

The problem, then, is to train a GAN that is capable of sampling points from this spiral distribution in such a way that a batch from the GAN and a batch from the true function are indiscernible. Note that the GAN doesn’t have to learn the original mapping; any mapping will do, so long as the output distributions are the same.
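For reference, here is a rough sketch of a sampler for the target distribution, which is what supplies the “real” batches during training. The structure matches the description above (x controls angle and radius, y controls the lateral position, hue follows x and value follows y), but the exact constants (number of turns, spiral width) are guesses rather than the precise function used for the figures.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def spiral_target(n, turns=2.0, width=0.05, seed=None):
    """Sample n points from the "true" spiral distribution.

    x in [0, 1) controls position along the spiral (angle and radius),
    y in [0, 1) controls the lateral position within the spiral's width.
    Returns the 2-D points and per-point RGB colours (hue from x, value from y).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    y = rng.uniform(size=n)

    angle = 2.0 * np.pi * turns * x      # angle grows with x
    radius = x + width * (y - 0.5)       # y nudges the point across the spiral's width
    points = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

    colours = hsv_to_rgb(np.stack([x, np.ones(n), y], axis=1))  # hue <- x, value <- y
    return points, colours

# points, colours = spiral_target(2048, seed=0)
# import matplotlib.pyplot as plt
# plt.scatter(points[:, 0], points[:, 1], c=colours, s=4); plt.show()
```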
Results:

The GAN was trained for 60k training steps using typical GAN training techniques (the code is available at the end of the article, and a minimal sketch of the setup follows the list below). As you can see, the GAN successfully learned a spiral distribution. However, it has several issues:
- It’s a lot skinnier than the target function; although some sections of the spiral have some width to them, the GAN has essentially reduced this distribution to a 1-dimensional manifold in 2-dimensional space.
- The output is messy; note the points scattered about in the negative space of the spiral. These never occur in the target function, so what are they doing here?
- Note the strange artifacts at points (0.60, -0.63) and (0.45, 0.17); these discontinuities result in holes in the distribution.
- Compare the distribution of hue and value in the GAN-produced spiral to those in the original function (figure 3); they’re much less ordered, and show no clear relationship between latent space (the levers) and the output.
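If you’d like to reproduce this kind of experiment, here is a minimal sketch of what “typical GAN training techniques” can look like for these 2-dimensional problems: a small fully-connected generator and discriminator trained with the standard non-saturating GAN loss, written in PyTorch here. The layer sizes, learning rates, and latent dimensionality are illustrative assumptions rather than the exact setup behind the figures; the repo linked at the end has the real code.

```python
import torch
from torch import nn

latent_dim, batch_size = 2, 256  # assumed; the real experiments may differ

# Small fully-connected generator (latent point -> 2-D sample) and discriminator.
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, 1))  # real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()
real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)

def training_step(real_batch):
    """One discriminator update followed by one generator update."""
    z = torch.randn(batch_size, latent_dim)
    fake_batch = G(z)

    # Discriminator: real points should score 1, generated points 0.
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake_batch.detach()), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: non-saturating loss, i.e. push D to score the fakes as real.
    g_loss = bce(D(fake_batch), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Each step consumes a batch of "real" points, e.g. from the spiral sampler sketched above:
# real_batch = torch.as_tensor(spiral_target(batch_size)[0], dtype=torch.float32)
# training_step(real_batch)
```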
All four of these problems are illustrated in this animation:

As you can see, all four of these problems are, in fact, the same problem. Comparing figure 6 to figure 4, we see that the GAN has learned an inefficient mapping. First of all, consider the tear in the top-right corner of the latent space; the region of the latent space above the tear is mapped to the outermost section of the spiral, while the region immediately below the tear is mapped to the centre of the spiral. This tearing behaviour explains the messiness (issue 2): any point that lies on the tear is mapped to some place between those two extremes, typically falling in the negative space of the spiral. It also explains the artifact at (0.60, -0.63) (issue 3), since points generated in this region were mapped from distant points in the latent space, which is also why the hues and values of the colours don’t line up (issue 4). Finally, the skinniness of the learned distribution (issue 1) is explained by the complexity of the mapping; the majority of the variance of the distribution comes from the position along the spiral, with the position within the spiral’s width being less significant. Therefore, the GAN first learned how to create a spiral. Whenever it tried to broaden out, the complexity of the mapping caused some other region to break, not unlike a novice developer’s spaghetti code (we’ve all been there). The GAN has essentially trapped itself in a local minimum from which it can’t escape. If you’re curious (if not, respectfully, why are you reading this article?), this is what the GAN looks like while training:

Figure 7 shows that the GAN quickly learned incompatible mappings for the outermost and innermost regions, and the rest of the distribution was forced to reconcile between them.
Problem Two: Eight Gaussians

This function maps a 2.5-dimensional space to a 2-dimensional space. The first two dimensions in the latent space are independent, standard normally-distributed values. The remaining “0.5” is a discrete dimension with eight possible values, encoded as a one-hot vector of length eight where one value is set to one while the rest are zero. In figure 8, a random sample in the latent space is illustrated by plotting the two continuous dimensions on the x- and y-axes while the discrete dimension is represented by the colour. The target function maps this latent space to the sample space by rescaling the normal distribution by a factor of 0.2 and shifting it to one of eight points, based on the value of the discrete dimension. The process is animated here:

The problem, then, is to train a GAN that is capable of sampling points from this eight-gaussians distribution in such a way that a batch from the GAN and a batch from the true function are indiscernible. Note that, just as in the spiral problem above, the GAN doesn’t have to learn the original mapping; there is, however, a simple mapping that is obviously preferable, sketched below.
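To make that simple mapping concrete, here is a sketch of the target function itself: two standard-normal dimensions rescaled by a factor of 0.2 and shifted to one of eight centres selected by the one-hot dimension. The centre locations (a unit circle below) are an assumption; any eight well-separated points would do.

```python
import numpy as np

def eight_gaussians_target(n, scale=0.2, seed=None):
    """The "true" eight-gaussians function: 2 continuous dims + 1 discrete dim -> 2-D point.

    Centre locations (a unit circle here) are assumed; the 0.2 scale matches the text.
    """
    rng = np.random.default_rng(seed)
    z_cont = rng.standard_normal((n, 2))        # two independent standard-normal dimensions
    mode = rng.integers(0, 8, size=n)           # the discrete "0.5-dimensional" input
    one_hot = np.eye(8)[mode]                   # how the discrete dimension is encoded

    angles = 2.0 * np.pi * np.arange(8) / 8.0   # eight evenly spaced centres (assumed layout)
    centres = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    points = scale * z_cont + centres[mode]     # rescale by 0.2, shift to the selected centre
    return points, one_hot
```

A GAN that learned a mapping of this shape would have one clean lever per mode; as the results below show, that is not what happens.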
Results:

It’s bad. The GAN completely failed to generate samples in two of the modes (mode collapse), it produced a substantial number of points between modes, it failed to produce normally distributed modes, and there is clearly no reasonable relationship between the latent space and the sample space. This is even clearer in the following animation:

It’s immediately obvious that different regions of the two continuous latent dimensions are cut up and mapped to six of the sample-space modes. This presents the same problem that the tear did in the spiral problem: points that land on a tear are mapped to the negative space between modes. Despite the simple solution (namely, scale down the continuous dimensions and map each value in the discrete dimension to a different mode), the GAN settled into a local minimum and was unable to dig itself back out.
Problem Three: One Gaussian

The eight gaussians problem was clearly too difficult, so here is an even simpler problem: convert 2-dimensional uniform noise to 2-dimensional standard normal noise. As in the spiral problem, points are coloured by rotating hue along the x-axis and varying value along the y-axis of the latent space. The simplest mapping is straightforward: stretch out each dimension independently. This is illustrated here:

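One concrete way to write that per-dimension stretch down is the inverse CDF of the normal distribution applied to each axis independently. This is my choice of illustration of the idea; the exact function animated above may differ in its details.

```python
import numpy as np
from scipy.stats import norm

def uniform_to_normal(n, seed=None):
    """Map 2-D uniform noise to 2-D standard normal noise, one axis at a time."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(n, 2))   # latent: uniform on the unit square
    x = norm.ppf(u)                # the inverse normal CDF stretches each axis independently
    return u, x
```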
Results:

As you can see, even when it comes to sampling from something as simple as a 2-dimensional normal distribution, the GAN still ties itself in knots. Most notably, the GAN seems to have folded the latent space over itself, resulting in a kink, gap, and protrusion in the bottom-right of the sample space. Here is the interpolation, animated:

Closing Thoughts
It’s possible I’ve belaboured the point. However, I hope that the above visualizations have made it clear that the fuzzy relationships between input and output features are more than a simple inconvenience, and are instead a symptom of a much more fundamental problem. If you’re interested in the code used to train the above GANs or the visualization code, both are available at the following github repo:

