
Exploring the Latent Space of a ConvNet Image Classifier

Linearity in the latent space of hidden layers!

Photo by Alireza on Unsplash

A fascinating thing about neural networks, especially convolutional neural networks, is that the final fully connected layers of the network can each be treated as a vector space. All the useful information that the ConvNet has extracted from the image is stored in compact form in the final layer as a feature vector. When this final layer is treated as a vector space and subjected to vector algebra, it starts to yield some very interesting and useful properties.

I trained a ResNet50 classifier on the iMaterialist Challenge (Fashion) dataset from Kaggle. The task is to classify every apparel image into attributes such as pattern, neckline, sleeve length, and style. I took the pre-trained ResNet model and applied transfer learning on this dataset, with added output layers for each of the labels. Once the model was trained, I wanted to look at the vector space of the last layer and search for patterns.

Latent Space Directions

The last FC layer of ResNet50 has length 1000. After transfer learning on the fashion dataset, I apply PCA to this layer and reduce the dimension to 50, which retains more than 98% of the variance.
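The dimensionality-reduction step can be sketched as follows. This is a minimal illustration, not the article's actual pipeline: the `features` array here is a random stand-in for the 1000-dimensional FC-layer activations collected over the training set.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the FC-layer activations: one 1000-d row per image.
rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 1000))

# Reduce each feature vector to 50 dimensions.
pca = PCA(n_components=50)
features_pca = pca.fit_transform(features)

print(features_pca.shape)                    # (2000, 50)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```

On real ConvNet features (unlike this random stand-in), the leading 50 components capture most of the variance because the activations are highly correlated.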

The concept of latent space directions is popular with GANs and variational autoencoders, where the input is a vector and the output is an image. It is observed that translating the input vector in a certain direction changes a certain attribute of the output image. The example below is from StyleGAN, trained by NVIDIA; you can have a look at this website, where a fake face is generated from a random vector every time you refresh the page.

Photo from Interface GAN Github | Attributes like pose, age, expression, and eyeglasses can be tweaked by translating the input vector in the respective latent direction.

This is exciting! A trivial linear operation on the input vector brings about such a complex transformation in the output image. These latent directions are like knobs which you can tune to get the desired effect on the output image. Incredible!

In our image classifier, we are doing exactly the opposite of a GAN. We take an image and get a vector of length 1000 in the FC layer, and with the PCA transformation we reduce its dimension to 50. The idea we want to explore is whether the image attributes the model has been trained on are arranged in a linear fashion in the vector space of the FC layer and of the layer after PCA. The answer is indeed yes!

One of the tasks in the iMaterialist challenge is to identify the sleeve length, which is classified into five types: long sleeves, short sleeves, puff sleeves, sleeveless, and strapless. Of these five, three form the majority: long sleeves, short sleeves, and sleeveless. Now we want to see whether there exists a latent direction along which these three sleeve-length classes get segregated. If it were a binary classification, we could have used logistic regression, and the vector of coefficients would have given us the latent direction; in our case, however, we have multiple classes.

One way to find a latent direction in a multi-class problem is to build a neural network in which we force the input through a single-unit middle layer; the weight vector of that first layer then gives us the latent direction. The code snippet for this classifier is given below: the input is the PCA layer of length 50, which is passed through a single-unit layer, and after a few more layers I get the final 5 classes. The idea behind forcing the input through a single unit is to constrain the network to learn a linear latent direction. The value of this single unit is essentially the projection of the input onto the learned latent direction.

import torch.nn as nn

class latent_model(nn.Module):
  def __init__(self):
    super(latent_model, self).__init__()
    # The 50-d PCA vector is forced through a single unit; the weights
    # of this first Linear layer define the latent direction.
    self.layer1 = nn.Sequential(
        nn.Linear(50, 1),
        nn.Linear(1, 10), nn.ReLU(),
        nn.Linear(10, 10), nn.ReLU(),
        nn.Linear(10, 5),
        nn.Softmax(dim=1))

  def forward(self, ip):
    x = self.layer1(ip)
    return x

lm = latent_model()

##### Train the classifier #####

## The latent direction ##
latent_direction = lm.layer1[0].weight[0].cpu().detach().numpy()
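To make the elided training step concrete, here is one minimal way it could look on synthetic stand-ins for the 50-d PCA vectors and the 5 sleeve classes (the model class is repeated so the snippet runs standalone; the data, epoch count, and learning rate are placeholders, not the article's actual settings).

```python
import torch
import torch.nn as nn

class latent_model(nn.Module):
    def __init__(self):
        super(latent_model, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(50, 1),
            nn.Linear(1, 10), nn.ReLU(),
            nn.Linear(10, 10), nn.ReLU(),
            nn.Linear(10, 5),
            nn.Softmax(dim=1))
    def forward(self, ip):
        return self.layer1(ip)

torch.manual_seed(0)
X = torch.randn(512, 50)          # hypothetical PCA-reduced feature vectors
y = torch.randint(0, 5, (512,))   # hypothetical sleeve-length labels

lm = latent_model()
opt = torch.optim.Adam(lm.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()            # the model already outputs probabilities

for epoch in range(50):
    opt.zero_grad()
    probs = lm(X)
    loss = loss_fn(torch.log(probs + 1e-9), y)  # NLL on log-probabilities
    loss.backward()
    opt.step()

latent_direction = lm.layer1[0].weight[0].detach().numpy()
print(latent_direction.shape)  # (50,)
```

Because the network applies `Softmax` in `forward`, the loss takes the log of the output probabilities rather than using `CrossEntropyLoss`, which expects raw logits.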

Results from the iMaterialist ConvNet Model

Let’s see what we get after training this small classifier! The plot below shows the distribution of different sleeve types along the latent direction, and we can see that the three major classes (short, long, and sleeveless) get segregated along this direction.
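The quantity plotted is just the scalar projection of each image's PCA vector onto the learned direction. A minimal sketch of that computation, using random stand-ins for the feature vectors, labels, and direction:

```python
import numpy as np

# Hypothetical stand-ins with the shapes used in the article.
rng = np.random.default_rng(1)
features_pca = rng.normal(size=(1000, 50))    # PCA-reduced feature vectors
labels = rng.integers(0, 3, size=1000)        # 0=long, 1=short, 2=sleeveless
latent_direction = rng.normal(size=50)
latent_direction /= np.linalg.norm(latent_direction)

# One scalar per image: its position along the latent direction.
projections = features_pca @ latent_direction

# Per-class summary; on the real model these distributions separate.
for c, name in enumerate(["long", "short", "sleeveless"]):
    vals = projections[labels == c]
    print(f"{name}: mean={vals.mean():.2f}, std={vals.std():.2f}")
```

Plotting a histogram of `projections` per class reproduces the kind of figure shown below.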

Graph by Author: Distribution of sleeve length along the latent direction.

Neural networks are complex non-linear models but there is linearity hidden in hidden layers!

Now I take an image of a dress and, starting from this image’s point in the PCA vector space, draw a line along the latent direction for sleeve length, then retrieve the training images closest to this line. Below are a few such examples. An interesting thing to note is that while the sleeve length changes along the direction from long sleeves to short sleeves to sleeveless, other features of the dress, such as the neckline, dress length, and overall style, tend to stay preserved.
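The retrieval step can be sketched as follows: points on the line are $v_0 + t\,d$, and for each step $t$ we pull the training vectors with the smallest Euclidean distance to that point. The arrays here are hypothetical stand-ins; the helper `nearest_on_line` is my own illustrative name, not from the article's code.

```python
import numpy as np

rng = np.random.default_rng(2)
features_pca = rng.normal(size=(1000, 50))   # training-set PCA vectors
v0 = features_pca[3]                         # PCA vector of the starting image
d = rng.normal(size=50)
d /= np.linalg.norm(d)                       # unit latent direction

def nearest_on_line(t, k=3):
    """Indices of the k training vectors closest to the point v0 + t*d."""
    point = v0 + t * d
    dists = np.linalg.norm(features_pca - point, axis=1)
    return np.argsort(dists)[:k]

# Walk along the direction; at t=0 the starting image itself is returned first.
for t in (-4, 0, 4):
    print(t, nearest_on_line(t))
```

Sweeping `t` from negative to positive values yields the image sequences shown below, with the sleeve length changing along the way.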

Images from Kaggle Dataset | Example 1: Starting with #3, other images are retrieved that are closest to the line at different points.
Images from Kaggle Dataset | Example 2: Starting with #1, other images are retrieved that are closest to the line at different points.
Images from Kaggle Dataset | Example 3: Starting with #4, other images are retrieved that are closest to the line at different points.

In the second example, all the retrieved images are long dresses with two poses in each image, whereas in the third example most of them do not feature a model. The only significant change along the direction is in the sleeve length.

From the above results, it looks like the training dataset images have been arranged in a pristine order in the latent space of the PCA layer!

Doing a similar analysis for pattern gives the following results:

Distribution of different pattern types along the discovered latent direction for pattern. (Graph by Author)

From the above graph, we see that floral type dresses get segregated from more solid type patterns along the latent direction. Looking at a few examples confirms this observation.

Images from Kaggle Dataset | Example 4: Starting with #6, other images are retrieved that are closest to the line at different points.
Images from Kaggle Dataset | Example 5: Starting with #2, other images are retrieved that are closest to the line at different points.
Images from Kaggle Dataset | Example 6: Starting with #4, other images are retrieved that are closest to the line at different points.

In the above examples, the pattern slowly shifts from solid to stripes to floral to plaid along the discovered direction and the retrieved images tend to be similar to the starting image in terms of the overall style.

Conclusion

Looking at the latent space of feature vectors is a step forward in our understanding of how neural networks work. From the above analysis, one conclusion we can draw is that convolutional neural networks take in complex data in the form of images, and the information extracted from those images becomes more and more organized as the input passes through subsequent layers. We find that in the latent space of the last layers of the network, our training images get sorted and segregated in such an orderly way that we can discover linear directions along which a given attribute of interest varies.
