An Extensive Introduction to Latent Spaces

Intro to Image Embedding and Latent Spaces

A theoretical and practical introduction to image embedding and Latent Spaces.

Mathias Grønne
Towards Data Science
7 min read · Sep 1, 2022


Photo by Chris Lawton on Unsplash

They say an image says more than a thousand words. Just look at the image above and imagine the story it tells us about its leaves, their colors, and the life they have lived. It would be intriguing to let a computer tell some of these stories for us, but can a computer tell us that there are leaves in the image? Their colors? Or that fall is coming? These are easy things for humans, as we only need to look at the image. But it is more challenging for computers, as all they see are numbers, which would be difficult for humans to understand as well.

The good thing is that we can help the computer understand. A computer can change the image into something easier for it to understand. It can change the image into variables that tell it about the texture, color, and form of the objects in the image. With these, the computer can begin to tell us stories about what is in the image. In order to tell these stories, the image must first be embedded into a computer and afterward transformed into variables by embedding it into the Latent Space. We will call the latter Latent Space Embedding to differentiate between the two.

This book aims to give a theoretical and practical introduction to image embedding, latent space embedding, and techniques used by different applications. The book starts with the basics and builds towards modern methods, all supported with examples and code to make them easier to understand.

The series comprises 8 chapters, each presented in the sections below. Follow me to receive an e-mail when I publish a new chapter or post.

  1. Introduction to Embedding, Clustering, and Similarity
  2. Introduction to Image Embedding and Accuracy
  3. Introduction to Image Latent Space Embedding
  4. Clustering in the Latent Space
  5. Practical: Face-detection in the Latent Space
  6. Introduction to Autoencoders
  7. Adapt the Latent Space to Similarity
  8. Practical: Product recommendation in the Latent Space

Chapter 1: Introduction to Embedding, Clustering, and Similarity

Knowledge represented with numbers in a coordinate system.

Chapter 1 explains what embedding is and how it can be used to represent the real world in variables. Each variable is a question, as it asks the world for knowledge; the challenge is asking the right questions. Asking the right questions influences how easily the computer can understand the variables and whether we can use the variables successfully in an application.

It is hard for the computer to understand embedded images, which is why we won’t look at them just yet. We will instead look at simpler representations of these and how these can be embedded using the right questions. The goal is to understand the underlying mechanisms of embedding before looking deeper into images.
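As a toy example of these mechanisms, here is a hypothetical embedding where each variable answers one "question" about a fruit, and distance in the resulting space measures similarity. The fruits, questions, and values are invented purely for illustration:

```python
import numpy as np

# Hypothetical embedding: each fruit is described by two "questions" (variables):
# how red is it (0-1)? how round is it (0-1)?
embeddings = {
    "apple":      np.array([0.9, 0.9]),
    "strawberry": np.array([0.8, 0.3]),
    "banana":     np.array([0.1, 0.1]),
}

def distance(a, b):
    # Euclidean distance between two embeddings: smaller means more
    # similar, as judged by the questions we chose to ask.
    return float(np.linalg.norm(a - b))

d_apple_straw = distance(embeddings["apple"], embeddings["strawberry"])
d_apple_banana = distance(embeddings["apple"], embeddings["banana"])
print(d_apple_straw < d_apple_banana)  # True: apple is closer to strawberry
```

Note that the answer depends entirely on which questions we asked; with a "how yellow is it?" variable instead, the distances would change.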

Chapter 2: Introduction to Image Embedding and Accuracy

Turning an image into numbers is necessary to use it for Machine Learning. Book photo by Jess Bailey on Unsplash.

It is now time to expand the concept of embedding from Chapter 1 to image embedding. Images are a popular choice in many applications that use Machine Learning. An image consists of small squares that each show a color. These colors create the whole image when put together. Each color can be represented with numbers and seen as variables.
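A minimal sketch of this idea, using a made-up 2×2 image where each pixel is three numbers (red, green, blue):

```python
import numpy as np

# A tiny 2x2 "image": each pixel holds three numbers (red, green, blue), 0-255.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # a blue pixel, a white pixel
], dtype=np.uint8)

# Embedding the image for a computer: flatten the pixel grid
# into one long vector of variables.
variables = image.flatten()
print(variables.shape)  # (12,) -> 2 x 2 pixels x 3 color numbers each
```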

Chapter 2 tries to make an application that can identify animals in images using the methods taught in Chapter 1. Accuracy is introduced to measure how well these methods can identify the correct animals. The results show that the methods from Chapter 1 are insufficient for a reliable application and that Latent Space Embedding can help us improve the accuracy.

Chapter 3: Introduction to Image Latent Space Embedding

Images are transformed by asking whether they look more like a dog or a chicken and plotted in a coordinate system. Dog photo by Valeria Boltneva on Pexel. Chicken photo by Erik Karits on Pexel.

Chapter 3 uses the strengths and weaknesses learned from chapter 2 and begins an exploration of Latent Space Embedding.

Using the methods from Chapter 1 on images is challenging since the same colors do not necessarily result in the same object; a dog and a chicken can both be brown but are not the same. Transformation can lessen these challenges using standard Latent Space Embedders such as Principal Component Analysis¹ (PCA), Linear Discriminant Analysis² (LDA), and neural network classifiers³. PCA and LDA help us visualize the images in 2D but do little to make the images easier to understand. Neural Network Classifiers, on the other hand, significantly contribute to how well a computer can understand an image.
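To make the PCA part concrete, here is a minimal from-scratch sketch that projects stand-in "image" vectors down to 2D via the singular value decomposition. The data is random and only illustrates the mechanics, not a real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for flattened images: 100 samples with 64 pixel variables each.
X = rng.normal(size=(100, 64))

# PCA by hand: center the data, then project onto the top-2 right
# singular vectors (the directions of greatest variance).
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T  # each image becomes a 2D point we can plot

print(X_2d.shape)  # (100, 2)
```

The first coordinate captures at least as much variance as the second, which is what makes a 2D scatter plot of `X_2d` a reasonable visualization.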

Neural Network Classifiers are imperfect and have their own challenges when used in practice. Simple adjustments are made to the Neural Networks to address these challenges. No adjustment comes without consequences; at the end of Chapter 3, the trade-offs and when to use which are discussed.

Chapter 4: Clustering in the Latent Space

The goal is to make clear separations between the different colored groups.

Chapter 4 explains how to improve the Latent Space Embedding to better differentiate between new animals.

Some of the challenges mentioned in Chapter 3 are caused by how Neural Network Classifiers are trained to learn the differences between a defined set of classes (e.g., animals). After learning the differences, we can lessen these challenges by asking the neural network how it came to its conclusions. It might base the decision on features such as the animal’s color, fur, leg position, teeth, etc. The idea behind autoencoders is to use these features to distinguish between new animals that are introduced in the future.

But why must we learn to recognize animals before asking what features are used? Why not just learn the best features to separate them from the get-go? Chapter 4 introduces modern approaches incorporating these ideas to more efficiently separate similar but different instances such as human faces.
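As a small taste of what clustering in a latent space looks like, here is a minimal k-means sketch on two made-up groups of 2D latent points. Both the data and the initialization are contrived to keep the example deterministic:

```python
import numpy as np

def kmeans(points, centers, steps=10):
    # Minimal k-means: repeatedly assign each point to its nearest center,
    # then move each center to the mean of the points assigned to it.
    centers = centers.copy()
    for _ in range(steps):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for i in range(len(centers)):
            if np.any(labels == i):
                centers[i] = points[labels == i].mean(axis=0)
    return labels, centers

# Two well-separated blobs standing in for two groups of animals
# in a 2D latent space.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Start with one center in each blob (the first and last point).
labels, centers = kmeans(points, centers=points[[0, -1]])
print(len(set(labels[:50])), len(set(labels[50:])))  # 1 1 -> one cluster per blob
```

The clearer the separation the embedding produces, the less this choice of initialization and algorithm matters.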

Chapter 5: Practical: Face-detection in the Latent Space

Image of famous people as an example of who your system may want to recognize. Source: Labeled Faces in the Wild.

It is now time to use our newfound knowledge in practice. In the first practical case, we build a system to identify which one of your friends is in an image. We will see how a Latent Space Embedding that focuses on classification can create a scalable system where you can easily add new friends and won’t need to worry about recognizing strangers as your friends.
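A rough sketch of that idea, with hand-written stand-in latent vectors: in a real system, these would come from a trained face-embedding network, and the distance threshold would be tuned rather than picked by hand:

```python
import numpy as np

# Hypothetical latent-space vectors for known friends (stand-ins for the
# output of a face-embedding network).
friends = {
    "alice": np.array([0.9, 0.1, 0.2]),
    "bob":   np.array([0.1, 0.8, 0.7]),
}
THRESHOLD = 0.5  # assumed cutoff: farther than this means "stranger"

def identify(embedding):
    # Find the closest known friend; reject the match if even the
    # closest one is too far away.
    name, best = min(
        ((n, float(np.linalg.norm(embedding - v))) for n, v in friends.items()),
        key=lambda t: t[1],
    )
    return name if best < THRESHOLD else "stranger"

print(identify(np.array([0.85, 0.15, 0.25])))  # alice
print(identify(np.array([0.5, 0.5, 0.5])))     # stranger
```

Adding a new friend is just adding one more entry to the dictionary, which is what makes this design scalable.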

Chapter 6: Introduction to Autoencoders

An Autoencoder automatically creates a Latent Space Embedding by training a Neural Network to recreate the input image with a decoder after having transformed the image to the Latent Space with an encoder. Dog photo by Valeria Boltneva on Pexels.

Until now, the focus has been on how to make a Latent Space Embedding by training a Neural Network with images and labels (e.g., animals or faces). However, is it possible to make the process easier and create a Latent Space Embedding without labels? This is where autoencoding comes in!

An autoencoder automates the labeling process by having the Neural Network recreate the input image with a decoder after having transformed the image to the Latent Space with an encoder. The idea is that different dimensions tend to represent different features as similar objects are closer to each other in the Latent Space. Dimensions representing features help the decoder recreate the original images as it creates them based on the features. An Autoencoder can thereby help create the Latent Space automatically.
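To illustrate the principle (not the networks used in the chapter itself), here is a minimal linear autoencoder in NumPy, trained with plain gradient descent on random stand-in data. The only point is that reconstruction improves without any labels:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))               # stand-in for flattened images
W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder: 8 variables -> 2D latent space
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder: latent space -> reconstruction

def loss(X):
    Z = X @ W_enc      # encode into the latent space
    X_hat = Z @ W_dec  # decode back into image space
    return float(((X - X_hat) ** 2).mean())

lr = 0.01
initial = loss(X)
for _ in range(500):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    err = X_hat - X                           # gradient of the squared error
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(loss(X) < initial)  # True: reconstruction improved, no labels needed
```

Real autoencoders use deep nonlinear encoders and decoders, but the training signal is exactly this: the input itself serves as the target.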

Chapter 7: Adapt the Latent Space to Similarity

A Latent Space Embedding created with a neural network classifier focuses on creating a clear separation between classes, making it easier to determine which class an image belongs to. The problem is that such embeddings are ineffective when it comes to finding similarities. Similar images from the same class are closer, but similar images across classes are down-prioritized as the Neural Network tries to separate each class.

Chapter 7 uses the autoencoder concept from Chapter 6 and adapts it to similarity by using regularization and social data to guide the process.

Chapter 8: Practical: Product Recommendation in the Latent Space

The image to the left is uploaded by the user, and the images to the right are recommendations from the system. All images are from Unsplash, from upper left to lower right: Ryan Christodoulou, Ryen Christodoulou, Alex Shu, Beazy, Ali Choubin, Ali Choubin, Ashkan ForouZani, Olena Sergienko, and Sven Brandsma.

In the second practical case, users can upload images of products they like and want to use as inspiration to find their next piece of furniture. The system looks for similar products and recommends them to the user. We will see how autoencoders that focus on similarity can create a scalable system where you can easily add new products.
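The core recommendation step can be sketched as a nearest-neighbor lookup in the latent space. The product names and vectors below are invented for illustration; a real system would get them from the trained autoencoder:

```python
import numpy as np

# Hypothetical latent vectors for catalog products (stand-ins for
# encoder outputs from a similarity-focused autoencoder).
catalog = {
    "oak_table":  np.array([0.9, 0.2]),
    "pine_table": np.array([0.8, 0.3]),
    "steel_lamp": np.array([0.1, 0.9]),
    "brass_lamp": np.array([0.2, 0.8]),
}

def recommend(query, k=2):
    # Rank products by distance to the uploaded image's latent vector
    # and return the k closest ones.
    ranked = sorted(catalog, key=lambda n: float(np.linalg.norm(catalog[n] - query)))
    return ranked[:k]

print(recommend(np.array([0.85, 0.25])))  # the two tables come out first
```

Adding a new product only requires encoding it and adding its vector to the catalog, which is what keeps the system scalable.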

All images and code, unless otherwise noted, are by the author.

Thank you for reading this book about the Latent Space! We learn best when sharing our thoughts, so please share a comment, whether it is a question, new insight, or maybe a disagreement. Any suggestions and improvements are greatly appreciated!

If you enjoyed this book and are interested in new insights into machine learning and data science, sign up for a Medium Membership for full access to my content. Follow me to receive an e-mail when I publish a new chapter or post.


A Machine Learning specialist with a passion for sharing knowledge and learning new things. Contact: mathias0909@gmail.com