Creating quality embeddings from your data is crucial for your AI system’s efficacy. This article shows you different approaches for converting data formats like images, text, and audio into powerful embeddings you can use for your machine learning tasks. Because the quality of your embeddings has a large impact on the performance of your AI system, learning how to craft good embeddings is an essential skill.

Introduction
The motivation for this article is that creating good embeddings from your data is essential to most AI systems, so it is something you will often have to do, and making better embeddings is a good way of improving all your future AI systems. Embeddings are used for tasks like clustering, similarity search, and anomaly detection, all of which can benefit massively from better embeddings. This article explores two main ways of obtaining embeddings: using an existing online model, or training your very own model. Both are discussed in subsequent sections of this article.

Table of contents
· Introduction · Table of contents · Motivation and use case · Create embeddings using PyTorch models · Create embeddings using HuggingFace models ∘ Approach 1 ∘ Approach 2 · Create embeddings using GitHub · Creating embeddings using paid models · Create your own embeddings ∘ Autoencoders ∘ Training your own model on a downstream task · Typical errors when creating embeddings ∘ Forget to use a pre-trained model ∘ License · Conclusion
Create embeddings using PyTorch models
One of the simpler approaches to creating embeddings is utilizing embedding models from the PyTorch model library. This library gives you easy access to an array of pre-trained models that are ready to use off the shelf, meaning you can apply the models to your own data without needing to train them.
If you want to get an embedding for an image, you can for example use the ResNet model introduced in 2015. Ensure you have PyTorch installed (installation instructions are on the PyTorch website), and then run the following code, adapted in part from the PyTorch model library tutorial.
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image as PILImage

# Load the input image
img = PILImage.open("image1.jpg")

# Initialize the weight transforms (the preprocessing pipeline matching the weights)
weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()

# Apply the preprocessing to the input image
img_transformed = preprocess(img)

# Initialize the pre-trained model
model = resnet50(weights=weights)

# Set model to eval mode
model.eval()

# Run the image through the model to get the output vector
embedding = model(img_transformed.unsqueeze(0)).detach().numpy()
Where image1.jpg can be any image you would like. I used this image from Unsplash as the image to feed into my model, which is the image you see below.

The result after running the code is an array of shape (1, 1000), which looks like this:

You can find other models from PyTorch on their model library site, where you can find models on downstream tasks like image classification, object detection, and video classification.
The advantage of the PyTorch model library is how easy it is to use. Assuming you are already using the PyTorch framework, many powerful models are available to you with one line of code. Though the pre-trained models will not necessarily generate the best embeddings, they can still achieve high performance on many machine learning tasks, and I have experienced older models such as ResNet still performing well on simple tasks like creating image embeddings. I would therefore recommend not underestimating the power of these models.

Create embeddings using HuggingFace models
If you want more advanced and up-to-date models, I recommend looking at the HuggingFace website. HuggingFace is a website where a lot of the latest machine-learning models are uploaded for everyone to use. On HuggingFace you can get different models for all sorts of downstream tasks. The main types of models you can get from HuggingFace are:
- Multimodal models
- Computer vision
- Natural language processing
- Audio
- Tabular
- Reinforcement learning
The model types are taken from the HuggingFace models website, where you can also look into further downstream tasks for each of the model types.
To extract embeddings from HuggingFace models, you should know how the models are typically uploaded. Authors typically upload two types of models.
- Base models. These are the embedding models, with no final layers trained to perform downstream tasks like sentence classification. Base models are typically listed under the headline Model, so if you want the RoBERTa base model, you would look for RobertaModel. An example base model is shown on the left in the image below.
- Fine-tuned models. These are the base models plus some final layers that are trained to perform downstream tasks. If you want to extract embeddings from these models, you have to take the embeddings from the model’s hidden states, ignoring the model’s final layers. There are typically several fine-tuned models, named with a For suffix describing the downstream task. So if you for example want RoBERTa for the sequence classification downstream task, you would look for RobertaForSequenceClassification. An example fine-tuned model is shown on the right in the image below.

In my experience working with HuggingFace models, both types can work, and I therefore recommend testing both. It might sound like the base models are the definitive right pick if you are only after embeddings, but sometimes the embeddings become better after the final layers have been fine-tuned, especially if the fine-tuning task is relevant to what you are using your embeddings for. Testing both types of models is therefore recommended, and considering how easy HuggingFace models are to use, this typically does not require a lot of effort. Spending that extra time testing both models can be a valuable use of your time.
As an example, if I want to get a text embedding, I would do the following:
Approach 1
- Go to the HuggingFace Transformers website
- Find the Text models on the left side (if you scroll down), and press one of the models
- After you have pressed your downstream task of choice, I would sort by Most Downloads or Trending in the top right
This will give you a full overview of the model you chose. You can also find some models with the other approach below, but note that you will typically see less information about each model, and you will see more models from community members (not necessarily the original authors of the model).
Approach 2
- Go to the HuggingFace Models website
- Find the Text models on the left side (if you scroll down), and press a fitting downstream task. For text embeddings, I would for example try the tasks Text Classification and Sentence Similarity
- After you have pressed your downstream task of choice, I would sort by Most Downloads or Trending in the top right
As an example, you can create a text embedding for the sentence "Embedding using the Roberta model" with the base RoBERTa model using the following code:
# embedding with base roberta model
from transformers import AutoTokenizer, RobertaModel
import torch

sentence = "Embedding using the Roberta model"

# Load the tokenizer and the base model (keeping its pooling layer)
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base", add_pooling_layer=True)

# Tokenize the sentence and run it through the model
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Use the pooled output as the sentence embedding
embedding = outputs.pooler_output
Which outputs a tensor of shape (1×768).
If you want to create a text embedding with a Roberta model fine-tuned for sequence classification, you can use the following code:
# embedding from fine-tuned roberta model for sequence classification
import torch
from transformers import AutoTokenizer, RobertaForSequenceClassification

sentence = "Embedding using the Roberta model"

# Load the tokenizer and fine-tuned model, asking it to return its hidden states
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion", output_hidden_states=True)

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Grab the last hidden layer and apply mean pooling over the tokens to get the embedding
embedding = torch.mean(outputs["hidden_states"][-1], dim=1)
Which also outputs a tensor of shape (1×768).
You can read more about the Roberta model on the HuggingFace website.
HuggingFace is a useful website since authors of research papers often publish their models there, which is important for further research, both to verify the results reported in a paper and to allow new research to build on top of it. HuggingFace is therefore a major contributor to the development of AI research, and a website you can take advantage of yourself to find the newest models available for public use.
Another useful trait of HuggingFace is that it is kept up to date with the newest models. This is the biggest difference I have noticed between the PyTorch model library and the HuggingFace models: HuggingFace is updated more regularly with the latest machine learning models that are publicly available.
Create embeddings using GitHub
Another approach to finding models that can create good embeddings for your AI system is using GitHub. Many authors upload both the code to train their models and a link to download the model weights on GitHub. If model weights are available from GitHub, they are often also on HuggingFace, in which case you can naturally use the approach above, but sometimes this will unfortunately not be the case. Using models from GitHub often requires more effort than using models from HuggingFace, but if you find a particularly useful model on GitHub, it can be worth the time investment. A typical pattern is shown in the sketch below.
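To give an idea of the extra work involved, here is a hypothetical sketch of loading a checkpoint downloaded from a GitHub repository. The module their_repo.model, the class TheirModel, and the file model_weights.pth are placeholders for whatever the repository actually provides.
import torch

# Hypothetical example: the repository defines a model class and releases a checkpoint file
from their_repo.model import TheirModel  # placeholder import from the cloned repository

model = TheirModel()

# Load the downloaded weights into the model architecture
state_dict = torch.load("model_weights.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()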
Finding good models on GitHub can be difficult, however. A common way of finding a model is simply Googling the task you are looking for, and a GitHub repository may appear in the search results. Another approach is to search a scientific article database like Google Scholar or arXiv to find papers with good models. These papers will often include a GitHub repository with their code and model weights.
However, the best way to find good models on GitHub is using Meta’s [PapersWithCode](https://paperswithcode.com/) website. PapersWithCode highlights scientific research that includes code together with the published article. On PapersWithCode you can therefore find a good model for your specific use case, and then find the code for that model. I also recommend choosing the Browse State-of-the-Art tab to see the currently best models for the task of your choosing. Using this approach you can find some particularly good models that you can use to create embeddings for your dataset. The pipeline I would use for creating embeddings with models from PapersWithCode is:

Creating embeddings using paid models
If you want the best possible embeddings, using a paid online API is often the way to go. Several companies offer this type of API, like Microsoft Azure, Google Cloud, or OpenAI. The cost of these embeddings will naturally depend on which model you use and the quantity of data you want to embed. If you are only working on a hobby project and make sure you do not overuse the embedding API, I would argue the cost can be quite reasonable. As an example, OpenAI offers a text embedding API with the text-embedding-3-small model, which gives you around 37,500,000 words per dollar! (This is calculated using the pricing from the OpenAI website: 62,500 pages per dollar, assuming 800 tokens per page, which is 800 * 0.75 = 600 words per page, giving 62,500 * 600 = 37,500,000 words per dollar.) Quite cheap, unless you are embedding enormous amounts of text.
There are several advantages to using the paid API embedding services:
- They are simple to use. You obtain an API key, use the example code given on the website of your chosen company, and you have your embeddings (see the sketch after this list)
- The embedding models here are specially trained to give you meaningful embeddings, in contrast to many other models, which are trained to perform downstream tasks. This typically gives a higher-quality embedding
- You will get access to state-of-the-art models, which are typically not available for free on sites like HuggingFace. This means the quality of the embeddings from the paid APIs will typically be as good as you can get.
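As a minimal sketch of what this looks like in practice, assuming you have the openai Python package installed and an API key available in the OPENAI_API_KEY environment variable, embedding a sentence with text-embedding-3-small can look like this:
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Request an embedding for a single sentence
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Embedding using a paid API",
)

# A list of floats (1536 dimensions for this model)
embedding = response.data[0].embedding
print(len(embedding))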
I also want to give a warning about using the paid APIs. First of all, it is easy to lose track of your API usage, which can incur a large cost if not kept under control. I therefore recommend being very aware of when you are using the API, as it costs money every time you use it. Additionally, it can be difficult to fully grasp the pricing model: you typically do not know exactly how many words or tokens you are sending to the API, which can make the cost different from what you expect. Due to this, I recommend calculating the cost of your API usage in advance, to ensure you do not encounter any unexpected costs. Lastly, always keep your API key safe and make sure it does not end up in the wrong hands, which could incur high costs.
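One way to estimate the cost up front is to count the tokens locally before calling the API. The sketch below uses OpenAI's open-source tiktoken library; the price per million tokens is only an illustrative assumption, so check the current pricing page before relying on it.
import tiktoken

# Assumed price for text-embedding-3-small, for illustration only; verify against the pricing page
PRICE_PER_MILLION_TOKENS = 0.02

texts = ["First document to embed", "Second document to embed"]

# cl100k_base is the encoding used by recent OpenAI models
encoding = tiktoken.get_encoding("cl100k_base")
total_tokens = sum(len(encoding.encode(text)) for text in texts)

print(f"Total tokens: {total_tokens}")
print(f"Estimated cost: ${total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS:.6f}")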
Create your own embeddings
The last method for creating embeddings I will cover is creating your own embeddings from scratch. This approach can be difficult, and you will likely not be able to replicate the embedding quality of the previous methods, due to the large amount of computing power required to create a good embedding model. Still, making your own embeddings can be an important learning experience for fully understanding how embeddings work and what they can be used for, which is why I also recommend it as an approach. Below, I showcase two approaches to creating your own embeddings.
Autoencoders
Autoencoders are models that take an input, for example an image, downsize it to a lower dimension, and then try to recreate the input data from the lower dimension. GeeksForGeeks has created a quality article on how this works and how you can create your own autoencoder, which I recommend reading. To train your own autoencoder, you first need a lot of images, for example the MNIST dataset, which you can download here. You then train the model to downsize each image to a lower dimension with linear layers (the encoder) and to recreate the same image with another set of linear layers (the decoder). After training, you use only the encoder, and its output serves as the embedding for your image. A minimal sketch is shown below.
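As a minimal sketch, assuming flattened 28×28 MNIST images as input and an arbitrary embedding size of 32, a simple autoencoder in PyTorch could look like this:
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self, embedding_dim=32):
        super().__init__()
        # Encoder: compress a flattened 28x28 image down to the embedding dimension
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )
        # Decoder: try to reconstruct the original image from the embedding
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Used during training: encode and then reconstruct the input
        return self.decoder(self.encoder(x))

    def get_embedding(self, x):
        # Used after training: only the encoder output is kept as the embedding
        with torch.no_grad():
            return self.encoder(x)
You would train this model with a reconstruction loss such as nn.MSELoss between the input image and the output of forward, and afterwards call get_embedding to obtain a 32-dimensional embedding for each image.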

Training your own model on a downstream task
Another approach to creating your own embeddings is training a model on a downstream task, for example image classification. Again, you can download the MNIST dataset and build an image classifier network, which you can learn about in this PyTorch tutorial. After training is completed, you ignore the final classification layers in your image classification model to obtain the embedding.
To understand this better, I will show you an example with Python code. First, define a model architecture, like in the example below where I take in a (28×28) image and output a prediction out of 10 classes, similar to the MNIST problem.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def predict(self, x):
        # return class that is predicted
        return torch.argmax(self.forward(x), dim=1)

    def get_embedding(self, x):
        with torch.no_grad():  # we don't need gradients for the embedding
            x = F.relu(self.conv1(x))
            x = self.pool(x)
            x = F.relu(self.conv2(x))
            x = self.pool(x)
            x = x.view(-1, 64 * 7 * 7)
            x = F.relu(self.fc1(x))
        return x
Here you can see how a fine-tuned model can be used to retrieve an embedding. To get a class prediction from the model, you use the predict method, which outputs a class index between 0 and 9. If you only want the embedding for the image, however, you can use the get_embedding method, which, similarly to the forward method (used for training the model), uses most of the model layers, but ignores the final fully connected layer self.fc2. So instead of outputting a class index between 0 and 9, get_embedding outputs a vector of shape (1×128), which can be used as an embedding for the input image.
To use the model defined above, you would therefore first train it, for example following this PyTorch tutorial, using the forward method. After training, you can use the get_embedding method to get meaningful embeddings for your images. A compressed training sketch is shown below.
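Below is a compressed sketch of that workflow, assuming the SimpleCNN class above and a single training pass over MNIST; a real training run would use multiple epochs and track validation accuracy.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Load MNIST and wrap it in a DataLoader
train_data = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=64, shuffle=True)

model = SimpleCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# One pass over the training data on the classification task
model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# After training, retrieve embeddings of shape (batch_size, 128) for a batch of images
model.eval()
embeddings = model.get_embedding(images)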
Typical errors when creating embeddings
I also want to mention some typical errors you can make when creating embeddings. These are errors I have made myself while working on creating embeddings for my dataset.
Forget to use a pre-trained model
This one might sound obvious, but when using PyTorch or HuggingFace models, for example, it is easy to make the mistake of retrieving a non-pretrained model, which is essentially just a model architecture with a set of randomly initialized weights. Therefore, every time you get hold of a model, ensure it is a pre-trained version of the model. The difference is illustrated below.
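With torchvision, for example, the difference is just the weights argument; only the second call below downloads the pre-trained weights:
from torchvision.models import resnet50, ResNet50_Weights

# Only the architecture: weights are randomly initialized, so embeddings will be meaningless
random_model = resnet50(weights=None)

# The pre-trained version you normally want for creating embeddings
pretrained_model = resnet50(weights=ResNet50_Weights.DEFAULT)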
License
If you are using the models for a non-commercial case, this is rarely an issue, and publicly available models will most of the time have at least a personal or academic license. If you are working for a company, however, or want to use the embeddings commercially, you have to be aware of the license of the model you are using. Models with a non-commercial license cannot be used commercially in any way, so make sure to check the license of a model before using it. Look for licenses like the MIT license or the Apache 2.0 license, which allow commercial use.
Conclusion
In this article, you have learned about different approaches you can use to get embeddings for your data. Embeddings are useful for a wide range of AI models, and they can be made from all data types like images, text, and audio. The different approaches I have mentioned for calculating embeddings are:
- PyTorch models
- HuggingFace models
- GitHub models
- Paid APIs like Google, Microsoft, or OpenAI
- Creating your own embeddings
Any of these approaches can be used to convert your data into embeddings, which can then be used for tasks like clustering, similarity search, or classification.
If you want to continue learning more about embeddings, you can read my Towards Data Science articles on understanding embedding and graph quality below:
How To Improve AI Performance By Understanding Embedding Quality
How to Test Graph Quality to Improve Graph Machine Learning Performance
You can also read my articles on WordPress.