In my previous articles, I wrote about using Knowledge Graphs in conjunction with RAG and about using graph techniques for adaptive tokenization to build more context-aware LLMs. In this article, I am excited to present my experiments combining Text Embeddings and Knowledge (Graph) Embeddings, along with my observations on RAG performance. I will start by explaining text embeddings and knowledge embeddings independently, using simple open frameworks, and then show how to use both in RAG applications. This is a fairly long article, and I deliberately did not split it into multiple parts, so that you have the full flow and can try everything in the sequence shared below.

I will take a deep dive and cover my work in four parts as listed below.
- Part 1: What are Text Embeddings (TE)? & How are they stored and used in the RAG implementation?
- Part 2: What are Knowledge (Graph) Embeddings (KGE) & How are they stored?
- Part 3: How are Knowledge (Graph) Embeddings different from Text Embeddings, and are they complementary when used in RAG?
- Conclusion: Benefits of combining text and knowledge embeddings in RAGs and overall summary
Part 1: Text Embeddings and RAG Implementation
If you tried venturing into Natural Language Processing (NLP), you might have encountered the term "text embeddings" while exploring language models and machine learning. So, what exactly are text embeddings, and how do they work? Allow me to explain it in a more accessible manner.
Text Embeddings Introduction
Text embeddings are numerical representations of words or phrases that effectively capture their meaning and context. Consider them as unique identifiers for words – concise vectors that capture the meaning of the words they represent. These embeddings allow computers to enhance their understanding and processing of text, enabling them to excel in various NLP tasks such as text classification, sentiment analysis, and machine translation.
We utilize pre-trained models such as Word2Vec, GloVe, or BERT to generate text embeddings. These models have been extensively trained on large volumes of text data and have acquired the ability to encode semantic information about words and their relationships.
- Tokenize: decide the algorithm we’ll use to generate the tokens (word-based, which I chose below; character-based; sub-word; etc.). Refer to my article on Advanced Tokenization using Graph Techniques. A short sketch contrasting word-based and sub-word tokenization follows this list.
- Encode the tokens into numerical representations (e.g., via schemes such as Byte-Pair Encoding) that can then be mapped to vectors.
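To make the distinction concrete, here is a minimal sketch (assuming the Hugging Face transformers package is installed) contrasting a simple word-based split with BERT's sub-word (WordPiece) tokenization; the exact sub-word pieces shown in the comment are indicative only:
from transformers import BertTokenizer

text = "Text embeddings are fascinating"

# Word-based tokenization: split on whitespace
word_tokens = text.split()
print(word_tokens)  # ['Text', 'embeddings', 'are', 'fascinating']

# Sub-word tokenization: BERT's WordPiece splits rare words into smaller pieces
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
subword_tokens = tokenizer.tokenize(text)
print(subword_tokens)  # e.g. ['text', 'em', '##bed', '##ding', '##s', 'are', 'fascinating']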
Now, let’s explore the process of generating text embeddings using a straightforward Python code snippet (word2vec):
# Code Implementation: Generating Text Embeddings
import numpy as np
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ["I", "love", "natural", "language", "processing"],
    ["Text", "embeddings", "are", "fascinating"],
    ["NLP", "makes", "computers", "understand", "language"]
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=5, window=5, min_count=1, sg=1)

# Get embeddings for words
word_embeddings = {}
for word in model.wv.index_to_key:
    word_embeddings[word] = model.wv[word]

# Print embeddings
for word, embedding in word_embeddings.items():
    print(f"Embedding for '{word}': {embedding}")
In this code, we develop a Word2Vec model by training it on a set of sample sentences. The model then generates embeddings for each word. These embeddings capture the semantic relationships between words in the sentences. The output of the code snippet for generating text embeddings using Word2Vec is as follows:
Embedding for 'I': [-0.01978252 0.02348454 -0.0405227 -0.01806103 0.00496107]
Embedding for 'love': [ 0.01147135 -0.00716509 -0.02319919 0.03274594 -0.00713439]
Embedding for 'natural': [ 0.03319094 0.02570618 0.02645341 -0.00284445 -0.01343429]
Embedding for 'language': [-0.01165106 -0.02851446 -0.01676577 -0.01542572 -0.02357706]
Embedding for 'processing': [-0.00205235 0.01240269 -0.03660049 -0.0240654 -0.03612582]
Embedding for 'Text': [ 0.02553515 0.03493764 0.01932768 -0.02028572 0.02185934]
Embedding for 'embeddings': [ 0.01769094 0.02836292 -0.02720205 -0.01580047 -0.0323391 ]
Embedding for 'are': [ 0.01449668 0.0178032 0.02154245 -0.02403528 -0.03612522]
Embedding for 'fascinating': [ 0.0389471 0.00991404 0.0198368 -0.02466156 -0.03446501]
Embedding for 'NLP': [ 0.00828243 -0.02125732 0.01082581 0.02250805 0.02270168]
Embedding for 'makes': [ 0.01696491 0.0151721 -0.02186292 -0.01510419 -0.02021307]
Embedding for 'computers': [ 0.00983663 -0.02679762 0.03002482 -0.02373362 -0.01307139]
Embedding for 'understand': [-0.0326019 0.01835899 0.01538064 -0.01008516 0.01717436]
In the above output:
- Every line corresponds to the embedding vector of a word.
- Each line begins with the word, followed by its embedding vector represented as a list of numerical values.
- For example, the embedding for the word "love" is [ 0.01147135 -0.00716509 -0.02319919 0.03274594 -0.00713439].
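As a quick follow-up, the trained model can also be queried for nearest neighbours in the embedding space. A small sketch reusing the model object from above (on such a tiny toy corpus the results are essentially noise, but the API is the same for real corpora):
# Words closest to "language" in the learned embedding space
print(model.wv.most_similar("language", topn=3))

# Cosine similarity between two specific words
print(model.wv.similarity("language", "processing"))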
Implementing Text Embeddings in RAG
Now that we have a grasp of how text embeddings are generated, let’s explore their application in Retrieval-Augmented Generation (RAG). Combining retrieval-based and generative approaches, RAG uses text embeddings to grasp the context of the input query and retrieve pertinent information during the retrieval stage.

Step 1: Tokenization and Encoding
Let’s try utilizing a pre-trained model such as BERT now, to tokenize and encode the input query. This transforms the query into a numerical representation that captures its semantic meaning and context.
# Code Implementation: Tokenization and Encoding
from transformers import BertTokenizer, BertModel

# Initialize BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Tokenize and encode the input query (returns a dictionary of tensors)
query = "What is the capital of France?"
encoded = tokenizer(query, add_special_tokens=True, return_tensors="pt")
input_ids = encoded["input_ids"]
Here, we use the BERT tokenizer to tokenize and encode the input query into numerical IDs (initializing the model and tokenizer involves downloading a large pre-trained checkpoint the first time). The output of the tokenization and encoding step includes the following components:
- Input IDs: These are numerical representations of the tokens in the input query. Each token is converted into an ID that corresponds to its index in the BERT vocabulary.
- Attention Mask: This is a binary mask indicating which tokens are actual words (1) and which are padding tokens (0). It ensures that the model only attends to real tokens during processing.
- Token Type IDs (for models like BERT): This indicates which segment or sentence each token belongs to in case of multiple segments. For single-sentence inputs, all token-type IDs are typically set to 0.
The output is a dictionary containing these components, which can be used as input to the BERT model for further processing.
Here’s an example of what the output will include:
{
'input_ids': tensor([[ 101, 2054, 2003, 1996, 3007, 1997, 2605, 1029, 102, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
}
In this example:
- input_ids contains the numerical IDs of the tokens in the input query.
- attention_mask indicates which tokens are actual words (1) and which are padding tokens (0).
- token_type_ids indicates the segment or sentence to which each token belongs (0 for the first sentence in this case).
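To sanity-check the encoding, the IDs can be mapped back to their tokens. A quick sketch reusing the tokenizer and input_ids from above:
# Convert the numerical IDs back to their (sub-)word tokens for inspection
tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
print(tokens)
# e.g. ['[CLS]', 'what', 'is', 'the', 'capital', 'of', 'france', '?', '[SEP]']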
Step 2: Retrieval and Similarity Matching
Next, we retrieve relevant passages from a corpus based on the encoded query. We compute similarity scores between the query embedding and embeddings of passages using cosine similarity.
# Code Implementation: Retrieval and Similarity Matching
from sklearn.metrics.pairwise import cosine_similarity
# Retrieve passages and compute similarity scores
query_embedding = model(input_ids)[0].mean(dim=1).detach().numpy()
passage_embeddings = ... # Retrieve passage embeddings
similarity_scores = cosine_similarity(query_embedding, passage_embeddings)
Here, we obtain the query embedding by mean-pooling BERT’s token embeddings and compute cosine similarity scores between it and the passage embeddings.
Since the retrieval and similarity matching step boils down to computing these scores, let’s look at an example of what they might look like. Assume we have three sample passages and compute cosine similarity scores between the query embedding and the embeddings of these passages. Here’s an example of what the output might look like:
similarity_scores = [0.75, 0.82, 0.65]
In this example:
- The similarity score of 0.75 indicates a moderate similarity between the query and the first passage.
- The similarity score of 0.82 indicates a high similarity between the query and the second passage.
- The similarity score of 0.65 indicates a lower similarity between the query and the third passage.
The similarity scores indicate the level of similarity between each passage and the input query, with higher scores indicating a stronger resemblance. In the RAG model, the passage that receives the highest similarity score is deemed the most relevant for further processing.
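The passage embeddings themselves were left as a placeholder in the snippet above. One possible way to produce them is to mean-pool BERT token embeddings for each passage, reusing the model and tokenizer from Step 1; a sketch with a small, hypothetical in-memory list of passages:
import numpy as np
import torch

# Hypothetical passages to score against the query
passages = [
    "Paris is the capital of France.",
    "Barack Obama served as the 44th president of the United States.",
    "The Louvre Museum is located in Paris, France.",
]

# Mean-pool BERT token embeddings for each passage (same pooling as the query)
with torch.no_grad():
    passage_embeddings = np.vstack([
        model(tokenizer(p, return_tensors="pt")["input_ids"])[0].mean(dim=1).numpy()
        for p in passages
    ])

print(passage_embeddings.shape)  # (number of passages, hidden size), e.g. (3, 768)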
Step 3: Augmenting Context
Finally, we designate the passage with the highest similarity score as the most relevant one. This passage supplies the supporting context for the generative phase of the model.
# Select passage with highest similarity score
max_similarity_index = np.argmax(similarity_scores)
selected_passage = passages[max_similarity_index]
# Output selected passage and similarity score
print("Selected Passage:")
print(selected_passage)
print("Similarity Score:", similarity_scores[0][max_similarity_index])
Let’s assume that the second passage has the highest similarity score (0.82) and is selected as the most relevant passage. Here’s an example of what the output might look like:
Selected Passage:
"Barack Obama served as the 44th president of the United States."
Similarity Score: 0.82
In this example:
- The selected passage is "Barack Obama served as the 44th president of the United States."
- The similarity score is 0.82, indicating a high similarity between this passage and the input query.
Text embeddings are incredibly powerful tools in the field of Natural Language Processing (NLP), allowing computers to effectively comprehend and process textual information. They have a significant impact on a wide range of tasks, such as answering questions, generating text, and analyzing sentiment. Through the utilization of text embeddings in advanced models such as RAG, we can significantly improve their performance and precision, resulting in responses that are more astute and contextually appropriate. As you progress in your exploration of NLP, gaining expertise in text embeddings will be crucial for unleashing the complete capabilities of language models and propelling the advancements in natural language processing.
Part 2: Knowledge (Graph) Embeddings and Implementation
Let us now switch gears to the world of Knowledge Graphs and look at defining and implementing Knowledge Embeddings, which capture structured domain constructs derived from unstructured data.
Introduction to Knowledge (Graph) & Embeddings
Knowledge Graphs are a highly effective way to organize information, connecting entities and their relationships in a meaningful way. These graphs function as a well-organized storehouse of information, capturing the meaning of real-world objects and their connections. Nevertheless, the process doesn’t conclude with the development of Knowledge Graphs. Exploring the realm of Knowledge Graph Embeddings is essential for unlocking their full potential.
Let us explore Knowledge Graph Embeddings, or simply Knowledge Embeddings, using a simple text with diverse entities and their interrelations, going from extracting the data to generating embeddings that capture the meaning of the text in compact vectors. The goal of acquiring these embeddings is to facilitate the manipulation of graph elements (entities, relations) for various prediction tasks such as entity classification, link prediction, or recommender systems.
We maintain the fundamental structure of the KG while making its components easier to work with. After representing KG elements as embeddings, a scoring function is employed to assess the plausibility of a triple such as ('Tim', 'is an', 'Artist').
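For intuition on how such a scoring function works: TransE, the model used later in this article, treats a relation as a translation in vector space, so a triple (head, relation, tail) is plausible when head + relation lands close to tail. A minimal numpy sketch with made-up 3-dimensional vectors, purely for illustration:
import numpy as np

# TransE-style score: the smaller the distance ||head + relation - tail||,
# the more plausible the triple (negated so that higher = more plausible)
def transe_score(head, relation, tail):
    return -np.linalg.norm(head + relation - tail)

# Made-up embeddings for the triple ('Tim', 'is an', 'Artist')
tim = np.array([0.1, 0.3, -0.2])
is_an = np.array([0.2, -0.1, 0.4])
artist = np.array([0.3, 0.2, 0.2])
print(transe_score(tim, is_an, artist))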
The following are the steps to implement Knowledge (Graph) Embeddings:

Step 1: Triple Extraction & Processing
Given an unstructured text, we will first extract the key entities, relationships, and attributes using Stanford’s OpenIE framework. Once the triples are extracted, we clean and harmonize them.
For example, consider the input text: "Hawaii is a state in the United States. Barack Obama served as the 44th president of the United States. The Louvre Museum is located in Paris, France."
from openie import StanfordOpenIE

text = "Hawaii is a state in the United States. Barack Obama served as the 44th president of the United States. The Louvre Museum is located in Paris, France."

with StanfordOpenIE() as client:
    # client.annotate returns a list of dicts with 'subject', 'relation', 'object' keys
    triples = client.annotate(text)
    for triple in triples:
        print((triple['subject'], triple['relation'], triple['object']))

# Lowercase the extracted triples to harmonize them
cleaned_triples = [(t['subject'].lower(), t['relation'].lower(), t['object'].lower()) for t in triples]
print("Cleaned Triples:", cleaned_triples)
The output of the above code is
('Hawaii', 'is', 'a state in the United States')
('Barack Obama', 'served as', 'the 44th president of the United States')
('The Louvre Museum', 'is located in', 'Paris, France')
Cleaned Triples: [('hawaii', 'is', 'a state in the united states'), ('barack obama', 'served as', 'the 44th president of the united states'), ('the louvre museum', 'is located in', 'paris, france')]
We are now ready to use this information to create a Knowledge Graph.
Step 2: Knowledge Graph Construction
Using the NetworkX framework, we will now take the cleaned triples above and construct a knowledge graph, with entities as nodes and relationships as edges. Here is the implementation:
import networkx as nx
import matplotlib.pyplot as plt

# Create a directed graph
knowledge_graph = nx.DiGraph()

# Add nodes and edges from the cleaned triples
for (subj, rel, obj) in cleaned_triples:
    knowledge_graph.add_edge(subj, obj, relation=rel)

# Visualize the knowledge graph
nx.draw(knowledge_graph, with_labels=True)
plt.show()
The output is a graph as seen below:

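Besides the drawing, the constructed graph can be inspected programmatically; a small sketch reusing the knowledge_graph object built above:
# List the nodes and print each edge with its 'relation' attribute
print("Nodes:", list(knowledge_graph.nodes))
for subj, obj, data in knowledge_graph.edges(data=True):
    print(f"({subj}) -[{data['relation']}]-> ({obj})")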
Step 3: Entity Resolution & Attribution
Entity resolution plays a pivotal role in various NLP applications, including information extraction, question answering, knowledge graph construction, and more. By accurately linking mentions of entities in text to their corresponding entities in structured knowledge representations, entity resolution enables machines to understand and reason with natural language more effectively, facilitating a wide range of downstream tasks and applications.
Fundamentally, entity resolution addresses the challenge of ambiguity and variability in natural language. In everyday language usage, entities such as people, locations, organizations, and concepts are often referred to using different names, synonyms, abbreviations, or variations. For example, "Barack Obama" might be mentioned as "Obama," "the former U.S. president," or simply "he." Additionally, entities with similar names or attributes may exist, leading to potential confusion or ambiguity. For instance, "Paris" could refer to the capital city of France or a different location with the same name.
Let’s go through each entity resolution technique and provide examples based on the text we used (a short code sketch of the simpler matching techniques follows this list):
- Exact Matching: In the text, the mention "Hawaii" can be directly linked to the node labeled "Hawaii" in the graph since they match exactly.
- Partial Matching: If the text mentions "USA" instead of "United States," a partial matching algorithm might recognize the similarity between the two and link the mention to the node labeled "United States" in the graph.
- Named Entity Recognition (NER): Using NER, the system might identify "Barack Obama" as a person entity mentioned in the text. This mention can then be linked to the corresponding node labeled "Barack Obama" in the graph.
- Coreference Resolution: If the text mentions "He served as president," coreference resolution could link "He" back to "Barack Obama," mentioned earlier in the text, and then link it to the corresponding node labeled "Barack Obama" in the graph.
- Disambiguation: Suppose the text mentions "Paris" without specifying whether it refers to the city in France or another location. Disambiguation techniques might consider contextual information or external knowledge sources to determine that it refers to "Paris, France" and link it to the corresponding node in the graph.
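As mentioned above, here is a small sketch of the two simplest techniques, exact and fuzzy (partial) matching, linking surface mentions to nodes of the knowledge_graph built earlier; the mentions list is hypothetical, and real systems would layer NER, coreference resolution, and disambiguation models on top:
import difflib

# Hypothetical surface mentions to be linked to graph nodes
mentions = ["hawaii", "obama", "the louvre"]
graph_nodes = list(knowledge_graph.nodes)

for mention in mentions:
    if mention in graph_nodes:
        # Exact matching: the mention is already a node label
        print(f"'{mention}' -> '{mention}' (exact match)")
    else:
        # Partial / fuzzy matching as a fallback
        candidates = difflib.get_close_matches(mention, graph_nodes, n=1, cutoff=0.4)
        match = candidates[0] if candidates else "no match"
        print(f"'{mention}' -> '{match}' (fuzzy match)")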
Once the correct entity links are determined, the mentions in the text are linked to their corresponding entities in the knowledge base or knowledge graph. The performance of the entity resolution system is evaluated using metrics such as precision, recall, and F1-score, comparing the predicted entity links to a ground truth or gold standard. A sample entity resolution is given below for the above-constructed graph. The grey circles indicate the class type resolution for a given entity.

Step 4: Knowledge (Graph) Embedding Generation
As a final step, we will now generate the embeddings for the entities and relationships. There are several models, but for this exercise, we will use TransE.
from pykeen.pipeline import pipeline

# Train a TransE model; the built-in "nations" dataset is used here for illustration
# (your own triples can be supplied via PyKEEN's TriplesFactory)
result = pipeline(
    dataset="nations",
    model="TransE",
    model_kwargs=dict(embedding_dim=50),
    training_kwargs=dict(num_epochs=100),
    optimizer_kwargs=dict(lr=0.01),
    random_seed=1234,
)

# Access trained embeddings (recent PyKEEN versions expose them via
# entity_representations / relation_representations; older versions use
# model.entity_embeddings / model.relation_embeddings)
entity_embeddings = result.model.entity_representations[0](indices=None).detach().cpu().numpy()
relation_embeddings = result.model.relation_representations[0](indices=None).detach().cpu().numpy()

# Print embeddings
print("Entity Embeddings:", entity_embeddings)
print("Relation Embeddings:", relation_embeddings)
Here is the output:
Entity Embeddings: [Embedding dimension: (120, 50)]
Relation Embeddings: [Embedding dimension: (120, 50)]
The vectors for the Knowledge Embeddings look as below:
Entity Embeddings:
- "Hawaii": [0.1, -0.2, 0.5, …]
- "United States": [0.3, 0.4, -0.1, …]
- "Barack Obama": [-0.2, 0.6, 0.2, …]
- "Louvre Museum": [-0.5, 0.1, 0.7, …]
Relation Embeddings:
- "is a state in": [0.2, -0.3, 0.1, …]
- "served as": [0.4, 0.2, -0.5, …]
- "is located in": [-0.1, 0.5, 0.3, …]
Part 3: Combining Text and Knowledge (Graph) Embeddings
Before we try to combine the embeddings, let us first understand the value each type brings and validate whether they are complementary at all.
Text embeddings and Knowledge embeddings serve distinct purposes in natural language processing (NLP) and represent different aspects of linguistic and semantic information. Let’s compare and contrast these two types of embeddings based on the provided outputs:

Implementation: Combining Text and Knowledge Embeddings
This code integrates text embeddings and knowledge embeddings by combining them into a single embedding space and then retrieves relevant passages from a knowledge base based on cosine similarity between the combined embeddings of the query and passages. The output shows the relevant passages along with their similarity scores to the query.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sample knowledge embeddings (populate with the knowledge-embedding output from Part 2)
knowledge_embeddings = {}

# Sample text embeddings (populate with the text-embedding output from Part 1)
text_embeddings = {}

# Consider passages from the knowledge base
knowledge_base = {
    "Passage 1": "Hawaii is a state in the United States.",
    "Passage 2": "Barack Obama served as the 44th president of the United States.",
    # Add more passages as needed
}

# Function to combine text and knowledge embeddings by concatenation
def combine_embeddings(text_emb, know_emb):
    combined_emb = {}
    for entity, t_emb in text_emb.items():
        if entity in know_emb:
            combined_emb[entity] = np.concatenate([t_emb, know_emb[entity]])
        else:
            combined_emb[entity] = t_emb
    return combined_emb

# Function to retrieve relevant passages using combined embeddings
def retrieve_passages(query_emb, knowledge_base_emb):
    similarities = {}
    for passage, kb_emb in knowledge_base_emb.items():
        sim = cosine_similarity([query_emb], [kb_emb])[0][0]
        similarities[passage] = sim
    sorted_passages = sorted(similarities.items(), key=lambda x: x[1], reverse=True)
    return sorted_passages

# Example usage
combined_embeddings = combine_embeddings(text_embeddings, knowledge_embeddings)
passage_embeddings = {}  # combined embeddings for each passage in knowledge_base, built the same way
query = "query"          # placeholder key for the query's combined embedding
relevant_passages = retrieve_passages(combined_embeddings[query], passage_embeddings)

# Print relevant passages
for passage, similarity in relevant_passages:
    print("Passage:", passage)
    print("Similarity:", similarity)
    print()
The output with similarity is as below:
Passage: Passage 1
Similarity: 0.946943628930774
Passage: Passage 2
Similarity: 0.9397945401928656
We can implement other similarity measures and choose the one that works best for the business context.
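For instance, a Euclidean-distance-based scorer can be swapped in for cosine similarity with minimal changes; a hypothetical drop-in replacement for the scoring line inside retrieve_passages:
from sklearn.metrics.pairwise import euclidean_distances

# Convert Euclidean distance into a similarity (smaller distance => larger similarity)
def euclidean_similarity(query_emb, kb_emb):
    dist = euclidean_distances([query_emb], [kb_emb])[0][0]
    return 1.0 / (1.0 + dist)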
Conclusion
Using both text embeddings and knowledge embeddings together in a Retrieval-Augmented Generation (RAG) model can enhance the model’s performance and capabilities in several ways:
- Comprehensive Understanding: Text embeddings capture the semantic meaning of individual words or phrases, while knowledge embeddings capture explicit relationships between entities. Through the integration of both types of embeddings, the RAG model achieves a more holistic grasp of the input text and the organized information stored in the knowledge graph.
- Contextual Relevance: Text embeddings offer valuable contextual insights by analyzing word co-occurrences in the input text, while knowledge embeddings provide contextual relevance by examining relationships between entities in the knowledge graph. By combining different types of embeddings, the RAG model is capable of producing responses that are both semantically relevant to the input text and contextually consistent with the structured knowledge.
- Improved Answer Retrieval: Utilizing structured knowledge in the RAG model can significantly improve answer selection, thanks to the integration of knowledge embeddings in the retrieval component. With the utilization of knowledge embeddings for indexing and retrieving relevant passages from a knowledge base, the RAG model has the capability to retrieve responses that are not only more accurate but also more informative.
- Enhanced Answer Generation: Text embeddings enhance the generation component of the RAG model by incorporating a wide range of linguistic features and semantic nuances. Through the integration of text embeddings and knowledge embeddings in the answer generation process, the RAG model is capable of generating responses that exhibit linguistic fluency, semantic relevance, and a strong foundation in structured knowledge.
- Robustness to Ambiguity: By utilizing text embeddings and knowledge embeddings, the RAG model gains increased resilience to ambiguity and variability in natural language. Text embeddings capture the variability and ambiguity present in unstructured text, while knowledge embeddings offer explicit semantic relationships to enhance and clarify the model’s comprehension.
- Effective Knowledge Incorporation: Knowledge embeddings allow the RAG model to integrate structured knowledge from a knowledge graph or knowledge base into the generation process. With the integration of knowledge embeddings and text embeddings, the RAG model achieves a seamless fusion of structured knowledge and unstructured text, resulting in responses that are more informative and contextually relevant.
Overall, using text embeddings and knowledge embeddings together in a RAG model allows for a more comprehensive and contextually rich representation of input text and structured knowledge. This integration enhances the model’s performance in answer retrieval, answer generation, robustness to ambiguity, and effective incorporation of structured knowledge, ultimately leading to more accurate and informative responses.
Reference Papers & articles
I used the following references for a deeper understanding and for the trials I shared above in this article.
- EAGER: Embedding-Assisted Entity Resolution for Knowledge Graphs (2021)
- Introduction to Knowledge Graph Embeddings
- "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis, P., Neumann, M., et al. (2020)
- "KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning" by Z. Hu et al. (2021)
- "Using Graph Neural Networks to Learn Query Intent Representations in Information Retrieval" by Cui, P., et al. (2020)
- "Improving Open-Domain Question Answering with Knowledge-Enhanced Context Embedding" by Yang, Z., et al. (2020)