
Using Self-Organizing Map To Bolster Retrieval-Augmented Generation In Large Language Models

SOM is proposed to bolster efficient retrieval of LLM context for RAG…

Photo by Werclive 👹 on Unsplash

Background

Large volumes of data are used to train Large Language Models (LLMs) containing millions to billions of model parameters, with the goal of generating text for tasks such as text completion, text summarization, language translation, and question answering. While LLMs develop a knowledge base of sorts from the training data sources, there is always a training cut-off date, after which the LLM will not know about any newly generated data. For example, the cut-off date for training OpenAI’s GPT-3.5-turbo-instruct LLM is September 2021 (Ref: https://platform.openai.com/docs/models/gpt-3-5-turbo), and as such, GPT-3.5-turbo-instruct may not answer questions about 2022, 2023, or 2024 events accurately. Such data that is not part of the LLM’s original training data is called external data. Retrieval-Augmented Generation (RAG) is a technique meant to help in such cases by retrieving information contextual to the input prompt from authorized external sources and augmenting the input so that the LLM can generate accurate and relevant responses. Effectively, RAG forms the gateway between the LLM and the external data. Such augmentation eliminates the need to retrain or further fine-tune the LLM.

LLM’s Typical M.O.

LLMs are auto-regressive: the input prompt is tokenized into a sequence of tokens, and the model generates a new token conditioned on that sequence. The generation of the next best token is probability-based and can be expressed as follows:

P( Yn ∣ X0, X1, ... Xn-1, θ )

Essentially, the probability of the newly generated nth token, Yn, is conditioned on the sequence of the n previous tokens X0 through Xn-1 and the learned model parameters θ. It should be noted here that the tokenized input sequence X plays a crucial role in generating the next token. In addition, self-attention mechanisms complement effective auto-regression, where each input token in the sequence computes its representation by attending to and weighing the importance of other tokens in the sequence. Such intricate relationships and dependencies among the tokens in the sequence also enable the LLM to decipher the most probable next-best token that ‘gels well’ with the tokens in the input sequence. The LLM appends the new token to the previous tokens to form a new input sequence and repeats the auto-regressive process until a completion condition is met, such as reaching the maximum token count.
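
To make the loop concrete, here is a minimal, self-contained sketch of the auto-regressive process. The "model" in it is a hypothetical stand-in that returns random scores over a toy vocabulary; a real LLM would compute P( Yn | X0 ... Xn-1, θ ) from its learned parameters.

from typing import List
import torch

# Minimal sketch of the auto-regressive loop described above. The scoring
# function is a random stand-in, NOT a real language model.
torch.manual_seed( 0 )
vocab_size = 50

def next_token_logits( token_ids : List[ int ] ) -> torch.Tensor :
    # Stand-in for the LLM forward pass: one score per vocabulary token,
    # conditioned (in a real model) on the whole input sequence.
    return torch.randn( vocab_size )

sequence = [ 7, 23, 41 ]            # tokenized input prompt X0 ... Xn-1
max_new_tokens = 5

for _ in range( max_new_tokens ):   # completion condition: maximum token count
    probs = torch.softmax( next_token_logits( sequence ), dim=0 )
    next_token = int( torch.argmax( probs ) )   # greedy pick of the next best token
    sequence.append( next_token )               # append and repeat

print( sequence )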

Such a self-attention-driven auto-regression implies that the LLM relies predominantly on the input sequence to generate the next best token. As long as the input sequence helps determine the next-best token through self-attention, the LLM continues in a ‘virtuous’ loop, generating coherent, comprehensible, and relevant outputs. On the contrary, the LLM will start relying on the model parameters if the prompt inputs do not help determine the next best token. In such a case, the model may succeed in generating the next best token if the model has been trained to contain sufficient ‘knowledge’ contextual to the input prompt. Conversely, the model may go into a ‘vicious’ loop, generating non-coherent, incomprehensible, and possibly irrelevant outputs if the prompt inputs pertain to ‘external data’ that the LLM has never been trained on.

Various techniques tackle this issue. Prompt engineering is one of them, where the goal is to address the ‘missing context’ by adjusting the prompt to enhance the context so that the LLM can generate relevant output. RAG is another technique where the goal is to specifically address the ‘missing context due to external data’ by retrieving the most appropriate information contextual to the input prompt from external data sources in an automated manner and augmenting the prompt.

RAG’s Challenge

The primary responsibility of RAG is to search and retrieve data that is contextually related to the input prompt from external data sources such as informational databases, APIs, and other document repositories like Wikipedia. A simple keyword search would not cut it. Instead, RAG requires a semantic search. To facilitate semantic search, the textual information retrieved from external sources is transformed into numerical representations or vectors, commonly called text embeddings, and stored in vector databases. There are various models or algorithms for creating these embeddings from text. The prompt is first transformed into its vector representation to search and retrieve closest matching external data vectors. Vector similarities (or vector distances) are then computed between the prompt vector and the previously stored external data vectors. The most similar or nearest vectors are sorted and filtered using a threshold, and their corresponding textual information is retrieved to augment the prompt’s context. The following conceptual diagram captures the typical interactions between different components for enabling RAG:
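
As a concrete illustration of the vector-matching step, the sketch below scores a prompt vector against a handful of stored vectors using cosine similarity and keeps the matches above a threshold. The vectors here are random stand-ins for real text embeddings, so the texts and threshold value are purely illustrative.

import torch
import torch.nn.functional as F

# Stand-in embeddings for a handful of external-data snippets (real embeddings,
# e.g. OpenAI's, would also be 1536-dimensional but not random).
external_texts = [ "Argentina won the 2022 FIFA World Cup.",
                   "Sweden joined NATO in 2024.",
                   "Chandrayaan-3 landed near the lunar south pole in 2023." ]
external_vectors = F.normalize( torch.randn( 3, 1536 ), dim=1 )   # stand-in embeddings
prompt_vector    = F.normalize( torch.randn( 1536 ), dim=0 )      # stand-in encoded prompt

similarities = external_vectors @ prompt_vector                   # cosine similarity (unit vectors)

# With real embeddings a higher threshold (e.g. 0.8) would be used; random
# stand-ins hover near zero, so the toy threshold here is 0.0.
sim_threshold = 0.0
matches = [ ( external_texts[ i ], score.item() )
            for i, score in enumerate( similarities ) if score >= sim_threshold ]
matches.sort( key=lambda pair: pair[ 1 ], reverse=True )          # most similar first
print( matches )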

Conceptual View of Primary System Component Interactions for Enabling RAG – Image by Author

RAG’s challenge is that conducting a vector-driven semantic search is non-trivial and requires significant computational resources, because it involves calculating vector similarities or distances against a potentially vast number of vectors in the database. Computing similarity or distance measures against every stored vector for every input prompt quickly becomes infeasible. Moreover, the lower the semantic match quality, the lower the LLM’s generative output quality. Therefore, finding a way to conduct the semantic search efficiently becomes crucial.

Solution

Several algorithmic solutions are employed to conduct efficient semantic searches. The typical approach is to group or cluster external data vectors as nearest neighbors and index them by mapping them to such clusters; most vector databases offer this indexing as a built-in capability. During semantic search, the clusters matching the input prompt vector are evaluated first. For each evaluated cluster, the indexed vectors are selected, and similarities between the input prompt vector and the selected vectors are then computed. The expectation is that finding the ‘nearest neighbors’ as an intermediate step significantly reduces the number of similarity computations. Finally, the textual information corresponding to the most similar or nearest vectors, filtered through thresholding, is retrieved. Algorithms such as k-Nearest Neighbors, Ball-of-Radius-R, Locality-Sensitive Hashing, DBSCAN clustering, tree-like hierarchies, and graph-like hierarchies are typically implemented by vector databases to facilitate semantic searches.
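
The sketch below illustrates the cluster-then-search idea in its simplest form. The centroids and vectors are random stand-ins rather than a learned index; the point is only that exact distances get computed against the small candidate set indexed under the query’s nearest cluster, not against the whole database.

import torch

# Illustrative cluster-then-search sketch (not tied to any particular vector database).
torch.manual_seed( 1 )
num_vectors, dim, num_clusters = 1000, 64, 8
vectors   = torch.randn( num_vectors, dim )
centroids = torch.randn( num_clusters, dim )    # stand-in for learned cluster centers

# Index step: map every stored vector to its nearest centroid.
assignments = torch.cdist( vectors, centroids ).argmin( dim=1 )

# Query step: evaluate only the vectors indexed under the query's nearest centroid.
query = torch.randn( dim )
nearest_cluster = torch.cdist( query.unsqueeze( 0 ), centroids ).argmin().item()
candidate_ids = ( assignments == nearest_cluster ).nonzero( as_tuple=True )[ 0 ]

# Exact distances are computed only for the (much smaller) candidate set.
distances = torch.cdist( query.unsqueeze( 0 ), vectors[ candidate_ids ] ).squeeze( 0 )
best = candidate_ids[ distances.argmin() ].item()
print( f"Compared {len( candidate_ids )} of {num_vectors} vectors; best match id = {best}" )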

There is no one-size-fits-all solution because different families of algorithms have different trade-offs in terms of memory efficiency, compute efficiency, latency, accuracy, vector dimensionality, dataset sizing, etc. For example, clustering methods enable speed by narrowing the vector space for semantic search, while tree-like or graph-like methods offer improved accuracy for low-dimensional vector data.

Self-Organizing Maps

A Self-Organizing Map (SOM) is a neural network-based dimensionality reduction algorithm developed by Teuvo Kohonen in the 1980s. It is typically used to reduce high-dimensional feature vectors to low-dimensional (typically two-dimensional) feature vectors. The core idea behind SOM is to represent high-dimensional data vectors as specific nodes in a low-dimensional space while retaining the vectors’ topology in the original space. The number of nodes in the low-dimensional space (SOM Nodes) is fixed (hyper-parameter). The exact locations of SOM nodes are evaluated through multiple training epochs. The goal of the iterative training is to adjust the locations of the SOM nodes in the low-dimensional space so that they get mapped to the nearest neighboring vectors in the high-dimensional feature space. In other words, the goal is to map nearest-neighbor vectors in the high-dimensional space to SOM nodes that are also nearest neighbors in the low-dimensional space.
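
Before applying it to RAG, here is a compact sketch of the Kohonen training loop itself, deliberately simplified (a fixed epoch count, a basic Gaussian neighborhood, and a crude decay schedule). It is not the implementation used later in this article, which follows in Step 1.

import torch

# Simplified Kohonen SOM training sketch: high-dimensional points are mapped onto
# a small 2-D lattice whose node weights are pulled toward nearby data each epoch.
torch.manual_seed( 0 )
data = torch.randn( 200, 16 )                                  # high-dimensional data vectors
lattice_h, lattice_w, dim = 5, 5, data.shape[ 1 ]
weights = torch.randn( lattice_h * lattice_w, dim )            # one weight vector per SOM node
coords  = torch.tensor( [ ( r, c ) for r in range( lattice_h )
                                   for c in range( lattice_w ) ], dtype=torch.float )

learning_rate, radius = 0.3, 2.0
for epoch in range( 20 ):
    for x in data:
        bmu = torch.cdist( x.view( 1, -1 ), weights ).argmin()       # best matching unit
        lattice_dist = torch.norm( coords - coords[ bmu ], dim=1 )   # distance on the 2-D lattice
        influence = torch.exp( -( lattice_dist ** 2 ) / ( 2 * radius ** 2 ) )
        weights += learning_rate * influence.unsqueeze( 1 ) * ( x - weights )  # pull neighbors toward x
    learning_rate *= 0.95
    radius *= 0.95                                                   # crude decay schedule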

SOM for RAG

In this write-up, I wanted to share notes and findings from my experiments with SOM as a possible algorithm to propel RAG’s semantic search. There are three crucial reasons SOM could be ideal compared to other algorithms:

  1. Vectors’ high dimensionality can become a bottleneck for most other algorithms, such as Trees and Graphs—the so-called curse of dimensionality. On the contrary, SOM is built for dimensionality reduction, and therefore, it can be effectively applied in both high-dimensional and low-dimensional scenarios.
  2. SOM is less sensitive to random variations that may trickle into the original high-dimensional vector space, resulting in noise. Other algorithms can be sensitive to such noise, impacting the way they cluster or group high-dimensional vectors as nearest neighbors. Since SOM employs intermediate SOM nodes in a lower-dimensional vector space which get evaluated as local averages of the mapped vectors from the higher-dimensional space, it effectively reduces noise.
  3. The large size of the external dataset may constrain other algorithms to create semantic vector spaces, which can impact semantic matching’s latency and accuracy. On the other hand, SOM can tackle massive datasets because the number of SOM nodes in the low-dimensional space can be fine-tuned through a hyper-parameter proportional to the underlying dataset size. While training a SOM using a large dataset may take longer, query time mapping remains quicker once training is done.

I demonstrate a simple example of using SOM to conduct RAG’s semantic search to augment the context for question answering with OpenAI’s GPT-3.5-turbo-instruct LLM. I chose this LLM because its training cut-off date is September 2021 (Ref: https://platform.openai.com/docs/models/gpt-3-5-turbo), so it may not answer questions about 2022, 2023, or 2024 events accurately. Therefore, information about 2022, 2023, or 2024 events can serve as ‘external data’ for OpenAI’s GPT-3.5-turbo-instruct LLM. I used the Wikipedia API as the source of such ‘external data’ to fetch event information. The following are the steps I used to develop and train the example, along with the sample code.

Step 1: PyTorch-Based Kohonen’s SOM implementation

I utilized PyTorch Tensors to represent vectors and implemented Kohonen’s SOM using PyTorch. The algorithm uses a two-dimensional lattice whose size is a hyper-parameter. The algorithm’s mathematical aspects were derived from the lucid explanations in the following article:

SOM tutorial part 1

The following code snippet shows the Python class for Kohonen’s SOM. The complete code is available at this GitHub location. It’s worth noting that this implementation is standalone, so it can be used outside of the RAG example.

class KohonenSOM():
    """
    The code is developed based on the following article:
    http://www.ai-junkie.com/ann/som/som1.html

    The vector and matrix operations are developed using PyTorch Tensors.
    """
    def __init__( ... )
    ...
    def find_topk_best_matching_units( self, data_points : torch.Tensor, topk : int = 1 ) -> List[ List[ int ] ] :
        if len( data_points.size() ) == 1:
            #batching 
            data_points = data_points.view( 1, data_points.shape[0] )

        topk = int( topk )

        distances = self.dist_evaluator( data_points, self.lattice_node_weights )

        topk_best_matching_unit_indexes = torch.topk( distances, topk, dim=1, largest=False ).indices
        topk_best_matching_units = []

        for i in range( data_points.shape[0] ):
            best_matching_unit_indexes = topk_best_matching_unit_indexes[i]
            best_matching_units = [ self.lattice_coordinates[ bmu_index.item() ].tolist() for bmu_index in best_matching_unit_indexes ]
            topk_best_matching_units.append( best_matching_units )

        return topk_best_matching_units

Step 2: SOM-Based Vector Indexer Implementation

The vector indexer is a utility that uses Kohonen’s SOM to train SOM nodes with data vectors from an external dataset. Its primary purpose is to map each data vector to the closest top-k SOM nodes, enabling efficient indexing of the data vectors. The following code snippet shows the train and indexing function of the vector indexer Python class. Its complete code is available at this GitHub location. Although its implementation is currently limited to the example’s needs, it can be extended to meet other requirements.

class SOMBasedVectorIndexer():
    ...

    def train_n_gen_indexes( 
                                self, input_vectors : torch.Tensor, 
                                train_epochs : int = 100 
                           ):
        if self.generated_indexes:
            print( "WARNING: Indexes were already generated. Ignoring the request..." )
            return

        self.som.train( input_vectors, train_epochs )

        topk_bmu_indexes = self.som.find_topk_best_matching_units( input_vectors, topk = self.topk_bmu_for_indexing )

        for idx in tqdm( range( len( topk_bmu_indexes ) ), desc="SOM-Based Indexed Vectors"  ):
            bmu_indexes = topk_bmu_indexes[ idx ]

            for bmu_index in bmu_indexes:
                bmu_index_key = tuple( bmu_index )
                idx_set = self.som_node_idx_map.get( bmu_index_key, set() )
                idx_set.add( idx )
                self.som_node_idx_map[ bmu_index_key ] = idx_set

        self.generated_indexes = True

Step 3: OpenAI Embeddings-Based Text-To-Vector Encoder

The encoder’s primary function is to convert text into vector representations using OpenAI’s text embedding API. It is worth noting that an OpenAI account and API key are required to use the embedding API. Upon opening an account for the first time, OpenAI provides complimentary credit grants, which are more than enough to access the API for testing purposes. Below is a code snippet showcasing the batch encode function of the OpenAI encoder Python class. The complete code is available at this GitHub location.

import openai
from openai.embeddings_utils import get_embedding
...
from vector_encoder_parent import VectorEncoder
...

class OpenAIEmbeddingsVectorEncoder( VectorEncoder ):
    def __init__( ... )
    ...
    def encode_batch( self, list_of_text : List[ str ] ) -> torch.Tensor :
        if list_of_text == None or len( list_of_text ) == 0:
            raise ValueError( "ERROR: Required list_of_text is None or empty" )

        list_of_text = [ str( text ) for text in list_of_text ]

        openai.api_key = self.openai_key
        response = openai.Embedding.create(
                                            input = list_of_text,
                                            engine = self.vector_encoder_id
                                          )

        embeddings = [ data["embedding"] for data in response["data"] ] 
        vectors = torch.tensor( embeddings, dtype=torch.float )
        return vectors

Note that the OpenAI vector encoder class extends a generic parent class, ‘VectorEncoder,’ that defines abstract encoding functions to be implemented through inheritance. It is possible to implement other types of vector encoders by inheriting from this parent class, making other encoding schemes pluggable. The complete code for the parent vector encoder class can be found at this GitHub location.
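
For reference, a minimal sketch of what such a parent class could look like is shown below. The method names mirror the calls made elsewhere in this article (encode, encode_batch, get_encoded_vector_dimensions); everything else is an assumption, since the actual class lives in the linked repository.

from abc import ABC, abstractmethod
from typing import List
import torch

# Minimal sketch of an abstract vector encoder parent class (assumed structure;
# the real 'VectorEncoder' is in the linked GitHub repository).
class VectorEncoder( ABC ):
    def __init__( self, encoded_vector_dimensions : int ):
        self.encoded_vector_dimensions = int( encoded_vector_dimensions )

    def get_encoded_vector_dimensions( self ) -> int :
        return self.encoded_vector_dimensions

    @abstractmethod
    def encode( self, text : str ) -> torch.Tensor :
        ...

    @abstractmethod
    def encode_batch( self, list_of_text : List[ str ] ) -> torch.Tensor :
        ...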

Step 4: Wikipedia API-Driven DataSource Implementation

This utility class is designed to encapsulate the data retrieval logic that integrates with Wikipedia API. Its main function is to fetch events for a specified array of calendar years, format the retrieved events, and load them into a Pandas dataframe. The code snippet below captures the primary function of the utility class, while the complete code is available at this GitHub location.

import requests
import pandas as pd
from dateutil.parser import parse
...
class WikiEventsDataSource():
    ...
    def fetch_n_prepare_data( self ):
        if self.fetched:
            print( "WARNING: Wiki events for the specified years already fetched. Ignoring the request..." )
            return

        main_df = pd.DataFrame()

        for year in self.event_years_to_fetch:
            wiki_api_params = {
                                "action": "query", 
                                "prop": "extracts",
                                "exlimit": 1,
                                "titles": year,
                                "explaintext": 1,
                                "formatversion": 2,
                                "format": "json"
                              }

            response = requests.get( "https://en.wikipedia.org/w/api.php", params=wiki_api_params )
            response_dict = response.json()

            df = pd.DataFrame()
            df[ "text" ] = response_dict["query"]["pages"][0]["extract"].split("n")
            df = self.__clean_df__( df, year )

            main_df = pd.concat( [ main_df, df ] )

        self.df = main_df.reset_index(drop=True)
        self.fetched = True
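
For clarity, here is the same Wikipedia API call stripped down to a standalone snippet for a single year. The request parameters and the JSON path match the class code above; the snippet only shows the shape of the response being parsed.

import requests

# Standalone illustration of the Wikipedia API call used by the data source class,
# fetching the plain-text extract of the "2023" year page.
wiki_api_params = {
    "action": "query", "prop": "extracts", "exlimit": 1, "titles": "2023",
    "explaintext": 1, "formatversion": 2, "format": "json"
}
response = requests.get( "https://en.wikipedia.org/w/api.php", params=wiki_api_params )
extract = response.json()["query"]["pages"][0]["extract"]
print( extract.splitlines()[:5] )   # first few lines of the plain-text page extract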

Step 5: SOM-Based RAG Utility Implementation

The SOM-based RAG utility is a crucial element of the example implementation. It utilizes the vector encoder, indexer, and data source to implement the core logic for the underlying semantic search. The complete code for the SOM-based RAG utility is available at this GitHub location.

The utility implements three primary functions. The first function is to load data from an external data source and encode it into vectors, as shown in the following code snippet.

...
from vector_encoder_parent import VectorEncoder
from vector_indexer import SOMBasedVectorIndexer

class SOM_Based_RAG_Util():
    ...
    def load_n_vectorize_data( self, data_source ):
        if self.data_loaded_n_vectorized:
            print( "WARNING: Data already loaded and vectorized. Ignoring the request..." )
            return

        data_source.fetch_n_prepare_data()
        self.df = data_source.get_data()

        vectors = None

        for i in tqdm( range(0, len(self.df), self.vectorize_batch_size ), desc="Vectorized Data Batch" ):
            list_of_text = self.df.iloc[ i:i+self.vectorize_batch_size ]["text"].tolist()
            batch_encoded_vectors = self.vector_encoder.encode_batch( list_of_text )

            if vectors is None:
                vectors = batch_encoded_vectors
            else:
                vectors = torch.cat( [ vectors, batch_encoded_vectors], dim=0 )

        self.vectors = vectors.to( self.device )
        self.data_loaded_n_vectorized = True

The second function is to train the SOM-based indexer to construct Kohonen’s SOM nodes and then index the data vectors, as shown in the following code snippet.

def train_n_index_data_vectors( self, train_epochs : int = 100  ):
        if not self.data_loaded_n_vectorized:
            raise ValueError( "ERROR: Data not loaded and vectorized." )

        if self.data_vectors_indexed:
            print( "WARNING: Data vectors already indexed. Ignoring the request..." )
            return

        self.vector_indexer.train_n_gen_indexes( self.vectors, train_epochs )
        self.data_vectors_indexed = True

The third function is to find similar information from the previously stored external dataset based on a query text. This function uses the encoder to convert the query text into a vector and then searches through the SOM-based indexer for the most likely matches. This function then calculates the similarity between the query vector and the discovered data vectors using Cosine similarity or another specified similarity evaluator. Finally, this function filters the data vectors whose similarities are greater than or equal to the specified similarity threshold. The following code snippet captures the function implementation.

def find_semantically_similar_data( self, query: str, sim_evaluator = None, sim_threshold : float = 0.8  ):
        if not self.data_vectors_indexed:
            raise ValueError( "ERROR: Data vectors not indexed." )

        if query == None or len( query.strip() ) == 0:
            raise ValueError( "ERROR: Required query text is not specified." )

        sim_threshold = float( sim_threshold )

        if sim_evaluator == None:
            sim_evaluator = nn.CosineSimilarity(dim=0, eps=1e-6)

        query_vector = self.vector_encoder.encode( query )
        query_vector = query_vector.view( self.vector_encoder.get_encoded_vector_dimensions() )
        query_vector = query_vector.to( self.device )

        nearest_indexes = self.vector_indexer.find_nearest_indexes( query_vector )
        nearest_indexes = nearest_indexes[0]

        sim_scores = []

        for idx in nearest_indexes:
            data_vector = self.vectors[ idx ]
            data_vector = data_vector.view( self.vector_encoder.get_encoded_vector_dimensions() )

            sim_score = sim_evaluator( query_vector, data_vector )

            if sim_score >= sim_threshold:
                sim_score_tuple = (idx, sim_score.item() )
                sim_scores.append( sim_score_tuple )

        sim_scores.sort( key = lambda x: x[1], reverse=True )

        semantically_similar_data = [ 
                                        { 
                                            'text': self.df[ 'text' ][ idx ],
                                            'sim_score' : sim_score
                                        } for idx, sim_score in sim_scores
                                    ]

        return semantically_similar_data
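
As a usage sketch, assuming the utility has been constructed, its data loaded and vectorized, and its vectors indexed (as shown in Step 8 below), a single semantic lookup could look like this; the threshold value is an illustrative assumption.

# Usage sketch (assumes the som_driven_rag_util instance from Step 8 is already
# initialized, loaded, vectorized, and indexed).
matches = som_driven_rag_util.find_semantically_similar_data(
    "Who won the 2022 soccer world cup?",
    sim_threshold = 0.65
)
for match in matches[:3]:
    print( f"{match['sim_score']:.3f}  {match['text']}" )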

An example output from a semantic search by SOM-based RAG utility function is shown below:

An Example Semantic Search Output – Image by Author

Step 6: Abstract Question/Answer ChatBot And Its OpenAI-Based Implementation

An abstract ‘QuestionAnswerChatBot’ Python class is developed to facilitate chatbot-like implementations. It augments the prompted question by using a standard instruction template and populating it with contextually similar information retrieved from the RAG utility.

The specified maximum number of new tokens limits the text size for context augmentation, while token counting is deferred to underlying implementations. In LLM economics, tokens are like currency. Each token the model processes requires computational resources – memory, processing power, and time. Thus, the more tokens an LLM has to process, the greater the computational cost.
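
As a small, concrete illustration of token counting with the same tokenizer the chatbot uses later (“cl100k_base”, via the tiktoken library):

import tiktoken

# Count how many tokens a retrieved context snippet would cost before adding it
# to the prompt; the snippet text here is just an illustrative example.
tokenizer = tiktoken.get_encoding( "cl100k_base" )
context_snippet = "2022 - Argentina won the FIFA World Cup in Qatar."
print( len( tokenizer.encode( context_snippet ) ) )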

Finally, this class delegates prompting of the LLM model to the underlying implementation once the QA instruction has been populated. The following code snippet captures the primary function; the complete code is available at this GitHub location.

from abc import ABC, abstractmethod
import torch
import math

class QuestionAnswerChatBot( ABC ):
    ...
    def find_answer_to_question( self, question : str, sim_threshold = 0.68, max_new_tokens : int = 5 ):
        if question == None or len( question.strip() ) == 0:
            raise ValueError( "ERROR: Required question is not specified" )

        sim_threshold = float( sim_threshold )
        max_new_tokens = int( max_new_tokens )

        qa_instruction = self.get_qa_instruction( question, sim_threshold = sim_threshold )

        answer_text = self.__get_answer_text__( qa_instruction, max_new_tokens = max_new_tokens )
        answer_text = self.__clean_answer_text__( qa_instruction, answer_text )

        return answer_text
    ...
    def __qa_template__( self ):
        qa_template = """Context: 

    {}

    ---

    Question: {}
    Answer:"""
        return qa_template
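
For illustration only, populating the template above by hand with a retrieved context snippet and a question (in the repository code this happens inside get_qa_instruction) would produce a prompt like the following; the context text is a hypothetical retrieved snippet.

# Illustrative template population; not the repository's actual prompt-building code.
context_text = "2022 - Argentina won the FIFA World Cup, defeating France on penalties."  # hypothetical retrieved snippet
question = "Who won the 2022 soccer world cup?"
qa_template = """Context: 

    {}

    ---

    Question: {}
    Answer:"""
print( qa_template.format( context_text, question ) )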

The Python class ‘OpenAIQuestionAnswerChatBot’ extends the abstract ‘QuestionAnswerChatBot’ and implements the chatbot functionality using the OpenAI LLM API. The following code snippet shows the class’s primary function. The complete code is available at this GitHub location.

import openai
import tiktoken
from qa_chatbot import QuestionAnswerChatBot

class OpenAIQuestionAnswerChatBot( QuestionAnswerChatBot ):
    ...
    def __get_answer_text__( self, qa_instruction : str, max_new_tokens : int = 5 ) -> str :
        openai.api_key = self.openai_key

        basic_answer = openai.Completion.create(
                                                    model = self.openai_model_name,
                                                    prompt = qa_instruction,
                                                    max_tokens = max_new_tokens  # cap on generated tokens (reconstructed argument)
                                                )

        answer_text = basic_answer[ "choices" ][0][ "text" ]
        return answer_text

    def __token_count__( self, text : str ):    
        return len( self.tokenizer.encode( text ) )

The following is an example of how a prompted question gets augmented with context using similar information retrieved through semantic search:

An Example Context Augmented Question Prompt – Image by Author

Step 7: Sample Questions for Testing

The following are sample questions for testing the RAG using OpenAI’s GPT-3.5-turbo-instruct LLM. They were developed to ensure that their answers pertain to events that occurred in 2022, 2023, and 2024.

sample_questions = [
                        "Who won the 2022 soccer world cup?",
                        "When did Sweden join NATO?",
                        "Who joined NATO in 2023?",
                        "Who joined NATO in 2024?",
                        "Which is the 31st member of NATO?",
                        "Which is the 32nd member of NATO?",
                        "Who won the Cricket World Cup in 2023?",
                        "Who defeated India in Cricket World Cup final in 2023?",
                        "Name the former prime minister of Japan that was assassinated in 2022?",
                        "When did Chandrayaan-3 land near the south pole of the Moon?",
                        "Where did Chandrayaan-3 land on the Moon?",
                        "Who acquired Twitter in 2022?",
                        "Who owns Twitter?",
                        "Who acquired Activision Blizzard in 2023?"
                   ]

Step 8: Putting Everything Together

The complete Jupyter notebook that brings all the components together can be found at this GitHub location. The following code snippet shows the initialization of the main OpenAI-based QA chatbot. Note that OpenAI’s text embedding model, "text-embedding-ada-002," is used for vector encoding. Likewise, the chatbot uses OpenAI’s "cl100k_base" tokenizer, via the inbuilt functions of the tiktoken Python library, to count tokens and limit the contextual text used to augment the question prompt.

openai_vector_encoder_id = "text-embedding-ada-002"
openai_encoded_vector_dimensions = 1536
openai_tokenizer_name = "cl100k_base" 
openai_model_name = "gpt-3.5-turbo-instruct"

vector_encoder = OpenAIEmbeddingsVectorEncoder( openai_encoded_vector_dimensions, openai_vector_encoder_id, openai_key )

event_years_to_fetch = [ 2022, 2023, 2024 ]
data_source = WikiEventsDataSource( event_years_to_fetch  )
...
som_driven_rag_util = SOM_Based_RAG_Util( 
                                            vector_encoder = vector_encoder,
                                            som_lattice_height = 20,
                                            som_lattice_width = 30,
                                            learning_rate = 0.3,
                                            topk_bmu_for_indexing = 10,
                                            device = device
                                        )
...
openai_chatbot = OpenAIQuestionAnswerChatBot( 
                                                vector_db_util = som_driven_rag_util,
                                                openai_tokenizer_name = openai_tokenizer_name,
                                                openai_model_name = openai_model_name,
                                                openai_key = openai_key,
                                                question_input_max_token_count = 100,
                                                context_trim_percent = 0.1,
                                                device = device
                                            )
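
With the chatbot initialized, the sample questions from Step 7 can be run through it with a short loop like the one below, based on the find_answer_to_question signature shown earlier; the threshold and token-budget values are illustrative assumptions.

# Run each sample question through the chatbot (illustrative parameter values).
for question in sample_questions:
    answer = openai_chatbot.find_answer_to_question(
                                                        question,
                                                        sim_threshold = 0.68,
                                                        max_new_tokens = 50
                                                    )
    print( f"Q: {question}\nA: {answer}\n" )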

The following sequence diagrams help visualize all the component interactions during the initialization and actual question/answering phases.

Interactions of Various Components During Initialization – Image by Author
Interactions of Various Components During Question/Answering – Image by Author

Findings

The following image captures the question/answers from OpenAI’s GPT-3.5-turbo-instruct LLM with and without context augmentation.

OpenAI’s GPT-3.5-turbo-instruct LLM’s Answers With and Without Context Augmentation – Image by Author

Understandably, the LLM finds it challenging to answer questions about events that occurred after its September 2021 cut-off date. In most cases, it clearly responds that the questions come from a future time relative to its training cut-off. On the contrary, the same LLM answers all the questions accurately when the context of the prompted questions is augmented with relevant information from 2022, 2023, and 2024 retrieved from Wikipedia. The real credit here goes to the SOM, which formed the basis of RAG’s semantic search for retrieving relevant information and augmenting the prompted question’s context.

Suggested Next Steps

While the above example served as a proof of concept to assess the suitability of a Self-Organizing Map for enabling Retrieval-Augmented Generation of text by an LLM, a more comprehensive benchmark against other algorithms on a much larger external dataset is suggested, with performance measured by the quality of LLM outputs (for example, perplexity and accuracy). In addition, since the current example provides a pluggable framework, it is suggested that other open-source, free QA LLMs be used in such benchmarking to minimize LLM usage expenses.

To help run the example in local environments, I included the ‘requirements.txt’ file, which lists the versions of the Python libraries I used to run and test the above example. This file is available at this GitHub location.

I conclude by promising to share my findings in a separate write-up if I conduct any such benchmarks. Please stay tuned!!


References

SOM tutorial part 1

Understanding Self-Organising Map Neural Network with Python Code

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

What Is Retrieval-Augmented Generation aka RAG?

https://www.sciencedirect.com/topics/engineering/self-organizing-map

https://platform.openai.com/docs/models/gpt-3-5-turbo

https://platform.openai.com/docs/guides/text-generation/chat-completions-api

