After the launch of ChatGPT and the subsequent surge of Large Language Models (LLMs), their inherent limitations of hallucination, knowledge cutoff dates, and the inability to provide organization- or person-specific information soon became evident and were seen as major drawbacks. To address these issues, Retrieval-Augmented Generation (RAG) methods quickly gained traction; they integrate external data into LLMs and guide their behavior to answer questions from a given knowledge base.
Interestingly, the first paper on RAG was published in 2020 by researchers from Facebook AI Research (now Meta AI), but it was not until the advent of ChatGPT that its potential was fully realized. Since then, the technology has kept advancing: more sophisticated RAG frameworks have been introduced that not only improve accuracy but also handle multimodal data, expanding the technology's potential for a wide range of applications. I wrote on this topic in detail in the following articles, specifically discussing contextual multimodal RAG, multimodal AI search for business applications, and information extraction and matchmaking platforms.
Integrating Multimodal Data into a Large Language Model
With the expanding landscape of RAG technology and emerging data-access requirements, it became clear that the functionality of a retriever-only RAG, which answers questions from a static knowledge base, can be extended by integrating other diverse knowledge sources and tools, such as:
- Multiple databases (e.g., knowledge bases comprising vector databases and knowledge graphs)
- Real-time web search to access recent information
- External APIs to collect specific data such as stock market trends or data from company-specific tools like Slack channels or email accounts
- Tools for tasks like data analysis, report writing, literature review, and people search, etc.
- Comparing and consolidating information from multiple sources.

To achieve this, a RAG system should be able to select the best knowledge source and/or tool based on the query. The emergence of AI agents introduced the idea of "agentic RAG", in which an agent selects the best course of action based on the query.
In this article, we will develop a specific agentic RAG application called Smart Business Guide (SBG), the first version of a tool that is part of our ongoing project UPBEAT, funded by Interreg Central Baltic. The project focuses on upskilling immigrants in Finland and Estonia for entrepreneurship and business planning using AI. SBG is one of the tools intended to be used in this project's upskilling process. It focuses on providing precise and quick information from authentic sources to people intending to start a business, as well as to those already running one.
The SBG’s agentic RAG comprises:
- Business and entrepreneurship guides as a knowledge base containing information about business planning, entrepreneurship, company registration, taxation, business ideas, rules and regulations, business opportunities, licenses and permits, business guidelines, and others.
- Web search to fetch recent information with sources.
- Knowledge extraction tools to fetch information from trusted sources. This information includes contacts of relevant authorities, recent taxation rules, recent business registration rules, and recent licensing regulations.
What is special about this agentic RAG?
- Option to select different open-source models (Llama, Mistral, Gemma) as well as proprietary models (gpt-4o, gpt-4o-mini) throughout the agentic workflow. The open-source models do not run locally and hence do not require a powerful, expensive computing machine. Instead, they run on Groq Cloud's platform with a free API, which makes this a cost-free agentic RAG. The GPT models can also be selected with an OpenAI API key.
- Options to enforce knowledge base search, web search, and hybrid search.
- Grading of retrieved documents for improving response quality, and intelligently invoking web search based on grading.
- Options to select response type: concise, moderate, or explanatory.
Specifically, the article is structured around the following topics:
- Parsing data to construct the knowledge base using LlamaParse
- Developing an agentic workflow using LangGraph.
- Developing an advanced agentic RAG (hereinafter called Smart Business Guide or SBG) using free, open-source models
The whole code of this application can be found on GitHub.
The application code is structured in two .py files: agentic_rag.py, which implements the entire agentic workflow, and app.py, which implements the Streamlit graphical user interface.
Let’s dive into it.
Constructing the Knowledge Base with LlamaParse and LangChain
The knowledge base of the SBG comprises authentic business and entrepreneurship guides published by Finnish agencies. Since these guides are voluminous and finding a required piece of information in them is not trivial, the goal is to develop an agentic RAG that can not only provide precise information from these guides but also augment it with web search results and other trusted sources in Finland for up-to-date information.
LlamaParse is a genAI-native document parsing platform built with LLMs and for LLM use cases. I have explained the use of LlamaParse in the articles cited above. This time, I parsed the documents directly at LlamaCloud. LlamaParse offers 1,000 free credits per day. The use of these credits depends on the parsing mode. For text-only PDFs, the 'Fast' mode (1 credit per 3 pages) works well; it skips OCR, image extraction, and table/heading identification. Other, more advanced modes are available at a higher number of credits per page. I selected the 'Premium' mode, which performs OCR, image extraction, and table/heading identification and is ideal for complex documents with images.
I defined the following parsing instructions.
You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format.
If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text.
Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7].
Include the document name and page number at the start and end of each extracted page.
The parsed documents were downloaded in markdown format from LlamaCloud. The same parsing can also be done through the LlamaCloud API as follows.
import os
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# LlamaCloud API key (read from the environment)
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")

def save_to_markdown(output_path, content):
    """
    Save extracted content to a markdown file.

    Parameters:
    output_path (str): The path where the markdown file will be saved.
    content (list): The extracted content to be saved.
    """
    with open(output_path, "w", encoding="utf-8") as md_file:
        for document in content:
            # Extract the text content from the Document object
            md_file.write(document.text + "\n\n")  # Access the 'text' attribute

def extract_document(input_path):
    # Parsing instructions for LlamaParse
    parsing_instructions = """You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format.
If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text.
Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7].
Include the document name and page number at the start and end of each extracted page.
"""
    # Initialize the LlamaParse parser
    parser = LlamaParse(
        result_type="markdown",
        parsing_instructions=parsing_instructions,
        premium_mode=True,
        api_key=LLAMA_CLOUD_API_KEY,
        verbose=True
    )
    file_extractor = {".pdf": parser}
    documents = SimpleDirectoryReader(
        input_path, file_extractor=file_extractor
    ).load_data()
    return documents

input_path = r"C:\Users\h02317\Downloads\docs"  # Replace with your document path
output_file = r"C:\Users\h02317\Downloads\extracted_document.md"  # Output markdown file name

# Extract the documents and save them to markdown
extracted_content = extract_document(input_path)
save_to_markdown(output_file, extracted_content)
Here is an example page from the guide Creativity and Business by Pikkala, A. et al., (2015) ("free to copy for non-commercial private or public use with attribution").

Here is the parsed output of this page. LlamaParse efficiently extracted information from all structures on the page. The notebook shown on the page is in image format.
[Creativity and Business, page 8]
# How to use this book
1. The book is divided into six chapters and sub-sections dealing with different topics. You can read the book through one chapter and topic at a time, or you can use the checklist of the table of contents to select sections on topics in which you need more information and support.
2. Each section opens with a creative entrepreneur's thought on the topic.
3. The introduction gives a brief description of the topic.
4. Each section contains exercises that help you reflect on your own skills and business idea and develop your business idea further.
## What is your business idea
"I would like to launch
a touring theatre company."
Do you have an idea about a product or service you would like
to sell? Or do you have a bunch of ideas you have been mull-
ing over for some time? This section will help you get a better
understanding about your business idea and what competen-
cies you already have that could help you implement it, and
what types of competencies you still need to gain.
### EXTRA
Business idea development
in a nutshell
I found a great definition of what business idea development
is from the My Coach online service (Youtube 27 May 2014).
It divides the idea development process into three stages:
the thinking - stage, the (subconscious) talking - stage, and the
customer feedback stage. It is important that you talk about
your business idea, as it is very easy to become stuck on a
particular path and ignore everything else. You can bounce
your idea around with all sorts of people: with a local business
advisor; an experienced entrepreneur; or a friend. As you talk
about your business idea with others, your subconscious will
start working on the idea, and the feedback from others will
help steer the idea in the right direction.
### Recommended reading
Taivas + helvetti
(Terho Puustinen & Mika Mäkeläinen:
One on One Publishing Oy 2013)
### Keywords
treasure map; business idea; business idea development
## EXERCISE: Identifying your personal competencies
Write down the various things you have done in your life and think what kind of competencies each of these things has
given you. The idea is not just to write down your education,
training and work experience like in a CV; you should also
include hobbies, encounters with different types of people, and any life experiences that may have contributed to you
being here now with your business idea. The starting circle can be you at any age, from birth to adulthood, depending
on what types of experiences you have had time to accumulate. The final circle can be you at this moment.
PERSONAL CAREER PATH
SUPPLEMENTARY
PERSONAL DEVELOPMENT
(e.g. training courses;
literature; seminars)
Fill in the
"My Competencies"
section of the
Creative Business
Model Canvas:
5. Each section also includes an EXTRA box with interesting tidbits about the topic at hand.
6. For each topic, tips on further reading are given in the grey box.
7. The second grey box contains recommended keywords for searching more information about the topic online.
8. By completing each section of the one-page business plan or "Creative Business Model Canvas" (page 74),
by the end of the book you will have a complete business plan.
9. By writing down your business start-up costs (e.g. marketing or logistics) in the price tag box of each section,
by the time you get to the Finance and Administration section you will already know your start-up costs
and you can enter them in the receipt provided in the Finance and Administration section (page 57).
This book is based on Finnish practices. The authors and the publisher are not responsible for the applicability of factual information to other
countries. Readers are advised to check country-specific information on business structures, support organisations, taxation, legislation, etc.
Factual information about Finnish practices should also be checked in case of differing interpretations by authorities.
[Creativity and Business, page 8]
The parsed markdown documents are then split into chunks using LangChain’s RecursiveCharacterTextSplitter with CHUNK_SIZE = 3000 and CHUNK_OVERLAP = 200.
def staticChunker(folder_path):
    # CHUNK_SIZE and CHUNK_OVERLAP are module-level settings defined elsewhere in agentic_rag.py
    docs = []
    print(f"Creating chunks. CHUNK_SIZE: {CHUNK_SIZE}, CHUNK_OVERLAP: {CHUNK_OVERLAP}")
    # Loop through all .md files in the folder
    for file_name in os.listdir(folder_path):
        if file_name.endswith(".md"):
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")
            # Load documents from the Markdown file
            loader = UnstructuredMarkdownLoader(file_path)
            documents = loader.load()
            # Add file-specific metadata (optional)
            for doc in documents:
                doc.metadata["source_file"] = file_name
            # Split loaded documents into chunks
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
            chunked_docs = text_splitter.split_documents(documents)
            docs.extend(chunked_docs)
    return docs
Subsequently, a vector store is created in a Chroma database using an embedding model such as the open-source all-MiniLM-L6-v2 model or OpenAI's text-embedding-3-large.
def load_or_create_vs(persist_directory):
    # DATA_FOLDER and collection_name are module-level settings defined elsewhere in agentic_rag.py
    # Check if the vector store directory exists
    if os.path.exists(persist_directory):
        print("Loading existing vector store...")
        # Load the existing vector store
        vectorstore = Chroma(
            persist_directory=persist_directory,
            embedding_function=st.session_state.embed_model,
            collection_name=collection_name
        )
    else:
        print("Vector store not found. Creating a new one...\n")
        docs = staticChunker(DATA_FOLDER)
        print("Computing embeddings...")
        # Create and persist a new Chroma vector store
        vectorstore = Chroma.from_documents(
            documents=docs,
            embedding=st.session_state.embed_model,
            persist_directory=persist_directory,
            collection_name=collection_name
        )
        print('Vector store created and persisted successfully!')
    return vectorstore
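For context, here is a minimal usage sketch of how the vector store is later exposed as a retriever, assuming the embedding model and collection name have been set up as in agentic_rag.py (the persist directory name below is illustrative):
# Illustrative usage: load or build the vector store, then expose it as a retriever
vectorstore = load_or_create_vs("chroma_db_huggingface")       # hypothetical directory name
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})   # top-5 chunks per query, as used later in initialize_app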
Creating Agentic Workflow
An AI agent is the combination of the workflow and the decision-making logic to intelligently answer questions or perform other complex tasks that need to be broken down into simpler sub-tasks.
I used LangGraph to design the workflow of our AI agent as a graph of actions and decisions. The agent has to decide whether to answer the question from the vector database (knowledge base), via web search, via hybrid search, or by using a tool.
In the following article, I explained the process of creating an agentic workflow using LangGraph.
How to Develop a Free AI Agent with Automatic Internet Search
We need to create graph nodes that represent a workflow to make decisions (e.g., web search or vector database search). The nodes are connected by edges which define the flow of decisions and actions (e.g., what is the next state after retrieval). The graph state keeps track of the information as it moves through the graph so that the agent uses the correct data for each step.
The entry point in the workflow is a router function which determines the initial node to execute in the workflow by analyzing the user’s query. The entire workflow contains the following nodes.
- retrieve: Fetches semantically similar chunks of information from the vectorstore.
- grade_documents: Grades the relevance of retrieved chunks based on the user's query.
- route_after_grading: Based on the grading, determines whether to generate a response with the retrieved documents or proceed to web search.
- websearch: Fetches information from web sources using the Tavily search engine's API.
- generate: Generates a response to the user's query using the provided context (information retrieved from the vector store and/or web search).
- get_contact_tool: Fetches contact information from predefined trusted URLs related to the Finnish Immigration Service.
- get_tax_info: Fetches tax-related information from predefined trusted URLs.
- get_registration_info: Fetches details on company registration processes in Finland from predefined trusted URLs.
- get_licensing_info: Fetches information about licenses and permits required for starting a business in Finland.
- hybrid_search: Combines document retrieval and internet search results to provide a broader context for answering the query.
- unrelated: Handles questions unrelated to the workflow's focus.
Here are the edges in the workflow.
- retrieve → grade_documents: Retrieved documents are sent for grading.
- grade_documents → websearch: Web search is invoked if the retrieved documents are deemed irrelevant.
- grade_documents → generate: Proceeds to response generation if the retrieved documents are relevant.
- websearch → generate: Passes the results of the web search for response generation.
- get_contact_tool, get_tax_info, get_registration_info, get_licensing_info → generate: The edges from these four tools to the generate node pass the information fetched from specific trusted sources for response generation.
- hybrid_search → generate: Passes the combined results (vectorstore + web search) for response generation.
- unrelated → generate: Provides a fallback response for unrelated questions.
A graph state structure acts as a container for maintaining the state of the workflow and includes the following elements:
- question: The user’s query or input that drives the workflow.
- generation: The final generated response to the user’s query, which is populated after processing.
- web_search_needed: A flag indicating whether a web search is required based on the relevance of retrieved documents.
- documents: A list of retrieved or processed documents that are relevant to the query.
- answer_style: Specifies the desired style of the answer, such as "Concise", "Moderate", or "Explanatory".
The graph state structure is defined as follows:
class GraphState(TypedDict):
    question: str
    generation: str
    web_search_needed: str
    documents: List[Document]
    answer_style: str
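For illustration, a populated state instance for a sample query might look like this (the values below are hypothetical):
# A hypothetical GraphState instance at the start of the workflow
sample_state: GraphState = {
    "question": "How do I register a limited company in Finland?",
    "generation": "",
    "web_search_needed": "No",
    "documents": [],
    "answer_style": "Concise",
}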
The following router function analyzes the query and routes it to the relevant node for processing. A chain is created comprising a prompt (which asks the LLM to select a tool/node from a tool-selection dictionary) and the query; the chain invokes a router LLM to select the relevant tool.
def route_question(state):
    question = state["question"]
    # Check whether one of these two options has been selected in the user interface
    hybrid_search_enabled = state.get("hybrid_search", False)
    internet_search_enabled = state.get("internet_search", False)
    if hybrid_search_enabled:
        return "hybrid_search"
    if internet_search_enabled:
        return "websearch"

    tool_selection = {
        "get_tax_info": (
            "Questions specifically related to tax matters, including current tax rates, taxation rules, taxable incomes, tax exemptions, the tax filing process, or similar topics. "
        ),
        "get_contact_tool": (
            "Questions specifically asking for the contact information of the Finnish Immigration Service (Migri). "
        ),
        "get_registration_info": (
            "Questions specifically about the process of company registration. "
            "This excludes broader questions about starting a business or similar processes."
        ),
        "get_licensing_info": (
            "Questions related to licensing, permits, and notifications required for starting a business, especially for foreign entrepreneurs. "
            "This excludes questions about residence permits or licenses."
        ),
        "websearch": (
            "Questions related to residence permits, visas, moving to Finland, or those requiring current statistics or real-time information. "
        ),
        "retrieve": (
            "Questions broadly related to business, business planning, business opportunities, startups, entrepreneurship, employment, unemployment, pensions, insurance, social benefits, and similar topics. "
            "This includes questions about specific business opportunities (e.g., for specific expertise, area, topic) or suggestions. "
        ),
        "unrelated": (
            "Questions not related to business, entrepreneurship, startups, employment, unemployment, pensions, insurance, social benefits, or similar topics, "
            "or those related to other countries or cities instead of Finland."
        )
    }

    SYS_PROMPT = """Act as a router to select specific tools or functions based on user's question.
- Analyze the given question and use the given tool selection dictionary to output the name of the relevant tool based on its description and relevancy with the question.
The dictionary has tool names as keys and their descriptions as values.
- Output only and only tool name, i.e., the exact key and nothing else with no explanations at all.
- For questions mentioning any other country except Finland, or any other city except a Finnish city, output 'unrelated'.
"""
    # Define the ChatPromptTemplate
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", SYS_PROMPT),
            ("human", """Here is the question:
{question}
Here is the tool selection dictionary:
{tool_selection}
Output the required tool.
"""),
        ]
    )
    # Pass the inputs to the prompt
    inputs = {
        "question": question,
        "tool_selection": tool_selection
    }
    # Invoke the chain
    tool = (prompt | st.session_state.router_llm | StrOutputParser()).invoke(inputs)
    tool = re.sub(r"['\"`]", "", tool.strip())  # Remove quotes, backticks, and extra spaces
    if "unrelated" not in tool:
        print(f"Invoking {tool} tool through {st.session_state.router_llm.model_name}")
    if "websearch" in tool:
        print("I need to get recent information from this query.")
    return tool
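As a rough illustration of the routing behavior (assuming the router LLM has been initialized via initialize_app and follows the tool descriptions), a tax question would typically be routed to get_tax_info, while a residence-permit question would go to websearch:
# Hypothetical routing outcomes; the actual output depends on the router LLM
print(route_question({"question": "What is the current VAT rate in Finland?"}))   # expected: "get_tax_info"
print(route_question({"question": "How do I apply for a residence permit?"}))     # expected: "websearch"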
Questions not relevant to the workflow are routed to the handle_unrelated node, which provides a fallback response through the generate node.
def handle_unrelated(state):
    question = state["question"]
    documents = state.get("documents", [])
    response = "I apologize, but I'm designed to answer questions specifically related to business and entrepreneurship in Finland. Could you please rephrase your question to focus on these topics?"
    documents.append(Document(page_content=response))
    return {"generation": response, "documents": documents, "question": question}
The entire workflow is depicted in the following figure.

Retrieval and Grading
The retrieve node invokes the retriever with the question to fetch relevant chunks of information from the vector store. These chunks ("documents") are sent to the grade_documents node to grade their relevancy. Based on the graded chunks ("filtered_docs"), the route_after_grading node decides whether to proceed to generation with the retrieved information or to invoke web search. The helper function initialize_grader_chain initializes the grader chain with a prompt guiding the grader LLM to assess the relevancy of each chunk. The grade_documents node analyzes each chunk to determine whether it is relevant to the question; for each chunk, it outputs "Yes" or "No" depending on whether the chunk is relevant to the question.
def initialize_grader_chain():
    # Data model for LLM output format
    class GradeDocuments(BaseModel):
        """Binary score for relevance check on retrieved documents."""
        binary_score: str = Field(
            description="Documents are relevant to the question, 'yes' or 'no'"
        )

    # LLM for grading
    structured_llm_grader = st.session_state.grader_llm.with_structured_output(GradeDocuments)

    # Prompt template for grading
    SYS_PROMPT = """You are an expert grader assessing relevance of a retrieved document to a user question.
Follow these instructions for grading:
- If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
- Your grade should be either 'Yes' or 'No' to indicate whether the document is relevant to the question or not."""

    grade_prompt = ChatPromptTemplate.from_messages([
        ("system", SYS_PROMPT),
        ("human", """Retrieved document:
{documents}
User question:
{question}
"""),
    ])
    # Build grader chain
    return grade_prompt | structured_llm_grader

def grade_documents(state):
    question = state["question"]
    documents = state.get("documents", [])
    filtered_docs = []

    if not documents:
        print("No documents retrieved for grading.")
        return {"documents": [], "question": question, "web_search_needed": "Yes"}

    print(f"Grading retrieved documents with {st.session_state.grader_llm.model_name}")
    for count, doc in enumerate(documents):
        try:
            # Evaluate document relevance
            score = st.session_state.doc_grader.invoke({"documents": [doc], "question": question})
            print(f"Chunk {count} relevance: {score}")
            if score.binary_score.lower() == "yes":  # accept 'Yes' or 'yes' from the grader
                filtered_docs.append(doc)
        except Exception as e:
            print(f"Error grading document chunk {count}: {e}")

    web_search_needed = "Yes" if not filtered_docs else "No"
    return {"documents": filtered_docs, "question": question, "web_search_needed": web_search_needed}

def route_after_grading(state):
    web_search_needed = state.get("web_search_needed", "No")
    print(f"Routing decision based on web_search_needed={web_search_needed}")
    if web_search_needed == "Yes":
        return "websearch"
    else:
        return "generate"

def retrieve(state):
    print("Retrieving documents")
    question = state["question"]
    documents = st.session_state.retriever.invoke(question)
    return {"documents": documents, "question": question}
Web and Hybrid Search
The websearch node is reached either from the route_after_grading node, when no relevant chunks are found in the retrieved information, or directly from the route_question function, when the internet_search_enabled state flag is "True" (selected by the radio button in the user interface) or the router decides to route the query to websearch to fetch recent and more relevant information.
The Tavily search engine's free API key can be obtained by creating an account on its website. The free plan offers 1,000 credit points per month. Tavily search results are appended to the "documents" state variable, which is then passed to the generate node along with the "question" state variable.
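As a minimal setup sketch: the client in the code below is created without arguments, so the key is presumably picked up from the environment; passing it explicitly works as well.
import os
from tavily import TavilyClient

# Assumes the Tavily API key is available as an environment variable (e.g., loaded from a .env file)
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])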
Hybrid search combines the results of both the retriever and Tavily search and populates the "documents" state variable, which is passed to the generate node along with the "question" state variable.
def web_search(state):
    if "tavily_client" not in st.session_state:
        st.session_state.tavily_client = TavilyClient()

    question = state["question"]
    # Strip any "Internet search" directive from the query and scope it to Finland
    question = re.sub(r'\bInternet search\b', '', question).strip()
    question = question + " in Finland"
    documents = state.get("documents", [])

    try:
        print("Invoking internet search...")
        search_result = st.session_state.tavily_client.get_search_context(
            query=question,
            search_depth="advanced",
            max_tokens=4000
        )
        # Handle different types of results
        if isinstance(search_result, str):
            web_results = search_result
        elif isinstance(search_result, dict) and "documents" in search_result:
            web_results = "\n".join([doc.get("content", "") for doc in search_result["documents"]])
        else:
            web_results = "No valid results returned by TavilyClient."
        web_results_doc = Document(page_content=web_results)
        documents.append(web_results_doc)
    except Exception as e:
        print(f"Error during web search: {e}")
        # Ensure workflow can continue gracefully
        documents.append(Document(page_content=f"Web search failed: {e}"))
    return {"documents": documents, "question": question}

def hybrid_search(state):
    question = state["question"]
    print("Invoking retriever...")
    vector_docs = st.session_state.retriever.invoke(question)
    web_docs = web_search({"question": question})["documents"]
    # Add headings to distinguish between vector and web search results
    vector_results = [Document(page_content="Smart guide results:\n\n" + doc.page_content) for doc in vector_docs]
    web_results = [Document(page_content="\n\nInternet search results:\n\n" + doc.page_content) for doc in web_docs]
    combined_docs = vector_results + web_results
    return {"documents": combined_docs, "question": question}
Invoking Tools
The tools used in this agentic workflow are scraping functions that fetch information from predefined trusted URLs. The difference between Tavily and these tools is that Tavily performs a broader internet search to bring results from diverse sources, whereas these tools use Python's Beautiful Soup web scraping library to extract information from trusted sources (predefined URLs). In this way, we make sure that the information for certain queries is extracted from known, trusted sources. In addition, this information retrieval is completely free.
Here is how the get_tax_info node works, together with some helper functions. The other tools (nodes) of this type work in the same way.
# Helper function to remove unwanted tags
def remove_tags(soup):
    for element in soup(["script", "style", "header", "footer", "nav", "aside", "noscript"]):
        element.decompose()
    # Extract text while preserving structure
    content = ""
    for element in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'li']):
        text = element.get_text(strip=True)
        if element.name.startswith('h'):
            level = int(element.name[1])
            content += '#' * level + ' ' + text + '\n\n'  # Markdown-style headings
        elif element.name == 'p':
            content += text + '\n\n'
        elif element.name == 'li':
            content += '- ' + text + '\n'
    return content

# Helper function to fetch and return information from predefined URLs.
def get_info(URLs):
    combined_info = ""
    for url in URLs:
        try:
            response = requests.get(url)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, "html.parser")
                combined_info += "URL: " + url + ": " + remove_tags(soup) + "\n\n"
            else:
                combined_info += f"Failed to retrieve information from {url}\n\n"
        except Exception as e:
            combined_info += f"Error fetching URL {url}: {e}\n\n"
    return combined_info

# Tool or node to return updated tax-related information from predefined URLs
def get_tax_info(state):
    """
    Execute the 'get_tax_info' tool to fetch tax-related information.
    """
    tax_rates_url = [
        'https://www.vero.fi/en/businesses-and-corporations/taxes-and-charges/vat/rates-of-vat/',
        'https://www.expat-finland.com/living_in_finland/tax.html?utm_source=chatgpt.com',
        'https://finlandexpat.com/tax-in-finland/?utm_source=chatgpt.com'
    ]
    question = state["question"]
    documents = state.get("documents", [])
    try:
        tax_info = get_info(tax_rates_url)
        web_results_doc = Document(page_content=tax_info)
        documents.append(web_results_doc)
        return {
            "generation": tax_info,
            "documents": documents,
            "question": question
        }
    except Exception as e:
        return {
            "generation": f"Error fetching tax information: {e}",
            "documents": [],
            "question": question
        }
Generating Response
The generate node creates the final response by invoking a chain with the predefined prompt (LangChain's PromptTemplate class) described below. The rag_prompt receives the state variables "question", "context", and "answer_style" and guides the entire behavior of response generation, including instructions about response style, conversational tone, formatting guidelines, citation rules, hybrid context handling, and context-only focus.
rag_prompt = PromptTemplate(
template = r"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a highly accurate and trustworthy assistant specialized in answering questions related to business and entrepreneurship in Finland.
Your responses must strictly adhere to the provided context and answer style, following these rules:
1. **Context-Only Answers with a given answer style**:
- Always base your answers on the provided context and answer style.
- If the context does not contain relevant information, respond with: 'No information found. Switch to internet search.'
- If the context contains some pieces of the required information, answer with that information and very briefly mention that the answer to other parts could not be found.
- If the context explicitly states 'I apologize, but I'm designed to answer questions specifically related to business and entrepreneurship in Finland,' output this context verbatim.
2. **Response style**:
- Address the query directly without unnecessary or speculative information.
- Do not draw from your knowledge base; strictly use the given context. However, take some liberty to provide more explanations and illustrations for better clarity and demonstration from your knowledge and experience only if answer style is "Moderate" or "Explanatory".
3. **Answer style**
- If answer style = "Concise", generate a concise answer.
- If answer style = "Moderate", use a moderate approach to generate answer where you can provide a little bit more explanation and elaborate the answer to improve clarity, integrating your own experience.
- If answer style = "Explanatory", elaborate the answer to provide more explanations with examples and illustrations to improve clarity in best possible way, integrating your own experience.
However, the explanations, examples and illustrations should be strictly based on the context.
4. **Conversational tone**:
- Maintain a conversational and helpful style that guides the user and offers hints, further help, and additional information.
- Use simple language. Explain difficult concepts or terms wherever needed. Present the information in the best readable form.
5. **Formatting Guidelines**:
- Use bullet points for lists.
- Include line breaks between sections for clarity.
- Highlight important numbers, dates, and terms using **bold** formatting.
- Create tables wherever appropriate to present data clearly.
- If there are discrepancies in the context, clearly explain them.
6. **Citation Rules**:
- Citation information may be present in the context in the form of [document name, page number] or URLs. It is very important to cite references if you find them in the context.
- For responses based on vectorstore retrieval, cite the document name and page number with each piece of information in the format: [document_name, page xx].
- For the answer compiled from the context from multiple documents, use the format: document_name 1 [page xx, yy, zz, ...], document_name 2 [page xx, yy, zz, ...].
- For responses derived from websearch results and containing cited URLs, include all the URLs in hyperlink form returned by the websearch, each on a new line.
- Do not invent any citation or URL. Only use the citation or URL in the context.
7. **Hybrid Context Handling**:
- If the context contains two different sections with the names 'Smart guide results:' and 'Internet search results:', structure your response in corresponding sections with the following headings:
- **Smart guide results**: Include data from vectorstore retrieval and its citations in the format: [document_name, page xx].
- **Internet search results**: Include data from websearch and its citations (URLs). This does not mean only internet URLs, but all the data in 'Internet search results:' along with URLs.
- Do not combine the data in the two sections. Create two separate sections.
8. **Integrity and Trustworthiness**:
- Ensure every part of your response complies with these rules.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Question: {question}
Context: {context}
Answer style: {answer_style}
Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
input_variables=["question", "context", "answer_style"]
)
The generate node first retrieves the state variables "question", "documents", and "answer_style" and formats the "documents" into a single string, which serves as the context. Subsequently, it invokes the generation chain with rag_prompt and a response-generation LLM to produce the final answer, which is populated in the "generation" state variable. This state variable is used by app.py to display the generated response in the Streamlit user interface.
With Groq's free API, there is a possibility of hitting a model's rate limit or context window limit. In that case, I extended the generate node to dynamically switch models in a circular fashion from the list of model names, and to revert to the originally selected model after generating the response.
# Helper function to format documents into a single string for context.
def format_documents(documents):
    return "\n\n".join(doc.page_content for doc in documents)

# Graph node to generate the final response
def generate(state):
    question = state["question"]
    documents = state.get("documents", [])
    answer_style = state.get("answer_style", "Concise")

    if "llm" not in st.session_state:
        st.session_state.llm = initialize_llm(st.session_state.selected_model, answer_style)
    rag_chain = rag_prompt | st.session_state.llm | StrOutputParser()

    if not documents:
        print("No documents available for generation.")
        return {"generation": "No relevant documents found.", "documents": documents, "question": question}

    tried_models = set()
    original_model = st.session_state.selected_model
    current_model = original_model

    while len(tried_models) < len(model_list):
        try:
            tried_models.add(current_model)
            st.session_state.llm = initialize_llm(current_model, answer_style)
            rag_chain = rag_prompt | st.session_state.llm | StrOutputParser()
            context = format_documents(documents)
            generation = rag_chain.invoke({"context": context, "question": question, "answer_style": answer_style})
            print(f"Generating a {answer_style} length response.")
            print(f"Response generated with {st.session_state.llm.model_name} model.")
            print("Done.")
            if current_model != original_model:
                print(f"Reverting to original model: {original_model}")
                st.session_state.llm = initialize_llm(original_model, answer_style)
            return {"documents": documents, "question": question, "generation": generation}
        except Exception as e:
            error_message = str(e)
            if "rate_limit_exceeded" in error_message or "Request too large" in error_message or "Please reduce the length of the messages or completion" in error_message:
                print("Model's rate limit exceeded or request too large.")
                current_model = model_list[(model_list.index(current_model) + 1) % len(model_list)]
                print(f"Switching to model: {current_model}")
            else:
                return {
                    "generation": f"Error during generation: {error_message}",
                    "documents": documents,
                    "question": question,
                }
    return {
        "generation": "Unable to process the request due to limitations across all models.",
        "documents": documents,
        "question": question,
    }
Helper Functions
There are other helper functions in agentic_rag.py for initializing the application, LLMs, embedding models, and session variables. The function initialize_app is called from app.py during app initialization and is triggered every time a model or state variable is changed via the Streamlit app. It reinitializes components and saves the updated states. This function also keeps track of various session variables and prevents redundant initialization.
def initialize_app(model_name, selected_embedding_model, selected_routing_model, selected_grading_model, hybrid_search, internet_search, answer_style):
    """
    Initialize embeddings, vectorstore, retriever, and LLM for the RAG workflow.
    Reinitialize components only if the selection has changed.
    """
    # Track current state to prevent redundant initialization
    if "current_model_state" not in st.session_state:
        st.session_state.current_model_state = {
            "answering_model": None,
            "embedding_model": None,
            "routing_model": None,
            "grading_model": None,
        }

    # Check if models or settings have changed
    state_changed = (
        st.session_state.current_model_state["answering_model"] != model_name or
        st.session_state.current_model_state["embedding_model"] != selected_embedding_model or
        st.session_state.current_model_state["routing_model"] != selected_routing_model or
        st.session_state.current_model_state["grading_model"] != selected_grading_model
    )

    # Reinitialize components only if settings have changed
    if state_changed:
        st.session_state.embed_model = initialize_embedding_model(selected_embedding_model)

        # Update vectorstore
        persist_directory = persist_directory_openai if "text-" in selected_embedding_model else persist_directory_huggingface
        st.session_state.vectorstore = load_or_create_vs(persist_directory)
        st.session_state.retriever = st.session_state.vectorstore.as_retriever(search_kwargs={"k": 5})

        st.session_state.llm = initialize_llm(model_name, answer_style)
        st.session_state.router_llm = initialize_router_llm(selected_routing_model)
        st.session_state.grader_llm = initialize_grading_llm(selected_grading_model)
        st.session_state.doc_grader = initialize_grader_chain()

        # Save updated state
        st.session_state.current_model_state.update({
            "answering_model": model_name,
            "embedding_model": selected_embedding_model,
            "routing_model": selected_routing_model,
            "grading_model": selected_grading_model,
        })

    print(f"Using LLM: {model_name}, Router LLM: {selected_routing_model}, Grader LLM: {selected_grading_model}, embedding model: {selected_embedding_model}")
    return workflow.compile()
The following helper functions initialize the answering LLM, embedding model, router LLM, and grading LLM. The list of model names, model_list, is used to keep track of models during the dynamic switching of models by the generate node.
model_list = [
    "llama-3.1-8b-instant",
    "llama-3.3-70b-versatile",
    "llama3-70b-8192",
    "llama3-8b-8192",
    "mixtral-8x7b-32768",
    "gemma2-9b-it",
    "gpt-4o-mini",
    "gpt-4o"
]

# Helper function to initialize the selected answering LLM
def initialize_llm(model_name, answer_style):
    if "llm" not in st.session_state or st.session_state.llm.model_name != model_name:
        if answer_style == "Concise":
            temperature = 0.0
        elif answer_style == "Moderate":
            temperature = 0.2
        elif answer_style == "Explanatory":
            temperature = 0.4

        if "gpt-" in model_name:
            st.session_state.llm = ChatOpenAI(model=model_name, temperature=temperature)
        else:
            st.session_state.llm = ChatGroq(model=model_name, temperature=temperature)
    return st.session_state.llm

# Helper function to initialize the selected embedding model
def initialize_embedding_model(selected_embedding_model):
    # Check if the embed_model exists in session_state
    if "embed_model" not in st.session_state:
        st.session_state.embed_model = None

    # Check if the current model matches the selected one
    current_model_name = None
    if st.session_state.embed_model:
        if hasattr(st.session_state.embed_model, "model"):
            current_model_name = st.session_state.embed_model.model
        elif hasattr(st.session_state.embed_model, "model_name"):
            current_model_name = st.session_state.embed_model.model_name

    # Initialize a new model if it doesn't match the selected one
    if current_model_name != selected_embedding_model:
        if "text-" in selected_embedding_model:
            st.session_state.embed_model = OpenAIEmbeddings(model=selected_embedding_model)
        else:
            st.session_state.embed_model = HuggingFaceEmbeddings(model_name=selected_embedding_model)
    return st.session_state.embed_model

# Helper function to initialize the selected router LLM
def initialize_router_llm(selected_routing_model):
    if "router_llm" not in st.session_state or st.session_state.router_llm.model_name != selected_routing_model:
        if "gpt-" in selected_routing_model:
            st.session_state.router_llm = ChatOpenAI(model=selected_routing_model, temperature=0.0)
        else:
            st.session_state.router_llm = ChatGroq(model=selected_routing_model, temperature=0.0)
    return st.session_state.router_llm

# Helper function to initialize the selected grading LLM
def initialize_grading_llm(selected_grading_model):
    if "grader_llm" not in st.session_state or st.session_state.grader_llm.model_name != selected_grading_model:
        if "gpt-" in selected_grading_model:
            st.session_state.grader_llm = ChatOpenAI(model=selected_grading_model, temperature=0.0)
        else:
            st.session_state.grader_llm = ChatGroq(model=selected_grading_model, temperature=0.0)
    return st.session_state.grader_llm
Establishing the Workflow
Now the graph state, nodes, conditional entry point (using route_question), and edges are defined to establish the flow between nodes. Finally, the workflow is compiled into an executable app for use within the Streamlit interface. The conditional entry point uses the route_question function to select the first node in the workflow based on the query. The conditional edge (workflow.add_conditional_edges) determines whether to transition to the websearch node or the generate node, based on the relevancy of the chunks determined by the grade_documents node.
workflow = StateGraph(GraphState)
# Add nodes
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("route_after_grading", route_after_grading)
workflow.add_node("websearch", web_search)
workflow.add_node("generate", generate)
workflow.add_node("get_contact_tool", get_contact_tool)
workflow.add_node("get_tax_info", get_tax_info)
workflow.add_node("get_registration_info", get_registration_info)
workflow.add_node("get_licensing_info", get_licensing_info)
workflow.add_node("hybrid_search", hybrid_search)
workflow.add_node("unrelated", handle_unrelated)
# Set conditional entry points
workflow.set_conditional_entry_point(
route_question,
{
"retrieve": "retrieve",
"websearch": "websearch",
"get_contact_tool": "get_contact_tool",
"get_tax_info": "get_tax_info",
"get_registration_info": "get_registration_info",
"get_licensing_info": "get_licensing_info",
"hybrid_search": "hybrid_search",
"unrelated": "unrelated"
},
)
# Add edges
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
"grade_documents",
route_after_grading,
{"websearch": "websearch", "generate": "generate"},
)
workflow.add_edge("websearch", "generate")
workflow.add_edge("get_contact_tool", "generate")
workflow.add_edge("get_tax_info", "generate")
workflow.add_edge("get_registration_info", "generate")
workflow.add_edge("get_licensing_info", "generate")
workflow.add_edge("hybrid_search", "generate")
workflow.add_edge("unrelated", "generate")
# Compile app
app = workflow.compile()
The Streamlit Interface
The Streamlit application in app.py provides an interactive interface for asking questions and displaying responses, with dynamic settings for model selection, answer styles, and query-specific tools. The initialize_app function, imported from agentic_rag.py, initializes all the session variables, including all LLMs, the embedding model, and the other options selected from the left sidebar.
The print statements in agentic_rag.py are captured by redirecting sys.stdout to an io.StringIO buffer. The content of this buffer is then displayed in the debug placeholder using the text_area component in Streamlit.
import streamlit as st
from agentic_rag import initialize_app
import sys
import io
import os
import time
# Configure the Streamlit page layout
st.set_page_config(
page_title="Smart Business Guide",
layout="wide",
initial_sidebar_state="expanded",
page_icon = "🧠"
)
# Initialize session state for messages
if "messages" not in st.session_state:
    st.session_state.messages = []
# Sidebar layout
with st.sidebar:
    try:
        st.image("LOGO_UPBEAT.jpg", width=150, use_container_width=True)
    except Exception as e:
        st.warning("Unable to load image. Continuing without it.")

    st.title("🗣️ Smart Guide 1.0")
    st.markdown("**▶️ Actions:**")

    # Initialize session state for the models if they don't exist
    if "selected_model" not in st.session_state:
        st.session_state.selected_model = "gpt-4o"
    if "selected_routing_model" not in st.session_state:
        st.session_state.selected_routing_model = "gpt-4o"
    if "selected_grading_model" not in st.session_state:
        st.session_state.selected_grading_model = "gpt-4o"
    if "selected_embedding_model" not in st.session_state:
        st.session_state.selected_embedding_model = "text-embedding-3-large"

    model_list = [
        "llama-3.1-8b-instant",
        "llama-3.3-70b-versatile",
        "llama3-70b-8192",
        "llama3-8b-8192",
        "mixtral-8x7b-32768",
        "gemma2-9b-it",
        "gpt-4o-mini",
        "gpt-4o"
    ]
    embed_list = [
        "text-embedding-3-large",
        "sentence-transformers/all-MiniLM-L6-v2"
    ]

    with st.expander("⚙️ Settings", expanded=False):
        st.session_state.selected_model = st.selectbox(
            "🤖 Select Answering LLM",
            model_list,
            key="model_selector",
            index=model_list.index(st.session_state.selected_model)
        )
        st.session_state.selected_routing_model = st.selectbox(
            "📡 Select Routing LLM",
            model_list,
            key="routing_model_selector",
            index=model_list.index(st.session_state.selected_routing_model)
        )
        st.session_state.selected_grading_model = st.selectbox(
            "🧮 Select Retrieval Grading LLM",
            model_list,
            key="grading_model_selector",
            index=model_list.index(st.session_state.selected_grading_model)
        )
        st.session_state.selected_embedding_model = st.selectbox(
            "🧠 Select Embedding Model",
            embed_list,
            key="embedding_model_selector",
            index=embed_list.index(st.session_state.selected_embedding_model)
        )

    # Add the slider for answer style
    answer_style = st.select_slider(
        "💬 Answer Style",
        options=["Concise", "Moderate", "Explanatory"],
        value="Concise",
        key="answer_style_slider",
        disabled=False
    )

    search_option = st.radio(
        "Search options",
        ["Smart guide + tools", "Internet search only", "Hybrid search (Guides + internet)"],
        index=0
    )
    # Set the corresponding boolean values based on the selected option
    hybrid_search = search_option == "Hybrid search (Guides + internet)"
    internet_search = search_option == "Internet search only"

    reset_button = st.button("🔄 Reset Conversation", key="reset_button")

# Initialize the app with the selected models and settings
app = initialize_app(st.session_state.selected_model, st.session_state.selected_embedding_model, st.session_state.selected_routing_model, st.session_state.selected_grading_model, hybrid_search, internet_search, answer_style)

if reset_button:
    st.session_state.messages = []
# Title
st.title("📘 Smart Guide for Entrepreneurship and Business Planning in Finland")
st.markdown(
"""
<div style="text-align: left; font-size: 18px; margin-top: 20px; line-height: 1.6;">
🤖 <b>Welcome to your Smart Business Guide!</b><br>
I am here to assist you with:<br>
<ul style="list-style-position: inside; text-align: left; display: inline-block;">
<li>AI agents based approach for finding answers from business and entrepreneurship guides in Finland</li>
<li>Providing up-to-date information through AI-based internet search</li>
<li>Automatically invoking AI-based internet search based on query understanding </li>
<li>Specialized tools for tax-related information, permits & licenses, business registration, residence permits, etc. :</li>
</ul>
<p style="margin-top: 10px;"><b>Start by typing your question in the chat below, and I'll provide tailored answers for your business needs!</b></p>
</div>
""",
unsafe_allow_html=True
)
# Display conversation history
for message in st.session_state.messages:
    if message["role"] == "user":
        with st.chat_message("user"):
            st.markdown(f"**You:** {message['content']}")
    elif message["role"] == "assistant":
        with st.chat_message("assistant"):
            st.markdown(f"**Assistant:** {message['content']}")

# Input box at the bottom for new messages
if user_input := st.chat_input("Type your question (Max. 150 char):"):
    if len(user_input) > 150:
        st.error("Your question exceeds 150 characters. Please shorten it and try again.")
    else:
        # Add user's message to session state and display it
        st.session_state.messages.append({"role": "user", "content": user_input})
        with st.chat_message("user"):
            st.markdown(f"**You:** {user_input}")

        # Capture print statements from agentic_rag.py
        output_buffer = io.StringIO()
        sys.stdout = output_buffer  # Redirect stdout to the buffer

        try:
            with st.chat_message("assistant"):
                response_placeholder = st.empty()
                debug_placeholder = st.empty()
                streamed_response = ""

                # Show spinner while streaming the response
                with st.spinner("Thinking..."):
                    inputs = {"question": user_input, "hybrid_search": hybrid_search, "internet_search": internet_search, "answer_style": answer_style}
                    for i, output in enumerate(app.stream(inputs)):
                        # Capture intermediate print messages
                        debug_logs = output_buffer.getvalue()
                        debug_placeholder.text_area(
                            "Debug Logs",
                            debug_logs,
                            height=100,
                            key=f"debug_logs_{i}"
                        )
                        if "generate" in output and "generation" in output["generate"]:
                            # Append new content to streamed response
                            streamed_response += output["generate"]["generation"]
                            # Update the placeholder with the streamed response so far
                            response_placeholder.markdown(f"**Assistant:** {streamed_response}")

                # Store the final response in session state
                st.session_state.messages.append({"role": "assistant", "content": streamed_response or "No response generated."})
        except Exception as e:
            # Handle errors and display them in the conversation history
            error_message = f"An error occurred: {e}"
            st.session_state.messages.append({"role": "assistant", "content": error_message})
            with st.chat_message("assistant"):
                st.error(error_message)
        finally:
            # Restore stdout to its original state
            sys.stdout = sys.__stdout__
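To run the interface locally, the required API keys must be available as environment variables before launching Streamlit with streamlit run app.py. The key names below follow the conventions of the respective client libraries (Groq, OpenAI, Tavily); this small pre-flight check is only illustrative.
# Illustrative pre-flight check before running `streamlit run app.py`
import os

required_keys = ["GROQ_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY"]
missing = [key for key in required_keys if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
print("All API keys found. Launch the app with: streamlit run app.py")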
Here is a snapshot of the Streamlit interface:

The following image shows the answer generated by llama-3.3-70b-versatile with the 'Concise' answer style selected. The query router (route_question) invokes the retriever (vector search), and the grader finds all the retrieved chunks relevant. Hence, the route_after_grading node decides to generate the answer through the generate node.

The following image shows the answer to the same question using the 'Explanatory' answer style. As instructed in rag_prompt, the LLM elaborates the answer with more explanations.

The following image shows the router triggering the get_licensing_info tool in response to the question.

The following image shows a web search invoked by the route_after_grading node when no relevant chunk is found in the vector search.

The following image shows the response generated with the hybrid search option selected in the Streamlit application. The route_question function finds the hybrid_search_enabled state flag 'True' and routes the question to the hybrid_search node.

Directions for Extension
This application can be enhanced in several directions, e.g.,
- Voice-enabled search and question-answer in multiple languages (e.g., Russian, Estonian, Arabic, etc.)
- Selecting different parts of a response and asking for more information or explanation.
- Adding memory of the last n messages (a minimal sketch follows this list).
- Including other modalities (such as images) in question answers.
- Adding more agents for brainstorming, writing, and idea generation.
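For instance, a simple version of the memory extension could prepend the last n exchanges from st.session_state.messages to the query before routing. The sketch below is only an illustration and is not part of the current code.
# Illustrative sketch of short-term memory (not in the current code)
def build_query_with_memory(messages, user_input, n=3):
    recent = messages[-2 * n:]  # roughly the last n user/assistant pairs
    history = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    return f"Conversation so far:\n{history}\n\nNew question: {user_input}"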
That’s all folks! If you liked the article, please clap the article (multiple times 👏 ), write a comment, and follow me on Medium and LinkedIn.