
Multi-Strategy Retrieval

BYOKG-RAG implements a multi-strategy retrieval approach that combines different methods for extracting relevant information from knowledge graphs. The system uses iterative processing with LLM-guided entity extraction, path discovery, and query generation to provide comprehensive question answering.

The multi-strategy retrieval system in BYOKG-RAG operates through the ByoKGQueryEngine using the KGLinker component. Unlike traditional single-pass retrieval systems, it employs an iterative approach that builds context progressively through multiple retrieval strategies:

  1. Entity Extraction and Linking - Identifies and links entities from natural language to graph nodes
  2. Agentic Triplet Retrieval - Uses LLM-guided exploration to find relevant triplets
  3. Path-based Retrieval - Follows metapaths between entities to discover relationships
  4. Query-based Retrieval - Executes structured graph queries (Cypher, SPARQL)

The system begins by extracting entities from the natural language question using the KGLinker:

# Extract entities using LLM
artifacts = kg_linker.parse_response(response)
if "entity-extraction" in artifacts:
    linked_entities = entity_linker.link(artifacts["entity-extraction"], return_dict=False)

Process:

  1. LLM extracts entities from the question
  2. EntityLinker matches extracted entities to graph nodes using fuzzy string matching
  3. Linked entities serve as starting points for graph traversal

Entity linking strategies:

  • Fuzzy string matching - Default approach using string similarity
  • Semantic similarity - Optional direct query linking using embeddings
  • Exact matching - Direct node ID or label matching
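
As a rough illustration of the default fuzzy strategy, the sketch below matches extracted mentions to node labels with Python's standard `difflib`. It is not the EntityLinker implementation, which may use a different similarity measure; the function name `link_fuzzy`, the `cutoff` value, and the sample labels are assumptions made for this example.

```python
from difflib import get_close_matches
from typing import Dict, List


def link_fuzzy(mentions: List[str], node_labels: List[str], cutoff: float = 0.6) -> Dict[str, str]:
    """Map each mention to its closest node label, skipping mentions with no match."""
    # Compare case-insensitively, but return the original label spelling.
    lowered = {label.lower(): label for label in node_labels}
    linked: Dict[str, str] = {}
    for mention in mentions:
        candidates = get_close_matches(mention.lower(), list(lowered), n=1, cutoff=cutoff)
        if candidates:
            linked[mention] = lowered[candidates[0]]
    return linked


# Hypothetical node labels and LLM-extracted mentions.
nodes = ["Amazon Neptune", "Amazon Bedrock", "Knowledge Graph"]
print(link_fuzzy(["amazon neptun", "knowledge graphs", "Q42"], nodes))
```

Raising `cutoff` trades recall for precision: fewer misspelled mentions are linked, but fewer wrong nodes become traversal starting points.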

Agentic triplet retrieval uses the AgenticRetriever to perform LLM-guided exploration of the knowledge graph:

if source_entities and triplet_retriever:
    triplet_context = triplet_retriever.retrieve(query, source_entities)
    self._add_to_context(retrieved_context, triplet_context)

Characteristics:

  • Iterative exploration - Makes decisions at each step based on current context
  • Relevance-guided - Uses LLM to select most relevant relations and entities
  • Context-aware - Builds upon previously retrieved information
  • Early termination - Stops when sufficient information is found
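
The characteristics above can be pictured with a small exploration loop over a toy graph. This is only a sketch under stated assumptions: the graph is a plain dictionary, `explore` and `keyword_selector` are hypothetical names, and the selector is a fixed rule standing in for the LLM call that the AgenticRetriever would make.

```python
from typing import Callable, Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]
Graph = Dict[str, List[Tuple[str, str]]]  # entity -> [(relation, target), ...]


def explore(
    graph: Graph,
    sources: List[str],
    choose_relations: Callable[[List[str], List[Triplet]], List[str]],
    max_steps: int = 3,
) -> List[Triplet]:
    """Expand from the source entities, letting `choose_relations` pick what to follow."""
    frontier: Set[str] = set(sources)
    visited: Set[str] = set(sources)
    context: List[Triplet] = []
    for _ in range(max_steps):
        # Relations available from the current frontier.
        candidates = sorted({rel for e in frontier for rel, _ in graph.get(e, [])})
        chosen = set(choose_relations(candidates, context))
        if not chosen:
            break  # Early termination: the selector found nothing worth following.
        next_frontier: Set[str] = set()
        for entity in sorted(frontier):
            for rel, target in graph.get(entity, []):
                if rel in chosen:
                    context.append((entity, rel, target))
                    if target not in visited:
                        visited.add(target)
                        next_frontier.add(target)
        if not next_frontier:
            break
        frontier = next_frontier
    return context


# Toy graph and a keyword-based selector standing in for the LLM.
toy_graph: Graph = {
    "Alice": [("works_at", "Acme"), ("likes", "Chess")],
    "Acme": [("located_in", "Berlin")],
}


def keyword_selector(candidates: List[str], context: List[Triplet]) -> List[str]:
    wanted = {"works_at", "located_in"}
    return [rel for rel in candidates if rel in wanted]


triplets = explore(toy_graph, ["Alice"], keyword_selector)
print(triplets)
```

The selector receives the context gathered so far, which is what would let an LLM-backed version make context-aware choices at each step.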

Path-based retrieval extracts and follows metapaths between entities to discover multi-hop relationships:

if "path-extraction" in artifacts and explored_entities and path_retriever:
    metapaths = [[component.strip() for component in path.split("->")]
                 for path in artifacts["path-extraction"]]
    path_context = path_retriever.retrieve(list(explored_entities), metapaths, linked_answers)

Features:

  • Metapath extraction - LLM identifies relevant path patterns
  • Structured traversal - Follows predefined relationship sequences
  • Multi-hop reasoning - Connects entities through intermediate nodes
  • Path verbalization - Converts graph paths to natural language
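
To make metapath traversal concrete, the following sketch parses a metapath string the same way as the snippet above and follows it through a toy graph. It assumes each metapath component is a relation name and that the graph is a simple adjacency dictionary; `follow_metapath` and `verbalize` are illustrative helpers, not part of the path retriever's API.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # entity -> [(relation, target), ...]


def parse_metapath(path: str) -> List[str]:
    """Split 'a -> b -> c' into ['a', 'b', 'c']."""
    return [component.strip() for component in path.split("->")]


def follow_metapath(graph: Graph, start: str, relations: List[str]) -> List[List[str]]:
    """Return every concrete path from `start` that follows `relations` in order."""
    paths: List[List[str]] = [[start]]
    for relation in relations:
        extended: List[List[str]] = []
        for path in paths:
            for rel, target in graph.get(path[-1], []):
                if rel == relation:
                    extended.append(path + [rel, target])
        paths = extended
    return paths


def verbalize(path: List[str]) -> str:
    """Render an alternating entity/relation path as a readable string."""
    return " ".join(part.replace("_", " ") for part in path)


toy_graph: Graph = {
    "Alice": [("works_at", "Acme"), ("likes", "Chess")],
    "Acme": [("located_in", "Berlin")],
}
metapath = parse_metapath("works_at -> located_in")
for found in follow_metapath(toy_graph, "Alice", metapath):
    print(verbalize(found))  # Alice works at Acme located in Berlin
```

A metapath that cannot be completed from the start entity yields no paths, so an unproductive pattern adds nothing to the context.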

Query-based retrieval executes structured graph queries generated by the LLM:

for query_type in ["opencypher", "opencypher-neptune-rdf", "opencypher-neptune"]:
    if query_type in artifacts and graph_query_executor:
        graph_query = " ".join(artifacts[query_type])
        context = graph_query_executor.retrieve(graph_query, return_answers=False)

Supported query types:

  • OpenCypher - Standard Cypher queries for property graphs
  • OpenCypher Neptune RDF - Neptune-specific RDF queries
  • OpenCypher Neptune - Neptune-optimized Cypher queries
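
The sketch below isolates the dispatch step: it scans parsed artifacts for the three query-type keys and joins multi-line LLM output into a single statement. The `collect_queries` helper and the sample artifacts are assumptions for illustration; execution against a graph store is deliberately left out.

```python
from typing import Dict, List, Tuple

QUERY_TYPES = ["opencypher", "opencypher-neptune-rdf", "opencypher-neptune"]


def collect_queries(artifacts: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (query_type, query_text) pairs for every supported type in the artifacts."""
    queries = []
    for query_type in QUERY_TYPES:
        if artifacts.get(query_type):
            # The LLM output may arrive as several lines; join them into one statement.
            queries.append((query_type, " ".join(artifacts[query_type])))
    return queries


# Hypothetical parsed LLM output.
artifacts = {
    "entity-extraction": ["Alice"],
    "opencypher": ["MATCH (p:Person {name: 'Alice'})-[:WORKS_AT]->(c)", "RETURN c.name"],
}
print(collect_queries(artifacts))
```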

The multi-strategy retrieval operates through iterative refinement:

for iteration in range(iterations):
    # Use different prompts for different iterations
    if iteration == 0:
        task_prompts = self.kg_linker_prompts
    else:
        task_prompts = self.kg_linker_prompts_iterative
    # Generate response with accumulated context
    response = self.kg_linker.generate_response(
        question=query,
        schema=self.schema,
        graph_context="\n".join(retrieved_context),
        task_prompts=task_prompts
    )

Each iteration builds upon the previous context:

  1. First iteration - Uses standard task prompts with no prior context
  2. Subsequent iterations - Uses iterative prompts with accumulated context
  3. Context accumulation - New information is added to existing context
  4. Deduplication - Prevents redundant information from being added

The system uses different prompt strategies:

  • Initial prompts (kg_linker_prompts) - Designed for fresh question analysis
  • Iterative prompts (kg_linker_prompts_iterative) - Optimized for context-aware refinement
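
A stripped-down version of this loop, with a stub in place of the LLM and retrievers, shows how the prompt set switches after the first iteration and how the accumulated context is fed back. The names `run_iterations` and `fake_generate` and the canned facts are assumptions for this sketch only.

```python
from typing import Callable, List

INITIAL_PROMPTS = "initial"      # stands in for kg_linker_prompts
ITERATIVE_PROMPTS = "iterative"  # stands in for kg_linker_prompts_iterative


def run_iterations(
    question: str,
    generate: Callable[[str, str, str], List[str]],
    iterations: int = 2,
) -> List[str]:
    """Call `generate` once per iteration, feeding back the accumulated context."""
    retrieved_context: List[str] = []
    for iteration in range(iterations):
        prompts = INITIAL_PROMPTS if iteration == 0 else ITERATIVE_PROMPTS
        new_items = generate(question, "\n".join(retrieved_context), prompts)
        for item in new_items:
            if item not in retrieved_context:  # keep the first occurrence only
                retrieved_context.append(item)
    return retrieved_context


calls = []


def fake_generate(question: str, graph_context: str, prompts: str) -> List[str]:
    """Stub for the LLM plus retrievers: records its inputs and returns canned facts."""
    calls.append((prompts, graph_context))
    if prompts == INITIAL_PROMPTS:
        return ["Alice works_at Acme"]
    return ["Alice works_at Acme", "Acme located_in Berlin"]


context = run_iterations("Where does Alice work?", fake_generate)
print(context)
```

The recorded `calls` show that the first call receives an empty graph context while the second receives the fact found in the first pass.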

The system maintains context through the _add_to_context method:

def _add_to_context(self, context_list: List[str], new_items: List[str]) -> None:
    """Add new items to context list while maintaining order and avoiding duplicates."""
    seen = set(context_list)
    for item in new_items:
        if item not in seen:
            context_list.append(item)
            seen.add(item)

Features:

  • Deduplication - Prevents redundant information
  • Order preservation - Maintains chronological order of discovery
  • Incremental building - Adds new information progressively
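
A quick way to see these properties is to run the same logic as a free function. The version below is a standalone copy of the method without `self`, with sample strings invented for the example.

```python
from typing import List


def add_to_context(context_list: List[str], new_items: List[str]) -> None:
    """Append unseen items in place, preserving first-seen order."""
    seen = set(context_list)
    for item in new_items:
        if item not in seen:
            context_list.append(item)
            seen.add(item)


context = ["Alice works_at Acme"]
add_to_context(context, ["Acme located_in Berlin", "Alice works_at Acme", "Acme located_in Berlin"])
print(context)  # ['Alice works_at Acme', 'Acme located_in Berlin']
```

Because duplicates are detected by exact string equality, two verbalizations of the same fact that differ in wording are both kept.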

The system tracks explored entities across iterations:

explored_entities: Set[str] = set()
# Update with newly linked entities
explored_entities.update(linked_entities)

This prevents redundant exploration and enables progressive discovery.
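
One way such a set could be used to skip entities that were already explored is sketched below. The source only shows the `update` call, so the filtering helper `select_unexplored` is an assumption about how the set might be applied, not the engine's actual logic.

```python
from typing import List, Set

explored_entities: Set[str] = set()


def select_unexplored(linked_entities: List[str], explored: Set[str]) -> List[str]:
    """Return entities not seen before (in input order) and mark them as explored."""
    fresh = []
    for entity in linked_entities:
        if entity not in explored:
            explored.add(entity)
            fresh.append(entity)
    return fresh


print(select_unexplored(["Alice", "Acme"], explored_entities))   # ['Alice', 'Acme']
print(select_unexplored(["Acme", "Berlin"], explored_entities))  # ['Berlin']
```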

The system monitors for completion signals in LLM responses:

task_completion = parse_response(response, r"<task-completion>(.*?)</task-completion>")
if "FINISH" in " ".join(task_completion):
    break
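
The implementation of `parse_response` is not shown here. Assuming it behaves like a regular-expression search that returns every captured group, the completion check can be reproduced as follows; `parse_tag` and `is_finished` are hypothetical names used only in this sketch.

```python
import re
from typing import List


def parse_tag(response: str, pattern: str) -> List[str]:
    """Return all captures of `pattern`, letting '.' span newlines."""
    return [match.strip() for match in re.findall(pattern, response, flags=re.DOTALL)]


def is_finished(response: str) -> bool:
    """True when a <task-completion> block contains the FINISH marker."""
    task_completion = parse_tag(response, r"<task-completion>(.*?)</task-completion>")
    return "FINISH" in " ".join(task_completion)


print(is_finished("<task-completion>\nFINISH\n</task-completion>"))  # True
print(is_finished("<task-completion>CONTINUE</task-completion>"))    # False
print(is_finished("FINISH mentioned outside the tag"))               # False
```

Restricting the check to the tagged block avoids stopping early when the word appears elsewhere in the response.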

The retrieval loop ends when any of the following holds:

  • Explicit completion - The LLM signals that enough information has been gathered by emitting “FINISH”
  • Implicit completion - No new entities or information are found
  • Iteration limit - The maximum iteration count is reached

The number of iterations is set per query:

# Configure iteration counts
context = query_engine.query(
    question="Your question here",
    iterations=3  # Number of multi-strategy iterations
)

Enable semantic similarity-based entity linking:

query_engine = ByoKGQueryEngine(
    graph_store=graph_store,
    llm_generator=llm_generator,
    direct_query_linking=True  # Enable semantic entity linking
)

Supply a custom triplet retriever to control the limits of the agentic exploration:

# Custom triplet retriever configuration
from graph_retrievers import AgenticRetriever

custom_triplet_retriever = AgenticRetriever(
    llm_generator=llm,
    graph_traversal=traversal,
    graph_verbalizer=verbalizer,
    max_num_iterations=4,
    max_num_relations=10
)
query_engine = ByoKGQueryEngine(
    graph_store=graph_store,
    llm_generator=llm_generator,
    triplet_retriever=custom_triplet_retriever
)