Querying
The primary unit of context presented to the LLM is the statement — a standalone assertion or proposition extracted from a source chunk. Statements are grouped by topic and source, and that grouping is what the query engine presents to the LLM.
The lexical-graph uses a traversal-based search strategy that combines similarity search with graph traversal. A semantic-guided search approach also exists but is likely to be retired in a future release.
Querying supports metadata filtering and multi-tenancy.
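The grouping of statements by source and topic described above can be pictured with a small, self-contained sketch. The `Statement` record, its field names, and the sample values are hypothetical and chosen only for illustration; they are not the toolkit's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical record: one standalone assertion extracted from a chunk."""
    text: str
    topic: str
    source: str


def group_statements(statements):
    """Group statement texts by source, then by topic, preserving input order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[s.source][s.topic].append(s.text)
    # Convert the nested defaultdicts to plain dicts for predictable output.
    return {source: dict(topics) for source, topics in grouped.items()}


# Placeholder statements, not real retrieval results.
statements = [
    Statement("Statement one.", "Topic A", "doc-1"),
    Statement("Statement two.", "Topic A", "doc-1"),
    Statement("Statement three.", "Topic B", "doc-1"),
    Statement("Statement four.", "Topic A", "doc-2"),
]

grouped = group_statements(statements)
print(grouped)
```

Each source maps to its topics, and each topic maps to the statements that support it, which is the shape of context the paragraph above describes.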
Factory methods
```python
from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine

# Traversal-based search (recommended for most workloads)
query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store,
    vector_store,
    versioning=False,
)

response = query_engine.query("How does Neptune Analytics differ from Neptune DB?")
print(response)
```

```python
from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine

# Semantic-guided search (likely to be retired in a future release)
query_engine = LexicalGraphQueryEngine.for_semantic_guided_search(
    graph_store,
    vector_store,
    enable_versioning=False,
)

response = query_engine.query("How does Neptune Analytics differ from Neptune DB?")
print(response)
```

Use for_traversal_based_search() for most workloads. Use for_semantic_guided_search() only if you specifically need the semantic-guided strategy.
Both factory methods accept graph_store, vector_store, tenant_id, post_processors, filter_config, and **kwargs. The versioning parameter name differs between the two (lexical_graph_query_engine.py:67):
| Factory method | versioning parameter |
|---|---|
| for_traversal_based_search | versioning |
| for_semantic_guided_search | enable_versioning |
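Because the keyword name differs between the two factory methods, code that switches strategies at runtime may want to pick the name in one place. The helper below is a minimal sketch: `versioning_kwargs` and the strategy labels `"traversal"` and `"semantic_guided"` are hypothetical names introduced here, not part of the toolkit.

```python
def versioning_kwargs(strategy: str, enabled: bool) -> dict:
    """Return the versioning flag under the keyword name each factory expects."""
    # Mapping taken from the table above; the strategy labels are illustrative.
    names = {
        "traversal": "versioning",
        "semantic_guided": "enable_versioning",
    }
    if strategy not in names:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {names[strategy]: enabled}


print(versioning_kwargs("traversal", False))
print(versioning_kwargs("semantic_guided", True))
```

The result can then be unpacked into the matching factory call, for example `LexicalGraphQueryEngine.for_traversal_based_search(graph_store, vector_store, **versioning_kwargs("traversal", False))`.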
You can also construct LexicalGraphQueryEngine directly, passing system_prompt, user_prompt, or a prompt_provider kwarg. See Using Custom Prompt Providers.
Context format
The context_format kwarg controls how retrieved statements are serialised before being injected into the LLM prompt. Supported values (lexical_graph_query_engine.py:408):
| Value | Description | Default for |
|---|---|---|
| 'json' | JSON array of topic/statement objects | __init__ direct construction |
| 'yaml' | YAML representation of the same structure | — |
| 'xml' | XML representation of the same structure | — |
| 'text' | Plain text, one topic heading per group | for_traversal_based_search |
| 'bedrock_xml' | Pre-formatted XML produced by a BedrockContextFormat post-processor | for_semantic_guided_search (hardcoded) |
for_semantic_guided_search always uses 'bedrock_xml' and ignores any context_format kwarg you pass. for_traversal_based_search defaults to 'text' but accepts any of the values above.
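To make the difference between formats concrete, the sketch below serialises the same topic/statement groups as JSON and as plain text. It only illustrates the idea: the key names (`topic`, `statements`), the exact text layout, and the `format_context` helper are assumptions, and the toolkit's real output may differ.

```python
import json

# Hypothetical retrieved context: topics with their supporting statements.
context = [
    {"topic": "Topic A", "statements": ["First statement.", "Second statement."]},
    {"topic": "Topic B", "statements": ["Third statement."]},
]


def format_context(groups, context_format="text"):
    """Serialise topic/statement groups in one of two illustrative formats."""
    if context_format == "json":
        return json.dumps(groups, indent=2)
    if context_format == "text":
        # One topic heading per group, followed by its statements.
        blocks = ["\n".join([g["topic"], *g["statements"]]) for g in groups]
        return "\n\n".join(blocks)
    raise ValueError(f"format not covered by this sketch: {context_format!r}")


print(format_context(context, "text"))
print(format_context(context, "json"))
```

Plain text spends fewer tokens on structural markup, while JSON keeps the grouping machine-readable; which trade-off suits a given model is worth testing rather than assuming.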
Verbose mode
The verbose kwarg (default True) controls answer length. When True, the LLM is instructed to answer fully; when False, concisely. This only affects the non-streaming code path (lexical_graph_query_engine.py:356).
```python
query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store,
    vector_store,
    verbose=False,
)
```

Async querying
LexicalGraphQueryEngine does not implement async querying: calling await query_engine.aquery(...) raises a NotImplementedError. Use query_engine.query(...) instead (lexical_graph_query_engine.py:563).
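If the surrounding application is asynchronous, one common workaround is to run the blocking query(...) call in a worker thread. The sketch below uses a stand-in engine so that it runs without the toolkit installed; `StubQueryEngine` and `query_in_thread` are hypothetical names, and whether a real engine instance is safe to call from multiple threads is an assumption that should be checked before relying on this pattern.

```python
import asyncio


class StubQueryEngine:
    """Stand-in for an engine whose only working entry point is synchronous."""

    def query(self, question: str) -> str:
        return f"answer to: {question}"

    async def aquery(self, question: str) -> str:
        # Mirrors the behaviour described above.
        raise NotImplementedError


async def query_in_thread(engine, question: str) -> str:
    """Run the blocking query() call in a worker thread, keeping the event loop free."""
    return await asyncio.to_thread(engine.query, question)


result = asyncio.run(query_in_thread(StubQueryEngine(), "example question"))
print(result)
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions, `loop.run_in_executor` serves the same purpose.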
Managing indexed sources
LexicalGraphIndex exposes three methods for inspecting and managing what has been indexed (lexical_graph_index.py:596):
get_stats()
Returns a dict with node counts and two graph connectivity metrics:
```python
stats = graph_index.get_stats()
# {
#     'source': 12, 'chunk': 180, 'topic': 950,
#     'statement': 4200, 'fact': 3100, 'entity': 820,
#     'localConnectivity': 1.23456,
#     'globalConnectivity': 0.98765,
#     ...
# }
```

get_sources(...)
Queries the graph for source document metadata. Accepts a source_id (str), source_ids (list), filter (FilterConfig, dict, or list of dicts), an optional versioning_config, and an optional order_by field name or list.
```python
sources = graph_index.get_sources(filter={'url': 'https://example.com/page'})
```

delete_sources(...)
Accepts the same filter arguments as get_sources. Removes matching sources from both the graph store and the vector store, and returns the list of deleted source IDs.
```python
deleted = graph_index.delete_sources(source_id='chunk::abc123')
```

See also: