Querying

The primary unit of context presented to the LLM is the statement — a standalone assertion or proposition extracted from a source chunk. Statements are grouped by topic and source, and that grouping is what the query engine presents to the LLM.

The lexical-graph uses a traversal-based search strategy that combines similarity search with graph traversal. A semantic-guided search approach also exists but is likely to be retired in a future release.

Querying supports metadata filtering and multi-tenancy.
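Filters can be supplied as FilterConfig objects or, for the index-level methods described later, as plain dicts mapping source metadata keys to values. A minimal sketch of the dict form (the `url`, `author`, and `year` keys are examples of your own ingestion metadata, not fields the library defines):

```python
# A single dict matches one metadata key/value pair on the source;
# a list of dicts combines several conditions.
# These key names are illustrative; use keys from your own source metadata.
single_filter = {'url': 'https://example.com/page'}
combined_filter = [
    {'author': 'J. Doe'},
    {'year': 2024},
]
print(single_filter, combined_filter)
```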

query_traversal.py

```python
from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine

query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store,
    vector_store,
    versioning=False,
)

response = query_engine.query("How does Neptune Analytics differ from Neptune DB?")
print(response)
```

Use for_traversal_based_search() for most workloads. Use for_semantic_guided_search() only if you specifically need the semantic-guided strategy.

Both factory methods accept graph_store, vector_store, tenant_id, post_processors, filter_config, and **kwargs. The versioning parameter name differs between the two (lexical_graph_query_engine.py:67):

| Factory method | versioning parameter |
| --- | --- |
| `for_traversal_based_search` | `versioning` |
| `for_semantic_guided_search` | `enable_versioning` |

You can also construct LexicalGraphQueryEngine directly, passing system_prompt, user_prompt, or a prompt_provider kwarg. See Using Custom Prompt Providers.
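As a sketch, custom prompts are plain template strings. The placeholder names below are assumptions, not confirmed template variables; check the default prompts shipped with your version for the exact names.

```python
# Hypothetical prompt templates for direct construction of the engine.
# The {search_results} and {question} placeholders are assumptions about
# the template variables, not confirmed names.
system_prompt = (
    "You are a question-answering assistant. Base your answer only on "
    "the search results provided."
)
user_prompt = "Search results:\n{search_results}\n\nQuestion: {question}"

# These would be passed to the constructor, e.g.:
# query_engine = LexicalGraphQueryEngine(
#     graph_store, vector_store,
#     system_prompt=system_prompt,
#     user_prompt=user_prompt,
# )
print(user_prompt.format(search_results="...", question="..."))
```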

The context_format kwarg controls how retrieved statements are serialised before being injected into the LLM prompt. Supported values (lexical_graph_query_engine.py:408):

| Value | Description | Default for |
| --- | --- | --- |
| `'json'` | JSON array of topic/statement objects | `__init__` (direct construction) |
| `'yaml'` | YAML representation of the same structure | |
| `'xml'` | XML representation of the same structure | |
| `'text'` | Plain text, one topic heading per group | `for_traversal_based_search` |
| `'bedrock_xml'` | Pre-formatted XML produced by a `BedrockContextFormat` post-processor | `for_semantic_guided_search` (hardcoded) |

for_semantic_guided_search always uses 'bedrock_xml' and ignores any context_format kwarg you pass. for_traversal_based_search defaults to 'text' but accepts any of the values above.
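To make the difference concrete, here is a rough, hypothetical sketch of how one topic group might be serialised under `'json'` versus `'text'`. The field names are assumptions for illustration, not the library's exact schema.

```python
# Illustrative only: the same topic/statement group rendered two ways.
# 'topic' and 'statements' are assumed field names, not the real schema.
import json

groups = [
    {"topic": "Neptune Analytics",
     "statements": ["Neptune Analytics is an in-memory graph engine."]},
]

json_context = json.dumps(groups, indent=2)          # roughly context_format='json'

text_context = "\n".join(                            # roughly context_format='text'
    f"{g['topic']}\n" + "\n".join(f"- {s}" for s in g["statements"])
    for g in groups
)
print(text_context)
```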

The verbose kwarg (default True) controls answer length. When True, the LLM is instructed to answer fully; when False, concisely. This only affects the non-streaming code path (lexical_graph_query_engine.py:356).

```python
query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store,
    vector_store,
    verbose=False,
)
```

LexicalGraphQueryEngine does not implement async querying: calling await query_engine.aquery(...) raises a NotImplementedError. Use the synchronous query_engine.query(...) instead (lexical_graph_query_engine.py:563).
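If you are inside async code, one option is to run the synchronous query() on a worker thread with asyncio.to_thread. A minimal sketch, using a stand-in engine so it is self-contained; substitute your real query_engine:

```python
import asyncio

class _StubEngine:
    """Stand-in for LexicalGraphQueryEngine; only query() is sketched."""
    def query(self, text):
        return f"answer to: {text}"

query_engine = _StubEngine()

async def aquery(text):
    # Offload the blocking query() call so the event loop stays responsive.
    return await asyncio.to_thread(query_engine.query, text)

print(asyncio.run(aquery("How does Neptune Analytics differ from Neptune DB?")))
```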

LexicalGraphIndex exposes three methods for inspecting and managing what has been indexed (lexical_graph_index.py:596):

get_stats() returns a dict with node counts and two graph connectivity metrics:

```python
stats = graph_index.get_stats()
# {
#     'source': 12, 'chunk': 180, 'topic': 950,
#     'statement': 4200, 'fact': 3100, 'entity': 820,
#     'localConnectivity': 1.23456,
#     'globalConnectivity': 0.98765,
#     ...
# }
```
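The counts lend themselves to quick sanity checks. A real run would use the dict returned by get_stats(); here the sample values above are reused:

```python
# Sample values copied from the stats dict shown above.
stats = {
    'source': 12, 'chunk': 180, 'topic': 950,
    'statement': 4200, 'fact': 3100, 'entity': 820,
}

statements_per_topic = stats['statement'] / stats['topic']
chunks_per_source = stats['chunk'] / stats['source']
print(f"{statements_per_topic:.1f} statements per topic")   # 4.4 statements per topic
print(f"{chunks_per_source:.0f} chunks per source")         # 15 chunks per source
```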

get_sources() queries the graph for source document metadata. It accepts a source_id (str), source_ids (list), a filter (FilterConfig, dict, or list of dicts), an optional versioning_config, and an optional order_by field name or list of names.

```python
sources = graph_index.get_sources(filter={'url': 'https://example.com/page'})
```

delete_sources() uses the same filter API as get_sources(). It removes matching sources from both the graph store and the vector store and returns the list of deleted source IDs.

```python
deleted = graph_index.delete_sources(source_id='chunk::abc123')
```
