Storage Model

Topics

Overview
Graph store
- Logging graph queries
Vector store

Overview

The lexical-graph uses two separate stores: a GraphStore and a VectorStore. A VectorStore acts as a container for a collection of VectorIndex instances. When constructing or querying a graph, you must provide instances of both a graph store and a vector store.

The toolkit provides graph store implementations for Amazon Neptune Analytics, Amazon Neptune Database (engine version 1.4.1.0 or later) and FalkorDB, along with vector store implementations for Neptune Analytics, Amazon OpenSearch Serverless and Postgres with the pgvector extension. The lexical-graph provides several factory methods for creating instances of these stores.

Beyond the stores listed above, we welcome alternative store implementations. The store APIs, and the ways in which the stores are used, have been designed to anticipate such implementations. However, the proof is in the development: if you experience issues developing an alternative store, let us know.

Graph stores and vector stores provide connectivity to an existing storage instance, which you will need to have provisioned beforehand.

Graph store

Graph stores must support the openCypher property graph query language. Graph construction queries typically use an UNWIND ... MERGE idiom to create or update the graph for a batch of inputs. The Neptune graph store implementations override the GraphStore.node_id() method to ensure that node ids in the code (e.g. chunkId) are mapped to Neptune’s ~id reserved property. Alternative graph store implementations can leave the base implementation of node_id() as-is. This will result in node ids being mapped to a property of the same name (i.e. a reference to chunkId in the code will be mapped to a chunkId property of a node).
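
The difference between the two id mappings can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: the functions property_node_id, reserved_node_id and build_merge_query, the Chunk label and the $params parameter name are all hypothetical, and the real node_id() method may return a different type.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the graphrag-toolkit API.

def property_node_id(id_name: str) -> str:
    """Base-style mapping: the id is stored in a property of the same name."""
    return id_name


def reserved_node_id(id_name: str) -> str:
    """Neptune-style mapping: every id name maps to the reserved ~id property."""
    return '`~id`'


def build_merge_query(label: str, id_name: str, node_id) -> str:
    """Build an UNWIND ... MERGE query that upserts one node per input row."""
    return (
        'UNWIND $params AS params '
        f'MERGE (n:`{label}` {{{node_id(id_name)}: params.{id_name}}})'
    )


# The same construction code yields a different id property per store type.
print(build_merge_query('Chunk', 'chunkId', property_node_id))
print(build_merge_query('Chunk', 'chunkId', reserved_node_id))
```

With the first mapping the MERGE clause matches on a chunkId property; with the second it matches on ~id, while the query parameter name stays chunkId in both cases.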

You use the GraphStoreFactory.for_graph_store() static factory method to create a graph store.

The lexical-graph supports the following graph databases:

- Amazon Neptune Database (engine version 1.4.1.0 or later)
- Amazon Neptune Analytics
- FalkorDB

Logging graph queries

By default, all graph queries in logs are redacted. To configure the toolkit to log queries and their results, use NonRedactedGraphQueryLogFormatting when creating a graph store:

import os
from graphrag_toolkit.lexical_graph import set_logging_config
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage.graph import NonRedactedGraphQueryLogFormatting

set_logging_config('DEBUG', ['graphrag_toolkit.lexical_graph.storage.graph'])

graph_store = GraphStoreFactory.for_graph_store(
  os.environ['GRAPH_STORE'],
  log_formatting=NonRedactedGraphQueryLogFormatting()
)

Vector store

A vector store is a collection of vector indexes. The lexical-graph uses up to two vector indexes: a chunk index and a statement index. The chunk index is typically much smaller than the statement index. Traversal-based search requires the chunk index; the deprecated semantic-guided search requires the statement index. The VectorStoreFactory described below enables both indexes by default, but the statement index is associated with the deprecated semantic-guided retriever and may be removed from the default in a future release.

You use the VectorStoreFactory.for_vector_store() static factory method to create a vector store.

The lexical-graph supports the following vector stores:

- Amazon Neptune Analytics
- Amazon OpenSearch Serverless
- Postgres with the pgvector extension

By default, the VectorStoreFactory enables both the chunk and statement indexes:

from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

# Default: creates both chunk and statement indexes
vector_store = VectorStoreFactory.for_vector_store(opensearch_connection_info)

If you only need the recommended traversal-based search, you can opt out of the statement index to reduce storage costs:

# Opt out of the statement index (sufficient for traversal-based search)
vector_store = VectorStoreFactory.for_vector_store(
    opensearch_connection_info,
    index_names=['chunk']
)
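
If an application chooses its search strategy at runtime, the index_names list can be derived from the strategies in use. The following is a minimal sketch under stated assumptions: the REQUIRED_INDEX table and the index_names_for helper are hypothetical and not part of the toolkit, the strategy labels are informal names for the two search styles described above, and the 'statement' index name is assumed by analogy with 'chunk'.

```python
# Hypothetical helper, not part of the toolkit: maps each search strategy
# to the vector index it needs, as described in the text above.
REQUIRED_INDEX = {
    'traversal-based': 'chunk',
    'semantic-guided': 'statement',  # deprecated strategy; index name assumed
}


def index_names_for(strategies):
    """Return the sorted, de-duplicated index names the strategies need."""
    unknown = [s for s in strategies if s not in REQUIRED_INDEX]
    if unknown:
        raise ValueError(f'Unknown search strategies: {unknown}')
    return sorted({REQUIRED_INDEX[s] for s in strategies})


print(index_names_for(['traversal-based']))
print(index_names_for(['traversal-based', 'semantic-guided']))
```

The returned list could then be passed as the index_names argument, so that the statement index is created only when semantic-guided search is actually in use.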