Storage Model
Overview
The lexical-graph uses two separate stores: a GraphStore and a VectorStore. A VectorStore acts as a container for a collection of VectorIndex instances. When constructing or querying a graph, you must provide instances of both a graph store and a vector store.
The toolkit provides graph store implementations for Amazon Neptune Analytics, Amazon Neptune Database (engine version 1.4.1.0 or later), and FalkorDB, along with vector store implementations for Neptune Analytics, Amazon OpenSearch Serverless, Postgres with the pgvector extension, and Amazon S3 Vectors. The lexical-graph provides several convenient factory methods for creating instances of these stores.
Beyond these built-in stores, we welcome alternative store implementations. The store APIs, and the ways in which the stores are used, have been designed to anticipate alternative implementations. However, the proof is in the development: if you experience issues developing an alternative store, let us know.
Graph stores and vector stores provide connectivity to an existing storage instance, which you will need to have provisioned beforehand.
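As a rough illustration of how the two kinds of store relate, the sketch below models a vector store as a container of named vector indexes. The classes `SketchVectorIndex` and `SketchVectorStore` and the `get_index()` helper are hypothetical stand-ins, not the toolkit's API; the sketch assumes only that a vector store holds its indexes by name (for example `chunk` and `statement`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchVectorIndex:
    """Hypothetical stand-in for a single vector index (e.g. 'chunk')."""
    index_name: str
    # Each entry pairs a record id with its embedding
    entries: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class SketchVectorStore:
    """Hypothetical container holding a collection of vector indexes by name."""
    indexes: Dict[str, SketchVectorIndex] = field(default_factory=dict)

    def get_index(self, index_name: str) -> SketchVectorIndex:
        # Fail loudly if the caller asks for an index that was not enabled
        if index_name not in self.indexes:
            raise KeyError(f'Index {index_name!r} is not enabled in this vector store')
        return self.indexes[index_name]


# A store with both indexes enabled
store = SketchVectorStore({name: SketchVectorIndex(name) for name in ['chunk', 'statement']})
```

Modelling the indexes as a name-keyed collection makes it explicit that a store can be created with only some of its indexes enabled.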
Graph store
Graph stores must support the openCypher property graph query language. Graph construction queries typically use an UNWIND ... MERGE idiom to create or update the graph for a batch of inputs. The Neptune graph store implementations override the GraphStore.node_id() method so that node ids in the code (for example, chunkId) are mapped to Neptune’s reserved ~id property. Alternative graph store implementations can leave the base implementation of node_id() as-is, in which case node ids are mapped to a property of the same name (a reference to chunkId in the code maps to a chunkId property on the node).
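The following sketch illustrates the two ideas in the paragraph above: a batched UNWIND ... MERGE query, and a node_id() hook that decides which property holds a node's id. The class names, the `Chunk` label, the `params` parameter and the `build_chunk_merge_query()` helper are hypothetical; only the idea of a Neptune-style override that maps ids to `~id` comes from the description above.

```python
class SketchGraphStore:
    """Hypothetical base store: node ids map to a property of the same name."""

    def node_id(self, id_name: str) -> str:
        # Default behaviour: a reference to chunkId maps to a chunkId property
        return id_name


class SketchNeptuneGraphStore(SketchGraphStore):
    """Hypothetical Neptune-style store: node ids map to the reserved ~id property."""

    def node_id(self, id_name: str) -> str:
        # Backticks are needed because ~id is not a plain identifier in openCypher
        return '`~id`'


def build_chunk_merge_query(store: SketchGraphStore) -> str:
    """Build one UNWIND ... MERGE query that upserts a whole batch of chunks."""
    id_prop = store.node_id('chunkId')
    return (
        'UNWIND $params AS params\n'
        f'MERGE (c:Chunk {{{id_prop}: params.chunk_id}})\n'
        'SET c.value = params.text'
    )


print(build_chunk_merge_query(SketchNeptuneGraphStore()))
```

Passing the whole batch as a single list parameter lets the database create or update every chunk in one query rather than issuing one query per chunk.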
You use the GraphStoreFactory.for_graph_store() static factory method to create a graph store.
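A static factory lets callers pass a single piece of connection information and get back the matching store implementation. The following is a minimal sketch of that dispatch pattern, not the toolkit's implementation: `sketch_for_graph_store()` and `SketchGraphStoreConfig` are hypothetical, and the `neptune-db://`, `neptune-graph://` and `falkordb://` prefixes are assumptions to check against the store-specific documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchGraphStoreConfig:
    """Hypothetical result: which backend to use and the target it points at."""
    backend: str
    target: str


# Assumed connection-string prefixes (illustrative only)
_PREFIXES = {
    'neptune-db://': 'Amazon Neptune Database',
    'neptune-graph://': 'Amazon Neptune Analytics',
    'falkordb://': 'FalkorDB',
}


def sketch_for_graph_store(graph_info: str) -> SketchGraphStoreConfig:
    """Pick a backend from the connection string's prefix (hypothetical)."""
    for prefix, backend in _PREFIXES.items():
        if graph_info.startswith(prefix):
            # Strip the prefix to leave the endpoint or graph identifier
            return SketchGraphStoreConfig(backend, graph_info[len(prefix):])
    raise ValueError(f'Unrecognized graph store connection info: {graph_info!r}')
```

Keeping the backend choice in configuration, as the logging example in this section does with the `GRAPH_STORE` environment variable, means application code need not change when you switch stores.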
The lexical-graph supports the following graph databases:
- Amazon Neptune Analytics
- Amazon Neptune Database
- FalkorDB
Logging graph queries
By default, graph queries are redacted in logs. To configure the toolkit to log queries and their results, use NonRedactedGraphQueryLogFormatting when creating a graph store:
import os

from graphrag_toolkit.lexical_graph import set_logging_config
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage.graph import NonRedactedGraphQueryLogFormatting

# Log at DEBUG level for the graph storage module so that queries are emitted
set_logging_config('DEBUG', ['graphrag_toolkit.lexical_graph.storage.graph'])

# Create the graph store with non-redacted query log formatting
graph_store = GraphStoreFactory.for_graph_store(
    os.environ['GRAPH_STORE'],
    log_formatting=NonRedactedGraphQueryLogFormatting()
)

Vector store
A vector store is a collection of vector indexes. The lexical-graph uses up to two vector indexes: a chunk index and a statement index. The chunk index is typically much smaller than the statement index. If you want to use semantic-guided search, you will need to enable the statement index. If you want to use traversal-based search, you will need to enable the chunk index. The VectorStoreFactory described below enables both indexes by default.
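The index requirements in the paragraph above can be summarized as a small lookup. This is a hypothetical helper, not part of the toolkit: `REQUIRED_INDEXES` and `indexes_for()` are illustrative names, and the mode keys simply restate the two search styles described above.

```python
from typing import Iterable, List

# Which vector index each search style depends on, per the description above
REQUIRED_INDEXES = {
    'traversal-based': 'chunk',
    'semantic-guided': 'statement',
}


def indexes_for(search_modes: Iterable[str]) -> List[str]:
    """Return the deduplicated, sorted index names needed for the given search modes."""
    try:
        return sorted({REQUIRED_INDEXES[mode] for mode in search_modes})
    except KeyError as err:
        raise ValueError(f'Unknown search mode: {err.args[0]!r}') from None
```

The result has the same shape as the `index_names` list passed to `VectorStoreFactory.for_vector_store()`, so you can enable only the indexes your chosen search style needs.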
You use the VectorStoreFactory.for_vector_store() static factory method to create a vector store.
The lexical-graph supports the following vector stores:
- Amazon OpenSearch Serverless
- Amazon Neptune Analytics
- Postgres with the pgvector extension
- Amazon S3 Vectors
By default, the VectorStoreFactory enables both the statement index and the chunk index. However, we recommend using traversal-based search, which requires only the chunk index. Use the index_names argument to enable just the chunk index:
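import os

from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory

# Connection info for an existing OpenSearch Serverless collection (read from the environment here)
opensearch_connection_info = os.environ['VECTOR_STORE']

# Enable only the chunk index, which is all that traversal-based search needs
vector_store = VectorStoreFactory.for_vector_store(
    opensearch_connection_info,
    index_names=['chunk']
)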