
OpenSearch Vectors

Unstable API

0.8.0

@project-lakechain/opensearch-vector-storage-connector


The OpenSearch vector storage connector enables developers to automatically index document events, along with their vector embeddings, into an Amazon OpenSearch Service domain or an Amazon OpenSearch Serverless collection.


🗄️ Indexing Documents

To use the OpenSearch vector storage connector, import it into your CDK stack and connect it to a data source that provides document embeddings.

💁 You also specify an index definition describing the index that the connector creates in your OpenSearch domain or collection to store document events and their embeddings.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import {
  OpenSearchVectorStorageConnector,
  OpenSearchVectorIndexDefinition
} from '@project-lakechain/opensearch-vector-storage-connector';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Cache storage used by the middleware.
    const cache = new CacheStorage(this, 'Cache');

    // Sample VPC.
    const vpc = new ec2.Vpc(this, 'Vpc');

    // The OpenSearch domain or collection.
    const opensearch = // ...

    // `embeddingProcessor` is an upstream middleware that produces
    // document embeddings (its definition is omitted here).

    // Create the OpenSearch storage connector.
    const connector = new OpenSearchVectorStorageConnector.Builder()
      .withScope(this)
      .withIdentifier('OpenSearchVectorStorageConnector')
      .withCacheStorage(cache)
      .withEndpoint(opensearch)
      .withSource(embeddingProcessor)
      .withVpc(vpc)
      .withIndex(new OpenSearchVectorIndexDefinition.Builder()
        .withIndexName('vector-index')
        .withKnnMethod('hnsw')
        .withKnnEngine('nmslib')
        .withSpaceType('l2')
        .withDimensions(1536)
        .withParameters({ 'ef_construction': 512, 'm': 16 })
        .build()
      )
      .build();
  }
}
```
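The index definition describes an OpenSearch k-NN index. As a minimal sketch, assuming the connector stores embeddings in a `knn_vector` field (the field name `embeddings` and the `VectorIndexDefinition` interface below are hypothetical), the following function builds the create-index request body that the definition from the example above would roughly correspond to. Such a body would be sent with `PUT /<indexName>`. It illustrates the OpenSearch k-NN mapping format and is not the connector's own code.

```typescript
// Hypothetical interface mirroring the index definition attributes.
interface VectorIndexDefinition {
  indexName: string;
  knnMethod: 'hnsw';
  knnEngine: 'faiss' | 'nmslib';
  spaceType: 'l2' | 'l1' | 'innerproduct' | 'cosinesimil' | 'linf';
  dimensions: number;
  parameters?: Record<string, number>;
}

// Builds an OpenSearch k-NN create-index request body from a definition.
// `vectorField` is a hypothetical field name chosen for illustration.
function buildKnnIndexBody(def: VectorIndexDefinition, vectorField = 'embeddings') {
  return {
    settings: {
      // Enables approximate k-NN search on the index.
      index: { knn: true },
    },
    mappings: {
      properties: {
        [vectorField]: {
          type: 'knn_vector',
          dimension: def.dimensions,
          method: {
            name: def.knnMethod,
            engine: def.knnEngine,
            space_type: def.spaceType,
            parameters: def.parameters ?? {},
          },
        },
      },
    },
  };
}

// Same values as the connector example above.
const indexBody = buildKnnIndexBody({
  indexName: 'vector-index',
  knnMethod: 'hnsw',
  knnEngine: 'nmslib',
  spaceType: 'l2',
  dimensions: 1536,
  parameters: { ef_construction: 512, m: 16 },
});

console.log(JSON.stringify(indexBody, null, 2));
```

Note that `dimension` must match the length of the vectors produced by your embedding model (1536 in the example).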


Index Definition

The index definition configures the attributes of the index used by the connector. The attributes you can configure are described below.

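| Attribute | Description |
| --- | --- |
| indexName | The name of the index to create. |
| knnMethod | The k-NN method (only `hnsw` is currently supported). |
| knnEngine | The k-NN engine (`faiss` or `nmslib`). |
| spaceType | The space type used to compare vectors (`l2`, `l1`, `innerproduct`, `cosinesimil`, `linf`). |
| dimensions | The number of dimensions of the vectors. |
| parameters | Method-specific parameters for the index, such as `ef_construction` and `m` for `hnsw`. |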
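To build intuition for the `spaceType` attribute, the sketch below computes the textbook distance behind each space type, returned so that a smaller value means more similar vectors. It is an illustration only: OpenSearch converts these distances into relevance scores in engine-specific ways that are not shown here, and for `l2` the sketch returns the squared Euclidean distance, which ranks neighbors the same way as the Euclidean distance.

```typescript
type SpaceType = 'l2' | 'l1' | 'innerproduct' | 'cosinesimil' | 'linf';

// Returns a distance between two vectors where smaller means "more similar".
function distance(space: SpaceType, a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  const diffs = a.map((x, i) => x - b[i]);
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  switch (space) {
    case 'l2':
      // Squared Euclidean distance.
      return diffs.reduce((sum, d) => sum + d * d, 0);
    case 'l1':
      // Manhattan distance.
      return diffs.reduce((sum, d) => sum + Math.abs(d), 0);
    case 'linf':
      // Chebyshev distance: the largest per-dimension difference.
      return Math.max(...diffs.map((d) => Math.abs(d)));
    case 'cosinesimil': {
      // One minus the cosine of the angle between the vectors.
      const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return 1 - dot / (norm(a) * norm(b));
    }
    case 'innerproduct':
      // Negated so that a larger dot product yields a smaller distance.
      return -dot;
  }
}
```

For example, with `a = [1, 2]` and `b = [4, 6]`, the `l2` distance is 25, `l1` is 7 and `linf` is 4.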


🌐 Endpoints

You can pass an instance of `IDomain`, `ICollection`, or an Amazon OpenSearch Serverless `CfnCollection` to the `withEndpoint` method.



🏗️ Architecture

This middleware uses a Lambda function to index documents in batches into an OpenSearch domain or an OpenSearch Serverless collection.
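This page does not specify how the Lambda function writes a batch to OpenSearch. A common approach is the OpenSearch `_bulk` API, and the sketch below, which assumes that approach and a hypothetical `EmbeddedDocument` shape, shows how a batch could be serialized into a bulk request body. It neither signs nor sends the request.

```typescript
// Hypothetical document shape used only for this illustration.
interface EmbeddedDocument {
  url: string;
  text: string;
  embeddings: number[];
}

// Serializes a batch into the newline-delimited JSON (NDJSON) body of an
// OpenSearch `_bulk` request: an action line, then the document source line.
function toBulkBody(indexName: string, batch: EmbeddedDocument[]): string {
  if (batch.length === 0) {
    // An empty batch produces no request body.
    return '';
  }
  const lines: string[] = [];
  for (const doc of batch) {
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk API expects the body to end with a newline.
  return lines.join('\n') + '\n';
}
```

Sending one bulk request per batch, rather than one request per document, reduces the number of round trips to the cluster.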

💁 By default, this connector processes documents in batches of up to 10, and waits at most 20 seconds to fill a batch before indexing it.
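As a simplified model of the default batching behavior described above (not the connector's code, and ignoring any other limits the underlying event source may apply), the sketch below groups document arrival times into batches. A batch closes when it holds 10 documents, or when a new document arrives 20 seconds or more after the batch was opened.

```typescript
// Default values described above.
const MAX_BATCH_SIZE = 10;
const MAX_BATCH_WINDOW_MS = 20_000;

// Groups document arrival times (milliseconds, sorted ascending) into batches.
function groupIntoBatches(arrivals: number[]): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  let openedAt = 0;
  for (const t of arrivals) {
    // The window has elapsed if this arrival is 20 s or more after the batch opened.
    const windowExpired = current.length > 0 && t - openedAt >= MAX_BATCH_WINDOW_MS;
    if (current.length === MAX_BATCH_SIZE || windowExpired) {
      batches.push(current);
      current = [];
    }
    if (current.length === 0) {
      // The first document of a batch opens its window.
      openedAt = t;
    }
    current.push(t);
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

Under steady traffic the size limit dominates, while under sparse traffic the time window bounds how long a document waits before being indexed.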

Figure: OpenSearch Vector Storage Connector architecture.



🏷️ Properties


Supported Inputs
| Mime Type | Description |
| --- | --- |
| `*/*` | This middleware supports any type of document. Documents that do not have embeddings in their metadata are filtered out. |
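The filtering rule above can be sketched as a simple predicate. This is a hypothetical illustration: the `DocumentEvent` shape and its `metadata.embeddings` field stand in for wherever the embeddings actually live in a Lakechain document event, and the connector may apply the rule differently.

```typescript
// Hypothetical, simplified event shape for illustration; the actual
// Lakechain document metadata model is richer and may name fields differently.
interface DocumentEvent {
  url: string;
  metadata: { embeddings?: number[] };
}

// Keeps only the events that carry a non-empty embedding vector.
function withEmbeddings(events: DocumentEvent[]): DocumentEvent[] {
  return events.filter((event) => (event.metadata.embeddings?.length ?? 0) > 0);
}
```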
Supported Outputs

This middleware does not produce any output.

Supported Compute Types
| Type | Description |
| --- | --- |
| CPU | This middleware only supports CPU compute. |


📖 Examples