OpenSearch Vectors

Unstable API

0.8.0

@project-lakechain/opensearch-vector-storage-connector

The OpenSearch vector storage connector enables developers to automatically index document events and their associated vector embeddings into an Amazon OpenSearch domain or an Amazon OpenSearch Serverless collection.

🗄️ Indexing Documents

To use the OpenSearch vector storage connector, import it in your CDK stack and connect it to a data source that provides document embeddings.

💁 You specify an index definition describing the index that the connector will create in your OpenSearch database to store document events and embeddings.

import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import {
  OpenSearchVectorStorageConnector,
  OpenSearchVectorIndexDefinition
} from '@project-lakechain/opensearch-vector-storage-connector';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // The cache storage used by the middlewares.
    const cache = new CacheStorage(this, 'Cache');

    // Sample VPC.
    const vpc = new ec2.Vpc(this, 'Vpc');

    // The OpenSearch domain or collection.
    const opensearch = // ...

    // Create the OpenSearch storage connector.
    const connector = new OpenSearchVectorStorageConnector.Builder()
      .withScope(this)
      .withIdentifier('OpenSearchVectorStorageConnector')
      .withCacheStorage(cache)
      .withEndpoint(opensearch)
      // `embeddingProcessor` is an upstream middleware that
      // produces embeddings (not shown here).
      .withSource(embeddingProcessor)
      .withVpc(vpc)
      // Describe the index that the connector will create.
      .withIndex(new OpenSearchVectorIndexDefinition.Builder()
        .withIndexName('vector-index')
        .withKnnMethod('hnsw')
        .withKnnEngine('nmslib')
        .withSpaceType('l2')
        .withDimensions(1536)
        .withParameters({ 'ef_construction': 512, 'm': 16 })
        .build()
      )
      .build();
  }
}

Index Definition

The index definition allows you to configure the index attributes that will be used by the connector. Below is a description of the attributes that can be configured.

Attribute	Description
`indexName`	The name of the index to create.
`knnMethod`	The KNN method (only `hnsw` is currently supported).
`knnEngine`	The KNN engine (`faiss` or `nmslib`).
`spaceType`	The space type (`l2`, `l1`, `innerproduct`, `cosinesimil`, `linf`).
`dimensions`	The number of dimensions of the vectors.
`parameters`	The parameters for the index.
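
To make the attributes above more concrete, the sketch below shows how such a definition could translate into the body of an OpenSearch create-index request for a k-NN index. It is an illustration, not the connector's actual implementation: the `VectorIndexDefinition` interface, the `toCreateIndexBody` helper and the `embeddings` field name are assumptions made for this example.

```typescript
// Hypothetical shape mirroring the attributes listed in the table above.
interface VectorIndexDefinition {
  indexName: string;
  knnMethod: 'hnsw';
  knnEngine: 'faiss' | 'nmslib';
  spaceType: 'l2' | 'l1' | 'innerproduct' | 'cosinesimil' | 'linf';
  dimensions: number;
  parameters: Record<string, number>;
}

// Builds the body of an OpenSearch "create index" request for a k-NN index.
// The vector field name is an assumption made for this sketch.
function toCreateIndexBody(def: VectorIndexDefinition, vectorField = 'embeddings') {
  return {
    // Enables k-NN search on the index.
    settings: { index: { knn: true } },
    mappings: {
      properties: {
        [vectorField]: {
          type: 'knn_vector',
          dimension: def.dimensions,
          method: {
            name: def.knnMethod,
            engine: def.knnEngine,
            space_type: def.spaceType,
            parameters: def.parameters
          }
        }
      }
    }
  };
}

// Same values as in the CDK example above.
const createIndexBody = toCreateIndexBody({
  indexName: 'vector-index',
  knnMethod: 'hnsw',
  knnEngine: 'nmslib',
  spaceType: 'l2',
  dimensions: 1536,
  parameters: { ef_construction: 512, m: 16 }
});

console.log(JSON.stringify(createIndexBody, null, 2));
```

The `dimensions` value must match the size of the vectors produced by the embedding model feeding the connector, while `parameters` carries the tuning options of the chosen KNN method, such as `ef_construction` and `m` for `hnsw`.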

🌐 Endpoints

This middleware supports instances of IDomain, ICollection, or an Amazon OpenSearch Serverless CfnCollection, any of which you can pass to the withEndpoint method.

🏗️ Architecture

This middleware uses a Lambda function to index documents in batches into an OpenSearch domain or OpenSearch Serverless collection.

💁 By default, this connector indexes documents in batches of 10 and accumulates documents for up to 20 seconds before indexing a batch.
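
As a rough illustration of batched indexing, the sketch below splits documents into batches of 10 and serializes each batch into the newline-delimited body expected by the OpenSearch `_bulk` API. It is a simplified model rather than the connector's code: the `VectorDocument` shape and the `chunk` and `toBulkBody` helpers are hypothetical, and the 20-second batching window is not modeled.

```typescript
// Hypothetical, simplified document shape used for this sketch.
interface VectorDocument {
  id: string;
  embeddings: number[];
}

// Splits a list of items into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Serializes a batch into an OpenSearch `_bulk` request body:
// one action line followed by one source line per document.
function toBulkBody(indexName: string, batch: VectorDocument[]): string {
  const lines: string[] = [];
  for (const { id, ...source } of batch) {
    lines.push(JSON.stringify({ index: { _index: indexName, _id: id } }));
    lines.push(JSON.stringify(source));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join('\n') + '\n';
}

// 25 sample documents with small placeholder vectors.
const sampleDocuments: VectorDocument[] = Array.from({ length: 25 }, (_, i) => ({
  id: `doc-${i}`,
  embeddings: [i, i + 1, i + 2]
}));

const documentBatches = chunk(sampleDocuments, 10);
const bulkBodies = documentBatches.map((batch) => toBulkBody('vector-index', batch));

console.log(`${documentBatches.length} batches, first body:\n${bulkBodies[0]}`);
```

Sending one bulk request per batch keeps the number of calls to OpenSearch low compared with indexing each document individually.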

OpenSearch Vector Storage Connector Architecture

🏷️ Properties

Supported Inputs

Mime Type	Description
`*/*`	This middleware supports any type of document. Note that if no embeddings are specified in the document metadata, the document is filtered out.
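
The filtering rule described above can be sketched as a simple predicate. The `DocumentEvent` shape below is a hypothetical simplification made for this example and does not reflect the actual document event schema.

```typescript
// Hypothetical, simplified event shape; the real document event schema is richer.
interface DocumentEvent {
  url: string;
  type: string; // MIME type of the document.
  metadata: { embeddings?: number[] };
}

// Returns true only when the document metadata carries a non-empty vector.
function hasEmbeddings(event: DocumentEvent): boolean {
  const vectors = event.metadata.embeddings;
  return Array.isArray(vectors) && vectors.length > 0;
}

const incomingEvents: DocumentEvent[] = [
  { url: 's3://bucket/a.txt', type: 'text/plain', metadata: { embeddings: [0.1, 0.2] } },
  { url: 's3://bucket/b.pdf', type: 'application/pdf', metadata: {} },
  { url: 's3://bucket/c.txt', type: 'text/plain', metadata: { embeddings: [] } }
];

// Documents without embeddings are dropped, whatever their MIME type.
const indexableEvents = incomingEvents.filter(hasEmbeddings);

console.log(indexableEvents.map((event) => event.url));
```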

Supported Outputs

This middleware does not produce any output.

Supported Compute Types

Type	Description
`CPU`	This middleware only supports CPU compute.

📖 Examples

Bedrock OpenSearch Pipeline - An example showcasing an embedding pipeline using Amazon Bedrock and OpenSearch.
Cohere OpenSearch Pipeline - An example showcasing an embedding pipeline using Cohere models on Bedrock and OpenSearch.