Pinecone

Unstable API

0.8.0

@project-lakechain/pinecone-storage-connector

The Pinecone storage connector indexes vector embeddings produced by other middlewares into a Pinecone Pod-based or Serverless index. It uses the Pinecone TypeScript SDK to write the embeddings associated with processed documents to your index, while respecting Pinecone's throttling limits.
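To give a feel for what respecting throttling limits can involve, here is a minimal sketch of batched writes with exponential backoff. It is not the connector's actual implementation: the `ThrottledRecord` shape, the injected `upsert` function, the batch size of 100 and the retry settings are all assumptions made for illustration.

```typescript
// Hypothetical record shape, loosely modeled on a vector database entry.
interface ThrottledRecord {
  id: string;
  values: number[];
}

// The function that actually writes a batch is injected, so this sketch
// does not depend on any particular SDK.
type UpsertFn = (batch: ThrottledRecord[]) => Promise<void>;

// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after the given number of milliseconds.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Write records batch by batch. A failed batch is retried with exponential
// backoff, and the error is rethrown once `maxAttempts` is reached.
async function upsertInBatches(
  records: ThrottledRecord[],
  upsert: UpsertFn,
  batchSize = 100, // assumed value, not a documented limit
  maxAttempts = 3, // assumed value
  baseDelayMs = 200 // assumed value
): Promise<number> {
  let written = 0;
  for (const batch of chunk(records, batchSize)) {
    for (let attempt = 1; ; attempt++) {
      try {
        await upsert(batch);
        written += batch.length;
        break;
      } catch (err) {
        if (attempt >= maxAttempts) throw err;
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  return written;
}
```

Injecting the write function keeps the batching and retry logic testable without network access; in a real pipeline it would wrap a call to the vector database client.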

ℹ️ This middleware interacts with a third-party API outside of your AWS account.

🌲 Indexing Documents

To use the Pinecone storage connector, import it into your CDK stack and connect it to a data source that provides document embeddings.

💁 The connector requires a Pinecone API key, which you provide as a reference to an AWS Secrets Manager secret containing the key.

import * as cdk from 'aws-cdk-lib';
import * as secrets from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { PineconeStorageConnector } from '@project-lakechain/pinecone-storage-connector';
import { CacheStorage } from '@project-lakechain/core';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const cache = new CacheStorage(this, 'Cache');

    // Reference the Secrets Manager secret holding the Pinecone API key.
    const pineconeApiKey = secrets.Secret.fromSecretNameV2(
      this,
      'PineconeApiKey',
      process.env.PINECONE_API_KEY_SECRET_NAME as string
    );

    // Create the Pinecone storage connector.
    // `source` is an upstream middleware producing embeddings, defined elsewhere.
    const connector = new PineconeStorageConnector.Builder()
      .withScope(this)
      .withIdentifier('PineconeStorageConnector')
      .withCacheStorage(cache)
      .withSource(source) // 👈 Specify a data source
      .withApiKey(pineconeApiKey)
      .withIndexName('pinecone-index')
      .build();
  }
}
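The example above casts process.env.PINECONE_API_KEY_SECRET_NAME to a string, which hides the case where the variable is not set. One possible way to fail early with a clear message is a small helper such as the one below. The `requireEnv` name and its signature are hypothetical and are not part of Lakechain or the CDK.

```typescript
// Hypothetical helper: return the value of an environment variable,
// or throw a descriptive error when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In the stack, the third argument of fromSecretNameV2 could then be written as `requireEnv('PINECONE_API_KEY_SECRET_NAME', process.env)`.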

Namespaces

To store document embeddings in a specific namespace, use the withNamespace API.

💁 By default, the namespace is set to an empty string.

const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withNamespace('my-namespace') // 👈 Specify a namespace
  .build();

Include Text

When the document being processed is a text document, you can store its text alongside the embeddings in the Pinecone index by using the withIncludeText API. This option is ignored for documents that are not text.

💁 By default, the text is not included in the index.

const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withIncludeText(true) // 👈 Include text
  .build();
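As a rough illustration of the behavior described in this section, the sketch below builds an index record and attaches the text only when the option is enabled and the document is textual. The `ProcessedDocument` and `IndexRecord` shapes, the metadata keys, and the rule that a text document is one whose MIME type starts with `text/` are assumptions for illustration, not the connector's actual data model.

```typescript
// Hypothetical, simplified view of a processed document.
interface ProcessedDocument {
  id: string;
  mimeType: string;
  embeddings: number[];
  text?: string;
}

// Hypothetical shape of the record written to the index.
interface IndexRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Build the record to index, optionally carrying the document text.
function toIndexRecord(
  doc: ProcessedDocument,
  includeText: boolean
): IndexRecord {
  const metadata: Record<string, string> = { type: doc.mimeType };
  // Assumption: a "text document" is one whose MIME type starts with `text/`.
  const isText = doc.mimeType.startsWith('text/');
  if (includeText && isText && doc.text !== undefined) {
    metadata.text = doc.text;
  }
  return { id: doc.id, values: doc.embeddings, metadata };
}
```

Keeping the text in the record metadata lets a retrieval query return the matching passage directly, at the cost of larger records in the index.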

Controller Host

To specify a custom controller host, you can use the withControllerHostUrl API.

const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withControllerHostUrl('https://api.pinecone.io')
  .build();

🏗️ Architecture

This middleware runs on an ARM64 AWS Lambda function that indexes document embeddings from source middlewares into the destination Pinecone index. It also uses AWS Secrets Manager to retrieve the Pinecone API key at runtime.

Pinecone Storage Connector Architecture

🏷️ Properties

Supported Inputs

Mime Type	Description
`*/*`	This middleware supports any type of document. Note that if no embeddings are specified in the document metadata, the document is filtered out.
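The filtering rule in the table above can be pictured with a small predicate. This is a sketch under assumptions: the `DocumentEvent` shape is a simplified stand-in rather than the actual Lakechain event model, and treating an empty array as "no embeddings" is a choice made for this example.

```typescript
// Hypothetical, simplified event shape carrying optional embeddings.
interface DocumentEvent {
  url: string;
  metadata: { embeddings?: number[] };
}

// A document is indexable only if its metadata carries a non-empty vector.
function hasEmbeddings(event: DocumentEvent): boolean {
  const vector = event.metadata.embeddings;
  return Array.isArray(vector) && vector.length > 0;
}

// Example: only the first document would reach the index.
const sampleEvents: DocumentEvent[] = [
  { url: 's3://bucket/a.txt', metadata: { embeddings: [0.12, 0.87] } },
  { url: 's3://bucket/b.txt', metadata: {} },
  { url: 's3://bucket/c.txt', metadata: { embeddings: [] } }
];
const indexableEvents = sampleEvents.filter(hasEmbeddings);
```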

Supported Outputs

This middleware does not produce any output.

Supported Compute Types

Type	Description
`CPU`	This middleware only supports CPU compute.

📖 Examples

Bedrock + Pinecone Pipeline - An example showcasing an embedding pipeline using Amazon Bedrock and Pinecone.