Pinecone
Unstable API
0.8.0
@project-lakechain/pinecone-storage-connector
The Pinecone storage connector indexes vector embeddings produced by other middlewares into a Pinecone Pod or Serverless index. It uses the Pinecone TypeScript SDK to write the embeddings associated with processed documents to your index, while respecting Pinecone throttling limits.
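As a rough illustration of what respecting throttling limits can involve, the sketch below splits a list of items into fixed-size batches and pauses between sends. This is not the connector's actual implementation: the helper names, the batch size and the delay are assumptions, and send is a placeholder for whatever call writes a batch to the index.

```typescript
/**
 * Splits `items` into consecutive batches of at most `size` elements.
 * Hypothetical helper, not part of the connector's API.
 */
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/**
 * Sends the batches one at a time, waiting `delayMs` between two sends.
 * `send` is a placeholder for the call that writes a batch to the index.
 * Returns the number of batches that were sent.
 */
async function sendThrottled<T>(
  items: T[],
  size: number,
  delayMs: number,
  send: (batch: T[]) => Promise<void>
): Promise<number> {
  const batches = toBatches(items, size);
  for (let i = 0; i < batches.length; i++) {
    await send(batches[i]);
    // No need to wait after the last batch.
    if (i < batches.length - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return batches.length;
}
```

Sending batches sequentially keeps the request rate predictable; a real implementation would typically also retry a batch when the API reports that a limit was exceeded.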
ℹ️ This middleware interacts with a third-party API outside of your AWS account.
🌲 Indexing Documents
To use the Pinecone storage connector, import it in your CDK stack and connect it to a data source that provides document embeddings.
💁 You must provide the connector with a Pinecone API key by passing a reference to an AWS Secrets Manager secret that contains the key.
```typescript
import * as cdk from 'aws-cdk-lib';
import * as secrets from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { PineconeStorageConnector } from '@project-lakechain/pinecone-storage-connector';
import { CacheStorage } from '@project-lakechain/core';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const cache = new CacheStorage(this, 'Cache');

    // The Pinecone API key, stored in AWS Secrets Manager.
    const pineconeApiKey = secrets.Secret.fromSecretNameV2(
      this,
      'PineconeApiKey',
      process.env.PINECONE_API_KEY_SECRET_NAME as string
    );

    // Create the Pinecone storage connector.
    // `source` is an upstream middleware producing embeddings (not shown here).
    const connector = new PineconeStorageConnector.Builder()
      .withScope(this)
      .withIdentifier('PineconeStorageConnector')
      .withCacheStorage(cache)
      .withSource(source) // Specify a data source
      .withApiKey(pineconeApiKey)
      .withIndexName('pinecone-index')
      .build();
  }
}
```
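The stack above reads the secret name from process.env.PINECONE_API_KEY_SECRET_NAME and casts it to a string, which hides the case where the variable is unset. A small guard can fail early with a clear message instead. The requireEnv helper below is a hypothetical sketch, not part of Lakechain or the AWS CDK.

```typescript
/**
 * Returns the value of the environment variable `name` read from `env`,
 * or throws if the variable is missing or blank.
 * Hypothetical helper: pass `process.env` as `env` in a CDK application.
 */
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in the stack (instead of the `as string` cast):
//   const secretName = requireEnv('PINECONE_API_KEY_SECRET_NAME', process.env);
```

Taking the environment as a parameter keeps the helper easy to exercise in tests without touching the real process environment.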
Namespaces
To store document embeddings in a specific namespace, use the withNamespace API.
💁 By default, the namespace is set to an empty string.
```typescript
const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withNamespace('my-namespace') // Specify a namespace
  .build();
```
Include Text
When the document being processed is a text document, you can include its text alongside the embeddings in the Pinecone index by using the withIncludeText API. This option is ignored for non-text documents.
💁 By default, the text is not included in the index.
```typescript
const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withIncludeText(true) // Include text
  .build();
```
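To make the effect of this option concrete, here is a minimal sketch of how a record might be assembled with and without the document text. It is not the connector's code: the EmbeddedDocument type, the toRecord helper, the text/ MIME type check and the text metadata key are illustrative assumptions. Only the general record layout (an id, a values vector and optional metadata) follows the Pinecone record format.

```typescript
// Hypothetical, simplified view of a processed document.
interface EmbeddedDocument {
  id: string;
  mimeType: string;
  embeddings: number[];
  text?: string;
}

// General layout of a Pinecone record.
interface PineconeRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string>;
}

/**
 * Builds a record for `doc`. The text is attached as metadata only when
 * `includeText` is set and the document is a text document; the option is
 * ignored for any other document type.
 */
function toRecord(doc: EmbeddedDocument, includeText: boolean): PineconeRecord {
  const record: PineconeRecord = { id: doc.id, values: doc.embeddings };
  // Assumption: a "text document" is one whose MIME type starts with `text/`.
  const isText = doc.mimeType.startsWith('text/');
  if (includeText && isText && doc.text !== undefined) {
    // Assumption: the metadata key is named `text`.
    record.metadata = { text: doc.text };
  }
  return record;
}
```

Storing the text next to the vector lets a query return the matching passage directly, at the cost of larger records in the index.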
Controller Host
To specify a custom controller host, use the withControllerHostUrl API.
```typescript
const connector = new PineconeStorageConnector.Builder()
  .withScope(this)
  .withIdentifier('PineconeStorageConnector')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(pineconeApiKey)
  .withIndexName('pinecone-index')
  .withControllerHostUrl('https://api.pinecone.io')
  .build();
```
🏗️ Architecture
This middleware runs on an AWS Lambda ARM64 compute, which indexes document embeddings from source middlewares into the destination Pinecone index. It also uses AWS Secrets Manager to retrieve the Pinecone API key at runtime.
🏷️ Properties
Supported Inputs
Mime Type | Description |
---|---|
*/* | This middleware supports any type of document. Note that if no embeddings are specified in the document metadata, the document is filtered out. |
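The filtering rule described in the table can be pictured with the sketch below. The DocumentEvent shape is a simplified, hypothetical stand-in for the real document metadata, and treating an empty vector the same as a missing one is an assumption made for this example.

```typescript
// Hypothetical, simplified document event: the real metadata layout differs.
interface DocumentEvent {
  url: string;
  embeddings?: number[];
}

/**
 * Tells whether a document carries embeddings that can be indexed.
 * Assumption: an empty vector is treated like a missing one.
 */
function hasEmbeddings(event: DocumentEvent): boolean {
  return Array.isArray(event.embeddings) && event.embeddings.length > 0;
}

/**
 * Keeps only the documents that can be indexed; any document type is
 * accepted as long as embeddings are present.
 */
function selectIndexable(events: DocumentEvent[]): DocumentEvent[] {
  return events.filter(hasEmbeddings);
}
```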
Supported Outputs
This middleware does not produce any output.
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
📖 Examples
- Bedrock + Pinecone Pipeline - An example showcasing an embedding pipeline using Amazon Bedrock and Pinecone.