LanceDB
Unstable API
0.8.0
@project-lakechain/lancedb-storage-connector
The LanceDB connector lets developers use the embedded nature of LanceDB databases to store document descriptions and their associated vector embeddings. It is a particularly good choice for applications that don't require ultra-low latency for indexing and retrieval and are not I/O sensitive.
By leveraging LanceDB as a vector store, developers can store tens of thousands of vectors at a very low cost, benefiting from the serverless nature of LanceDB.
Indexing Documents
To use the LanceDB storage connector, you import it in your CDK stack, and connect it to a data source providing document embeddings. You also define a storage provider such as S3 or EFS that will serve as the storage backend for the LanceDB database.
The example below showcases how to create a LanceDB connector leveraging the S3 storage provider.
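A minimal sketch of wiring the connector with the S3 storage provider is shown here. It assumes the builder pattern used across Lakechain middlewares; the method names `withVectorSize`, `withStorageProvider` and `withBucket`, the `Middleware` import path, and the helper name `addLanceDbConnector` are assumptions to check against the package's API reference, not confirmed signatures.

```typescript
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import { Middleware } from '@project-lakechain/core/middleware';
import {
  LanceDbStorageConnector,
  S3StorageProvider
} from '@project-lakechain/lancedb-storage-connector';

// Hypothetical helper, meant to be called from your stack constructor.
// `source` is the upstream middleware producing embeddings and `bucket`
// is the customer-provided bucket that will hold the LanceDB database.
export function addLanceDbConnector(
  scope: Construct,
  cache: CacheStorage,
  source: Middleware,
  bucket: s3.IBucket
) {
  // Storage backend for the database (assumed builder API).
  const storageProvider = new S3StorageProvider.Builder()
    .withScope(scope)
    .withIdentifier('S3Storage')
    .withBucket(bucket)
    .build();

  // The connector itself (assumed builder API).
  return new LanceDbStorageConnector.Builder()
    .withScope(scope)
    .withIdentifier('LanceDbStorageConnector')
    .withCacheStorage(cache)
    .withSource(source)
    // Assumed to match the output dimension of the embedding model.
    .withVectorSize(1024)
    .withStorageProvider(storageProvider)
    .build();
}
```

The vector size of 1024 is only a placeholder; it should be set to the dimension of the embeddings produced by the upstream middleware.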
Storage Providers
The LanceDB storage connector supports two storage providers, allowing you to trade off cost, performance, durability, and latency.
S3 Storage
The S3 storage provider uses an S3 bucket to store the LanceDB database using a standard storage class.
The provider does not create the S3 bucket; it uses a customer-provided bucket and an optional path prefix to store the database.
EFS Storage
The EFS storage provider leverages AWS EFS to store the LanceDB database, providing lower latency and higher IOPS compared to S3.
The provider does not create the EFS file system; it uses a customer-provided file system placed in a VPC and an optional path prefix to store the database.
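To make the role of the bucket, file system, and optional path prefix concrete, the sketch below resolves a storage location into the kind of URI LanceDB accepts when opening a database: an `s3://` URI for object storage, or a local path for a mounted file system. The `StorageLocation` type, the `toDatabaseUri` function and the `/mnt/efs` mount path are hypothetical names for illustration and are not part of the connector's API.

```typescript
// Hypothetical model of the two storage backends (illustration only).
type StorageLocation =
  | { kind: 's3'; bucket: string; prefix?: string }
  | { kind: 'efs'; mountPath: string; prefix?: string };

// Removes leading and trailing slashes from a path prefix.
function trimSlashes(value: string): string {
  return value.replace(/^\/+|\/+$/g, '');
}

// Builds the database location from a backend and its optional prefix.
function toDatabaseUri(location: StorageLocation): string {
  const prefix = location.prefix ? trimSlashes(location.prefix) : '';

  if (location.kind === 's3') {
    // Object storage: the database lives under s3://<bucket>/<prefix>.
    return prefix
      ? `s3://${location.bucket}/${prefix}`
      : `s3://${location.bucket}`;
  }

  // File system: the database lives under the local mount point.
  const base = location.mountPath.replace(/\/+$/, '');
  return prefix ? `${base}/${prefix}` : base;
}

console.log(toDatabaseUri({ kind: 's3', bucket: 'my-bucket', prefix: 'lancedb' }));
console.log(toDatabaseUri({ kind: 'efs', mountPath: '/mnt/efs', prefix: 'lancedb' }));
```

In both cases the prefix only scopes where the database is written; the bucket or file system itself must already exist.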
Include Text
When the document being processed is a text document, you can choose to include the text of the document associated with the embeddings in the LanceDB table. This allows you to retrieve the text associated with the embeddings when executing a similarity search without having to retrieve the original text from a separate database.
To do so, you can use the withIncludeText API. If the document is not a text document, this option is ignored.
By default, the text is not included in the index.
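The effect of this option on what gets stored can be sketched as follows. This is an illustrative model and not the connector's implementation: the `EmbeddedDocument` and `TableRow` shapes, the `toTableRow` function, and the rule that a text document is one whose MIME type starts with `text/` are all assumptions.

```typescript
// Hypothetical shape of a document carrying an embedding.
interface EmbeddedDocument {
  url: string;
  mimeType: string;
  vector: number[];
  text?: string;
}

// Hypothetical shape of a row written to the LanceDB table.
interface TableRow {
  url: string;
  vector: number[];
  text?: string;
}

// Builds the row to store; the text is only added when the option is
// enabled and the document is a text document (assumed: `text/*`).
function toTableRow(doc: EmbeddedDocument, includeText: boolean = false): TableRow {
  const row: TableRow = { url: doc.url, vector: doc.vector };
  const isText = /^text\//.test(doc.mimeType);

  if (includeText && isText && doc.text !== undefined) {
    row.text = doc.text;
  }
  return row;
}

const sample: EmbeddedDocument = {
  url: 's3://bucket/notes.txt',
  mimeType: 'text/plain',
  vector: [0.1, 0.2, 0.3],
  text: 'Hello world'
};

console.log(toTableRow(sample));       // row without the text (default)
console.log(toTableRow(sample, true)); // row including the text
```

Storing the text alongside the vector increases the size of each row, which is the trade-off for avoiding a second lookup at query time.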
Architecture
The LanceDB storage connector uses an ARM64 AWS Lambda function to index the document embeddings provided by source middlewares into the LanceDB database. The connector uses an AWS Lambda Layer to include the LanceDB library within the Lambda environment.
The architecture depends on the selected storage provider. Below is a description of the architecture for each storage provider.
S3 Storage Provider
The S3 storage provider uses a user-provided S3 bucket to store the LanceDB database.
EFS Storage Provider
The EFS storage provider uses a user-provided EFS file system to store the LanceDB database.
Properties
Supported Inputs
Mime Type | Description |
---|---|
*/* | This middleware supports any type of document. Note that if no embeddings are specified in the document metadata, the document is filtered out. |
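The filtering rule in the table above can be pictured with a small sketch. The `DocumentEvent` shape, with embeddings held directly under `metadata.embeddings`, is a simplified and hypothetical stand-in for the actual event format, and `selectIndexable` is an illustrative name.

```typescript
// Simplified, hypothetical view of an incoming document event.
interface DocumentEvent {
  url: string;
  type: string; // MIME type of the document
  metadata: { embeddings?: number[] };
}

// A document is indexable only if it carries a non-empty embedding.
function hasEmbeddings(event: DocumentEvent): boolean {
  const embeddings = event.metadata.embeddings;
  return Array.isArray(embeddings) && embeddings.length > 0;
}

// Keeps documents of any MIME type, dropping those without embeddings.
function selectIndexable(events: DocumentEvent[]): DocumentEvent[] {
  return events.filter(hasEmbeddings);
}

const events: DocumentEvent[] = [
  { url: 'a.txt', type: 'text/plain', metadata: { embeddings: [0.1, 0.2] } },
  { url: 'b.png', type: 'image/png', metadata: {} }
];

console.log(selectIndexable(events).map((e) => e.url));
```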
Supported Outputs
This middleware does not produce any output.
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Bedrock + LanceDB - An example showcasing an embedding pipeline using Amazon Bedrock and LanceDB.