# S3 Connector
The S3 storage connector makes it possible to capture the results of one or more middlewares in a pipeline and store them in a user-defined destination S3 bucket. The connector stores the CloudEvents emitted by middlewares, and can optionally copy the output documents themselves to the destination bucket.
## 🗄️ Storing Documents
To use the S3 storage connector, you import it in your CDK stack and connect it to a data source providing documents, as shown in the sketch below.
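Below is a minimal sketch of wiring the connector at the end of a pipeline. The `S3EventTrigger` data source used here is an assumption made for illustration; any middleware emitting documents can serve as the source, and the builder method names follow the conventions used across Lakechain middlewares.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import { S3EventTrigger } from '@project-lakechain/s3-event-trigger';
import { S3StorageConnector } from '@project-lakechain/s3-storage-connector';

export class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // The cache storage shared by the middlewares in the pipeline.
    const cache = new CacheStorage(this, 'Cache');

    // The source and destination buckets.
    const source = new s3.Bucket(this, 'Source');
    const destination = new s3.Bucket(this, 'Destination');

    // Monitor the source bucket for new documents
    // (assumed data source for this example).
    const trigger = new S3EventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withBucket(source)
      .build();

    // Store the results emitted by the trigger in the destination bucket.
    new S3StorageConnector.Builder()
      .withScope(this)
      .withIdentifier('S3StorageConnector')
      .withCacheStorage(cache)
      .withSource(trigger)
      .withDestinationBucket(destination)
      .build();
  }
}
```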
### Copy Documents
By default, this connector copies the CloudEvents emitted by middlewares to the destination bucket, and also copies the documents themselves. The document copy can be disabled using the `.withCopyDocuments` API.
> 💁 This implementation uses a streaming approach to optimize the copying of large documents, so that it does not need to load an entire document in memory. Note, however, that copies can be expensive in both time and cost.
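The sketch below disables document copies so that only the CloudEvents are stored. It assumes `.withCopyDocuments` accepts a boolean flag, and reuses the `cache`, `trigger`, and `destination` variables from the previous example.

```typescript
// Only store the CloudEvents in the destination bucket;
// skip copying the documents themselves.
new S3StorageConnector.Builder()
  .withScope(this)
  .withIdentifier('S3StorageConnector')
  .withCacheStorage(cache)
  .withSource(trigger)
  .withDestinationBucket(destination)
  .withCopyDocuments(false)
  .build();
```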
### Storage Class
You can specify a storage class for the documents and their metadata written to the destination bucket to tailor storage costs, using the `.withStorageClass` API.
> 💁 By default, the S3 Standard storage class is used to store the documents and their metadata.
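Below is a sketch of selecting a cheaper storage class, under the assumption that `.withStorageClass` accepts an S3 storage class identifier such as `'ONEZONE_IA'`; check the connector's type definitions for the exact accepted values.

```typescript
// Store documents and metadata using the S3 One Zone-IA storage class
// (assumes a storage class identifier string is accepted).
new S3StorageConnector.Builder()
  .withScope(this)
  .withIdentifier('S3StorageConnector')
  .withCacheStorage(cache)
  .withSource(trigger)
  .withDestinationBucket(destination)
  .withStorageClass('ONEZONE_IA')
  .build();
```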
## 🏗️ Architecture
This middleware is based on a Lambda ARM64 compute to copy document data from source middlewares into the destination bucket.
## 🏷️ Properties
### Supported Inputs

| Mime Type | Description |
| --------- | ----------- |
| `*/*`     | This middleware supports any type of document. |
### Supported Outputs
This middleware does not emit any output.
### Supported Compute Types

| Type  | Description |
| ----- | ----------- |
| `CPU` | This middleware only supports CPU compute. |
## 📖 Examples
- Article Curation Pipeline - Builds a pipeline converting HTML articles into plain text.
- Face Detection Pipeline - An example showcasing how to build face detection pipelines using Project Lakechain.