OpenSearch
Unstable API
0.8.0
@project-lakechain/opensearch-storage-connector
The OpenSearch storage connector enables developers to index CloudEvents in an Amazon OpenSearch domain at scale within their pipelines. This connector uses AWS Firehose to buffer events and write them to OpenSearch in batches, using a serverless architecture.
Indexing Documents
To use the OpenSearch storage connector, import it in your CDK stack and connect it to a data source that provides documents.
Buffering Hints
This connector creates an AWS Firehose delivery stream to buffer events before sending them to OpenSearch. You can customize how the connector buffers events by specifying optional buffering hints.
ℹ️ The buffering hints are set to `10MB` or `60s` by default.
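To make the "size or interval" semantics concrete, here is a minimal sketch of the flush decision. It is a standalone model, not the connector's or Firehose's API: the `BufferingHints` shape, the `shouldFlush` function, and the choice of 1 MB = 1024 × 1024 bytes are assumptions made for illustration. Only the `10MB` and `60s` defaults come from this page.

```typescript
// Hypothetical model of size-or-interval buffering (not the connector's API).
interface BufferingHints {
  sizeInMb: number;          // flush once the buffered payload reaches this size
  intervalInSeconds: number; // flush once the oldest buffered record is this old
}

// Defaults stated in this documentation: 10MB or 60s.
const DEFAULT_HINTS: BufferingHints = { sizeInMb: 10, intervalInSeconds: 60 };

// Returns true when a non-empty buffer should be delivered.
// Assumption: 1 MB is treated as 1024 * 1024 bytes.
function shouldFlush(
  bufferedBytes: number,
  secondsSinceFirstRecord: number,
  hints: BufferingHints = DEFAULT_HINTS
): boolean {
  if (bufferedBytes <= 0) {
    return false; // nothing to deliver
  }
  const sizeReached = bufferedBytes >= hints.sizeInMb * 1024 * 1024;
  const intervalReached = secondsSinceFirstRecord >= hints.intervalInSeconds;
  // Whichever threshold is reached first triggers the delivery.
  return sizeReached || intervalReached;
}
```

Under this model, smaller hints lower indexing latency at the cost of more frequent, smaller batches, while larger hints do the opposite.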
Index Rotation
You can also configure the rotation period of the OpenSearch index that is applied by AWS Firehose.
ℹ️ The index rotation is set to `NoRotation` by default.
Possible values for the index rotation period are `NoRotation`, `OneHour`, `OneDay`, `OneWeek`, and `OneMonth`.
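The sketch below illustrates what a rotation period means in practice: Firehose appends a portion of the record's UTC arrival timestamp to the index name, so documents land in a new index each period. The `indexNameFor` helper is hypothetical. The suffix formats and the ISO-8601 week numbering reflect our reading of the Firehose documentation and should be treated as assumptions rather than a specification.

```typescript
type IndexRotationPeriod = 'NoRotation' | 'OneHour' | 'OneDay' | 'OneWeek' | 'OneMonth';

// Zero-pads a number to two digits.
const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));

// ISO-8601 week number and week-year for a UTC date (assumed for OneWeek).
function isoWeek(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const day = d.getUTCDay() || 7;          // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - day);  // move to the Thursday of the same week
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

// Hypothetical helper: the index name a record would be written to.
function indexNameFor(base: string, period: IndexRotationPeriod, arrival: Date): string {
  const y = arrival.getUTCFullYear();
  const m = pad(arrival.getUTCMonth() + 1);
  const d = pad(arrival.getUTCDate());
  const h = pad(arrival.getUTCHours());
  switch (period) {
    case 'NoRotation': return base;
    case 'OneHour':    return `${base}-${y}-${m}-${d}-${h}`;
    case 'OneDay':     return `${base}-${y}-${m}-${d}`;
    case 'OneWeek': {
      const w = isoWeek(arrival);
      return `${base}-${w.year}-w${pad(w.week)}`;
    }
    case 'OneMonth':   return `${base}-${y}-${m}`;
  }
}

const arrival = new Date(Date.UTC(2016, 1, 25, 13, 0, 0)); // 2016-02-25T13:00:00Z
const dailyIndex = indexNameFor('myindex', 'OneDay', arrival); // "myindex-2016-02-25"
```

With `NoRotation`, every document is written to the same index, which is simpler to query but grows without bound.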
ℹ️ Limits
This middleware forwards each discrete document event to OpenSearch using the Default Document Id Format, which means that Firehose generates a unique document ID for each record based on a unique internal identifier.
This identifier remains stable across delivery attempts. However, if you resubmit the same document in the pipeline (having the same URI), Firehose generates a new document identifier, which duplicates the document in the OpenSearch index.
Another limitation of this middleware is that it currently only supports OpenSearch domains, and not OpenSearch Serverless collections.
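The duplication behavior described above can be illustrated with a small in-memory model. Everything here is hypothetical (`ToyIndex`, `submit`, `deliver`, and the `record-N` identifiers are stand-ins, not Firehose or OpenSearch APIs). It only shows why a delivery retry overwrites the same document while a resubmission of the same URI creates a second one.

```typescript
// A record as it travels through the delivery stream (hypothetical shape).
interface PipelineRecord {
  recordId: string; // internal identifier, assigned once per submission
  uri: string;      // URI of the document the event refers to
}

// Toy stand-in for an OpenSearch index: document ID -> document URI.
class ToyIndex {
  private docs = new Map<string, string>();

  put(documentId: string, uri: string): void {
    this.docs.set(documentId, uri); // same ID overwrites, new ID adds a document
  }

  countByUri(uri: string): number {
    let count = 0;
    this.docs.forEach((storedUri) => {
      if (storedUri === uri) count += 1;
    });
    return count;
  }
}

let nextRecordNumber = 0;

// Each submission gets a fresh internal identifier, regardless of the URI.
function submit(uri: string): PipelineRecord {
  nextRecordNumber += 1;
  return { recordId: `record-${nextRecordNumber}`, uri };
}

// Delivery uses the record's internal identifier as the document ID.
function deliver(index: ToyIndex, record: PipelineRecord): void {
  index.put(record.recordId, record.uri);
}

const toyIndex = new ToyIndex();
const uri = 's3://example-bucket/report.pdf';

const first = submit(uri);
deliver(toyIndex, first);
deliver(toyIndex, first);                           // retry: same ID, no duplicate
const afterRetry = toyIndex.countByUri(uri);        // 1

deliver(toyIndex, submit(uri));                     // resubmission: new ID
const afterResubmission = toyIndex.countByUri(uri); // 2
```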
Architecture
This middleware uses AWS Firehose to buffer incoming document events from other middlewares in a pipeline, and uses the AWS Firehose native integration with OpenSearch to index documents.
Properties
Supported Inputs
MIME Type | Description |
---|---|
*/* | This middleware supports any type of document. |
Supported Outputs
This middleware does not produce any output.
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Building a Document Index - End-to-end document metadata extraction with OpenSearch.