Scheduler
The Scheduler trigger lets developers run document processing pipelines on a recurring or one-time schedule.
One-time Schedules
To use this middleware in a pipeline, import it in your CDK stack and specify a schedule expression that defines when the pipeline runs.
ℹ️ In the below example, we schedule the pipeline to start in 24 hours.
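As a minimal sketch of what such a one-time schedule looks like, the snippet below formats a date 24 hours ahead using the `at(yyyy-mm-ddThh:mm:ss)` syntax that EventBridge Scheduler accepts for one-time schedules. The `oneTimeExpression` helper and the reference date are hypothetical and chosen for illustration only; they are not part of the Scheduler middleware.

```typescript
// Hypothetical helper: formats a one-time EventBridge Scheduler expression
// of the form at(yyyy-mm-ddThh:mm:ss), a given number of hours after `from`.
function oneTimeExpression(from: Date, hoursAhead: number): string {
  const when = new Date(from.getTime() + hoursAhead * 60 * 60 * 1000);
  // toISOString() always returns UTC, e.g. "2024-01-02T08:00:00.000Z".
  // The first 19 characters are the date and time without milliseconds.
  return `at(${when.toISOString().slice(0, 19)})`;
}

// Schedule the pipeline 24 hours after a fixed reference instant.
const oneTime = oneTimeExpression(new Date('2024-01-01T08:00:00Z'), 24);
console.log(oneTime); // at(2024-01-02T08:00:00)
```

In a CDK stack, this value would more typically be produced by a schedule expression helper from the EventBridge Scheduler construct library than by hand; the string form is shown only to make the resulting schedule explicit.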
Cron Expressions
You can also use cron expressions to define when to schedule a pipeline.
ℹ️ In the below example, we schedule the pipeline to run at 8:00 PM UTC every Monday through Friday.
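A minimal sketch of that schedule, assuming the schedule's time zone is left at UTC: EventBridge Scheduler cron expressions take six fields (minutes, hours, day of month, month, day of week, year), and a `?` is placed in the day-of-month field when the day of week is specified. The `CronFields` interface and `cronExpression` helper are hypothetical names used for illustration.

```typescript
// Hypothetical shape describing the six fields of an EventBridge cron expression.
interface CronFields {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
  year: string;
}

// Joins the fields in the order EventBridge Scheduler expects.
function cronExpression(f: CronFields): string {
  return `cron(${f.minute} ${f.hour} ${f.dayOfMonth} ${f.month} ${f.dayOfWeek} ${f.year})`;
}

// 8:00 PM (20:00), Monday through Friday, every month and year.
const weekdaysAt8pm = cronExpression({
  minute: '0',
  hour: '20',
  dayOfMonth: '?', // unspecified, because dayOfWeek is set
  month: '*',
  dayOfWeek: 'MON-FRI',
  year: '*',
});
console.log(weekdaysAt8pm); // cron(0 20 ? * MON-FRI *)
```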
⏰ Rate Expressions
Another way to create a recurring schedule is to use a rate expression.
ℹ️ In the below example, we schedule the pipeline to run every 5 minutes.
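As a rough sketch, a 5-minute rate corresponds to the `rate(value unit)` syntax of EventBridge Scheduler, where the unit is singular for a value of 1 and plural otherwise. The `rateExpression` helper and `RateUnit` type below are hypothetical and only illustrate the string format.

```typescript
// Units accepted by EventBridge Scheduler rate expressions (singular form).
type RateUnit = 'minute' | 'hour' | 'day';

// Hypothetical helper: builds a rate expression such as "rate(5 minutes)".
function rateExpression(value: number, unit: RateUnit): string {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError('rate value must be a positive integer');
  }
  // The unit is pluralized for any value other than 1.
  const suffix = value === 1 ? unit : `${unit}s`;
  return `rate(${value} ${suffix})`;
}

const everyFiveMinutes = rateExpression(5, 'minute');
console.log(everyFiveMinutes); // rate(5 minutes)
```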
⚠️ Important Note
This middleware does not act as a data source, but rather as a simple trigger because it does not yield any document by default. However, every middleware must provide a valid document that can be interpreted by the next middlewares.
To solve this issue, the Scheduler will send a placeholder document with the `application/json+scheduler` mime-type when the schedule is reached. This means that subsequent middlewares have to be explicitly configured to accept this mime-type, and know how to react when triggered by the scheduler.
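The routing decision described above could be sketched as follows. This is an illustration only: the `IncomingDocument` shape, the `decideAction` function, the action names, and the placeholder URL are all hypothetical and do not reflect how any particular middleware is implemented. Only the `application/json+scheduler` mime-type comes from the text above.

```typescript
// Mime-type of the placeholder document sent by the Scheduler.
const SCHEDULER_MIME_TYPE = 'application/json+scheduler';

// Hypothetical, simplified view of a document received by a middleware.
interface IncomingDocument {
  url: string;
  type: string; // mime-type of the document
}

type Action = 'run-scheduled-job' | 'process-document' | 'skip';

// Decides what a downstream middleware should do with a document,
// given the set of mime-types it was configured to accept.
function decideAction(doc: IncomingDocument, accepted: ReadonlySet<string>): Action {
  if (!accepted.has(doc.type)) {
    return 'skip'; // the middleware was not configured for this mime-type
  }
  // A placeholder carries no content: it only signals that the schedule fired.
  return doc.type === SCHEDULER_MIME_TYPE ? 'run-scheduled-job' : 'process-document';
}

const acceptedTypes = new Set([SCHEDULER_MIME_TYPE, 'text/html']);
const placeholder = { url: 'scheduler://placeholder', type: SCHEDULER_MIME_TYPE };
console.log(decideAction(placeholder, acceptedTypes)); // run-scheduled-job
```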
Providing Documents
In some cases, it can be useful to provide input documents on a schedule. A common use case is a recurring pipeline that fetches information from an external system on a regular basis.
The Scheduler API allows developers to pass an array of URIs identifying documents to use as an input for a scheduled pipeline using the `withDocuments` method.
ℹ️ The below example will trigger the pipeline every 5 minutes, and create two separate events, one for each document.
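A minimal sketch of this fan-out, assuming two hypothetical document URIs and a simple extension-based lookup for the mime-type. The URIs, the `MIME_BY_EXTENSION` table, and the `inferMimeType` and `eventsFor` functions are illustrative assumptions; the Scheduler's actual inference logic is not described here and may differ.

```typescript
// Hypothetical extension lookup; a real implementation may inspect
// content or response headers instead of relying on the URI alone.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['html', 'text/html'],
  ['json', 'application/json'],
  ['txt', 'text/plain'],
  ['pdf', 'application/pdf'],
]);

// Hypothetical, simplified event emitted for each scheduled document.
interface ScheduledEvent {
  url: string;
  type: string; // inferred mime-type
}

function inferMimeType(uri: string): string {
  // Drop any query string or fragment, then keep the last path segment.
  const path = uri.split(/[?#]/)[0] ?? uri;
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  const dot = lastSegment.lastIndexOf('.');
  const ext = dot >= 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
  return MIME_BY_EXTENSION.get(ext) ?? 'application/octet-stream';
}

// One event is created per document each time the schedule fires.
function eventsFor(uris: string[]): ScheduledEvent[] {
  return uris.map((url) => ({ url, type: inferMimeType(url) }));
}

const scheduledEvents = eventsFor([
  'https://example.com/articles/index.html', // hypothetical URI
  's3://example-bucket/data/records.json', // hypothetical URI
]);
console.log(scheduledEvents.map((e) => e.type)); // [ 'text/html', 'application/json' ]
```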
When specifying documents, the scheduler will attempt to infer their mime-type automatically. In the previous example, the scheduler would send a document with a mime-type of `text/html` for the first document, and `application/json` for the second.
🏗️ Architecture
The Scheduler trigger uses the Amazon EventBridge Scheduler service to trigger a Lambda function. The Lambda function publishes the appropriate document to its output topic to be consumed by the next middleware in the pipeline.
🏷️ Properties
Supported Inputs
This middleware does not accept any inputs from other middlewares.
Supported Outputs
Mime Type | Description |
---|---|
Variant | When documents are specified, the Scheduler attempts to infer their mime-types. If no documents are specified, it sends a placeholder document with the `application/json+scheduler` mime-type to the next middlewares. |
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware is based on a Lambda architecture. |
📖 Examples
- Article Curation Pipeline - Builds a pipeline converting HTML articles into plain text.