Scheduler
The Scheduler trigger lets developers run document processing pipelines on a recurring or one-time schedule.
One-time Schedules
To use this middleware as part of your pipeline, you have to import it in your CDK stack and specify a Schedule Expression that defines when to schedule the pipeline.
ℹ️ In the example below, we schedule the pipeline to start in 24 hours.
```typescript
import * as cdk from 'aws-cdk-lib';
import * as scheduler from '@aws-cdk/aws-scheduler-alpha';
import { Construct } from 'constructs';
import { SchedulerEventTrigger } from '@project-lakechain/scheduler-event-trigger';
import { CacheStorage } from '@project-lakechain/core';

/**
 * @returns a date object representing the date
 * and time 24 hours from now.
 */
const tomorrow = (): Date => {
  const date = new Date();
  date.setDate(date.getDate() + 1);
  return date;
};

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // The cache storage handed to the trigger below.
    const cache = new CacheStorage(this, 'Cache');

    // Create the Scheduler event trigger.
    const trigger = new SchedulerEventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withScheduleExpression(
        scheduler.ScheduleExpression.at(tomorrow())
      )
      .build();
  }
}
```
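For reference, EventBridge Scheduler writes a one-time schedule as `at(yyyy-mm-ddThh:mm:ss)`. The sketch below shows how a `Date` maps onto that string, assuming the time is interpreted in UTC. `toAtExpression` is a hypothetical helper for illustration only and is not part of the CDK or Lakechain APIs; in a stack you would normally rely on `ScheduleExpression.at` instead.

```typescript
// Hypothetical helper: renders a Date as a one-time schedule
// expression, using the UTC representation of the date.
const toAtExpression = (date: Date): string => {
  // toISOString() yields e.g. "2024-01-15T08:30:00.000Z";
  // the first 19 characters are the "yyyy-mm-ddThh:mm:ss" part.
  const timestamp = date.toISOString().slice(0, 19);
  return `at(${timestamp})`;
};

console.log(toAtExpression(new Date(Date.UTC(2024, 0, 15, 8, 30, 0))));
// prints "at(2024-01-15T08:30:00)"
```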
Cron Expressions
You can also use cron expressions to define when to schedule a pipeline.
ℹ️ In the example below, we schedule the pipeline to run at 8:00 PM UTC every Monday through Friday.
```typescript
const trigger = new SchedulerEventTrigger.Builder()
  .withScope(this)
  .withIdentifier('Trigger')
  .withCacheStorage(cache)
  .withScheduleExpression(
    scheduler.ScheduleExpression.expression('cron(0 20 ? * MON-FRI *)')
  )
  .build();
```
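An EventBridge cron expression has six fields, in this order: minutes, hours, day of month, month, day of week, and year. Day of month and day of week cannot both be set, so one of them is written as `?`. The sketch below spells out each field of the expression used above. `CronFields` and `toCronExpression` are hypothetical names introduced only to label the fields; they are not part of the CDK or Lakechain APIs.

```typescript
// The six fields of an EventBridge cron expression, in order.
interface CronFields {
  minutes: string;
  hours: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
  year: string;
}

// Hypothetical helper: joins the fields into a cron(...) string.
const toCronExpression = (f: CronFields): string =>
  `cron(${[f.minutes, f.hours, f.dayOfMonth, f.month, f.dayOfWeek, f.year].join(' ')})`;

// 8:00 PM, Monday through Friday, every month of every year.
const weekdaysAt8pm = toCronExpression({
  minutes: '0',
  hours: '20',
  dayOfMonth: '?', // left unspecified because dayOfWeek is set
  month: '*',
  dayOfWeek: 'MON-FRI',
  year: '*'
});

console.log(weekdaysAt8pm); // prints "cron(0 20 ? * MON-FRI *)"
```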
Rate Expressions
Another way to create a recurring schedule is to use a rate expression.
ℹ️ In the example below, we schedule the pipeline to run every 5 minutes.
```typescript
const trigger = new SchedulerEventTrigger.Builder()
  .withScope(this)
  .withIdentifier('Trigger')
  .withCacheStorage(cache)
  .withScheduleExpression(
    scheduler.ScheduleExpression.rate(cdk.Duration.minutes(5))
  )
  .build();
```
Important Note
This middleware does not act as a data source, but rather as a simple trigger because it does not yield any document by default. However, every middleware must provide a valid document that can be interpreted by the next middlewares.
To solve this issue, the Scheduler sends a placeholder document with the `application/json+scheduler` mime-type when the schedule is reached. This means that subsequent middlewares have to be explicitly configured to accept this mime-type, and know how to react when triggered by the scheduler.
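As a rough illustration of what reacting to the scheduler can look like, the snippet below branches on the mime-type of an incoming document. The `IncomingDocument` shape and both functions are hypothetical and deliberately simplified; real middlewares receive Lakechain events, not this type.

```typescript
// Mime-type of the placeholder document sent by the Scheduler.
const SCHEDULER_MIME_TYPE = 'application/json+scheduler';

// Hypothetical, simplified view of a document received by a middleware.
interface IncomingDocument {
  url: string;
  mimeType: string;
}

// True when the document is the Scheduler placeholder.
const isSchedulerPlaceholder = (doc: IncomingDocument): boolean =>
  doc.mimeType === SCHEDULER_MIME_TYPE;

// Decides what to do with the document: a placeholder carries no
// content to read, so it only signals that the schedule was reached.
const describeAction = (doc: IncomingDocument): string =>
  isSchedulerPlaceholder(doc)
    ? 'scheduled run: no input document to read'
    : `process document at ${doc.url}`;
```

A middleware that only understands, say, `application/json` would never see the placeholder, which is why the mime-type has to be accepted explicitly.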
Providing Documents
In some cases, it can be useful to provide input documents on a schedule. A common use-case for this is to create a recurrent pipeline that will fetch information from an external system on a regular basis.
The Scheduler API allows developers to pass an array of URIs identifying documents to use as an input for a scheduled pipeline using the `withDocuments` method.
ℹ️ The example below triggers the pipeline every 5 minutes and emits a separate event for each of the 2 documents.
```typescript
const trigger = new SchedulerEventTrigger.Builder()
  .withScope(this)
  .withIdentifier('Trigger')
  .withCacheStorage(cache)
  .withScheduleExpression(
    scheduler.ScheduleExpression.rate(cdk.Duration.minutes(5))
  )
  .withDocuments([
    'https://aws.amazon.com/builders-library/dependency-isolation/',
    's3://my-bucket/my-document.json'
  ])
  .build();
```
When specifying documents, the scheduler will attempt to infer their mime-type automatically. In the previous example, the scheduler would send a document with a mime-type of `text/html` for the first document, and `application/json` for the second.
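This page does not describe how the inference is implemented. Purely as an illustration, the sketch below guesses a mime-type from the file extension of a URI and, as an assumption made for this example, treats extension-less `http(s)` URLs as HTML pages. `guessMimeType` and its lookup table are hypothetical and may differ from what the Scheduler actually does.

```typescript
// Hypothetical lookup table, limited to a few common extensions.
const MIME_BY_EXTENSION: Record<string, string> = {
  html: 'text/html',
  json: 'application/json',
  pdf: 'application/pdf',
  txt: 'text/plain'
};

// Hypothetical helper: guesses a mime-type from a document URI.
const guessMimeType = (uri: string): string | undefined => {
  // Drop the query string and fragment, if any.
  const [withoutQuery = ''] = uri.split(/[?#]/);
  // Remove the scheme and host (or bucket), keeping only the path.
  const path = withoutQuery.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]*/i, '');
  const lastSegment = path.split('/').pop() ?? '';
  const dot = lastSegment.lastIndexOf('.');

  if (dot > 0) {
    return MIME_BY_EXTENSION[lastSegment.slice(dot + 1).toLowerCase()];
  }
  // Assumption: a web URL without a file extension points to an HTML page.
  return /^https?:\/\//i.test(uri) ? 'text/html' : undefined;
};

console.log(guessMimeType('https://aws.amazon.com/builders-library/dependency-isolation/'));
// prints "text/html"
console.log(guessMimeType('s3://my-bucket/my-document.json'));
// prints "application/json"
```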
Architecture
The Scheduler trigger uses the Amazon EventBridge Scheduler service to trigger a Lambda function. The Lambda function publishes the appropriate document to its output topic to be consumed by the next middleware in the pipeline.
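The flow described above can be sketched in plain TypeScript. This is not the Lambda function shipped with the middleware: `ScheduledDocument`, `onSchedule`, the `publish` callback standing in for the output topic, and the placeholder URL are all hypothetical names chosen for the example.

```typescript
// Hypothetical, simplified shape of a document published downstream.
interface ScheduledDocument {
  url: string;
  mimeType: string;
}

// Stand-in for publishing one event to the output topic.
type Publish = (doc: ScheduledDocument) => void;

// Hypothetical placeholder sent when no documents are configured.
const PLACEHOLDER: ScheduledDocument = {
  url: 'scheduler://placeholder', // illustrative value only
  mimeType: 'application/json+scheduler'
};

// Called each time the schedule is reached: publishes one event per
// configured document, or a single placeholder if there are none.
const onSchedule = (documents: ScheduledDocument[], publish: Publish): number => {
  const toPublish = documents.length > 0 ? documents : [PLACEHOLDER];
  toPublish.forEach((doc) => publish(doc));
  return toPublish.length;
};
```

With two configured documents, each schedule tick produces two events; with none, it produces exactly one placeholder event.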
Properties
Supported Inputs
This middleware does not accept any inputs from other middlewares.
Supported Outputs
| Mime Type | Description |
| --- | --- |
| Variant | When specifying documents to the Scheduler, it will attempt to infer the mime-types associated with these documents. If no documents are specified, the Scheduler will send a placeholder document to the next middlewares having a mime-type of `application/json+scheduler`. |
Supported Compute Types
| Type | Description |
| --- | --- |
| CPU | This middleware is based on a Lambda architecture. |
Examples
- Article Curation Pipeline - Builds a pipeline converting HTML articles into plain text.