Spark EMR Serverless job

An Amazon EMR Serverless Spark job orchestrated through an AWS Step Functions state machine.

Overview

The construct creates an AWS Step Functions state machine that submits a Spark job and orchestrates its lifecycle. It uses the AWS SDK service integrations to submit the job. The state machine can optionally take a cron expression to trigger the job on a schedule. The diagram below shows the state machine:

Figure: Spark EMR Serverless Job state machine
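
As a rough illustration of what orchestrating the job lifecycle involves, the sketch below models a submit-then-poll loop in plain TypeScript. It is not the construct's actual state machine definition, which this page does not show. The names `nextStep` and `runLifecycle` are hypothetical, and the list of EMR Serverless job run states may not be exhaustive.

```typescript
// Illustrative model only, not the state machine the construct generates.

// Job run states as reported by EMR Serverless (list may not be exhaustive).
type JobRunState =
  | 'SUBMITTED' | 'PENDING' | 'SCHEDULED' | 'RUNNING'
  | 'SUCCESS' | 'FAILED' | 'CANCELLING' | 'CANCELLED';

// What the orchestrator does after reading the current job state.
type NextStep = 'WAIT_AND_POLL' | 'SUCCEED' | 'FAIL';

function nextStep(state: JobRunState): NextStep {
  switch (state) {
    case 'SUCCESS':
      return 'SUCCEED';
    case 'FAILED':
    case 'CANCELLED':
      return 'FAIL';
    default:
      // Any non-terminal state: wait, then check the job again.
      return 'WAIT_AND_POLL';
  }
}

// Walk through a hypothetical sequence of polled states until a terminal one.
function runLifecycle(polledStates: JobRunState[]): { outcome: NextStep; polls: number } {
  let polls = 0;
  for (const state of polledStates) {
    polls += 1;
    const step = nextStep(state);
    if (step !== 'WAIT_AND_POLL') {
      return { outcome: step, polls };
    }
  }
  // The job has not reached a terminal state yet.
  return { outcome: 'WAIT_AND_POLL', polls };
}
```

Step Functions expresses the same idea declaratively with task, wait, and choice states, so the polling loop lives in the state machine rather than in application code.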

Usage

The example stack below shows how to use the SparkEmrServerlessJob construct. The stack also contains a SparkEmrServerlessRuntime to show how to create an EMR Serverless application, pass it to the Spark job, and use it as the runtime for the job.

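import * as cdk from 'aws-cdk-lib';
import { CfnOutput } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
import * as dsf from '@cdklabs/aws-data-solutions-framework';

class ExampleSparkJobEmrServerlessStack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Assumed setup (missing from the original snippet): the EMR Serverless
    // application used as the job runtime, and the role the job runs with.
    const runtime = new dsf.processing.SparkEmrServerlessRuntime(this, 'SparkRuntime', {
      name: 'mySparkRuntime',
    });
    const executionRole = dsf.processing.SparkEmrServerlessRuntime.createExecutionRole(this, 'ExecutionRole');

    const nightJob = new dsf.processing.SparkEmrServerlessJob(this, 'PiJob', {
      applicationId: runtime.application.attrApplicationId,
      name: 'PiCalculation',
      executionRole: executionRole,
      executionTimeout: cdk.Duration.minutes(15),
      s3LogBucket: Bucket.fromBucketName(this, 'LogBucket', 'emr-job-logs-EXAMPLE'),
      s3LogPrefix: 'logs',
      sparkSubmitEntryPoint: 'local:///usr/lib/spark/examples/src/main/python/pi.py',
    });

    // Expose the ARN of the state machine that runs the job
    new CfnOutput(this, 'job-state-machine', {
      value: nightJob.stateMachine!.stateMachineArn,
    });
  }
}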

Using the EMR Serverless StartJobRun parameters

The SparkEmrServerlessJobProps interface provides a simple abstraction to create an EMR Serverless job. For finer control over the job configuration, you can instead use the SparkEmrServerlessJobApiProps interface, which exposes the same parameters as the StartJobRun API from EMR Serverless.
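
To give a sense of how the simple properties relate to a StartJobRun request, here is a minimal sketch in plain TypeScript. The helper `toStartJobRunRequest` and the `SimpleJobProps` and `StartJobRunRequest` types are hypothetical and not part of the library. The field names follow the EMR Serverless StartJobRun API reference, and the exact shape and key casing expected by SparkEmrServerlessJobApiProps may differ. The application ID and role ARN are placeholders.

```typescript
// Hypothetical types and helper for illustration; not part of the DSF library.
interface SimpleJobProps {
  applicationId: string;
  name: string;
  executionRoleArn: string;
  executionTimeoutMinutes: number;
  s3LogUri: string;
  sparkSubmitEntryPoint: string;
  sparkSubmitParameters?: string;
}

// Subset of the StartJobRun request body, using the API's field names.
interface StartJobRunRequest {
  applicationId: string;
  name: string;
  executionRoleArn: string;
  executionTimeoutMinutes: number;
  jobDriver: {
    sparkSubmit: { entryPoint: string; sparkSubmitParameters?: string };
  };
  configurationOverrides: {
    monitoringConfiguration: {
      s3MonitoringConfiguration: { logUri: string };
    };
  };
}

// Map the flat, simplified properties onto the nested API structure.
function toStartJobRunRequest(p: SimpleJobProps): StartJobRunRequest {
  return {
    applicationId: p.applicationId,
    name: p.name,
    executionRoleArn: p.executionRoleArn,
    executionTimeoutMinutes: p.executionTimeoutMinutes,
    jobDriver: {
      sparkSubmit: {
        entryPoint: p.sparkSubmitEntryPoint,
        sparkSubmitParameters: p.sparkSubmitParameters,
      },
    },
    configurationOverrides: {
      monitoringConfiguration: {
        s3MonitoringConfiguration: { logUri: p.s3LogUri },
      },
    },
  };
}

// Usage with the values from the example stack (IDs and ARN are placeholders).
const request = toStartJobRunRequest({
  applicationId: 'APPLICATION_ID_PLACEHOLDER',
  name: 'PiCalculation',
  executionRoleArn: 'arn:aws:iam::111122223333:role/EXAMPLE-execution-role',
  executionTimeoutMinutes: 15,
  s3LogUri: 's3://emr-job-logs-EXAMPLE/logs',
  sparkSubmitEntryPoint: 'local:///usr/lib/spark/examples/src/main/python/pi.py',
});
console.log(JSON.stringify(request, null, 2));
```

Working at the API level trades the convenience of the flat properties for access to every StartJobRun option, such as application configuration overrides.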