EMR Runtime with Spark Operator


In this post, we will learn how to deploy Amazon EKS with the EMR Spark Operator and run a sample Spark job with the EMR runtime.

In this example, you will provision the following resources, which are required to run Spark applications using the Spark Operator and the EMR runtime:

  • EKS cluster control plane with a public endpoint (for demo purposes only)
  • Two managed node groups
    • Core node group spanning 3 AZs for running system-critical pods, e.g., Cluster Autoscaler, CoreDNS, observability, and logging
    • Spark node group in a single AZ for running Spark jobs
  • One data team (emr-data-team-a)
    • New namespace for the team
    • New IAM execution role for the team
  • IAM policy for emr-data-team-a
  • Spark History Server Live UI, configured for monitoring running Spark jobs through an NLB and the NGINX ingress controller
  • The following Kubernetes add-ons
    • Managed add-ons
      • VPC CNI, CoreDNS, kube-proxy, AWS EBS CSI Driver
    • Self-managed add-ons
      • Metrics Server with HA, CoreDNS Cluster Proportional Autoscaler, Cluster Autoscaler, Prometheus Server and Node Exporter, AWS for Fluent Bit, CloudWatch metrics for EKS
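Once these resources exist, a job ties them together at submission time. As a rough sketch (the application name, image URI, file path, and Spark version below are illustrative assumptions, not values created by this deployment), a SparkApplication targeting the team namespace and the EMR runtime image might look like:

```yaml
# Hypothetical SparkApplication using the EMR Spark runtime image.
# Image URI, application file, and version are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: emr-data-team-a          # namespace created for the data team
spec:
  type: Python
  mode: cluster
  image: <your-emr-runtime-image-uri> # EMR-provided Spark runtime image
  mainApplicationFile: local:///usr/lib/spark/examples/src/main/python/pi.py
  sparkVersion: "3.3.1"
  driver:
    cores: 1
    memory: 2g
    serviceAccount: emr-data-team-a   # mapped to the team's IAM execution role via IRSA
  executor:
    cores: 1
    instances: 2
    memory: 2g
```

The driver's service account is what connects a job to the team's IAM execution role, so each team's pods get only that team's AWS permissions.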

EMR Spark Operator
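The Spark Operator manages the lifecycle of Spark applications on Kubernetes through the SparkApplication custom resource. As a minimal sketch of an operator install (the upstream Kubeflow chart is shown for illustration; the EMR-distributed operator chart is pulled from an AWS-owned ECR registry and requires ECR authentication):

```shell
# Illustrative Helm install of a Spark operator.
# The EMR-packaged chart name, version, and registry differ from this.
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace
```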


Deploying the Solution
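A typical deployment flow for a Data on EKS blueprint, assuming it is provisioned with Terraform (the repository URL is the awslabs Data on EKS repo; the blueprint directory is left as a placeholder):

```shell
# Clone the blueprints repository and change into this example's directory.
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/<path-to-this-blueprint>

# Initialize providers/modules, then create the resources listed above.
terraform init
terraform apply   # review the plan before confirming
```

The apply step can take 20–30 minutes for a full EKS cluster with add-ons, so expect to wait before the cluster is usable.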


Execute a Sample Spark Job with Karpenter
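With the cluster running, a job is submitted by applying a SparkApplication manifest and watching the operator schedule the driver and executors (the manifest file and application name below are illustrative):

```shell
# Hypothetical submission of a sample job into the team namespace.
kubectl apply -f pyspark-pi.yaml

# Watch the driver and executor pods come up.
kubectl get pods -n emr-data-team-a -w

# Check the application status reported by the Spark Operator.
kubectl get sparkapplication pyspark-pi -n emr-data-team-a
```

While the job runs, its progress can also be followed in the Spark History Server Live UI exposed through the NLB and NGINX ingress described above.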




To avoid unwanted charges to your AWS account, delete all the AWS resources created during this deployment.
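A typical cleanup, assuming the blueprint was deployed with Terraform, tears things down in reverse order so that pods and the ingress-created load balancer are removed before the infrastructure:

```shell
# Delete running Spark jobs first so their pods and any
# job-created Kubernetes resources are cleaned up.
kubectl delete sparkapplication --all -n emr-data-team-a

# Destroy the Terraform-managed infrastructure from the blueprint directory.
terraform destroy
```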