
Spark on EKS Stack

Production-ready Apache Spark examples and configurations for Amazon EKS. Choose from infrastructure deployment and advanced use cases.

Getting Started

1. Deploy Infrastructure

Start with the infrastructure deployment guide to set up your Spark on EKS foundation

2. Choose Your Use Case

Select the example that matches your storage and performance requirements

3. Follow Instructions

Each example provides step-by-step deployment and verification guides

4. Customize

Adapt the configurations for your specific workload and performance needs

💿

EBS Dynamic PVC Storage

Production-ready EBS Dynamic PVC with fault tolerance, PVC reuse, and automatic volume provisioning for Spark shuffle storage

Storage, Performance
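
As a rough illustration of what dynamic PVC provisioning with reuse involves on the Spark side, the sketch below assembles the relevant Spark-on-Kubernetes properties and renders them as `spark-submit` arguments. The volume name `spark-local-dir-1`, the storage class `gp3`, the size, and the mount path are assumptions chosen for illustration; the helper names `dynamic_pvc_conf` and `to_submit_args` are hypothetical and not part of this stack.

```python
# Minimal sketch: Spark properties for on-demand executor PVCs used as shuffle storage.
# Assumed values (not from the guide): storage class "gp3", 100Gi size, /data1 mount path.


def dynamic_pvc_conf(storage_class="gp3", size="100Gi", mount_path="/data1"):
    """Return Spark properties requesting one on-demand PVC per executor."""
    # A volume name starting with "spark-local-dir-" tells Spark to use it as local scratch space.
    prefix = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"
    return {
        # "OnDemand" asks Spark to create the claim instead of binding to an existing one.
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
        # Let the driver own the claims so they can be reused when an executor is replaced.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
    }


def to_submit_args(conf):
    """Flatten a property dict into a list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_pvc_conf())))
```

The storage class must exist in the cluster and be backed by the EBS CSI driver; the example guide itself is the reference for the values this stack actually uses.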
💾

EBS Node Storage

Cost-effective shared EBS volume per node for Spark shuffle storage. Roughly 70% lower cost than per-pod PVCs, with the trade-off that pods on the same node can contend for the volume's I/O (noisy neighbors)

Storage, Optimization

NVMe Instance Storage

Leverage instance store NVMe SSDs for maximum I/O performance and cost optimization with local data processing

Storage, Performance
🎯

YuniKorn Gang Scheduling

Apache YuniKorn gang scheduling allocates resources for a Spark job atomically: the driver and its executors are placed together or not at all. This prevents resource fragmentation and avoids deadlocks between partially scheduled jobs

Performance, Optimization
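
To make the atomic allocation idea concrete, the sketch below builds the pod annotations that YuniKorn's gang scheduling reads from a driver pod: a list of task groups, each with a minimum member count and minimum resources. The group names, resource sizes, and timeout are assumptions for illustration, and `driver_gang_annotations` is a hypothetical helper; the annotation keys follow the YuniKorn gang scheduling documentation as I understand it and should be checked against the version you deploy.

```python
# Minimal sketch: YuniKorn gang-scheduling annotations for a Spark driver pod.
# Assumed values (not from the guide): group names, CPU/memory sizes, 60s placeholder timeout.
import json


def driver_gang_annotations(executors, exec_cpu="4", exec_mem="8Gi",
                            driver_cpu="1", driver_mem="2Gi", timeout_s=60):
    """Return annotations declaring one driver group and one executor group."""
    task_groups = [
        {
            "name": "spark-driver",
            "minMember": 1,
            "minResource": {"cpu": driver_cpu, "memory": driver_mem},
        },
        {
            "name": "spark-executor",
            # The job is only started once this many executor slots are reserved.
            "minMember": executors,
            "minResource": {"cpu": exec_cpu, "memory": exec_mem},
        },
    ]
    return {
        # The group this pod itself belongs to.
        "yunikorn.apache.org/task-group-name": "spark-driver",
        # The full gang definition, serialized as JSON.
        "yunikorn.apache.org/task-groups": json.dumps(task_groups),
        # "Hard" style: fail the application if the gang cannot be reserved before the timeout.
        "yunikorn.apache.org/schedulingPolicyParameters":
            f"placeholderTimeoutInSeconds={timeout_s} gangSchedulingStyle=Hard",
    }


if __name__ == "__main__":
    for key, value in driver_gang_annotations(executors=4).items():
        print(f"{key}: {value}")
```

Executor pods would carry only the `task-group-name` annotation, set to the executor group's name, so that they replace the placeholders reserved for that group.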
🗄️

Mountpoint for Amazon S3

High-throughput file interface for S3 that exposes a bucket as a local mount supporting common file operations (not full POSIX semantics). Optimized for large-scale data processing workloads

Storage, Performance

S3 Express One Zone

Ultra-fast S3 storage class with single-digit millisecond latency. Purpose-built for high-performance analytics workloads

Storage, Performance
📊

Spark Observability

Production-grade monitoring with Prometheus, Grafana, and Spark History Server. Native PrometheusServlet metrics integration

Infrastructure, Guide
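
The native PrometheusServlet integration mentioned above is switched on through Spark's metrics properties. The sketch below collects the properties I would expect to be involved and renders them in `spark-defaults.conf` style; the metrics path is Spark's conventional one but is treated here as an assumption, and the helper names `prometheus_servlet_conf` and `to_properties` are hypothetical.

```python
# Minimal sketch: Spark properties enabling the built-in PrometheusServlet metrics sink.
# Assumed value (not from the guide): the "/metrics/prometheus" path.


def prometheus_servlet_conf(path="/metrics/prometheus"):
    """Return Spark properties that expose metrics in Prometheus text format."""
    return {
        # Exposes executor metrics through the driver UI in Prometheus format.
        "spark.ui.prometheus.enabled": "true",
        # Registers the servlet sink for all metric instances ("*").
        "spark.metrics.conf.*.sink.prometheusServlet.class":
            "org.apache.spark.metrics.sink.PrometheusServlet",
        "spark.metrics.conf.*.sink.prometheusServlet.path": path,
    }


def to_properties(conf):
    """Render a property dict as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(to_properties(prometheus_servlet_conf()))
```

Prometheus still needs a scrape target for the driver pod (for example via pod annotations or a PodMonitor), which the observability guide itself covers.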
🔄

Apache Beam Pipelines

Run portable Apache Beam pipelines on Spark. Write once, run anywhere: batch and streaming share a unified programming model

Performance, Guide
🌐

IPv6 Networking

Deploy Spark on an IPv6-enabled EKS cluster for modern cloud networking.

Infrastructure, Guide