Spark on EKS Stack

Production-ready Apache Spark examples and configurations for Amazon EKS. Start with the infrastructure deployment guide, then pick from the advanced use cases below.

Getting Started

Deploy Infrastructure

Start with the infrastructure deployment guide to set up your Spark on EKS foundation.

Choose Your Use Case

Select the example that matches your storage and performance requirements.

Follow Instructions

Each example provides step-by-step deployment and verification guides.

Customize

Adapt the configurations to your specific workload and performance needs.

🏗️

Infrastructure Deployment

Complete infrastructure deployment guide with configuration options and customization for Spark on EKS

Infrastructure, Guide

💿

EBS Dynamic PVC Storage

Production-ready EBS Dynamic PVC with fault tolerance, PVC reuse, and automatic volume provisioning for Spark shuffle storage

Storage, Performance
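
As a rough sketch of what the dynamic PVC setup described above can look like, the Python helper below builds the Spark properties that request an on-demand PVC per executor and let the driver own and reuse those claims. The property names are standard Spark-on-Kubernetes settings. The helper names (`dynamic_pvc_conf`, `to_submit_args`), the volume name `spark-local-dir-1`, the `gp3` storage class, the `100Gi` size, and the `/data1` mount path are illustrative assumptions, not values taken from this stack.

```python
def dynamic_pvc_conf(storage_class="gp3", size="100Gi", mount_path="/data1"):
    """Return Spark properties for per-executor dynamic PVCs.

    All default values are illustrative assumptions, not the stack's settings.
    """
    # A volume name starting with "spark-local-dir-" makes Spark use the
    # volume as local scratch space (shuffle and spill files).
    prefix = (
        "spark.kubernetes.executor.volumes."
        "persistentVolumeClaim.spark-local-dir-1"
    )
    return {
        # "OnDemand" asks Spark to create one PVC per executor pod.
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
        # The driver owns the PVCs so they can outlive a failed executor
        # and be reattached to its replacement.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style '--conf k=v' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_pvc_conf())))
```

The same dictionary could be placed under `sparkConf` in a SparkApplication manifest instead of being passed on the command line, depending on how the job is submitted.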

💾

EBS Node Storage

Cost-effective shared EBS volume per node for Spark shuffle storage. Roughly 70% lower cost than per-pod PVCs, with potential noisy-neighbor contention as the trade-off.

Storage, Optimization

⚡

NVMe Instance Storage

Leverage instance store NVMe SSDs for maximum I/O performance and cost optimization with local data processing

Storage, Performance

🚀

Graviton NVMe Storage

ARM64 Graviton processors with NVMe SSDs for superior price-performance. Up to 40% cost savings with maximum I/O performance

Storage, Performance, Optimization

🎯

YuniKorn Gang Scheduling

Apache YuniKorn gang scheduling ensures atomic resource allocation for Spark jobs. Prevents resource fragmentation and eliminates deadlocks

Performance, Optimization

🗄️

Mountpoint for Amazon S3

High-performance file interface for S3 that supports common POSIX-style file operations. Optimized for large-scale data processing workloads.

Storage, Performance

⚡

S3 Express One Zone

Ultra-fast S3 storage class with single-digit millisecond latency. Purpose-built for high-performance analytics workloads

Storage, Performance

📊

S3 Tables with Iceberg

Step-by-step deployment of S3 Tables with Spark. Includes ACID transactions, time travel, schema evolution, and JupyterHub integration

Storage, Guide, Optimization

📊

Spark Observability

Production-grade monitoring with Prometheus, Grafana, and Spark History Server. Native PrometheusServlet metrics integration

Infrastructure, Guide
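
To make the native PrometheusServlet integration mentioned above more concrete, here is a minimal sketch of the Spark properties that typically enable it. The property names come from Spark's monitoring configuration. The helper name `prometheus_servlet_conf` is hypothetical, and the `/metrics/prometheus` path is an assumed default that may differ from what this stack configures.

```python
def prometheus_servlet_conf(path="/metrics/prometheus"):
    """Return Spark properties that expose metrics in Prometheus format.

    The default path is an illustrative assumption.
    """
    sink = "spark.metrics.conf.*.sink.prometheusServlet"
    return {
        # Publishes executor metrics on the driver UI in Prometheus format.
        "spark.ui.prometheus.enabled": "true",
        # Registers the built-in servlet sink for all metric instances.
        f"{sink}.class": "org.apache.spark.metrics.sink.PrometheusServlet",
        # URL path under the Spark UI port where the sink is served.
        f"{sink}.path": path,
    }


if __name__ == "__main__":
    for key, value in sorted(prometheus_servlet_conf().items()):
        print(f"{key}={value}")
```

Prometheus would then need a scrape target pointing at the driver's UI port and the chosen path. How that target is discovered (pod annotations, a PodMonitor, or static config) depends on the cluster's Prometheus setup.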

🔄

Apache Beam Pipelines

Run portable Apache Beam pipelines on Spark. Write once, run anywhere for batch and streaming with a unified programming model.

Performance, Guide

🌐

IPv6 Networking

Deploy Spark on an IPv6-enabled EKS cluster for modern cloud networking.

Infrastructure, Guide
