Spark on EKS Stack
Production-ready Apache Spark examples and configurations for Amazon EKS. Start with the infrastructure deployment, then pick the advanced use case that fits your workload.
Getting Started
Deploy Infrastructure
Start with the infrastructure deployment guide to set up your Spark on EKS foundation
Choose Your Use Case
Select the example that matches your storage and performance requirements
Follow Instructions
Each example provides step-by-step deployment and verification guides
Customize
Adapt the configurations for your specific workload and performance needs
Infrastructure Deployment
Complete infrastructure deployment guide with configuration options and customization for Spark on EKS
EBS Dynamic PVC Storage
Production-ready EBS Dynamic PVC with fault tolerance, PVC reuse, and automatic volume provisioning for Spark shuffle storage
EBS Node Storage
Cost-effective shared EBS volume per node for Spark shuffle storage. Roughly 70% lower cost than per-pod PVCs, with a potential noisy-neighbor trade-off between pods on the same node
NVMe Instance Storage
Use instance store NVMe SSDs as node-local Spark storage for maximum I/O performance and lower storage cost
Graviton NVMe Storage
ARM64 Graviton processors with NVMe SSDs for superior price-performance. Up to 40% cost savings with maximum I/O performance
YuniKorn Gang Scheduling
Apache YuniKorn gang scheduling gives Spark jobs all-or-nothing resource allocation. Prevents resource fragmentation and avoids deadlocks caused by partially scheduled jobs
Mountpoint for Amazon S3
High-performance file interface for S3 that supports common POSIX-style file operations (not full POSIX semantics). Optimized for large-scale data processing workloads
S3 Express One Zone
Ultra-fast S3 storage class with single-digit millisecond latency. Purpose-built for high-performance analytics workloads
S3 Tables with Iceberg
Step-by-step deployment of S3 Tables with Spark. Includes ACID transactions, time travel, schema evolution, and JupyterHub integration
Spark Observability
Production-grade monitoring with Prometheus, Grafana, and Spark History Server. Native PrometheusServlet metrics integration
Apache Beam Pipelines
Run portable Apache Beam pipelines on Spark. Write once, run anywhere for batch and streaming with a unified programming model
IPv6 Networking
Deploy Spark on an IPv6-enabled EKS cluster for modern cloud networking
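
As a rough illustration of what the EBS Dynamic PVC Storage example listed above involves, the sketch below builds the Spark configuration for on-demand executor PVCs with PVC reuse. The property names are standard Spark-on-Kubernetes settings; the helper names (`dynamic_pvc_conf`, `to_submit_args`) and the default volume name, storage class, size, and mount path are assumptions chosen for illustration, not values taken from the example itself.

```python
# Minimal sketch: build spark-submit --conf flags for on-demand PVC shuffle storage.
# The defaults (gp3, 100Gi, /data1) are illustrative assumptions; adjust to your cluster.


def dynamic_pvc_conf(volume="spark-local-dir-1", storage_class="gp3",
                     size="100Gi", mount_path="/data1"):
    """Return Spark properties that request one dynamically provisioned PVC per executor.

    A volume name starting with "spark-local-dir-" tells Spark to use the
    mount as local (shuffle/spill) storage.
    """
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume}"
    return {
        # "OnDemand" asks Spark to create a PVC for each executor pod.
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
        # Let the driver own the PVCs and hand them to replacement executors.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
    }


def to_submit_args(conf):
    """Flatten a property dict into a list of spark-submit arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_pvc_conf())))
```

The resulting arguments can be appended to a `spark-submit` command, or the same key/value pairs can be placed under `sparkConf` in a Spark Operator manifest; the storage class must already exist in the cluster.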