Build Production-Grade Data Platforms in Minutes
Deploy enterprise-scale data infrastructure on Amazon EKS with CNCF-native patterns, battle-tested configurations, and day-1 operational readiness.
What is Data on EKS?
Data on EKS (DoEKS) is an opinionated infrastructure framework that combines the power of Cloud Native Computing Foundation (CNCF) projects with deep AWS integration to deliver production-ready data platforms on Amazon EKS.
CNCF Ecosystem
AWS Integration
Infrastructure as Code
Production Patterns
Design Principles
Opinionated, Not Prescriptive
Pre-configured with sane defaults for 80% of use cases. Fully customizable via Terraform variables and Helm values. Modular architecture: adopt components incrementally.
Security by Default
IAM Roles for Service Accounts (IRSA) for pod-level permissions. Private VPC with NAT gateways and security groups. Encryption at rest (EBS/EFS with KMS).
Cost-Optimized
Karpenter for intelligent node provisioning (Spot + On-Demand). Right-sized instance recommendations. Auto-scaling with predictive capacity planning.
Observable from Day 1
Prometheus + Grafana with pre-built dashboards. Application-specific metrics (Spark, Flink, Kafka). CloudWatch integration for AWS-native tooling.
Layered Architecture Pattern
Data Stacks (Application Layer)
Spark, Kafka, Airflow configurations • Custom Helm values • Job examples and blueprints
Infrastructure Modules (Platform Layer)
Terraform modules (/infra/terraform/) • EKS, VPC, IAM, Karpenter • ArgoCD addon deployment
AWS Foundation (Cloud Layer)
Amazon EKS managed control plane • EC2, VPC, S3, CloudWatch • IAM, KMS, Secrets Manager
Data Stacks Catalog
Production-ready data platforms with complete deployment guides, examples, and best practices
Processing
Streaming
Orchestration
Quick Start: Deploy in 15 Minutes
Prerequisites
Required Tools:
# Verify installations
aws --version # AWS CLI v2.x
terraform --version # Terraform >= 1.0
kubectl version # kubectl >= 1.28
helm version # Helm >= 3.0
AWS Setup:
- IAM permissions: `AdministratorAccess` or a custom policy that allows VPC/EKS/IAM creation
- Configured AWS profile: `aws configure` or the `AWS_PROFILE` environment variable
- Default region set (recommended: `us-west-2`, `us-east-1`, `eu-west-1`)
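A minimal sketch of that setup is below; the profile name `my-profile` is a placeholder, not something the repository provides.
# Point the AWS CLI at the account and region you will deploy into
# ("my-profile" is a placeholder -- substitute your own profile)
export AWS_PROFILE=my-profile
export AWS_REGION=us-west-2
# Confirm the identity and region Terraform will use
aws sts get-caller-identity
aws configure get region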
Choose Your Data Stack
| Need | Recommended Stack |
|---|---|
| Batch ETL / data processing | Spark on EKS |
| Real-time event streaming | Kafka on EKS |
| Workflow orchestration | Airflow on EKS |
| AWS-managed Spark | EMR on EKS |
| Vector databases & AI agents | AI for Data |
Clone & Configure
# Clone repository
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/data-stacks/spark-on-eks
# Review configuration
cat terraform/data-stack.tfvars
Key Configuration Options:
# terraform/data-stack.tfvars
# Required
region = "us-west-2" # AWS region
name = "spark-on-eks" # Cluster name (must be unique)
# Core addons (recommended for all stacks)
enable_karpenter = true # Node autoscaling
enable_aws_load_balancer_controller = true # ALB/NLB support
enable_kube_prometheus_stack = true # Monitoring
# Spark-specific addons
enable_spark_operator = true # Spark Operator (Kubeflow)
enable_spark_history_server = true # Spark UI persistence
enable_yunikorn = true # Advanced scheduling
Deploy Infrastructure
# Automated deployment (validates prerequisites)
./deploy.sh
# Manual deployment (for advanced users)
cd terraform
terraform init
terraform plan -var-file=data-stack.tfvars
terraform apply -var-file=data-stack.tfvars -auto-approve
Deployment Timeline:
- Terraform apply: ~10-12 minutes (VPC, EKS, IAM, Karpenter)
- ArgoCD sync: ~3-5 minutes (Kubernetes addons)
- Total: ~15 minutes for full stack
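If you want to watch the addon rollout while you wait, the commands below assume ArgoCD was installed into the usual `argocd` namespace:
# List ArgoCD-managed applications and their sync status (assumes the argocd namespace)
kubectl get applications -n argocd
# Follow addon pods until they report Running/Ready
kubectl get pods -A --watch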
Validate Deployment
# Configure kubectl
export CLUSTER_NAME=spark-on-eks
export AWS_REGION=us-west-2
aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME
# Verify nodes (Karpenter-managed)
kubectl get nodes
# Check addon deployments
kubectl get pods -A
# Verify Spark Operator
kubectl get crd sparkapplications.sparkoperator.k8s.io
kubectl get pods -n spark-operator
Run Example Workload
# Submit Spark Pi calculation job
kubectl apply -f examples/pyspark-pi-job.yaml
# Watch job progress
kubectl get sparkapplications -w
# View driver logs
kubectl logs spark-pi-driver
# Check Spark History Server (if enabled)
kubectl port-forward -n spark-operator svc/spark-history-server 18080:80
# Open: http://localhost:18080
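For orientation, a SparkApplication for the Kubeflow Spark Operator has roughly the shape sketched below. This is not the contents of `examples/pyspark-pi-job.yaml`; the namespace, service account, and image tag are assumptions to adjust for your cluster.
# Minimal SparkApplication sketch (illustrative values, not the shipped example)
cat <<'EOF' | kubectl apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default               # assumption: use the namespace your operator watches
spec:
  type: Python
  mode: cluster
  image: apache/spark:3.5.1        # assumption: any image that bundles PySpark
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark          # assumption: a service account with IRSA permissions
  executor:
    instances: 2
    cores: 1
    memory: 1g
EOF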
CNCF Ecosystem Integration
Data on EKS is CNCF-native, using cloud-native patterns while optimizing for AWS:
| CNCF Project | Maturity | Role in DoEKS | AWS Alternative |
|---|---|---|---|
| Kubernetes | Graduated | Container orchestration | Amazon EKS (managed K8s) |
| Prometheus | Graduated | Metrics & alerting | Amazon Managed Prometheus (AMP) |
| Strimzi | Incubating | Kafka operator | Amazon MSK (managed Kafka) |
| Argo (CD/Workflows) | Graduated | GitOps & pipelines | AWS CodePipeline |
| Helm | Graduated | Package management | - |
| Karpenter | Kubernetes sub-project (SIG Autoscaling) | Node autoscaling | Cluster Autoscaler |
| Grafana | Observability partner | Visualization | Amazon Managed Grafana (AMG) |
Why CNCF + AWS?
Portability
Run anywhere Kubernetes runs (on-prem, multi-cloud)
Community Innovation
Benefit from thousands of contributors
AWS Optimization
Tight integration with EKS, S3, IAM, CloudWatch
Hybrid Workloads
Mix open-source (Spark) + managed (EMR on EKS)
Production Deployment Patterns
Multi-Environment Strategy
Development
- Small instances
- 100% Spot instances
- Minimal addons
- Single AZ
Staging
- Production-like size
- 70% Spot / 30% On-Demand
- Full addons enabled
- Multi-AZ
Production
- Right-sized instances
- 70% Spot / 30% On-Demand
- HA, observability, security
- Multi-AZ with consolidation
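One way to drive these environments from the same stack is a var file per environment; the `envs/` paths and workspace names below are illustrative, not part of the repository layout.
# Hypothetical per-environment var files (envs/*.tfvars are illustrative)
terraform workspace new dev || terraform workspace select dev
terraform apply -var-file=envs/dev.tfvars

terraform workspace new prod || terraform workspace select prod
terraform plan -var-file=envs/prod.tfvars    # review before applying to production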
Cost Optimization
Karpenter Best Practices (a NodePool sketch follows this list):
- Mix Spot (70%) + On-Demand (30%) for fault tolerance
- Use multiple instance families (M5, M6i, M6a) for Spot diversity
- Enable consolidation to reduce idle capacity
- Set appropriate limits per NodePool
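A NodePool that follows these practices might look like the sketch below; the name, instance families, limits, and consolidation settings are illustrative assumptions rather than the blueprint's shipped defaults.
# Illustrative Karpenter NodePool mixing Spot and On-Demand with consolidation
cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spark-compute                       # illustrative name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                       # assumption: an EC2NodeClass named "default"
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]     # mix Spot and On-Demand for fault tolerance
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "m6i", "m6a"]      # diversify instance families for Spot
  limits:
    cpu: "1000"                             # cap total capacity per NodePool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                    # reclaim idle capacity quickly
EOF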
Savings Realized:
- Karpenter vs Cluster Autoscaler: ~30% reduction in node costs
- Spot instances: ~70% savings vs On-Demand
- EBS gp3 vs gp2: ~20% savings on storage (see the StorageClass sketch below)
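To pick up the gp3 saving, point workloads at a gp3-backed StorageClass such as the sketch below; it assumes the EBS CSI driver addon is installed, and the class name is illustrative.
# gp3-backed StorageClass sketch (requires the EBS CSI driver; name is illustrative)
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-default
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"                # matches the encryption-at-rest default
volumeBindingMode: WaitForFirstConsumer
EOF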
Learning Resources
Tutorials
Step-by-step guides for deploying data platforms
Deep Dives
Advanced topics for production optimization
Benchmarks
Real-world performance testing results
Ready to Build Your Data Platform?
Join thousands of data engineers using Data on EKS to run production workloads on Amazon EKS. Deploy your first stack in 15 minutes.
Community & Support
Documentation
Comprehensive guides, tutorials, and API reference
GitHub Issues
Bug reports, feature requests, and discussions
Discussions
Q&A, show-and-tell, and community ideas
Built with ❤️ by AWS Solutions Architects and Community Contributors
Data on EKS is an open-source project maintained by the AWS community. Support is provided on a best-effort basis. This is not an official AWS service.
License: Apache 2.0 | Version: 2.0 (Current) | Last Updated: January 2025