

Build Production-Grade Data Platforms in Minutes

Deploy enterprise-scale data infrastructure on Amazon EKS with CNCF-native patterns, battle-tested configurations, and day-1 operational readiness.

Opinionated • Secure by Default • Cost-Optimized • Observable • CNCF-Native

What is Data on EKS?

Data on EKS (DoEKS) is an opinionated infrastructure framework that combines the power of Cloud Native Computing Foundation (CNCF) projects with deep AWS integration to deliver production-ready data platforms on Amazon EKS.

CNCF Ecosystem

Leverages graduated and incubating CNCF projects (Kubernetes, Prometheus, Strimzi, Argo)

AWS Integration

Deep integration with Amazon EKS, S3, EMR, and AWS data services

Infrastructure as Code

Terraform-based, GitOps-ready deployments with ArgoCD

Production Patterns

Battle-tested configurations from real AWS customer workloads

Design Principles

Opinionated, Not Prescriptive

Pre-configured with sane defaults that cover the common 80% of use cases. Fully customizable via Terraform variables and Helm values. Modular architecture: adopt components incrementally.
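For example, any of the addon flags used later in this guide can be flipped at apply time without editing the stack itself; a minimal sketch (the overrides file name is illustrative):

# Override a single default at apply time
terraform apply -var-file=data-stack.tfvars -var="enable_yunikorn=false"

# Or layer a separate overrides file on top of the defaults
terraform apply -var-file=data-stack.tfvars -var-file=my-overrides.tfvars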

Security by Default

IAM Roles for Service Accounts (IRSA) for pod-level permissions. Private VPC with NAT gateways and security groups. Encryption at rest (EBS/EFS with KMS).
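A quick way to confirm IRSA is in place is to read the role ARN annotation on an addon's service account; the namespace and service account names below are illustrative and vary by addon:

# Show the IAM role bound to a service account via IRSA
kubectl get serviceaccount spark-operator -n spark-operator \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'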

Cost-Optimized

Karpenter for intelligent node provisioning (Spot + On-Demand). Right-sized instance recommendations. Auto-scaling with predictive capacity planning.
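To see the Spot/On-Demand mix Karpenter has actually provisioned, list the standard node labels:

# Capacity type and instance type of Karpenter-managed nodes
kubectl get nodes -L karpenter.sh/capacity-type -L node.kubernetes.io/instance-type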

Observable from Day 1

Prometheus + Grafana with pre-built dashboards. Application-specific metrics (Spark, Flink, Kafka). CloudWatch integration for AWS-native tooling.
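The bundled Grafana can be reached locally with a port-forward; the namespace and service name below follow the kube-prometheus-stack chart defaults and may differ in your deployment:

# Access Grafana dashboards locally
kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-grafana 3000:80
# Open: http://localhost:3000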

Layered Architecture Pattern

Data Stacks (Application Layer)

Spark, Kafka, Airflow configurations • Custom Helm values • Job examples and blueprints

Infrastructure Modules (Platform Layer)

Terraform modules (/infra/terraform/) • EKS, VPC, IAM, Karpenter • ArgoCD addon deployment

AWS Foundation (Cloud Layer)

Amazon EKS managed control plane • EC2, VPC, S3, CloudWatch • IAM, KMS, Secrets Manager

Quick Start: Deploy in 15 Minutes

Step 1: Prerequisites

Required Tools:

# Verify installations
aws --version # AWS CLI v2.x
terraform --version # Terraform >= 1.0
kubectl version --client # kubectl >= 1.28
helm version # Helm >= 3.0

AWS Setup:

  • IAM permissions: AdministratorAccess or custom policy for VPC/EKS/IAM creation
  • Configured AWS profile: aws configure or AWS_PROFILE environment variable
  • Default region set (recommend: us-west-2, us-east-1, eu-west-1)
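For example, one way to satisfy the setup above (the profile name is illustrative):

# Configure credentials and confirm the identity and region Terraform will use
aws configure --profile doeks
export AWS_PROFILE=doeks
export AWS_REGION=us-west-2
aws sts get-caller-identity
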
Step 2: Choose Your Data Stack

Need                            Recommended Stack
Batch ETL / data processing     Spark on EKS
Real-time event streaming       Kafka on EKS
Workflow orchestration          Airflow on EKS
AWS-managed Spark               EMR on EKS
Vector databases & AI agents    AI for Data
Step 3: Clone & Configure

# Clone repository
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/data-stacks/spark-on-eks

# Review configuration
cat terraform/data-stack.tfvars

Key Configuration Options:

# terraform/data-stack.tfvars

# Required
region = "us-west-2" # AWS region
name = "spark-on-eks" # Cluster name (must be unique)

# Core addons (recommended for all stacks)
enable_karpenter = true # Node autoscaling
enable_aws_load_balancer_controller = true # ALB/NLB support
enable_kube_prometheus_stack = true # Monitoring

# Spark-specific addons
enable_spark_operator = true # Spark Operator (Kubeflow)
enable_spark_history_server = true # Spark UI persistence
enable_yunikorn = true # Advanced scheduling
Step 4: Deploy Infrastructure

# Automated deployment (validates prerequisites)
./deploy.sh

# Manual deployment (for advanced users)
cd terraform
terraform init
terraform plan -var-file=data-stack.tfvars
terraform apply -var-file=data-stack.tfvars -auto-approve

Deployment Timeline:

  • Terraform apply: ~10-12 minutes (VPC, EKS, IAM, Karpenter)
  • ArgoCD sync: ~3-5 minutes (Kubernetes addons)
  • Total: ~15 minutes for full stack
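To watch the addon sync converge, list the Argo CD Application objects directly; this assumes Argo CD runs in the default argocd namespace:

# Check addon sync status
kubectl get applications -n argocd

# Or watch until everything reports Synced/Healthy
kubectl get applications -n argocd -w
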
Step 5: Validate Deployment

# Configure kubectl
export CLUSTER_NAME=spark-on-eks
export AWS_REGION=us-west-2
aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME

# Verify nodes (Karpenter-managed)
kubectl get nodes

# Check addon deployments
kubectl get pods -A

# Verify Spark Operator
kubectl get crd sparkapplications.sparkoperator.k8s.io
kubectl get pods -n spark-operator
Step 6: Run Example Workload

# Submit Spark Pi calculation job
kubectl apply -f examples/pyspark-pi-job.yaml

# Watch job progress
kubectl get sparkapplications -w

# View driver logs
kubectl logs spark-pi-driver

# Check Spark History Server (if enabled)
kubectl port-forward -n spark-operator svc/spark-history-server 18080:80
# Open: http://localhost:18080

CNCF Ecosystem Integration

Data on EKS is CNCF-native, using cloud-native patterns while optimizing for AWS:

CNCF Project           Maturity                 Role in DoEKS              AWS Alternative
Kubernetes             Graduated                Container orchestration    Amazon EKS (managed K8s)
Prometheus             Graduated                Metrics & alerting         Amazon Managed Prometheus (AMP)
Strimzi                Incubating               Kafka operator             Amazon MSK (managed Kafka)
Argo (CD/Workflows)    Graduated                GitOps & pipelines         AWS CodePipeline
Helm                   Graduated                Package management         -
Karpenter              Graduated                Node autoscaling           Cluster Autoscaler
Grafana                Observability partner    Visualization              Amazon Managed Grafana (AMG)

Why CNCF + AWS?

Portability

Run anywhere Kubernetes runs (on-prem, multi-cloud)

Community Innovation

Benefit from thousands of contributors

AWS Optimization

Tight integration with EKS, S3, IAM, CloudWatch

Hybrid Workloads

Mix open-source (Spark) + managed (EMR on EKS)

Production Deployment Patterns

Multi-Environment Strategy

Development

  • Small instances
  • 100% Spot instances
  • Minimal addons
  • Single AZ

Staging

  • Production-like size
  • 70% Spot / 30% On-Demand
  • Full addons enabled
  • Multi-AZ

Production

  • Right-sized instances
  • 70% Spot / 30% On-Demand
  • HA, observability, security
  • Multi-AZ with consolidation
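One common way to express the three profiles above is a var file per environment applied to the same modules; the file names below are illustrative, not part of the repository layout:

# Same stack, different environment settings
terraform plan  -var-file=envs/dev.tfvars
terraform apply -var-file=envs/staging.tfvars
terraform apply -var-file=envs/prod.tfvars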

Cost Optimization

Karpenter Best Practices:

  • Mix Spot (70%) + On-Demand (30%) for fault tolerance
  • Use multiple instance families (M5, M6i, M6a) for Spot diversity
  • Enable consolidation to reduce idle capacity
  • Set appropriate limits per NodePool
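These settings can be confirmed on a running cluster by inspecting the Karpenter NodePool objects (the NodePool name below is illustrative):

# List NodePools and check limits and consolidation policy
kubectl get nodepools
kubectl get nodepool default -o jsonpath='{.spec.limits}{"\n"}{.spec.disruption.consolidationPolicy}{"\n"}'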

Savings Realized:

  • Karpenter vs Cluster Autoscaler: ~30% reduction in node costs
  • Spot instances: ~70% savings vs On-Demand
  • EBS gp3 vs gp2: ~20% savings on storage

Learning Resources

Ready to Build Your Data Platform?

Join thousands of data engineers using Data on EKS to run production workloads on Amazon EKS. Deploy your first stack in 15 minutes.

Community & Support


Built with ❤️ by AWS Solutions Architects and Community Contributors

Data on EKS is an open-source project maintained by the AWS community. Support is provided on a best-effort basis. This is not an official AWS service.

License: Apache 2.0 | Version: 2.0 (Current) | Last Updated: January 2025