Data on EKS

Supercharge your Data Journey with Amazon EKS

The comprehensive set of tools for running data workloads on Amazon EKS.
Build, deploy, and scale your data infrastructure with confidence.

30+ Ready-to-use Blueprints
700+ GitHub Stars
AWS Official Project

What is Data on EKS?

Data on EKS is an open-source, enterprise-ready framework for running scalable data platforms on Amazon EKS. It integrates open-source data tools with AWS infrastructure and provides Terraform and ArgoCD templates, proven blueprints, performance benchmarks, and best practices. Built for scale and resilience, it helps teams deploy and operate complex data workloads on EKS clusters with thousands of nodes.

Apache Spark
Apache Flink
Apache Kafka
Ray Data
Spark Operator
Flink Operator
Apache Airflow
Argo Workflows
Trino
ClickHouse
Apache Pinot
Apache Superset
Amazon EMR on EKS
Amazon EKS
AWS Batch
Amazon MWAA
Amazon Kinesis
Amazon S3

Data Analytics

Transform your data with enterprise-grade analytics solutions. Deploy Apache Spark, Ray, Dask, and Jupyter environments with production-ready configurations. Scale from terabytes to petabytes with confidence using battle-tested architectures.

Streaming Data Platforms

Build real-time data pipelines that never sleep. Process millions of events per second with Apache Kafka, Flink, and Kinesis. From IoT sensors to financial transactions, handle any streaming workload at any scale.

Amazon EMR on EKS

Run enterprise-grade Spark workloads on Kubernetes with Amazon EMR on EKS. Get optimized Spark runtime, automatic scaling, simplified job management, and seamless integration with AWS services for faster, more cost-effective big data processing.

Workflow Orchestration

Orchestrate complex data workflows with precision. Deploy Apache Airflow, Argo Workflows, and Amazon MWAA to automate ETL pipelines, ML training, and data quality checks. Never miss a dependency again.

Distributed Databases & Query Engines

Query anything, anywhere, anytime. Deploy Trino, Presto, and ClickHouse for lightning-fast analytics across data lakes, warehouses, and real-time streams. Join data across 50+ sources in milliseconds.

Featured Videos

AWS re:Invent 2023 - Data processing at massive scale on Amazon EKS

Dec 4, 2023

Data processing at massive scale with Spark on Amazon EKS, presented by Pinterest.

Containers from the Couch - Data on EKS (DoEKS)

Sep 21, 2023

In this demo-focused livestream, learn how to run Spark and AI/ML workloads on Amazon EKS.