Data on EKS

Supercharge your Data Journey with Amazon EKS

The comprehensive set of tools for running data workloads on Amazon EKS.
Build, deploy, and scale your data infrastructure with confidence.

30+ Ready-to-use Data Stacks
800+ GitHub Stars
AWS Official Project

What is Data on EKS?

Data on EKS is an open-source, enterprise-ready framework for running scalable data platforms on Amazon EKS. It integrates open-source data tools with AWS infrastructure and offers Terraform and ArgoCD templates, proven blueprints, performance benchmarks, and best practices. Built for scale and resilience, it helps teams deploy and operate complex data workloads on EKS at the scale of thousands of nodes.

Powered by Open Source & AWS

Production-ready data stacks on Amazon EKS

Processing: Apache Spark, Spark Operator, Amazon EMR on EKS, Ray Data

Streaming: Apache Kafka, Apache Flink, Flink Operator

Orchestration: Apache Airflow, Argo Workflows, Amazon MWAA, Kubeflow

Databases: PostgreSQL, ClickHouse, Apache Pinot

Query Engines: Trino, Presto

BI & Visualization: Apache Superset, Grafana

Notebooks & AI: JupyterHub, Milvus

Data Analytics

Transform your data with enterprise-grade analytics solutions. Deploy Apache Spark, Ray, Dask, and Jupyter environments with production-ready configurations. Scale from terabytes to petabytes with confidence using battle-tested architectures.
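
As a rough illustration of what deploying Spark on EKS with the Spark Operator involves, the sketch below builds a `SparkApplication` custom resource and prints it as JSON, a format `kubectl apply -f` accepts. It is a minimal sketch, not a Data on EKS blueprint: the namespace `spark-team-a`, the service account `spark`, the image name, the Spark version, and the resource sizes are placeholder assumptions. Check the fields against the `sparkoperator.k8s.io/v1beta2` CRD installed in your cluster.

```python
import json


def spark_application(name: str, image: str, main_file: str,
                      executors: int = 2, namespace: str = "spark-team-a") -> dict:
    """Build a minimal SparkApplication manifest for the Spark Operator.

    Namespace, service account, Spark version, and resource sizes are
    placeholder assumptions, not values taken from a Data on EKS blueprint.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",   # PySpark application
            "mode": "cluster",  # the driver runs as a pod inside the cluster
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 1, "memory": "4g"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        name="pyspark-pi",
        image="example.com/spark-py:3.5.0",  # hypothetical image; use your own
        main_file="local:///opt/spark/examples/src/main/python/pi.py",
    )
    # Redirect this output to a file and apply it with: kubectl apply -f <file>
    print(json.dumps(manifest, indent=2))
```

Once applied, the operator creates the driver pod, and the driver requests the configured number of executor pods.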

Streaming Data Platforms

Build real-time data pipelines that never sleep. Process millions of events per second with Apache Kafka, Flink, and Kinesis. From IoT sensors to financial transactions, handle any streaming workload at any scale.
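
Stream processors such as Flink commonly aggregate events over time windows. The toy sketch below counts events per key in fixed-size, non-overlapping (tumbling) windows over an in-memory list. The event shape `(timestamp_ms, key)` and the window size are assumptions for illustration. A real engine also handles unbounded input, out-of-order events, and checkpointed state, which this sketch does not.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in epoch milliseconds, key such as a sensor ID).
Event = Tuple[int, str]


def tumbling_window_counts(events: Iterable[Event], window_ms: int) -> Dict[int, Counter]:
    """Count events per key in non-overlapping windows of window_ms milliseconds.

    Returns a mapping from each window's start timestamp to its per-key counts.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, key in events:
        # Each event belongs to exactly one window: the one starting at ts rounded down.
        window_start = ts - ts % window_ms
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    sample = [(0, "sensor-a"), (400, "sensor-b"), (950, "sensor-a"), (1200, "sensor-a")]
    for start, counts in sorted(tumbling_window_counts(sample, 1000).items()):
        print(start, dict(counts))
```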

Amazon EMR on EKS

Run enterprise-grade Spark workloads on Kubernetes with Amazon EMR on EKS. Get optimized Spark runtime, automatic scaling, simplified job management, and seamless integration with AWS services for faster, more cost-effective big data processing.
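
With Amazon EMR on EKS, Spark jobs are submitted to a virtual cluster through the `StartJobRun` API (the `emr-containers` client in boto3). The sketch below only assembles the request body, so it runs without AWS credentials. The virtual cluster ID, IAM role ARN, S3 paths, Spark settings, and the `emr-7.0.0-latest` release label are placeholder assumptions; replace them with values that are valid in your account and Region.

```python
from typing import Optional


def emr_on_eks_job_request(virtual_cluster_id: str, role_arn: str, entry_point: str,
                           name: str = "spark-pi",
                           release_label: str = "emr-7.0.0-latest",
                           log_uri: Optional[str] = None) -> dict:
    """Assemble keyword arguments for the EMR on EKS StartJobRun API.

    All IDs, ARNs, paths, and the release label are placeholders.
    """
    request = {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                # Extra spark-submit flags; the sizes here are illustrative only.
                "sparkSubmitParameters": (
                    "--conf spark.executor.instances=2 "
                    "--conf spark.executor.memory=2G"
                ),
            }
        },
    }
    if log_uri is not None:
        # Optionally ship driver and executor logs to S3.
        request["configurationOverrides"] = {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        }
    return request


if __name__ == "__main__":
    import json

    print(json.dumps(emr_on_eks_job_request(
        virtual_cluster_id="abcdefghijklmno1234567890",               # hypothetical
        role_arn="arn:aws:iam::111122223333:role/emr-eks-job-role",  # hypothetical
        entry_point="s3://amzn-s3-demo-bucket/scripts/pi.py",        # hypothetical
        log_uri="s3://amzn-s3-demo-bucket/logs/",
    ), indent=2))
```

With credentials configured, the resulting dictionary could be passed as `boto3.client("emr-containers").start_job_run(**request)`.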

Workflow Orchestration

Orchestrate complex data workflows with precision. Deploy Apache Airflow, Argo Workflows, and Amazon MWAA to automate ETL pipelines, ML training, and data quality checks. Never miss a dependency again.
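
Orchestrators such as Airflow and Argo Workflows model a pipeline as a directed acyclic graph (DAG) and start a task only after everything upstream of it has finished. The toy sketch below shows that ordering rule with Python's standard-library `graphlib` (Python 3.9+). The task names are hypothetical, and this is not how either scheduler works internally; real schedulers also run independent tasks in parallel and handle retries and schedules.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical ETL/ML pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "validate": {"extract_orders", "extract_customers"},
    "transform": {"validate"},
    "train_model": {"transform"},
    "quality_report": {"transform"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task comes after all of its upstream tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```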

Distributed Databases & Query Engines

Query anything, anywhere, anytime. Deploy Trino, Presto, and ClickHouse for lightning-fast analytics across data lakes, warehouses, and real-time streams. Join data across 50+ sources in milliseconds.

Featured Videos

AWS re:Invent 2023 - Data processing at massive scale on Amazon EKS

Dec 4, 2023

How Pinterest processes data at massive scale with Spark on Amazon EKS.

Containers from the Couch - Data on EKS (DoEKS)

Sep 21, 2023

In this demo-focused livestream, learn how to run Spark and AI/ML workloads on Amazon EKS.