Introduction

Data on Amazon EKS (DoEKS) is a tool for building AWS-managed and self-managed scalable data platforms on Amazon EKS. With DoEKS, you have access to:

  1. Robust Deployment Infrastructure as Code (IaC) Templates using Terraform and AWS CDK, among others
  2. Best Practices for Deploying Data Solutions on Amazon EKS
  3. Detailed Performance Benchmark Reports
  4. Hands-on Samples of Apache Spark/ML Jobs and various other frameworks
  5. In-depth Reference Architectures and Data Blogs to keep you ahead of the curve

Architecture

The diagram shows the open source data tools, Kubernetes operators, and frameworks that run on Kubernetes and are covered in DoEKS. It also shows how AWS Data Analytics managed services integrate with the Data on EKS OSS tools.

Data on EKS.png

Main Features

🚀 EMR on EKS

🚀 Open Source Spark on EKS

🚀 Custom Kubernetes Schedulers (e.g., Apache YuniKorn, Volcano)

🚀 Job Schedulers (e.g., Apache Airflow, Argo Workflows)

🚀 AI/ML on Kubernetes (e.g., Kubeflow, MLflow, TensorFlow, PyTorch)

🚀 Distributed Databases (e.g., Cassandra, CockroachDB, MongoDB)

🚀 Streaming Platforms (e.g., Apache Kafka, Apache Flink, Apache Beam)

Getting Started

Check out the documentation for each section to deploy infrastructure and run sample Spark/ML jobs.