📄️ Introduction
Running data analytics tools on Kubernetes can provide a number of benefits for organizations looking to extract insights from large and complex data sets. Tools such as Apache Spark and Dask are designed to run on a cluster of machines, making them well suited for deployment on Kubernetes.
📄️ Spark Operator with YuniKorn
Introduction
📄️ Spark Operator on EKS with IPv6
This example showcases the Spark Operator running on Amazon EKS in IPv6 mode. The goal is to demonstrate running Spark workloads on an EKS IPv6 cluster.
📄️ S3 Tables with EKS
s3tables
📄️ Spark Observability on EKS
Introduction
📄️ Apache Beam on EKS
Apache Beam (Beam) is a flexible programming model for building batch and streaming data processing pipelines. With Beam, developers can write code once and run it on various execution engines, such as Apache Spark and Apache Flink. This flexibility allows organizations to leverage the strengths of different execution engines while maintaining a consistent codebase, reducing the complexity of managing multiple codebases and minimizing the risk of vendor lock-in.
📄️ DataHub on EKS
Introduction
📄️ Ray Data on EKS
What is Ray Data?
📄️ Superset on EKS
Introduction