📄️ Introduction
Running data analytics tools on Kubernetes offers several benefits to organizations that need to extract insights from large and complex data sets. Tools such as Apache Spark and Dask are designed to run on a cluster of machines, making them well suited for deployment on Kubernetes.
📄️ Spark Operator with YuniKorn
Introduction
📄️ S3 Tables with EKS
s3tables
📄️ Spark Observability on EKS
Introduction
📄️ Apache Beam on EKS
Apache Beam (Beam) is a flexible programming model for building batch and streaming data processing pipelines. With Beam, developers can write code once and run it on various execution engines, such as Apache Spark and Apache Flink. This flexibility lets organizations leverage the strengths of different execution engines while maintaining a single, consistent codebase, which reduces the complexity of managing multiple codebases and lowers the risk of vendor lock-in.
📄️ Ray Data on EKS
What is Ray Data?