📄️ Trino on EKS Best Practices
Deploying Trino on Amazon Elastic Kubernetes Service (EKS) delivers distributed query processing with cloud-native scalability. Organizations can optimize costs by selecting compute instances and storage solutions that match their workload requirements, combining the power of Trino with the scalability and flexibility of EKS with Karpenter.
📄️ EMR on EKS Best Practices
EMR Containers Best Practices Guides
📄️ Spark on EKS Best Practices
This page provides best practices and guidelines for deploying, managing, and optimizing Apache Spark workloads on Amazon Elastic Kubernetes Service (EKS). It helps organizations run and scale their Spark applications in a containerized environment on Amazon EKS.
📄️ Apache Celeborn Best Practices
Apache Celeborn is a Remote Shuffle Service (RSS). It moves Spark shuffle data off executor disks onto dedicated worker nodes, which lets you use true dynamic allocation and removes the local-disk shuffle bottleneck at scale.
📄️ Preventing OOM Kills in Spark on EKS
Every organization running large-scale Spark workloads on Kubernetes has dealt with this: a job runs for hours, processes terabytes of data, completes 80% of its work, and then executors start disappearing. No JVM exception. No heap dump. No warning in the Spark UI. Just exit code 137 and hours of compute burned. The standard response is to throw more memory at it, bump memoryOverhead by another 10 GB, and hope for the best. That works until the next data spike.