Observability
Observability is a foundational element of a well-architected EKS/Slurm environment. For monitoring, logging, and alarming in EKS environments, AWS provides Amazon CloudWatch as a native solution, along with managed open-source solutions: Amazon Managed Service for Prometheus (AMP), Amazon Managed Grafana (AMG), and AWS Distro for OpenTelemetry.
Amazon SageMaker HyperPod can optionally be integrated with Amazon Managed Service for Prometheus and Amazon Managed Grafana to export metrics about your cluster and its nodes to an Amazon Managed Grafana dashboard.
In this section, we will specifically cover: