Overview & Architecture

What We're Building

This guide deploys Kubecost v2.8.4 on a SageMaker HyperPod EKS cluster with:

  • Accurate per-node pricing derived from your actual CUR billing data (not public on-demand rates)
  • Per-pod cost allocation by project, team, and custom labels
  • Total cluster cost visibility — compute, storage (EBS, FSx), EKS fee, data transfer
  • Enforced tagging so every workload is attributed to a team and project

How It Works (Architecture)

CUR (AmazonSageMaker billing) ──► Athena ──► Effective Hourly Rates
                                                       │
                                                       ▼
                                               CSV Pricing File
                                                       │
                                                       ▼
                                                ConfigMap (K8s)
                                                       │
┌──── Kubecost v2.8.4 on HyperPod EKS ─────────────────┤
│                                                      │
│   Prometheus ──► Cost Model ◄──── CSV Pricing ◄──────┘
│   (k8s metrics) (usage × price)
│                       │
│                       ▼
│                 ETL pipeline
│                       │
│                       ▼
│   Allocations (pod/team/project costs)
│   Assets (node/disk costs)
│   Cloud Cost (FSx, S3, EKS fee from CUR)
│   Collections, Anomaly Detection
└───────────────────────────────────────────────────────

Why CSV Pricing Is Required for HyperPod

warning

SageMaker HyperPod compute is billed in CUR under AmazonSageMaker with usage types like Cluster:ml.g5.12xlarge, NOT under AmazonEC2 with i-* instance IDs. Kubecost's built-in CUR reconciliation matches nodes by EC2 instance ID, which HyperPod line items do not carry.

Result: Kubecost's automatic CUR node reconciliation does not work for HyperPod. CSV custom pricing is the primary mechanism for accurate node-level costs.

Solution: We derive effective hourly rates from CUR via Athena and load them into Kubecost as CSV pricing. CUR still provides the Cloud Cost page (total SageMaker/FSx/S3 costs).
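
The rate derivation described above comes down to dividing billed cost by billed instance-hours for each usage type. Here is a minimal sketch of that arithmetic. It assumes the Athena result has been exported to a CSV with three columns (usage_type, usage_hours, cost_usd) and that usage for Cluster:* line items is measured in instance-hours. The column layout, the helper name effective_rates, and the sample numbers are illustrative assumptions; this is neither the guide's actual query output nor Kubecost's CSV pricing schema.

```shell
#!/usr/bin/env bash
# Sketch: effective hourly rate = total cost / total instance-hours, per usage type.
# Input (file arguments or stdin): CSV rows "usage_type,usage_hours,cost_usd".
# The column layout is a hypothetical export format, not a Kubecost or CUR schema.
effective_rates() {
  awk -F',' '
    NR == 1 { next }                      # skip the header row
    { hours[$1] += $2; cost[$1] += $3 }   # accumulate per usage type
    END {
      for (t in hours)
        if (hours[t] > 0)                 # avoid division by zero
          printf "%s,%.4f\n", t, cost[t] / hours[t]
    }
  ' "$@" | sort                           # awk iteration order is unspecified
}

# Example with made-up numbers (not real billing data):
effective_rates <<'EOF'
usage_type,usage_hours,cost_usd
Cluster:ml.g5.12xlarge,24,140.40
Cluster:ml.g5.12xlarge,24,129.60
Cluster:ml.m5.xlarge,48,11.04
EOF
```

With the sample rows, the two ml.g5.12xlarge days collapse into a single rate of 5.6250 per hour (270.00 over 48 hours). Averaging over the billing window this way is what lets discounts and credits show up in the per-node price.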

HyperPod-Specific Considerations

| Aspect | HyperPod Difference | Impact |
|---|---|---|
| Billing service | AmazonSageMaker (not AmazonEC2) | CSV pricing required for node costs |
| Node naming | hyperpod-i-xxxxxxxx | Kubecost maps via providerID |
| EBS volumes | Requires sagemaker:AttachClusterNodeVolume | Extra IAM permissions |
| Dashboard access | ClusterIP + port-forward | No public LB needed; use kubectl port-forward |

Prerequisites & Tools

Required CLI Tools

aws --version              # v2.x required
kubectl version --client   # v1.25+ (Kubecost v2.x requires K8s 1.25+)
helm version               # v3.12+
eksctl version             # v0.150+
jq --version               # v1.6+
docker --version           # for ECR login
note

Kubecost v2.x requires Kubernetes 1.25 or later. Verify your HyperPod EKS cluster version:

kubectl version --output=json | jq -r '.serverVersion.gitVersion'
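
To turn the version requirement into a pass/fail check, the reported version can be compared against the minimum with sort -V. This is a sketch that assumes GNU coreutils; version_ge is a hypothetical helper name, and the sample version string is made up.

```shell
#!/usr/bin/env bash
# version_ge HAVE WANT: succeeds when HAVE >= WANT (a leading "v" is ignored).
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  local have="${1#v}" want="${2#v}"
  # If WANT sorts first (or ties), then HAVE is at least WANT.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Hypothetical use against a live cluster (needs kubectl access and jq):
#   server="$(kubectl version --output=json | jq -r '.serverVersion.gitVersion')"
#   version_ge "$server" "1.25" || echo "Kubernetes 1.25+ required, found $server"

# Offline demonstration with a made-up server version string:
version_ge "v1.29.3-eks-adc7111" "1.25" && echo "ok: cluster version is new enough"
```

The helper compares the server version rather than the client version, because the Kubecost requirement applies to the cluster itself.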

AWS Permissions

Your IAM user/role needs: IAM (create policies/roles), EKS (describe/addons), S3 (CUR + Athena buckets), Athena (queries), Glue (catalog), Cost Explorer (billing), EC2 (describe), SageMaker (AttachClusterNodeVolume), CloudFormation (describe stacks).

CUR Requirements

You need at least one CUR configured:

| Setting | Required Value |
|---|---|
| Format | Parquet |
| Athena integration | Enabled |
| Resource IDs | Enabled |
| Granularity | Hourly (preferred) or Daily |
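
One way to check existing reports against these settings is to filter the JSON returned by aws cur describe-report-definitions with jq. The sketch below assumes the response uses the fields Format, TimeUnit, AdditionalArtifacts, and AdditionalSchemaElements. The helper name usable_cur_reports and the sample report names are hypothetical. It only covers legacy CUR report definitions, so treat it as a starting point, not a complete validation.

```shell
#!/usr/bin/env bash
# Print the names of CUR reports that match the required settings:
# Parquet format, Athena integration, resource IDs, hourly or daily granularity.
# Reads the JSON from "aws cur describe-report-definitions" on stdin.
usable_cur_reports() {
  jq -r '
    .ReportDefinitions[]
    | select(
        .Format == "Parquet"
        and ((.AdditionalArtifacts // []) | any(. == "ATHENA"))
        and ((.AdditionalSchemaElements // []) | any(. == "RESOURCES"))
        and (.TimeUnit == "HOURLY" or .TimeUnit == "DAILY")
      )
    | .ReportName
  '
}

# Made-up sample response; the report names are hypothetical.
usable_cur_reports <<'EOF'
{"ReportDefinitions": [
  {"ReportName": "hyperpod-cur", "Format": "Parquet", "TimeUnit": "HOURLY",
   "AdditionalArtifacts": ["ATHENA"], "AdditionalSchemaElements": ["RESOURCES"]},
  {"ReportName": "legacy-csv", "Format": "textORcsv", "TimeUnit": "DAILY",
   "AdditionalArtifacts": [], "AdditionalSchemaElements": []}
]}
EOF
```

Against a real account the call would look like aws cur describe-report-definitions --region us-east-1 | usable_cur_reports. An empty result means no existing report meets all four settings.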