# GPU Cost Allocation from CUR
This page converts GPU-hours (from Data Extraction) into dollar costs using your actual instance pricing from the AWS Cost and Usage Report (CUR).
## The Formula

```
GPU Hourly Rate = Instance Hourly Cost (from CUR) ÷ GPUs Per Instance
User Cost       = User GPU-Hours (from sreport)   × GPU Hourly Rate
Team Cost       = Team GPU-Hours (from sreport)   × GPU Hourly Rate
Project Cost    = Project GPU-Hours (from sacct)  × GPU Hourly Rate
```
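Illustratively, the formula reduces to two one-line functions. The rate and hour figures below are placeholders, not real CUR values:

```python
def gpu_hourly_rate(instance_hourly_cost: float, gpus_per_instance: int) -> float:
    """Per-GPU rate derived from the CUR instance rate."""
    return instance_hourly_cost / gpus_per_instance

def usage_cost(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    """Dollar cost for a user, team, or project from its GPU-hours."""
    return round(gpu_hours * rate_per_gpu_hour, 2)

# Placeholder example: an 8-GPU instance billed at $37.688/hr
rate = gpu_hourly_rate(37.688, 8)
print(round(rate, 3))           # per-GPU-hour rate
print(usage_cost(100.0, rate))  # cost of 100 GPU-hours at that rate
```

The same two operations are what the Athena views further down perform in SQL.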
## How HyperPod Billing Works in CUR
SageMaker HyperPod compute is billed under `AmazonSageMaker` in CUR, not under `AmazonEC2`. The instance type is embedded in the usage type string as `Cluster:ml.<type>`.
| Component | CUR `line_item_product_code` | CUR `line_item_usage_type` |
|---|---|---|
| HyperPod compute nodes | `AmazonSageMaker` | `<region>-Cluster:ml.p4d.24xlarge` |
| HyperPod EBS volumes | `AmazonSageMaker` | `<region>-Cluster:VolumeUsage.gp2` |
| FSx Lustre storage | `AmazonFSx` | Various |
| EKS control plane | `AmazonEKS` | `<region>-AmazonEKS-Hours:perCluster` |
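Because of the `Cluster:ml.<type>` convention, the instance type can be parsed straight out of `line_item_usage_type`. A minimal sketch (the region-prefixed strings below are made-up samples, not taken from a real CUR):

```python
import re

def instance_type(usage_type: str):
    """Return the ml.* instance type embedded in a CUR usage-type string,
    or None for non-compute line items (e.g. EBS volume usage)."""
    m = re.search(r"Cluster:(ml\.[\w.]+)", usage_type)
    return m.group(1) if m else None

samples = [
    "USE1-Cluster:ml.p4d.24xlarge",  # compute node
    "USE1-Cluster:VolumeUsage.gp2",  # EBS volume, carries no instance type
]
for s in samples:
    print(s, "->", instance_type(s))
```

This mirrors the `REGEXP_EXTRACT(line_item_usage_type, 'Cluster:(.+)', 1)` expression used in the rate query below.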
## Discover HyperPod Usage Types
Run this first to see which HyperPod line items exist in your CUR:

```sql
SELECT DISTINCT
    line_item_product_code,
    line_item_usage_type,
    line_item_operation
FROM <YOUR_CUR_DATABASE>.<YOUR_CUR_TABLE>
WHERE line_item_product_code = 'AmazonSageMaker'
  AND line_item_usage_type LIKE '%Cluster:%'
ORDER BY line_item_usage_type;
```
## Get Effective Hourly Rate Per Instance Type
This query returns the actual rate you're paying, including any RI/SP/EDP discounts:

```sql
SELECT
    REGEXP_EXTRACT(line_item_usage_type, 'Cluster:(.+)', 1) AS instance_type,
    ROUND(SUM(line_item_unblended_cost) / SUM(line_item_usage_amount), 4) AS effective_hourly_rate,
    ROUND(SUM(line_item_unblended_cost), 2) AS total_cost_this_month,
    ROUND(SUM(line_item_usage_amount), 2) AS total_hours
FROM <YOUR_CUR_DATABASE>.<YOUR_CUR_TABLE>
WHERE line_item_product_code = 'AmazonSageMaker'
  AND line_item_usage_type LIKE '%Cluster:ml.%'
  AND line_item_usage_amount > 0
  AND year = CAST(YEAR(CURRENT_DATE) AS VARCHAR)
  AND month = CAST(MONTH(CURRENT_DATE) AS VARCHAR)
GROUP BY REGEXP_EXTRACT(line_item_usage_type, 'Cluster:(.+)', 1)
ORDER BY total_cost_this_month DESC;
```
Example output:
| instance_type | effective_hourly_rate | total_cost_this_month | total_hours |
|---|---|---|---|
| ml.p4d.24xlarge | 37.6880 | 27,324.32 | 725.00 |
| ml.p5.48xlarge | 98.3200 | 71,282.00 | 725.00 |
| ml.g5.12xlarge | 7.0900 | 1,142.71 | 161.17 |
## GPU Reference Table
| Instance Type | GPUs | GPU Type | Memory/GPU |
|---|---|---|---|
| ml.p5.48xlarge | 8 | H100 80GB | 80 GB |
| ml.p4d.24xlarge | 8 | A100 40GB | 40 GB |
| ml.p4de.24xlarge | 8 | A100 80GB | 80 GB |
| ml.p3.16xlarge | 8 | V100 16GB | 16 GB |
| ml.p3.8xlarge | 4 | V100 16GB | 16 GB |
| ml.g5.48xlarge | 8 | A10G 24GB | 24 GB |
| ml.g5.12xlarge | 4 | A10G 24GB | 24 GB |
| ml.g6.12xlarge | 4 | L4 24GB | 24 GB |
Example: `ml.p4d.24xlarge` at $37.69/hr ÷ 8 GPUs = $4.71 per GPU-hour.
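The same arithmetic, generalized over the reference table. The rates below reuse the sample query output earlier on this page, purely for illustration; substitute your own CUR figures:

```python
# GPUs per instance, from the reference table above
GPUS_PER_INSTANCE = {
    "ml.p5.48xlarge": 8,
    "ml.p4d.24xlarge": 8,
    "ml.g5.12xlarge": 4,
}

# Effective hourly rates, taken from the sample CUR output above
INSTANCE_HOURLY_RATE = {
    "ml.p5.48xlarge": 98.3200,
    "ml.p4d.24xlarge": 37.6880,
    "ml.g5.12xlarge": 7.0900,
}

def per_gpu_rate(instance: str) -> float:
    """Instance hourly cost divided by its GPU count, rounded to cents."""
    return round(INSTANCE_HOURLY_RATE[instance] / GPUS_PER_INSTANCE[instance], 2)

for inst in INSTANCE_HOURLY_RATE:
    print(inst, per_gpu_rate(inst))
```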
## Create Athena Cost Views
These views join your Slurm GPU-hours data with CUR-derived rates.
### GPU Cost Rate Reference
Update the values with your actual rates from the CUR query above:

```sql
CREATE OR REPLACE VIEW gpu_chargeback.gpu_cost_rates AS
SELECT * FROM (VALUES
    ('ml.p4d.24xlarge', 8, 37.6880, 37.6880 / 8),
    ('ml.p5.48xlarge', 8, 98.3200, 98.3200 / 8),
    ('ml.g5.12xlarge', 4, 7.0900, 7.0900 / 4)
) AS t(instance_type, gpus_per_instance, instance_hourly_cost, gpu_hourly_cost);
```

If your cluster uses a single instance type, this view is just one row. The `gpu_hourly_cost` column is what gets multiplied by GPU-hours.
### Per-User Cost View
The cost views below use `CROSS JOIN ... LIMIT 1` to pick one GPU cost rate from the `gpu_cost_rates` table. This assumes your cluster runs a single instance type. If you have multiple instance types, `LIMIT 1` will silently pick an arbitrary row with no guarantee of which. For mixed-instance clusters, use the weighted-average approach described at the bottom of this page instead.
```sql
CREATE OR REPLACE VIEW gpu_chargeback.user_cost AS
SELECT
    au.account AS team,
    au.login AS username,
    au.proper AS display_name,
    au.used AS gpu_hours,
    cr.gpu_hourly_cost,
    ROUND(au.used * cr.gpu_hourly_cost, 2) AS cost_usd,
    au.period, au.year, au.month, au.day
FROM gpu_chargeback.account_utilization au
CROSS JOIN (
    SELECT gpu_hourly_cost FROM gpu_chargeback.gpu_cost_rates LIMIT 1
) cr
WHERE au.used > 0;
```
### Per-Team Cost View

```sql
CREATE OR REPLACE VIEW gpu_chargeback.team_cost AS
SELECT
    team,
    SUM(gpu_hours) AS total_gpu_hours,
    gpu_hourly_cost,
    ROUND(SUM(gpu_hours) * gpu_hourly_cost, 2) AS total_cost_usd,
    COUNT(DISTINCT username) AS active_users,
    period, year, month
FROM gpu_chargeback.user_cost
GROUP BY team, gpu_hourly_cost, period, year, month;
```
### Per-Project Cost View
Parses the `project-id` tag from the job `Comment` field and computes GPU-hours per project:

```sql
CREATE OR REPLACE VIEW gpu_chargeback.project_cost AS
SELECT
    account AS team,
    "user" AS username,
    regexp_extract(comment, 'project-id:([^,]+)', 1) AS project_id,
    COUNT(*) AS job_count,
    SUM(gpu_count) AS total_gpus_used,
    ROUND(SUM(gpu_hours), 2) AS total_gpu_hours,
    ROUND(SUM(gpu_hours) * cr.gpu_hourly_cost, 2) AS cost_usd,
    period, year, month
FROM (
    SELECT
        account, "user", comment, period, year, month,
        CAST(regexp_extract(alloctres, 'gres/gpu=(\d+)', 1) AS INTEGER) AS gpu_count,
        CAST(regexp_extract(alloctres, 'gres/gpu=(\d+)', 1) AS INTEGER)
        * (
            -- elapsed is [DD-]HH:MM:SS; convert days, hours, minutes, seconds to hours
            COALESCE(CAST(regexp_extract(elapsed, '^(\d+)-', 1) AS DOUBLE), 0) * 24.0
            + COALESCE(CAST(regexp_extract(elapsed, '(\d+):(\d+):(\d+)$', 1) AS DOUBLE), 0)
            + COALESCE(CAST(regexp_extract(elapsed, '(\d+):(\d+):(\d+)$', 2) AS DOUBLE), 0) / 60.0
            + COALESCE(CAST(regexp_extract(elapsed, '(\d+):(\d+):(\d+)$', 3) AS DOUBLE), 0) / 3600.0
        ) AS gpu_hours
    FROM gpu_chargeback.jobs_detailed
    WHERE comment LIKE '%project-id:%'
      AND state = 'COMPLETED'
      AND alloctres LIKE '%gres/gpu=%'
) jobs
CROSS JOIN (
    SELECT gpu_hourly_cost FROM gpu_chargeback.gpu_cost_rates LIMIT 1
) cr
WHERE gpu_hours > 0
GROUP BY account, "user", regexp_extract(comment, 'project-id:([^,]+)', 1),
         cr.gpu_hourly_cost, period, year, month;
```
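The elapsed-time arithmetic in the inner query (Slurm reports `elapsed` as `[DD-]HH:MM:SS`) is easier to see as a short Python sketch; the job durations below are made-up examples:

```python
def elapsed_to_hours(elapsed: str) -> float:
    """Convert a Slurm elapsed string ('DD-HH:MM:SS' or 'HH:MM:SS') to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    h, m, s = (int(x) for x in elapsed.split(":"))
    return days * 24.0 + h + m / 60.0 + s / 3600.0

# GPU-hours = allocated GPUs x elapsed hours, as in the view above
print(4 * elapsed_to_hours("1-02:30:00"))  # 4 GPUs for 26.5 hours
print(8 * elapsed_to_hours("00:45:00"))    # 8 GPUs for 45 minutes
```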
## Run Cost Queries

```sql
-- Per-user cost
SELECT team, username, display_name, gpu_hours, cost_usd
FROM gpu_chargeback.user_cost
WHERE period = 'monthly'
ORDER BY cost_usd DESC;

-- Per-team cost
SELECT team, total_gpu_hours, total_cost_usd, active_users
FROM gpu_chargeback.team_cost
WHERE period = 'monthly'
ORDER BY total_cost_usd DESC;

-- Per-project cost
SELECT team, project_id, job_count, total_gpu_hours, cost_usd
FROM gpu_chargeback.project_cost
WHERE period = 'monthly'
ORDER BY cost_usd DESC;
```
## Automated Cost Report Script
Create `/fsx/ubuntu/slurmAccounting/scripts/generate_cost_reports.sh`:
```bash
#!/bin/bash
set -e

PERIOD="${1:-monthly}"
REPORT_DATE=$(date +%Y-%m-%d)
OUTPUT_DIR="/fsx/ubuntu/slurmAccounting/reports/costReports"
S3_BUCKET="<YOUR_GPU_ACCOUNTING_BUCKET>"
ATHENA_DATABASE="gpu_chargeback"
ATHENA_RESULTS="s3://${S3_BUCKET}/athena-results/"
ATHENA_WORKGROUP="primary"
REGION="<YOUR_REGION>"

mkdir -p "$OUTPUT_DIR"

run_athena_query() {
    local QUERY="$1"
    local OUTPUT_FILE="$2"
    local QUERY_ID STATE

    QUERY_ID=$(aws athena start-query-execution \
        --query-string "$QUERY" \
        --query-execution-context "Database=${ATHENA_DATABASE}" \
        --result-configuration "OutputLocation=${ATHENA_RESULTS}" \
        --work-group "${ATHENA_WORKGROUP}" \
        --region "${REGION}" \
        --query 'QueryExecutionId' --output text)

    for i in $(seq 1 60); do
        STATE=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
            --region "${REGION}" --query 'QueryExecution.Status.State' --output text)
        [ "$STATE" = "SUCCEEDED" ] && break
        if [ "$STATE" = "FAILED" ] || [ "$STATE" = "CANCELLED" ]; then
            echo "Query ${QUERY_ID} ended in state ${STATE}" >&2
            return 1
        fi
        sleep 5
    done

    if [ "$STATE" != "SUCCEEDED" ]; then
        echo "Query ${QUERY_ID} timed out" >&2
        return 1
    fi

    # Athena writes results as CSV to <OutputLocation>/<QueryExecutionId>.csv
    aws s3 cp "${ATHENA_RESULTS}${QUERY_ID}.csv" "$OUTPUT_FILE" --region "${REGION}"
}

run_athena_query \
    "SELECT team, username, display_name, gpu_hours, cost_usd FROM ${ATHENA_DATABASE}.user_cost WHERE period='${PERIOD}' ORDER BY cost_usd DESC" \
    "$OUTPUT_DIR/user_cost_${PERIOD}_${REPORT_DATE}.csv"

run_athena_query \
    "SELECT team, total_gpu_hours, total_cost_usd, active_users FROM ${ATHENA_DATABASE}.team_cost WHERE period='${PERIOD}' ORDER BY total_cost_usd DESC" \
    "$OUTPUT_DIR/team_cost_${PERIOD}_${REPORT_DATE}.csv"

run_athena_query \
    "SELECT team, project_id, username, job_count, total_gpu_hours, cost_usd FROM ${ATHENA_DATABASE}.project_cost WHERE period='${PERIOD}' ORDER BY cost_usd DESC" \
    "$OUTPUT_DIR/project_cost_${PERIOD}_${REPORT_DATE}.csv"

# Upload to S3
YEAR=$(date +%Y); MONTH=$(date +%m); DAY=$(date +%d)
S3_PREFIX="s3://${S3_BUCKET}/cost-reports/period=${PERIOD}/year=${YEAR}/month=${MONTH}/day=${DAY}"
for file in "$OUTPUT_DIR"/*_"${PERIOD}"_"${REPORT_DATE}".csv; do
    [ -f "$file" ] && aws s3 cp "$file" "${S3_PREFIX}/$(basename "$file")" --region "${REGION}"
done
```
Make the script executable:

```bash
chmod +x /fsx/ubuntu/slurmAccounting/scripts/generate_cost_reports.sh
```
Schedule the cost reports after the GPU reports (add to the existing crontab):

```
# Cost reports - 30 minutes after GPU reports
30 1 * * 1 /fsx/ubuntu/slurmAccounting/scripts/generate_cost_reports.sh weekly >> /var/log/slurm/cost_weekly.log 2>&1
30 2 1 * * /fsx/ubuntu/slurmAccounting/scripts/generate_cost_reports.sh monthly >> /var/log/slurm/cost_monthly.log 2>&1
```
## Update Cost Rates
When your CUR rates change (new discounts, new instance types), update the Athena view:

```sql
CREATE OR REPLACE VIEW gpu_chargeback.gpu_cost_rates AS
SELECT * FROM (VALUES
    ('ml.p4d.24xlarge', 8, 35.5000, 35.5000 / 8),
    ('ml.p5.48xlarge', 8, 98.3200, 98.3200 / 8),
    ('ml.g5.12xlarge', 4, 7.0900, 7.0900 / 4)
) AS t(instance_type, gpus_per_instance, instance_hourly_cost, gpu_hourly_cost);
```
If users run on different instance types and you can't attribute their hours to a specific type, use a cost-weighted average. Note that this query returns a blended rate per *instance*-hour, not per GPU-hour; divide by the GPU count (weighted across instance types) to get a per-GPU-hour figure:

```sql
SELECT ROUND(SUM(line_item_unblended_cost) / SUM(line_item_usage_amount), 4) AS blended_rate
FROM <YOUR_CUR_DATABASE>.<YOUR_CUR_TABLE>
WHERE line_item_product_code = 'AmazonSageMaker'
  AND line_item_usage_type LIKE '%Cluster:ml.%'
  AND line_item_usage_amount > 0;
```
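What the blended rate computes, and how to turn it into a per-GPU-hour figure, can be sketched as follows. The hour and cost figures reuse the earlier sample output on this page, purely for illustration:

```python
# (instance_hours, gpus_per_instance, total_cost) per instance type,
# taken from the sample query output earlier on this page
usage = [
    (725.00, 8, 27324.32),   # ml.p4d.24xlarge
    (725.00, 8, 71282.00),   # ml.p5.48xlarge
    (161.17, 4, 1142.71),    # ml.g5.12xlarge
]

total_cost = sum(cost for _, _, cost in usage)
instance_hours = sum(hours for hours, _, _ in usage)
gpu_hours = sum(hours * gpus for hours, gpus, _ in usage)

# What the SQL above returns: dollars per *instance*-hour
blended_instance_rate = round(total_cost / instance_hours, 4)
# What chargeback needs: dollars per *GPU*-hour
blended_gpu_rate = round(total_cost / gpu_hours, 4)

print(blended_instance_rate, blended_gpu_rate)
```

Weighting by GPU-hours rather than instance-hours keeps the blended rate fair when instance types have different GPU counts.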