Job Tagging Strategy (Project-ID)

While Slurm accounts track which team uses GPUs, job tagging tracks which project or model version consumes capacity. This enables cost allocation per model version beyond team-level tracking.

How It Works

The --comment option of sbatch stores free-text metadata with each job. We use a structured key:value format to embed the project identifier:

sbatch --account=team-a-research \
--comment="project-id:model-v2.1-pretrain" \
--gres=gpu:8 \
train.sh

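The project-id is later extracted from Slurm's accounting database using sacct and parsed in Athena for cost attribution. Job comments reach the accounting database only when slurm.conf enables it (AccountingStoreFlags=job_comment on recent Slurm releases), so confirm that setting before relying on the tags.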
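
As a minimal sketch of the extraction step, the shell function below pulls the project-id out of a comment string in the format shown above. The name extract_project_id is hypothetical, and the sketch assumes the comment is a comma-separated list of key:value pairs, which is a convention of this page rather than anything Slurm requires.

```shell
#!/bin/bash
# Hypothetical helper: print the project-id embedded in a Slurm job comment.
# Assumes the comment is a comma-separated list of key:value pairs, e.g.
# "project-id:llm-v2.1-pretrain,submitted:2026-01-15T10:00:00Z".
extract_project_id() {
    local comment="$1" pair
    local -a pairs
    # Split on commas first so colons inside other values (timestamps) are harmless.
    IFS=',' read -r -a pairs <<< "$comment"
    for pair in "${pairs[@]}"; do
        if [[ "$pair" == project-id:* ]]; then
            printf '%s\n' "${pair#project-id:}"
            return 0
        fi
    done
    return 1   # no project-id tag found
}
```

The function returns a non-zero status when no tag is present, so untagged jobs can be counted or reported separately.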

Project-ID Naming Convention

Recommended format: <product>-<version>-<stage>

| Team | Example Project-IDs |
|------|---------------------|
| Research | speech-model-v1.0, nlp-bert-v2.0 |
| Pretraining | llm-v2.1-pretrain, llm-v3.0-base |
| Post-training | llm-v2.1-sft, llm-v2.1-rlhf |
| Evaluation | llm-v2.1-eval, benchmark-2026q1 |
| Data Pipelines | pipeline-silver-v1.0, dataset-clean-v2 |

Consistency Matters

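The --comment field is free text, and Slurm does not validate it. A typo or an ad hoc variant of a project-id silently becomes a separate line item in downstream reports, so consistent naming is critical. Use the submission wrapper script below to make sure every job carries a project-id in the same structured format.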
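
Because Slurm will not reject a malformed tag, a check before submission can help. The function below is a sketch only: the name is_valid_project_id and the exact pattern are assumptions, not part of the convention above. Since the example project-IDs in the table vary in shape, it checks only the character set (lowercase letters, digits, and single dots or hyphens as separators) and does not enforce the full <product>-<version>-<stage> structure. It also rejects commas and colons, which would collide with the separators used in the comment string.

```shell
#!/bin/bash
# Hypothetical validator for project-ids (the pattern is an assumption).
# Accepts lowercase alphanumeric segments joined by single '.' or '-'.
# Rejects uppercase, whitespace, empty strings, and the ',' and ':' characters
# that the comment format uses as separators.
is_valid_project_id() {
    local id="$1"
    local pattern='^[a-z0-9]+([.-][a-z0-9]+)*$'
    [[ "$id" =~ $pattern ]]
}

# Example use before submitting:
#   is_valid_project_id "$PROJECT_ID" || { echo "invalid project-id" >&2; exit 1; }
```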

Job Submission Examples

# Standard submission
sbatch --account=team-b-pretraining \
--comment="project-id:llm-v2.1-pretrain" \
--gres=gpu:8 --time=72:00:00 \
train.sh

# With QoS
sbatch --account=team-a-research \
--qos=high \
--comment="project-id:speech-model-v1.0" \
--gres=gpu:4 --time=24:00:00 \
train_speech.sh

Submission Wrapper Script

Create /fsx/ubuntu/slurmAccounting/scripts/submit_job.sh to enforce account and project-id:

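#!/bin/bash
# Submit a Slurm job with a mandatory account and project-id tag.
set -e

usage() {
    echo "Usage: $0 -a <account> -p <project-id> [-q <qos>] [sbatch options] <script>" >&2
    exit 1
}

ACCOUNT=""
PROJECT_ID=""
QOS=""
SBATCH_ARGS=()

# Pull out the wrapper's own options; pass everything else through to sbatch.
# Note: -a and -p here shadow sbatch's short options for --array and
# --partition, so use the long forms when you need those.
while [[ $# -gt 0 ]]; do
    case $1 in
        -a|--account)    [[ $# -ge 2 ]] || usage; ACCOUNT="$2"; shift 2 ;;
        -p|--project-id) [[ $# -ge 2 ]] || usage; PROJECT_ID="$2"; shift 2 ;;
        -q|--qos)        [[ $# -ge 2 ]] || usage; QOS="$2"; shift 2 ;;
        -h|--help)       usage ;;
        *)               SBATCH_ARGS+=("$1"); shift ;;
    esac
done

if [[ -z "$ACCOUNT" ]]; then
    echo "ERROR: --account is required" >&2
    usage
fi
if [[ -z "$PROJECT_ID" ]]; then
    echo "ERROR: --project-id is required" >&2
    usage
fi

# Structured comment: the project-id plus a UTC submission timestamp.
COMMENT="project-id:${PROJECT_ID},submitted:$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# Build the optional QoS flag as an array so it expands to nothing when unset.
QOS_ARGS=()
if [[ -n "$QOS" ]]; then
    QOS_ARGS=(--qos="$QOS")
fi

sbatch --account="$ACCOUNT" --comment="$COMMENT" "${QOS_ARGS[@]}" "${SBATCH_ARGS[@]}"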

Make it available system-wide:

chmod +x /fsx/ubuntu/slurmAccounting/scripts/submit_job.sh
sudo ln -s /fsx/ubuntu/slurmAccounting/scripts/submit_job.sh /usr/local/bin/submit_job

Usage:

submit_job -a team-b-pretraining -p llm-v2.1-pretrain --gres=gpu:8 train.sh
submit_job -a team-a-research -p speech-model-v1.0 -q high --gres=gpu:4 train_speech.sh

Verify Tags Are Stored

# After job starts
scontrol show job <JOB_ID> | grep Comment

# After job completes
sacct -j <JOB_ID> --format=JobID,Account,Comment

Query by Project-ID

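Since sacct has no native comment filter, pipe its output through grep. Without --starttime, sacct reports only jobs from the current day, so set the reporting window explicitly. The pattern below is a substring match, so it returns every project-id that begins with llm-v2.1 (pretrain, sft, rlhf, eval):

sacct -a -X -n -P --starttime=<YYYY-MM-DD> \
--format=JobID,User,Account,Elapsed,Comment | \
grep -F "project-id:llm-v2.1"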
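
When a substring match is too broad, an exact match on the tag is an alternative. The sketch below is hedged: filter_by_project_id is a hypothetical name, and it assumes that Comment is the last pipe-delimited column (as in the query above) and that the comment uses comma-separated key:value pairs.

```shell
#!/bin/bash
# Hypothetical filter: read `sacct -P` output on stdin and print only the rows
# whose Comment column carries exactly the given project-id.
# Assumes Comment is the last pipe-delimited column.
filter_by_project_id() {
    local project_id="$1"
    awk -F'|' -v want="$project_id" '
        {
            # Split the comment into key:value pairs and look for an exact tag.
            n = split($NF, pairs, ",")
            for (i = 1; i <= n; i++) {
                if (pairs[i] == ("project-id:" want)) { print; break }
            }
        }'
}

# Usage (requires a Slurm cluster):
#   sacct -a -X -n -P --format=JobID,User,Account,Elapsed,Comment | \
#       filter_by_project_id llm-v2.1-pretrain
```

Unlike grep, this matches llm-v2.1-pretrain without also returning llm-v2.1-pretrain-ablation or any other id that shares the prefix.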

Native vs Wrapper

Users can always call sbatch --comment="project-id:..." directly. The wrapper is an optional convenience that makes the account and project-id mandatory and formats the comment consistently. See Cost Allocation from CUR for how project-IDs are used in cost queries.