Multi-Tenant Slurm on AWS ParallelCluster, Part 2: QoS Deep Dive
Part 1 of this series stood up an AWS ParallelCluster wired to an Aurora-backed Slurm accounting database, plus three users (alice, bob, charlie) sharing consistent UIDs across the head node and every compute node. Every job now lands in slurmdbd with full attribution, but nothing enforces how the cluster gets shared. Whoever submits first wins: jobs can claim GPUs without limit, request 72-hour wall times, and run with no per-team caps. That’s the gap Slurm’s Quality of Service (QoS) layer fills. ...