Multi-Tenant Slurm on AWS ParallelCluster, Part 1: Accounting Database + Multi-User Setup

A shared GPU cluster without enforcement is a queue with a tragedy of the commons baked in. One researcher launches a 64-GPU sweep at 9pm; the on-call engineer can’t get a 1-GPU interactive session for debugging the next morning; a third user’s batch job — submitted a week ago and patiently waiting — keeps getting shuffled to the back because nothing prevents the latest submissions from front-running it. The scheduler is doing exactly what it was configured to do: nothing about it. ...

May 16, 2026 · 23 min · Keita Watanabe