
AWS Parallel Computing Service (PCS)

Preview

AWS PCS support has been added to SOCA since 25.11.0 through a Slurm connector and is considered experimental. SOCA does not manage compute provisioning for PCS and only provides a client interface to view/edit/submit/delete jobs on your PCS cluster using the CLI, Web Interface, or REST API. Refer to the Feature Matrix page to see which features are currently supported.

SOCA does not handle PCS installation yet. Instead, we provide documentation to connect SOCA to one or more existing PCS environment(s).

Configure SOCA installer to deploy Slurm

Although not strictly necessary, it's recommended to enable Slurm on SOCA. This action will deploy a slurmctld server on the SOCA Controller and configure all the required libraries you will need to set up your PCS connector later on.

Navigate to default_config.yml and add slurm to the scheduler.scheduler_engine section:

  scheduler:
    # Scheduler(s) to install/configure on the SOCA_CONTROLLER host
    scheduler_engine: 
      - "openpbs" # Production ready - Stable & Tested
      # - "lsf" # Preview - Development, not stable, not fully tested and not suitable for production.
      - "slurm" # Preview - Development, not stable, not fully tested and not suitable for production.

Note

You can keep openpbs / lsf if you want to run a multi-scheduler setup, or you can choose to use only Slurm.

Then, navigate to system.scheduler.slurm to review the default parameters:

slurm:
    # Install path. We recommend not changing this path.
    # If you do, make sure to update the relevant cluster_analytics / log_backup paths as well.
    # Note: $SOCA_CLUSTER_ID will be automatically replaced by the SOCA Cluster Name specified at install time
    install_prefix_path: "/opt/soca/$SOCA_CLUSTER_ID/schedulers/default/slurm"
    install_sysconfig_path: "/opt/soca/$SOCA_CLUSTER_ID/schedulers/default/slurm/etc"

    version: "25-05-3-1"
    url: "https://github.com/SchedMD/slurm/archive/refs/tags/slurm-25-05-3-1.tar.gz"
    sha256: "a24d9a530e8ae1071dd3865c7260945ceffd6c65eea273d0ee21c85d8926782e"

    compatibility_packages:
        # Note: SLURM is only compatible with libjwt 1.x as there is a dependency with jwt_add_header()
        libjwt:
            url: "https://github.com/benmcollins/libjwt/releases/download/v1.17.0/libjwt-1.17.0.tar.bz2"
            sha256: "b8b257da9b64ba9075fce3a3f670ae02dee7fc95ab7009a2e1ad60905e3f8d48"

You're all set; you can now continue with a regular SOCA installation. While your cluster is provisioning, we suggest reviewing the SOCA Slurm bootstrap scripts to become familiar with the automation happening behind the scenes.

Deploy an AWS PCS Cluster

For this example, I will deploy a brand new AWS PCS cluster using the same VPC as SOCA. To simplify the setup, the AWS PCS security group will be the same as the SOCA Controller's.

  • RED: VPC where my SOCA is deployed
  • BLUE: A subnet in the same VPC
  • GREEN: the SOCA Controller Security Group, or a security group that allows TCP traffic on port 6817 between the PCS security group and the SOCA Controller security group if you plan to use two separate security groups (see the example after this list).
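
If you go with two separate security groups, the required rule can be added with the AWS CLI. The snippet below is a minimal sketch; both security group IDs are placeholders that you must replace with your own PCS and SOCA Controller security groups.

# Allow slurmctld traffic (TCP 6817) from the SOCA Controller security group to the PCS security group
# Both security group IDs below are placeholders
aws ec2 authorize-security-group-ingress \
    --group-id sg-aaaaaaaaaaaaaaaaa \
    --protocol tcp \
    --port 6817 \
    --source-group sg-bbbbbbbbbbbbbbbbb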

Connect SOCA with AWS PCS

We now need to configure SOCA as a client for your PCS cluster. This step is manual for now and must be executed on the SOCA Controller:

# Run these commands as root
sudo su -

# CD into the directory where the Slurm code is copied
pushd /root/soca_bootstrap_<instance_id>/slurm/

# Clean the previous build
make distclean

# Export custom path where we will install the Slurm client for PCS
export SLURM_SOCA_INSTALL_DIR="/opt/soca/schedulers/pcs"
export SLURM_SOCA_INSTALL_SYSCONFIG_DIR="/opt/soca/schedulers/conf/pcs"

# Compile SLURM
./configure --prefix=${SLURM_SOCA_INSTALL_DIR} \
          --exec-prefix=${SLURM_SOCA_INSTALL_DIR} \
          --libdir=${SLURM_SOCA_INSTALL_DIR}/lib64 \
          --bindir=${SLURM_SOCA_INSTALL_DIR}/bin  \
          --sbindir=${SLURM_SOCA_INSTALL_DIR}/sbin \
          --includedir=${SLURM_SOCA_INSTALL_DIR}/include \
          --sysconfdir=${SLURM_SOCA_INSTALL_SYSCONFIG_DIR} \
          --with-munge=/usr \
          --disable-dependency-tracking \
          --enable-pkgconfig \
          --enable-cgroupv2 \
          --with-pmix \
          --enable-pam

make -j$(nproc)
make -j$(nproc) contrib
make install -j$(nproc)
make install-contrib -j$(nproc)
mkdir -p ${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}
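
As a quick sanity check, you can verify the client binaries were compiled and installed under the custom prefix (the version printed will match the Slurm release you built):

# Confirm the freshly built Slurm client binaries are usable
${SLURM_SOCA_INSTALL_DIR}/bin/sinfo --version
${SLURM_SOCA_INSTALL_DIR}/sbin/slurmd -V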

Go to AWS Secrets Manager to find the PCS authentication key (starts with pcs!slurm-secret)

Click “Retrieve Secret Value”, then copy the base64-encoded string, making sure to trim it and remove any extra whitespace or newline characters.
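
Alternatively, you can retrieve the secret directly from the SOCA Controller with the AWS CLI, assuming the instance role (or your credentials) is allowed to read it. The secret ID below is a placeholder; use the ARN or name of your own pcs!slurm-secret:

# Retrieve the base64-encoded PCS Slurm key from AWS Secrets Manager
aws secretsmanager get-secret-value \
    --secret-id "<your pcs!slurm-secret ARN or name>" \
    --query "SecretString" \
    --output text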

# Copy the Slurm/PCS key and decode it
echo "Efe4/yUOoiAl<REDACTED>" | base64 --decode > ${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/slurm.key

# Adjust permissions for your key
chmod 600 ${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/slurm.key

Go to your PCS console, find the IP address of your slurmctld endpoint, and write it down as you'll need it to finish the configuration.
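
If you prefer the CLI, the endpoint is also returned by the PCS GetCluster API. The command below is a sketch: the cluster name is a placeholder, and the exact output structure may vary slightly with your CLI version; look for the SLURMCTLD endpoint and its private IP address in the response.

# Display your PCS cluster details (look for the SLURMCTLD endpoint and its private IP)
aws pcs get-cluster \
    --cluster-identifier "<your_pcs_cluster_name>"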

# Configure your slurm.conf
cat << EOF > "${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/slurm.conf"
ClusterName="soca-test-pcs"
ControlMachine="203.0.171.51" ## REPLACE WITH YOUR PCS SLURM CONTROLLER
SlurmUser=root
SlurmdUser=root
AuthType=auth/slurm
StateSaveLocation=${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/var/spool/slurmctld
SlurmdSpoolDir=${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/var/spool/slurmd
NodeName="localhost"
EOF

# Create the spoolers
mkdir -p ${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/var/spool/slurmctld
mkdir -p ${SLURM_SOCA_INSTALL_SYSCONFIG_DIR}/var/spool/slurmd

Let's validate the connectivity between the SOCA Controller and PCS:

telnet 203.0.171.51 6817
Trying 203.0.171.51...
Connected to 203.0.171.51.
Escape character is '^]'.

Note

If your telnet command fails, verify the security group configuration between your PCS cluster and your SOCA Controller. Make sure TCP traffic on port 6817 is authorized between the two environments.
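
If telnet is not available on your Controller, an equivalent check can be done with nc; replace the IP with your own slurmctld endpoint:

# Test TCP connectivity to the PCS slurmctld endpoint
nc -zv 203.0.171.51 6817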

Now, it's time to register PCS on SOCA using the socactl utility:

source /etc/environment
/opt/soca/$SOCA_CLUSTER_ID/cluster_manager/socactl schedulers set --binary-folder-paths "/opt/soca/schedulers/pcs/bin:/opt/soca/schedulers/pcs/sbin" \
  --enabled "true" \
  --scheduler-identifier "soca-test-pcs" \
  --endpoint "203.0.171.51" \
  --provider "slurm" \
  --manage-host-provisioning "false" \
  --slurm-configuration '{"install_prefix_path": "/opt/soca/schedulers/pcs", "install_sysconfig_path": "/opt/soca/schedulers/conf/pcs"}'

{
    "enabled": true,
    "provider": "slurm",
    "endpoint": "203.0.171.51",
    "binary_folder_paths": "/opt/soca/schedulers/pcs/bin:/opt/soca/schedulers/pcs/sbin",
    "soca_managed_nodes_provisioning": false,
    "identifier": "soca-test-pcs",
    "slurm_configuration": "{\"install_prefix_path\": \"/opt/soca/schedulers/pcs\", \"install_sysconfig_path\": \"/opt/soca/schedulers/conf/pcs\"}"
}
Do you want to create this new scheduler (add --force to skip this confirmation)? (yes/no) yes
Cache updated
Success: Key has been updated successfully

# Restart the web ui
/opt/soca/$SOCA_CLUSTER_ID/cluster_manager/web_interface/socawebui.sh restart

Last, but not least, start your slurmd process (not slurmctld, as we only need the client):

cd ${SLURM_SOCA_INSTALL_DIR}
./sbin/slurmd -Dvv &
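
Running slurmd in the background like this works for testing, but it will not survive a reboot. If you want the process supervised, a minimal systemd unit along the lines of the sketch below can be created manually. SOCA does not ship this unit; the unit name is an assumption and the paths match the install prefix used in this guide.

# Hypothetical systemd unit to keep the PCS slurmd client running
cat << EOF > /etc/systemd/system/soca-pcs-slurmd.service
[Unit]
Description=Slurm client daemon (slurmd) for the AWS PCS connector
After=network-online.target

[Service]
Type=simple
ExecStart=/opt/soca/schedulers/pcs/sbin/slurmd -D
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now soca-pcs-slurmd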

Validate connectivity between SOCA and the PCS cluster using the lsid utility:

${SLURM_SOCA_INSTALL_DIR}/bin/lsid
Slurm 25.05.4, May 1 2025
Copyright SchedMD LLC, 2010-2017.

My cluster name is soca-test-pcs
My master name is slurmctld-primary

Additionally, if you've already created PCS queue(s), you can view them by running sinfo:

${SLURM_SOCA_INSTALL_DIR}/bin/sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
pcs-test-queue    up   infinite      1  idle# pcs-computenode-group-1-1

Interact with your PCS cluster

Warning

Compute provisioning is managed by AWS PCS, and SOCA Job Resources are not supported on PCS. You are required to configure your PCS Compute Node Groups first.
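
Before submitting anything, you can confirm that at least one Compute Node Group is attached to your cluster with the AWS CLI; the cluster name below is a placeholder:

# List the Compute Node Groups attached to your PCS cluster
aws pcs list-compute-node-groups \
    --cluster-identifier "<your_pcs_cluster_name>"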

Submit a PCS job

CLI

You can submit a PCS job using the sbatch command from any SOCA host (DCV session, Login Node, or any other HPC node).

First, create a simple Slurm job:

#!/bin/bash
#SBATCH --partition=pcs-test-queue
#SBATCH --job-name=soca-pcs
#SBATCH --output=test.out

echo "Hello from pcs-test-queue!"
Submit the job using any SOCA user:

/opt/soca/schedulers/pcs/bin/sbatch test_script.slurm
Submitted batch job 1
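
Once submitted, you can follow the job with the same client binaries, for example with squeue (job id 1 matches the submission above):

# Check the status of the job we just submitted
/opt/soca/schedulers/pcs/bin/squeue -j 1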

Web Interface

You can submit your PCS jobs using the SOCA Web Interface:

Note

Make sure to select your soca-test-pcs scheduler as the job interpreter during Step 2 - Design your Job Script:

HTTP API

You can submit your PCS job using the SOCA HTTP REST API.

Note

Make sure to set interpreter=soca-test-pcs when you submit your POST request.
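
As a rough illustration, a submission through the REST API could look like the sketch below. The endpoint path, header names, and payload fields are assumptions for illustration only; refer to the SOCA API documentation available from your Web Interface for the exact contract. The interpreter value is the one registered earlier in this guide.

# Hypothetical sketch - verify the endpoint, headers and parameters against your SOCA API documentation
curl -X POST "https://<your_soca_endpoint>/api/scheduler/job" \
    -H "X-SOCA-USER: <your_user>" \
    -H "X-SOCA-TOKEN: <your_api_token>" \
    -F "payload=$(base64 -w0 test_script.slurm)" \
    -F "interpreter=soca-test-pcs"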

View PCS Jobs using SOCA Web Interface

You can view and control your PCS HPC jobs using the "My Job Queue" page on the SOCA Web Interface:

Delete PCS Jobs using SOCA Web Interface

You can delete a job using scancel via the CLI or directly via the web interface:
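
From the CLI, use the scancel binary from the PCS client install (job id 1 matches the job submitted earlier):

# Cancel PCS job id 1
/opt/soca/schedulers/pcs/bin/scancel 1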