Containers
What are containers?
Containers are a great way to package software: they bundle an application's code together with its runtime and dependencies. This lets you pull down optimized, pre-built containers and run them out of the box, without the complications of compiling the software for a new system. In this blog we'll focus on the NVIDIA container registry (NGC), since it hosts optimized containers for applications like PyTorch, NeMo, and BERT.
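For example, assuming Docker is installed locally, you could pull the NGC PyTorch image used later in this post and check the bundled framework version without installing anything yourself:
# Pull the optimized PyTorch container from NGC
docker pull nvcr.io/nvidia/pytorch:21.12-py3
# Print the PyTorch version that ships inside the container
docker run --rm nvcr.io/nvidia/pytorch:21.12-py3 python -c 'import torch ; print(torch.__version__)'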
Containers in Slurm
What are the issues with using docker containers in traditional Slurm clusters?
Containers typically require a privileged runtime, i.e. the person invoking the container needs sudo access. This is a problem for Slurm clusters, which are typically multi-user environments that rely on POSIX file permissions to control access to files and directories based on users and groups.
To solve this, NVIDIA published enroot, which uses the Linux chroot(2) mechanism to create an isolated, unprivileged runtime environment for the container. Think of it as creating a mount point /tmp/container in which the container can only see its local directory, i.e. container/. This separates the host OS from the container's runtime. Here's an example of enroot in action:
# Import and start an Amazon Linux image from DockerHub
enroot import docker://amazonlinux:latest
enroot create amazonlinux+latest.sqsh
enroot start amazonlinux+latest
In the above example we imported a container image from Docker Hub, converted it to an enroot container with enroot create, and then ran it with enroot start.
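enroot start also accepts a command to run inside the container, which is handy as a quick sanity check (this assumes the container created above):
# Run a one-off command inside the enroot container to confirm the Amazon Linux userland
enroot start amazonlinux+latest cat /etc/os-release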
So how do you schedule and run containers with Slurm?
Slurm's built-in support for OCI containers is half baked: it requires users to pull down their container images, convert them to an OCI runtime bundle, and then point Slurm at that bundle. To solve this, NVIDIA introduced Pyxis, a Slurm plugin that runs unprivileged containers through enroot so that you only need to specify the container URI, e.g. amazonlinux:latest. An example looks like this:
#!/bin/bash
#SBATCH --container-image nvcr.io\#nvidia/pytorch:21.12-py3
python -c 'import torch ; print(torch.__version__)'
Pretty cool, right?
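The same thing works interactively with srun: assuming Pyxis and enroot are installed on the cluster, you can run the container as a one-liner without writing a batch script (note the # separating the registry from the image, which doesn't need escaping outside an #SBATCH line):
srun --container-image=nvcr.io#nvidia/pytorch:21.12-py3 python -c 'import torch ; print(torch.__version__)'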
Containers on SageMaker HyperPod
So how do we set this all up with SageMaker HyperPod?
- We've already set up Docker, Pyxis, and enroot as part of the lifecycle scripts. You should have both docker and enroot in your path:
docker --help
enroot --help
Note: If you see the following error when trying to run docker:
ERROR: permission denied while trying to connect to the Docker daemon socket at...
you'll need to add the user to the docker group by running:
sudo usermod -aG docker ${USER}
Then log out with exit and log back in.
- We can test that Pyxis and enroot installed correctly by running nvidia-smi in the CUDA 11.6.2 Ubuntu base image:
srun --container-image=nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Note: If you see the following error:
[WARN] Kernel module nvidia_uvm is not loaded. Make sure the NVIDIA device driver is installed and loaded.
make sure you're on an NVIDIA GPU cluster; this won't work on trn1.32xlarge instances.
[ec2-user@ip-172-31-28-27 ~]$ srun --container-image=nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
pyxis: importing docker image: nvidia/cuda:11.6.2-base-ubuntu20.04
pyxis: imported docker image: nvidia/cuda:11.6.2-base-ubuntu20.04
Tue May 16 22:07:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 40C P0 27W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
- This can also be run in an sbatch script, as shown below (a submission example follows after these steps). Think of this as the equivalent of running docker run nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi on the compute node.
#!/bin/bash
#SBATCH --container-image=nvidia/cuda:11.6.2-base-ubuntu20.04
nvidia-smi
You may get an error that looks like srun: unrecognized option '--container-image'. To fix this, run:
NUM_NODES=<Replace with the number of compute nodes>
srun -N $NUM_NODES sudo scontrol reconfigure
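Once srun accepts the --container-image flag again, you can submit the sbatch script from the last step. Here's a minimal sketch, assuming you saved it as container-test.sbatch (the filename is just an example):
# Submit the container job; Slurm prints the job id
sbatch container-test.sbatch
# By default, stdout lands in slurm-<jobid>.out in the submission directory
cat slurm-<jobid>.out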