AI on SageMaker HyperPod
Orchestrated by EKS
Initial cluster setup
AWS Trainium
Distributed Data Parallel
Fully Sharded Data Parallel
NVIDIA Megatron LM
Ray Train
Orchestrated by SLURM
Initial cluster setup
AWS Trainium
Distributed Data Parallel
Fully Sharded Data Parallel
NVIDIA Megatron LM
Useful links
GitHub
Authors
Yangshun Tay
Ex-Meta Staff Engineer, Co-founder GreatFrontEnd
Sébastien Lorber
Docusaurus maintainer