AI on SageMaker HyperPod
Training
Training examples for different models and frameworks
🗃️ DDP: 1 item
🗃️ FSDP: 1 item
🗃️ Megatron-LM: 1 item
🗃️ Ray Train: 1 item
🗃️ Trainium: 1 item