AI on SageMaker HyperPod
FinOps