ML Container Creator
The BYOC toolkit for Amazon SageMaker AI
From model selection to endpoint in one workflow. Deploy, tune, and iterate — all from one CLI.
Deploy a Model in 60 Seconds
# Install (or use npx @aws/ml-container-creator)
npm install -g @aws/ml-container-creator
# Generate a project
ml-container-creator my-llm \
--deployment-config=transformers-vllm \
--model-name=Qwen/Qwen3-4B \
--instance-type=ml.g5.xlarge \
--enable-lora \
--skip-prompts
# Build, deploy, and test
cd my-llm
./do/build && ./do/push && ./do/deploy && ./do/test
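The chained command above stops at the first failing script but does not say which stage failed. A minimal sketch of a wrapper that runs the same scripts in order and names the failing stage follows. The `run_stages` function and the `DO_DIR` variable are hypothetical names introduced for illustration; they are not part of MCC.

```shell
# Hypothetical helper, not part of MCC: run do/ scripts in order and
# report the first stage that exits non-zero.
DO_DIR="${DO_DIR:-./do}"   # assumed location of the generated scripts

run_stages() {
  local stage
  for stage in "$@"; do
    echo "==> ${stage}"
    if ! "${DO_DIR}/${stage}"; then
      echo "stage '${stage}' failed" >&2
      return 1
    fi
  done
}

# Usage, from inside the generated project directory:
#   run_stages build push deploy test
```

Because the stage names are plain arguments, the same function can rerun a subset, for example only `deploy test` after a configuration change.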
Then iterate — fine-tune and hot-swap a LoRA adapter without restarting:
./do/tune --technique sft --dataset s3://my-bucket/train.jsonl
./do/adapter add my-sft --from-tune
./do/test # Verify the adapter works
See Getting Started for the full walkthrough with prerequisites.
Why MCC?
Teams spend 2–5 days writing Dockerfiles, serve scripts, and deploy scripts before they can even test inference. Then they repeat that work for every model, every framework, every deployment target.
MCC eliminates the boilerplate. You select a model and a framework — MCC generates a complete, deployable project with lifecycle scripts that cover the entire iteration loop.
You own every line of generated code. No runtime dependency. No lock-in. MCC is a code generator, not a framework.
What You Get
Every generated project includes 20+ do/ lifecycle scripts:
| Stage | Scripts | What They Do |
|---|---|---|
| Build | build, push, submit | Container image → Amazon ECR |
| Deploy | deploy, add-ic, status | Model → SageMaker AI endpoint |
| Test | test, validate, benchmark | Inference validation + performance |
| Iterate | tune, adapter, train | Fine-tune + hot-swap adapters |
| Operate | logs, clean, register, ci, export | Monitoring + teardown + CI |
See Deployment & Inference for the full script reference.
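One way to confirm that a generated project contains the scripts listed in the table above is to check for each one by name. The sketch below assumes the scripts sit in a `do/` directory at the project root and are named exactly as in the table; `check_scripts` is a hypothetical helper, not something MCC ships.

```shell
# Hypothetical helper, not part of MCC: report every expected script
# that is missing or not executable. Returns non-zero if any are.
check_scripts() {
  local dir="$1"; shift
  local name missing=0
  for name in "$@"; do
    if [ ! -x "${dir}/${name}" ]; then
      echo "missing or not executable: ${dir}/${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with script names taken from the table above:
#   check_scripts ./do build push submit deploy add-ic status \
#     test validate benchmark tune adapter train \
#     logs clean register ci export
```

The table lists a subset of the 20+ scripts, so a clean result here only covers the names passed in.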
Supported Configurations
Serving Architectures
| Architecture | Backends | Use Case |
|---|---|---|
| Transformers | vLLM, SGLang, TensorRT-LLM, LMI, DJL | Large language models |
| HTTP | Flask, FastAPI | Predictive models (sklearn, XGBoost, TensorFlow) |
| Triton | FIL, ONNX Runtime, TensorFlow, PyTorch, vLLM, TensorRT-LLM, Python | Multi-framework serving |
| Diffusors | vLLM | Image generation |
Deployment Targets
| Target | Description |
|---|---|
| Managed Inference | SageMaker AI real-time endpoints |
| Async Inference | S3-based async processing with SNS notifications |
| Batch Transform | S3-to-S3 dataset processing |
| HyperPod EKS | Kubernetes on SageMaker AI HyperPod clusters |
Validated Models
MCC validates 22+ model + instance combinations end-to-end — from generation through fine-tuning and adapter serving. If your configuration is in the Supported Models catalog, every lifecycle step has been proven.
Models not in the catalog still work: MCC generates projects for any Hugging Face model, but you take on validation yourself.
Intelligent Defaults (MCP Servers)
Six bundled Model Context Protocol servers recommend configurations automatically:
| Server | What It Does |
|---|---|
| instance-sizer | Recommends instance types based on model size + framework |
| region-picker | Finds regions with availability |
| base-image-picker | Selects optimal base image for CUDA version |
| model-picker | Discovers models from Hugging Face, S3, Marketplace |
| hyperpod-cluster-picker | Lists available HyperPod EKS clusters |
| endpoint-picker | Discovers existing endpoints for attachment |
MCP is optional — MCC works without it. MCP just makes the defaults smarter.
Documentation
| Section | For... |
|---|---|
| Getting Started | First-time users — install, bootstrap, deploy your first model |
| Supported Models | Check if your model is validated end-to-end |
| Configuration | CLI flags, env vars, config files, MCP, precedence |
| Deployment & Inference | All deployment targets + full lifecycle scripts reference |
| Fine-Tuning | Managed tuning, LoRA adapters, the iterate loop |
| MCP Servers | Configure and extend the intelligent defaults |
| CI Integration | Automated E2E validation + regression detection |
| Benchmarking | Performance measurement with SageMaker AI Benchmarking |
| Examples | Copy-paste walkthroughs for every architecture |
| Troubleshooting | Common issues and solutions |
| Contributing | Development setup + contribution workflow |
| Command Generator | Interactive tool to build deployment commands |
License
Apache-2.0. See CONTRIBUTING for security issue reporting.