Getting Started¶
This guide covers installation and two end-to-end walkthroughs: deploying a predictive model (sklearn + Flask) and deploying an LLM (SGLang). Both deploy to a SageMaker managed inference endpoint.
Prerequisites¶
| Tool | Version | Purpose |
|---|---|---|
| Node.js | 24.11.1+ | Runs the CLI |
| Python | 3.8+ | Model serving code |
| Docker | 20+ | Local container builds |
| AWS CLI | 2+ | AWS resource management |
You also need an AWS IAM role with permissions for ECR, SageMaker, and (optionally) CodeBuild. Verify your setup:
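aws sts get-caller-identity   # Confirms AWS credentials and account
docker --version              # Docker 20+
node --version                # Node.js version
python3 --version             # Python 3.8+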
Installation¶
From npm (recommended)¶
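Assuming the CLI is published to npm under the name ml-container-creator (matching the repository name), install it globally:
npm install -g ml-container-creator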
Zero-install with npx¶
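If you prefer not to install anything, npx can fetch and run the CLI on demand (again assuming the ml-container-creator package name):
npx ml-container-creator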
From source¶
git clone https://github.com/awslabs/ml-container-creator.git
cd ml-container-creator
npm install && npm link
Verify the CLI is available:
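ml-container-creator --version   # Assumes the conventional --version flag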
Example 1: Predictive Model (sklearn + Flask)¶
This walkthrough generates a project that serves a scikit-learn model behind Flask on a SageMaker real-time endpoint. It uses the built-in Abalone sample model so you can follow along without providing your own model file.
Generate the project¶
mkdir sklearn-demo && cd sklearn-demo
ml-container-creator sklearn-demo \
--deployment-config=http-flask \
--engine=sklearn \
--model-format=pkl \
--include-sample \
--deployment-target=managed-inference \
--instance-type=ml.m5.large \
--region=us-east-1 \
--skip-prompts
The generator creates the project directory and trains the sample model automatically.
Project structure¶
sklearn-demo/
├── Dockerfile
├── requirements.txt
├── nginx-predictors.conf
├── code/
│ ├── model_handler.py # Model loading and inference
│ ├── serve.py # Flask server
│ ├── start_server.py
│ └── flask/
│ ├── gunicorn_config.py
│ └── wsgi.py
├── sample_model/
│ ├── train_abalone.py # Training script
│ ├── test_inference.py # Local inference test
│ └── abalone_model.pkl # Trained model artifact
├── do/ # Lifecycle scripts
│ ├── config # Project configuration
│ ├── build # Build Docker image
│ ├── push # Push to ECR
│ ├── submit # Submit build to CodeBuild
│ ├── deploy # Deploy to SageMaker
│ ├── validate # Validate config against AWS service models
│ ├── test # Test local or deployed endpoint
│ ├── clean # Tear down resources
│ ├── register # Log to deployment registry
│ ├── manifest # Asset manifest operations
│ ├── run # Run container locally
│ ├── logs # Tail CloudWatch logs
│ └── export # Export config as JSON
└── test/
├── test_local_image.sh
├── test_model_handler.py
└── test_endpoint.sh
Build, push, and deploy¶
If you haven't already, run ml-container-creator bootstrap to set up your IAM role and ECR repository, and optionally sync AWS service models for pre-deploy validation. Then run the lifecycle scripts:
./do/build # Build the Docker image
./do/push # Push to Amazon ECR
./do/validate # Validate config against AWS service models (optional)
./do/deploy # Deploy to SageMaker (requires IAM role ARN)
./do/deploy creates a SageMaker endpoint configuration, endpoint, and inference component. It waits for the endpoint to reach InService status.
Test¶
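Run the test lifecycle script against the deployed endpoint. The argument shown is an assumption; check the generated ./do/test script for the exact value it expects (no argument tests the local container, as noted below).
./do/test endpoint   # 'endpoint' is an assumed argument selecting deployed mode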
Output:
🧪 Testing SageMaker endpoint: sklearn-demo-endpoint-<TIMESTAMP>
🔍 Test 1: Health check
Checking endpoint status...
✅ Endpoint is InService
🔍 Test 2: Inference request
Payload: Sample feature vector
Invoking SageMaker endpoint...
✅ Inference request successful
Response preview: {"predictions": [12.86]}
✅ All tests passed!
You can also test locally before deploying:
./do/run # Start container on localhost:8080
./do/test # Test against local container (no argument = local mode)
Bring your own model¶
To use your own model instead of the sample, edit the Dockerfile COPY directive:
# Replace the sample model line:
# COPY sample_model/abalone_model.pkl /opt/ml/model/
COPY path/to/your/model.pkl /opt/ml/model/
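After swapping in your model, rebuild and redeploy with ./do/build, ./do/push, and ./do/deploy.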
Example 2: LLM (SGLang)¶
This walkthrough deploys an LLM to a SageMaker endpoint using SGLang. LLM containers are large and GPU-dependent, so this example uses CodeBuild for the image build.
Generate the project¶
mkdir sglang-demo && cd sglang-demo
ml-container-creator sglang-demo \
--deployment-config=transformers-sglang \
--model-name=openai/gpt-oss-20b \
--deployment-target=managed-inference \
--build-target=codebuild \
--instance-type=ml.g6.12xlarge \
--region=us-east-1 \
--skip-prompts
Project structure¶
sglang-demo/
├── Dockerfile
├── buildspec.yml # CodeBuild build specification
├── IAM_PERMISSIONS.md
├── code/
│ ├── serve # Entrypoint script launching SGLang
│ └── serving.properties # Server configuration (model ID, port)
├── do/
│ ├── config
│ ├── build
│ ├── push
│ ├── submit # Submit build to CodeBuild
│ ├── deploy
│ ├── test
│ ├── clean
│ ├── register
│ ├── manifest
│ ├── logs
│ └── export
└── test/
└── test_endpoint.sh
Build with CodeBuild¶
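Run the submit lifecycle script:
./do/submit   # Package the source and run the image build in CodeBuild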
This creates a CodeBuild project, uploads the source, builds the Docker image, and pushes it to ECR. Monitor progress in the terminal or the CodeBuild console link printed during execution.
Deploy and test¶
./do/deploy # Deploy to SageMaker (GPU endpoint, may take 5-10 minutes)
./do/test # Test with an OpenAI-compatible chat completion request
Output:
🧪 Testing SageMaker endpoint: sglang-demo-endpoint-<TIMESTAMP>
🔍 Test 1: Health check
Checking endpoint status...
✅ Endpoint is InService
🔍 Test 2: Inference request
Payload: OpenAI-compatible chat completion request
Invoking SageMaker endpoint...
✅ Inference request successful
Response preview: {"choices": [{"message": {"content": "I'm doing great—thanks for asking!..."}}]}
✅ All tests passed!
Cleanup¶
Tear down deployed resources to stop incurring charges:
./do/clean endpoint # Delete SageMaker endpoint, config, and inference component
./do/clean ecr # Delete ECR images
./do/clean codebuild # Delete CodeBuild project and IAM role (if applicable)
./do/clean all # All of the above
CI Integration (Optional)¶
The bootstrap command can optionally provision a CI Integration Harness that automatically tests your deployment configurations end-to-end on a recurring schedule. This is useful for validating that the generator continues to produce working containers across all supported configurations.
To enable CI during bootstrap, or to add it to an existing bootstrap, re-run the bootstrap command with its CI option:
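A minimal sketch, assuming a --ci flag on the bootstrap command (the actual flag name may differ; check ml-container-creator bootstrap --help):
ml-container-creator bootstrap --ci   # --ci is a hypothetical flag name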
Once provisioned, register any generated project for automated testing:
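A sketch assuming registration goes through the generated do/ci script; the register subcommand is hypothetical (only ./do/ci report is confirmed below):
./do/ci register   # Hypothetical subcommand registering this project with the harness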
The harness will regenerate, build, deploy, test, and tear down the project hourly, reporting results via ./do/ci report.
For full details, see the CI Integration Guide.
Next Steps¶
- How It Works — Understand the generator architecture and prompt flow
- Configuration — CLI flags, environment variables, config files, and MCP servers
- Deployment & Inference — All deployment targets and lifecycle scripts
- Examples — Walkthroughs for other architectures (Triton, diffusers, async, batch transform)
- CI Integration — Automated lifecycle testing for all deployment configurations
- Troubleshooting — Common issues and solutions