Configuration¶
ML Container Creator supports multiple configuration methods with a clear precedence order, from interactive prompts to fully automated CLI usage.
Precedence¶
Configuration sources are applied in strict precedence order (highest to lowest):
| Priority | Source | Description | Example |
|---|---|---|---|
| 1 | CLI Options | Command-line flags | --deployment-config=http-flask |
| 2 | CLI Arguments | Positional arguments | ml-container-creator my-project |
| 3 | Environment Variables | Shell environment | export AWS_REGION=us-east-1 |
| 4 | CLI Config File | --config specified file | --config=production.json |
| 5 | Custom Config File | config/mcp.json | Auto-discovered in current directory |
| 6 | Package.json Section | "ml-container-creator": {...} | Project-specific defaults |
| 7 | Generator Defaults | Built-in defaults | awsRegion: "us-east-1" |
| 8 | Interactive Prompts | User input (fallback) | CLI prompts |
Higher precedence sources override lower ones.
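The override rule can be pictured as a first-match lookup across the sources. The following is a minimal sketch only, not the generator's implementation: the function name resolve_setting is hypothetical, and it models four of the eight sources, with an empty string meaning "not provided".

```shell
# Hypothetical sketch: pick the first non-empty value, highest precedence first.
# $1 = CLI option, $2 = environment variable, $3 = config file, $4 = built-in default.
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # nothing provided by any source
}

# No CLI option given, so the environment value wins over the file and the default.
resolve_setting "" "us-west-2" "eu-west-1" "us-east-1"   # prints us-west-2
```

A non-zero return here stands in for the case where no source supplies a value, which is where the interactive prompts come in as the fallback.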
Parameter Reference¶
This table shows every parameter, its CLI flag, which configuration sources support it, and whether it is required.
| Parameter | CLI Option | Env Var | Config File | Package.json | Default | Required |
|---|---|---|---|---|---|---|
| Core | | | | | | |
| Deployment Config | --deployment-config | -- | yes | -- | -- | yes |
| Engine | --engine | -- | yes | -- | -- | no |
| Model Server | --model-server | -- | yes | -- | -- | no |
| Model Format | --model-format | -- | yes | -- | -- | yes |
| Modules | | | | | | |
| Include Sample | --include-sample | -- | yes | -- | false | yes |
| Include Testing | --include-testing | -- | yes | -- | true | yes |
| Infrastructure | | | | | | |
| Deployment Target | --deployment-target | -- | yes | -- | -- | yes |
| Build Target | --build-target | -- | yes | -- | -- | yes |
| Instance Type | --instance-type | ML_INSTANCE_TYPE | yes | -- | -- | yes |
| CodeBuild Compute | --codebuild-compute-type | -- | yes | -- | BUILD_GENERAL1_MEDIUM | no |
| AWS Region | --region | AWS_REGION | yes | yes | us-east-1 | no |
| AWS Role ARN | --role-arn | AWS_ROLE | yes | yes | -- | no |
| HyperPod EKS | | | | | | |
| Cluster | --hyperpod-cluster | -- | yes | -- | -- | no |
| Namespace | --hyperpod-namespace | -- | yes | -- | -- | no |
| Replicas | --hyperpod-replicas | -- | yes | -- | 1 | no |
| FSx Volume Handle | --fsx-volume-handle | -- | yes | -- | -- | no |
| Project | | | | | | |
| Project Name | --project-name | -- | yes | yes | -- | yes |
| Project Directory | --project-dir | -- | yes | yes | . | yes |
| System | | | | | | |
| Config File | --config | ML_CONTAINER_CREATOR_CONFIG | -- | yes | -- | no |
| Skip Prompts | --skip-prompts | -- | -- | -- | false | no |
Core parameters (deployment-config, engine, model-server, model-format) are not supported via environment variables or package.json. Only infrastructure and project settings are supported in those sources.
Deployment Configs¶
The --deployment-config flag bundles the architecture and model server into a single value:
| Config | Architecture | Backend | Use Case |
|---|---|---|---|
| http-flask | HTTP | Flask | Traditional ML with Flask server |
| http-fastapi | HTTP | FastAPI | Traditional ML with FastAPI server |
| transformers-vllm | Transformers | vLLM | LLM serving with vLLM |
| transformers-sglang | Transformers | SGLang | LLM serving with SGLang |
| transformers-tensorrt-llm | Transformers | TensorRT-LLM | LLM serving with TensorRT-LLM |
| transformers-lmi | Transformers | LMI | LLM serving with Large Model Inference |
| transformers-djl | Transformers | DJL | LLM serving with Deep Java Library |
| triton-fil | Triton | FIL | Tree models (XGBoost, LightGBM) on Triton |
| triton-onnxruntime | Triton | ONNX Runtime | ONNX models on Triton |
| triton-tensorflow | Triton | TensorFlow | TensorFlow models on Triton |
| triton-pytorch | Triton | PyTorch | PyTorch models on Triton |
| triton-vllm | Triton | vLLM | LLM serving on Triton |
| triton-tensorrtllm | Triton | TensorRT-LLM | LLM serving on Triton with TensorRT-LLM |
| triton-python | Triton | Python | Custom Python models on Triton |
| marketplace | Marketplace | -- | AWS Marketplace model packages (no container build) |
For traditional ML configs (http-flask, http-fastapi), also specify --engine to set the ML engine (sklearn, xgboost, tensorflow).
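Judging from the table, each config value appears to read as architecture, a hyphen, then backend. The sketch below splits a value on its first hyphen to show that reading. The function name split_deployment_config is hypothetical, and the generator may well look values up in a catalog rather than parse them this way.

```shell
# Hypothetical sketch: split "<architecture>-<backend>" on the first hyphen.
split_deployment_config() {
  case "$1" in
    *-*) printf '%s %s\n' "${1%%-*}" "${1#*-}" ;;  # text before / after the first hyphen
    *)   printf '%s --\n' "$1" ;;                  # no backend part, e.g. marketplace
  esac
}

split_deployment_config transformers-tensorrt-llm   # prints: transformers tensorrt-llm
split_deployment_config marketplace                 # prints: marketplace --
```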
The marketplace config deploys pre-built vendor model packages from AWS Marketplace. No Dockerfile, no build/push — just deploy, test, and benchmark. Use the marketplace:// prefix with --model-name:
ml-container-creator my-marketplace-model \
--deployment-config=marketplace \
--model-name='marketplace://arn:aws:sagemaker:us-east-1:aws:model-package/vendor-model/1' \
--instance-type=ml.g5.xlarge \
--region=us-east-1
Model Formats¶
| Engine | Supported Formats | Default |
|---|---|---|
| sklearn | pkl, joblib | pkl |
| xgboost | json, model, ubj | json |
| tensorflow | keras, h5, SavedModel | keras |
| transformers | N/A (models loaded from HuggingFace Hub) | -- |
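If you script project generation, a small pre-flight check against this table can catch a bad engine and format pair before the generator runs. This is an illustrative sketch: format_supported is a hypothetical helper that hard-codes the rows above, so it would need updating whenever the supported formats change.

```shell
# Hypothetical helper: succeeds when the format is listed for the engine above.
# $1 = engine, $2 = model format.
format_supported() {
  case "$1:$2" in
    sklearn:pkl|sklearn:joblib) return 0 ;;
    xgboost:json|xgboost:model|xgboost:ubj) return 0 ;;
    tensorflow:keras|tensorflow:h5|tensorflow:SavedModel) return 0 ;;
    *) return 1 ;;
  esac
}

if format_supported sklearn json; then
  echo "sklearn/json: supported"
else
  echo "sklearn/json: not supported"
fi
```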
Configuration Methods¶
Interactive Mode¶
Interactive mode is the default: run the generator and answer the prompts.
CLI Options¶
Use command-line flags for non-interactive generation:
ml-container-creator my-project \
--deployment-config=http-flask \
--engine=sklearn \
--model-format=pkl \
--deployment-target=managed-inference \
--instance-type=ml.m5.large \
--build-target=codebuild \
--skip-prompts
The project name can also be passed as a positional argument (priority 2 in the precedence chain).
Environment Variables¶
Set infrastructure parameters via the shell environment:
export ML_INSTANCE_TYPE="ml.g5.2xlarge"
export AWS_REGION="us-west-2"
export AWS_ROLE="arn:aws:iam::123456789012:role/SageMakerRole"
ml-container-creator --deployment-config=transformers-vllm --skip-prompts
Only four environment variables are supported: ML_INSTANCE_TYPE, AWS_REGION, AWS_ROLE, and ML_CONTAINER_CREATOR_CONFIG. Core parameters must come from CLI options or config files.
Configuration Files¶
Three file-based sources are supported, in descending precedence:
CLI config file (passed with the --config flag, for example --config=production.json, or with the ML_CONTAINER_CREATOR_CONFIG env var).
Custom config file (config/mcp.json, auto-discovered):
{
"projectName": "my-ml-project",
"deploymentConfig": "http-flask",
"engine": "sklearn",
"modelFormat": "pkl",
"includeSampleModel": false,
"includeTesting": true,
"deploymentTarget": "managed-inference",
"buildTarget": "codebuild",
"instanceType": "ml.m5.large",
"awsRegion": "us-east-1",
"awsRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole"
}
Package.json section (infrastructure and project settings only):
{
"name": "my-project",
"ml-container-creator": {
"awsRegion": "us-west-2",
"awsRoleArn": "arn:aws:iam::123456789012:role/MyProjectRole",
"projectName": "my-ml-service"
}
}
CLI Commands¶
Beyond project generation, ML Container Creator (MCC) provides configuration management commands:
| Command | Description |
|---|---|
| ml-container-creator configure | Interactive configuration file setup |
| ml-container-creator generate-empty-config | Create an empty config file template |
| ml-container-creator help | Show all options and examples |
HuggingFace Authentication¶
When deploying transformer models, you may need to authenticate with HuggingFace to access private or gated models. Public models like openai/gpt-oss-20b do not require authentication.
Provide a token in these cases:
- Private models in your HuggingFace account
- Gated models requiring license agreement (e.g., Llama 2, Llama 3)
- Avoiding rate limits on public models
Providing Your Token¶
CLI option:
ml-container-creator my-llm-project \
--deployment-config=transformers-vllm \
--model-name=meta-llama/Llama-2-7b-hf \
--hf-token='$HF_TOKEN' \
--skip-prompts
Config file:
{
"deploymentConfig": "transformers-vllm",
"modelName": "meta-llama/Llama-2-7b-hf",
"hfToken": "$HF_TOKEN"
}
Interactive prompt: When you enter a custom model ID during generation, you will be prompted for a token. You can enter the token directly, reference $HF_TOKEN, or leave it empty for public models.
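The single quotes around '$HF_TOKEN' in the CLI example keep the shell from expanding the variable, so the generator receives the reference rather than the token itself. How the generator resolves that reference is not shown here; the sketch below is one plausible reading, with resolve_hf_token as a hypothetical name and the error text borrowed from the troubleshooting table.

```shell
# Hypothetical sketch: turn a configured token value into the actual token.
# $1 = literal token, the reference '$HF_TOKEN', or empty for public models.
resolve_hf_token() {
  if [ "$1" = '$HF_TOKEN' ]; then
    if [ -z "${HF_TOKEN:-}" ]; then
      echo "HF_TOKEN environment variable not set" >&2
      return 1
    fi
    printf '%s\n' "$HF_TOKEN"   # reference: read the exported variable
  else
    printf '%s\n' "$1"          # literal token or empty value: pass through
  fi
}

resolve_hf_token "hf_literal_example"   # prints hf_literal_example
```

With HF_TOKEN exported, resolve_hf_token '$HF_TOKEN' prints the exported value; without it, the call fails with the "not set" message.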
Secrets Manager Alternative¶
For improved security, use AWS Secrets Manager instead of plaintext tokens. Pass an ARN instead of a literal value:
ml-container-creator my-project \
--deployment-config=transformers-vllm \
--model-name=meta-llama/Llama-3-8B \
--hf-token-arn=arn:aws:secretsmanager:us-east-1:123456789012:secret:mlcc/hf-token/production-AbCdEf \
--skip-prompts
This resolves the token at build-time and runtime without baking it into the image. See Secrets Management for the full workflow, including creating and managing secrets.
Note
You cannot use both --hf-token and --hf-token-arn simultaneously. Choose one approach per project.
Security¶
Tokens passed with --hf-token are baked into the Docker image. Anyone with access to the image can extract the token via docker inspect.
- Use $HF_TOKEN (environment variable reference) in config files and CI/CD pipelines instead of literal tokens.
- Never commit tokens to version control.
- Use read-only tokens with minimal permissions.
- Rotate tokens periodically. Generate new ones at huggingface.co/settings/tokens.
- Consider using Secrets Manager for zero-knowledge images and automatic rotation.
Troubleshooting Authentication¶
| Symptom | Cause | Fix |
|---|---|---|
| "Repository not found" or "Access denied" | Invalid token, expired token, or license not accepted | Verify token at huggingface.co; accept model license |
| "HF_TOKEN environment variable not set" | $HF_TOKEN referenced but not exported | export HF_TOKEN=hf_... |
| Container builds but fails at runtime | Model requires auth but no token provided | Rebuild with --hf-token |
Validation¶
The generator validates configuration at multiple levels:
Parameter Validation¶
Each parameter is checked for a valid value and for compatibility with the other parameters. A failure produces a specific error message:
# Invalid deployment config
ml-container-creator --deployment-config=invalid --skip-prompts
# Error: invalid not implemented yet.
# Incompatible model format
ml-container-creator --deployment-config=http-flask --engine=sklearn --model-format=json --skip-prompts
# Error: Unsupported model format 'json' for engine 'sklearn'
# Invalid ARN
ml-container-creator --role-arn=invalid-arn --skip-prompts
# Error: Invalid AWS Role ARN format
# Missing required parameter
ml-container-creator --skip-prompts
# Error: Required parameter 'deploymentConfig' is missing
Do not mix incompatible options: traditional ML engines with LLM deployment configs, model formats with transformer configs, or sample models with transformer configs will all produce validation errors.
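The combination rules can be sketched as a pre-flight check for wrapper scripts. This is an assumption-laden illustration: check_combination is a hypothetical name, the error text is invented, and the sketch only treats transformers-* configs as LLM configs, which may be narrower than what the generator enforces.

```shell
# Hypothetical sketch of the combination rules for transformers-* configs.
# $1 = deployment config, $2 = engine, $3 = model format, $4 = include sample (true/false).
check_combination() {
  config=$1 engine=$2 format=$3 sample=$4
  case "$config" in
    transformers-*)
      if [ -n "$engine" ] || [ -n "$format" ] || [ "$sample" = "true" ]; then
        echo "engine, model format, and sample model do not apply to $config" >&2
        return 1
      fi
      ;;
  esac
  return 0
}

check_combination http-flask sklearn pkl false && echo "http-flask combination ok"
```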
Schema-Driven Validation¶
Schema-driven validation checks generated AWS API payloads against the actual AWS service model (service-2.json) files. It catches issues that parameter validation cannot — enum values that AWS has deprecated, type mismatches in nested structures, missing required fields for specific API operations, and cross-cutting consistency problems between your Dockerfile, deploy scripts, and configuration.
Setup¶
Download the AWS service models into your local schema registry:
This downloads service models for SageMaker AI, IAM, ECR, and S3 from the AWS SDK source and stores them at ~/.ml-container-creator/schemas/. Re-run periodically to pick up new enum values and API changes.
When Validation Runs¶
Schema validation runs at two points:
At generation time (non-blocking): After the generator produces deploy scripts, it validates the constructed payloads and prints any issues as warnings. Generation still completes — this is informational.
At pre-deploy time (blocking): Run ./do/validate before deploying to catch all issues, including those introduced by manual edits to do/config after generation.
# Run full schema validation
./do/validate
# JSON output for CI integration
./do/validate --format=json
# Include smart-mode validators (future MCP integration)
./do/validate --smart
What It Checks¶
| Check | Example Issue Caught |
|---|---|
| Enum values | InferenceAmiVersion set to a value AWS no longer accepts |
| Type mismatches | InitialInstanceCount set to a string instead of integer |
| Required fields | EndpointConfigName missing from CreateEndpointConfig payload |
| Pattern constraints | Role ARN not matching arn:aws:iam::\d{12}:role/.+ |
| Range constraints | VolumeSizeInGB below minimum or above maximum |
| GPU consistency | NumberOfAcceleratorDevicesRequired doesn't match instance GPU count |
| Tensor parallelism | VLLM_TENSOR_PARALLEL_SIZE != IC GPU count != instance GPUs |
| CUDA compatibility | Base image requires CUDA 12 but instance only supports CUDA 11 |
| Model source requirements | jumpstart-hub source without HubAccessConfig.HubContentArn |
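Two of these checks are simple enough to approximate by hand when debugging a finding. The sketch below is not the validator's code: is_role_arn and gpu_counts_consistent are hypothetical helpers, and the pattern is the one from the table with \d rewritten as [0-9] because grep -E does not support \d.

```shell
# Hypothetical helper: pattern constraint for a role ARN.
is_role_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/.+$'
}

# Hypothetical helper: all three GPU counts must agree.
# $1 = VLLM_TENSOR_PARALLEL_SIZE, $2 = IC GPU count, $3 = instance GPU count.
gpu_counts_consistent() {
  [ "$1" -eq "$2" ] && [ "$2" -eq "$3" ]
}

is_role_arn "arn:aws:iam::123456789012:role/MyProjectRole" && echo "role ARN ok"
gpu_counts_consistent 4 4 8 || echo "GPU counts disagree"
```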
Exit Codes¶
| Code | Meaning |
|---|---|
| 0 | Validation passed (no errors, may have warnings) |
| 1 | Validation failed (one or more errors found) |
| 2 | Validation could not run (schema registry missing) |
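In a CI job it helps to distinguish "validation failed" from "validation could not run". The wrapper below is a sketch under that assumption: validate_gate is a hypothetical name, and the messages simply restate the table above.

```shell
# Hypothetical CI wrapper: run a command and report what its exit status means.
validate_gate() {
  status=0
  "$@" || status=$?
  case "$status" in
    0) echo "validation passed" ;;
    1) echo "validation failed: fix the reported errors" ;;
    2) echo "validation could not run: schema registry missing" ;;
    *) echo "unexpected exit status $status" ;;
  esac
  return "$status"
}

# Stand-in command for demonstration; in a generated project you would pass ./do/validate.
validate_gate sh -c 'exit 0'   # prints: validation passed
```

Because the wrapper returns the original status, the CI step still fails on codes 1 and 2 while the log states which case occurred.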
Keeping Schemas Current¶
The schema registry becomes stale as AWS adds new enum values and instance types. If schemas are older than 30 days, validation prints a warning:
Suppress this warning with --ignore-staleness if you're working offline.
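To check staleness yourself, for example in a scheduled job, you can look at file modification times in the schema directory. This sketch only approximates the built-in check: schemas_are_stale is a hypothetical helper, it flags the directory if any file is older than the threshold, and find -mtime counts in whole 24-hour periods.

```shell
# Hypothetical helper: succeeds when any file under $1 is older than $2 days.
schemas_are_stale() {
  [ -n "$(find "$1" -type f -mtime +"$2" 2>/dev/null | head -n 1)" ]
}

if schemas_are_stale "$HOME/.ml-container-creator/schemas" 30; then
  echo "schemas are older than 30 days"
else
  echo "schemas are current (or the registry is missing)"
fi
```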
Catalog Validation¶
Validate that catalog entries use valid AWS enum values:
This checks fields like inferenceAmiVersion in model-servers.json against the SageMaker AI service model's enum set. Run this as a CI gate when updating catalog files.
Skipping Validation¶
Pass --no-validate to the generator to skip schema validation at generation time:
Architecture Compatibility¶
Architecture compatibility validation checks whether your chosen model's model_type (from its HuggingFace config.json) is supported by the selected server version. This catches mismatches early — before you spend time building and deploying a container that won't load the model.
Syncing Architecture Data¶
Populate the architecture registry by scraping model type lists from server GitHub repositories (vLLM, SGLang, TensorRT-LLM):
This fetches each server version's model registry source file, parses it for supported model_type values, and writes them into the supportedModelTypes field in model-servers.json.
Note
bootstrap automatically runs sync-architectures as part of its post-setup chain. You only need to run it manually to pick up newly released server versions.
Viewing Supported Architectures¶
List supported architecture counts per server version with ml-container-creator registry list-architectures.
Output:
Model Architecture Support:
Server               Version     Architectures
──────────────────── ─────────── ─────────────
vllm                 0.6.3       85
vllm                 0.5.5       72
sglang               0.4.1       68
tensorrt-llm         0.15.0      45
Filter by server or show the full model type list:
ml-container-creator registry list-architectures --server vllm
ml-container-creator registry list-architectures --verbose
Pre-Generation Compatibility Check¶
Check a specific model's compatibility before generating a project:
Output:
🔍 Checking model: meta-llama/Llama-3-8B
Fetching model config from HuggingFace...
Model type: llama
✅ Compatible server versions:
• vllm 0.6.3
• vllm 0.5.5
• sglang 0.4.1
• tensorrt-llm 0.15.0
⚠️ Potentially incompatible server versions:
• tensorrt-llm 0.12.0
This fetches the model's config.json from HuggingFace, extracts the model_type, and checks it against all server versions in the catalog.
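The same check can be sketched locally against a config.json you have already downloaded. Everything here is illustrative: model_type_from_config and is_supported_type are hypothetical helpers, the sed expression only handles a simple one-line "model_type" entry, and the supported list (llama mistral qwen2) is made up rather than taken from the catalog.

```shell
# Hypothetical helper: print the model_type value from a config.json file ($1).
model_type_from_config() {
  sed -n 's/.*"model_type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: $1 = model type, remaining args = supported types.
is_supported_type() {
  wanted=$1
  shift
  for t in "$@"; do
    [ "$t" = "$wanted" ] && return 0
  done
  return 1
}

config_file=$(mktemp)
printf '{\n  "model_type": "llama"\n}\n' > "$config_file"
model_type=$(model_type_from_config "$config_file")
if is_supported_type "$model_type" llama mistral qwen2; then
  echo "$model_type: compatible"
else
  echo "$model_type: potentially incompatible"
fi
rm -f "$config_file"
```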
Generation-Time Warning¶
When you generate a project, the architecture check runs automatically. If the model type is not in the server's supported list, you'll see an advisory warning:
This is advisory only — generation still completes. Some models work via trust_remote_code even when not in the official registry.
do/validate Architecture Findings¶
The do/validate script includes architecture compatibility as one of its cross-cutting checks. If the model type doesn't match the server's supported list, it reports a medium-confidence warning alongside other validation findings.