Configuration

ML Container Creator supports multiple configuration methods with a clear precedence order, from interactive prompts to fully automated CLI usage.

Precedence

Configuration sources are applied in strict precedence order (highest to lowest):

| Priority | Source | Description | Example |
| --- | --- | --- | --- |
| 1 | CLI Options | Command-line flags | --deployment-config=http-flask |
| 2 | CLI Arguments | Positional arguments | ml-container-creator my-project |
| 3 | Environment Variables | Shell environment | export AWS_REGION=us-east-1 |
| 4 | CLI Config File | --config specified file | --config=production.json |
| 5 | Custom Config File | config/mcp.json | Auto-discovered in current directory |
| 6 | Package.json Section | "ml-container-creator": {...} | Project-specific defaults |
| 7 | Generator Defaults | Built-in defaults | awsRegion: "us-east-1" |
| 8 | Interactive Prompts | User input (fallback) | CLI prompts |

Higher precedence sources override lower ones.
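
As a rough illustration of this rule, the shell sketch below resolves a single setting by taking the first non-empty value from a list ordered from highest to lowest precedence. The helper names `first_set` and `resolve_region` are hypothetical and are not part of ML Container Creator; the real resolver covers all eight sources, while this sketch covers only four of them.

```shell
# Hypothetical helper: print the first non-empty argument.
# Arguments are expected in order of decreasing precedence.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical resolver for the AWS region.
#   $1 = value from the --region CLI option (may be empty)
#   $2 = value from a config file (may be empty)
# The environment variable sits between the two, and the
# generator default "us-east-1" comes last.
resolve_region() {
  first_set "$1" "${AWS_REGION:-}" "$2" "us-east-1"
}
```

In this sketch, a region passed as the first argument always wins; with no CLI value, `AWS_REGION` from the shell overrides the config file value, and `us-east-1` is used only when every other source is empty.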

Parameter Reference

This table shows every parameter, its CLI flag, which configuration sources support it, and whether it is required.

| Parameter | CLI Option | Env Var | Config File | Package.json | Default | Required |
| --- | --- | --- | --- | --- | --- | --- |
| Core | | | | | | |
| Deployment Config | --deployment-config | -- | yes | -- | -- | yes |
| Engine | --engine | -- | yes | -- | -- | no |
| Model Server | --model-server | -- | yes | -- | -- | no |
| Model Format | --model-format | -- | yes | -- | -- | yes |
| Modules | | | | | | |
| Include Sample | --include-sample | -- | yes | -- | false | yes |
| Include Testing | --include-testing | -- | yes | -- | true | yes |
| Infrastructure | | | | | | |
| Deployment Target | --deployment-target | -- | yes | -- | -- | yes |
| Build Target | --build-target | -- | yes | -- | -- | yes |
| Instance Type | --instance-type | ML_INSTANCE_TYPE | yes | -- | -- | yes |
| CodeBuild Compute | --codebuild-compute-type | -- | yes | -- | BUILD_GENERAL1_MEDIUM | no |
| AWS Region | --region | AWS_REGION | yes | yes | us-east-1 | no |
| AWS Role ARN | --role-arn | AWS_ROLE | yes | yes | -- | no |
| HyperPod EKS | | | | | | |
| Cluster | --hyperpod-cluster | -- | yes | -- | -- | no |
| Namespace | --hyperpod-namespace | -- | yes | -- | -- | no |
| Replicas | --hyperpod-replicas | -- | yes | -- | 1 | no |
| FSx Volume Handle | --fsx-volume-handle | -- | yes | -- | -- | no |
| Project | | | | | | |
| Project Name | --project-name | -- | yes | yes | -- | yes |
| Project Directory | --project-dir | -- | yes | yes | . | yes |
| System | | | | | | |
| Config File | --config | ML_CONTAINER_CREATOR_CONFIG | -- | yes | -- | no |
| Skip Prompts | --skip-prompts | -- | -- | -- | false | no |

Core parameters (deployment-config, engine, model-server, model-format) are not supported via environment variables or package.json. Only infrastructure and project settings are supported in those sources.

Deployment Configs

The --deployment-config flag bundles the architecture and model server into a single value:

| Config | Architecture | Backend | Use Case |
| --- | --- | --- | --- |
| http-flask | HTTP | Flask | Traditional ML with Flask server |
| http-fastapi | HTTP | FastAPI | Traditional ML with FastAPI server |
| transformers-vllm | Transformers | vLLM | LLM serving with vLLM |
| transformers-sglang | Transformers | SGLang | LLM serving with SGLang |
| transformers-tensorrt-llm | Transformers | TensorRT-LLM | LLM serving with TensorRT-LLM |
| transformers-lmi | Transformers | LMI | LLM serving with Large Model Inference |
| transformers-djl | Transformers | DJL | LLM serving with Deep Java Library |
| triton-fil | Triton | FIL | Tree models (XGBoost, LightGBM) on Triton |
| triton-onnxruntime | Triton | ONNX Runtime | ONNX models on Triton |
| triton-tensorflow | Triton | TensorFlow | TensorFlow models on Triton |
| triton-pytorch | Triton | PyTorch | PyTorch models on Triton |
| triton-vllm | Triton | vLLM | LLM serving on Triton |
| triton-tensorrtllm | Triton | TensorRT-LLM | LLM serving on Triton with TensorRT-LLM |
| triton-python | Triton | Python | Custom Python models on Triton |
| marketplace | Marketplace | -- | AWS Marketplace model packages (no container build) |

For traditional ML configs (http-flask, http-fastapi), also specify --engine to set the ML engine (sklearn, xgboost, tensorflow).

The marketplace config deploys pre-built vendor model packages from AWS Marketplace. No Dockerfile, no build/push — just deploy, test, and benchmark. Use the marketplace:// prefix with --model-name:

ml-container-creator my-marketplace-model \
  --deployment-config=marketplace \
  --model-name='marketplace://arn:aws:sagemaker:us-east-1:aws:model-package/vendor-model/1' \
  --instance-type=ml.g5.xlarge \
  --region=us-east-1

Model Formats

| Engine | Supported Formats | Default |
| --- | --- | --- |
| sklearn | pkl, joblib | pkl |
| xgboost | json, model, ubj | json |
| tensorflow | keras, h5, SavedModel | keras |
| transformers | N/A (models loaded from HuggingFace Hub) | -- |
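
The engine and format pairs in the table above can be checked in a script before the generator is invoked. The following is a minimal sketch that only mirrors the table; `format_supported` is a hypothetical helper, and the generator's own validation remains authoritative.

```shell
# Hypothetical helper mirroring the Model Formats table.
# Returns 0 when the format is listed for the engine, 1 otherwise.
#   $1 = engine (sklearn, xgboost, tensorflow)
#   $2 = model format
format_supported() {
  engine=$1
  format=$2
  case "$engine:$format" in
    sklearn:pkl|sklearn:joblib) return 0 ;;
    xgboost:json|xgboost:model|xgboost:ubj) return 0 ;;
    tensorflow:keras|tensorflow:h5|tensorflow:SavedModel) return 0 ;;
    # transformers has no model format, so any pair falls through here.
    *) return 1 ;;
  esac
}
```

For example, `format_supported sklearn json` returns a non-zero status, since `json` is listed only for `xgboost`.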

Configuration Methods

Interactive Mode

The default. Run the generator and answer the prompts:

ml-container-creator

CLI Options

Use command-line flags for non-interactive generation:

ml-container-creator my-project \
  --deployment-config=http-flask \
  --engine=sklearn \
  --model-format=pkl \
  --deployment-target=managed-inference \
  --instance-type=ml.m5.large \
  --build-target=codebuild \
  --skip-prompts

The project name can also be passed as a positional argument (priority 2 in the precedence chain).

Environment Variables

Set infrastructure parameters via the shell environment:

export ML_INSTANCE_TYPE="ml.g5.2xlarge"
export AWS_REGION="us-west-2"
export AWS_ROLE="arn:aws:iam::123456789012:role/SageMakerRole"

ml-container-creator --deployment-config=transformers-vllm --skip-prompts

Only four environment variables are supported: ML_INSTANCE_TYPE, AWS_REGION, AWS_ROLE, and ML_CONTAINER_CREATOR_CONFIG. Core parameters must come from CLI options or config files.

Configuration Files

Three file-based sources are supported, in descending precedence:

CLI config file (--config flag or ML_CONTAINER_CREATOR_CONFIG env var):

ml-container-creator --config=production.json --skip-prompts

Custom config file (config/mcp.json, auto-discovered):

{
  "projectName": "my-ml-project",
  "deploymentConfig": "http-flask",
  "engine": "sklearn",
  "modelFormat": "pkl",
  "includeSampleModel": false,
  "includeTesting": true,
  "deploymentTarget": "managed-inference",
  "buildTarget": "codebuild",
  "instanceType": "ml.m5.large",
  "awsRegion": "us-east-1",
  "awsRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole"
}

Package.json section (infrastructure and project settings only):

{
  "name": "my-project",
  "ml-container-creator": {
    "awsRegion": "us-west-2",
    "awsRoleArn": "arn:aws:iam::123456789012:role/MyProjectRole",
    "projectName": "my-ml-service"
  }
}

CLI Commands

Beyond project generation, ML Container Creator (MCC) provides configuration management commands:

| Command | Description |
| --- | --- |
| ml-container-creator configure | Interactive configuration file setup |
| ml-container-creator generate-empty-config | Create an empty config file template |
| ml-container-creator help | Show all options and examples |

HuggingFace Authentication

When deploying transformer models, you may need to authenticate with HuggingFace to access private or gated models. Public models like openai/gpt-oss-20b do not require authentication.

Authentication is required or recommended for:

  • Private models in your HuggingFace account
  • Gated models requiring license agreement (e.g., Llama 2, Llama 3)
  • Avoiding rate limits on public models

Providing Your Token

CLI option:

ml-container-creator my-llm-project \
  --deployment-config=transformers-vllm \
  --model-name=meta-llama/Llama-2-7b-hf \
  --hf-token='$HF_TOKEN' \
  --skip-prompts

Config file:

{
  "deploymentConfig": "transformers-vllm",
  "modelName": "meta-llama/Llama-2-7b-hf",
  "hfToken": "$HF_TOKEN"
}

Interactive prompt: When you enter a custom model ID during generation, you will be prompted for a token. You can enter the token directly, reference $HF_TOKEN, or leave it empty for public models.

Secrets Manager Alternative

For improved security, use AWS Secrets Manager instead of plaintext tokens. Pass an ARN instead of a literal value:

ml-container-creator my-project \
  --deployment-config=transformers-vllm \
  --model-name=meta-llama/Llama-3-8B \
  --hf-token-arn=arn:aws:secretsmanager:us-east-1:123456789012:secret:mlcc/hf-token/production-AbCdEf \
  --skip-prompts

This resolves the token at build time and at runtime without baking it into the image. See Secrets Management for the full workflow, including creating and managing secrets.

Note

You cannot use both --hf-token and --hf-token-arn simultaneously. Choose one approach per project.

Security

Tokens supplied via --hf-token or hfToken are baked into the Docker image. Anyone with access to the image can extract the token via docker inspect.

  • Use $HF_TOKEN (environment variable reference) in config files and CI/CD pipelines instead of literal tokens.
  • Never commit tokens to version control.
  • Use read-only tokens with minimal permissions.
  • Rotate tokens periodically. Generate new ones at huggingface.co/settings/tokens.
  • Consider using Secrets Manager for zero-knowledge images and automatic rotation.

Troubleshooting Authentication

| Symptom | Cause | Fix |
| --- | --- | --- |
| "Repository not found" or "Access denied" | Invalid token, expired token, or license not accepted | Verify token at huggingface.co; accept model license |
| "HF_TOKEN environment variable not set" | $HF_TOKEN referenced but not exported | export HF_TOKEN=hf_... |
| Container builds but fails at runtime | Model requires auth but no token provided | Rebuild with --hf-token |
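
The "referenced but not exported" case can be caught before generation with a small guard. The sketch below is only an illustration: `require_hf_token` is a hypothetical helper, not an ML Container Creator command, and the commented invocation reuses the flags from the CLI example earlier in this section.

```shell
# Hypothetical pre-flight guard; not part of ML Container Creator.
# A child process only sees exported variables, so this check
# distinguishes `HF_TOKEN=...` from `export HF_TOKEN=...`.
require_hf_token() {
  if sh -c '[ -n "${HF_TOKEN:-}" ]'; then
    return 0
  fi
  echo "HF_TOKEN is not exported; run: export HF_TOKEN=hf_..." >&2
  return 1
}

# Example use before generation:
#   require_hf_token && ml-container-creator my-llm-project \
#     --deployment-config=transformers-vllm \
#     --model-name=meta-llama/Llama-2-7b-hf \
#     --hf-token='$HF_TOKEN' \
#     --skip-prompts
```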

Validation

The generator validates configuration at multiple levels:

Parameter Validation

Parameter values are checked individually and against each other; failures are reported as error messages:

# Invalid deployment config
ml-container-creator --deployment-config=invalid --skip-prompts
# Error: invalid not implemented yet.

# Incompatible model format
ml-container-creator --deployment-config=http-flask --engine=sklearn --model-format=json --skip-prompts
# Error: Unsupported model format 'json' for engine 'sklearn'

# Invalid ARN
ml-container-creator --role-arn=invalid-arn --skip-prompts
# Error: Invalid AWS Role ARN format

# Missing required parameter
ml-container-creator --skip-prompts
# Error: Required parameter 'deploymentConfig' is missing

Do not mix incompatible options: traditional ML engines with LLM deployment configs, model formats with transformer configs, or sample models with transformer configs will all produce validation errors.

Schema-Driven Validation

Schema-driven validation checks generated AWS API payloads against the actual AWS service model (service-2.json) files. It catches issues that parameter validation cannot — enum values that AWS has deprecated, type mismatches in nested structures, missing required fields for specific API operations, and cross-cutting consistency problems between your Dockerfile, deploy scripts, and configuration.

Setup

Download the AWS service models into your local schema registry:

ml-container-creator bootstrap sync-schemas

This downloads service models for SageMaker AI, IAM, ECR, and S3 from the AWS SDK source and stores them at ~/.ml-container-creator/schemas/. Re-run periodically to pick up new enum values and API changes.

When Validation Runs

Schema validation runs at two points:

At generation time (non-blocking): After the generator produces deploy scripts, it validates the constructed payloads and prints any issues as warnings. Generation still completes — this is informational.

At pre-deploy time (blocking): Run ./do/validate before deploying to catch all issues, including those introduced by manual edits to do/config after generation.

# Run full schema validation
./do/validate

# JSON output for CI integration
./do/validate --format=json

# Include smart-mode validators (future MCP integration)
./do/validate --smart

What It Checks

| Check | Example Issue Caught |
| --- | --- |
| Enum values | InferenceAmiVersion set to a value AWS no longer accepts |
| Type mismatches | InitialInstanceCount set to a string instead of integer |
| Required fields | EndpointConfigName missing from CreateEndpointConfig payload |
| Pattern constraints | Role ARN not matching arn:aws:iam::\d{12}:role/.+ |
| Range constraints | VolumeSizeInGB below minimum or above maximum |
| GPU consistency | NumberOfAcceleratorDevicesRequired doesn't match instance GPU count |
| Tensor parallelism | VLLM_TENSOR_PARALLEL_SIZE != inference component (IC) GPU count != instance GPUs |
| CUDA compatibility | Base image requires CUDA 12 but instance only supports CUDA 11 |
| Model source requirements | jumpstart-hub source without HubAccessConfig.HubContentArn |
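
To make the pattern-constraint row concrete, the sketch below applies the same role ARN pattern with `grep -E`. POSIX extended regular expressions have no `\d`, so the digit class is written as `[0-9]`. The function name `is_role_arn` is hypothetical; the real check is performed by the schema validator.

```shell
# Hypothetical helper: succeeds when the argument looks like an IAM role ARN,
# using the pattern from the table with \d rewritten as [0-9] for grep -E.
is_role_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/.+$'
}
```

A value such as `invalid-arn`, or an ARN whose account ID is not exactly 12 digits, fails this check.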

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Validation passed (no errors, may have warnings) |
| 1 | Validation failed (one or more errors found) |
| 2 | Validation could not run (schema registry missing) |
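
In a CI job it can help to turn these exit codes into readable log lines. The following sketch shows one way to do that; `describe_validate_exit` is a hypothetical helper, and the commented usage assumes the generated `./do/validate` script is present in the working directory.

```shell
# Hypothetical helper: map a ./do/validate exit code to a short message.
describe_validate_exit() {
  case "$1" in
    0) echo "validation passed" ;;
    1) echo "validation failed" ;;
    2) echo "validation could not run" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example use in a CI step:
#   status=0
#   ./do/validate --format=json > validation.json || status=$?
#   describe_validate_exit "$status"
#   exit "$status"
```

Capturing the status with `|| status=$?` keeps the step alive under `set -e` long enough to log the message before exiting with the original code.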

Keeping Schemas Current

The schema registry becomes stale as AWS adds new enum values and instance types. If schemas are older than 30 days, validation prints a warning:

⚠️  Schema registry is 45 days old. Run `ml-container-creator bootstrap sync-schemas` to update.

Suppress this warning with --ignore-staleness if you're working offline.
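
If you want a similar check in your own scripts, file modification times are one possible proxy for schema age. This is an assumption: the documentation does not say how the tool itself computes the age, and `schemas_look_stale` is a hypothetical helper. The default directory is the schema location given above.

```shell
# Hypothetical helper: approximate staleness from file modification times.
#   $1 = schema directory (defaults to the registry location)
#   $2 = maximum age in days (defaults to 30)
# Returns 0 if any file is older than the limit, 1 if none is,
# and 2 if the directory does not exist.
schemas_look_stale() {
  dir=${1:-"$HOME/.ml-container-creator/schemas"}
  max_days=${2:-30}
  [ -d "$dir" ] || return 2
  # find -mtime +N matches files last modified more than N whole days ago.
  if [ -n "$(find "$dir" -type f -mtime "+$max_days")" ]; then
    return 0
  fi
  return 1
}
```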

Catalog Validation

Validate that catalog entries use valid AWS enum values:

npm run validate:catalogs

This checks fields like inferenceAmiVersion in model-servers.json against the SageMaker AI service model's enum set. Run this as a CI gate when updating catalog files.

Skipping Validation

Pass --no-validate to the generator to skip schema validation at generation time:

ml-container-creator my-project --deployment-config=transformers-vllm --no-validate --skip-prompts

Architecture Compatibility

Architecture compatibility validation checks whether your chosen model's model_type (from its HuggingFace config.json) is supported by the selected server version. This catches mismatches early — before you spend time building and deploying a container that won't load the model.

Syncing Architecture Data

Populate the architecture registry by scraping model type lists from server GitHub repositories (vLLM, SGLang, TensorRT-LLM):

ml-container-creator registry sync-architectures

This fetches each server version's model registry source file, parses it for supported model_type values, and writes them into the supportedModelTypes field in model-servers.json.

Note

bootstrap automatically runs sync-architectures as part of its post-setup chain. You only need to run it manually to pick up newly released server versions.

Viewing Supported Architectures

List supported architecture counts per server version:

ml-container-creator registry list-architectures

Output:

Model Architecture Support:

  Server                Version      Architectures
  ────────────────────  ───────────  ─────────────
  vllm                  0.6.3        85
  vllm                  0.5.5        72
  sglang                0.4.1        68
  tensorrt-llm          0.15.0       45

Filter by server or show the full model type list:

ml-container-creator registry list-architectures --server vllm
ml-container-creator registry list-architectures --verbose

Pre-Generation Compatibility Check

Check a specific model's compatibility before generating a project:

ml-container-creator registry check meta-llama/Llama-3-8B

Output:

🔍 Checking model: meta-llama/Llama-3-8B

   Fetching model config from HuggingFace...
   Model type: llama

   ✅ Compatible server versions:
      • vllm 0.6.3
      • vllm 0.5.5
      • sglang 0.4.1
      • tensorrt-llm 0.15.0

   ⚠️  Potentially incompatible server versions:
      • tensorrt-llm 0.12.0

This fetches the model's config.json from HuggingFace, extracts the model_type, and checks it against all server versions in the catalog.
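
The two steps described here, extracting `model_type` and checking it against a supported list, can be sketched in shell as follows. Both function names are hypothetical, the `sed` expression is a rough stand-in for a real JSON parser, and the list of model types in the usage comment is illustrative rather than taken from the catalog.

```shell
# Hypothetical helper: print the model_type value from a local config.json.
# This is a simple text match, not a full JSON parser.
model_type_of() {
  sed -n 's/.*"model_type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed when $1 appears in the
# space-separated list of supported model types given as $2.
is_supported() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use with an illustrative list of supported types:
#   type=$(model_type_of config.json)
#   is_supported "$type" "llama mistral qwen2" || echo "may not be supported: $type"
```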

Generation-Time Warning

When you generate a project, the architecture check runs automatically. If the model type is not in the server's supported list, you'll see an advisory warning:

⚠️  Model architecture "mamba" may not be supported by vllm 0.5.5. Consider a newer server version.

This is advisory only — generation still completes. Some models work via trust_remote_code even when not in the official registry.

do/validate Architecture Findings

The do/validate script includes architecture compatibility as one of its cross-cutting checks. If the model type doesn't match the server's supported list, it reports a medium-confidence warning alongside other validation findings.