Architecture Overview

This page provides a high-level overview of the ML Container Creator system. For detailed implementation guidance, see the Generator Architecture developer guide.

System Components

graph TD
    subgraph "Generator"
        CLI["CLI Entry Point<br/>index.js"]
        CM["ConfigManager<br/>8-source precedence"]
        PR["PromptRunner<br/>5-phase prompts"]
        TM["TemplateManager<br/>validation"]
        TE["Template Engine<br/>EJS processing"]
        GV["GenerationValidator<br/>schema validation"]
    end

    subgraph "MCP Servers"
        BIP["base-image-picker"]
        IS["instance-sizer"]
        RP["region-picker"]
        MP["model-picker"]
        HCP["hyperpod-cluster-picker"]
    end

    subgraph "Catalogs"
        CAT["servers/lib/catalogs/*.json"]
    end

    subgraph "Schema Registry"
        SR["~/.ml-container-creator/schemas/"]
    end

    subgraph "Generated Project"
        DO["do/ scripts"]
        CODE["code/ serve scripts"]
        DOCKER["Dockerfile"]
    end

    CLI --> CM
    CM --> PR
    PR --> TM
    TM --> TE
    TE --> DO
    TE --> CODE
    TE --> DOCKER
    TE --> GV
    GV -->|"validates payloads"| SR
    CM -->|"queries"| BIP
    CM -->|"queries"| IS
    CM -->|"queries"| RP
    CM -->|"queries"| MP
    CM -->|"queries"| HCP
    CAT -->|"feeds"| BIP
    CAT -->|"feeds"| IS
    CAT -->|"feeds"| RP
    CAT -->|"feeds"| MP

Deployment Configurations

The generator supports 15 deployment configurations across 4 architecture families:

Architecture	Backends	Description
HTTP	flask, fastapi	Traditional ML models (sklearn, xgboost, tensorflow) with a Python web server
Transformers	vllm, sglang, tensorrt-llm, lmi, djl	LLM serving where the framework handles HTTP, batching, and model loading
Triton	fil, onnxruntime, tensorflow, pytorch, vllm, tensorrtllm, python	NVIDIA Triton Inference Server with model repository pattern
Diffusors	vllm-omni	Image generation models via vLLM's diffusion support

Each configuration is expressed as an architecture-backend string (e.g., transformers-vllm, triton-fil).
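Because some backend names contain hyphens themselves (tensorrt-llm, vllm-omni), a configuration string is easiest to split at the first hyphen only. The following is a minimal sketch rather than the generator's actual parser: `BACKENDS` and `parseDeploymentConfig` are hypothetical names, and the lowercase family keys are an assumption inferred from the examples transformers-vllm and triton-fil.

```javascript
// Backend lists are copied from the table above. The lowercase family keys
// are an assumption inferred from "transformers-vllm" and "triton-fil".
const BACKENDS = new Map([
  ["http", ["flask", "fastapi"]],
  ["transformers", ["vllm", "sglang", "tensorrt-llm", "lmi", "djl"]],
  ["triton", ["fil", "onnxruntime", "tensorflow", "pytorch", "vllm", "tensorrtllm", "python"]],
  ["diffusors", ["vllm-omni"]],
]);

// Hypothetical helper: split at the FIRST hyphen only, so that backends
// such as "tensorrt-llm" keep their own hyphen.
function parseDeploymentConfig(value) {
  const idx = value.indexOf("-");
  const architecture = idx === -1 ? value : value.slice(0, idx);
  const backend = idx === -1 ? "" : value.slice(idx + 1);
  const known = BACKENDS.get(architecture);
  if (!known || !known.includes(backend)) {
    throw new Error(`Unknown deployment configuration: "${value}"`);
  }
  return { architecture, backend };
}

// Count every architecture-backend pair in the table.
const totalConfigs = [...BACKENDS.values()].reduce((n, list) => n + list.length, 0);

console.log(parseDeploymentConfig("triton-fil"));
console.log(parseDeploymentConfig("transformers-tensorrt-llm"));
console.log(totalConfigs);
```

Summing the four backend lists gives 15, which matches the count stated above.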

Deployment Targets

Target	Description
`managed-inference`	Real-time SageMaker endpoints
`async-inference`	Asynchronous endpoints with S3 input/output
`batch-transform`	Batch processing of datasets in S3
`hyperpod-eks`	Kubernetes deployment on SageMaker HyperPod EKS clusters

Generator Lifecycle

The CLI runs through four generation phases:

1. Initializing: Loads configuration from the CLI, environment, config files, and MCP servers. Checks for subcommands (bootstrap, registry, mcp).
2. Prompting: Collects user input in four phases: Infrastructure, Core ML, Modules, and Project. Queries MCP servers for instance types, regions, base images, and models.
3. Writing: Validates the configuration, processes the EJS templates, and generates the project with its do-framework scripts.
4. Post-generate: Runs sample model training if requested and sets executable permissions.
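The four phases run strictly in order, each building on the output of the one before. A rough sketch of that control flow follows. The phase names come from the list above, but `runLifecycle`, the handler bodies, and the context fields are placeholders invented for illustration. The sketch is also synchronous for brevity, whereas real prompting would be asynchronous.

```javascript
// Phase names taken from the lifecycle list; everything else is hypothetical.
const PHASES = ["initializing", "prompting", "writing", "post-generate"];

function runLifecycle(handlers, initialContext = {}) {
  let context = { ...initialContext };
  const completed = [];
  for (const phase of PHASES) {
    const handler = handlers[phase];
    if (typeof handler !== "function") {
      throw new Error(`No handler registered for phase "${phase}"`);
    }
    // Each phase sees the accumulated context and may add fields to it.
    context = { ...context, ...handler(context) };
    completed.push(phase);
  }
  return { context, completed };
}

// Placeholder handlers standing in for the real work of each phase.
const result = runLifecycle({
  initializing: () => ({ config: { projectName: "demo" } }),
  prompting: (ctx) => ({ answers: { ...ctx.config, includeSampleModel: false } }),
  writing: () => ({ files: ["Dockerfile", "do/build", "code/serve"] }),
  "post-generate": (ctx) => ({ executable: ctx.files.filter((f) => f.startsWith("do/")) }),
});

console.log(result.completed.join(" -> "));
```

A missing handler stops the run with an error instead of silently skipping a phase, which makes an incomplete pipeline easy to spot.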

Generated Project Structure

Every generated project uses the do-framework for lifecycle management:

project-name/
├── Dockerfile
├── do/
│   ├── config              # All configuration variables
│   ├── build               # Build Docker image
│   ├── push                # Push to ECR
│   ├── submit              # Submit to CodeBuild
│   ├── deploy              # Deploy to SageMaker
│   ├── test                # Run inference tests
│   ├── clean               # Tear down resources
│   ├── register            # Log to deployment registry
│   ├── export              # Export config as JSON
│   ├── manifest            # Asset manifest operations
│   ├── logs                # View CloudWatch logs
│   ├── ci                  # CI harness commands
│   └── run                 # Run container locally
├── code/
│   └── serve               # Model serving entrypoint
├── model_repository/       # Triton only
└── sample_model/           # Optional

CLI Subcommands

Beyond project generation, the CLI provides:

Subcommand	Purpose
`bootstrap`	One-time AWS infrastructure setup (IAM role, ECR, S3, CI stack) with named profiles
`bootstrap status`	Show provisioned resources and detect drift
`registry log`	Record a deployment to the local registry
`registry list`	List past deployments
`mcp`	Query configured MCP servers directly

MCP Server Ecosystem

Five bundled MCP servers provide dynamic configuration data:

Server	Catalogs	Purpose
`base-image-picker`	Base images per backend	Curated, versioned container images
`instance-recommender`	Instance types with GPU specs	Instance selection with accelerator metadata
`region-picker`	Service availability per region	Region filtering by service support
`model-picker`	HuggingFace, JumpStart, S3, Registry	Model discovery and metadata resolution
`hyperpod-cluster-picker`	Live cluster discovery	Finds existing HyperPod EKS clusters

All servers support two modes: static (reads from JSON catalogs) and smart (queries live AWS APIs or Bedrock for reasoning).

Configuration Precedence

The ConfigManager merges values from 8 sources (highest precedence first):

1. CLI options (--deployment-config, --instance-type, etc.)
2. CLI arguments (positional)
3. Environment variables (ML_DEPLOYMENT_CONFIG, etc.)
4. CLI config file (--config path.json)
5. Custom config file (config/mcp.json)
6. package.json section
7. MCP server responses
8. Generator defaults
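One way to picture this ordering is a merge that applies the sources from lowest to highest precedence, so that a later write overrides an earlier one. This is a minimal sketch, not the ConfigManager's actual code: the source labels, the `mergeConfig` function, the key names, and the sample values (including the defaults) are all illustrative assumptions.

```javascript
// Illustrative labels for the eight sources listed above,
// ordered from highest to lowest precedence.
const PRECEDENCE = [
  "cliOptions",
  "cliArguments",
  "environment",
  "cliConfigFile",
  "customConfigFile",
  "packageJson",
  "mcpServers",
  "defaults",
];

function mergeConfig(sources) {
  const merged = {};
  const origin = {};
  // Walk from lowest to highest precedence so that later writes win.
  for (const name of [...PRECEDENCE].reverse()) {
    for (const [key, value] of Object.entries(sources[name] ?? {})) {
      if (value === undefined) continue; // unset values never override
      merged[key] = value;
      origin[key] = name; // remember which source supplied the final value
    }
  }
  return { merged, origin };
}

// Hypothetical inputs: the default values here are invented for the example.
const { merged, origin } = mergeConfig({
  defaults: { deploymentConfig: "http-flask", instanceType: "ml.m5.large" },
  environment: { deploymentConfig: "transformers-vllm" },
  cliOptions: { instanceType: "ml.g5.xlarge", region: undefined },
});

console.log(merged);
console.log(origin);
```

Here the environment variable overrides the default deployment configuration, and the CLI option overrides the default instance type. Tracking `origin` alongside the merged values is a design choice that makes it easier to explain where a surprising setting came from.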

Supporting Infrastructure

Component	Purpose
Bootstrap profiles	Named AWS configurations at `~/.ml-container-creator/config.json`
Deployment registry	Per-profile asset manifest tracking deployed resources
CI Integration Harness	Automated end-to-end testing of all 15 deployment configs (see CI Integration)
Validation engine	Accelerator compatibility checks (CUDA, Neuron, ROCm, CPU)