MCP Servers¶
ML Container Creator supports Model Context Protocol (MCP) as a configuration source. MCP servers provide configuration values -- like recommended instance types or AWS regions -- that the generator merges into its configuration chain during project generation.
How It Works¶
MCP is an open protocol that standardizes how applications communicate with external tool servers. In ML Container Creator, MCP serves as a configuration provider protocol: the generator spawns MCP servers as child processes, queries them for parameter values over stdio, and merges the results into the configuration. No LLM is in the loop -- the generator programmatically queries servers and merges results. The servers themselves are fully MCP-compliant, so any MCP client (Claude, Kiro, or your own) can also connect to them.
```mermaid
sequenceDiagram
    participant User
    participant Generator as ML Container Creator
    participant MCP as MCP Server (child process)
    User->>Generator: ml-container-creator
    Generator->>Generator: Load config files
    Generator->>MCP: Spawn process, handshake
    Generator->>MCP: Call get_ml_config tool
    MCP-->>Generator: Return values + choices
    Generator->>Generator: Merge MCP values
    Generator->>Generator: Load env vars, CLI args
    Generator->>User: Present prompts with MCP choices
```
MCP sits at priority 4 in the configuration precedence chain -- below CLI options, arguments, and environment variables, but above config files and defaults.
MCP is entirely optional. If a server is not configured, unreachable, times out (default 10s), or returns errors, the generator logs a warning and continues without MCP values. Prompts fall back to their default choices.
Eligible Parameters¶
Only parameters with unbounded value spaces are eligible for MCP:
| Parameter | MCP Eligible | Reason |
|---|---|---|
| `instanceType` | yes | Open-ended set of SageMaker instance types |
| `awsRegion` | yes | AWS adds new regions over time |
| `awsRoleArn` | yes | Arbitrary IAM role ARNs |
| `framework` | no | Fixed set: sklearn, xgboost, tensorflow, transformers |
| `modelServer` | no | Fixed set: flask, fastapi, vllm, sglang, etc. |
| All others | no | Bounded value spaces |
MCP servers can return values for any parameter, but the generator silently discards values for ineligible parameters.
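For example, if a server responds with values for both `instanceType` and `framework`, only the `instanceType` value survives the merge, because `framework` has a bounded value space:

```json
{
  "values": {
    "instanceType": "ml.g5.xlarge",
    "framework": "tensorflow"
  }
}
```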
Managing MCP Servers¶
Initialize All Bundled Servers¶
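Assuming the initializer follows the same subcommand pattern as the `add`, `list`, `get`, and `remove` commands below:

```bash
# Hypothetical invocation -- the `init` subcommand name is an assumption
ml-container-creator mcp init
```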
Creates `config/mcp.json` with every bundled server pre-configured. Existing servers are preserved.
Add a Server¶
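A minimal invocation might look like this (assuming the server name is followed by the spawn command after `--`, as in the fuller example below):

```bash
ml-container-creator mcp add team-config -- npx -y @corp/mcp-config
```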
With environment variables and options:
```bash
ml-container-creator mcp add team-config -- npx -y @corp/mcp-config \
  -e TEAM_ID=ml-platform \
  --tool-name get_approved_config \
  --limit 5
```
The `mcp add` command registers a server in your config file. The server is spawned and queried later, when you run the generator.
Add a Bundled Server¶
The generator ships with first-party MCP servers in the `servers/` directory:
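Assuming bundled servers can be registered by name alone (run `mcp list --bundled` to see what ships with the generator):

```bash
# instance-sizer is one of the bundled servers documented below
ml-container-creator mcp add instance-sizer
```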
Dependencies are installed automatically on first use.
List, Inspect, Remove¶
```bash
ml-container-creator mcp list                # List configured servers
ml-container-creator mcp list --bundled      # List available bundled servers
ml-container-creator mcp get team-config     # Inspect a server
ml-container-creator mcp remove team-config  # Remove a server
```
Config File Format¶
MCP servers are configured under the `mcpServers` key in `config/mcp.json`:

```json
{
  "framework": "sklearn",
  "modelServer": "flask",
  "mcpServers": {
    "team-config": {
      "command": "node",
      "args": ["servers/instance-recommender/index.js"],
      "env": { "TEAM_ID": "ml-platform" },
      "toolName": "get_ml_config",
      "limit": 5
    }
  }
}
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `command` | string | yes | -- | Executable to spawn |
| `args` | string[] | -- | `[]` | Command-line arguments |
| `env` | object | -- | `{}` | Additional environment variables |
| `toolName` | string | -- | `get_ml_config` | MCP tool to call |
| `limit` | integer | -- | `10` | Max choices per parameter |
When multiple servers are configured, they are queried in order. Later servers take precedence for conflicting values.
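For example, with the configuration below, both servers are queried; if each returns a value for `awsRegion`, the value from `team-config` wins because it is listed later (server names and commands here are illustrative):

```json
{
  "mcpServers": {
    "org-defaults": {
      "command": "node",
      "args": ["servers/org-defaults/index.js"]
    },
    "team-config": {
      "command": "npx",
      "args": ["-y", "@corp/mcp-config"]
    }
  }
}
```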
Bundled Servers¶
instance-sizer¶
The single authority for all instance-related recommendations. Estimates VRAM requirements from model metadata, performs search/tag-based filtering, and returns filtered, ranked SageMaker instance recommendations. Supports both VRAM-driven sizing (when a model name is provided) and tag-based search (when an `instanceSearch` query is provided).
The instance-sizer accepts optional context including CUDA version constraints (from the base image), serving profile ENV overrides (for accurate KV cache estimation), and deployment target. When the model is known, it computes VRAM requirements and filters instances to only those with sufficient GPU memory and compatible CUDA versions.
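A tool call carrying that context might look like the following sketch; the context field names are illustrative, not a documented schema:

```json
{
  "parameters": ["instanceType"],
  "limit": 5,
  "context": {
    "modelName": "meta-llama/Llama-3.1-8B-Instruct",
    "cudaVersion": "12.4",
    "deploymentTarget": "sagemaker"
  }
}
```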
region-picker¶
Suggests AWS regions based on a search term. Set `REGION_SEARCH` to filter by region code or location name (e.g., "europe", "tokyo", "us-west"). Without a search term, returns popular SageMaker regions.
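A configuration entry for it might look like this (the `servers/region-picker/index.js` path is an assumption based on the bundled `servers/` layout):

```json
{
  "mcpServers": {
    "region-picker": {
      "command": "node",
      "args": ["servers/region-picker/index.js"],
      "env": { "REGION_SEARCH": "europe" }
    }
  }
}
```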
model-picker¶
Discovers and resolves model metadata from multiple sources (HuggingFace Hub, JumpStart, S3, SageMaker Model Registry). Returns model configuration including architecture, parameter count, and framework compatibility.
base-image-picker¶
Recommends base Docker images based on the selected deployment configuration, framework version, and accelerator requirements.
Smart Mode (Amazon Bedrock)¶
The bundled servers support an optional smart mode that queries Amazon Bedrock for context-aware recommendations instead of returning static lists. Set `BEDROCK_SMART=true` in the server's environment to enable it. If the Bedrock call fails, the server falls back to static recommendations.
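Smart mode is enabled per server through its `env` block; for example (the server entry shown is illustrative):

```json
{
  "mcpServers": {
    "instance-sizer": {
      "command": "node",
      "args": ["servers/instance-sizer/index.js"],
      "env": {
        "BEDROCK_SMART": "true",
        "BEDROCK_REGION": "us-west-2"
      }
    }
  }
}
```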
Configuration¶
| Environment Variable | Default | Description |
|---|---|---|
| `BEDROCK_SMART` | `false` | Enable Bedrock-powered recommendations |
| `BEDROCK_MODEL` | `global.anthropic.claude-sonnet-4-20250514-v1:0` | Bedrock model ID |
| `BEDROCK_REGION` | `us-east-1` | AWS region for Bedrock API calls |
The default model uses the global cross-region inference profile, which routes requests to the nearest available region. You can override this with any Bedrock model ID that supports the Messages API.
Prerequisites¶
- AWS credentials configured (via environment, profile, or IAM role)
- Access to the specified Bedrock model enabled in your account
IAM Permissions¶
The calling identity needs `bedrock:InvokeModel` on the inference profile:

```json
{
  "Effect": "Allow",
  "Action": "bedrock:InvokeModel",
  "Resource": "arn:aws:bedrock:*:*:inference-profile/global.anthropic.claude-sonnet-4-20250514-v1:0"
}
```
Writing a Custom MCP Server¶
Any process that speaks the MCP protocol over stdio can serve as a configuration provider. Your server needs to:

- Handle the MCP initialize handshake
- Register a tool (default name: `get_ml_config`)
- Accept `{ parameters, limit, context }` as tool input
- Return `{ values, choices }` as a JSON text response
Tool Input¶
```json
{
  "parameters": ["instanceType", "awsRoleArn", "awsRegion"],
  "limit": 10,
  "context": {
    "framework": "transformers",
    "modelServer": "vllm"
  }
}
```
- `parameters` -- which unbounded parameter names the generator is requesting
- `limit` -- maximum number of choices to return per parameter
- `context` -- current configuration state for informed recommendations
Tool Response¶
```json
{
  "values": {
    "instanceType": "ml.g5.xlarge"
  },
  "choices": {
    "instanceType": [
      "ml.g5.xlarge",
      "ml.g5.2xlarge",
      "ml.g4dn.xlarge"
    ]
  }
}
```
- `values` -- recommended default value per parameter (merged into config)
- `choices` -- list of options per parameter (shown during prompting)
Both fields are optional. A server may return only values, only choices, or both.
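For instance, a server that wants to set a default without constraining the prompt's choice list can return `values` alone:

```json
{
  "values": {
    "awsRegion": "eu-west-1"
  }
}
```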
Example Server¶
```javascript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { z } from 'zod'

const server = new McpServer({ name: 'my-config-server', version: '1.0.0' })

server.tool(
  'get_ml_config',
  'Returns ML configuration values',
  {
    parameters: z.array(z.string()),
    limit: z.number().int().positive().default(10),
    context: z.record(z.string(), z.any()).optional()
  },
  async ({ parameters, limit, context }) => {
    const values = {}
    const choices = {}
    // Your logic here -- query a database, call an API, read a file, etc.
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({ values, choices })
      }]
    }
  }
)

const transport = new StdioServerTransport()
await server.connect(transport)
```
Register it:
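Assuming the server above is saved as `my-config-server.js` (the filename is illustrative):

```bash
ml-container-creator mcp add my-config-server -- node ./my-config-server.js
```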
See `servers/README.md` for the full directory structure and license requirements for bundled servers.