Skip to content

Advisory Agent (hey)

The ml-container-creator hey command starts a conversational AI advisor that helps you understand your project configuration, troubleshoot deployment issues, plan workflows, and get optimization recommendations. It's powered by Amazon Bedrock (Claude Sonnet) and is advisory only — it reads your project state and provides expert guidance but does not execute commands, modify files, or provision infrastructure.

Prerequisites

  • Python 3.10+ with agent dependencies installed:
    pip install -r src/agent/requirements-agent.txt
    
  • AWS credentials configured with Bedrock access (the agent calls ConverseStream)
  • Bootstrap profile set up via ml-container-creator bootstrap

Tip

Run ml-container-creator hey --offline to verify your environment without incurring any Bedrock costs.

Quick Start

# Inside a project directory — project-aware conversation
cd my-vllm-project/
ml-container-creator hey

# Outside a project directory — getting-started guidance
cd ~/
ml-container-creator hey

# Static health check only (no Bedrock, no cost)
ml-container-creator hey --offline

Modes

Project Mode (inside a project directory)

When do/config exists in the current directory, the agent enters project mode:

  • Loads your full project context: do/config, do/ic/*.conf, do/training/config.yaml, Dockerfile, adapters, and bootstrap profile
  • Runs an environment health check at startup (prerequisites, AWS credentials, MCP server availability)
  • Answers questions specific to your configuration
  • Makes recommendations referencing your exact file paths and variable names

Getting-Started Mode (outside a project directory)

When no do/config is found, the agent enters getting-started mode:

  • Checks if you've bootstrapped (~/.ml-container-creator/config.json)
  • Validates prerequisites (Node.js, Python, AWS credentials, pip packages)
  • Walks you through first-time setup and project creation
  • Explains what the tool does and how the lifecycle works

What It Can Help With

  • Instance selection: "What instance should I use for Llama-3.1-8B with LoRA?" → queries instance catalog, calculates VRAM, recommends with math
  • Config explanation: "What does IC_ENV_VLLM_MAX_MODEL_LEN do?" → explains the variable, shows your current value, recommends what it should be
  • Troubleshooting: "I'm getting OOM on deploy" → identifies pattern (CUDA graph overhead, LoRA pre-allocation), suggests specific fix
  • Workflow planning: "Plan a deployment workflow for my model" → generates step-by-step plan, offers to save as TODO.md
  • Feature status: "Is SGLang LoRA supported?" → queries capability matrix, gives honest "no" with alternatives
  • Project summary: "What's my current config?" → reads and summarizes your entire project state
  • Optimization: "How can I improve throughput?" → recommends FP8 quantization, batch settings, context length tuning

Note

The agent calls MCP servers (instance-sizer, model-picker, base-image-picker, etc.) to get factual data before answering. It does not guess instance specs or model parameters.

Flags

Flag Description
--offline / -o Print environment health check and project summary, then exit. No Bedrock calls, no cost.
--project-dir <dir> Override project directory (default: current working directory).

Commands During Conversation

Command Effect
reload Re-read project files (use after editing config mid-session)
exit / quit / bye / q End session gracefully (prints cost summary)
Ctrl+C Interrupt current response or end session

Customizing Agent Knowledge

Create a .mlcc-agent-context.md file in your project root to inject team-specific knowledge:

<!-- .mlcc-agent-context.md -->
# Team Conventions

- We always use FP8 quantization for cost optimization
- Our max_model_len policy is 4096 (higher requires VP approval)
- Preferred instance family: g5 (approved in our AWS account)
- Adapters are named: tuned-<technique>-<dataset>-<date>
- All deployments go through the staging endpoint first

The agent reads this file at startup and incorporates it into all recommendations. Use it for:

  • Naming conventions and deployment patterns
  • Instance/region preferences and constraints
  • Known issues specific to your environment
  • Cost policies and approval requirements

Cost

Each session uses Amazon Bedrock Claude Sonnet. Token usage and estimated cost are displayed when you exit:

Session Summary
────────────────────────────────────────
  Turns: 8
  Input tokens:  ~12,400
  Output tokens: ~3,200
  Estimated cost: ~$0.0852

A typical 10-turn session costs ~\(0.05–\)0.10. Use --offline for zero-cost quick reference.

Warning

Cost tracking is approximate. Actual billing comes from your AWS account's Bedrock usage metrics.

Limitations

  • Advisory only — does not execute do/ scripts, deploy endpoints, or modify config files. Execution capabilities are planned for v1.2.
  • Session state is not persisted — each hey invocation starts fresh. Use TODO.md output to capture plans.
  • Knowledge is version-bound — the agent knows about features in the installed version. Custom forks or unreleased changes aren't reflected unless you add them via .mlcc-agent-context.md.
  • Requires internet — Bedrock access needed for interactive mode. Use --offline for air-gapped environments.