Advisory Agent (hey)¶
The ml-container-creator hey command starts a conversational AI advisor that helps you understand your project configuration, troubleshoot deployment issues, plan workflows, and get optimization recommendations. It's powered by Amazon Bedrock (Claude Sonnet) and is advisory only — it reads your project state and provides expert guidance but does not execute commands, modify files, or provision infrastructure.
Prerequisites¶
- Python 3.10+ with agent dependencies installed:
- AWS credentials configured with Bedrock access (the agent calls ConverseStream)
- Bootstrap profile set up via ml-container-creator bootstrap
Tip
Run ml-container-creator hey --offline to verify your environment without incurring any Bedrock costs.
Quick Start¶
# Inside a project directory — project-aware conversation
cd my-vllm-project/
ml-container-creator hey
# Outside a project directory — getting-started guidance
cd ~/
ml-container-creator hey
# Static health check only (no Bedrock, no cost)
ml-container-creator hey --offline
Modes¶
Project Mode (inside a project directory)¶
When do/config exists in the current directory, the agent enters project mode:
- Loads your full project context: do/config, do/ic/*.conf, do/training/config.yaml, Dockerfile, adapters, and bootstrap profile
- Runs an environment health check at startup (prerequisites, AWS credentials, MCP server availability)
- Answers questions specific to your configuration
- Makes recommendations referencing your exact file paths and variable names
Getting-Started Mode (outside a project directory)¶
When no do/config is found, the agent enters getting-started mode:
- Checks if you've bootstrapped (~/.ml-container-creator/config.json)
- Validates prerequisites (Node.js, Python, AWS credentials, pip packages)
- Walks you through first-time setup and project creation
- Explains what the tool does and how the lifecycle works
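The choice between the two modes comes down to a file-existence check. Below is a minimal Python sketch of that check, assuming the test is simply whether do/config exists under the project directory. The function name detect_mode and the returned mode strings are illustrative and are not taken from the tool's source.

```python
from pathlib import Path


def detect_mode(project_dir: str = ".") -> str:
    """Return "project" if do/config exists under project_dir, else "getting-started".

    Hypothetical helper: the real agent may apply additional checks.
    """
    config_path = Path(project_dir) / "do" / "config"
    return "project" if config_path.exists() else "getting-started"


if __name__ == "__main__":
    # Report the mode that would apply to the current working directory.
    print(detect_mode("."))
```

The project_dir argument mirrors the --project-dir flag, which overrides the default of the current working directory.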
What It Can Help With¶
- Instance selection: "What instance should I use for Llama-3.1-8B with LoRA?" → queries instance catalog, calculates VRAM, recommends with math
- Config explanation: "What does IC_ENV_VLLM_MAX_MODEL_LEN do?" → explains the variable, shows your current value, recommends what it should be
- Troubleshooting: "I'm getting OOM on deploy" → identifies pattern (CUDA graph overhead, LoRA pre-allocation), suggests specific fix
- Workflow planning: "Plan a deployment workflow for my model" → generates step-by-step plan, offers to save as TODO.md
- Feature status: "Is SGLang LoRA supported?" → queries capability matrix, gives honest "no" with alternatives
- Project summary: "What's my current config?" → reads and summarizes your entire project state
- Optimization: "How can I improve throughput?" → recommends FP8 quantization, batch settings, context length tuning
Note
The agent calls MCP servers (instance-sizer, model-picker, base-image-picker, etc.) to get factual data before answering. It does not guess instance specs or model parameters.
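To give a feel for the "recommends with math" part of instance selection, here is a back-of-the-envelope sketch of a weight-memory estimate. It is an illustration only: the agent gets its figures from the MCP servers, and the function name and the rounded 8-billion parameter count are assumptions made for this example.

```python
def estimate_weight_vram_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: Llama-3.1-8B is treated as roughly 8 billion parameters.
fp16 = estimate_weight_vram_gib(8e9, 2)  # 16-bit weights, 2 bytes per parameter
fp8 = estimate_weight_vram_gib(8e9, 1)   # 8-bit weights, 1 byte per parameter

print(f"FP16 weights: ~{fp16:.1f} GiB")
print(f"FP8 weights:  ~{fp8:.1f} GiB")
```

This counts weights only. KV cache, LoRA pre-allocation, and CUDA graph overhead all add to the total, which is why a model whose weights appear to fit can still run out of memory at deploy time.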
Flags¶
| Flag | Description |
|---|---|
| --offline / -o | Print environment health check and project summary, then exit. No Bedrock calls, no cost. |
| --project-dir <dir> | Override project directory (default: current working directory). |
Commands During Conversation¶
| Command | Effect |
|---|---|
| reload | Re-read project files (use after editing config mid-session) |
| exit / quit / bye / q | End session gracefully (prints cost summary) |
| Ctrl+C | Interrupt current response or end session |
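A conversation loop has to tell these commands apart from ordinary questions before sending anything to the model. The sketch below shows one way to do that. The function name classify_input, the returned labels, and the case-insensitive matching are assumptions, not confirmed behavior of the tool.

```python
EXIT_COMMANDS = {"exit", "quit", "bye", "q"}


def classify_input(line: str) -> str:
    """Map a line of user input to "exit", "reload", or "message".

    Hypothetical helper: matching ignores surrounding whitespace and case,
    which is an assumption rather than documented behavior.
    """
    command = line.strip().lower()
    if command in EXIT_COMMANDS:
        return "exit"
    if command == "reload":
        return "reload"
    # Anything else is treated as a question for the advisor.
    return "message"
```

Only an exact match counts as a command, so a question such as "how do I reload my config?" is still passed through as a message.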
Customizing Agent Knowledge¶
Create a .mlcc-agent-context.md file in your project root to inject team-specific knowledge:
<!-- .mlcc-agent-context.md -->
# Team Conventions
- We always use FP8 quantization for cost optimization
- Our max_model_len policy is 4096 (higher requires VP approval)
- Preferred instance family: g5 (approved in our AWS account)
- Adapters are named: tuned-<technique>-<dataset>-<date>
- All deployments go through the staging endpoint first
The agent reads this file at startup and incorporates it into all recommendations. Use it for:
- Naming conventions and deployment patterns
- Instance/region preferences and constraints
- Known issues specific to your environment
- Cost policies and approval requirements
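One plausible way for an agent to pick up this file is to read it at startup and append it to its base instructions. The following is a sketch under that assumption. The helper names load_team_context and build_system_prompt and the "Team context" heading are hypothetical and do not describe the tool's actual implementation.

```python
from pathlib import Path

CONTEXT_FILENAME = ".mlcc-agent-context.md"


def load_team_context(project_dir: str = ".") -> str:
    """Return the contents of the team context file, or "" if it is absent."""
    context_path = Path(project_dir) / CONTEXT_FILENAME
    if not context_path.is_file():
        return ""
    return context_path.read_text(encoding="utf-8")


def build_system_prompt(base_prompt: str, project_dir: str = ".") -> str:
    """Append team context to a base prompt when a context file exists.

    Hypothetical: shows the general pattern, not the agent's real prompt layout.
    """
    team_context = load_team_context(project_dir).strip()
    if not team_context:
        return base_prompt
    return f"{base_prompt}\n\n## Team context\n{team_context}"
```

Because the file is read at startup, edits made mid-session would need the reload command to take effect.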
Cost¶
Each session uses Amazon Bedrock Claude Sonnet. Token usage and estimated cost are displayed when you exit:
Session Summary
────────────────────────────────────────
Turns: 8
Input tokens: ~12,400
Output tokens: ~3,200
Estimated cost: ~$0.0852
A typical 10-turn session costs ~$0.05–$0.10. Use --offline for zero-cost quick reference.
Warning
Cost tracking is approximate. Actual billing comes from your AWS account's Bedrock usage metrics.
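The estimate in the session summary can be reproduced from the token counts. The sketch below assumes prices of $3.00 per million input tokens and $15.00 per million output tokens. These rates are an assumption, chosen because they match the sample figure above, so check current Bedrock pricing for your model and region.

```python
# Assumed prices in USD per 1M tokens; not taken from the tool's source.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate session cost in USD from total input and output token counts."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


# Token counts from the sample session summary: 12,400 input and 3,200 output.
print(f"Estimated cost: ~${estimate_cost(12_400, 3_200):.4f}")
```

At these rates the 3,200 output tokens cost more than the 12,400 input tokens ($0.048 versus about $0.037), so long answers affect the bill more than long questions.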
Limitations¶
- Advisory only: does not execute do/ scripts, deploy endpoints, or modify config files. Execution capabilities are planned for v1.2.
- Session state is not persisted: each hey invocation starts fresh. Use TODO.md output to capture plans.
- Knowledge is version-bound: the agent knows about features in the installed version. Custom forks or unreleased changes aren't reflected unless you add them via .mlcc-agent-context.md.
- Requires internet: Bedrock access is needed for interactive mode. Use --offline for air-gapped environments.