# ML Container Creator - Architecture Guide

## Project Overview
This is a Yeoman generator that creates Docker containers for deploying ML models to AWS SageMaker using the Bring Your Own Container (BYOC) paradigm.
## Quick Architecture Overview
For newcomers, here's how the generator works:
```
User runs: yo ml-container-creator
        ↓
┌─────────────────┐
│    index.js     │ ← Main generator (orchestration)
│   (~50 lines)   │
└─────────────────┘
        ↓
┌─────────────────┐
│  PromptRunner   │ ← Collects user input
│  (prompts.js)   │   • Project name
└─────────────────┘   • Framework choice
        ↓             • Optional modules
┌─────────────────┐
│ TemplateManager │ ← Validates & determines templates
│                 │   • Checks supported options
└─────────────────┘   • Builds ignore patterns
        ↓
┌─────────────────┐
│  Template Copy  │ ← Copies & processes templates
│ (EJS processing)│   • Replaces variables
└─────────────────┘   • Excludes unwanted files
        ↓
Generated Project Ready! 🎉
```
## Detailed Architecture

### Generator Structure (Modular Design)
The generator follows a clean, modular architecture:
- Main generator: `generators/app/index.js` - Orchestrates the generation process (~50 lines)
- Prompt definitions: `generators/app/lib/prompts.js` - All user prompts, organized by phase
- Prompt orchestration: `generators/app/lib/prompt-runner.js` - Manages the user interaction flow
- Template logic: `generators/app/lib/template-manager.js` - Handles conditional template copying
- Templates: `generators/app/templates/` - EJS templates that get copied and processed
### Key Components

#### 1. Main Generator (index.js)
- Purpose: Orchestrates the generation process
- Size: ~50 lines (was 300+ before refactoring)
- Responsibilities:
- Delegates to specialized modules
- Sets destination directory
- Handles errors
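The flow the main generator orchestrates can be sketched outside the Yeoman base class, with both collaborating modules stubbed. The method names (`run`, `validate`, `ignorePatterns`) are assumptions for illustration, not the project's actual API:

```javascript
// Minimal sketch of the orchestration flow. The stub objects stand in
// for prompt-runner.js and template-manager.js.
const promptRunner = {
  // Would normally drive interactive prompts; stubbed with fixed answers.
  async run() {
    return { projectName: 'demo', framework: 'sklearn', modelServer: 'flask' };
  }
};

const templateManager = {
  validate(answers) {
    if (!answers.projectName) throw new Error('projectName is required');
  },
  ignorePatterns(answers) {
    // A transformers project would exclude traditional ML files, and so on.
    return answers.framework === 'transformers' ? ['serve.py'] : [];
  }
};

async function generate() {
  const answers = await promptRunner.run();                // 1. collect input
  templateManager.validate(answers);                       // 2. validate configuration
  const ignore = templateManager.ignorePatterns(answers);  // 3. decide exclusions
  return { answers, ignore };                              // 4. copy templates (omitted)
}
```

Keeping `index.js` at this level of delegation is what lets it stay around 50 lines.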
#### 2. Prompt Runner (lib/prompt-runner.js)
- Purpose: Manages user interaction
- Phases:
- 📋 Project Configuration
- 🔧 Core Configuration
- 📦 Module Selection
- 💪 Infrastructure & Performance
- Output: Combined answers object
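The phase-by-phase accumulation can be illustrated in a few lines. The phase names match the list above; the prompt functions are stubs returning canned answers rather than reading from stdin:

```javascript
// Each phase contributes a slice of the final combined answers object.
const phases = [
  { name: 'Project Configuration', prompt: async () => ({ projectName: 'demo' }) },
  { name: 'Core Configuration', prompt: async () => ({ framework: 'sklearn', modelFormat: 'pkl' }) },
  { name: 'Module Selection', prompt: async () => ({ includeTesting: true }) },
  { name: 'Infrastructure & Performance', prompt: async () => ({ instanceType: 'ml.m5.large' }) }
];

async function runPhases() {
  let answers = {};
  for (const phase of phases) {
    // Later phases can inspect earlier answers if they need them.
    answers = { ...answers, ...(await phase.prompt(answers)) };
  }
  return answers; // the combined answers object
}
```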
#### 3. Template Manager (lib/template-manager.js)
- Purpose: Handles template logic
- Functions:
- Validates user configuration
- Determines which templates to include/exclude
- Centralizes conditional logic
#### 4. Prompts (lib/prompts.js)
- Purpose: Defines all user prompts
- Organization: Grouped by phase
- Benefits: Easy to add new prompts
## Key Concepts
- Yeoman Generator Pattern: Extends the `yeoman-generator` base class
- Phases: The generator runs in phases (prompting → writing → install)
- Template Processing: Uses EJS syntax (`<%= variable %>`) in template files
- Conditional Generation: Files can be excluded via the `ignorePatterns` array
- Modular Design: Separation of concerns for maintainability
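The generator relies on the real EJS engine (via Yeoman's template copying), but the variable-substitution idea behind `<%= variable %>` can be shown with a simplified stand-in:

```javascript
// Simplified <%= variable %> substitution, in the spirit of EJS.
// The actual generator uses the ejs package, not this function.
function render(template, vars) {
  return template.replace(/<%=\s*(\w+)\s*%>/g, (_, name) =>
    name in vars ? String(vars[name]) : ''
  );
}

// A hypothetical template line with two placeholders:
const line = 'LABEL project="<%= projectName %>" framework="<%= framework %>"';
const out = render(line, { projectName: 'demo', framework: 'sklearn' });
// out === 'LABEL project="demo" framework="sklearn"'
```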
## Supported Configurations

### Frameworks
- `sklearn` - scikit-learn models
- `xgboost` - XGBoost models
- `tensorflow` - TensorFlow/Keras models
- `transformers` - Hugging Face transformer models (LLMs)
### Model Servers
- `flask` - Flask-based serving (traditional ML)
- `fastapi` - FastAPI-based serving (traditional ML)
- `vllm` - vLLM serving (transformers only)
- `sglang` - SGLang serving (transformers only)
### Model Formats
- sklearn: `pkl`, `joblib`
- xgboost: `json`, `model`, `ubj`
- tensorflow: `keras`, `h5`, `SavedModel`
- transformers: N/A (loaded from the Hugging Face Hub)
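The format table above could be captured in a simple lookup. The constant and helper names here are illustrative; the generator's actual code may organize this differently:

```javascript
// Supported model formats per framework, mirroring the table above.
const MODEL_FORMATS = {
  sklearn: ['pkl', 'joblib'],
  xgboost: ['json', 'model', 'ubj'],
  tensorflow: ['keras', 'h5', 'SavedModel'],
  transformers: [] // loaded from the Hugging Face Hub, no local format
};

function isValidFormat(framework, format) {
  return (MODEL_FORMATS[framework] || []).includes(format);
}
```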
## Code Conventions

### JavaScript Style
- Use ES6+ features (const, arrow functions, async/await)
- Prefer `const` over `let`; avoid `var`
- Use template literals for string interpolation
- Follow existing ESLint configuration
### Generator Methods
- `prompting()` - Collects user input via interactive prompts
- `writing()` - Copies and processes template files
- `_validateAnswers()` - Private validation method (prefixed with `_`)
### Template Variables

All answers are stored in `this.answers` and are available in templates:
```js
{
  projectName,
  destinationDir,
  framework,
  modelFormat,
  modelServer,
  includeSampleModel,
  includeTesting,
  testTypes,
  deployTarget,
  instanceType,
  awsRegion,
  buildTimestamp
}
```
## Configuration Decision Tree
```
Framework?
├── sklearn/xgboost/tensorflow (Traditional ML)
│   ├── Model Format? (pkl, json, keras, etc.)
│   ├── Server? (Flask or FastAPI)
│   ├── Include sample model? (Yes/No)
│   ├── Include tests? (Yes/No)
│   └── Instance type? (CPU or GPU)
│
└── transformers (LLMs)
    ├── Server? (vLLM or SGLang)
    ├── Include tests? (Yes - endpoint only)
    └── Instance type? (GPU only)
```
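The server branch of this tree can be expressed as a small helper. This is a sketch; the real flow lives in `prompts.js` and `template-manager.js`, and the function names are assumptions:

```javascript
// Which model servers a framework may use, per the decision tree above.
function allowedServers(framework) {
  return framework === 'transformers'
    ? ['vllm', 'sglang']   // LLM serving, GPU only
    : ['flask', 'fastapi']; // traditional ML serving
}

function validateCombo(framework, modelServer) {
  if (!allowedServers(framework).includes(modelServer)) {
    throw new Error(`${modelServer} is not supported with ${framework}`);
  }
}
```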
## Template Structure & Logic

### Generated Project Structure
```
project-name/
├── Dockerfile              ← Always included
├── requirements.txt        ← Excluded for transformers
├── nginx.conf              ← Excluded for transformers
├── code/
│   ├── model_handler.py    ← Excluded for transformers
│   ├── serve.py            ← Excluded for transformers
│   ├── serve               ← Excluded for traditional ML
│   └── flask/              ← Excluded if not Flask
├── deploy/
│   ├── build_and_push.sh   ← Always included
│   ├── deploy.sh           ← Always included
│   └── upload_to_s3.sh     ← Excluded for traditional ML
├── sample_model/           ← Optional module
└── test/                   ← Optional module
```
### File Generation Logic
The generator uses exclusion patterns to determine which templates to copy:
```js
// Example: transformers exclude traditional ML files
if (framework === 'transformers') {
  ignorePatterns.push(
    'model_handler.py',  // custom model loading
    'serve.py',          // Flask/FastAPI entry point
    'nginx.conf',        // reverse proxy config
    'requirements.txt'   // traditional ML dependencies
  );
}
```
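Put together, the exclusion rules can be centralized into a single pure function, which is the role the template manager plays. This is a sketch under the configurations listed earlier; the function name and glob shapes are assumptions:

```javascript
// Derive the full ignore list from the combined answers object.
function buildIgnorePatterns(answers) {
  const ignore = [];
  if (answers.framework === 'transformers') {
    // Traditional ML serving files are not needed for LLM serving.
    ignore.push('model_handler.py', 'serve.py', 'nginx.conf', 'requirements.txt');
  } else {
    // Transformer-specific files are not needed for traditional ML.
    ignore.push('code/serve', 'deploy/upload_to_s3.sh');
  }
  if (answers.modelServer !== 'flask') ignore.push('code/flask/**');
  if (!answers.includeSampleModel) ignore.push('sample_model/**');
  if (!answers.includeTesting) ignore.push('test/**');
  return ignore;
}
```

Because the function takes answers in and returns patterns out, it is trivial to unit-test without touching the filesystem.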
### Template Exclusion Logic
Files are conditionally excluded based on configuration:
- Transformers: Excludes traditional ML serving code (`model_handler.py`, `serve.py`, `nginx.conf`)
- Traditional ML: Excludes transformer serving code (`code/serve`, `upload_to_s3.sh`)
- Non-Flask: Excludes Flask-specific code
- No sample model: Excludes the `sample_model/` directory
- No testing: Excludes the `test/` directory
## Benefits of This Architecture

### ✅ Maintainable
- Small, focused modules
- Clear separation of concerns
- Easy to understand and modify
### ✅ Testable
- Each module can be tested independently
- Clear inputs and outputs
- Mocking is straightforward
### ✅ Extensible
- Adding new prompts is simple
- New template logic is centralized
- Framework additions follow patterns
### ✅ Readable
- Main generator is ~50 lines
- Logic is organized by purpose
- Comments explain the "why"
## Development Workflow

### Making Changes
- Edit generator logic in the appropriate module:
  - Prompts: `generators/app/lib/prompts.js`
  - Template logic: `generators/app/lib/template-manager.js`
  - Orchestration: `generators/app/index.js`
- Edit templates in `generators/app/templates/`
- Run `npm link` to test locally
- Test with `yo ml-container-creator`
- Run `npm test` before committing
### Testing
- Unit tests live in the `test/` directory
- Focus on `TemplateManager` and core logic
- Run the security audit before tests: `npm run pretest`
- Use `npm run test:watch` during development
## Adding New Features

### Adding a New Prompt
- Add the prompt definition to the appropriate phase in `prompts.js`
- Update template logic if it affects file generation
- Add test cases for the new option
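Yeoman prompts use Inquirer's question format, so a new prompt is just another object in the phase's array. The prompt below is hypothetical, added purely to show the shape:

```javascript
// A hypothetical prompt for the Module Selection phase.
// Yeoman's this.prompt() accepts Inquirer-style question objects like this.
const includeMonitoringPrompt = {
  type: 'confirm',
  name: 'includeMonitoring',
  message: 'Include CloudWatch monitoring helpers?',
  default: false,
  // Only ask when it makes sense for the chosen framework.
  when: answers => answers.framework !== 'transformers'
};
```

The `name` field becomes the key in the combined answers object, which is how the new option reaches the templates.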
### Adding a New Template
- Create template file with EJS variables
- Add exclusion logic if conditional
- Test with different configurations
### Adding a New Framework
- Add to the prompt choices in `prompts.js`
- Add template exclusion logic to `template-manager.js`
- Create framework-specific templates
- Add tests for the new configuration
## AWS/SageMaker Context

### SageMaker BYOC Requirements
- The container must expose port 8080
- Must implement `/ping` (health check) and `/invocations` (inference) endpoints
- Model artifacts are typically stored in `/opt/ml/model/`
- Environment variables: `SM_MODEL_DIR`, `SM_NUM_GPUS`, etc.
### Deployment Flow
- Build Docker image locally
- Push to Amazon ECR
- Create SageMaker model from ECR image
- Deploy to SageMaker endpoint
- Test endpoint with sample data
## Common Patterns

### Adding a New Framework
- Add to `SUPPORTED_OPTIONS.frameworks`
- Add model format choices in prompting
- Create template variations if needed
- Update the validation logic
- Test all combinations
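Based on the configurations listed earlier, `SUPPORTED_OPTIONS` presumably looks something like the sketch below (the exact contents of the real constant may differ):

```javascript
// Central registry of what the generator supports; adding a framework
// or server starts by extending these arrays.
const SUPPORTED_OPTIONS = {
  frameworks: ['sklearn', 'xgboost', 'tensorflow', 'transformers'],
  modelServer: ['flask', 'fastapi', 'vllm', 'sglang']
};

function isSupportedFramework(framework) {
  return SUPPORTED_OPTIONS.frameworks.includes(framework);
}
```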
### Adding a New Model Server
- Add to `SUPPORTED_OPTIONS.modelServer`
- Create server-specific templates in `code/`
- Update the ignore patterns
- Add the appropriate dependencies to the requirements.txt template
- Update the Dockerfile if needed
## Dependencies

### Runtime (Generated Projects)
- Python 3.8+
- Docker 20+
- AWS CLI 2+
- Framework-specific: scikit-learn, xgboost, tensorflow, vllm, sglang
### Development (Generator)
- Node.js 24+
- Yeoman
- ESLint
- Mocha
## Security Considerations
- Run `npm audit` before tests (automated in pretest)
- Use `npm-force-resolutions` for dependency overrides
- Keep dependencies updated via `overrides` in package.json
- Never commit AWS credentials
## Troubleshooting

### Generator Issues
- Run `npm link` after changes
- Clear the Yeoman cache: `rm -rf ~/.config/configstore/insight-yo.json`
- Check the Node version: `node --version` (must be 24+)
### Generated Project Issues
- Test locally before deploying: `docker build` and `docker run`
- Check SageMaker logs in CloudWatch
- Verify IAM role has necessary permissions
- Ensure ECR repository exists in target region