Multi-Provider Architecture Design¶
Overview¶
The Open Host Factory Plugin implements a multi-provider architecture that provisions compute resources dynamically across multiple cloud providers and across multiple instances of each provider. This document describes the design, implementation, and usage patterns of the multi-provider system.
Architecture Components¶
CQRS Implementation Status¶
The system implements the CQRS (Command Query Responsibility Segregation) pattern:
Completed CQRS Components:
- CommandBus and QueryBus infrastructure in src/infrastructure/di/buses.py
- Query DTOs: ListTemplatesQuery, GetTemplateQuery, ValidateTemplateQuery
- Command DTOs: CreateTemplateCommand, UpdateTemplateCommand, DeleteTemplateCommand, ValidateTemplateCommand
- Template list endpoint using QueryBus
Implementation Status:
- Template API endpoints (GET, POST, PUT, DELETE): using CQRS handlers
- Machine management endpoints: using CQRS pattern
- Request processing endpoints: using CQRS pattern
- Provider management endpoints: using CQRS pattern
Architecture Features:
- All API endpoints use the CQRS buses for command/query separation
- Consistent async/await patterns across all handlers
- Separation of read and write operations
- Optimized query handling with caching support
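The command/query separation described above can be illustrated with a minimal sketch. The QueryBus and ListTemplatesQuery names come from this document; the register and execute methods, the provider_type field on the query, the handler, and the in-memory template data are assumptions made for illustration and may differ from the code in src/infrastructure/di/buses.py.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional, Type


@dataclass(frozen=True)
class ListTemplatesQuery:
    """Read-side DTO; the provider_type filter is an assumed field."""
    provider_type: Optional[str] = None


class QueryBus:
    """Routes each query type to exactly one async handler (hypothetical API)."""

    def __init__(self) -> None:
        self._handlers: Dict[Type[Any], Callable[[Any], Awaitable[Any]]] = {}

    def register(self, query_type: Type[Any], handler: Callable[[Any], Awaitable[Any]]) -> None:
        self._handlers[query_type] = handler

    async def execute(self, query: Any) -> Any:
        handler = self._handlers.get(type(query))
        if handler is None:
            raise LookupError(f"No handler registered for {type(query).__name__}")
        return await handler(query)


# In-memory stand-in for the template store (illustrative data only).
TEMPLATES = [
    {"template_id": "explicit-aws-east", "provider_type": "aws"},
    {"template_id": "azure-basic", "provider_type": "azure"},
]


async def handle_list_templates(query: ListTemplatesQuery) -> List[dict]:
    """Read-only handler: filters templates and never mutates state."""
    if query.provider_type is None:
        return list(TEMPLATES)
    return [t for t in TEMPLATES if t["provider_type"] == query.provider_type]


bus = QueryBus()
bus.register(ListTemplatesQuery, handle_list_templates)
aws_templates = asyncio.run(bus.execute(ListTemplatesQuery(provider_type="aws")))
```

A CommandBus would follow the same shape, with handlers that change state and typically return nothing or an identifier.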
1. Domain Model Extensions¶
Template Aggregate¶
The Template aggregate has been extended with multi-provider fields:
from typing import Optional


class Template:
    template_id: str
    provider_type: Optional[str]  # NEW: Provider type (aws, azure, gcp)
    provider_name: Optional[str]  # NEW: Provider instance name (aws-us-east-1)
    provider_api: Optional[str]   # NEW: Specific API to use (EC2Fleet, SpotFleet)
    # ... existing fields
Request Aggregate¶
The Request aggregate now tracks provider selection:
class Request:
    provider_type: str
    provider_instance: Optional[str]  # NEW: Selected provider instance
    # ... existing fields
2. Provider Selection Service¶
The ProviderSelectionService selects a provider instance for each template using one of several strategies:
Selection Strategies¶
- Explicit Selection: Template specifies exact provider instance
- Load Balanced Selection: Distribute across provider instances by type
- Capability-Based Selection: Select based on API requirements
- Default Selection: Use configuration defaults
Selection Algorithm¶
def select_provider_for_template(template: Template) -> ProviderSelectionResult:
    if template.provider_name:
        return explicit_selection(template.provider_name)
    elif template.provider_type:
        return load_balanced_selection(template.provider_type)
    elif template.provider_api:
        return capability_based_selection(template.provider_api)
    else:
        return default_selection()
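The precedence in the algorithm above (explicit name, then type, then API, then default) can be exercised with a small self-contained sketch. The field names on ProviderSelectionResult, the in-memory PROVIDERS registry, and the placeholder that takes the first instance of a type instead of load balancing are all assumptions for illustration, not the service's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Template:
    template_id: str
    provider_type: Optional[str] = None
    provider_name: Optional[str] = None
    provider_api: Optional[str] = None


@dataclass
class ProviderSelectionResult:
    # Field names are assumptions; the real result type may carry more detail.
    provider_name: str
    strategy: str


# Hypothetical in-memory registry using the instance names from this document.
PROVIDERS: List[Dict[str, Any]] = [
    {"name": "aws-us-east-1", "type": "aws",
     "capabilities": ["EC2Fleet", "SpotFleet", "RunInstances", "ASG"]},
    {"name": "aws-us-west-2", "type": "aws",
     "capabilities": ["EC2Fleet", "RunInstances"]},
]
DEFAULT_PROVIDER = "aws-us-east-1"


def _first_match(predicate: Callable[[Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Return the first registered provider that satisfies the predicate."""
    for provider in PROVIDERS:
        if predicate(provider):
            return provider
    raise LookupError("No compatible providers")


def select_provider_for_template(template: Template) -> ProviderSelectionResult:
    """Apply the precedence: explicit name, then type, then API, then default."""
    if template.provider_name:
        return ProviderSelectionResult(template.provider_name, "explicit")
    if template.provider_type:
        # Placeholder for load balancing: take the first instance of the type.
        match = _first_match(lambda p: p["type"] == template.provider_type)
        return ProviderSelectionResult(match["name"], "load_balanced")
    if template.provider_api:
        match = _first_match(lambda p: template.provider_api in p["capabilities"])
        return ProviderSelectionResult(match["name"], "capability_based")
    return ProviderSelectionResult(DEFAULT_PROVIDER, "default")
```

Note that a template which sets both provider_type and provider_api is routed by type, because the type check comes first.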
3. Provider Capability Service¶
The ProviderCapabilityService validates template requirements against provider capabilities:
Validation Levels¶
- STRICT: All warnings become errors
- LENIENT: Warnings allowed, only critical errors fail
- BASIC: Only critical validation, minimal checks
Capability Validation¶
def validate_template_requirements(
    template: Template,
    provider_instance: str,
    level: ValidationLevel,
) -> ValidationResult:
    # Validate API support
    # Check instance limits
    # Verify pricing model support
    # Validate fleet type compatibility
    ...  # Implementation elided in this overview
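How the three validation levels treat findings can be sketched as follows. The ValidationLevel names come from this document; the shape of ValidationResult, the apply_level helper, and the reading of BASIC as "drop warnings entirely" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ValidationLevel(Enum):
    STRICT = "strict"
    LENIENT = "lenient"
    BASIC = "basic"


@dataclass
class ValidationResult:
    # Assumed shape: two lists of human-readable findings.
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def apply_level(critical: List[str], warnings: List[str],
                level: ValidationLevel) -> ValidationResult:
    """Fold raw findings into a result according to the validation level."""
    if level is ValidationLevel.STRICT:
        # STRICT: every warning is promoted to an error.
        return ValidationResult(errors=critical + warnings)
    if level is ValidationLevel.LENIENT:
        # LENIENT: warnings are reported but do not fail validation.
        return ValidationResult(errors=list(critical), warnings=list(warnings))
    # BASIC: only critical findings are kept; warnings are dropped
    # (an assumption about what "minimal checks" means).
    return ValidationResult(errors=list(critical))
```

With this reading, the same warning fails validation under STRICT, passes with a recorded warning under LENIENT, and is not reported under BASIC.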
4. Template Repository Architecture¶
The template system uses the repository pattern to comply with Clean Architecture principles:
Template Repository Implementation¶
The TemplateRepositoryImpl class provides a complete implementation of both the AggregateRepository and TemplateRepository interfaces:
from typing import Any, Dict, List, Optional


class TemplateRepositoryImpl(TemplateRepository):
    """Template repository implementation for configuration-based template management."""

    # Abstract methods from AggregateRepository
    def save(self, aggregate: Template) -> None:
        """Save a template aggregate."""

    def find_by_id(self, aggregate_id: str) -> Optional[Template]:
        """Find template by aggregate ID."""

    def delete(self, aggregate_id: str) -> None:
        """Delete template by aggregate ID."""

    # Abstract methods from TemplateRepository
    def find_by_template_id(self, template_id: str) -> Optional[Template]:
        """Find template by template ID (delegates to find_by_id)."""

    def find_by_provider_api(self, provider_api: str) -> List[Template]:
        """Find templates by provider API type."""

    def find_active_templates(self) -> List[Template]:
        """Find all active templates."""

    def search_templates(self, criteria: Dict[str, Any]) -> List[Template]:
        """Search templates by criteria."""
Key Architecture Improvements¶
- Full Interface Compliance: Implements all required abstract methods from both base interfaces
- Method Delegation: Avoids code duplication by delegating find_by_template_id to find_by_id
- Clean Dependency Injection: Uses factory pattern registration instead of decorator-based DI
- Comprehensive Functionality: Provides both required methods and convenience methods
Provider-Specific Template Loading¶
The ProviderTemplateStrategy implements hierarchical template loading:
File Priority Order (Highest to Lowest)¶
- Provider instance files: {provider-instance}_templates.json
- Provider type files: {provider-type}prov_templates.json
- Main templates file: templates.json
- Legacy templates file: awsprov_templates.json
Template Override Behavior¶
Templates with the same template_id in higher priority files override those in lower priority files.
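As a rough sketch of the priority order listed above, the following helper builds the candidate file names for one provider type and instance, highest priority first. The function name and the de-duplication step are assumptions for illustration; the file name patterns are taken from the list above.

```python
from typing import List


def candidate_template_files(provider_type: str, provider_instance: str) -> List[str]:
    """Return template file names from highest to lowest priority."""
    ordered = [
        f"{provider_instance}_templates.json",  # provider instance file
        f"{provider_type}prov_templates.json",  # provider type file
        "templates.json",                       # main templates file
        "awsprov_templates.json",               # legacy templates file
    ]
    # Keep only the first (highest-priority) occurrence of each name; for the
    # aws type, the legacy name coincides with the provider type file.
    return list(dict.fromkeys(ordered))
```

For example, candidate_template_files("aws", "aws-us-east-1") yields the instance file, then awsprov_templates.json, then templates.json.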
Configuration Schema¶
Provider Configuration¶
providers:
  selection_policy: "WEIGHTED_ROUND_ROBIN"
  default_provider_type: "aws"
  default_provider_instance: "aws-us-east-1"
  providers:
    - name: "aws-us-east-1"
      type: "aws"
      enabled: true
      priority: 1
      weight: 10
      capabilities: ["EC2Fleet", "SpotFleet", "RunInstances", "ASG"]
    - name: "aws-us-west-2"
      type: "aws"
      enabled: true
      priority: 2
      weight: 5
      capabilities: ["EC2Fleet", "RunInstances"]
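One way to consume this configuration is to map each entry onto a small typed object and filter by capability. The sketch below assumes the YAML has already been parsed into a plain dict by a YAML library; the ProviderInstance dataclass and both helper functions are hypothetical names, not the plugin's own classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProviderInstance:
    name: str
    type: str
    enabled: bool
    priority: int
    weight: int
    capabilities: List[str]


def load_provider_instances(config: Dict[str, Any]) -> List[ProviderInstance]:
    """Build ProviderInstance objects from the parsed 'providers' mapping."""
    return [ProviderInstance(**entry) for entry in config["providers"]["providers"]]


def providers_supporting(instances: List[ProviderInstance], api: str) -> List[str]:
    """Names of enabled instances whose capabilities include the given API."""
    return [p.name for p in instances if p.enabled and api in p.capabilities]


# The configuration above, as it would look after YAML parsing.
CONFIG = {
    "providers": {
        "selection_policy": "WEIGHTED_ROUND_ROBIN",
        "default_provider_type": "aws",
        "default_provider_instance": "aws-us-east-1",
        "providers": [
            {"name": "aws-us-east-1", "type": "aws", "enabled": True, "priority": 1,
             "weight": 10, "capabilities": ["EC2Fleet", "SpotFleet", "RunInstances", "ASG"]},
            {"name": "aws-us-west-2", "type": "aws", "enabled": True, "priority": 2,
             "weight": 5, "capabilities": ["EC2Fleet", "RunInstances"]},
        ],
    }
}

instances = load_provider_instances(CONFIG)
spot_capable = providers_supporting(instances, "SpotFleet")
```

With the example configuration, only aws-us-east-1 lists SpotFleet, so it is the only candidate for a template that requires that API.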
Template Examples¶
Explicit Provider Selection¶
{
  "template_id": "explicit-aws-east",
  "provider_name": "aws-us-east-1",
  "provider_api": "EC2Fleet",
  "image_id": "ami-12345",
  "subnet_ids": ["subnet-123"],
  "max_instances": 5
}
Provider Type Selection (Load Balanced)¶
{
  "template_id": "load-balanced-aws",
  "provider_type": "aws",
  "provider_api": "SpotFleet",
  "image_id": "ami-67890",
  "subnet_ids": ["subnet-456"],
  "max_instances": 10
}
API-Based Selection¶
{
  "template_id": "api-based-selection",
  "provider_api": "RunInstances",
  "image_id": "ami-abcdef",
  "subnet_ids": ["subnet-789"],
  "max_instances": 3
}
Provider Selection Algorithms¶
Weighted Round Robin¶
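Distributes requests across provider instances in proportion to their configured weights. The function shown here makes a weighted random draw for each request rather than cycling through providers deterministically:
import random
from typing import List


def weighted_round_robin_selection(providers: List[ProviderInstance]) -> str:
    total_weight = sum(p.weight for p in providers)
    if total_weight <= 0:
        raise ValueError("No providers with a positive weight to select from")
    random_value = random.randint(1, total_weight)
    current_weight = 0
    for provider in providers:
        current_weight += provider.weight
        # The first provider whose cumulative weight covers the draw wins.
        if random_value <= current_weight:
            return provider.name
    raise RuntimeError("Weighted selection failed to choose a provider")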
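The function above makes a weighted random draw on each call, so the distribution matches the weights only on average. If a deterministic rotation is wanted, one alternative is the smooth weighted round-robin technique, sketched below. This is a substitute algorithm shown for comparison, not the plugin's implementation; the class name and the reduced ProviderInstance type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProviderInstance:
    name: str
    weight: int


class SmoothWeightedRoundRobin:
    """Deterministic rotation in which each provider is picked in proportion to its weight."""

    def __init__(self, providers: List[ProviderInstance]) -> None:
        if not providers:
            raise ValueError("At least one provider is required")
        self._providers = providers
        self._total = sum(p.weight for p in providers)
        self._current: Dict[str, int] = {p.name: 0 for p in providers}

    def next(self) -> str:
        # Step 1: every provider gains its own weight.
        for p in self._providers:
            self._current[p.name] += p.weight
        # Step 2: the provider with the largest running score is chosen.
        chosen = max(self._providers, key=lambda p: self._current[p.name])
        # Step 3: the chosen provider pays back the total weight.
        self._current[chosen.name] -= self._total
        return chosen.name


selector = SmoothWeightedRoundRobin([
    ProviderInstance("aws-us-east-1", 10),
    ProviderInstance("aws-us-west-2", 5),
])
picks = [selector.next() for _ in range(15)]
```

With weights 10 and 5, every run of 15 picks contains aws-us-east-1 ten times and aws-us-west-2 five times, interleaved rather than clustered.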
Priority-Based Selection¶
Selects the enabled provider with the highest priority, that is, the lowest priority value:
def priority_based_selection(providers: List[ProviderInstance]) -> str:
    enabled_providers = [p for p in providers if p.enabled]
    # min() raises ValueError when no provider is enabled.
    return min(enabled_providers, key=lambda p: p.priority).name
Template File Organization¶
Directory Structure¶
config/
- templates.json # Main templates
- awsprov_templates.json # AWS provider type templates
- azureprov_templates.json # Azure provider type templates
- aws-us-east-1_templates.json # AWS US East instance templates
- aws-us-west-2_templates.json # AWS US West instance templates
- azure-east-us_templates.json # Azure East US instance templates
Template Inheritance¶
Templates inherit and override properties based on file priority:
// templates.json (base)
{
  "template_id": "web-server",
  "image_id": "ami-base",
  "instance_type": "t2.micro",
  "max_instances": 2
}
// awsprov_templates.json (provider override)
{
  "template_id": "web-server",
  "provider_type": "aws",
  "provider_api": "EC2Fleet",
  "instance_type": "t3.small",
  "max_instances": 5
}
// aws-us-east-1_templates.json (instance override)
{
  "template_id": "web-server",
  "provider_name": "aws-us-east-1",
  "image_id": "ami-east-optimized",
  "max_instances": 10
}
Final resolved template:
{
  "template_id": "web-server",
  "provider_name": "aws-us-east-1",
  "provider_type": "aws",
  "provider_api": "EC2Fleet",
  "image_id": "ami-east-optimized",
  "instance_type": "t3.small",
  "max_instances": 10
}
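The resolution shown above behaves like a property-level merge in which later, higher-priority layers overwrite earlier ones. A minimal sketch, assuming a shallow merge is sufficient (the resolve_template helper is hypothetical, and nested properties would need a deep merge instead):

```python
from typing import Any, Dict, List


def resolve_template(layers_low_to_high: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Shallow-merge template layers; later (higher-priority) layers win per key."""
    resolved: Dict[str, Any] = {}
    for layer in layers_low_to_high:
        resolved.update(layer)
    return resolved


base = {"template_id": "web-server", "image_id": "ami-base",
        "instance_type": "t2.micro", "max_instances": 2}
provider_override = {"template_id": "web-server", "provider_type": "aws",
                     "provider_api": "EC2Fleet", "instance_type": "t3.small",
                     "max_instances": 5}
instance_override = {"template_id": "web-server", "provider_name": "aws-us-east-1",
                     "image_id": "ami-east-optimized", "max_instances": 10}

resolved = resolve_template([base, provider_override, instance_override])
```

Here instance_type comes from the provider layer because the instance layer does not set it, while image_id and max_instances come from the instance layer.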
API Integration¶
REST API Endpoints¶
Provider Information¶
GET /api/v1/providers
GET /api/v1/providers/{provider-instance}/capabilities
GET /api/v1/providers/{provider-instance}/templates
Template Management¶
GET /api/v1/templates?provider_type=aws
GET /api/v1/templates?provider_name=aws-us-east-1
POST /api/v1/templates/validate
Request Processing¶
POST /api/v1/requests
{
  "templateId": "web-server",
  "maxNumber": 5,
  "providerPreference": {
    "type": "aws",
    "instance": "aws-us-east-1"
  }
}
CLI Commands¶
Provider Management¶
# List available providers
ohfp providers list
# Show provider capabilities
ohfp providers show aws-us-east-1
# Validate provider configuration
ohfp providers validate
Template Operations¶
# List templates by provider
ohfp templates list --provider-type aws
ohfp templates list --provider-name aws-us-east-1
# Show template source information
ohfp templates show web-server --source-info
# Validate template against provider
ohfp templates validate web-server --provider aws-us-east-1
Error Handling and Validation¶
Provider Selection Errors¶
- No enabled providers: When no providers are available
- Provider not found: When explicit provider doesn't exist
- Provider disabled: When selected provider is disabled
- No compatible providers: When no providers support required API
Template Validation Errors¶
- API not supported: Provider doesn't support required API
- Instance limit exceeded: Request exceeds provider limits
- Pricing model mismatch: Provider doesn't support pricing model
- Fleet type incompatible: Provider doesn't support fleet type
Error Response Format¶
{
  "error": {
    "code": "PROVIDER_NOT_FOUND",
    "message": "Provider instance 'aws-invalid' not found in configuration",
    "details": {
      "requested_provider": "aws-invalid",
      "available_providers": ["aws-us-east-1", "aws-us-west-2"]
    }
  }
}
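A payload in this format could be assembled by a small helper like the one below. The function name is hypothetical, and sorting the available providers is an assumption made so the output is stable; the field names and message text follow the example above.

```python
import json
from typing import Any, Dict, List


def provider_not_found_error(requested: str, available: List[str]) -> Dict[str, Any]:
    """Build the error body for an explicit provider that is not configured."""
    return {
        "error": {
            "code": "PROVIDER_NOT_FOUND",
            "message": f"Provider instance '{requested}' not found in configuration",
            "details": {
                "requested_provider": requested,
                "available_providers": sorted(available),
            },
        }
    }


payload = provider_not_found_error("aws-invalid", ["aws-us-west-2", "aws-us-east-1"])
body = json.dumps(payload, indent=2)  # serialized form for an HTTP response
```

Including the available providers in details lets a client correct the request without a second call to the provider listing endpoint.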
Performance Considerations¶
Template Caching¶
- Templates are cached in memory with file modification time tracking
- Cache is automatically refreshed when template files change
- Manual cache refresh available via API and CLI
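The caching behavior listed above can be sketched with a cache keyed by file path that compares the file's modification time on each access. The TemplateFileCache class, its loads counter, and the refresh method are hypothetical names for illustration; the plugin's actual cache may be structured differently.

```python
import json
import os
from typing import Any, Dict, Tuple


class TemplateFileCache:
    """In-memory cache of parsed template files, invalidated by modification time."""

    def __init__(self) -> None:
        # path -> (mtime in nanoseconds at load time, parsed content)
        self._entries: Dict[str, Tuple[int, Any]] = {}
        self.loads = 0  # number of times a file was actually read from disk

    def get(self, path: str) -> Any:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged since last load
        with open(path, "r", encoding="utf-8") as handle:
            content = json.load(handle)
        self._entries[path] = (mtime, content)
        self.loads += 1
        return content

    def refresh(self) -> None:
        """Manual refresh: drop every entry so the next get() reloads from disk."""
        self._entries.clear()
```

Each get() costs one stat call, which is much cheaper than re-parsing the file; only a changed modification time or an explicit refresh triggers a reload.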
Provider Selection Optimization¶
- Provider configurations are cached at startup
- Selection algorithms use pre-computed weights and priorities
- Capability validation results are cached per provider-API combination
File I/O Optimization¶
- Template files are loaded once and cached
- Only modified files are reloaded
- Batch operations minimize file system calls
Monitoring and Observability¶
Metrics¶
- Provider selection distribution
- Template validation success/failure rates
- File loading performance
- Cache hit/miss ratios
Logging¶
- Provider selection decisions with reasoning
- Template override chains
- Validation failures with details
- Performance timing information
Health Checks¶
- Provider availability status
- Template file accessibility
- Configuration validation status
- Cache consistency checks
Migration Guide¶
From Single Provider¶
- Update configuration to include provider instances
- Migrate templates to provider-specific files (optional)
- Update API calls to include provider preferences (optional)
- Test provider selection behavior
Template Migration¶
# Migrate existing templates to provider-specific files
ohfp templates migrate --from templates.json --to-provider aws-us-east-1
# Validate migrated templates
ohfp templates validate --all --provider aws-us-east-1
Best Practices¶
Configuration¶
- Use meaningful provider instance names
- Set appropriate weights for load balancing
- Enable only necessary providers
- Validate provider configurations regularly
Template Organization¶
- Use provider-specific files for customizations
- Keep common templates in main file
- Document template inheritance chains
- Clean up unused templates regularly
Monitoring¶
- Monitor provider selection distribution
- Track validation failure patterns
- Alert on provider availability issues
- Review performance regularly
Future Enhancements¶
Planned Features¶
- Dynamic provider discovery
- Cross-provider failover
- Improved scheduling algorithms
- Provider cost optimization
- Multi-region template synchronization
Extension Points¶
- Custom selection strategies
- Provider-specific validation rules
- Template transformation pipelines
- External provider registries