Multi-Provider Architecture Design

Overview

The Open Host Factory Plugin implements a multi-provider architecture. It provisions compute resources dynamically across multiple cloud providers, and across multiple configured instances of each provider (for example, aws-us-east-1 and aws-us-west-2). This document describes the design, implementation, and usage patterns of the multi-provider system.

Architecture Components

CQRS Implementation Status

The system implements the CQRS (Command Query Responsibility Segregation) pattern.

Completed CQRS components:
- CommandBus and QueryBus infrastructure in src/infrastructure/di/buses.py
- Query DTOs: ListTemplatesQuery, GetTemplateQuery, ValidateTemplateQuery
- Command DTOs: CreateTemplateCommand, UpdateTemplateCommand, DeleteTemplateCommand, ValidateTemplateCommand
- Template list endpoint using QueryBus

Implementation status:
- Template API endpoints (GET, POST, PUT, DELETE) use CQRS handlers
- Machine management endpoints use the CQRS pattern
- Request processing endpoints use the CQRS pattern
- Provider management endpoints use the CQRS pattern

Architecture features:
- All API endpoints dispatch through the CQRS buses, keeping commands separate from queries
- Consistent async/await patterns across all handlers
- Read and write operations are handled by separate code paths
- Query handling supports caching
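The bus idea can be illustrated with a minimal sketch of type-based dispatch: each query or command class maps to exactly one async handler. Only the names CommandBus, QueryBus, and ListTemplatesQuery come from this document. The `_MessageBus` base class, the `register`/`dispatch` method names, the `provider_type` field on the query, and the in-memory `TEMPLATES` data are hypothetical and are not taken from src/infrastructure/di/buses.py.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional

Handler = Callable[[Any], Awaitable[Any]]


class _MessageBus:
    """Hypothetical base: dispatches each message to the one async handler registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Handler] = {}

    def register(self, message_type: type, handler: Handler) -> None:
        if message_type in self._handlers:
            raise ValueError(f"handler already registered for {message_type.__name__}")
        self._handlers[message_type] = handler

    async def dispatch(self, message: Any) -> Any:
        handler = self._handlers.get(type(message))
        if handler is None:
            raise LookupError(f"no handler registered for {type(message).__name__}")
        return await handler(message)


class CommandBus(_MessageBus):
    """Write side: commands change state."""


class QueryBus(_MessageBus):
    """Read side: queries return data and do not change state."""


@dataclass(frozen=True)
class ListTemplatesQuery:
    provider_type: Optional[str] = None  # hypothetical filter field


# Hypothetical in-memory data standing in for the template repository.
TEMPLATES = [
    {"template_id": "web-server", "provider_type": "aws"},
    {"template_id": "batch-node", "provider_type": "azure"},
]


async def handle_list_templates(query: ListTemplatesQuery) -> list:
    return [
        t for t in TEMPLATES
        if query.provider_type is None or t["provider_type"] == query.provider_type
    ]


query_bus = QueryBus()
query_bus.register(ListTemplatesQuery, handle_list_templates)
aws_templates = asyncio.run(query_bus.dispatch(ListTemplatesQuery(provider_type="aws")))
```

The two buses keep separate registries. A handler registered for a query therefore can never be reached through the command path, and the reverse also holds.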

1. Domain Model Extensions

Template Aggregate

The Template aggregate has been extended with multi-provider fields:

from typing import Optional

class Template:
    template_id: str
    provider_type: Optional[str]      # NEW: Provider type (aws, azure, gcp)
    provider_name: Optional[str]      # NEW: Provider instance name (aws-us-east-1)
    provider_api: Optional[str]       # NEW: Specific API to use (EC2Fleet, SpotFleet)
    # ... existing fields

Request Aggregate

The Request aggregate now tracks provider selection:

from typing import Optional

class Request:
    provider_type: str
    provider_instance: Optional[str]  # NEW: Selected provider instance
    # ... existing fields

2. Provider Selection Service

The ProviderSelectionService chooses a provider instance for each template using one of four strategies:

Selection Strategies

Explicit Selection: the template names an exact provider instance
Load-Balanced Selection: the template names only a provider type, and requests are distributed across instances of that type
Capability-Based Selection: the template names only a provider API, and a provider that supports that API is selected
Default Selection: none of the above is set, so the configured defaults are used

Selection Algorithm

def select_provider_for_template(template: Template) -> ProviderSelectionResult:
    if template.provider_name:
        return explicit_selection(template.provider_name)
    elif template.provider_type:
        return load_balanced_selection(template.provider_type)
    elif template.provider_api:
        return capability_based_selection(template.provider_api)
    else:
        return default_selection()
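The pseudocode above leaves the four helper functions undefined. The following is a self-contained sketch of the same precedence with the helpers filled in under stated assumptions. The load-balanced branch makes a weighted random choice among enabled instances of the type. The capability-based branch picks the compatible instance with the lowest priority number. Both rules are assumptions, as are the `ProviderInstance`, `ProviderSelectionResult` field names and the `ProviderSelectionError` class. The error messages mirror the "Provider Selection Errors" listed later in this document. Matching the template's API against the chosen instance is left to capability validation, as in the pseudocode.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProviderInstance:  # hypothetical shape of one configured provider instance
    name: str
    type: str
    enabled: bool = True
    priority: int = 1
    weight: int = 1
    capabilities: Tuple[str, ...] = ()


@dataclass
class Template:  # only the fields selection looks at
    template_id: str
    provider_type: Optional[str] = None
    provider_name: Optional[str] = None
    provider_api: Optional[str] = None


@dataclass
class ProviderSelectionResult:
    provider_name: str
    strategy: str  # "explicit", "load_balanced", "capability" or "default"


class ProviderSelectionError(Exception):
    pass


def select_provider_for_template(
    template: Template,
    providers: List[ProviderInstance],
    default_instance: str,
    rng: Optional[random.Random] = None,
) -> ProviderSelectionResult:
    rng = rng or random.Random()
    enabled = [p for p in providers if p.enabled]
    if not enabled:
        raise ProviderSelectionError("No enabled providers")

    # 1. Explicit: the template names an exact provider instance.
    if template.provider_name:
        match = next((p for p in providers if p.name == template.provider_name), None)
        if match is None:
            raise ProviderSelectionError(f"Provider not found: {template.provider_name}")
        if not match.enabled:
            raise ProviderSelectionError(f"Provider disabled: {template.provider_name}")
        return ProviderSelectionResult(match.name, "explicit")

    # 2. Load balanced: weighted random choice among enabled instances of the type.
    if template.provider_type:
        candidates = [p for p in enabled if p.type == template.provider_type]
        if not candidates:
            raise ProviderSelectionError(f"No compatible providers for type {template.provider_type}")
        chosen = rng.choices(candidates, weights=[p.weight for p in candidates], k=1)[0]
        return ProviderSelectionResult(chosen.name, "load_balanced")

    # 3. Capability based: instances supporting the API; lowest priority number wins.
    if template.provider_api:
        candidates = [p for p in enabled if template.provider_api in p.capabilities]
        if not candidates:
            raise ProviderSelectionError(f"No compatible providers for API {template.provider_api}")
        chosen = min(candidates, key=lambda p: p.priority)
        return ProviderSelectionResult(chosen.name, "capability")

    # 4. Default: fall back to the configured default instance.
    return ProviderSelectionResult(default_instance, "default")


PROVIDERS = [
    ProviderInstance("aws-us-east-1", "aws", priority=1, weight=10,
                     capabilities=("EC2Fleet", "SpotFleet", "RunInstances", "ASG")),
    ProviderInstance("aws-us-west-2", "aws", priority=2, weight=5,
                     capabilities=("EC2Fleet", "RunInstances")),
]
result = select_provider_for_template(
    Template("api-based-selection", provider_api="RunInstances"), PROVIDERS, "aws-us-east-1"
)
```

Passing an explicit `random.Random` instance makes the load-balanced branch reproducible in tests.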

3. Provider Capability Service

The ProviderCapabilityService validates template requirements against provider capabilities.

Validation Levels

STRICT: all warnings are treated as errors
LENIENT: warnings are reported but allowed; only critical errors cause failure
BASIC: only critical checks run; other checks are skipped

Capability Validation

def validate_template_requirements(
    template: Template,
    provider_instance: str,
    level: ValidationLevel,
) -> ValidationResult:
    # Validate API support
    # Check instance limits
    # Verify pricing model support
    # Validate fleet type compatibility
    ...
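The stub above can be filled in with a sketch showing how the three validation levels might change the outcome of the same checks. It implements only two of the four checks, API support and the instance limit, and treats both as critical errors. The pricing-model and fleet-type checks are omitted. The warning for a template with no `provider_api` is a hypothetical non-critical check, added only so that the difference between LENIENT and STRICT is visible. The sketch takes a `ProviderCapabilities` record (hypothetical, including its `max_instances` limit) instead of an instance name so that it stays self-contained.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ValidationLevel(Enum):
    STRICT = "strict"    # all warnings become errors
    LENIENT = "lenient"  # warnings allowed; only critical errors fail
    BASIC = "basic"      # only critical checks run


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


@dataclass
class Template:
    template_id: str
    provider_api: Optional[str] = None
    max_instances: int = 1


@dataclass
class ProviderCapabilities:  # hypothetical per-instance capability record
    name: str
    apis: Tuple[str, ...]
    max_instances: int


def validate_template_requirements(
    template: Template, provider: ProviderCapabilities, level: ValidationLevel
) -> ValidationResult:
    result = ValidationResult()

    # Critical checks: run at every level and always produce errors.
    if template.provider_api and template.provider_api not in provider.apis:
        result.errors.append(
            f"API not supported: {provider.name} does not support {template.provider_api}"
        )
    if template.max_instances > provider.max_instances:
        result.errors.append(
            f"Instance limit exceeded: {template.max_instances} > {provider.max_instances}"
        )
    if level is ValidationLevel.BASIC:
        return result

    # Non-critical check (hypothetical example of a warning).
    if not template.provider_api:
        result.warnings.append("no provider_api set; the provider default will be used")

    if level is ValidationLevel.STRICT:
        result.errors.extend(result.warnings)  # STRICT: warnings become errors
        result.warnings.clear()
    return result
```

For example, the "load-balanced-aws" template (SpotFleet) fails at every level against a provider whose API list is only EC2Fleet and RunInstances. A template without a provider_api passes under BASIC and LENIENT but fails under STRICT.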

4. Template Repository Architecture

The template system uses the repository pattern, in keeping with Clean Architecture principles:

Template Repository Implementation

The TemplateRepositoryImpl class implements every abstract method of both the AggregateRepository and TemplateRepository interfaces:

class TemplateRepositoryImpl(TemplateRepository):
    """Template repository implementation for configuration-based template management."""

    # Abstract methods from AggregateRepository
    def save(self, aggregate: Template) -> None:
        """Save a template aggregate."""

    def find_by_id(self, aggregate_id: str) -> Optional[Template]:
        """Find template by aggregate ID."""

    def delete(self, aggregate_id: str) -> None:
        """Delete template by aggregate ID."""

    # Abstract methods from TemplateRepository
    def find_by_template_id(self, template_id: str) -> Optional[Template]:
        """Find template by template ID (delegates to find_by_id)."""

    def find_by_provider_api(self, provider_api: str) -> List[Template]:
        """Find templates by provider API type."""

    def find_active_templates(self) -> List[Template]:
        """Find all active templates."""

    def search_templates(self, criteria: Dict[str, Any]) -> List[Template]:
        """Search templates by criteria."""

Key Architecture Improvements

Full Interface Compliance: Implements all required abstract methods from both base interfaces
Method Delegation: Avoids code duplication by delegating find_by_template_id to find_by_id
Clean Dependency Injection: Uses factory pattern registration instead of decorator-based DI
Comprehensive Functionality: Provides both required methods and convenience methods
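The interface shape can be shown with an in-memory sketch. Declaring TemplateRepository as an abstract subclass of AggregateRepository is an assumption, as is the `is_active` flag behind `find_active_templates`. The real TemplateRepositoryImpl is configuration-based, whereas `InMemoryTemplateRepository` (a hypothetical name) stores templates in a dict. Python's `abc` machinery refuses to instantiate the class if any abstract method is missing, so constructing it demonstrates full interface compliance. `find_by_template_id` delegates to `find_by_id`, as described above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Template:
    template_id: str
    provider_api: Optional[str] = None
    is_active: bool = True  # hypothetical flag backing find_active_templates


class AggregateRepository(ABC):
    @abstractmethod
    def save(self, aggregate: Template) -> None: ...

    @abstractmethod
    def find_by_id(self, aggregate_id: str) -> Optional[Template]: ...

    @abstractmethod
    def delete(self, aggregate_id: str) -> None: ...


class TemplateRepository(AggregateRepository):
    @abstractmethod
    def find_by_template_id(self, template_id: str) -> Optional[Template]: ...

    @abstractmethod
    def find_by_provider_api(self, provider_api: str) -> List[Template]: ...

    @abstractmethod
    def find_active_templates(self) -> List[Template]: ...

    @abstractmethod
    def search_templates(self, criteria: Dict[str, Any]) -> List[Template]: ...


class InMemoryTemplateRepository(TemplateRepository):
    def __init__(self) -> None:
        self._items: Dict[str, Template] = {}

    def save(self, aggregate: Template) -> None:
        self._items[aggregate.template_id] = aggregate

    def find_by_id(self, aggregate_id: str) -> Optional[Template]:
        return self._items.get(aggregate_id)

    def delete(self, aggregate_id: str) -> None:
        self._items.pop(aggregate_id, None)

    def find_by_template_id(self, template_id: str) -> Optional[Template]:
        return self.find_by_id(template_id)  # delegation: no duplicated lookup logic

    def find_by_provider_api(self, provider_api: str) -> List[Template]:
        return [t for t in self._items.values() if t.provider_api == provider_api]

    def find_active_templates(self) -> List[Template]:
        return [t for t in self._items.values() if t.is_active]

    def search_templates(self, criteria: Dict[str, Any]) -> List[Template]:
        # A template matches when every criterion equals the attribute of the same name.
        return [
            t for t in self._items.values()
            if all(getattr(t, key, None) == value for key, value in criteria.items())
        ]
```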

Provider-Specific Template Loading

The ProviderTemplateStrategy implements hierarchical template loading:

File Priority Order (Highest to Lowest)

Provider instance files: {provider-instance}_templates.json
Provider type files: {provider-type}prov_templates.json
Main templates file: templates.json
Legacy templates file: awsprov_templates.json

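Template Override Behavior

Templates with the same template_id in higher-priority files override those in lower-priority files. The override is applied field by field: a higher-priority definition replaces only the fields it specifies, and all other fields are inherited from lower-priority files (see Template Inheritance below).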

Configuration Schema

Provider Configuration

providers:
  selection_policy: "WEIGHTED_ROUND_ROBIN"
  default_provider_type: "aws"
  default_provider_instance: "aws-us-east-1"
  providers:
    - name: "aws-us-east-1"
      type: "aws"
      enabled: true
      priority: 1
      weight: 10
      capabilities: ["EC2Fleet", "SpotFleet", "RunInstances", "ASG"]
    - name: "aws-us-west-2"
      type: "aws"
      enabled: true
      priority: 2
      weight: 5
      capabilities: ["EC2Fleet", "RunInstances"]
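The configuration above supports several consistency checks, which the following sketch illustrates. It shows the kind of check that "ohfp providers validate" or the "Configuration validation status" health check might perform, but the specific checks are assumptions and are not taken from the plugin. `CONFIG` is the mapping under the top-level `providers:` key, written as the dict that PyYAML's `yaml.safe_load` would produce. Treating a missing `enabled` key as enabled is also an assumption.

```python
from typing import Any, Dict, List

# The mapping under the top-level `providers:` key of the YAML above.
CONFIG: Dict[str, Any] = {
    "selection_policy": "WEIGHTED_ROUND_ROBIN",
    "default_provider_type": "aws",
    "default_provider_instance": "aws-us-east-1",
    "providers": [
        {"name": "aws-us-east-1", "type": "aws", "enabled": True, "priority": 1,
         "weight": 10, "capabilities": ["EC2Fleet", "SpotFleet", "RunInstances", "ASG"]},
        {"name": "aws-us-west-2", "type": "aws", "enabled": True, "priority": 2,
         "weight": 5, "capabilities": ["EC2Fleet", "RunInstances"]},
    ],
}


def validate_provider_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every (hypothetical) check passed."""
    problems: List[str] = []
    instances = cfg.get("providers", [])

    names = [p["name"] for p in instances]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate provider names: {duplicates}")

    # Assumption: an instance without an explicit `enabled` key counts as enabled.
    enabled = [p for p in instances if p.get("enabled", True)]
    if not enabled:
        problems.append("no enabled providers")

    default_name = cfg.get("default_provider_instance")
    default = next((p for p in instances if p["name"] == default_name), None)
    if default is None:
        problems.append(f"default_provider_instance '{default_name}' is not defined")
    else:
        if not default.get("enabled", True):
            problems.append(f"default_provider_instance '{default_name}' is disabled")
        if default.get("type") != cfg.get("default_provider_type"):
            problems.append("default_provider_instance type does not match default_provider_type")

    if cfg.get("selection_policy") == "WEIGHTED_ROUND_ROBIN":
        for p in enabled:
            if p.get("weight", 0) <= 0:
                problems.append(f"{p['name']}: weight must be positive for weighted selection")
    return problems
```

The configuration as shown passes all of these checks. Pointing `default_provider_instance` at an undefined name produces exactly one problem.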

Template Examples

Explicit Provider Selection

{
  "template_id": "explicit-aws-east",
  "provider_name": "aws-us-east-1",
  "provider_api": "EC2Fleet",
  "image_id": "ami-12345",
  "subnet_ids": ["subnet-123"],
  "max_instances": 5
}

Provider Type Selection (Load Balanced)

{
  "template_id": "load-balanced-aws",
  "provider_type": "aws",
  "provider_api": "SpotFleet",
  "image_id": "ami-67890",
  "subnet_ids": ["subnet-456"],
  "max_instances": 10
}

API-Based Selection

{
  "template_id": "api-based-selection",
  "provider_api": "RunInstances",
  "image_id": "ami-abcdef",
  "subnet_ids": ["subnet-789"],
  "max_instances": 3
}

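Provider Selection Algorithms

Weighted Round Robin

Distributes requests across provider instances in proportion to their configured weights. Despite the policy name, the function below makes a weighted random choice on each call instead of cycling through providers in a fixed order, so the observed distribution matches the weights only on average (with weights 10 and 5, aws-us-east-1 receives about two thirds of requests):

import random
from typing import List


def weighted_round_robin_selection(providers: List["ProviderInstance"]) -> str:
    # Weighted random pick: each provider owns a weight-sized slice of [1, total_weight].
    total_weight = sum(p.weight for p in providers)
    if total_weight <= 0:
        raise ValueError("No provider with a positive weight")
    random_value = random.randint(1, total_weight)

    current_weight = 0
    for provider in providers:
        current_weight += provider.weight
        if random_value <= current_weight:
            return provider.name
    raise RuntimeError("unreachable when all weights are non-negative")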
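A deterministic rotation that still honors the weights is possible with smooth weighted round robin, the variant popularized by nginx. This is a swapped-in technique and not necessarily what the plugin does. Each call adds every provider's weight to a running score, picks the highest score, and subtracts the total weight from the winner. Over each cycle of sum(weights) calls, each provider is chosen exactly `weight` times, and the picks are interleaved instead of bunched. The class name `SmoothWeightedRoundRobin` is hypothetical.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Deterministic weighted round robin (smooth variant)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be non-empty and positive")
        self._weights = dict(weights)
        self._total = sum(weights.values())
        self._scores = {name: 0 for name in weights}

    def next(self) -> str:
        for name, weight in self._weights.items():
            self._scores[name] += weight
        # max() returns the first maximal key, so ties go to the earlier provider.
        best = max(self._scores, key=self._scores.__getitem__)
        self._scores[best] -= self._total
        return best


rr = SmoothWeightedRoundRobin({"aws-us-east-1": 10, "aws-us-west-2": 5})
picks = [rr.next() for _ in range(15)]
```

With the weights from the example configuration, the sequence repeats east, west, east. Unlike the random version, a short burst of requests never lands entirely on the lower-weight instance.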

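Priority-Based Selection

Selects the enabled provider with the highest priority, meaning the lowest priority number (in the configuration above, aws-us-east-1 with priority 1 is preferred over aws-us-west-2 with priority 2):

def priority_based_selection(providers: list["ProviderInstance"]) -> str:
    # Lower number = higher priority, so min() picks the preferred provider.
    enabled_providers = [p for p in providers if p.enabled]
    if not enabled_providers:
        raise ValueError("No enabled providers")
    return min(enabled_providers, key=lambda p: p.priority).name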

Template File Organization

Directory Structure

config/
- templates.json                    # Main templates
- awsprov_templates.json           # AWS provider type templates
- azureprov_templates.json         # Azure provider type templates
- aws-us-east-1_templates.json    # AWS US East instance templates
- aws-us-west-2_templates.json    # AWS US West instance templates
- azure-east-us_templates.json    # Azure East US instance templates

Template Inheritance

Templates inherit and override properties based on file priority:

// templates.json (base)
{
  "template_id": "web-server",
  "image_id": "ami-base",
  "instance_type": "t2.micro",
  "max_instances": 2
}

// awsprov_templates.json (provider override)
{
  "template_id": "web-server",
  "provider_type": "aws",
  "provider_api": "EC2Fleet",
  "instance_type": "t3.small",
  "max_instances": 5
}

// aws-us-east-1_templates.json (instance override)
{
  "template_id": "web-server",
  "provider_name": "aws-us-east-1",
  "image_id": "ami-east-optimized",
  "max_instances": 10
}

Final resolved template:

{
  "template_id": "web-server",
  "provider_name": "aws-us-east-1",
  "provider_type": "aws",
  "provider_api": "EC2Fleet",
  "image_id": "ami-east-optimized",
  "instance_type": "t3.small",
  "max_instances": 10
}
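The resolved template can be reproduced with a shallow, field-by-field merge that applies files from lowest to highest priority. The function name `resolve_templates` is hypothetical. Whether the real loader merges nested objects deeply is not stated here. This sketch replaces any nested value wholesale.

```python
from typing import Any, Dict, Iterable, List


def resolve_templates(layers: Iterable[List[Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Merge template layers given in order from LOWEST to HIGHEST priority."""
    resolved: Dict[str, Dict[str, Any]] = {}
    for layer in layers:
        for template in layer:
            merged = resolved.setdefault(template["template_id"], {})
            merged.update(template)  # later (higher-priority) fields win
    return resolved


base = [{"template_id": "web-server", "image_id": "ami-base",
         "instance_type": "t2.micro", "max_instances": 2}]
provider_type = [{"template_id": "web-server", "provider_type": "aws",
                  "provider_api": "EC2Fleet", "instance_type": "t3.small", "max_instances": 5}]
instance = [{"template_id": "web-server", "provider_name": "aws-us-east-1",
             "image_id": "ami-east-optimized", "max_instances": 10}]

web_server = resolve_templates([base, provider_type, instance])["web-server"]
```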

API Integration

REST API Endpoints

Provider Information

GET /api/v1/providers
GET /api/v1/providers/{provider-instance}/capabilities
GET /api/v1/providers/{provider-instance}/templates

Template Management

GET /api/v1/templates?provider_type=aws
GET /api/v1/templates?provider_name=aws-us-east-1
POST /api/v1/templates/validate

Request Processing

POST /api/v1/requests
{
  "templateId": "web-server",
  "maxNumber": 5,
  "providerPreference": {
    "type": "aws",
    "instance": "aws-us-east-1"
  }
}
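A minimal client sketch using only the Python standard library is shown below. `BASE_URL` is a placeholder, and no authentication headers are assumed. The response body format is not documented here, so `submit_request` simply returns the decoded JSON. The payload builder treats `providerPreference` as optional, matching the Migration Guide's note that provider preferences in API calls are optional. Whether a preference with only `type` or only `instance` is accepted is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # placeholder: substitute your deployment's address


def build_request_payload(
    template_id: str,
    max_number: int,
    provider_type: Optional[str] = None,
    provider_instance: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the request body; providerPreference is included only when given."""
    payload: Dict[str, Any] = {"templateId": template_id, "maxNumber": max_number}
    preference: Dict[str, str] = {}
    if provider_type:
        preference["type"] = provider_type
    if provider_instance:
        preference["instance"] = provider_instance
    if preference:
        payload["providerPreference"] = preference
    return payload


def submit_request(payload: Dict[str, Any], base_url: str = BASE_URL) -> Any:
    """POST the payload to /api/v1/requests and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/requests",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_request_payload("web-server", 5, provider_type="aws", provider_instance="aws-us-east-1")
```

Note that the request body uses camelCase keys (templateId, maxNumber), while template files use snake_case (template_id, max_instances).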

CLI Commands

Provider Management

# List available providers
ohfp providers list

# Show provider capabilities
ohfp providers show aws-us-east-1

# Validate provider configuration
ohfp providers validate

Template Operations

# List templates by provider
ohfp templates list --provider-type aws
ohfp templates list --provider-name aws-us-east-1

# Show template source information
ohfp templates show web-server --source-info

# Validate template against provider
ohfp templates validate web-server --provider aws-us-east-1

Error Handling and Validation

Provider Selection Errors

No enabled providers: When no providers are available
Provider not found: When explicit provider doesn't exist
Provider disabled: When selected provider is disabled
No compatible providers: When no providers support required API

Template Validation Errors

API not supported: Provider doesn't support required API
Instance limit exceeded: Request exceeds provider limits
Pricing model mismatch: Provider doesn't support pricing model
Fleet type incompatible: Provider doesn't support fleet type

Error Response Format

{
  "error": {
    "code": "PROVIDER_NOT_FOUND",
    "message": "Provider instance 'aws-invalid' not found in configuration",
    "details": {
      "requested_provider": "aws-invalid",
      "available_providers": ["aws-us-east-1", "aws-us-west-2"]
    }
  }
}
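The response above can be produced from an exception that carries the same three fields, as in the following sketch. `ProviderError` and `provider_not_found` are hypothetical names. Only the PROVIDER_NOT_FOUND code and its message and details layout come from this document, so no other error codes are assumed.

```python
from typing import Any, Dict, List


class ProviderError(Exception):
    """Carries the fields of the documented error response format."""

    def __init__(self, code: str, message: str, details: Dict[str, Any]) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details

    def to_response(self) -> Dict[str, Any]:
        return {"error": {"code": self.code, "message": self.message, "details": self.details}}


def provider_not_found(requested: str, available: List[str]) -> ProviderError:
    return ProviderError(
        code="PROVIDER_NOT_FOUND",
        message=f"Provider instance '{requested}' not found in configuration",
        details={"requested_provider": requested, "available_providers": available},
    )


body = provider_not_found("aws-invalid", ["aws-us-east-1", "aws-us-west-2"]).to_response()
```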

Performance Considerations

Template Caching

Templates are cached in memory with file modification time tracking
Cache is automatically refreshed when template files change
Manual cache refresh available via API and CLI
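Modification-time tracking can be sketched as follows. This version compares the file's mtime on each access and rereads the file only when the mtime has changed. It is a lazy check, not a file watcher, and the class name `TemplateFileCache` is hypothetical. `invalidate` stands in for the manual refresh. One caveat applies: if a file is rewritten within the filesystem's timestamp granularity, its mtime may not change, so a manual refresh remains useful.

```python
import json
import os
from typing import Any, Dict, Optional, Tuple


class TemplateFileCache:
    """Caches parsed template files; a file is reread only when its mtime changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, Any]] = {}
        self.loads = 0  # number of actual disk reads, useful for hit/miss metrics

    def get(self, path: str) -> Any:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: file unchanged since last load
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        self._entries[path] = (mtime, data)
        self.loads += 1
        return data

    def invalidate(self, path: Optional[str] = None) -> None:
        """Manual refresh: drop one file, or the whole cache when path is None."""
        if path is None:
            self._entries.clear()
        else:
            self._entries.pop(path, None)
```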

Provider Selection Optimization

Provider configurations are cached at startup
Selection algorithms use pre-computed weights and priorities
Capability validation results are cached per provider-API combination

File I/O Optimization

Template files are loaded once and cached
Only modified files are reloaded
Batch operations minimize file system calls

Monitoring and Observability

Metrics

Provider selection distribution
Template validation success/failure rates
File loading performance
Cache hit/miss ratios

Logging

Provider selection decisions with reasoning
Template override chains
Validation failures with details
Performance timing information

Health Checks

Provider availability status
Template file accessibility
Configuration validation status
Cache consistency checks

Migration Guide

From Single Provider

Update configuration to include provider instances
Migrate templates to provider-specific files (optional)
Update API calls to include provider preferences (optional)
Test provider selection behavior

Template Migration

# Migrate existing templates to provider-specific files
ohfp templates migrate --from templates.json --to-provider aws-us-east-1

# Validate migrated templates
ohfp templates validate --all --provider aws-us-east-1

Best Practices

Configuration

Use meaningful provider instance names
Set appropriate weights for load balancing
Enable only necessary providers
Regular validation of provider configurations

Template Organization

Use provider-specific files for customizations
Keep common templates in main file
Document template inheritance chains
Regular cleanup of unused templates

Monitoring

Monitor provider selection distribution
Track validation failure patterns
Alert on provider availability issues
Regular performance reviews

Future Enhancements

Planned Features

Dynamic provider discovery
Cross-provider failover
Improved scheduling algorithms
Provider cost optimization
Multi-region template synchronization

Extension Points

Custom selection strategies
Provider-specific validation rules
Template transformation pipelines
External provider registries