Storage Strategies¶
The Open Host Factory Plugin implements a storage architecture with multiple interchangeable strategies, a generic repository pattern, and built-in migration capabilities. This guide covers each storage option and its advanced features.
Overview¶
The storage system is built on the following patterns:
- Three storage strategies: JSON, SQL, and DynamoDB
- Generic repository pattern: the same interface across all storage types
- Built-in migration system: move data between storage strategies
- Transaction management: ACID compliance with rollback, where the backend supports it
- Concurrency control: optimistic locking and conflict resolution
- Modular storage components: pluggable file, serialization, lock, and transaction managers
Storage Strategy Architecture¶
graph TB
    subgraph "Application Layer"
        Repo[Generic Repository]
        Migration[Migration Tool]
    end

    subgraph "Storage Strategies"
        JSON[JSON Strategy]
        SQL[SQL Strategy]
        DynamoDB[DynamoDB Strategy]
    end

    subgraph "Storage Components"
        FileManager[File Manager]
        Serializer[JSON Serializer]
        LockManager[Lock Manager]
        TransactionManager[Transaction Manager]
    end

    subgraph "Physical Storage"
        JSONFiles[JSON Files]
        SQLiteDB[SQLite Database]
        PostgreSQL[PostgreSQL]
        DynamoDBTables[DynamoDB Tables]
    end

    Repo --> JSON
    Repo --> SQL
    Repo --> DynamoDB
    Migration --> JSON
    Migration --> SQL
    Migration --> DynamoDB
    JSON --> FileManager
    JSON --> Serializer
    JSON --> LockManager
    JSON --> TransactionManager
    JSON --> JSONFiles
    SQL --> SQLiteDB
    SQL --> PostgreSQL
    DynamoDB --> DynamoDBTables
JSON Storage Strategy¶
Single File Storage¶
The default JSON storage uses a single file containing all entities:
Configuration¶
{
  "storage": {
    "strategy": "json",
    "json_strategy": {
      "storage_type": "single_file",
      "base_path": "data",
      "filenames": {
        "single_file": "request_database.json"
      }
    }
  }
}
File Structure¶
{
  "templates": {
    "template-1": {
      "template_id": "template-1",
      "name": "Standard Template",
      "provider_api": "RunInstances",
      "vm_type": "t3.medium",
      "max_number": 10
    }
  },
  "requests": {
    "req-123": {
      "request_id": "req-123",
      "template_id": "template-1",
      "machine_count": 3,
      "status": "COMPLETED"
    }
  },
  "machines": {
    "machine-456": {
      "machine_id": "machine-456",
      "request_id": "req-123",
      "status": "RUNNING",
      "private_ip": "10.0.1.100"
    }
  }
}
Multi-File Storage (Split Files)¶
For better organization and performance, entities can be split across multiple files:
Configuration¶
{
  "storage": {
    "strategy": "json",
    "json_strategy": {
      "storage_type": "split_files",
      "base_path": "data",
      "filenames": {
        "split_files": {
          "templates": "templates.json",
          "requests": "requests.json",
          "machines": "machines.json"
        }
      }
    }
  }
}
File Structure¶
data/
├── templates.json   # Template definitions
├── requests.json    # Request records
└── machines.json    # Machine instances
Individual File Content¶
templates.json:
{
  "template-1": {
    "template_id": "template-1",
    "name": "Standard Template",
    "provider_api": "RunInstances",
    "vm_type": "t3.medium"
  }
}
requests.json:
{
  "req-123": {
    "request_id": "req-123",
    "template_id": "template-1",
    "machine_count": 3,
    "status": "COMPLETED"
  }
}
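A split-files strategy simply routes each collection to its own file. A minimal sketch of that routing, assuming the configuration shown above (the helper below is illustrative, not the plugin's actual internals):
import json
import os

# Mirrors the "split_files" mapping from the configuration above
SPLIT_FILES = {
    "templates": "templates.json",
    "requests": "requests.json",
    "machines": "machines.json",
}

def load_collection(base_path: str, collection: str) -> dict:
    """Load one collection from its dedicated file."""
    file_path = os.path.join(base_path, SPLIT_FILES[collection])
    if not os.path.exists(file_path):
        return {}
    with open(file_path, "r") as f:
        return json.load(f)

templates = load_collection("data", "templates")
print(templates["template-1"]["name"])  # "Standard Template"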
JSON Storage Features¶
Atomic Operations¶
# JSON storage provides atomic file operations
# Each write operation is atomic at the file level
from src.infrastructure.persistence.json.strategy import JSONStorageStrategy
json_storage = JSONStorageStrategy(config)
# Atomic operations are handled internally
json_storage.save_entity("requests", request_data)
# File is written atomically with backup creation
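Atomic file writes of this kind are commonly implemented with a write-to-temp-then-rename pattern. A minimal sketch of that pattern, assuming POSIX rename semantics (illustrative, not the plugin's actual implementation):
import json
import os
import shutil
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial write."""
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")  # keep a backup of the previous state
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.remove(tmp_path)
        raise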
Concurrent Access Control¶
# Optimistic locking prevents conflicts
try:
    storage.update_entity("requests", "req-123", updated_data, expected_version=5)
except ConcurrencyConflictError:
    # Handle version conflict: re-read the entity, merge changes, and retry
    current_data = storage.get_entity("requests", "req-123")
Transaction Management¶
# Transaction support with rollback
with storage.begin_transaction() as tx:
    tx.save_entity("requests", request_data)
    tx.save_entity("machines", machine_data)
# Automatically commits on success or rolls back on exception
SQL Storage Strategy¶
SQLite Configuration¶
For local development and small deployments:
{
  "storage": {
    "strategy": "sql",
    "sql_strategy": {
      "type": "sqlite",
      "name": "request_database.db",
      "host": "",
      "port": 0,
      "pool_size": 5,
      "max_overflow": 10,
      "timeout": 30
    }
  }
}
PostgreSQL Configuration¶
For production deployments:
{
  "storage": {
    "strategy": "sql",
    "sql_strategy": {
      "type": "postgresql",
      "host": "localhost",
      "port": 5432,
      "name": "hostfactory",
      "username": "hostfactory_user",
      "password": "${DB_PASSWORD}",
      "pool_size": 20,
      "max_overflow": 30,
      "timeout": 60
    }
  }
}
Database Schema¶
The SQL strategy automatically creates these tables:
Templates Table¶
CREATE TABLE templates (
    template_id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    provider_api VARCHAR(50) NOT NULL,
    vm_type VARCHAR(50) NOT NULL,
    max_number INTEGER NOT NULL,
    image_id VARCHAR(255),
    subnet_ids TEXT,
    security_group_ids TEXT,
    attributes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Requests Table¶
CREATE TABLE requests (
    request_id VARCHAR(255) PRIMARY KEY,
    template_id VARCHAR(255) NOT NULL,
    machine_count INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL,
    request_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    tags TEXT,
    metadata TEXT,
    FOREIGN KEY (template_id) REFERENCES templates(template_id)
);
Machines Table¶
CREATE TABLE machines (
    machine_id VARCHAR(255) PRIMARY KEY,
    request_id VARCHAR(255) NOT NULL,
    template_id VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,
    instance_type VARCHAR(50),
    provider_instance_id VARCHAR(255),
    private_ip VARCHAR(45),
    public_ip VARCHAR(45),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    started_at TIMESTAMP,
    terminated_at TIMESTAMP,
    FOREIGN KEY (request_id) REFERENCES requests(request_id),
    FOREIGN KEY (template_id) REFERENCES templates(template_id)
);
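For readers who want the schema in code form, here is a sketch of how these tables could be declared with SQLAlchemy Core; this assumes the SQL strategy is built on SQLAlchemy, and the plugin's actual models may differ:
from sqlalchemy import (
    TIMESTAMP, Column, ForeignKey, Integer, MetaData,
    String, Table, Text, create_engine, func,
)

metadata = MetaData()

templates = Table(
    "templates", metadata,
    Column("template_id", String(255), primary_key=True),
    Column("name", String(255), nullable=False),
    # ... remaining template columns as in the DDL above
)

requests = Table(
    "requests", metadata,
    Column("request_id", String(255), primary_key=True),
    Column("template_id", String(255),
           ForeignKey("templates.template_id"), nullable=False),
    Column("machine_count", Integer, nullable=False),
    Column("status", String(50), nullable=False),
    Column("request_type", String(50), nullable=False),
    Column("created_at", TIMESTAMP, server_default=func.current_timestamp()),
    Column("tags", Text),
    Column("metadata", Text),
)

# Creates both tables if they do not already exist
metadata.create_all(create_engine("sqlite:///request_database.db"))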
SQL Storage Features¶
Connection Pooling¶
# Automatic connection pool management
from src.infrastructure.persistence.sql.strategy import SQLStorageStrategy

sql_strategy = SQLStorageStrategy(config)
# Pool size: 20 connections
# Max overflow: 30 additional connections
# Idle connections are recycled automatically
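For illustration, the pool settings above map directly onto SQLAlchemy's create_engine arguments, assuming SQLAlchemy manages the pool (the URL and values below are examples, not the plugin's defaults):
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://hostfactory_user:secret@localhost:5432/hostfactory",
    pool_size=20,       # persistent connections kept in the pool
    max_overflow=30,    # extra connections allowed under load
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=3600,  # recycle connections older than an hour
)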
Query Optimization¶
# Optimized queries with indexes
requests = storage.query_entities(
    "requests",
    filters={"status": "PENDING"},
    order_by="created_at",
    limit=50
)
Transaction Support¶
# Transaction support varies by storage strategy
# JSON: File-level atomic operations
# SQL: Full ACID transaction support
# DynamoDB: Limited transaction support
# SQL storage transaction example
from src.infrastructure.persistence.sql.strategy import SQLStorageStrategy
sql_storage = SQLStorageStrategy(config)
# SQL transactions are handled internally by the storage strategy
# when performing batch operations
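As a sketch of what "handled internally" can look like, again assuming SQLAlchemy underneath (not the plugin's actual code), engine.begin() opens a transaction that commits on success and rolls back on any exception:
from sqlalchemy import text

def save_request_batch(engine, batch):
    """Insert a batch of requests inside a single transaction."""
    with engine.begin() as conn:  # commit on success, rollback on exception
        for req in batch:
            conn.execute(
                text(
                    "INSERT INTO requests "
                    "(request_id, template_id, machine_count, status, request_type) "
                    "VALUES (:request_id, :template_id, :machine_count, :status, :request_type)"
                ),
                req,
            )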
DynamoDB Storage Strategy¶
Configuration¶
For AWS cloud deployments. Note that with billing_mode set to PAY_PER_REQUEST, DynamoDB ignores the provisioned read_capacity and write_capacity values; they apply only in PROVISIONED mode:
{
  "storage": {
    "strategy": "dynamodb",
    "dynamodb_strategy": {
      "region": "us-east-1",
      "profile": "production",
      "table_prefix": "hostfactory",
      "read_capacity": 5,
      "write_capacity": 5,
      "billing_mode": "PAY_PER_REQUEST"
    }
  }
}
Table Structure¶
The DynamoDB strategy creates the following tables:
Templates Table¶
- Table Name: hostfactory_templates
- Partition Key: template_id (String)
- Attributes: All template fields as DynamoDB attributes
Requests Table¶
- Table Name: hostfactory_requests
- Partition Key: request_id (String)
- Global Secondary Index: template_id-created_at-index
- Attributes: All request fields with appropriate typing
Machines Table¶
- Table Name: hostfactory_machines
- Partition Key: machine_id (String)
- Global Secondary Index: request_id-created_at-index
- Attributes: All machine fields with DynamoDB types
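For reference, here is a boto3 sketch of creating the requests table with its GSI by hand; the strategy provisions tables itself, so this is purely illustrative:
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="hostfactory_requests",
    AttributeDefinitions=[
        {"AttributeName": "request_id", "AttributeType": "S"},
        {"AttributeName": "template_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "request_id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "template_id-created_at-index",
            "KeySchema": [
                {"AttributeName": "template_id", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",  # no provisioned throughput needed
)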
DynamoDB Features¶
Auto-Scaling¶
{
  "dynamodb_strategy": {
    "auto_scaling": {
      "enabled": true,
      "min_read_capacity": 1,
      "max_read_capacity": 100,
      "min_write_capacity": 1,
      "max_write_capacity": 100,
      "target_utilization": 70
    }
  }
}
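DynamoDB auto scaling is driven by the separate Application Auto Scaling service. A sketch of how the read-capacity settings above translate into boto3 calls (illustrative; the plugin may wire this up differently):
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Register the table's read capacity as a scalable target (min/max from config)
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/hostfactory_requests",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=1,
    MaxCapacity=100,
)

# Track 70% consumed read capacity, matching target_utilization
autoscaling.put_scaling_policy(
    PolicyName="hostfactory-requests-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/hostfactory_requests",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)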
Global Secondary Indexes¶
# Query by template_id
requests = storage.query_by_gsi(
    "requests",
    index_name="template_id-created_at-index",
    key_condition="template_id = :template_id",
    expression_values={":template_id": "template-1"}
)
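Under the hood, a GSI query like this corresponds to a boto3 query along the following lines (a hand-written equivalent, not the strategy's actual code):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("hostfactory_requests")

response = table.query(
    IndexName="template_id-created_at-index",
    KeyConditionExpression=Key("template_id").eq("template-1"),
    ScanIndexForward=False,  # newest requests first
)
requests = response["Items"]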
Conditional Updates¶
# Prevent concurrent modifications
storage.update_entity(
    "requests",
    request_id,
    updated_data,
    condition_expression="attribute_exists(request_id) AND version = :expected_version",
    expression_values={":expected_version": current_version}
)
Generic Repository Pattern¶
Repository Interface¶
All storage strategies implement the same interface:
from typing import Any, Dict, List, Optional

class RepositoryInterface:
    """Generic repository interface for all storage strategies."""

    def save_entity(self, collection: str, entity_data: Dict[str, Any]) -> None:
        """Save an entity to storage."""
        pass

    def get_entity(self, collection: str, entity_id: str) -> Optional[Dict[str, Any]]:
        """Retrieve an entity by ID."""
        pass

    def update_entity(self, collection: str, entity_id: str,
                      entity_data: Dict[str, Any]) -> None:
        """Update an existing entity."""
        pass

    def delete_entity(self, collection: str, entity_id: str) -> None:
        """Delete an entity."""
        pass

    def query_entities(self, collection: str,
                       filters: Optional[Dict[str, Any]] = None,
                       order_by: Optional[str] = None,
                       limit: Optional[int] = None) -> List[Dict[str, Any]]:
        """Query entities with filters."""
        pass
Usage Example¶
The same code works with any storage strategy:
# Works with JSON, SQL, or DynamoDB
import uuid

def create_request(repository: RepositoryInterface, request_data: Dict[str, Any]) -> str:
    """Create a request using any storage strategy."""
    # Validate that the template exists
    template = repository.get_entity("templates", request_data["template_id"])
    if not template:
        raise TemplateNotFoundError("Template not found")

    # Save the request
    repository.save_entity("requests", request_data)

    # Create machine records
    for _ in range(request_data["machine_count"]):
        machine_data = {
            "machine_id": f"machine-{uuid.uuid4()}",
            "request_id": request_data["request_id"],
            "template_id": request_data["template_id"],
            "status": "PENDING"
        }
        repository.save_entity("machines", machine_data)

    return request_data["request_id"]
Repository Migration System¶
Migration Command¶
Built-in migration between storage strategies:
# Migrate from JSON to SQL
python run.py migrateRepository \
--source-type json \
--target-type sql \
--batch-size 100
# Migrate from SQL to DynamoDB
python run.py migrateRepository \
--source-type sql \
--target-type dynamodb \
--batch-size 50
Migration Process¶
1. Pre-Migration Validation¶
# Validate source and target configurations
migration_tool.validate_source_connection()
migration_tool.validate_target_connection()
migration_tool.validate_schema_compatibility()
2. Backup Creation¶
# Automatic backup before migration
backup_path = migration_tool.create_backup(
    source_type="json",
    backup_location="backups/migration_backup_20250630.json"
)
3. Batch Migration¶
# Migrate in configurable batches
# (illustrative batch shape: each batch maps a collection name to a list of entities)
batch_size = 100
for batch in migration_tool.get_migration_batches(batch_size=batch_size):
    # Migrate templates
    for template in batch.get("templates", []):
        target_repo.save_entity("templates", template)
    # Migrate requests
    for request in batch.get("requests", []):
        target_repo.save_entity("requests", request)
    # Migrate machines
    for machine in batch.get("machines", []):
        target_repo.save_entity("machines", machine)
4. Data Validation¶
# Verify migration completeness
migration_stats = migration_tool.validate_migration()
print(f"Migrated {migration_stats['templates']} templates")
print(f"Migrated {migration_stats['requests']} requests")
print(f"Migrated {migration_stats['machines']} machines")
Migration Configuration¶
{
  "migration": {
    "batch_size": 100,
    "create_backup": true,
    "backup_location": "backups/",
    "validate_after_migration": true,
    "cleanup_source": false,
    "parallel_workers": 4
  }
}
Transaction Management¶
Transaction Support by Strategy¶
| Feature | JSON | SQLite | PostgreSQL | DynamoDB |
|---|---|---|---|---|
| ACID Transactions | Yes | Yes | Yes | Limited |
| Rollback Support | Yes | Yes | Yes | Limited |
| Isolation Levels | Basic | Full | Full | Eventual |
| Concurrent Access | File Lock | DB Lock | DB Lock | Optimistic |
Transaction Usage¶
JSON Strategy¶
with json_storage.begin_transaction() as tx:
    tx.save_entity("requests", request_data)
    tx.save_entity("machines", machine_data)
# The file is locked for the duration of the transaction
# and rolled back automatically on exception
SQL Strategy¶
with sql_storage.begin_transaction() as tx:
    tx.save_entity("requests", request_data)
    tx.update_entity("templates", template_id, updated_template)
# Full ACID compliance with automatic commit/rollback
DynamoDB Strategy¶
# DynamoDB transactions are size-limited
# (25 items historically; newer API versions allow up to 100 per transaction)
with dynamodb_storage.begin_transaction() as tx:
    tx.save_entity("requests", request_data)
    tx.conditional_update("machines", machine_id, machine_data)
# Conditional operations maintain consistency
Concurrency Control¶
Optimistic Locking¶
All strategies support optimistic locking:
# Version-based concurrency control
try:
    storage.update_entity(
        "requests",
        request_id,
        updated_data,
        expected_version=current_version
    )
except ConcurrencyConflictError:
    # Handle version conflict
    latest_data = storage.get_entity("requests", request_id)
    merged_data = merge_changes(latest_data, updated_data)
    storage.update_entity("requests", request_id, merged_data)
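Since the merge-and-retry step can itself hit another conflict, a bounded retry loop is the usual pattern. A sketch against the repository interface above; the version field name and the merge_changes helper are assumptions:
MAX_RETRIES = 3

def update_with_retry(storage, request_id, updated_data):
    """Retry an optimistic update a bounded number of times."""
    for _ in range(MAX_RETRIES):
        latest = storage.get_entity("requests", request_id)
        merged = merge_changes(latest, updated_data)  # application-specific merge
        try:
            storage.update_entity(
                "requests", request_id, merged,
                expected_version=latest["version"],  # assumes a version field
            )
            return
        except ConcurrencyConflictError:
            continue  # another writer won the race; re-read and try again
    raise ConcurrencyConflictError(f"Update failed after {MAX_RETRIES} attempts")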
Lock Management¶
File-Based Locking (JSON)¶
# Reader-writer locks for JSON files
with storage.acquire_read_lock():
    data = storage.read_file()

with storage.acquire_write_lock():
    storage.write_file(data)
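On POSIX systems, locks like these are often built on fcntl.flock. A minimal sketch using a sidecar lock file (illustrative; the plugin's lock manager may work differently):
import fcntl
from contextlib import contextmanager

@contextmanager
def file_lock(path, exclusive=True):
    """Advisory reader-writer lock on a sidecar .lock file (POSIX only)."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, mode)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# Writers take the exclusive lock; concurrent readers share the shared lock
with file_lock("data/request_database.json", exclusive=True):
    ...  # write the file here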
Database Locking (SQL)¶
# Row-level locking with SELECT ... FOR UPDATE
# (supported by PostgreSQL; SQLite uses database-level locks instead)
storage.execute_with_lock(
    "SELECT * FROM requests WHERE id = ? FOR UPDATE",
    [request_id]
)
Performance Considerations¶
Storage Strategy Performance¶
| Aspect | JSON | SQLite | PostgreSQL | DynamoDB |
|---|---|---|---|---|
| Read Performance | Good | Good | Excellent | Excellent |
| Write Performance | Good | Good | Excellent | Excellent |
| Concurrent Access | Limited | Good | Excellent | Excellent |
| Scalability | Limited | Medium | High | Very High |
| Setup Complexity | Low | Low | Medium | Medium |
Optimization Strategies¶
JSON Storage¶
{
  "json_strategy": {
    "enable_compression": true,
    "cache_size": 1000,
    "write_buffer_size": 64000,
    "sync_writes": false
  }
}
SQL Storage¶
{
  "sql_strategy": {
    "pool_size": 20,
    "max_overflow": 30,
    "pool_timeout": 30,
    "pool_recycle": 3600,
    "enable_query_cache": true
  }
}
DynamoDB Storage¶
{
  "dynamodb_strategy": {
    "read_capacity": 10,
    "write_capacity": 10,
    "auto_scaling": true,
    "enable_dax": true,
    "consistent_reads": false
  }
}
Best Practices¶
Choosing a Storage Strategy¶
Use JSON When:¶
- Small to medium datasets (< 10,000 records)
- Simple deployment requirements
- Development and testing environments
- Single-instance deployments
Use SQLite When:¶
- Medium datasets (< 100,000 records)
- Need ACID transactions
- Single-instance production deployments
- Local development with SQL features
Use PostgreSQL When:¶
- Large datasets (> 100,000 records)
- High concurrency requirements
- Multi-instance deployments
- Complex queries and reporting
Use DynamoDB When:¶
- Very large datasets (millions of records)
- High availability requirements
- AWS cloud deployments
- Global distribution needs
Migration Strategy¶
- Start with JSON for development
- Migrate to SQLite for single-instance production
- Migrate to PostgreSQL for high-concurrency production
- Migrate to DynamoDB for cloud-scale deployments
Backup Strategy¶
# JSON backup
cp data/request_database.json backups/backup_$(date +%Y%m%d).json
# SQL backup
pg_dump hostfactory > backups/backup_$(date +%Y%m%d).sql
# DynamoDB backup
aws dynamodb create-backup --table-name hostfactory_requests --backup-name backup_$(date +%Y%m%d)
Troubleshooting¶
Common Issues¶
JSON Storage Issues¶
# Check file permissions
ls -la data/request_database.json
# Validate JSON format
python -m json.tool data/request_database.json
# Check disk space
df -h data/
SQL Storage Issues¶
# Test the connection (shown schematically; a config object must be constructed first)
python -c "
from src.infrastructure.persistence.sql.strategy import SQLStorageStrategy
storage = SQLStorageStrategy(config)
storage.test_connection()
"
# Check database size
du -sh request_database.db
DynamoDB Storage Issues¶
# Check table status
aws dynamodb describe-table --table-name hostfactory_requests
# Check billing mode and provisioned capacity
aws dynamodb describe-table --table-name hostfactory_requests | jq '.Table.BillingModeSummary, .Table.ProvisionedThroughput'
Next Steps¶
- Configuration Reference: Complete configuration options
- Migration Procedures: Detailed migration procedures
- Performance Tuning: Optimization strategies
- Backup & Recovery: Backup and recovery procedures