Registries and Catalogs
The generator's registry system is built on top of the MCP server catalogs. RegistryLoader (in src/lib/) reads catalog JSON files at startup and produces three internal registries.
Registry Overview
| Registry | Source Catalog | Internal Shape |
|---|---|---|
| Framework Registry | model-servers.json | `{ backendName: { version: { baseImage, accelerator, envVars, ... } } }` |
| Model Registry | models.json | `{ modelIdOrPattern: { family, chatTemplate, frameworkCompatibility, architecture, tasks, modelType, ... } }` |
| Instance Accelerator Mapping | instances.json | `{ instanceType: { family, accelerator: { type, hardware, versions }, memory, vcpus } }` |
Consumers
These registries are consumed by several modules:
| Module | What It Uses |
|---|---|
| `ConfigurationManager` | Matches user selections to deployment-config/model configs, merges env vars with five-layer precedence |
| `PromptRunner` | Populates instance type choices, backend version choices |
| `ValidationEngine` | Checks accelerator compatibility between backend requirements and instance capabilities |
| `SchemaValidationEngine` | Validates generated API payloads against AWS service models |
| `CrossCuttingChecker` | Validates consistency across payloads using instance catalog data |
| `CommentGenerator` | Generates Dockerfile comments from registry metadata |
Source of Truth
All catalogs live in the centralized shared directory servers/lib/catalogs/. Individual server directories no longer maintain their own catalogs/ subdirectories.
| Catalog File | Location | Purpose |
|---|---|---|
| `model-servers.json` | `servers/lib/catalogs/` | Base images, backend versions, AMI versions |
| `models.json` | `servers/lib/catalogs/` | Unified model catalog (merged from transformers + diffusors + model-sizes) |
| `instances.json` | `servers/lib/catalogs/` | Instance types, GPU counts, CUDA versions |
| `jumpstart-public.json` | `servers/lib/catalogs/` | JumpStart public model metadata |
| `python-slim.json` | `servers/lib/catalogs/` | Python slim base images |
| `triton.json` | `servers/lib/catalogs/` | Triton base images |
| `triton-backends.json` | `servers/lib/catalogs/` | Triton backend configurations |
| `regions.json` | `servers/lib/catalogs/` | AWS region availability |
Each catalog has a corresponding JSON schema in servers/lib/schemas/ that defines the required fields and value constraints.
Unified Model Catalog
The models.json catalog merges data from three former sources into a single file keyed by model identifier:
| Former Source | Fields Contributed |
|---|---|
| `model-sizes.json` | parameterCount, defaultDtype, maxPositionEmbeddings, recommendedQuantizations |
| `popular-transformers.json` | family, chatTemplate, gated, tags, frameworkCompatibility |
| `popular-diffusors.json` | family, pipeline, gated, tags, frameworkCompatibility |
Every entry has three mandatory fields:
- `architecture`: HuggingFace `architectures[0]` value (e.g., `LlamaForCausalLM`)
- `tasks`: inference tasks the model performs (e.g., `["text-generation"]`)
- `modelType`: one of `transformer`, `diffusor`, or `predictor`
The modelType field drives architecture-level routing: which deployment config to suggest, which base image to use, and whether GPU instances are needed.
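As an illustration of the three mandatory fields, a catalog entry could be checked with a small function like the one below. This is a minimal sketch, not the generator's own validator: `validateModelEntry` is a hypothetical name, and the requirement that `tasks` be non-empty is an assumption.

```javascript
// Allowed modelType values, as listed above.
const MODEL_TYPES = ["transformer", "diffusor", "predictor"];

// Hypothetical helper: returns a list of problems found in one models.json entry.
function validateModelEntry(modelId, entry) {
  const errors = [];
  if (typeof entry.architecture !== "string" || entry.architecture === "") {
    errors.push(`${modelId}: architecture must be a non-empty string`);
  }
  // Assumption: an empty tasks array is treated as invalid.
  if (!Array.isArray(entry.tasks) || entry.tasks.length === 0) {
    errors.push(`${modelId}: tasks must be a non-empty array`);
  }
  if (!MODEL_TYPES.includes(entry.modelType)) {
    errors.push(`${modelId}: modelType must be one of ${MODEL_TYPES.join(", ")}`);
  }
  return errors;
}
```

An entry that passes returns an empty array, so callers can treat any non-empty result as a failed check.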
Schema-Driven Validation
The schema-driven validation system validates generated AWS API payloads against actual AWS service model files (service-2.json). It catches enum violations, type mismatches, missing required fields, and cross-cutting consistency issues before deployment.
The validation system uses the instance catalog (instances.json) for cross-cutting checks like GPU count consistency, CUDA compatibility, and model type / instance alignment. See the Schema Validation section in Configuration for user-facing documentation.
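To make the idea concrete, the sketch below checks a payload for missing required fields, type mismatches, and enum violations against a miniature stand-in for the `shapes` section of a service model file. The shape names and values are invented for illustration, and `validatePayload` is a hypothetical function, not the `SchemaValidationEngine` API.

```javascript
// Miniature stand-in for the "shapes" section of a service-2.json file.
// All names and values here are illustrative only.
const shapes = {
  CreateThingInput: {
    type: "structure",
    required: ["Name", "Size"],
    members: { Name: { shape: "ThingName" }, Size: { shape: "ThingSize" } },
  },
  ThingName: { type: "string" },
  ThingSize: { type: "string", enum: ["small", "large"] },
};

// Hypothetical validator: walks the payload alongside the shape definitions.
function validatePayload(shapes, shapeName, payload, path = shapeName) {
  const shape = shapes[shapeName];
  if (!shape) return []; // unknown shapes are skipped in this sketch
  const issues = [];
  if (shape.type === "structure") {
    if (payload === null || typeof payload !== "object" || Array.isArray(payload)) {
      return [`${path}: expected an object`];
    }
    for (const field of shape.required ?? []) {
      if (!(field in payload)) issues.push(`${path}.${field}: missing required field`);
    }
    for (const [field, value] of Object.entries(payload)) {
      const member = shape.members?.[field];
      if (!member) continue; // fields not in the model are ignored here
      issues.push(...validatePayload(shapes, member.shape, value, `${path}.${field}`));
    }
  } else if (shape.type === "string") {
    if (typeof payload !== "string") {
      issues.push(`${path}: expected a string`);
    } else if (shape.enum && !shape.enum.includes(payload)) {
      issues.push(`${path}: "${payload}" is not one of ${shape.enum.join(", ")}`);
    }
  }
  return issues;
}
```

Calling `validatePayload(shapes, "CreateThingInput", { Name: "a", Size: "medium" })` reports a single enum violation on `Size`. Cross-cutting checks such as GPU count consistency would need the instance catalog as an additional input and are not shown.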
Architecture Compatibility (supportedModelTypes)
Each entry in model-servers.json can include a supportedModelTypes array field that lists the lowercase model_type strings (from HuggingFace config.json) that the server version supports.
What It Contains
An array of lowercase model type identifiers. These correspond to the model_type field in a HuggingFace model's config.json (e.g., llama, qwen2, mistral, gpt2).
    {
      "vllm": [
        {
          "image": "vllm/vllm-openai:v0.6.3",
          "labels": { "framework_version": "0.6.3" },
          "supportedModelTypes": ["llama", "qwen2", "mistral", "gemma", "phi3", "..."]
        }
      ]
    }
How It's Populated
The registry sync-architectures command fetches model registry source files from each server's GitHub repository at the tagged version, parses them to extract supported model types, and writes the result into the catalog entry.
The parsing logic lives in src/lib/architecture-sync.js and handles server-specific formats:
| Server | Source File | Parser |
|---|---|---|
| vLLM | `vllm/model_executor/models/registry.py` | `parseVllmRegistry` |
| SGLang | `python/sglang/srt/models/model_registry.py` | `parseSglangRegistry` |
| TensorRT-LLM | `tensorrt_llm/models/__init__.py` | `parseTensorRTRegistry` |
How It's Used
The CrossCuttingChecker.checkModelArchitectureCompatibility() method (in src/lib/cross-cutting-checker.js) uses supportedModelTypes to validate that the user's model is compatible with their selected server version. This check runs:
- At generation time (advisory warning, does not block)
- During `do/validate` (reported as a medium-confidence warning)
- Via `registry check <model-id>` (pre-generation compatibility check)
When Absent or Empty
The supportedModelTypes field is optional. When it's absent or an empty array, architecture compatibility validation is skipped gracefully — no warning is emitted and generation proceeds normally. This happens when:
- `registry sync-architectures` has not been run
- The server entry doesn't have a matching source configuration
- The fetch for a specific version failed (network error, tag not found)
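The skip-when-absent behavior can be sketched as follows. This is an assumption-laden illustration rather than the body of `CrossCuttingChecker.checkModelArchitectureCompatibility()`: the function name, the `{ checked, compatible }` return shape, and the lowercasing of the input are all hypothetical.

```javascript
// Hypothetical sketch of the compatibility check for one server entry.
// modelType is the model_type string from the model's HuggingFace config.json.
function checkArchitectureCompatibility(serverEntry, modelType) {
  const supported = serverEntry.supportedModelTypes;
  // Absent or empty: skip the check, emit nothing.
  if (!Array.isArray(supported) || supported.length === 0) {
    return { checked: false, compatible: null };
  }
  // Catalog values are lowercase, so normalize the input before comparing.
  const compatible = supported.includes(String(modelType).toLowerCase());
  return { checked: true, compatible };
}
```

Returning `checked: false` rather than `compatible: true` lets callers distinguish "no data available" from "verified compatible", which matters if a caller wants to suggest running `registry sync-architectures`.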
Contributing Data
To add or update registry data, edit the source catalog in servers/lib/catalogs/ and validate:
    # Edit the catalog file directly
    # Then validate against the schema
    node scripts/validate-catalogs.js
    # Validate catalog enum values against AWS service models (requires schema sync)
    npm run validate:catalogs
For detailed instructions on adding instance types, base images, or model entries, see MCP Server Development -- Adding a Catalog Entry.
How RegistryLoader Transforms Catalogs
RegistryLoader is the adapter layer between the raw catalog JSON and the generator's internal data model. It performs these transformations:
Framework Registry (loadFrameworkRegistry): Reads model-servers.json, which stores image entries as arrays keyed by backend name (e.g. vllm, sglang, triton-vllm). Each entry with a labels.framework_version field becomes a version entry in the registry. Fields like image, accelerator, defaults.envVars, defaults.inferenceAmiVersion, validationLevel, and profiles are mapped to the internal FrameworkConfig shape.
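A minimal sketch of this transformation is shown below, assuming the catalog is a plain object of arrays as in the `supportedModelTypes` example. `buildFrameworkRegistry` is a hypothetical stand-in for `loadFrameworkRegistry`, and the output key names beyond `baseImage`, `accelerator`, and `envVars` are assumptions.

```javascript
// Hypothetical sketch: turn { backendName: [entry, ...] } into
// { backendName: { version: FrameworkConfig } }.
function buildFrameworkRegistry(catalog) {
  const registry = {};
  for (const [backendName, entries] of Object.entries(catalog)) {
    if (!Array.isArray(entries)) continue; // ignore non-array keys, if any
    for (const entry of entries) {
      const version = entry.labels?.framework_version;
      if (!version) continue; // only labeled entries become version entries
      registry[backendName] ??= {};
      registry[backendName][version] = {
        baseImage: entry.image,
        accelerator: entry.accelerator,
        envVars: entry.defaults?.envVars ?? {},
        // The key names below are assumed for illustration.
        inferenceAmiVersion: entry.defaults?.inferenceAmiVersion,
        validationLevel: entry.validationLevel,
        profiles: entry.profiles,
      };
    }
  }
  return registry;
}
```

With this shape, a lookup such as `registry.vllm["0.6.3"].baseImage` resolves directly, without scanning the catalog array.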
Model Registry (loadModelRegistry): Reads models.json, the unified catalog that replaced popular-transformers.json and popular-diffusors.json, and loads its entries into a single registry. Each entry includes family, chatTemplate, requiresTemplate, validationLevel, frameworkCompatibility, profiles, and notes. Pattern keys like meta-llama/Llama-2-* are preserved for glob matching.
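Lookup against a registry that mixes exact and pattern keys might work along these lines. The sketch makes several assumptions: `findModelEntry` and `globToRegExp` are hypothetical names, exact keys take precedence over patterns, `*` is the only wildcard, and it matches any characters including `/`.

```javascript
// Convert a simple glob (only "*" is special) into an anchored RegExp.
function globToRegExp(pattern) {
  // Escape every regex metacharacter except "*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// Hypothetical lookup: exact key first, then the first matching pattern key.
function findModelEntry(registry, modelId) {
  if (Object.prototype.hasOwnProperty.call(registry, modelId)) {
    return registry[modelId];
  }
  for (const [key, entry] of Object.entries(registry)) {
    if (key.includes("*") && globToRegExp(key).test(modelId)) return entry;
  }
  return undefined;
}
```

Under these assumptions, `meta-llama/Llama-2-7b-hf` resolves to the `meta-llama/Llama-2-*` entry, while an identifier that matches no key yields `undefined`.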
Instance Accelerator Mapping (loadInstanceAcceleratorMapping): Reads instances.json and maps flat catalog fields (acceleratorType, hardware, gpuArchitecture, cudaVersions, defaultCudaVersion) into the nested accelerator object shape expected by ValidationEngine.
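The flat-to-nested mapping can be sketched as below, assuming `instances.json` is keyed by instance type. `toInstanceAcceleratorMapping` is a hypothetical name, and the nested keys `architecture` and `defaultVersion` are guesses for where `gpuArchitecture` and `defaultCudaVersion` end up; only `type`, `hardware`, and `versions` come from the documented shape.

```javascript
// Hypothetical sketch: reshape flat catalog fields into the nested
// accelerator object consumed by ValidationEngine.
function toInstanceAcceleratorMapping(instances) {
  const mapping = {};
  for (const [instanceType, flat] of Object.entries(instances)) {
    mapping[instanceType] = {
      family: flat.family,
      accelerator: {
        type: flat.acceleratorType,
        hardware: flat.hardware,
        versions: flat.cudaVersions ?? [],
        // Assumed key names for the remaining flat fields.
        architecture: flat.gpuArchitecture,
        defaultVersion: flat.defaultCudaVersion,
      },
      memory: flat.memory,
      vcpus: flat.vcpus,
    };
  }
  return mapping;
}
```

Keeping the reshaping in one place means consumers can read `mapping[instanceType].accelerator.type` without depending on the catalog's flat field names.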