Template System¶
Templates live in templates/ and use EJS syntax. The generator renders them during the writing phase, with all template variables flowing from user configuration through src/app.js.
Directory Structure¶
templates/
├── Dockerfile # Main Dockerfile (HTTP + Transformers architectures)
├── IAM_PERMISSIONS.md # IAM permissions reference (rendered per-project)
├── MIGRATION.md # Migration notes for upgrading MCC versions
├── PROJECT_README.md # Becomes README.md in generated project
├── TEMPLATE_SYSTEM.md # In-project template documentation
├── buildspec.yml # CodeBuild spec for remote builds
├── deploy_notebook_generator.py # SageMaker AI notebook exporter
├── nginx-diffusors.conf # Nginx config for diffusors architecture
├── nginx-predictors.conf # Nginx config for HTTP architecture
├── nginx-tensorrt.conf # Nginx config for TensorRT-LLM
├── requirements.txt # Python dependencies (HTTP architecture)
├── code/ # Serving code (model_handler.py, serve.py, flask/)
├── deploy/ # Deployment scripts
├── diffusors/ # Diffusors-specific Dockerfile + serve scripts
├── do/ # Lifecycle scripts (build, deploy, test, tune, etc.)
├── hyperpod/ # HyperPod EKS manifests + scripts
├── marketplace/ # Marketplace-specific config/deploy/test overlays
├── sample_model/ # Sample sklearn/xgboost/tensorflow model training
├── test/ # Test scripts
└── triton/ # Triton-specific Dockerfile + model repository
EJS Syntax¶
<%= variable %> <%# Escaped output (HTML entities) %>
<%- variable %> <%# Raw output (no escaping — use in shell scripts) %>
<% if (condition) { %> <%# Control flow %>
<% } %>
<%# This is a comment — not rendered %>
Templates have access to the full templateVars object, which includes all user answers plus computed values like comments, orderedEnvVars, and serverEnvVars.
How Template Variables Flow¶
CLI flags / env vars / config file
│
▼
src/lib/config-manager.js → loads + merges all config sources
│
▼
src/lib/prompt-runner.js → runs interactive prompts (fills gaps)
│
▼
configManager.getFinalConfiguration() → merged answers object
│
▼
src/lib/template-variable-resolver.js → ensures defaults, merges env vars from catalogs
│
▼
src/app.js writeProject() → adds comments, orderedEnvVars, serverEnvVars
│
▼
src/copy-tpl.js → renders all EJS templates with templateVars
The precedence for environment variables (lowest → highest):
- Catalog defaults (Image_Entry.defaults.envVars)
- Framework profile (Image_Entry.profiles[selectedProfile].envVars)
- Model entry (model catalog entry envVars)
- Model profile (model catalog entry profiles[selectedProfile].envVars)
- CLI overrides (user's --env flags or config file)
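The precedence order amounts to a layered merge in which later layers overwrite earlier ones. The following is a minimal sketch, assuming each layer is a plain object; `mergeEnvVars`, its parameter names, and the example variable names are hypothetical and not taken from template-variable-resolver.js.

```javascript
// Hypothetical helper: merges env var layers from lowest to highest precedence.
// The parameter names mirror the precedence list; the real resolver may differ.
function mergeEnvVars({
  catalogDefaults = {},
  frameworkProfile = {},
  modelEntry = {},
  modelProfile = {},
  cliOverrides = {},
} = {}) {
  // Object spread applies left to right, so later layers win on key conflicts.
  return {
    ...catalogDefaults,
    ...frameworkProfile,
    ...modelEntry,
    ...modelProfile,
    ...cliOverrides,
  };
}

// Illustrative values only.
const merged = mergeEnvVars({
  catalogDefaults: { MAX_MODEL_LEN: '4096', LOG_LEVEL: 'info' },
  modelEntry: { MAX_MODEL_LEN: '8192' },
  cliOverrides: { LOG_LEVEL: 'debug' },
});
console.log(merged);
```

In this example the model entry overrides the catalog default for MAX_MODEL_LEN, and the CLI override wins for LOG_LEVEL.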
The Dockerfile Template¶
The main templates/Dockerfile (332 lines) has a top-level branch on framework !== 'transformers':
<% if (framework !== 'transformers') { %>
<%# HTTP architecture — python:3.12-slim, Nginx, model_handler.py %>
FROM <%= baseImage || 'public.ecr.aws/docker/library/python:3.12-slim' %>
...
<% } else { %>
<%# Transformers architecture — framework-specific base images %>
<% if (modelServer === 'vllm') { %>
ARG BASE_IMAGE=<%= baseImage || 'vllm/vllm-openai:v0.10.1' %>
<% } else if (modelServer === 'sglang') { %>
ARG BASE_IMAGE=<%= baseImage || 'lmsysorg/sglang:v0.5.4.post1' %>
<% } else if (modelServer === 'tensorrt-llm') { %>
...
<% } %>
<% } %>
Architecture-specific Dockerfiles live in their own directories:
- templates/triton/Dockerfile: for the Triton architecture
- templates/diffusors/Dockerfile: for the Diffusors architecture
These are rendered and overlaid during the writing phase (not via the main Dockerfile).
Runtime Branching in do/ Scripts¶
The do/ scripts in templates/do/ use three branching strategies:
1. EJS Conditionals (Generation-Time)¶
Some scripts have EJS blocks that emit different code based on deploymentTarget, deploymentConfig, or architecture:
<%# In templates/do/config %>
<% if (deploymentTarget === 'realtime-inference') { %>
export ENDPOINT_INITIAL_INSTANCE_COUNT=<%= endpointInitialInstanceCount || 1 %>
<% } else if (deploymentTarget === 'async-inference') { %>
export ASYNC_S3_OUTPUT_PATH="<%= asyncS3OutputPath %>"
<% } %>
2. Shell case Statements (Runtime)¶
Other scripts use runtime branching on DEPLOYMENT_CONFIG (sourced from do/config). This means the same generated script works for multiple backends:
case "${DEPLOYMENT_CONFIG}" in
transformers-tensorrt-llm)
# NGC authentication, then docker build
;;
transformers-vllm|transformers-sglang)
# Standard GPU image build
;;
transformers-lmi|transformers-djl)
# ECR authentication for DLC base images
;;
http-flask|http-fastapi)
# CPU image build
;;
esac
3. EJS Partials (Include Files)¶
Complex scripts use EJS partials in .d/ directories:
templates/do/
├── deploy
├── deploy.d/
│ ├── async-inference.ejs
│ ├── batch-transform.ejs
│ ├── hyperpod-eks.ejs
│ └── managed-inference.ejs
├── clean
└── clean.d/
    ├── async-inference.ejs
    ├── batch-transform.ejs
    ├── hyperpod-eks.ejs
    └── managed-inference.ejs
The main do/deploy template includes the appropriate partial:
<%- include('deploy.d/' + (deploymentTarget === 'realtime-inference' ? 'managed-inference' : deploymentTarget)) %>
Legacy partial naming
The partials for realtime inference are named managed-inference.ejs (a legacy artifact from before the realtime-inference rename). The include expression handles this mapping. New partials should use the current deploymentTarget value as their filename.
Partials are NOT copied to the output (they're in ignorePatterns). They're resolved at generation time.
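The name mapping performed by the include expression can be pictured as a small function. This is a sketch only: `partialPath` is a hypothetical helper, and applying the same mapping to do/clean is an assumption, since only the do/deploy include is shown here.

```javascript
// Hypothetical helper mirroring the include expression in do/deploy.
// Only the realtime-inference -> managed-inference mapping comes from the docs.
function partialPath(scriptName, deploymentTarget) {
  const partialName =
    deploymentTarget === 'realtime-inference' ? 'managed-inference' : deploymentTarget;
  return `${scriptName}.d/${partialName}`;
}

console.log(partialPath('deploy', 'realtime-inference'));
console.log(partialPath('clean', 'hyperpod-eks'));
```

A new deployment target whose partial is named after the target value needs no extra mapping in such a function.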
Architecture-Specific File Routing¶
The writeProject() function in src/app.js handles five architectures. After copying all non-ignored templates, it deletes files that don't belong:
| Architecture | Keeps | Deletes | Special |
|---|---|---|---|
| http | model_handler.py, serve.py, flask/, nginx-predictors.conf | code/serve, serving.properties, chat_template.jinja, start_server.sh | Deletes flask/ if backend is fastapi |
| transformers | code/serve, serving.properties, chat_template.jinja | model_handler.py, serve.py, start_server.py, flask/, nginx-predictors.conf | - |
| triton | - | All http + transformers code files | Overlays triton/Dockerfile, generates model_repository/ |
| diffusors | - | All http + transformers code files | Overlays diffusors/Dockerfile, serve, start_server.sh, patch_image_api.py |
| marketplace | - | Almost everything (no container) | Overlays marketplace/config, deploy, test |
Conditional File Exclusion (Ignore Patterns)¶
Before file routing, the generator excludes entire file trees based on configuration:
| Condition | Excluded |
|---|---|
| deploymentTarget !== 'hyperpod-eks' | hyperpod/** |
| deploymentTarget === 'hyperpod-eks' | do/lib/**, do/ic/**, do/add-ic, do/status, do/optimize |
| deploymentTarget === 'async-inference' or batch-transform | do/ic/**, do/add-ic, do/status |
| !enableLora | do/adapter, do/adapters/**, code/adapter_sidecar.py |
| !includeBenchmark | do/benchmark, do/optimize |
| Architecture is not transformers | do/tune, do/.tune_helper.py |
| deploymentTarget === 'batch-transform' | do/train, do/training/** |
| !includeSampleModel or transformers/diffusors | sample_model/** |
| Test type hosted-model-endpoint not selected | do/test |
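The exclusion rules can be read as a function from configuration to a list of glob patterns. The sketch below is illustrative: `buildIgnorePatterns` and the `testTypes` array are hypothetical names, and the structure is not copied from src/app.js.

```javascript
// Hypothetical sketch of the exclusion table; names and structure are
// illustrative and not copied from src/app.js.
function buildIgnorePatterns(config) {
  const {
    deploymentTarget,
    architecture,
    enableLora = false,
    includeBenchmark = false,
    includeSampleModel = false,
    testTypes = [], // assumed shape: array of selected test type names
  } = config;
  // A Set avoids duplicates when two rules exclude the same path.
  const patterns = new Set();
  const add = (...globs) => globs.forEach((g) => patterns.add(g));

  if (deploymentTarget !== 'hyperpod-eks') add('hyperpod/**');
  if (deploymentTarget === 'hyperpod-eks') {
    add('do/lib/**', 'do/ic/**', 'do/add-ic', 'do/status', 'do/optimize');
  }
  if (['async-inference', 'batch-transform'].includes(deploymentTarget)) {
    add('do/ic/**', 'do/add-ic', 'do/status');
  }
  if (!enableLora) add('do/adapter', 'do/adapters/**', 'code/adapter_sidecar.py');
  if (!includeBenchmark) add('do/benchmark', 'do/optimize');
  if (architecture !== 'transformers') add('do/tune', 'do/.tune_helper.py');
  if (deploymentTarget === 'batch-transform') add('do/train', 'do/training/**');
  if (!includeSampleModel || ['transformers', 'diffusors'].includes(architecture)) {
    add('sample_model/**');
  }
  if (!testTypes.includes('hosted-model-endpoint')) add('do/test');

  return [...patterns];
}

const patterns = buildIgnorePatterns({
  deploymentTarget: 'hyperpod-eks',
  architecture: 'transformers',
  enableLora: true,
  includeBenchmark: true,
  testTypes: ['hosted-model-endpoint'],
});
console.log(patterns);
```

Collecting patterns in a Set keeps do/optimize from appearing twice when both the hyperpod-eks rule and the benchmark rule apply.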
Key Template Variables¶
These variables are available in all templates via the templateVars context:
Core Configuration¶
| Variable | Type | Description |
|---|---|---|
| projectName | string | Project name (e.g., my-llm) |
| deploymentConfig | string | Full config string (e.g., transformers-vllm) |
| architecture | string | http, transformers, triton, diffusors, marketplace |
| backend | string | Backend server (e.g., vllm, flask, fil) |
| framework | string | Derived framework (transformers, http) |
| modelServer | string | Alias for backend |
| modelName | string | HuggingFace model ID or S3 path |
| modelFormat | string | pkl, joblib, h5, SavedModel, etc. |
| modelSource | string | huggingface, s3, registry |
| deploymentTarget | string | realtime-inference, async-inference, batch-transform, hyperpod-eks |
| instanceType | string | SageMaker AI instance type |
| region | string | AWS region |
| buildTarget | string | local, codebuild |
Feature Flags¶
| Variable | Type | Description |
|---|---|---|
| enableLora | boolean | Whether LoRA adapter support is enabled |
| includeSampleModel | boolean | Whether to include sample training data |
| includeBenchmark | boolean | Whether to include benchmarking scripts |
| includeTesting | boolean | Whether to include test scripts |
Infrastructure¶
| Variable | Type | Description |
|---|---|---|
| baseImage | string | Docker base image |
| roleArn | string | SageMaker AI execution role ARN |
| buildTimestamp | string | ISO timestamp of generation |
| hyperPodCluster | string | HyperPod cluster name (if applicable) |
| hyperPodNamespace | string | HyperPod namespace (default: default) |
| hyperPodReplicas | number | HyperPod replica count (default: 1) |
LoRA¶
| Variable | Type | Description |
|---|---|---|
| maxLoras | number | Maximum concurrent LoRA adapters (default: 30) |
| maxLoraRank | number | Maximum LoRA rank (default: 64) |
Environment Variables¶
| Variable | Type | Description |
|---|---|---|
| envVars | object | Raw env var key-value pairs |
| orderedEnvVars | array | [{key, value}], ordered for Dockerfile ENV lines |
| modelEnvVars | object | Model-specific env vars from catalogs |
| serverEnvVars | object | Engine-prefixed server env vars |
| comments | object | Generated Dockerfile comments (accelerator info, env var explanations) |
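To make the relationship between envVars and orderedEnvVars concrete, here is a minimal sketch of the [{key, value}] shape and how it could feed ENV lines. Both helper names are hypothetical, and the alphabetical ordering is an assumption: the generator's actual ordering rule is not described on this page.

```javascript
// Hypothetical sketch: turn an envVars object into the [{key, value}] shape.
// Alphabetical ordering is an assumption, not the generator's documented rule.
function toOrderedEnvVars(envVars) {
  return Object.entries(envVars)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([key, value]) => ({ key, value: String(value) }));
}

// Hypothetical renderer for Dockerfile ENV instructions.
function renderEnvLines(orderedEnvVars) {
  // JSON.stringify wraps each value in double quotes and escapes embedded quotes.
  return orderedEnvVars
    .map(({ key, value }) => `ENV ${key}=${JSON.stringify(value)}`)
    .join('\n');
}

// Illustrative values only.
const orderedEnvVars = toOrderedEnvVars({ MAX_MODEL_LEN: 8192, HF_HOME: '/opt/ml/model' });
console.log(renderEnvLines(orderedEnvVars));
```

An array preserves a deterministic order across runs, which keeps the generated Dockerfile stable between regenerations.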
Adding a New Deployment Configuration¶
A deployment configuration is a string like transformers-vllm or triton-fil that bundles an architecture and a backend. There are currently 16 canonical configs (2 HTTP + 5 Transformers + 7 Triton + 1 Diffusors + 1 Marketplace).
To add one, touch these files:
1. Deployment config resolver (src/lib/deployment-config-resolver.js)¶
Add the canonical mapping for the new config to the CANONICAL_CONFIGS Map.
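The entry format is not reproduced on this page. As a rough sketch, assuming each entry maps a config string to its architecture and backend, an addition might look like the following; the entry fields and `resolveDeploymentConfig` are hypothetical and may not match src/lib/deployment-config-resolver.js.

```javascript
// Hypothetical shape for CANONICAL_CONFIGS; the real entries in
// src/lib/deployment-config-resolver.js may carry different fields.
const CANONICAL_CONFIGS = new Map([
  ['transformers-vllm', { architecture: 'transformers', backend: 'vllm' }],
  ['triton-fil', { architecture: 'triton', backend: 'fil' }],
  // New entry for the example backend used throughout these steps.
  ['transformers-myserver', { architecture: 'transformers', backend: 'myserver' }],
]);

// Hypothetical lookup that decomposes a config string or fails loudly.
function resolveDeploymentConfig(deploymentConfig) {
  const entry = CANONICAL_CONFIGS.get(deploymentConfig);
  if (!entry) {
    throw new Error(`Unknown deployment config: ${deploymentConfig}`);
  }
  return { deploymentConfig, ...entry };
}

console.log(resolveDeploymentConfig('transformers-myserver'));
```

A lookup table avoids guessing where to split names such as transformers-tensorrt-llm, whose backend itself contains a hyphen.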
2. Validation (src/lib/template-manager.js)¶
Add the value to the deploymentConfigs array in validate(). If the backend requires a GPU, add it to GPU_REQUIRING_BACKENDS:
const GPU_REQUIRING_BACKENDS = ['triton-vllm', 'triton-tensorrtllm', 'diffusors-vllm-omni', 'transformers-myserver'];
3. Prompt definition (src/lib/prompts/infrastructure-prompts.js)¶
Add a choice to the deployment config prompt:
{
name: 'Transformers with MyServer',
value: 'transformers-myserver',
short: 'transformers-myserver'
}
4. Dockerfile template (templates/Dockerfile)¶
In the else (transformers) branch, add an else if for the new backend:
<% } else if (modelServer === 'myserver') { %>
ARG BASE_IMAGE=<%= baseImage || 'myserver/myserver:latest' %>
FROM ${BASE_IMAGE}
...
<% } %>
5. do/ script case statements¶
Add cases to templates/do/build, templates/do/deploy, templates/do/test, and templates/do/clean for any backend-specific behavior (authentication, build flags, test payloads).
6. Base image catalog entry¶
Add an image entry to servers/base-image-picker/catalogs/model-servers.json so the MCP server can recommend the correct base image. See MCP Server Development for the catalog schema.
7. Parameter schema (if new params needed)¶
If the new backend introduces unique parameters, add them to config/parameter-schema-v2.json with appliesTo scoped to the new config. See Schema-Driven Architecture.
8. Tests¶
Add test cases:
- test/input-parsing-and-generation/: generate a project with the new config, assert expected files
- test/property/: if the config introduces validation rules
- Property tests for deployment-config-resolver decomposition
See Testing for the test framework details.
Tips¶
- Preview rendering: Use npx ejs templates/Dockerfile -f vars.json to preview a template without running the full generator, where vars.json (any file name works) holds the template variables as JSON, e.g. {"framework":"transformers","modelServer":"vllm",...}.
- Debug variables: Add <%- JSON.stringify(answers, null, 2) %> temporarily to any template to dump the full context.
- Shell escaping: In do/ scripts, use <%- (raw output), not <%= (HTML-escaped); otherwise <, >, and & get mangled.
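To see why escaped output breaks shell scripts, the sketch below approximates the HTML escaping that <%= applies by default. It is a standalone illustration that does not call the ejs package; `escapeLikeEjs` is a hypothetical name, and the shell line is an invented example.

```javascript
// Approximation of the HTML escaping that <%= %> applies by default.
// This is a standalone illustration and does not call the ejs package.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&#34;', "'": '&#39;' };

function escapeLikeEjs(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Invented example of a line a do/ script might contain.
const shellLine = 'aws s3 ls "$BUCKET" > out.txt && echo done';
console.log(escapeLikeEjs(shellLine)); // roughly what <%= %> would emit
console.log(shellLine);                // what <%- %> would emit
```

The escaped version replaces the redirect, the && operator, and the quotes with HTML entities, so the resulting script no longer runs as intended.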