# Deployment

## Prerequisites
- Set up or have access to an AWS account.
- Ensure that your AWS account has the appropriate permissions. Resource creation during the AWS CDK deployment expects Administrator or Administrator-like permissions, including the ability to create and mutate arbitrary resources; installation will not succeed without them. This level of access is required only for deployment and subsequent updates, not for LISA's runtime.
- If using the chat UI, have your Identity Provider (IdP) information available.
- If using an existing VPC, have its information available.
- Familiarity with AWS Cloud Development Kit (CDK) and infrastructure-as-code principles is a plus.
- AWS CDK and Model Management both use the AWS Systems Manager (SSM) Parameter Store. Confirm that SSM is approved for use by your organization before beginning. If you're new to CDK, review the AWS CDK Documentation and consult your AWS support team.
### Software
- AWS CLI installed and configured
- Python 3.13
- Node.js 24
- Docker installed and running
- Sufficient disk space for model downloads and conversions
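A quick way to confirm the tools above are on your `PATH` (a sketch; this checks presence only, so the pinned versions — Python 3.13, Node.js 24 — still need a manual `python3 --version` / `node --version` check):

```shell
# Check that each required CLI is installed; print what's missing.
for tool in aws python3 node docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```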
**TIP:** To minimize version conflicts and ensure a consistent deployment environment, we recommend executing the following steps on a dedicated EC2 instance. However, LISA can be deployed from any machine that meets the prerequisites listed above.
## Deployment Steps

LISA uses npm scripts for build and deployment. Key commands:

| Task | Command |
|---|---|
| Install Python & TypeScript deps | `npm run install:python` then `npm install` |
| Stage model weights | `npm run model:check` |
| Bootstrap CDK | `npm run bootstrap` |
| Deploy (full pipeline) | `npm run deploy` |
| Build archive (ADC pre-build) | `npm run build:archive` |
| List CDK stacks | `npm run cdk:list` |

The `npm run deploy` script runs the full pipeline: install dependencies, Docker checks, ECR login, model verification, build, and CDK deploy. Use `STACK=<stack-name> npm run deploy` to deploy specific stacks.
### Step 1: Clone the Repository
Ensure you're working with the latest stable release of LISA:
```bash
git clone -b main --single-branch <path-to-lisa-repo>
cd lisa
```

### Step 2a: Create/Configure config-custom.yaml
Run the command below to copy the example configuration into `config-custom.yaml`. This will create the file if it doesn't exist already.

```bash
cp example_config.yaml config-custom.yaml
```

Review the `config-custom.yaml` settings. Some settings will be configured later in this guide.
### Step 2b: Set Up Environment Variables
Set the following environment variables:
```bash
export PROFILE=my-aws-profile  # Optional, can be left blank
export DEPLOYMENT_NAME=my-deployment
export ENV=dev                 # Options: dev, test, or prod
export CDK_DOCKER=finch        # Optional, only required if not using docker as the container engine
```

### Step 3: Set Up Python and TypeScript Environments
Install system dependencies and set up both Python and TypeScript environments using the project's npm scripts.

**NOTE:** The commands below are for Debian-based systems; on EL/AL2, install the equivalent packages with your system's package manager.
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y jq

# Install Python packages (for model staging)
pip3 install --user --upgrade pip
pip3 install yq huggingface_hub s5cmd

# Create and activate a Python virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install Python and TypeScript dependencies via npm scripts
npm run install:python
npm install
```

### Step 4: Configure LISA
Edit the config-custom.yaml file to customize your LISA deployment. Key configurations include:
- AWS account and region settings
- Authentication settings
- Model bucket name
### Step 5: Configure Identity Provider
In the config-custom.yaml file, configure the authConfig block for authentication. LISA supports OpenID Connect (OIDC) providers such as AWS Cognito or Keycloak. Required fields include:
- `authority`: URL of your identity provider
- `clientId`: Client ID for your application
- `adminGroup`: Group name for users with model management permissions
- `userGroup`: Group name for regular LISA users
- `jwtGroupsProperty`: Path to the groups field in the JWT token
- `additionalScopes` (optional): Extra scopes for group membership information
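Putting these fields together, a minimal `authConfig` block might look like the following (all values are illustrative placeholders; the `jwtGroupsProperty` path depends on how your IdP structures its tokens):

```yaml
authConfig:
  authority: https://your-idp.example.com/realms/lisa
  clientId: lisa-client
  adminGroup: lisa-admins
  userGroup: lisa-users
  jwtGroupsProperty: cognito:groups
  additionalScopes: []
```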
IdP configuration examples using AWS Cognito and Keycloak can be found in IDP Configuration Examples.
### Step 6: Configure LiteLLM
LISA uses LiteLLM under the hood to respond to the OpenAI specification. For LiteLLM configuration, a key must be set up so that the system can communicate with the database that tracks all models added or removed through the Model Management API. The key must start with `sk-` followed by any arbitrary string; we recommend generating a new UUID and using that as the key. A configuration example is below.
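For example, such a key can be generated from a fresh UUID with Python's standard `uuid` module (any method of producing an `sk-`-prefixed random string works equally well):

```shell
# Generate an sk- prefixed key from a random UUID
python3 -c 'import uuid; print("sk-" + str(uuid.uuid4()))'
```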
```yaml
litellmConfig:
  db_key: sk-00000000-0000-0000-0000-000000000000  # needed for db operations, create your own key # pragma: allowlist-secret
```

**IMPORTANT:** To include prompt/response content in LiteLLM logs (published by the LISA Serve ECS task to CloudWatch via `litellm.log`), enable LiteLLM logging callbacks and message logging in `config-custom.yaml`.
- Add the following to `litellmConfig`:

```yaml
litellmConfig:
  litellm_settings:
    callbacks: ["otel"]
    turn_off_message_logging: false
  environment_variables:
    OTEL_EXPORTER: console
  callback_settings:
    otel:
      message_logging: true
```

- Ensure you are aware of the privacy/compliance implications: this causes request/response content to be logged.
LiteLLM Proxy logging reference: https://docs.litellm.ai/docs/proxy/logging
**IMPORTANT:** API Gateway audit logging (strict opt-in): LISA can emit audit logs for API Gateway requests (who initiated the request, what action was taken, and a sanitized JSON body) only when enabled via `auditLoggingConfig` in `config-custom.yaml`.
Example (opt-in to specific API prefixes):
```yaml
auditLoggingConfig:
  enabled: true
  auditAll: false
  enabledPaths: ["/api-tokens", "/models", "/repository", "/session", "/configuration", "/prompt-templates", "/project", "/user-preferences", "/mcp", "/mcp-server", "/mcp-workbench", "/metrics", "/chat-assistant-stacks", "/bedrock-kb"]
```

Example (`auditAll`):

```yaml
auditLoggingConfig:
  enabled: true
  auditAll: true
```

Optional JSON body audit (default off): set `includeJsonBody: true` to emit `AUDIT_API_GATEWAY_REQUEST_BODY` for opted-in paths. When `includeJsonBody` is false or omitted, request bodies are never logged, even when path auditing is enabled.
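Combining those options, a configuration that audits a single API prefix and also logs sanitized request bodies for it might look like the following (a sketch; the path list is illustrative):

```yaml
auditLoggingConfig:
  enabled: true
  auditAll: false
  enabledPaths: ["/session"]
  includeJsonBody: true
```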
When audit logging is enabled for a given API prefix, two kinds of events may appear. They are not in the same CloudWatch log group:
| Event | What it contains | Where it is logged |
|---|---|---|
| `AUDIT_API_GATEWAY_REQUEST` | Allow/Deny, user identity, HTTP method + path (from the authorizer) | API Gateway Lambda authorizer log group (e.g. `…-lambda-authorizer`) |
| `AUDIT_API_GATEWAY_REQUEST_BODY` | Sanitized JSON body (and user context from the proxy event) | The Lambda (or service) that handles the route, e.g. `put_session` for `PUT /session/{id}`, or the FastAPI/Mangum app log stream for APIs served that way. Emitted only if `includeJsonBody: true` |
API Gateway does not send the HTTP body to the REST authorizer, so body audit must run in the integration that receives `event["body"]`.
Each audit line is logged as EVENT_TYPE followed by a compact JSON object (same fields as before), so the full payload appears in the log message and can be parsed in CloudWatch Logs Insights (e.g. split on the first space and parse the JSON).
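As a sketch of that parsing rule, the event type and JSON payload can be split on the first space (the sample line and its fields below are hypothetical, not actual LISA log output):

```shell
# Hypothetical audit log line: EVENT_TYPE followed by a compact JSON object
line='AUDIT_API_GATEWAY_REQUEST {"effect":"Allow","path":"/models"}'

event_type=${line%% *}   # everything before the first space
payload=${line#* }       # everything after the first space

echo "$event_type"
echo "$payload"
```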
Privacy note: enabling JSON body audit logging may include sensitive user data; ensure your organization’s compliance requirements are met.
### Step 7a: Customize Model Deployment (If Using LISA Serve)

In the `ecsModels` section of `config-custom.yaml`, you can allow the deployment process to pull model weights for you.

During deployment, LISA will optionally attempt to download your model weights if you specify the optional `ecsModels` array; this only works in non-ADC regions. See the `ecsModels` section of the `example_config.yaml` file. Here we define the model name, inference container, and base image:
```yaml
ecsModels:
  - modelName: your-model-name
    inferenceContainer: vllm
    baseImage: vllm/vllm-openai:latest
```

### Step 7b: Stage Model Weights
LISA requires model weights to be staged in the S3 bucket specified in your config-custom.yaml file, assuming the S3 bucket follows this structure:
```
s3://<bucket-name>/<hf-model-id-1>
s3://<bucket-name>/<hf-model-id-1>/<file-1>
s3://<bucket-name>/<hf-model-id-1>/<file-2>
...
s3://<bucket-name>/<hf-model-id-2>
```

Example:

```
s3://<bucket-name>/mistralai/Mistral-7B-Instruct-v0.2
s3://<bucket-name>/mistralai/Mistral-7B-Instruct-v0.2/<file-1>
s3://<bucket-name>/mistralai/Mistral-7B-Instruct-v0.2/<file-2>
...
```

To automatically download and stage the model weights defined by the `ecsModels` parameter in your `config-custom.yaml`, use the following command:

```bash
npm run model:check
```

This command verifies whether the model's weights are already present in your S3 bucket. If not, it downloads the weights, converts them to the required format, and uploads them to your S3 bucket. Ensure adequate disk space is available for this process.
**WARNING:** As of LISA 3.0, the `ecsModels` parameter in `config-custom.yaml` is solely for staging model weights in your S3 bucket. Previously, before models could be managed through the API or via the Model Management section of the Chatbot, this parameter also dictated which models were deployed.

**NOTE:** For air-gapped systems, before running `npm run model:check` you should manually download model artifacts and place them in a `models` directory at the project root, using the structure `models/<model-id>`.

**NOTE:** This process is primarily designed and tested for HuggingFace models. For other model formats, you will need to manually create and upload safetensors.

**NOTE:** Please validate that all files successfully downloaded locally AND were uploaded to the S3 bucket. Ensure all large files, such as `.safetensors` files, exist.
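A quick local sanity check before comparing against S3 (a sketch; `models/<model-id>` is the staging layout described above, and the directory name used here is a placeholder):

```shell
# Count large weight files staged locally; compare this number against what
# `aws s3 ls --recursive` reports for the same prefix in your models bucket.
MODEL_DIR=${MODEL_DIR:-models/my-model-id}   # placeholder path
count=$(find "$MODEL_DIR" -name '*.safetensors' 2>/dev/null | wc -l)
echo "local .safetensors files: $count"
```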
### Step 9: Bootstrap CDK (If Not Already Done)
If you haven't bootstrapped your AWS account for CDK:
```bash
npm run bootstrap
```

## ADC Region Deployment Tips
Amazon Dedicated Cloud (ADC) regions are isolated AWS environments designed for government customers' most sensitive workloads. These regions have restricted internet access and limited external dependencies, requiring special deployment considerations for LISA.
There are two deployment approaches for ADC regions:
- Pre-built Resources (Recommended): Build all components in a commercial region, then transfer to ADC
- In-Region Building: Configure LISA to use ADC-accessible repositories for building components
### Approach 1: Pre-built Resources (Recommended)
This approach builds all necessary components in a commercial region with full internet access, then transfers them to the ADC region.
#### Step 1: Build Components in Commercial Region

1. Set up LISA in a commercial AWS region with internet access.
2. Build all components:
```bash
npm run build:archive
./bin/build-assets --include-images
```

This generates:

- Lambda function zip files in `./dist/layers/*.zip` (from `build:archive`)
- Docker images exported as `./dist/images/*.tar` files (from `build-assets --include-images`)
#### Step 2: Transfer to ADC Region
1. Upload Docker images to ECR in your ADC region:

```bash
# Load and tag images
docker load -i lisa-rest-api.tar
docker tag lisa-rest-api:latest <adc-account-id>.dkr.ecr.<adc-region>.amazonaws.com/lisa-rest-api:latest

# Push to ADC ECR
aws ecr get-login-password --region <adc-region> | docker login --username AWS --password-stdin <adc-account-id>.dkr.ecr.<adc-region>.amazonaws.com
docker push <adc-account-id>.dkr.ecr.<adc-region>.amazonaws.com/lisa-rest-api:latest
```

Repeat this for `lisa-batch-ingestion`, as well as any of the LISA base model hosting containers (`lisa-vllm`, `lisa-tgi`, `lisa-tei`).

2. Transfer built artifacts to the ADC environment.
#### Step 3: Configure LISA for Pre-built Resources
Update your config-custom.yaml in the ADC region:
```yaml
# Lambda layers from pre-built archives
lambdaLayerAssets:
  authorizerLayerPath: './dist/layers/LisaAuthLayer.zip'
  commonLayerPath: './dist/layers/LisaCommonLayer.zip'
  cdkLayerPath: './dist/layers/LisaCdkLayer.zip'
  fastapiLayerPath: './dist/layers/LisaFastApiLayer.zip'
  ragLayerPath: './dist/layers/LisaRag.zip'
  sdkLayerPath: './dist/layers/LisaSdk.zip'

# Lambda functions
lambdaPath: './dist/layers/LisaLambda.zip'

# Pre-built web assets
webAppAssetsPath: './dist/lisa-web'
documentsPath: './dist/docs'
ecsModelDeployerPath: './dist/ecs_model_deployer'
mcpServerDeployerPath: './dist/mcp_server_model_deployer'
vectorStoreDeployerPath: './dist/vector_store_deployer'

# Container images from ECR
batchIngestionConfig:
  type: external
  code: <adc-account-id>.dkr.ecr.<adc-region>.amazonaws.com/lisa-batch-ingestion:latest

restApiConfig:
  imageConfig:
    type: external
    code: <adc-account-id>.dkr.ecr.<adc-region>.amazonaws.com/lisa-rest-api:latest
```

### Approach 2: In-Region Building
This approach configures LISA to build components using repositories accessible from within the ADC region.
#### Prerequisites

- ADC-accessible package repositories (PyPI mirror, npm registry, container registry)
- ADC-accessible container registries
- Network connectivity to required build dependencies
#### Configuration
Update your config-custom.yaml to point to ADC-accessible repositories:
```yaml
# Configure pip to use an ADC-accessible PyPI mirror
pypiConfig:
  indexUrl: https://your-adc-pypi-mirror.com/simple
  trustedHost: your-adc-pypi-mirror.com

# Configure npm to use an ADC-accessible registry
npmConfig:
  registry: https://your-adc-npm-registry.com

# Use an ADC-accessible base image for LISA-Serve and Batch Ingestion
baseImage: <adc-registry>/python:3.13-slim

# Configure offline build dependencies for the REST API (prisma-client-py dependencies)
restApiConfig:
  buildConfig:
    PRISMA_CACHE_DIR: "./PRISMA_CACHE"  # Path relative to lib/serve/rest-api/

# Configure offline build dependencies for MCP Workbench (S6 Overlay and rclone)
mcpWorkbenchBuildConfig:
  S6_OVERLAY_NOARCH_SOURCE: "./s6-overlay-noarch.tar.xz"  # Path relative to lib/serve/mcp-workbench/
  S6_OVERLAY_ARCH_SOURCE: "./s6-overlay-x86_64.tar.xz"    # Path relative to lib/serve/mcp-workbench/
  RCLONE_SOURCE: "./rclone-linux-amd64.zip"               # Path relative to lib/serve/mcp-workbench/
```

You'll also want any model hosting base containers available, e.g. `vllm/vllm-openai:latest` and `ghcr.io/huggingface/text-embeddings-inference:latest`.
#### Preparing Offline Build Dependencies
For environments without internet access during Docker builds, you can pre-cache required dependencies:
**REST API Prisma cache (required by `prisma-client-py`):**
The prisma-client-py package requires platform-specific binaries and a Node.js environment to function. When Prisma runs for the first time, it downloads these dependencies to ~/.cache/prisma/ and ~/.cache/prisma-python/. For offline deployments, you need to pre-populate this cache.
Below is an example workflow using an Amazon Linux 2023 instance with Python 3.12:
```bash
# Ensure pip is up to date
pip3 install --upgrade pip

# Install the Prisma Python package
pip3 install prisma

# Trigger Prisma to download all required binaries and create its Node.js environment
# This populates ~/.cache/prisma/ and ~/.cache/prisma-python/
prisma version

# Copy the complete Prisma cache to your build context
# The wildcard captures both 'prisma' and 'prisma-python' directories
cp -r ~/.cache/prisma* lib/serve/rest-api/PRISMA_CACHE/
```

**Important Notes:**
- The cache is platform-specific. Generate it on a system matching your Docker base image (e.g., `public.ecr.aws/docker/library/python:3.13-slim` is Debian-based, so you may want to use a Debian-based system).
- The `prisma version` command downloads binaries for your current platform.
- Both `prisma/` and `prisma-python/` directories are required for offline operation.
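After copying, you can confirm that both cache directories landed in the build context (a sketch; the path matches the `PRISMA_CACHE_DIR` example above):

```shell
# Both subdirectories must be present for the offline Docker build to succeed
for d in prisma prisma-python; do
  if [ -d "lib/serve/rest-api/PRISMA_CACHE/$d" ]; then
    echo "$d: present"
  else
    echo "$d: MISSING"
  fi
done
```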
**MCP Workbench dependencies (S6 Overlay and rclone):**

```bash
# Download S6 Overlay files
cd lib/serve/mcp-workbench/
wget https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-noarch.tar.xz
wget https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-x86_64.tar.xz

# Download rclone
wget https://github.com/rclone/rclone/releases/download/v1.71.0/rclone-v1.71.0-linux-amd64.zip

cd ../../..
```

These cached dependencies will be used during the Docker build process instead of being downloaded from the internet. Note that the rclone release archive name includes the version; make sure the filename on disk matches the `RCLONE_SOURCE` path configured in `mcpWorkbenchBuildConfig` (rename the file or update the config accordingly).
To use the prebuilt model hosting containers with self-hosted models, select `type: ecr` in the Model Deployment > Container Configs.
### Deployment Steps

Once your configuration is complete:

1. Bootstrap CDK (if not already done):

```bash
npm run bootstrap
```

2. Deploy LISA:

```bash
npm run deploy
```

3. Deploy specific stacks if needed:

```bash
STACK=LisaServe npm run deploy
```

4. List available stacks:

```bash
npm run cdk:list
```
## Testing Your Deployment

After deployment completes (10-15 minutes), test with:

```bash
pytest lisa-sdk/tests --url <rest-url-from-cdk-output> --verify <path-to-server.crt>
```

## Troubleshooting ADC Deployments
- Build failures: Ensure all dependencies are accessible from ADC region
- Container pull errors: Verify ECR repositories exist and have correct permissions
- Lambda deployment issues: Check that lambda zip files are properly formatted and accessible
- Network connectivity: Confirm VPC configuration allows required outbound connections
- S3 Models Bucket Issues: Confirm your `AWS_REGION` is set correctly and that all of your model's files were successfully uploaded to your models bucket