## Deployment
### Using pre-built resources
A default configuration will build the necessary containers, Lambda layers, and production-optimized web application at build time. If you would like to use pre-built resources instead, due to network connectivity or other concerns about the environment where you'll be deploying LISA, you can do so.
- For ECS containers (Models, APIs, etc.) you can modify the `containerConfig` block of the corresponding entry in `config.yaml`. For container images you can provide a path to a directory from which a Docker container will be built (default), a path to a tarball, an ECR repository ARN and optional tag, or a public registry path (see the sketch after this list).
  - We provide immediate support for HuggingFace TGI and TEI containers and for vLLM containers. The `example_config.yaml` file provides examples for TGI and TEI, and the only difference for using vLLM is to change the `inferenceContainer`, `baseImage`, and `path` options, as indicated in the snippet below and in the vLLM sketch after this list. All other options can remain the same as the model definition examples we have for the TGI or TEI models. vLLM can also support embedding models in this way, so all you need to do is refer to the embedding model artifacts and remove the `streaming` field to deploy the embedding model.
    - vLLM has support for the OpenAI Embeddings API, but model support for it is limited because the feature is new. Currently, the only supported embedding model with vLLM is `intfloat/e5-mistral-7b-instruct`, but this list is expected to grow over time as vLLM updates.

  ```yaml
  ecsModels:
    - modelName: your-model-name
      inferenceContainer: tgi
      baseImage: ghcr.io/huggingface/text-generation-inference:2.0.1
  ```
- If you are deploying the LISA Chat User Interface you can optionally specify the path to the pre-built website assets using the top-level `webAppAssetsPath` parameter in `config.yaml` (shown after this list). Specifying this path (typically `lib/user-interface/react/dist`) will avoid using a container to build and bundle the assets at CDK build time.
- For the Lambda layers you can specify the path to a local zip archive of the layer code by including the optional `lambdaLayerAssets` block in `config.yaml`, similar to the following:
  ```yaml
  lambdaLayerAssets:
    authorizerLayerPath: lib/core/layers/authorizer_layer.zip
    commonLayerPath: lib/core/layers/common_layer.zip
    fastapiLayerPath: /path/to/fastapi_layer.zip
    sdkLayerPath: lib/rag/layers/sdk_layer.zip
  ```
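To make the container-image options in the first bullet concrete, here is a minimal sketch of a `containerConfig` image block covering the four source types. The nested field names (`image`, `repositoryArn`, `registryImage`) are illustrative assumptions, not the authoritative schema; `example_config.yaml` remains the source of truth.

```yaml
# A hedged sketch, assuming containerConfig accepts an image block with one
# of the four source types described above. Field names are illustrative;
# defer to example_config.yaml for the real schema.
containerConfig:
  image:
    # Default: build the image from a local directory at CDK build time.
    path: lib/serve/ecs-model/textgen/tgi   # hypothetical directory path
    # Pre-built alternatives (pick one instead of building locally):
    # path: /path/to/container-image.tar                                  # local tarball
    # repositoryArn: arn:aws:ecr:us-east-1:123456789012:repository/lisa   # ECR repo + optional tag
    # registryImage: ghcr.io/huggingface/text-generation-inference:2.0.1  # public registry path
```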
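For the vLLM case called out above, the same `ecsModels` entry changes only the three fields named in the text. The `vllm` container value and the image below are assumptions for illustration; pin whichever vLLM image your environment actually mirrors.

```yaml
ecsModels:
  - modelName: your-model-name
    inferenceContainer: vllm            # assumed value for the vLLM container
    baseImage: vllm/vllm-openai:latest  # illustrative public image; pin a real version
    # For an embedding model (e.g. intfloat/e5-mistral-7b-instruct), point
    # the model definition at the embedding artifacts and omit the streaming field.
```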
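The pre-built website assets bullet amounts to a single top-level line in `config.yaml`:

```yaml
webAppAssetsPath: lib/user-interface/react/dist
```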
### Deploying
Now that we have everything set up, we are ready to deploy.
```bash
make deploy
```
By default, all stacks will be deployed, but a particular stack can be deployed by providing the `STACK` argument to the `deploy` target:
```bash
make deploy STACK=LisaServe
```
Available stacks can be listed by running:
```bash
make listStacks
```
After the `deploy` command is run, you should see many Docker build outputs and eventually a CDK progress bar. The deployment should take about 10-15 minutes and will produce a single CloudFormation output for the WebSocket URL.
You can test the deployment with the integration test:
```bash
pytest lisa-sdk/tests --url <rest-url-from-cdk-output> --verify <path-to-server.crt> | false
```

Here `<path-to-server.crt> | false` denotes a choice for the `--verify` flag: pass the path to your self-signed certificate, or `false` to skip certificate verification.