# Model Compatibility

## HuggingFace Generation Models

For generation models (causal language models), LISA supports the models supported by the underlying serving container, TGI. TGI divides compatibility into two categories: optimized models and best-effort supported models. The list of optimized models is available in the TGI documentation. Best-effort support uses the transformers codebase under the hood, so most causal models on HuggingFace should work via:

```python
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")
```

or

```python
from transformers import AutoModelForSeq2SeqLM

AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
```
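
As a quick compatibility check, you can verify that a candidate model loads and generates with the transformers APIs above before configuring it in LISA. A minimal sketch; the model ID below is only an example, so substitute the model you intend to host:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model ID; substitute the causal LM you intend to host.
model_id = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Generate a short completion to confirm the model works end to end.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```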

## HuggingFace Embedding Models

Embedding models often rely on custom codebases and are not as uniform as generation models. For this reason, you will likely need to create a new `inferenceContainer`. Follow the example provided for the instructor model.
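
To illustrate why a custom container is often needed: the instructor family, for example, expects instruction-paired inputs rather than plain strings, so its encode call differs from a generic embedding model's. A minimal sketch, assuming the InstructorEmbedding package; the instruction and sentence text are placeholders:

```python
from InstructorEmbedding import INSTRUCTOR

# Instructor models pair each sentence with a task-specific instruction,
# which is why a generic embedding container cannot serve them as-is.
model = INSTRUCTOR("hkunlp/instructor-large")
embeddings = model.encode([
    ["Represent the document for retrieval:", "LISA hosts models on AWS."],
])
print(embeddings.shape)
```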

## vLLM Models

In addition to the TGI and TEI containers, LISA supports hosting models with the vLLM container. vLLM follows the OpenAI specification and so allows both text generation and embedding for the models it supports. See the deployment section for details on how to set up the vLLM container for your models. Just as the HuggingFace containers serve safetensor weights downloaded from the HuggingFace website, vLLM does the same, and LISA's configuration allows you to serve these artifacts automatically. vLLM currently supports few embedding models, but as more become available, LISA will support them as long as the vLLM container version in the config.yaml file is updated and the model's safetensors can be found in S3.
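
Because vLLM exposes an OpenAI-compatible API, any OpenAI client can talk to a vLLM-backed LISA model. A minimal sketch using the openai Python package; the base URL, API key, and model names are placeholders for your own deployment:

```python
from openai import OpenAI

# Point the client at your LISA/vLLM endpoint; values here are placeholders.
client = OpenAI(
    base_url="https://<your-lisa-endpoint>/v1",
    api_key="<your-api-key>",
)

# Text generation against a vLLM-hosted model.
completion = client.chat.completions.create(
    model="<your-generation-model>",
    messages=[{"role": "user", "content": "Summarize what LISA does."}],
)
print(completion.choices[0].message.content)

# Embeddings, for models that vLLM supports for embedding.
response = client.embeddings.create(
    model="<your-embedding-model>",
    input=["LISA hosts models on AWS."],
)
print(len(response.data[0].embedding))
```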