Connect to Endpoints

LLMeter provides a range of optimized Endpoint connectors for different types of LLM deployment, and supports 100+ more through LiteLLM.

Streaming vs non-streaming endpoints

Generating text responses from LLMs (especially long ones) can take significant time, so many LLM deployment platforms and services offer response streaming, in which the model starts returning chunks of the response as soon as they're generated rather than waiting for the whole response to be complete.

Response streaming can reduce perceived solution latency by allowing consumers to start reading (or processing) the response before generation is complete. However, it isn't always applicable, for example when some other component in the overall architecture doesn't support it.
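The latency difference described above can be illustrated with a small simulation. This is only a sketch: `simulated_model`, `call_non_streaming`, and `call_streaming` are hypothetical helpers that stand in for a real deployment, and none of them are part of LLMeter. The simulated model emits one chunk per fixed delay, so the streaming consumer sees its first chunk long before the non-streaming consumer sees anything.

```python
import time
from typing import Dict, Iterator


def simulated_model(n_chunks: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields one chunk every delay_s seconds."""
    for i in range(n_chunks):
        time.sleep(delay_s)
        yield f"token{i} "


def call_non_streaming(n_chunks: int, delay_s: float) -> Dict[str, object]:
    """Wait for the whole response, as a non-streaming endpoint would."""
    start = time.perf_counter()
    text = "".join(simulated_model(n_chunks, delay_s))
    elapsed = time.perf_counter() - start
    # Nothing is visible to the consumer until generation has finished.
    return {"text": text, "time_to_first_chunk": elapsed, "total_time": elapsed}


def call_streaming(n_chunks: int, delay_s: float) -> Dict[str, object]:
    """Consume chunks as they arrive, as a streaming endpoint would."""
    start = time.perf_counter()
    first_chunk_time = None
    parts = []
    for chunk in simulated_model(n_chunks, delay_s):
        if first_chunk_time is None:
            first_chunk_time = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return {
        "text": "".join(parts),
        "time_to_first_chunk": first_chunk_time,
        "total_time": total,
    }


if __name__ == "__main__":
    blocking = call_non_streaming(n_chunks=10, delay_s=0.01)
    streamed = call_streaming(n_chunks=10, delay_s=0.01)
    print(f"non-streaming first output after {blocking['time_to_first_chunk']:.3f}s")
    print(f"streaming first output after     {streamed['time_to_first_chunk']:.3f}s")
```

Both calls return the same text and take roughly the same total time; only the time to first output differs, which is why streaming improves perceived latency rather than overall generation time.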

Supported endpoint types

The endpoints section of the API reference lists the range of built-in endpoint types currently offered by LLMeter.

If your target isn't already supported by the built-in endpoints, or through the LiteLLM Endpoint and LiteLLM Python SDK, you can also create your own integration by extending the Endpoint class interface.
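The general shape of such an integration might look like the sketch below. To keep the example self-contained, it defines its own stand-in base class: `SketchEndpoint`, `SketchResponse`, and `EchoEndpoint` are hypothetical names, and the constructor arguments and `invoke` signature are assumptions for illustration, not LLMeter's actual interface. Check the Endpoint class in the API reference for the real abstract methods and response type before writing a connector.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchResponse:
    """Hypothetical response record (not LLMeter's response type)."""

    response_text: str
    elapsed_s: float


class SketchEndpoint(ABC):
    """Hypothetical stand-in for an endpoint base class (NOT LLMeter's Endpoint)."""

    def __init__(self, endpoint_name: str, model_id: str) -> None:
        self.endpoint_name = endpoint_name
        self.model_id = model_id

    @abstractmethod
    def invoke(self, payload: dict) -> SketchResponse:
        """Send one request to the target and return its response."""


class EchoEndpoint(SketchEndpoint):
    """Toy connector: 'calls' a model that upper-cases the prompt."""

    def invoke(self, payload: dict) -> SketchResponse:
        start = time.perf_counter()
        # A real connector would call the target service's client here.
        text = str(payload.get("prompt", "")).upper()
        return SketchResponse(
            response_text=text,
            elapsed_s=time.perf_counter() - start,
        )


if __name__ == "__main__":
    endpoint = EchoEndpoint(endpoint_name="echo", model_id="none")
    print(endpoint.invoke({"prompt": "hello"}).response_text)
```

The point of the pattern is that the connector hides the target-specific client call behind one uniform method, so the rest of the test harness doesn't need to know which platform it is measuring.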

Note that Amazon Bedrock supports several different APIs for accessing Foundation Models. Depending on your target API, you can use LLMeter's:

bedrock endpoints for connecting to Bedrock's Converse or ConverseStream APIs
bedrock_invoke endpoints for connecting to Bedrock's InvokeModel or InvokeModelWithResponseStream APIs
openai endpoints for connecting to Bedrock's OpenAI-compatible Mantle APIs
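The mapping above can be written down as a small lookup table. This is an illustrative sketch only: `pick_endpoint_family` and the two constants are hypothetical helpers, not part of LLMeter, and the returned strings are just the endpoint family names used in the list above (the concrete class names are in the API reference).

```python
# Bedrock API name -> LLMeter endpoint family, as listed above.
BEDROCK_API_TO_ENDPOINT_FAMILY = {
    "Converse": "bedrock",
    "ConverseStream": "bedrock",
    "InvokeModel": "bedrock_invoke",
    "InvokeModelWithResponseStream": "bedrock_invoke",
    "Mantle": "openai",  # Bedrock's OpenAI-compatible APIs
}

# The Bedrock APIs above that return the response as a stream of chunks.
STREAMING_BEDROCK_APIS = {"ConverseStream", "InvokeModelWithResponseStream"}


def pick_endpoint_family(bedrock_api: str) -> str:
    """Return the endpoint family name for a given Bedrock API (hypothetical helper)."""
    try:
        return BEDROCK_API_TO_ENDPOINT_FAMILY[bedrock_api]
    except KeyError:
        known = ", ".join(sorted(BEDROCK_API_TO_ENDPOINT_FAMILY))
        raise ValueError(
            f"Unknown Bedrock API {bedrock_api!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for api in BEDROCK_API_TO_ENDPOINT_FAMILY:
        mode = "streaming" if api in STREAMING_BEDROCK_APIS else "see API docs"
        print(f"{api:32s} -> {pick_endpoint_family(api):15s} ({mode})")
```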