Connect to Endpoints
LLMeter provides a range of optimized Endpoint connectors for different types of LLM deployment, and supports 100+ more through LiteLLM.
Streaming vs non-streaming endpoints
Since generating text responses from LLMs (especially long responses) can take significant time, many LLM deployment platforms and services offer response streaming: in which the model starts returning chunks of the response as soon as they're generated - rather than waiting for the whole thing before responding.
Response streaming can reduce perceived solution latency by allowing consumers to start reading (or processing) the response before generation is completed... But not always applicable, if some other component in the overall architecture doesn't support it.
Supported endpoint types
The endpoints section of the API reference lists the range of built-in endpoint types currently offered by LLMeter.
You can also create your own integrations by extending the Endpoint class interface, if your target isn't already supported by the built-in endpoints or through the LiteLLM Endpoint and LiteLLM Python SDK.
Custom endpoint example
To create a custom endpoint, implement three methods:
invoke(payload)— call the API and pass the raw response toparse_response()parse_response(raw_response, start_t)— extract text, token counts, and metadata into anInvocationResponseprepare_payload(payload, **kwargs)(optional) — transform the caller's payload before invocation (merge kwargs, inject model ID, etc.)
The @llmeter_invoke decorator wraps invoke with error handling, timing, and metadata back-fill, so your implementation only needs the happy path:
from llmeter.endpoints.base import Endpoint, InvocationResponse, llmeter_invoke
class MyEndpoint(Endpoint):
def __init__(self, model_id: str, api_key: str):
super().__init__(
endpoint_name="my-service",
model_id=model_id,
provider="my-provider",
)
self._api_key = api_key
@llmeter_invoke
def invoke(self, payload: dict) -> InvocationResponse:
start_t = time.perf_counter()
raw_response = my_api_call(payload, api_key=self._api_key)
return self.parse_response(raw_response, start_t)
def parse_response(self, raw_response, start_t: float) -> InvocationResponse:
return InvocationResponse(
response_text=raw_response["text"],
num_tokens_input=raw_response.get("input_tokens"),
num_tokens_output=raw_response.get("output_tokens"),
)
def prepare_payload(self, payload, **kwargs):
return {**payload, **kwargs, "model": self.model_id}
@staticmethod
def create_payload(user_message: str, max_tokens: int = 256) -> dict:
return {"prompt": user_message, "max_tokens": max_tokens}
You don't need to handle errors, set time_to_last_token, input_payload, input_prompt, or id — the decorator does all of that. If your invoke or parse_response raises an exception, it's caught and converted to an error InvocationResponse with the prepared payload attached.
Note that Amazon Bedrock supports several different APIs for accessing Foundation Models. Depending on your target API, you can use LLMeter's:
bedrockendpoints for connecting to Bedrock's Converse or ConverseStream APIsbedrock_invokeendpoints for connecting to Bedrock's InvokeModel or InvokeModelWithResponseStream APIsopenaiendpoints for connecting to OpenAI Chat Completions-compatible APIs, including Chat Completions API endpoints on Amazon Bedrock Mantleopenai_responseendpoints for connecting to OpenAI Responses-compatible APIs, including Responses API endpoints on Amazon Bedrock Mantle. Responses is a newer OpenAI API, with additional features including structured outputs, instructions-based system prompts, and improved multi-turn conversation handling
Multi-modal content
Yes, LLMeter supports sending requests with multi-modal content - for profiling use-cases like analysis of documents, images, audio and video, as well as text.
Generating request payloads
Each Endpoint class offers a create_payload() function to help you build test request payloads, in case you're not sure of the underlying JSON format for your target LLM/API.
For help building multi-modal requests, we recommend to install the optional multimodal extra:
pip install 'llmeter[multimodal]'
Or with uv:
uv pip install 'llmeter[multimodal]'
This installs the puremagic library for detecting media file formats from magic bytes in their content. Without it, format detection falls back to file extensions and the .from_bytes(...) functions mentioned below won't work.
The particular API of create_payload() may vary between providers, but should be generally similar. Here are some examples with BedrockConverse.create_payload:
from llmeter.endpoints import BedrockConverse
from llmeter.prompt_utils import DocumentContent, ImageContent, VideoContent
# Single image from file
payload = BedrockConverse.create_payload(
user_message=[
"What is in this image?",
ImageContent.from_path("photo.jpg"),
],
max_tokens=256
)
# Multiple images
payload = BedrockConverse.create_payload(
user_message=[
"Compare these images:",
ImageContent.from_path("image1.jpg"),
ImageContent.from_path("image2.jpg"),
],
max_tokens=512
)
# Image from bytes (requires puremagic for format detection)
with open("photo.jpg", "rb") as f:
image_bytes = f.read()
payload = BedrockConverse.create_payload(
user_message=[
"What is in this image?",
ImageContent.from_bytes(image_bytes),
],
max_tokens=256
)
# Mixed content types
payload = BedrockConverse.create_payload(
user_message=[
"Analyze this presentation and supporting materials",
DocumentContent.from_path("slides.pdf"),
ImageContent.from_path("chart.png"),
],
max_tokens=1024
)
# Video analysis
payload = BedrockConverse.create_payload(
user_message=[
"Describe what happens in this video",
VideoContent.from_path("clip.mp4"),
],
max_tokens=1024
)
Actual media format support varies by target model and API provider. LLMeter's payload building helpers attempt to automatically detect content of the following types, and represent them according to the requirements of your target Endpoint: - Images: JPEG, PNG, GIF, WebP - Documents: PDF - Videos: MP4, MOV, AVI - Audio: MP3, WAV, OGG
⚠️ Security warning: Format detection is NOT input validation
IMPORTANT: The content-based format detection in this library is for testing and development convenience only and should NOT be used for with untrusted files without proper validation: Including for example files uploaded by end users; files from untrusted URLs, email attachments, or third-party systems.
LLMeter's multimodal prompt content utilities DO:
- Detect likely file format from magic bytes (puremagic) or extension (mimetypes)
- Read binary content from files
- Package content for API endpoints
- Provide type checking (bytes vs strings)
They DO NOT:
- ❌ Validate file content safety or integrity
- ❌ Scan for malicious content or malware
- ❌ Sanitize or clean file data
- ❌ Protect against malformed or exploited files
- ❌ Guarantee format correctness beyond detection heuristics
- ❌ Validate file size or prevent memory exhaustion
- ❌ Check for embedded scripts or exploits
- ❌ Verify file authenticity or source
Recommended security practices for untrusted files
When working with untrusted files (user uploads, external sources, etc.), you MUST implement proper security measures:
- Validate file sources: Only accept files from trusted, authenticated sources
- Scan for malware: Use antivirus/malware scanning (e.g., ClamAV) before processing
- Validate file integrity: Verify checksums, digital signatures, or other integrity mechanisms
- Sanitize content: Use specialized libraries to validate and sanitize file content:
- Images: Re-encode with PIL/Pillow to strip metadata and validate structure
- PDFs: Use PDF sanitization libraries to remove scripts and validate structure
- Videos: Re-encode with ffmpeg to validate and sanitize
- Limit file sizes: Enforce maximum file size limits before reading into memory
- Sandbox processing: Process untrusted files in isolated environments (containers, VMs)
- Validate API responses: Check that API endpoints successfully processed the content
- Implement rate limiting: Prevent abuse through excessive file uploads
- Log and monitor: Track file processing for security auditing