User Guide
LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs) and applications that use them.
It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate into notebooks, CI/CD pipelines, and other workflows.
Key features
✅ Measure a wide range of LLMs and agents, including cloud-provider and self-hosted models
✅ Quantify how prompt length, output length, and concurrent request count affect latency, using pre-built high-level experiments
✅ Simple, modular runner and result APIs for defining your own experiments and custom analyses
✅ Lightweight and straightforward to install in a range of environments
✅ Extend with callbacks for cost modeling, MLflow experiment tracking, and custom logic
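To make the "latency under concurrency" idea above concrete, here is a minimal, stdlib-only sketch of the kind of measurement LLMeter automates. It does not use LLMeter's own API: `fake_llm` is a hypothetical stub standing in for a real endpoint, and the harness simply records per-request latency and overall throughput at different concurrency levels.

```python
import asyncio
import statistics
import time

async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM endpoint: responds after a fixed delay.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

async def measure(concurrency: int, requests: int) -> dict:
    # Fire `requests` total calls, at most `concurrency` in flight at once,
    # and record each request's latency in milliseconds.
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def one_call(i: int) -> None:
        async with sem:
            start = time.perf_counter()
            await fake_llm(f"prompt {i}")
            latencies.append((time.perf_counter() - start) * 1000)

    t0 = time.perf_counter()
    await asyncio.gather(*(one_call(i) for i in range(requests)))
    elapsed = time.perf_counter() - t0
    return {
        "concurrency": concurrency,
        "p50_ms": statistics.median(latencies),
        "throughput_rps": requests / elapsed,
    }

# Compare sequential (concurrency=1) against 4 concurrent requests.
results = [asyncio.run(measure(c, 20)) for c in (1, 4)]
for r in results:
    print(r)
```

With a fixed per-request delay, throughput scales with concurrency while per-request latency stays flat; against a real endpoint, latency typically degrades as concurrency rises, which is exactly the trade-off LLMeter's experiments are built to quantify.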
- 🚀 Getting started: Install and try out LLMeter
- 🎯 Built-in endpoint types: Connect to local or cloud LLMs
- ✏️ Running experiments: Start running tests and analyzing the results
- 🔌 Callbacks: Extend LLMeter with cost modeling, MLflow tracking, and custom hooks
- Contribute: Review the contributing guidelines to get started!