User Guide
LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs) and applications that use them.
It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate into notebooks, CI/CD pipelines, and other workflows.
Key features
✅ Measure a wide range of LLMs and agents, including cloud-provider and self-hosted models
✅ Quantify how prompt length, output length, and concurrent request count affect latency, using pre-built high-level experiments
✅ Simple, modular runner and result APIs for defining your own experiments and custom analyses
✅ Lightweight and straightforward to install in a range of environments
✅ Extend with callbacks for cost modeling, MLflow experiment tracking, and custom logic
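To make the "latency under concurrency" idea above concrete, here is a minimal, stdlib-only sketch of the kind of measurement LLMeter automates. It does not use LLMeter's own API: `fake_llm` is a hypothetical stub standing in for a real endpoint, and the harness simply records per-request latency and overall throughput at different concurrency levels.

```python
import asyncio
import statistics
import time

async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM endpoint: responds after a fixed delay.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

async def measure(concurrency: int, requests: int) -> dict:
    # Fire `requests` total calls, at most `concurrency` in flight at once,
    # and record each request's latency in milliseconds.
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def one_call(i: int) -> None:
        async with sem:
            start = time.perf_counter()
            await fake_llm(f"prompt {i}")
            latencies.append((time.perf_counter() - start) * 1000)

    t0 = time.perf_counter()
    await asyncio.gather(*(one_call(i) for i in range(requests)))
    elapsed = time.perf_counter() - t0
    return {
        "concurrency": concurrency,
        "p50_ms": statistics.median(latencies),
        "throughput_rps": requests / elapsed,
    }

# Compare sequential (concurrency=1) against 4 concurrent requests.
results = [asyncio.run(measure(c, 20)) for c in (1, 4)]
for r in results:
    print(r)
```

With a fixed per-request delay, throughput scales with concurrency while per-request latency stays flat; against a real endpoint, latency typically degrades as concurrency rises, which is exactly the trade-off LLMeter's experiments are built to quantify.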
- 🚀 Getting started: Install and try out LLMeter
- 🎯 Built-in endpoint types: Connect to local or cloud LLMs
- ✏️ Running experiments: Start running tests and analyzing the results
- 🔌 Callbacks: Extend LLMeter with cost modeling, MLflow tracking, and custom hooks
- Contribute: Review the contributing guidelines to get started!