User Guide

LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs) and the applications that use them.

It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate - whether in notebooks, CI/CD pipelines, or other workflows.

Key features

✅ Measure a wide range of LLMs and agents - including models from a range of Cloud providers as well as self-hosted models

✅ Quantify how prompt length, output length, and concurrent request count affect latency - with pre-built high-level experiments

✅ Simple, modular runner and result APIs for defining your own experiments and custom analyses

✅ Lightweight and straightforward to install in a range of environments

✅ Extend with callbacks for cost modeling, MLflow experiment tracking, and custom logic


  • 🚀 Getting started


    Install and try out LLMeter

    Installation

  • 🎯 Built-in endpoint types


    Connect to local or Cloud LLMs

    Endpoints

  • ✏️ Running experiments


    Start running tests and analyzing the results

    User Guide

  • 🔌 Callbacks


    Extend LLMeter with cost modeling, MLflow tracking, and custom hooks

    API Reference

  • Contribute


    Review the contributing guidelines to get started!

    GitHub