Installation

LLMeter requires Python version 3.10 or higher.

To install the core metering functionality, install the minimal package with pip:

pip install llmeter

Or with uv (recommended for faster installation):

uv pip install llmeter

LLMeter also offers optional extra features that require additional dependencies. Currently these extras include:

  • plotting: Add methods to generate charts and heatmaps to summarize the results
  • openai: Enable testing any endpoint with an OpenAI-compatible API, including Amazon Bedrock Mantle, and LLM gateways like LiteLLM, Kong, and KrakenD
  • litellm: Enable testing a range of different models through LiteLLM
  • mlflow: Enable logging LLMeter experiments to MLflow

You can install one or more of these extra options using pip:

pip install 'llmeter[plotting,openai,litellm,mlflow]'

Or with uv:

uv pip install 'llmeter[plotting,openai,litellm,mlflow]'

To install all optional dependencies at once:

pip install 'llmeter[all]'
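To confirm which optional dependencies ended up in your environment, you can probe for them with the standard library. The module names below are assumptions based on the extras' names (the package backing the plotting extra may differ, so it is omitted here):

```python
from importlib.util import find_spec

# Assumed importable package name for each LLMeter extra
extras = {"openai": "openai", "litellm": "litellm", "mlflow": "mlflow"}

for extra, module in extras.items():
    # find_spec returns None when the module cannot be imported
    status = "installed" if find_spec(module) else "missing"
    print(f"[{extra}] {module}: {status}")
```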

Where to install and use LLMeter

LLMeter measures end-to-end latencies and uses pure-Python (asyncio-based) concurrency to parallelize requests.

✅ Remember that network latency from the environment where you run LLMeter to the LLM under test will be included in results.

You may want to run LLMeter on the Cloud if that's where your application and LLM will be deployed.

✅ Check that your network bandwidth and compute capacity are sufficient to avoid bottlenecking your highest-concurrency tests.

If hosting LLMeter on burstable Cloud instance types (e.g. AWS t3, t4g, etc), be aware that sustained network and compute limits are lower than short-term burst resources.

Because generative AI inference is compute-intensive on the server side, load testing LLMs has not typically demanded very high-powered compute or network resources from the client. We therefore made a deliberate decision to keep LLMeter simple to install and use, avoiding more complex distributed computing frameworks in favour of plain Python + asyncio. If your use-case involves an LLM so fast, or a request volume so large, that this approach becomes limiting - please share your feedback!
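The plain Python + asyncio approach can be sketched roughly as follows. This is a toy illustration, not LLMeter's actual internals: a stand-in coroutine simulates each request, asyncio.gather issues them concurrently on a single thread, and latency is measured end to end per request.

```python
import asyncio
import time

async def fake_llm_request(prompt: str) -> float:
    """Stand-in for an LLM API call: returns end-to-end latency in seconds."""
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # placeholder for network + inference time
    return time.perf_counter() - start

async def run_load_test(n_concurrent: int) -> list[float]:
    # gather runs all coroutines concurrently on one event loop, so total
    # wall time stays close to a single request's latency rather than the sum
    tasks = [fake_llm_request(f"prompt {i}") for i in range(n_concurrent)]
    return await asyncio.gather(*tasks)

latencies = asyncio.run(run_load_test(50))
print(f"p50 latency: {sorted(latencies)[len(latencies) // 2]:.3f}s")
```

Because the sleeps overlap, 50 "requests" complete in roughly the time of one, which is why a single modest client can drive substantial concurrency against an LLM endpoint.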