Experiments
experiments
Higher-level experiments (generally combining multiple Runs)
This module provides utilities to run more complex "experiments" that go beyond the scope of a single Run.
LatencyHeatmap
dataclass
LatencyHeatmap(endpoint, source_file, clients=4, output_path=None, input_lengths=(lambda: [10, 50, 200, 500])(), output_lengths=(lambda: [128, 256, 512, 1024])(), requests_per_combination=1, create_payload_fn=None, create_payload_kwargs=dict(), tokenizer=None)
Experiment to measure how latency varies by input and output token count
This experiment uses a source text file to generate input prompts/payloads of different lengths, and measures how response time varies with both input (prompt) length and output (response) length.
Attributes:
| Name | Type | Description |
|---|---|---|
| endpoint | Endpoint | The LLM endpoint to test. |
| source_file | UPath \| str | The source file from which prompts of different lengths will be sampled (see input_lengths). |
| clients | int | The number of concurrent clients (requests) to use for the experiment. Note that using a high number of concurrent clients could impact observed latency. |
| output_path | UPath \| str \| None | The (local or Cloud) path where results will be saved, if provided. |
| input_lengths | Sequence[int] | The approximate input/prompt lengths to test. Since the locally-available tokenizer may differ from the one used by the endpoint, actual prompt token counts are approximate. |
| output_lengths | Sequence[int] | The target output lengths to test. Since generation may stop early for certain prompts, and some endpoints may not report exact token counts in their responses, the results may not correspond exactly to these targets. |
| requests_per_combination | int | The number of requests to make for each combination of input and output lengths. |
| create_payload_fn | Callable \| None | A function to create the actual endpoint payload for each invocation, from the sampled text prompt. Typically, you'll want to specify a prefix for your prompt in either this or create_payload_kwargs. |
| create_payload_kwargs | Dict | Keyword arguments to pass to create_payload_fn. |
| tokenizer | Tokenizer \| None | A tokenizer to be used for sampling prompts of the specified lengths, and also estimating the generated output lengths if necessary for your endpoint. If not set, a default tokenizer is used. |
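A minimal usage sketch follows. It is illustrative only: the awaitable run() call is assumed by analogy with LoadTest, and my_endpoint, the source file path, and the create_payload helper are placeholders rather than confirmed API.
Example (illustrative)::
# Sample prompts of roughly 50-500 tokens from a local text file and
# measure latency for each input/output length combination.
def create_payload(prompt, **kwargs):
    # Hypothetical helper: wrap the sampled text in whatever request
    # format your endpoint expects.
    return {"prompt": prompt, **kwargs}

heatmap = LatencyHeatmap(
    endpoint=my_endpoint,
    source_file="corpus.txt",
    clients=4,
    input_lengths=[50, 200, 500],
    output_lengths=[128, 512],
    requests_per_combination=2,
    create_payload_fn=create_payload,
    output_path="outputs/latency_heatmap",
)
result = await heatmap.run()  # assumed to be async, like LoadTest.run()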
LoadTest
dataclass
LoadTest(endpoint, payload, sequence_of_clients, min_requests_per_client=1, min_requests_per_run=10, run_duration=None, low_memory=False, progress_bar_stats=None, output_path=None, tokenizer=None, test_name=None, callbacks=None)
Experiment to explore how performance changes at different concurrency levels.
This experiment creates a series of Runs with different levels of concurrency, defined by
sequence_of_clients, and runs them one after the other.
By default, each run sends a fixed number of requests (count-bound). Set run_duration
to run each concurrency level for a fixed number of seconds instead (time-bound), which
gives a more realistic picture of sustained throughput.
Attributes:
| Name | Type | Description |
|---|---|---|
| endpoint | Endpoint | The LLM endpoint to test. |
| payload | dict \| list[dict] | The request payload(s) to send. |
| sequence_of_clients | list[int] | Concurrency levels to test. |
| min_requests_per_client | int | Minimum requests per client in count-bound mode. |
| min_requests_per_run | int | Minimum total requests per run in count-bound mode. |
| run_duration | int \| float \| None | When set, each concurrency level runs for this many seconds instead of a fixed request count. Mutually exclusive with the count-bound settings (min_requests_per_client, min_requests_per_run). |
| low_memory | bool | When True, runs in low-memory mode, reducing memory usage for large-scale tests. |
| progress_bar_stats | dict \| None | Controls which live stats appear on the progress bar. |
| output_path | PathLike \| str \| None | Where to save results. |
| tokenizer | Tokenizer \| None | Optional tokenizer for token counting. |
| test_name | str \| None | Name for this test. Defaults to current date/time. |
| callbacks | list[Callback] \| None | Optional callbacks. |
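The payload format depends on the target endpoint. As a rough illustration only (the exact keys your Endpoint expects may differ), the sample_payload used in the examples below might look like:
# Hypothetical payload for a chat-style endpoint; adjust the keys to
# match what your Endpoint implementation expects.
sample_payload = {
    "messages": [{"role": "user", "content": "Summarize this document."}],
    "max_tokens": 256,
}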
Example::
# Count-bound: 10 requests per client at each concurrency level
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20],
    min_requests_per_client=10,
    output_path="outputs/load_test",
)
result = await load_test.run()
result.plot_results()

# Time-bound: 60 seconds per concurrency level
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20],
    run_duration=60,
    output_path="outputs/load_test",
)
result = await load_test.run()

# Time-bound with low-memory mode for large-scale tests
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10, 20, 50],
    run_duration=120,
    low_memory=True,
    output_path="outputs/large_load_test",
)
result = await load_test.run()
run
async
run(output_path=None)
Run the load test across all configured concurrency levels.
Creates a Runner (llmeter.runner.Runner) and iterates through
sequence_of_clients, running one test per concurrency level. In
time-bound mode (run_duration is set), each level runs for a fixed
duration. In count-bound mode, each level sends a fixed number of
requests per client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | | Optional (local or remote) folder to save results. If provided, individual run results are saved in subdirectories under it. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| LoadTestResult | LoadTestResult | A result object containing one Result per tested client count. |
Example::
load_test = LoadTest(
    endpoint=my_endpoint,
    payload=sample_payload,
    sequence_of_clients=[1, 5, 10],
    run_duration=30,
)
result = await load_test.run(output_path="outputs/my_test")

# Access individual results by client count
result.results[5].stats["requests_per_minute"]

# Plot all standard charts
result.plot_results()
Source code in llmeter/experiments.py
LoadTestResult
dataclass
LoadTestResult(results, test_name, output_path=None)
load
classmethod
load(load_path, test_name=None, load_responses=True)
Load test results from a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| load_path | ReadablePathLike \| None | Directory path containing the load test results subdirectories. | required |
| test_name | str \| None | Optional name for the test. If not provided, will use the directory name. | None |
| load_responses | bool | Whether to load individual invocation responses. Defaults to True. When False, only summaries and pre-computed stats are loaded. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| LoadTestResult | LoadTestResult | A LoadTestResult object containing the loaded results. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If load_path does not exist or is None/empty. |
| ValueError | If no results are found in the directory. |
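A short sketch of reloading previously saved results is shown below. The folder path is a placeholder; pass whatever directory the original LoadTest wrote its results to.
Example (illustrative)::
# Reload a saved load test without individual responses, which is
# faster and uses less memory than a full load.
result = LoadTestResult.load(
    "outputs/load_test",   # folder written by LoadTest(output_path=...)
    load_responses=False,
)
result.results           # per-concurrency results keyed by client count
result.plot_results()    # same plotting helper as a freshly-run test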
Source code in llmeter/experiments.py