
Runner

runner

Runner dataclass

Runner(endpoint=None, output_path=None, tokenizer=None, clients=1, n_requests=None, run_duration=None, payload=None, run_name=None, run_description=None, timeout=60, callbacks=None, low_memory=False, progress_bar_stats=None, disable_per_client_progress_bar=True, disable_clients_progress_bar=True)

Bases: _RunConfig

Run (one or more) LLM test sets using a base configuration.

First create a Runner with the base configuration for your test(s), then call .run() with optional run-specific overrides. This pattern lets you group related runs together when organizing experiments (such as ramping load tests) that span more than one Run.

All attributes of this class may be left unset (you may choose to set them only at the Run level), but some are mandatory at either the Runner or the individual-run level, as described below.
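
For example, a minimal sketch of this pattern (assuming Runner is importable from llmeter.runner per the source path below; the endpoint object and payload shown here are placeholders)::

    from llmeter.runner import Runner

    # Base configuration shared by every run in the experiment
    runner = Runner(
        endpoint=my_endpoint,          # any configured LLMeter Endpoint
        payload={"prompt": "Hello!"},  # or a list of payloads, or a JSON/JSONL path
        output_path="results/",
        clients=2,
    )

    # Run-specific settings are passed as overrides to .run()
    result = await runner.run(run_name="smoke-test", n_requests=10)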

Attributes:

Name Type Description
endpoint Endpoint | dict | None

The LLM endpoint to be tested. Must be set at either the Runner or specific Run level.

output_path PathLike | str | None

The (cloud or local) base folder under which run outputs and configurations should be stored. By default, outputs will not be saved to file.

tokenizer Tokenizer | Any | None

Optional tokenizer used to estimate input and output token counts for endpoints that don't report exact information. By default, LLMeter's DummyTokenizer will be used if needed.

clients int

The number of concurrent clients to use for sending requests. Defaults to 1.

n_requests int | None

The number of LLM invocations to generate per client. By default, each request in payload will be sent once by each client. Mutually exclusive with run_duration.

run_duration int | float | timedelta | None

Run each client for this many seconds instead of a fixed request count. Clients send requests continuously until the duration expires. Mutually exclusive with n_requests. Defaults to None (count-bound mode). Accepts a number of seconds or a timedelta (see the sketch after this attribute list).

payload dict | list[dict] | PathLike | str | None

The request data to send to the endpoint under test. You can provide a single JSON payload (dict), a list of payloads (list[dict]), or a path to one or more JSON/JSON-Lines files to be loaded by llmeter.prompt_utils.load_payloads(). Must be set at either the Runner or specific Run level.

run_name str | None

Name to use for a specific test Run. This is ignored if set at the Runner level, and should instead be set in Runner.run() to name a specific run. By default, runs are named with the date and time they're requested in format: %Y%m%d-%H%M

run_description str | None

A natural-language description for the test Run. Can be set either at the Runner level (in which case the same description will be shared across all Runs), or individually in Runner.run().

timeout int | float

The maximum time (in seconds) to wait for each response from the endpoint. Defaults to 60 seconds.

callbacks list[Callback] | None

Optional callbacks to enable during the test Run. See llmeter.callbacks for more information.

low_memory bool

When True, responses are written to disk but not kept in memory during the run. Stats are computed incrementally via llmeter.utils.RunningStats. Requires output_path to be set. Use result.load_responses() to load responses from disk after the run. Defaults to False.

progress_bar_stats dict | None

Controls which live stats appear on the progress bar. Maps short display labels to canonical stat keys; see llmeter.live_display.DEFAULT_DISPLAY_STATS for the format and defaults. Pass {} to disable live stats entirely. Defaults to None (use built-in defaults).

disable_per_client_progress_bar bool

Set True to hide per-client progress bars during the run. Per the signature above, this defaults to True, so per-client bars are hidden unless explicitly set to False.

disable_clients_progress_bar bool

Set True to hide the overall progress bar during the run. Per the signature above, this defaults to True, so the overall requests bar is hidden unless explicitly set to False.
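
To illustrate the mutually exclusive count-bound and duration-bound modes, and the file-based payload option, a short sketch (the endpoint and file name are placeholders)::

    from datetime import timedelta

    # Count-bound: 20 invocations per client, payloads loaded from a JSON-Lines file
    count_runner = Runner(endpoint=my_endpoint, payload="prompts.jsonl", n_requests=20)

    # Duration-bound: clients keep sending until two minutes have elapsed
    duration_runner = Runner(
        endpoint=my_endpoint,
        payload="prompts.jsonl",
        run_duration=timedelta(minutes=2),  # or simply 120 (seconds)
    )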

add_callback

add_callback(callback)

Add a callback to the runner's list of callbacks.

Parameters:

Name Type Description Default
callback Callback

The callback to be added.

required
Source code in llmeter/runner.py
def add_callback(self, callback: Callback):
    """
    Add a callback to the runner's list of callbacks.

    Args:
        callback (Callback): The callback to be added.
    """
    if self.callbacks is None:
        self.callbacks = [callback]
    else:
        self.callbacks.append(callback)
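
A brief usage sketch (the callback instances are illustrative; any Callback from llmeter.callbacks works)::

    runner = Runner(endpoint=my_endpoint, payload={"prompt": "Hi"})
    runner.add_callback(my_latency_callback)  # creates the list if callbacks was None
    runner.add_callback(my_cost_callback)     # otherwise appends to the existing list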

run async

run(*, endpoint=None, output_path=None, tokenizer=None, clients=None, n_requests=None, run_duration=None, payload=None, run_name=None, run_description=None, timeout=None, callbacks=None, low_memory=None, progress_bar_stats=None, disable_per_client_progress_bar=None, disable_clients_progress_bar=None)

Run a test against an LLM endpoint

This method tests the performance of the endpoint by sending multiple concurrent requests with the given payload(s). It measures the total time taken to complete the test, generates invocations for the given payload(s), and optionally saves the results and metrics.

For arguments that are not specified, the Runner's attributes will be used by default.

Parameters:

Name Type Description Default
endpoint Endpoint | dict | None

The LLM endpoint to be tested. Must be set at either the Runner or specific Run level.

None
output_path WritablePathLike | None

The (cloud or local) base folder under which run outputs and configurations should be stored. By default, a new run_name sub-folder will be created under the Runner's output_path if set - otherwise outputs will not be saved to file.

None
tokenizer Tokenizer | Any | None

Optional tokenizer used to estimate input and output token counts for endpoints that don't report exact information.

None
clients int

The number of concurrent clients to use for sending requests.

None
n_requests int | None

The number of LLM invocations to generate per client. Mutually exclusive with run_duration.

None
run_duration int | float | timedelta | None

Run each client for this many seconds instead of a fixed request count. Clients send requests continuously until the duration expires. Mutually exclusive with n_requests. Accepts a number of seconds or a timedelta.

Example::

# Run for 60 seconds with 5 concurrent clients:
result = await runner.run(run_duration=60, clients=5)
result.total_requests  # actual count completed
None
payload dict | list[dict] | ReadablePathLike | None

The request data to send to the endpoint under test. You can provide a single JSON payload (dict), a list of payloads (list[dict]), or a path to one or more JSON/JSON-Lines files to be loaded by llmeter.prompt_utils.load_payloads(). Must be set at either the Runner or specific Run level.

None
run_name str | None

Name to use for a specific test Run. By default, runs are named with the date and time they're requested in format: %Y%m%d-%H%M

None
run_description str | None

A natural-language description for the test Run.

None
timeout int | float

The maximum time (in seconds) to wait for each response from the endpoint.

None
callbacks list[Callback] | None

Optional callbacks to enable during the test Run. See llmeter.callbacks for more information.

None
low_memory bool

When True, responses are written to disk but not kept in memory. Stats are computed incrementally via llmeter.utils.RunningStats. Requires output_path. Use result.load_responses() to access responses after the run.

Example::

result = await runner.run(
    output_path="/tmp/my_run",
    low_memory=True,
)
result.stats          # works (computed incrementally)
result.responses      # [] (empty)
result.load_responses()  # loads from disk
None
progress_bar_stats dict

Controls which live stats appear on the progress bar. Maps short display labels to canonical stat keys; see llmeter.live_display.DEFAULT_DISPLAY_STATS for the format and defaults. Pass {} to disable live stats entirely.

Example::

# Show only p99 latency and tokens per second:
result = await runner.run(
    progress_bar_stats={
        "p99_ttlt": "time_to_last_token-p99",
        "tps": ("time_per_output_token-p50", "inv"),
        "fail": "failed_requests",
    },
)
None
disable_per_client_progress_bar bool

Set True to hide per-client progress bars during the run.

None
disable_clients_progress_bar bool

Set True to hide the overall progress bar during the run.

None

Returns:

Name Type Description
Result Result

An object containing the test results, including the generated response texts, total test time, total requests, number of clients, number of requests per client, and other relevant metrics.

Raises:

Type Description
Exception

If there's an error during the test execution or if the endpoint cannot be reached.

Note
  • This method uses asyncio for concurrent processing.
  • Progress is displayed using tqdm if not disabled.
  • Responses are collected and processed asynchronously.
  • If an output_path is provided, results are saved to files.
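
As a grouped-experiment sketch (a ramping load test, as mentioned in the class overview; the loop values are illustrative)::

    # Reuse one Runner's base configuration across several Runs
    for n_clients in (1, 2, 4, 8):
        result = await runner.run(
            clients=n_clients,
            run_duration=60,
            run_name=f"ramp-{n_clients}-clients",
        )
        print(n_clients, result.total_requests)
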
Source code in llmeter/runner.py
async def run(
    self,
    *,  # Prevent mistakes with this long arg list by allowing only keyword-arg based passing
    # Explicitly name and re-document the args for ease of use of this important public method
    endpoint: Endpoint | dict | None = None,
    output_path: WritablePathLike | None = None,
    tokenizer: Tokenizer | Any | None = None,
    clients: int | None = None,
    n_requests: int | None = None,
    run_duration: int | float | timedelta | None = None,
    payload: dict | list[dict] | ReadablePathLike | None = None,
    run_name: str | None = None,
    run_description: str | None = None,
    timeout: int | float | None = None,
    callbacks: list[Callback] | None = None,
    low_memory: bool | None = None,
    progress_bar_stats: dict[str, str | tuple[str, str]] | None = None,
    disable_per_client_progress_bar: bool | None = None,
    disable_clients_progress_bar: bool | None = None,
) -> Result:
    """
    Run a test against an LLM endpoint

    This method tests the performance of the endpoint by sending multiple concurrent requests
    with the given payload(s). It measures the total time taken to complete the test, generates
    invocations for the given payload(s), and optionally saves the results and metrics.

    For arguments that are not specified, the Runner's attributes will be used by default.

    Args:
        endpoint (Endpoint | dict | None): The LLM endpoint to be tested. **Must be set** at
            either the Runner or specific Run level.
        output_path (WritablePathLike | None): The (cloud or local) base folder under which
            run outputs and configurations should be stored. By default, a new `run_name`
            sub-folder will be created under the Runner's `output_path` if set - otherwise
            outputs will not be saved to file.
        tokenizer (Tokenizer | Any | None): Optional tokenizer used to estimate input and
            output token counts for endpoints that don't report exact information.
        clients (int): The number of concurrent clients to use for sending requests.
        n_requests (int | None): The number of LLM invocations to generate *per client*.
            Mutually exclusive with ``run_duration``.
        run_duration (int | float | timedelta | None): Run each client for this many seconds
            instead of a fixed request count.  Clients send requests continuously
            until the duration expires.  Mutually exclusive with ``n_requests``.
            Accepts a number of seconds or a ``timedelta``.

            Example::

                # Run for 60 seconds with 5 concurrent clients:
                result = await runner.run(run_duration=60, clients=5)
                result.total_requests  # actual count completed

        payload (dict | list[dict] | ReadablePathLike | None): The request data to send to the
            endpoint under test. You can provide a single JSON payload (dict), a list of
            payloads (list[dict]), or a path to one or more JSON/JSON-Lines files to be loaded
            by `llmeter.prompt_utils.load_payloads()`. **Must be set** at either the Runner or
            specific Run level.
        run_name (str | None): Name to use for a specific test Run. By default, runs are named
            with the date and time they're requested in format: `%Y%m%d-%H%M`
        run_description (str | None): A natural-language description for the test Run.
        timeout (int | float): The maximum time (in seconds) to wait for each response from the
            endpoint.
        callbacks (list[Callback] | None): Optional callbacks to enable during the test Run. See
            `llmeter.callbacks` for more information.
        low_memory (bool): When ``True``, responses are written to disk but not
            kept in memory.  Stats are computed incrementally via
            :class:`~llmeter.utils.RunningStats`.  Requires ``output_path``.
            Use ``result.load_responses()`` to access responses after the run.

            Example::

                result = await runner.run(
                    output_path="/tmp/my_run",
                    low_memory=True,
                )
                result.stats          # works (computed incrementally)
                result.responses      # [] (empty)
                result.load_responses()  # loads from disk

        progress_bar_stats (dict): Controls which live stats appear on the
            progress bar.  Maps short display labels to canonical stat keys — see
            :data:`~llmeter.live_display.DEFAULT_DISPLAY_STATS` for the format and
            defaults.  Pass ``{}`` to disable live stats entirely.

            Example::

                # Show only p99 latency and tokens per second:
                result = await runner.run(
                    progress_bar_stats={
                        "p99_ttlt": "time_to_last_token-p99",
                        "tps": ("time_per_output_token-p50", "inv"),
                        "fail": "failed_requests",
                    },
                )
        disable_per_client_progress_bar (bool): Set `True` to disable per-client progress bars
            from showing during the run.
        disable_clients_progress_bar (bool): Set `True` to disable overall progress bar from
            showing during the run.

    Returns:
        Result: An object containing the test results, including the generated
        response texts, total test time, total requests, number of clients,
        number of requests per client, and other relevant metrics.

    Raises:
        Exception: If there's an error during the test execution or if the
            endpoint cannot be reached.

    Note:
        - This method uses asyncio for concurrent processing.
        - Progress is displayed using tqdm if not disabled.
        - Responses are collected and processed asynchronously.
        - If an output_path is provided, results are saved to files.
    """

    run = self._prepare_run(
        endpoint=endpoint,
        output_path=output_path,
        tokenizer=tokenizer,
        clients=clients,
        n_requests=n_requests,
        run_duration=run_duration,
        payload=payload,
        run_name=run_name,
        run_description=run_description,
        timeout=timeout,
        callbacks=callbacks,
        low_memory=low_memory,
        progress_bar_stats=progress_bar_stats,
        disable_per_client_progress_bar=disable_per_client_progress_bar,
        disable_clients_progress_bar=disable_clients_progress_bar,
    )
    assert isinstance(run.payload, list)
    assert isinstance(run.run_name, str)
    return await run._run()

process_before_invoke_callbacks async

process_before_invoke_callbacks(callbacks, payload)

Process the before_run callbacks for a Run.

This method is expected to be called exactly once after the _Run object is created. Attempting to re-use a _Run object may result in undefined behavior.

Parameters:

Name Type Description Default
callbacks list[Callback]

The list of callbacks to process.

required
payload dict

The request payload to pass to each callback's before_invoke hook (a deep copy is made first).

required
Source code in llmeter/runner.py
async def process_before_invoke_callbacks(
    callbacks: list[Callback] | None, payload: dict
) -> dict:
    """
    Process the `before_run` callbacks for a Run.

    This method is expected to be called *exactly once* after the _Run object is created.
    Attempting to re-use a _Run object may result in undefined behavior.

    Args:
        callbacks (list[Callback]): The list of callbacks to process.
    """
    if callbacks is not None:
        p = deepcopy(payload)

        [await cb.before_invoke(p) for cb in callbacks]
        return p
    return payload
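
As a rough illustration of how a before_invoke callback can shape the payload (a hypothetical Callback subclass; check the actual base-class interface in llmeter.callbacks)::

    class AddSystemPrompt(Callback):
        # Hypothetical callback that injects a field into each request payload.
        async def before_invoke(self, payload: dict) -> None:
            # process_before_invoke_callbacks deep-copies the payload before this hook
            # runs, so the mutation shapes the request that is actually sent without
            # touching the caller's original dict.
            payload["system"] = "You are a concise assistant."

    prepared = await process_before_invoke_callbacks([AddSystemPrompt()], {"prompt": "Hi"})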