Runner
runner
Runner
dataclass
Runner(endpoint=None, output_path=None, tokenizer=None, clients=1, n_requests=None, run_duration=None, payload=None, run_name=None, run_description=None, timeout=60, callbacks=None, low_memory=False, progress_bar_stats=None, disable_per_client_progress_bar=True, disable_clients_progress_bar=True)
Bases: _RunConfig
Run (one or more) LLM test sets using a base configuration.
First create a Runner with base configuration for your test(s), then call .run() with
optional run-specific overrides. This pattern allows you to group related runs together for
organizing experiments (like ramping load tests) that might use more than one Run in total.
All attributes of this class may be unset (as you may choose to set them only at the Run level), but some are "Mandatory" to be set either at the Runner or individual-run level, as described below.
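The base-configuration-plus-overrides pattern described above can be sketched with a plain dataclass. This is a minimal illustration, not LLMeter's implementation: `BaseConfig` and `resolve_run_config` are hypothetical names, and only a handful of the attributes are modeled.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Optional


@dataclass
class BaseConfig:
    """Hypothetical stand-in for Runner-level settings (not LLMeter's class)."""

    endpoint: Optional[Any] = None
    clients: int = 1
    n_requests: Optional[int] = None
    timeout: float = 60


def resolve_run_config(base: BaseConfig, **overrides: Any) -> BaseConfig:
    """Return a per-run copy of `base`; overrides left as None inherit the base value."""
    valid = {f.name for f in fields(base)}
    unknown = set(overrides) - valid
    if unknown:
        raise TypeError(f"Unknown settings: {sorted(unknown)}")
    explicit = {k: v for k, v in overrides.items() if v is not None}
    run_config = replace(base, **explicit)  # `base` itself is left untouched
    # "Mandatory" settings must be present at one of the two levels.
    if run_config.endpoint is None:
        raise ValueError("endpoint must be set on the base config or on the run")
    return run_config


base = BaseConfig(endpoint="my-endpoint", n_requests=10)
# A ramping load test: same base settings, increasing concurrency per run.
ramp = [resolve_run_config(base, clients=c) for c in (1, 2, 4)]
print([(r.clients, r.n_requests) for r in ramp])
```

Each run gets its own resolved copy, so later runs are not affected by earlier overrides.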
Attributes:
| Name | Type | Description |
|---|---|---|
| `endpoint` | `Endpoint \| dict \| None` | The LLM endpoint to be tested. Must be set at either the Runner or specific Run level. |
| `output_path` | `PathLike \| str \| None` | The (cloud or local) base folder under which run outputs and configurations should be stored. By default, outputs will not be saved to file. |
| `tokenizer` | `Tokenizer \| Any \| None` | Optional tokenizer used to estimate input and output token counts for endpoints that don't report exact information. By default, LLMeter's … |
| `clients` | `int` | The number of concurrent clients to use for sending requests. Defaults to 1. |
| `n_requests` | `int \| None` | The number of LLM invocations to generate per client. By default, each request in … |
| `run_duration` | `int \| float \| timedelta \| None` | Run each client for this many seconds instead of a fixed request count. Clients send requests continuously until the duration expires. Mutually exclusive with `n_requests`. |
| `payload` | `dict \| list[dict] \| PathLike \| str \| None` | The request data to send to the endpoint under test. You can provide a single JSON payload (dict), a list of payloads (list[dict]), or a path to one or more JSON/JSON-Lines files to be loaded by … |
| `run_name` | `str \| None` | Name to use for a specific test Run. This is ignored if set at the Runner level, and should instead be set in … |
| `run_description` | `str \| None` | A natural-language description for the test Run. Can be set either at the Runner level (in which case the same description will be shared across all Runs), or individually in … |
| `timeout` | `int \| float` | The maximum time (in seconds) to wait for each response from the endpoint. Defaults to 60 seconds. |
| `callbacks` | `list[Callback] \| None` | Optional callbacks to enable during the test Run. See … |
| `low_memory` | `bool` | When … |
| `progress_bar_stats` | `dict \| None` | Controls which live stats appear on the progress bar. Maps short display labels to canonical stat keys; see … |
| `disable_per_client_progress_bar` | `bool` | Set … |
| `disable_clients_progress_bar` | `bool` | Set … |
add_callback
add_callback(callback)
Add a callback to the runner's list of callbacks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `callback` | `Callback` | The callback to be added. | required |
Source code in llmeter/runner.py
run
async
run(*, endpoint=None, output_path=None, tokenizer=None, clients=None, n_requests=None, run_duration=None, payload=None, run_name=None, run_description=None, timeout=None, callbacks=None, low_memory=None, progress_bar_stats=None, disable_per_client_progress_bar=None, disable_clients_progress_bar=None)
Run a test against an LLM endpoint.
This method tests the performance of the endpoint by sending multiple concurrent requests with the given payload(s). It measures the total time taken to complete the test, generates invocations for the given payload(s), and optionally saves the results and metrics.
For arguments that are not specified, the Runner's attributes will be used by default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `endpoint` | `Endpoint \| dict \| None` | The LLM endpoint to be tested. Must be set at either the Runner or specific Run level. | `None` |
| `output_path` | `WritablePathLike \| None` | The (cloud or local) base folder under which run outputs and configurations should be stored. By default, a new … | `None` |
| `tokenizer` | `Tokenizer \| Any \| None` | Optional tokenizer used to estimate input and output token counts for endpoints that don't report exact information. | `None` |
| `clients` | `int` | The number of concurrent clients to use for sending requests. | `None` |
| `n_requests` | `int \| None` | The number of LLM invocations to generate per client. Mutually exclusive with `run_duration`. | `None` |
| `run_duration` | `int \| float \| timedelta \| None` | Run each client for this many seconds instead of a fixed request count. Clients send requests continuously until the duration expires. Mutually exclusive with `n_requests`. | `None` |
| `payload` | `dict \| list[dict] \| ReadablePathLike \| None` | The request data to send to the endpoint under test. You can provide a single JSON payload (dict), a list of payloads (list[dict]), or a path to one or more JSON/JSON-Lines files to be loaded by … | `None` |
| `run_name` | `str \| None` | Name to use for a specific test Run. By default, runs are named with the date and time they're requested in format: … | `None` |
| `run_description` | `str \| None` | A natural-language description for the test Run. | `None` |
| `timeout` | `int \| float` | The maximum time (in seconds) to wait for each response from the endpoint. | `None` |
| `callbacks` | `list[Callback] \| None` | Optional callbacks to enable during the test Run. See … | `None` |
| `low_memory` | `bool` | When … | `None` |
| `progress_bar_stats` | `dict` | Controls which live stats appear on the progress bar. Maps short display labels to canonical stat keys; see … | `None` |
| `disable_per_client_progress_bar` | `bool` | Set … | `None` |
| `disable_clients_progress_bar` | `bool` | Set … | `None` |
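The two ways of bounding a run, a fixed request count per client or a time limit per client, can be reconciled by a small validation step. The sketch below is an assumption about how such a check could look, not LLMeter's code: `stop_condition` is a hypothetical helper, and the case where neither argument is set is deliberately left to the library's defaults.

```python
from datetime import timedelta
from typing import Optional, Tuple, Union

Duration = Union[int, float, timedelta]


def stop_condition(
    n_requests: Optional[int], run_duration: Optional[Duration]
) -> Tuple[str, Optional[float]]:
    """Hypothetical helper: decide how each client knows when to stop."""
    if n_requests is not None and run_duration is not None:
        raise ValueError("n_requests and run_duration are mutually exclusive")
    if run_duration is not None:
        # Accept seconds as a number, or a timedelta, and normalize to float seconds.
        seconds = (
            run_duration.total_seconds()
            if isinstance(run_duration, timedelta)
            else float(run_duration)
        )
        if seconds <= 0:
            raise ValueError("run_duration must be positive")
        return ("duration", seconds)
    if n_requests is not None:
        return ("count", n_requests)
    # Neither was given: the library's own default applies (not modeled here).
    return ("unset", None)


print(stop_condition(None, timedelta(minutes=1)))
print(stop_condition(5, None))
```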
Returns:
| Name | Type | Description |
|---|---|---|
| `Result` | `Result` | An object containing the test results, including the generated response texts, total test time, total requests, number of clients, number of requests per client, and other relevant metrics. |
Raises:
| Type | Description |
|---|---|
| `Exception` | If there's an error during the test execution or if the endpoint cannot be reached. |
Note
- This method uses asyncio for concurrent processing.
- Progress is displayed using tqdm if not disabled.
- Responses are collected and processed asynchronously.
- If an output_path is provided, results are saved to files.
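The concurrency model in the notes above (several clients, each sending its requests while the others do the same, with a per-response timeout) can be illustrated with plain `asyncio`. This is a simplified sketch under stated assumptions: `fake_invoke`, `client`, and `run_test` are hypothetical names, the endpoint call is simulated with a short sleep, and progress bars and file output are omitted.

```python
import asyncio
import time
from typing import List, Optional


async def fake_invoke(payload: dict) -> str:
    """Hypothetical stand-in for a call to the endpoint under test."""
    await asyncio.sleep(0.01)
    return f"echo: {payload['prompt']}"


async def client(payload: dict, n_requests: int, timeout: float) -> List[Optional[str]]:
    """One simulated client sending its requests one after another."""
    responses: List[Optional[str]] = []
    for _ in range(n_requests):
        try:
            responses.append(await asyncio.wait_for(fake_invoke(payload), timeout))
        except asyncio.TimeoutError:
            responses.append(None)  # record the timed-out request and carry on
    return responses


async def run_test(payload: dict, clients: int, n_requests: int, timeout: float = 60) -> dict:
    """Run all clients concurrently and summarize the outcome."""
    start = time.perf_counter()
    per_client = await asyncio.gather(
        *(client(payload, n_requests, timeout) for _ in range(clients))
    )
    responses = [r for batch in per_client for r in batch]
    return {
        "total_requests": len(responses),
        "clients": clients,
        "total_test_time": time.perf_counter() - start,
        "responses": responses,
    }


summary = asyncio.run(run_test({"prompt": "hi"}, clients=3, n_requests=2))
print(summary["total_requests"])  # 3 clients x 2 requests each = 6
```

Because the clients run concurrently, the total test time stays close to that of a single client rather than growing with the number of clients.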
Source code in llmeter/runner.py
process_before_invoke_callbacks
async
process_before_invoke_callbacks(callbacks, payload)
Process the before_invoke callbacks for a Run.
This method is expected to be called exactly once after the _Run object is created. Attempting to re-use a _Run object may result in undefined behavior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `callbacks` | `list[Callback]` | The list of callbacks to process. | required |
Source code in llmeter/runner.py
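As a rough illustration of what processing callbacks before an invocation might involve, the sketch below awaits each callback in list order and lets it adjust the payload. It rests on assumptions this page does not confirm: that each callback exposes an async `before_invoke(payload)` hook and that payloads are modified in place. `AddMaxTokens`, `TagPayload`, and `process_before_invoke` are hypothetical names.

```python
import asyncio
from typing import Optional, Sequence


class AddMaxTokens:
    """Hypothetical callback: fills in a default field before the request is sent."""

    async def before_invoke(self, payload: dict) -> None:
        payload.setdefault("max_tokens", 256)


class TagPayload:
    """Hypothetical callback: marks the payload so the effect is visible."""

    async def before_invoke(self, payload: dict) -> None:
        payload["tagged"] = True


async def process_before_invoke(callbacks: Optional[Sequence], payload: dict) -> dict:
    """Await each callback's hook in list order; an empty or None list is a no-op."""
    for callback in callbacks or []:
        await callback.before_invoke(payload)
    return payload


result = asyncio.run(
    process_before_invoke([AddMaxTokens(), TagPayload()], {"prompt": "hi"})
)
print(result)
```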