
Results

results

Result dataclass

Result(responses, total_requests=None, clients=1, n_requests=None, total_test_time=None, model_id=None, output_path=None, endpoint_name=None, provider=None, run_name=None, run_description=None, start_time=None, first_request_time=None, last_request_time=None, end_time=None)

Results of a test run.

stats property

stats

Run metrics and aggregated statistics over the individual requests.

Returns a flat dictionary combining:

  • Basic run information (from to_dict()).
  • Aggregated statistics (average, p50, p90, p99) for time_to_last_token, time_to_first_token, num_tokens_output, and num_tokens_input. Keys use the format "{metric}-{aggregation}".
  • Run-level throughput metrics (requests_per_minute, total_input_tokens, etc.).
  • Any additional stats contributed by callbacks via _update_contributed_stats().

During a live run, stats are computed incrementally by RunningStats (from llmeter.utils) and stored in _preloaded_stats. When loading from disk with load_responses=False, pre-computed stats from stats.json are used. As a fallback (e.g. a manually constructed Result), stats are computed on the fly from self.responses.

Returns:

Type Description
dict

A new shallow copy of the stats dictionary on each access.

Example:

result = await runner.run(payload=my_payload, clients=5)
result.stats["time_to_first_token-p50"]   # 0.312
result.stats["requests_per_minute"]        # 141.2
result.stats["failed_requests"]            # 0

__post_init__

__post_init__()

Initialize the Result instance.

Source code in llmeter/results.py
def __post_init__(self):
    """Initialize the Result instance."""
    self._contributed_stats = {}
    if not hasattr(self, "_preloaded_stats"):
        self._preloaded_stats = None

get_dimension

get_dimension(dimension, filter_dimension=None, filter_value=None)

Get the values of a specific dimension from the responses.

Parameters:

Name Type Description Default
dimension str

The name of the dimension to retrieve.

required
filter_dimension str

Name of dimension to filter on. Defaults to None.

None
filter_value any

Value to match for the filter dimension. Defaults to None.

None

Returns:

Name Type Description
list list

A list of values for the specified dimension across all responses.

Raises:

Type Description
AttributeError

If the specified dimension is not an attribute of a response. (If the dimension exists but every collected value is falsy, the current implementation logs a warning instead of raising and still returns the list.)

Source code in llmeter/results.py
def get_dimension(
    self,
    dimension: str,
    filter_dimension: str | None = None,
    filter_value: Any = None,
) -> list:
    """
    Get the values of a specific dimension from the responses.

    Args:
        dimension (str): The name of the dimension to retrieve.
        filter_dimension (str, optional): Name of dimension to filter on. Defaults to None.
        filter_value (any, optional): Value to match for the filter dimension. Defaults to None.

    Returns:
        list: A list of values for the specified dimension across all responses.

    Raises:
        AttributeError: If the specified dimension is not an attribute
            of a response.
    """
    if filter_dimension is not None:
        values = [
            getattr(response, dimension)
            for response in self.responses
            if getattr(response, filter_dimension) == filter_value
        ]
    else:
        values = [getattr(response, dimension) for response in self.responses]

    if not any(values):
        # raise ValueError(f"Dimension {dimension} not found in any response")
        logger.warning(f"Dimension {dimension} not found in any response")
    return values
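The filtering above is a plain attribute lookup over self.responses. A self-contained sketch of the same logic, using SimpleNamespace objects as stand-ins for the real InvocationResponse objects (the field names here are illustrative):

```python
from types import SimpleNamespace

# Hypothetical responses standing in for InvocationResponse objects.
responses = [
    SimpleNamespace(num_tokens_output=120, model_id="model-a"),
    SimpleNamespace(num_tokens_output=95, model_id="model-b"),
    SimpleNamespace(num_tokens_output=110, model_id="model-a"),
]

# Unfiltered: collect the dimension across all responses.
all_tokens = [getattr(r, "num_tokens_output") for r in responses]

# Filtered: keep only responses where filter_dimension == filter_value,
# mirroring get_dimension("num_tokens_output", "model_id", "model-a").
model_a_tokens = [
    getattr(r, "num_tokens_output")
    for r in responses
    if getattr(r, "model_id") == "model-a"
]

print(all_tokens)      # [120, 95, 110]
print(model_a_tokens)  # [120, 110]
```

On a real run, result.get_dimension("num_tokens_output", "model_id", "model-a") applies the same comprehensions to the loaded response objects.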

load classmethod

load(result_path, load_responses=True)

Load run results from disk or cloud storage.

Reads previously saved run results from the specified path. It expects 'summary.json' to be present in the given directory. By default, also loads 'responses.jsonl' containing individual invocation responses.

Parameters:

Name Type Description Default
result_path UPath | str

The path to the directory containing the result files. Can be a string or a UPath object.

required
load_responses bool

Whether to load individual invocation responses from 'responses.jsonl'. Defaults to True. When False, only the summary and pre-computed stats are loaded, which is significantly faster for large result sets. Use result.load_responses() to load them on demand later.

True

Returns:

Name Type Description
Result Result

An instance of the Result class containing the loaded responses and summary data.

Raises:

Type Description
FileNotFoundError

If required files are not found in the specified directory.

JSONDecodeError

If there's an issue parsing the JSON data in either file.

Source code in llmeter/results.py
@classmethod
def load(
    cls, result_path: ReadablePathLike, load_responses: bool = True
) -> "Result":
    """
    Load run results from disk or cloud storage.

    Reads previously saved run results from the specified
    path. It expects 'summary.json' to be present in the given directory.
    By default, also loads 'responses.jsonl' containing individual invocation
    responses.

    Args:
        result_path (UPath | str): The path to the directory containing the
            result files. Can be a string or a UPath object.
        load_responses (bool): Whether to load individual invocation responses
            from 'responses.jsonl'. Defaults to True. When False, only the
            summary and pre-computed stats are loaded, which is significantly
            faster for large result sets. Use ``result.load_responses()`` to
            load them on demand later.

    Returns:
        Result: An instance of the Result class containing the loaded
        responses and summary data.

    Raises:
        FileNotFoundError: If required files are not found in the specified
            directory.
        JSONDecodeError: If there's an issue parsing the JSON data in
            either file.

    """
    result_path = ensure_path(result_path)
    summary_path = result_path / "summary.json"

    with summary_path.open("r") as g:
        summary = json.load(g)

    # Convert datetime strings back to datetime objects
    for key in [
        "start_time",
        "end_time",
        "first_request_time",
        "last_request_time",
    ]:
        if key in summary and summary[key] and isinstance(summary[key], str):
            try:
                summary[key] = datetime.fromisoformat(
                    summary[key].replace("Z", "+00:00")
                )
            except ValueError:
                pass

    # Ensure output_path is set so load_responses() can find the files later
    if "output_path" not in summary or summary["output_path"] is None:
        summary["output_path"] = str(result_path)

    if load_responses:
        responses_path = result_path / "responses.jsonl"
        with responses_path.open("r") as f:
            responses = [InvocationResponse.from_json(line) for line in f if line]
    else:
        responses = []
        responses_path = result_path / "responses.jsonl"
        logger.info(
            "Loaded summary only (responses not loaded). "
            "Individual responses are stored at: %s. "
            "Call result.load_responses() to load them on demand.",
            responses_path,
        )

    result = cls(responses=responses, **summary)

    # Load or compute stats
    if not load_responses:
        # Use pre-computed stats from disk when responses aren't loaded
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                result._preloaded_stats = json.loads(s.read())
                # Convert datetime strings in stats
                for key in [
                    "start_time",
                    "end_time",
                    "first_request_time",
                    "last_request_time",
                ]:
                    val = result._preloaded_stats.get(key)
                    if val and isinstance(val, str):
                        try:
                            result._preloaded_stats[key] = datetime.fromisoformat(
                                val.replace("Z", "+00:00")
                            )
                        except ValueError:
                            pass
        else:
            result._preloaded_stats = None
    else:
        # Compute stats from the loaded responses, but also merge any
        # contributed stats that were persisted in stats.json so they
        # survive a save/load round-trip.
        result._preloaded_stats = cls._compute_stats(result)
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                saved_stats = json.loads(s.read())
            # Contributed stats are any keys in the saved file that are
            # not produced by _compute_stats (i.e. they came from callbacks).
            for key, value in saved_stats.items():
                if key not in result._preloaded_stats:
                    result._preloaded_stats[key] = value

    return result
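The datetime round-trip in load() can be reproduced with the standard library alone. This sketch writes a minimal summary.json (the field values are illustrative) and applies the same ISO-8601 parsing the method uses:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Write a minimal summary.json (field values are illustrative).
tmp = Path(tempfile.mkdtemp())
summary = {"model_id": "my-model", "start_time": "2024-01-01T12:00:00Z"}
(tmp / "summary.json").write_text(json.dumps(summary))

# load() applies the same parsing, mapping a trailing "Z" to "+00:00"
# so datetime.fromisoformat accepts it on older Python versions.
loaded = json.loads((tmp / "summary.json").read_text())
for key in ["start_time", "end_time", "first_request_time", "last_request_time"]:
    val = loaded.get(key)
    if val and isinstance(val, str):
        try:
            loaded[key] = datetime.fromisoformat(val.replace("Z", "+00:00"))
        except ValueError:
            pass

print(loaded["start_time"].isoformat())  # 2024-01-01T12:00:00+00:00
```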

load_responses

load_responses()

Load individual invocation responses from disk or cloud storage.

Reads the 'responses.jsonl' file from the result's output_path directory. This is useful when the Result was loaded with load_responses=False and you need to access the individual responses on demand.

Returns:

Type Description
list[InvocationResponse]

The loaded responses. Also updates self.responses in place.

Raises:

Type Description
ValueError

If no output_path is set on this Result.

FileNotFoundError

If 'responses.jsonl' is not found at the output_path.

Source code in llmeter/results.py
def load_responses(self) -> list[InvocationResponse]:
    """
    Load individual invocation responses from disk or cloud storage.

    Reads the 'responses.jsonl' file from the result's output_path directory.
    This is useful when the Result was loaded with ``load_responses=False`` and
    you need to access the individual responses on demand.

    Returns:
        list[InvocationResponse]: The loaded responses. Also updates ``self.responses``
        in place.

    Raises:
        ValueError: If no output_path is set on this Result.
        FileNotFoundError: If 'responses.jsonl' is not found at the output_path.
    """
    if not self.output_path:
        raise ValueError(
            "No output_path set on this Result. Cannot locate responses file."
        )
    responses_path = ensure_path(self.output_path) / "responses.jsonl"
    with responses_path.open("r") as f:
        self.responses = [InvocationResponse.from_json(line) for line in f if line]
    logger.info("Loaded %d responses from %s", len(self.responses), responses_path)
    # Recompute stats from the freshly loaded responses
    self._preloaded_stats = self._compute_stats(self)
    return self.responses
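responses.jsonl stores one JSON object per line, and load_responses() parses each non-empty line. A stdlib-only sketch of that parsing (the record fields are illustrative, not the real InvocationResponse schema):

```python
import json
import tempfile
from pathlib import Path

# Write a small responses.jsonl with one JSON object per line
# (record fields are illustrative).
tmp = Path(tempfile.mkdtemp())
records = [
    {"id": "req-1", "num_tokens_output": 42},
    {"id": "req-2", "num_tokens_output": 17},
]
with (tmp / "responses.jsonl").open("w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# load_responses() parses each non-empty line the same way:
with (tmp / "responses.jsonl").open("r") as f:
    responses = [json.loads(line) for line in f if line.strip()]

print(len(responses))  # 2
```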

save

save(output_path=None)

Save the results to disk or cloud storage.

Saves the run results to the specified output path or the instance's default output path. It creates three files:

  1. 'summary.json': the overall summary of the results.
  2. 'stats.json': detailed statistics of the run.
  3. 'responses.jsonl': individual invocation responses (only written if the file is not already present at the indicated path).

Parameters:

Name Type Description Default
output_path UPath | str | None

The path where the result files will be saved. If None, the instance's default output_path will be used. Defaults to None.

None

Raises:

Type Description
ValueError

If no output path is provided and the instance doesn't have a default output_path set.

TypeError

If the provided output_path is not a valid type.

IOError

If there's an error writing to the output files.

Note

The method uses the Universal Path (UPath) library for file operations, which provides a unified interface for working with different file systems.

Source code in llmeter/results.py
def save(self, output_path: WritablePathLike | None = None):
    """
    Save the results to disk or cloud storage.

    Saves the run results to the specified output path or the
    instance's default output path. It creates three files:
    1. 'summary.json': Contains the overall summary of the results.
    2. 'stats.json': Contains detailed statistics of the run.
    3. 'responses.jsonl': Contains individual invocation responses
        - Only if the responses are not already saved at the indicated path.


    Args:
        output_path (UPath | str | None, optional): The path where the result
            files will be saved. If None, the instance's default output_path
            will be used. Defaults to None.

    Raises:
        ValueError: If no output path is provided and the instance doesn't
            have a default output_path set.
        TypeError: If the provided output_path is not a valid type.
        IOError: If there's an error writing to the output files.

    Note:
        The method uses the Universal Path (UPath) library for file operations,
        which provides a unified interface for working with different file systems.
    """

    # Prefer the explicitly passed path, falling back to the instance default.
    output_path = output_path or self.output_path
    if output_path is None:
        raise ValueError("No output path provided")
    output_path = ensure_path(output_path)

    output_path.mkdir(parents=True, exist_ok=True)

    summary_path = output_path / "summary.json"
    stats_path = output_path / "stats.json"
    with summary_path.open("w") as f, stats_path.open("w") as s:
        f.write(self.to_json(indent=4))
        s.write(
            json.dumps(self.stats, indent=4, default=llmeter_default_serializer)
        )

    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in self.responses:
                f.write(
                    json.dumps(asdict(response), default=llmeter_default_serializer)
                    + "\n"
                )

to_dict

to_dict(include_responses=False)

Return a dictionary representation of this result.

Returns a plain dict produced by dataclasses.asdict, preserving native Python types (datetime, UPath, etc.). This is suitable for programmatic access and internal data processing.

For JSON output, use to_json(), which delegates to llmeter_default_serializer for non-serializable types, or pass the dict through json.dumps(result.to_dict(), default=llmeter_default_serializer).

Parameters:

Name Type Description Default
include_responses bool

If True, include the full list of InvocationResponse dicts and the stats key. Defaults to False.

False

Returns:

Name Type Description
dict

A dictionary of result fields with native Python types.

Source code in llmeter/results.py
def to_dict(self, include_responses: bool = False):
    """Return a dictionary representation of this result.

    Returns a plain ``dict`` produced by :func:`dataclasses.asdict`,
    preserving native Python types (``datetime``, ``UPath``, etc.).
    This is suitable for programmatic access and internal data
    processing.

    For JSON output, use :meth:`to_json` which delegates to
    :func:`~llmeter.json_utils.llmeter_default_serializer` for
    non-serializable types, or pass the dict through
    ``json.dumps(result.to_dict(), default=llmeter_default_serializer)``.

    Args:
        include_responses: If ``True``, include the full list of
            :class:`~llmeter.endpoints.base.InvocationResponse` dicts
            and the ``stats`` key.  Defaults to ``False``.

    Returns:
        dict: A dictionary of result fields with native Python types.
    """
    data = asdict(self)
    if include_responses:
        return data
    return {k: v for k, v in data.items() if k not in ["responses", "stats"]}
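The field filtering is a simple dict comprehension over dataclasses.asdict. A minimal stand-in (MiniResult is hypothetical; the real Result has many more fields, but the logic is identical):

```python
from dataclasses import dataclass, asdict, field

# MiniResult is a hypothetical stand-in for the real Result dataclass.
@dataclass
class MiniResult:
    responses: list = field(default_factory=list)
    model_id: str = "my-model"

r = MiniResult(responses=[{"id": "req-1"}])
data = asdict(r)

# include_responses=False drops the heavyweight keys:
summary = {k: v for k, v in data.items() if k not in ["responses", "stats"]}
print(summary)  # {'model_id': 'my-model'}
```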

to_json

to_json(default=llmeter_default_serializer, **kwargs)

Return the results as a JSON string.

Parameters:

Name Type Description Default
default

Fallback serializer. Defaults to llmeter_default_serializer.

llmeter_default_serializer
**kwargs

Extra keyword arguments passed to json.dumps.

{}
Source code in llmeter/results.py
def to_json(self, default=llmeter_default_serializer, **kwargs):
    """Return the results as a JSON string.

    Args:
        default: Fallback serializer. Defaults to
            :func:`~llmeter.json_utils.llmeter_default_serializer`.
        **kwargs: Extra keyword arguments passed to :func:`json.dumps`.
    """
    summary = {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }
    return json.dumps(summary, default=default, **kwargs)
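The default= hook is standard json.dumps behavior: it is invoked for any object json cannot serialize natively. A simplified stand-in for llmeter_default_serializer (the fallback function below is an assumption for illustration, not the library's actual implementation):

```python
import json
from datetime import datetime, timezone

# Simplified fallback serializer in the spirit of llmeter_default_serializer
# (hypothetical: the real function may handle more types). json.dumps calls
# it for every object it cannot serialize natively.
def fallback(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)

summary = {
    "model_id": "my-model",
    "start_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
s = json.dumps(summary, default=fallback)
print(s)  # {"model_id": "my-model", "start_time": "2024-01-01T00:00:00+00:00"}
```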