Results

results

Result dataclass

Result(responses, total_requests=None, clients=1, n_requests=None, total_test_time=None, model_id=None, output_path=None, endpoint_name=None, provider=None, run_name=None, run_description=None, start_time=None, first_request_time=None, last_request_time=None, end_time=None)

Results of a test run.

stats property

stats

Run metrics and aggregated statistics over the individual requests.

Returns a flat dictionary combining:

  • Basic run information (from to_dict()).
  • Aggregated statistics (average, p50, p90, p99) for time_to_last_token, time_to_first_token, num_tokens_output, and num_tokens_input. Keys use the format "{metric}-{aggregation}".
  • Run-level throughput metrics (requests_per_minute, total_input_tokens, etc.).
  • Any additional stats contributed by callbacks via _update_contributed_stats().

During a live run, stats are computed incrementally by llmeter.utils.RunningStats and stored in _preloaded_stats. When loading from disk with load_responses=False, the pre-computed stats from stats.json are used. As a fallback (e.g. for a manually constructed Result), stats are computed on the fly from self.responses.

Returns:

Type Description
dict

A new shallow copy of the stats dictionary on each access.

Example:

result = await runner.run(payload=my_payload, clients=5)
result.stats["time_to_first_token-p50"]   # 0.312
result.stats["requests_per_minute"]        # 141.2
result.stats["failed_requests"]            # 0
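
The "{metric}-{aggregation}" key format can be illustrated with a small standalone sketch. It does not use LLMeter itself: `percentile` and `flatten_stats` are hypothetical helpers, and the nearest-rank percentile is an assumption that may differ from how LLMeter computes its percentiles.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile (an assumption, not necessarily LLMeter's method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def flatten_stats(metrics):
    """Build a flat dict whose keys follow the "{metric}-{aggregation}" pattern."""
    flat = {}
    for name, values in metrics.items():
        flat[f"{name}-average"] = mean(values)
        for pct in (50, 90, 99):
            flat[f"{name}-p{pct}"] = percentile(values, pct)
    return flat


# Illustrative sample data, not real measurements.
sample = {"time_to_first_token": [0.2, 0.3, 0.4, 0.5]}
flat = flatten_stats(sample)
print(sorted(flat))
```

A flat dictionary like this is easy to dump to JSON or load as a single table row, which is one plausible reason for preferring it over nested per-metric dictionaries.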

__post_init__

__post_init__()

Initialize the Result instance.

Source code in llmeter/results.py
def __post_init__(self):
    """Initialize the Result instance."""
    self._contributed_stats = {}
    if not hasattr(self, "_preloaded_stats"):
        self._preloaded_stats = None

get_dimension

get_dimension(dimension, filter_dimension=None, filter_value=None)

Get the values of a specific dimension from the responses.

Parameters:

Name Type Description Default
dimension str

The name of the dimension to retrieve.

required
filter_dimension str

Name of dimension to filter on. Defaults to None.

None
filter_value any

Value to match for the filter dimension. Defaults to None.

None

Returns:

Type Description
list

A list of values for the specified dimension across all responses.

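Raises:

Type Description
ValueError

Documented for the case where the specified dimension is not found in any response. In the source shown below, the raise is commented out and a warning is logged instead.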

Source code in llmeter/results.py
def get_dimension(
    self,
    dimension: str,
    filter_dimension: str | None = None,
    filter_value: Any = None,
) -> list:
    """
    Get the values of a specific dimension from the responses.

    Args:
        dimension (str): The name of the dimension to retrieve.
        filter_dimension (str, optional): Name of dimension to filter on. Defaults to None.
        filter_value (any, optional): Value to match for the filter dimension. Defaults to None.

    Returns:
        list: A list of values for the specified dimension across all responses.

    Raises:
        ValueError: If the specified dimension is not found in any response.
    """
    if filter_dimension is not None:
        values = [
            getattr(response, dimension)
            for response in self.responses
            if getattr(response, filter_dimension) == filter_value
        ]
    else:
        values = [getattr(response, dimension) for response in self.responses]

    if not any(values):
        # raise ValueError(f"Dimension {dimension} not found in any response")
        logger.warning(f"Dimension {dimension} not found in any response")
    return values
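
The filtering behaviour can be tried without LLMeter using a standalone sketch. `FakeResponse` is a hypothetical stand-in for the real response objects, and the free function below only mirrors the list-comprehension logic shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an invocation response."""
    time_to_first_token: Optional[float]
    error: Optional[str] = None


def get_dimension(responses, dimension, filter_dimension=None, filter_value=None):
    """Collect one attribute from each response, optionally filtering on another."""
    return [
        getattr(r, dimension)
        for r in responses
        if filter_dimension is None or getattr(r, filter_dimension) == filter_value
    ]


responses = [
    FakeResponse(0.21),
    FakeResponse(None, error="timeout"),
    FakeResponse(0.35),
]

# All values, including the None from the failed request.
all_ttft = get_dimension(responses, "time_to_first_token")
# Only responses whose `error` attribute equals None.
ok_ttft = get_dimension(responses, "time_to_first_token", "error", None)
print(all_ttft, ok_ttft)
```

Because the filter is an equality check, passing `filter_value=None` selects responses where the filter attribute is unset, as in the second call.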

load classmethod

load(result_path, load_responses=True)

Load run results from disk or cloud storage.

Reads previously saved run results from the specified path. Handles both complete runs (with summary.json) and interrupted runs where only responses.jsonl, run_config.json, or stats.json are available.

Parameters:

Name Type Description Default
result_path UPath | str

The path to the directory containing the result files. Can be a string or a UPath object.

required
load_responses bool

Whether to load individual invocation responses from 'responses.jsonl'. Defaults to True. When False, only the summary and pre-computed stats are loaded, which is significantly faster for large result sets. Use result.load_responses() to load them on demand later.

True

Returns:

Type Description
Result

An instance of the Result class containing the loaded responses and summary data.

Raises:

Type Description
FileNotFoundError

If the directory does not exist or contains no recognizable result files (summary.json, responses.jsonl, or stats.json).

JSONDecodeError

If summary.json cannot be parsed.

Source code in llmeter/results.py
@classmethod
def load(
    cls, result_path: ReadablePathLike, load_responses: bool = True
) -> "Result":
    """
    Load run results from disk or cloud storage.

    Reads previously saved run results from the specified path. Handles
    both complete runs (with ``summary.json``) and interrupted runs where
    only ``responses.jsonl``, ``run_config.json``, or ``stats.json`` are
    available.

    Args:
        result_path (UPath | str): The path to the directory containing the
            result files. Can be a string or a UPath object.
        load_responses (bool): Whether to load individual invocation responses
            from 'responses.jsonl'. Defaults to True. When False, only the
            summary and pre-computed stats are loaded, which is significantly
            faster for large result sets. Use ``result.load_responses()`` to
            load them on demand later.

    Returns:
        Result: An instance of the Result class containing the loaded
        responses and summary data.

    Raises:
        FileNotFoundError: If the directory does not exist or contains no
            recognizable result files (``summary.json``, ``responses.jsonl``,
            or ``stats.json``).
        JSONDecodeError: If ``summary.json`` cannot be parsed.
    """
    result_path = ensure_path(result_path)
    summary_path = result_path / "summary.json"

    # 1. Resolve metadata
    if summary_path.exists():
        metadata = cls._load_summary(result_path)
    else:
        metadata = cls._recover_metadata(result_path)

    # 2. Load responses (if requested and available)
    responses = cls._read_responses(result_path) if load_responses else []
    if not load_responses:
        logger.info(
            "Loaded summary only (responses not loaded). "
            "Call result.load_responses() to load them on demand.",
        )

    # 3. Construct and resolve stats
    result = cls(responses=responses, **metadata)
    cls._resolve_stats(result, result_path / "stats.json")
    return result
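
The documented FileNotFoundError conditions can be sketched as a standalone check. This is not LLMeter's implementation: `find_result_files` and `RECOGNIZED` are hypothetical names, and the sketch uses `pathlib` on the local filesystem rather than UPath.

```python
import tempfile
from pathlib import Path

# File names taken from the documentation above.
RECOGNIZED = ("summary.json", "responses.jsonl", "stats.json")


def find_result_files(result_path):
    """Return the recognizable result files present in a directory."""
    path = Path(result_path)
    if not path.is_dir():
        raise FileNotFoundError(f"No such directory: {path}")
    found = [name for name in RECOGNIZED if (path / name).exists()]
    if not found:
        raise FileNotFoundError(f"No recognizable result files in {path}")
    return found


with tempfile.TemporaryDirectory() as tmp:
    # Simulate an interrupted run that only left stats.json behind.
    (Path(tmp) / "stats.json").write_text("{}")
    found_files = find_result_files(tmp)
    print(found_files)
```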

load_responses

load_responses()

Load individual invocation responses from disk or cloud storage.

Reads the 'responses.jsonl' file from the result's output_path directory. This is useful when the Result was loaded with load_responses=False and you need to access the individual responses on demand.

Returns:

Type Description
list[InvocationResponse]

The loaded responses. Also updates self.responses in place.

Raises:

Type Description
ValueError

If no output_path is set on this Result.

FileNotFoundError

If 'responses.jsonl' is not found at the output_path.

Source code in llmeter/results.py
def load_responses(self) -> list[InvocationResponse]:
    """
    Load individual invocation responses from disk or cloud storage.

    Reads the 'responses.jsonl' file from the result's output_path directory.
    This is useful when the Result was loaded with ``load_responses=False`` and
    you need to access the individual responses on demand.

    Returns:
        list[InvocationResponse]: The loaded responses. Also updates ``self.responses``
        in place.

    Raises:
        ValueError: If no output_path is set on this Result.
        FileNotFoundError: If 'responses.jsonl' is not found at the output_path.
    """
    if not self.output_path:
        raise ValueError(
            "No output_path set on this Result. Cannot locate responses file."
        )
    responses_path = ensure_path(self.output_path) / "responses.jsonl"
    with responses_path.open("r") as f:
        self.responses = [InvocationResponse.from_json(line) for line in f if line]
    logger.info("Loaded %d responses from %s", len(self.responses), responses_path)
    # Recompute stats from the freshly loaded responses
    self._preloaded_stats = self._compute_stats(self)
    return self.responses
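
Reading a JSON Lines file such as 'responses.jsonl' can be sketched with the standard library alone. `read_jsonl` is a hypothetical helper that returns plain dicts rather than InvocationResponse objects. It strips each line before testing for emptiness, since lines read from a file keep their trailing newline.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-blank line."""
    records = []
    with Path(path).open("r") as f:
        for line in f:
            # A blank line is "\n", which is truthy, so test the stripped text.
            if line.strip():
                records.append(json.loads(line))
    return records


with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "responses.jsonl"
    # Includes a blank line in the middle on purpose.
    jsonl_path.write_text('{"id": "a"}\n\n{"id": "b"}\n')
    records = read_jsonl(jsonl_path)
    print(records)
```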

save

save(output_path=None)

Save the results to disk or cloud storage.

Saves the run results to the specified output path or the instance's default output path. It creates three files:

  1. 'summary.json': Contains the overall summary of the results.
  2. 'stats.json': Contains detailed statistics of the run.
  3. 'responses.jsonl': Contains individual invocation responses. Written only if the responses are not already saved at the indicated path.

Parameters:

Name Type Description Default
output_path UPath | str | None

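The path where the result files will be saved. If None, the instance's default output_path will be used. Defaults to None. Note that the source shown below evaluates self.output_path or output_path, so an output_path already set on the instance takes precedence over this argument.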

None

Raises:

Type Description
ValueError

If no output path is provided and the instance doesn't have a default output_path set.

TypeError

If the provided output_path is not a valid type.

IOError

If there's an error writing to the output files.

Note

The method uses the Universal Path (UPath) library for file operations, which provides a unified interface for working with different file systems.

Source code in llmeter/results.py
def save(self, output_path: WritablePathLike | None = None) -> None:
    """
    Save the results to disk or cloud storage.

    Saves the run results to the specified output path or the
    instance's default output path. It creates three files:
    1. 'summary.json': Contains the overall summary of the results.
    2. 'stats.json': Contains detailed statistics of the run.
    3. 'responses.jsonl': Contains individual invocation responses
        - Only if the responses are not already saved at the indicated path.


    Args:
        output_path (UPath | str | None, optional): The path where the result
            files will be saved. If None, the instance's default output_path
            will be used. Defaults to None.

    Raises:
        ValueError: If no output path is provided and the instance doesn't
            have a default output_path set.
        TypeError: If the provided output_path is not a valid type.
        IOError: If there's an error writing to the output files.

    Note:
        The method uses the Universal Path (UPath) library for file operations,
        which provides a unified interface for working with different file systems.
    """

    output_path = ensure_path(self.output_path or output_path)
    if output_path is None:
        raise ValueError("No output path provided")

    output_path.mkdir(parents=True, exist_ok=True)

    summary_path = output_path / "summary.json"
    stats_path = output_path / "stats.json"
    with summary_path.open("w") as f, stats_path.open("w") as s:
        f.write(self.to_json(indent=4))
        s.write(
            json.dumps(self.stats, indent=4, default=llmeter_default_serializer)
        )

    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in self.responses:
                f.write(
                    json.dumps(asdict(response), default=llmeter_default_serializer)
                    + "\n"
                )

to_dict

to_dict(include_responses=False)

Return a dictionary representation of this result.

Returns a plain dict produced by dataclasses.asdict, preserving native Python types (datetime, UPath, etc.). This is suitable for programmatic access and internal data processing.

For JSON output, use to_json, which delegates to llmeter.json_utils.llmeter_default_serializer for non-serializable types, or pass the dict through json.dumps(result.to_dict(), default=llmeter_default_serializer).

Parameters:

Name Type Description Default
include_responses bool

If True, include the full list of InvocationResponse (llmeter.endpoints.base.InvocationResponse) dicts and the stats key. Defaults to False.

False

Returns:

Type Description
dict

A dictionary of result fields with native Python types.

Source code in llmeter/results.py
def to_dict(self, include_responses: bool = False) -> dict:
    """Return a dictionary representation of this result.

    Returns a plain ``dict`` produced by :func:`dataclasses.asdict`,
    preserving native Python types (``datetime``, ``UPath``, etc.).
    This is suitable for programmatic access and internal data
    processing.

    For JSON output, use :meth:`to_json` which delegates to
    :func:`~llmeter.json_utils.llmeter_default_serializer` for
    non-serializable types, or pass the dict through
    ``json.dumps(result.to_dict(), default=llmeter_default_serializer)``.

    Args:
        include_responses: If ``True``, include the full list of
            :class:`~llmeter.endpoints.base.InvocationResponse` dicts
            and the ``stats`` key.  Defaults to ``False``.

    Returns:
        dict: A dictionary of result fields with native Python types.
    """
    data = asdict(self)
    if include_responses:
        return data
    return {k: v for k, v in data.items() if k not in ["responses", "stats"]}
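
The effect of `dataclasses.asdict` combined with key filtering can be seen in a standalone sketch. `MiniResult` is a hypothetical, much smaller stand-in for Result with only three fields.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in for Result."""
    responses: list = field(default_factory=list)
    model_id: Optional[str] = None
    start_time: Optional[datetime] = None

    def to_dict(self, include_responses=False):
        data = asdict(self)  # recursively converts dataclass fields to a dict
        if include_responses:
            return data
        return {k: v for k, v in data.items() if k != "responses"}


mini = MiniResult(
    responses=[{"id": "a"}],
    model_id="demo-model",
    start_time=datetime(2024, 1, 1, 12, 0, 0),
)
summary = mini.to_dict()
full = mini.to_dict(include_responses=True)
print(summary)
```

The `start_time` value stays a `datetime` object in both dictionaries, so converting to JSON is a separate step.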

to_json

to_json(default=llmeter_default_serializer, **kwargs)

Return the results as a JSON string.

Parameters:

Name Type Description Default
default

Fallback serializer. Defaults to llmeter.json_utils.llmeter_default_serializer.

llmeter_default_serializer
**kwargs

Extra keyword arguments passed to json.dumps.

{}
Source code in llmeter/results.py
def to_json(self, default=llmeter_default_serializer, **kwargs) -> str:
    """Return the results as a JSON string.

    Args:
        default: Fallback serializer. Defaults to
            :func:`~llmeter.json_utils.llmeter_default_serializer`.
        **kwargs: Extra keyword arguments passed to :func:`json.dumps`.
    """
    summary = {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }
    return json.dumps(summary, default=default, **kwargs)
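
How a `default` fallback and extra keyword arguments interact with `json.dumps` can be sketched without LLMeter. `fallback_serializer` is a hypothetical replacement for llmeter_default_serializer, whose exact behaviour is not shown on this page.

```python
import json
from datetime import datetime
from pathlib import PurePosixPath


def fallback_serializer(obj):
    """Called by json.dumps only for objects it cannot encode itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)


summary = {
    "model_id": "demo-model",
    "start_time": datetime(2024, 1, 1, 12, 0, 0),
    "output_path": PurePosixPath("runs/demo"),
}

# Extra keyword arguments such as sort_keys are forwarded to json.dumps.
text = json.dumps(summary, default=fallback_serializer, sort_keys=True)
print(text)
```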