Results

results

Result dataclass

Result(responses, total_requests, clients, n_requests, total_test_time=None, model_id=None, output_path=None, endpoint_name=None, provider=None, run_name=None, run_description=None, start_time=None, end_time=None)

Results of a test run.

stats property

stats

Run metrics and aggregated statistics over the individual requests

This combined view includes:

- Basic information about the run (from the Result's dictionary representation)
- Aggregated statistics ('average', 'p50', 'p90', 'p99') for:
    - Time to last token
    - Time to first token
    - Number of tokens output
    - Number of tokens input

Aggregated statistics are keyed in the format "{stat_name}-{aggregation_type}"

This property is read-only and returns a new shallow copy of the data on each access. Default stats provided by LLMeter are calculated on first access and then cached. Callbacks or other mechanisms that need to augment stats should use the _update_contributed_stats() method.

When the Result was loaded with load_responses=False, pre-computed stats from stats.json are returned if available. Call load_responses() to load the individual responses and recompute stats from the raw data.
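The "{stat_name}-{aggregation_type}" key format can be illustrated with a minimal sketch. The `aggregate` helper, the `time_to_first_token` label, and the sample values are assumptions made for this example, and the percentile method used here (`statistics.quantiles` with inclusive interpolation) may differ from the one LLMeter applies internally.

```python
from statistics import mean, quantiles


def aggregate(name, values):
    """Summarize one metric as {"<name>-<aggregation>": value} (hypothetical helper)."""
    # quantiles(..., n=100) returns 99 cut points: index 49 is p50, 89 is p90, 98 is p99.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        f"{name}-average": mean(values),
        f"{name}-p50": cuts[49],
        f"{name}-p90": cuts[89],
        f"{name}-p99": cuts[98],
    }


# Hypothetical per-request measurements, in seconds.
ttft_stats = aggregate("time_to_first_token", [1, 2, 3, 4, 5])
print(sorted(ttft_stats))
```

Each metric contributes four flat keys, so a consumer can look up a single value such as `ttft_stats["time_to_first_token-p90"]` without walking a nested structure.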

__post_init__

__post_init__()

Initialize the Result instance.

Source code in llmeter/results.py
def __post_init__(self):
    """Initialize the Result instance."""
    self._contributed_stats = {}
    if not hasattr(self, "_preloaded_stats"):
        self._preloaded_stats = None

get_dimension

get_dimension(dimension, filter_dimension=None, filter_value=None)

Get the values of a specific dimension from the responses.

Parameters:

Name Type Description Default
dimension str

The name of the dimension to retrieve.

required
filter_dimension str

Name of dimension to filter on. Defaults to None.

None
filter_value any

Value to match for the filter dimension. Defaults to None.

None

Returns:

Name Type Description
list list

A list of values for the specified dimension across all responses.

Raises:

Type Description
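ValueError

Documented for the case where the specified dimension is not found in any response. In the source shown below, the raise is commented out and a warning is logged instead.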

Source code in llmeter/results.py
def get_dimension(
    self,
    dimension: str,
    filter_dimension: str | None = None,
    filter_value: Any = None,
) -> list:
    """
    Get the values of a specific dimension from the responses.

    Args:
        dimension (str): The name of the dimension to retrieve.
        filter_dimension (str, optional): Name of dimension to filter on. Defaults to None.
        filter_value (any, optional): Value to match for the filter dimension. Defaults to None.

    Returns:
        list: A list of values for the specified dimension across all responses.

    Raises:
        ValueError: If the specified dimension is not found in any response.
    """
    if filter_dimension is not None:
        values = [
            getattr(response, dimension)
            for response in self.responses
            if getattr(response, filter_dimension) == filter_value
        ]
    else:
        values = [getattr(response, dimension) for response in self.responses]

    if not any(values):
        # raise ValueError(f"Dimension {dimension} not found in any response")
        logger.warning(f"Dimension {dimension} not found in any response")
    return values
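The filtering logic above can be tried in isolation with a stand-in response type. In this sketch, `FakeResponse` and its fields are hypothetical substitutes for `InvocationResponse`, and the standalone `get_dimension` function mirrors only the list comprehensions from the listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for InvocationResponse."""
    time_to_first_token: Optional[float]
    num_tokens_output: Optional[int]
    error: Optional[str] = None


def get_dimension(responses, dimension, filter_dimension=None, filter_value=None):
    """Collect one attribute from each response, optionally keeping only matching responses."""
    if filter_dimension is not None:
        return [
            getattr(r, dimension)
            for r in responses
            if getattr(r, filter_dimension) == filter_value
        ]
    return [getattr(r, dimension) for r in responses]


responses = [
    FakeResponse(0.12, 40),
    FakeResponse(0.30, 0),
    FakeResponse(None, None, error="timeout"),
]

# Keep only the responses that completed without an error.
ok_tokens = get_dimension(responses, "num_tokens_output", "error", None)

# A dimension whose values are all falsy (for example 0) also fails an any() check.
zero_only = get_dimension([FakeResponse(0.2, 0)], "num_tokens_output")
print(ok_tokens, any(zero_only))
```

Because the "not found" check in the listing relies on `any(values)`, a dimension that is present but holds only zeros or `None` is reported the same way as a missing one. An attribute name that does not exist on the response objects raises `AttributeError` from `getattr` instead.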

load classmethod

load(result_path, load_responses=True)

Load run results from disk or cloud storage.

Reads previously saved run results from the specified path. It expects 'summary.json' to be present in the given directory. By default, also loads 'responses.jsonl' containing individual invocation responses.

Parameters:

Name Type Description Default
result_path UPath | str

The path to the directory containing the result files. Can be a string or a UPath object.

required
load_responses bool

Whether to load individual invocation responses from 'responses.jsonl'. Defaults to True. When False, only the summary and pre-computed stats are loaded, which is significantly faster for large result sets. Use result.load_responses() to load them on demand later.

True

Returns:

Name Type Description
Result Result

An instance of the Result class containing the loaded responses and summary data.

Raises:

Type Description
FileNotFoundError

If required files are not found in the specified directory.

JSONDecodeError

If there's an issue parsing the JSON data in either file.

Source code in llmeter/results.py
@classmethod
def load(
    cls, result_path: os.PathLike | str, load_responses: bool = True
) -> "Result":
    """
    Load run results from disk or cloud storage.

    Reads previously saved run results from the specified
    path. It expects 'summary.json' to be present in the given directory.
    By default, also loads 'responses.jsonl' containing individual invocation
    responses.

    Args:
        result_path (UPath | str): The path to the directory containing the
            result files. Can be a string or a UPath object.
        load_responses (bool): Whether to load individual invocation responses
            from 'responses.jsonl'. Defaults to True. When False, only the
            summary and pre-computed stats are loaded, which is significantly
            faster for large result sets. Use ``result.load_responses()`` to
            load them on demand later.

    Returns:
        Result: An instance of the Result class containing the loaded
        responses and summary data.

    Raises:
        FileNotFoundError: If required files are not found in the specified
            directory.
        JSONDecodeError: If there's an issue parsing the JSON data in
            either file.

    """
    result_path = Path(result_path)
    summary_path = result_path / "summary.json"

    with summary_path.open("r") as g:
        summary = json.load(g)

    # Convert datetime strings back to datetime objects
    for key in ["start_time", "end_time"]:
        if key in summary and summary[key] and isinstance(summary[key], str):
            try:
                summary[key] = datetime.fromisoformat(summary[key])
            except ValueError:
                pass

    # Ensure output_path is set so load_responses() can find the files later
    if "output_path" not in summary or summary["output_path"] is None:
        summary["output_path"] = str(result_path)

    if load_responses:
        responses_path = result_path / "responses.jsonl"
        with responses_path.open("r") as f:
            responses = [
                InvocationResponse(**json.loads(line)) for line in f if line
            ]
    else:
        responses = []
        responses_path = result_path / "responses.jsonl"
        logger.info(
            "Loaded summary only (responses not loaded). "
            "Individual responses are stored at: %s. "
            "Call result.load_responses() to load them on demand.",
            responses_path,
        )

    result = cls(responses=responses, **summary)

    # When skipping responses, load pre-computed stats from stats.json if available
    # so that result.stats works without needing the responses
    if not load_responses:
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                result._preloaded_stats = json.loads(s.read())
                # Convert datetime strings in stats
                for key in ["start_time", "end_time"]:
                    val = result._preloaded_stats.get(key)
                    if val and isinstance(val, str):
                        try:
                            result._preloaded_stats[key] = datetime.fromisoformat(
                                val
                            )
                        except ValueError:
                            pass
        else:
            result._preloaded_stats = None

    return result
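The timestamp conversion step in `load()` can be sketched on its own. The `parse_timestamps` helper and the sample summary below are assumptions made for illustration; only the loop body follows the listing.

```python
from datetime import datetime


def parse_timestamps(summary, keys=("start_time", "end_time")):
    """Convert ISO 8601 strings to datetime objects, leaving anything unparseable as-is."""
    for key in keys:
        value = summary.get(key)
        if value and isinstance(value, str):
            try:
                summary[key] = datetime.fromisoformat(value)
            except ValueError:
                pass  # keep the original string, as the listing does
    return summary


# Hypothetical summary contents.
summary = parse_timestamps(
    {
        "start_time": "2024-05-01T10:00:00+00:00",
        "end_time": "not a date",
        "model_id": "demo-model",
    }
)
print(type(summary["start_time"]).__name__, summary["end_time"])
```

One consequence of swallowing `ValueError` is that a field can silently remain a string. In particular, `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onward, so timestamps written with a `Z` suffix would stay as strings on older interpreters.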

load_responses

load_responses()

Load individual invocation responses from disk or cloud storage.

Reads the 'responses.jsonl' file from the result's output_path directory. This is useful when the Result was loaded with load_responses=False and you need to access the individual responses on demand.

Returns:

Type Description
list[InvocationResponse]

The loaded responses. Also updates self.responses in place.

Raises:

Type Description
ValueError

If no output_path is set on this Result.

FileNotFoundError

If 'responses.jsonl' is not found at the output_path.

Source code in llmeter/results.py
def load_responses(self) -> list[InvocationResponse]:
    """
    Load individual invocation responses from disk or cloud storage.

    Reads the 'responses.jsonl' file from the result's output_path directory.
    This is useful when the Result was loaded with ``load_responses=False`` and
    you need to access the individual responses on demand.

    Returns:
        list[InvocationResponse]: The loaded responses. Also updates ``self.responses``
        in place.

    Raises:
        ValueError: If no output_path is set on this Result.
        FileNotFoundError: If 'responses.jsonl' is not found at the output_path.
    """
    if not self.output_path:
        raise ValueError(
            "No output_path set on this Result. Cannot locate responses file."
        )
    responses_path = Path(self.output_path) / "responses.jsonl"
    with responses_path.open("r") as f:
        self.responses = [
            InvocationResponse(**json.loads(line)) for line in f if line
        ]
    logger.info("Loaded %d responses from %s", len(self.responses), responses_path)
    # Invalidate cached stats so they are recomputed with the loaded responses
    self.__dict__.pop("_builtin_stats", None)
    return self.responses
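Reading a JSON Lines file like 'responses.jsonl' can be sketched with the standard library alone. The `read_jsonl` helper and the record fields are hypothetical. Unlike the listing, this sketch strips each line before testing it, because lines read from a file keep their trailing newline and a blank line would otherwise still reach `json.loads`.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-blank line."""
    with Path(path).open("r") as f:
        # line.strip() is empty for blank lines, so they are skipped.
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    responses_path = Path(tmp) / "responses.jsonl"
    # Hypothetical records, with a blank line in between.
    responses_path.write_text(
        '{"id": "a", "num_tokens_output": 12}\n'
        "\n"
        '{"id": "b", "num_tokens_output": 7}\n'
    )
    records = read_jsonl(responses_path)

print(len(records))
```

In the real method, each parsed dictionary is then passed to `InvocationResponse(**...)`, and the cached built-in stats are dropped so that they are recomputed from the loaded responses.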

save

save(output_path=None)

Save the results to disk or cloud storage.

Saves the run results to the specified output path or the instance's default output path. It creates three files:

1. 'summary.json': Contains the overall summary of the results.
2. 'stats.json': Contains detailed statistics of the run.
3. 'responses.jsonl': Contains individual invocation responses. Written only if the responses are not already saved at the indicated path.

Parameters:

Name Type Description Default
output_path UPath | str | None

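The path where the result files will be saved. If None, the instance's default output_path will be used. Defaults to None. Note that the source shown below evaluates self.output_path or output_path, so an output_path already set on the instance takes precedence over this argument.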

None

Raises:

Type Description
ValueError

If no output path is provided and the instance doesn't have a default output_path set.

TypeError

If the provided output_path is not a valid type.

IOError

If there's an error writing to the output files.

Note

The method uses the Universal Path (UPath) library for file operations, which provides a unified interface for working with different file systems.

Source code in llmeter/results.py
def save(self, output_path: os.PathLike | str | None = None):
    """
    Save the results to disk or cloud storage.

    Saves the run results to the specified output path or the
    instance's default output path. It creates three files:
    1. 'summary.json': Contains the overall summary of the results.
    2. 'stats.json': Contains detailed statistics of the run.
    3. 'responses.jsonl': Contains individual invocation responses
        - Only if the responses are not already saved at the indicated path.


    Args:
        output_path (UPath | str | None, optional): The path where the result
            files will be saved. If None, the instance's default output_path
            will be used. Defaults to None.

    Raises:
        ValueError: If no output path is provided and the instance doesn't
            have a default output_path set.
        TypeError: If the provided output_path is not a valid type.
        IOError: If there's an error writing to the output files.

    Note:
        The method uses the Universal Path (UPath) library for file operations,
        which provides a unified interface for working with different file systems.
    """

    try:
        output_path = Path(self.output_path or output_path)
    except TypeError:
        raise ValueError("No output path provided")

    output_path.mkdir(parents=True, exist_ok=True)

    summary_path = output_path / "summary.json"
    stats_path = output_path / "stats.json"
    with summary_path.open("w") as f, stats_path.open("w") as s:
        f.write(self.to_json(indent=4))
        s.write(json.dumps(self.stats, indent=4, default=utc_datetime_serializer))

    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in self.responses:
                f.write(json.dumps(asdict(response)) + "\n")
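The path resolution at the top of `save()` is worth checking in isolation, since it decides where all three files are written. The `resolve_output_path` function below is a hypothetical extraction of that `try` block, not part of the library.

```python
from pathlib import Path


def resolve_output_path(instance_path, argument_path):
    """Mirror of the listing: the first truthy value wins, with the instance path checked first."""
    try:
        return Path(instance_path or argument_path)
    except TypeError:
        # Path(None) raises TypeError when neither value is set.
        raise ValueError("No output path provided")


print(resolve_output_path("runs/a", "runs/b"))  # instance path is used
print(resolve_output_path(None, "runs/b"))      # argument is the fallback
```

With both values set, the instance path is chosen, and with neither set, the `TypeError` from `Path(None)` surfaces as a `ValueError`.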

to_dict

to_dict(include_responses=False)

Return the results as a dictionary.

Source code in llmeter/results.py
def to_dict(self, include_responses: bool = False):
    """Return the results as a dictionary."""
    if include_responses:
        return asdict(self)
    return {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }

to_json

to_json(**kwargs)

Return the results as a JSON string.

Source code in llmeter/results.py
def to_json(self, **kwargs):
    """Return the results as a JSON string."""
    summary = {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }
    return json.dumps(summary, default=utc_datetime_serializer, **kwargs)
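Both `to_dict()` and `to_json()` build the summary by filtering the output of `dataclasses.asdict`. The sketch below reproduces that pattern on a small stand-in; `MiniResult`, `to_summary`, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in with a few of Result's fields."""
    responses: list
    total_requests: int
    model_id: Optional[str] = None


def to_summary(result):
    """Drop the bulky per-request data and keep only run-level fields."""
    return {
        k: v for k, v in asdict(result).items() if k not in ["responses", "stats"]
    }


mini = MiniResult(
    responses=[{"id": "a"}, {"id": "b"}],
    total_requests=2,
    model_id="demo-model",
)
summary = to_summary(mini)
summary_json = json.dumps(summary, indent=4)
print(summary_json)
```

Keeping the responses out of the summary is what lets 'summary.json' stay small while 'responses.jsonl' holds the per-request records.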

utc_datetime_serializer

utc_datetime_serializer(obj)

Serialize datetime objects to UTC ISO format strings.

Parameters:

Name Type Description Default
obj Any

Object to serialize. If datetime, converts to ISO format string with 'Z' timezone. Otherwise returns string representation.

required

Returns:

Name Type Description
str str

ISO format string with 'Z' timezone for datetime objects, or string representation for other objects.

Source code in llmeter/results.py
def utc_datetime_serializer(obj: Any) -> str:
    """
    Serialize datetime objects to UTC ISO format strings.

    Args:
        obj: Object to serialize. If datetime, converts to ISO format string with 'Z' timezone.
             Otherwise returns string representation.

    Returns:
        str: ISO format string with 'Z' timezone for datetime objects, or string representation
             for other objects.
    """
    if isinstance(obj, datetime):
        # Convert to UTC if timezone is set
        if obj.tzinfo is not None:
            obj = obj.astimezone(timezone.utc)
        return obj.isoformat(timespec="seconds").replace("+00:00", "Z")
    return str(obj)
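A short usage sketch shows how the serializer behaves as the `default` hook of `json.dumps`. The function body is copied from the listing so that the block runs on its own, and the sample timestamps are made up.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any


def utc_datetime_serializer(obj: Any) -> str:
    """Copied from the listing above so the example is self-contained."""
    if isinstance(obj, datetime):
        if obj.tzinfo is not None:
            obj = obj.astimezone(timezone.utc)
        return obj.isoformat(timespec="seconds").replace("+00:00", "Z")
    return str(obj)


# An aware datetime two hours ahead of UTC is converted to UTC and given a "Z" suffix.
aware = datetime(2024, 5, 1, 12, 30, 15, tzinfo=timezone(timedelta(hours=2)))
aware_text = utc_datetime_serializer(aware)  # "2024-05-01T10:30:15Z"

# A naive datetime has no offset to replace, so no "Z" is appended.
naive = datetime(2024, 5, 1, 12, 30, 15, 999)
naive_text = utc_datetime_serializer(naive)  # "2024-05-01T12:30:15"

payload = json.dumps({"start_time": aware}, default=utc_datetime_serializer)
print(aware_text, naive_text, payload)
```

The naive case shows that the 'Z' suffix appears only for timezone-aware inputs; naive datetimes are emitted without any timezone marker, and sub-second precision is dropped in both cases.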