
Results

results

Result dataclass

Result(responses, total_requests=None, clients=1, n_requests=None, total_test_time=None, model_id=None, output_path=None, endpoint_name=None, provider=None, run_name=None, run_description=None, start_time=None, first_request_time=None, last_request_time=None, end_time=None)

Results of a test run.

stats property

stats

Run metrics and aggregated statistics over the individual requests.

Returns a flat dictionary combining:

  • Basic run information (from to_dict()).
  • Aggregated statistics (average, p50, p90, p99) for time_to_last_token, time_to_first_token, num_tokens_output, and num_tokens_input. Keys use the format "{metric}-{aggregation}".
  • Run-level throughput metrics (requests_per_minute, total_input_tokens, etc.).
  • Any additional stats contributed by callbacks via _update_contributed_stats().

During a live run, stats are computed incrementally by RunningStats (from llmeter.utils) and stored in _preloaded_stats. When loading from disk with load_responses=False, pre-computed stats from stats.json are used. As a fallback (e.g. a manually constructed Result), stats are computed on the fly from self.responses.

Returns:

Type Description
dict

A new shallow copy of the stats dictionary on each access.

Example:

result = await runner.run(payload=my_payload, clients=5)
result.stats["time_to_first_token-p50"]   # 0.312
result.stats["requests_per_minute"]        # 141.2
result.stats["failed_requests"]            # 0

__post_init__

__post_init__()

Initialize the Result instance.

Source code in llmeter/results.py
def __post_init__(self):
    """Initialize the Result instance."""
    self._contributed_stats = {}
    if not hasattr(self, "_preloaded_stats"):
        self._preloaded_stats = None

get_dimension

get_dimension(dimension, filter_dimension=None, filter_value=None)

Get the values of a specific dimension from the responses.

Parameters:

Name Type Description Default
dimension str

The name of the dimension to retrieve.

required
filter_dimension str

Name of dimension to filter on. Defaults to None.

None
filter_value any

Value to match for the filter dimension. Defaults to None.

None

Returns:

Name Type Description
list list

A list of values for the specified dimension across all responses.

Raises:

Type Description
AttributeError

If the specified dimension is not an attribute of a response. (If the dimension exists but every collected value is falsy, the current implementation logs a warning instead of raising and still returns the list.)

Source code in llmeter/results.py
def get_dimension(
    self,
    dimension: str,
    filter_dimension: str | None = None,
    filter_value: Any = None,
) -> list:
    """
    Get the values of a specific dimension from the responses.

    Args:
        dimension (str): The name of the dimension to retrieve.
        filter_dimension (str, optional): Name of dimension to filter on. Defaults to None.
        filter_value (any, optional): Value to match for the filter dimension. Defaults to None.

    Returns:
        list: A list of values for the specified dimension across all responses.

    Raises:
        AttributeError: If the specified dimension is not an attribute
            of a response.
    """
    if filter_dimension is not None:
        values = [
            getattr(response, dimension)
            for response in self.responses
            if getattr(response, filter_dimension) == filter_value
        ]
    else:
        values = [getattr(response, dimension) for response in self.responses]

    if not any(values):
        # raise ValueError(f"Dimension {dimension} not found in any response")
        logger.warning(f"Dimension {dimension} not found in any response")
    return values
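The filtering above is a plain attribute lookup over self.responses. A self-contained sketch of the same logic, using SimpleNamespace objects as stand-ins for the real InvocationResponse objects (the field names here are illustrative):

```python
from types import SimpleNamespace

# Hypothetical responses standing in for InvocationResponse objects.
responses = [
    SimpleNamespace(num_tokens_output=120, model_id="model-a"),
    SimpleNamespace(num_tokens_output=95, model_id="model-b"),
    SimpleNamespace(num_tokens_output=110, model_id="model-a"),
]

# Unfiltered: collect the dimension across all responses.
all_tokens = [getattr(r, "num_tokens_output") for r in responses]

# Filtered: keep only responses where filter_dimension == filter_value,
# mirroring get_dimension("num_tokens_output", "model_id", "model-a").
model_a_tokens = [
    getattr(r, "num_tokens_output")
    for r in responses
    if getattr(r, "model_id") == "model-a"
]

print(all_tokens)      # [120, 95, 110]
print(model_a_tokens)  # [120, 110]
```

On a real run, result.get_dimension("num_tokens_output", "model_id", "model-a") applies the same comprehensions to the loaded response objects.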

load classmethod

load(result_path, load_responses=True)

Load run results from disk or cloud storage.

Reads previously saved run results from the specified path. It expects 'summary.json' to be present in the given directory. By default, also loads 'responses.jsonl' containing individual invocation responses.

Parameters:

Name Type Description Default
result_path UPath | str

The path to the directory containing the result files. Can be a string or a UPath object.

required
load_responses bool

Whether to load individual invocation responses from 'responses.jsonl'. Defaults to True. When False, only the summary and pre-computed stats are loaded, which is significantly faster for large result sets. Use result.load_responses() to load them on demand later.

True

Returns:

Name Type Description
Result Result

An instance of the Result class containing the loaded responses and summary data.

Raises:

Type Description
FileNotFoundError

If required files are not found in the specified directory.

JSONDecodeError

If there's an issue parsing the JSON data in either file.

Source code in llmeter/results.py
@classmethod
def load(
    cls, result_path: ReadablePathLike, load_responses: bool = True
) -> "Result":
    """
    Load run results from disk or cloud storage.

    Reads previously saved run results from the specified
    path. It expects 'summary.json' to be present in the given directory.
    By default, also loads 'responses.jsonl' containing individual invocation
    responses.

    Args:
        result_path (UPath | str): The path to the directory containing the
            result files. Can be a string or a UPath object.
        load_responses (bool): Whether to load individual invocation responses
            from 'responses.jsonl'. Defaults to True. When False, only the
            summary and pre-computed stats are loaded, which is significantly
            faster for large result sets. Use ``result.load_responses()`` to
            load them on demand later.

    Returns:
        Result: An instance of the Result class containing the loaded
        responses and summary data.

    Raises:
        FileNotFoundError: If required files are not found in the specified
            directory.
        JSONDecodeError: If there's an issue parsing the JSON data in
            either file.

    """
    result_path = ensure_path(result_path)
    summary_path = result_path / "summary.json"

    with summary_path.open("r") as g:
        summary = json.load(g)

    # Convert datetime strings back to datetime objects
    for key in [
        "start_time",
        "end_time",
        "first_request_time",
        "last_request_time",
    ]:
        if key in summary and summary[key] and isinstance(summary[key], str):
            try:
                summary[key] = datetime.fromisoformat(
                    summary[key].replace("Z", "+00:00")
                )
            except ValueError:
                pass

    # Ensure output_path is set so load_responses() can find the files later
    if "output_path" not in summary or summary["output_path"] is None:
        summary["output_path"] = str(result_path)

    if load_responses:
        responses_path = result_path / "responses.jsonl"
        with responses_path.open("r") as f:
            responses = [InvocationResponse.from_json(line) for line in f if line]
    else:
        responses = []
        responses_path = result_path / "responses.jsonl"
        logger.info(
            "Loaded summary only (responses not loaded). "
            "Individual responses are stored at: %s. "
            "Call result.load_responses() to load them on demand.",
            responses_path,
        )

    result = cls(responses=responses, **summary)

    # Load or compute stats
    if not load_responses:
        # Use pre-computed stats from disk when responses aren't loaded
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                result._preloaded_stats = json.loads(s.read())
                # Convert datetime strings in stats
                for key in [
                    "start_time",
                    "end_time",
                    "first_request_time",
                    "last_request_time",
                ]:
                    val = result._preloaded_stats.get(key)
                    if val and isinstance(val, str):
                        try:
                            result._preloaded_stats[key] = datetime.fromisoformat(
                                val.replace("Z", "+00:00")
                            )
                        except ValueError:
                            pass
        else:
            result._preloaded_stats = None
    else:
        # Compute stats from the loaded responses, but also merge any
        # contributed stats that were persisted in stats.json so they
        # survive a save/load round-trip.
        result._preloaded_stats = cls._compute_stats(result)
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                saved_stats = json.loads(s.read())
            # Contributed stats are any keys in the saved file that are
            # not produced by _compute_stats (i.e. they came from callbacks).
            for key, value in saved_stats.items():
                if key not in result._preloaded_stats:
                    result._preloaded_stats[key] = value

    return result
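The datetime round-trip in load() can be reproduced with the standard library alone. This sketch writes a minimal summary.json (the field values are illustrative) and applies the same ISO-8601 parsing the method uses:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Write a minimal summary.json (field values are illustrative).
tmp = Path(tempfile.mkdtemp())
summary = {"model_id": "my-model", "start_time": "2024-01-01T12:00:00Z"}
(tmp / "summary.json").write_text(json.dumps(summary))

# load() applies the same parsing, mapping a trailing "Z" to "+00:00"
# so datetime.fromisoformat accepts it on older Python versions.
loaded = json.loads((tmp / "summary.json").read_text())
for key in ["start_time", "end_time", "first_request_time", "last_request_time"]:
    val = loaded.get(key)
    if val and isinstance(val, str):
        try:
            loaded[key] = datetime.fromisoformat(val.replace("Z", "+00:00"))
        except ValueError:
            pass

print(loaded["start_time"].isoformat())  # 2024-01-01T12:00:00+00:00
```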

load_responses

load_responses()

Load individual invocation responses from disk or cloud storage.

Reads the 'responses.jsonl' file from the result's output_path directory. This is useful when the Result was loaded with load_responses=False and you need to access the individual responses on demand.

Returns:

Type Description
list[InvocationResponse]

The loaded responses. Also updates self.responses in place.

Raises:

Type Description
ValueError

If no output_path is set on this Result.

FileNotFoundError

If 'responses.jsonl' is not found at the output_path.

Source code in llmeter/results.py
def load_responses(self) -> list[InvocationResponse]:
    """
    Load individual invocation responses from disk or cloud storage.

    Reads the 'responses.jsonl' file from the result's output_path directory.
    This is useful when the Result was loaded with ``load_responses=False`` and
    you need to access the individual responses on demand.

    Returns:
        list[InvocationResponse]: The loaded responses. Also updates ``self.responses``
        in place.

    Raises:
        ValueError: If no output_path is set on this Result.
        FileNotFoundError: If 'responses.jsonl' is not found at the output_path.
    """
    if not self.output_path:
        raise ValueError(
            "No output_path set on this Result. Cannot locate responses file."
        )
    responses_path = ensure_path(self.output_path) / "responses.jsonl"
    with responses_path.open("r") as f:
        self.responses = [InvocationResponse.from_json(line) for line in f if line]
    logger.info("Loaded %d responses from %s", len(self.responses), responses_path)
    # Recompute stats from the freshly loaded responses
    self._preloaded_stats = self._compute_stats(self)
    return self.responses
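responses.jsonl stores one JSON object per line, and load_responses() parses each non-empty line. A stdlib-only sketch of that parsing (the record fields are illustrative, not the real InvocationResponse schema):

```python
import json
import tempfile
from pathlib import Path

# Write a small responses.jsonl with one JSON object per line
# (record fields are illustrative).
tmp = Path(tempfile.mkdtemp())
records = [
    {"id": "req-1", "num_tokens_output": 42},
    {"id": "req-2", "num_tokens_output": 17},
]
with (tmp / "responses.jsonl").open("w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# load_responses() parses each non-empty line the same way:
with (tmp / "responses.jsonl").open("r") as f:
    responses = [json.loads(line) for line in f if line.strip()]

print(len(responses))  # 2
```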

save

save(output_path=None)

Save the results to disk or cloud storage.

Saves the run results to the specified output path or the instance's default output path. It creates three files:

  1. 'summary.json': the overall summary of the results.
  2. 'stats.json': detailed statistics of the run.
  3. 'responses.jsonl': individual invocation responses (only written if the file is not already present at the indicated path).

Parameters:

Name Type Description Default
output_path UPath | str | None

The path where the result files will be saved. If None, the instance's default output_path will be used. Defaults to None.

None

Raises:

Type Description
ValueError

If no output path is provided and the instance doesn't have a default output_path set.

TypeError

If the provided output_path is not a valid type.

IOError

If there's an error writing to the output files.

Note

The method uses the Universal Path (UPath) library for file operations, which provides a unified interface for working with different file systems.

Source code in llmeter/results.py
def save(self, output_path: WritablePathLike | None = None):
    """
    Save the results to disk or cloud storage.

    Saves the run results to the specified output path or the
    instance's default output path. It creates three files:
    1. 'summary.json': Contains the overall summary of the results.
    2. 'stats.json': Contains detailed statistics of the run.
    3. 'responses.jsonl': Contains individual invocation responses
        - Only if the responses are not already saved at the indicated path.


    Args:
        output_path (UPath | str | None, optional): The path where the result
            files will be saved. If None, the instance's default output_path
            will be used. Defaults to None.

    Raises:
        ValueError: If no output path is provided and the instance doesn't
            have a default output_path set.
        TypeError: If the provided output_path is not a valid type.
        IOError: If there's an error writing to the output files.

    Note:
        The method uses the Universal Path (UPath) library for file operations,
        which provides a unified interface for working with different file systems.
    """

    # Prefer the explicitly passed path, falling back to the instance default.
    output_path = output_path or self.output_path
    if output_path is None:
        raise ValueError("No output path provided")
    output_path = ensure_path(output_path)

    output_path.mkdir(parents=True, exist_ok=True)

    summary_path = output_path / "summary.json"
    stats_path = output_path / "stats.json"
    with summary_path.open("w") as f, stats_path.open("w") as s:
        f.write(self.to_json(indent=4))
        s.write(
            json.dumps(self.stats, indent=4, default=llmeter_default_serializer)
        )

    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in self.responses:
                f.write(
                    json.dumps(asdict(response), default=llmeter_default_serializer)
                    + "\n"
                )

to_dict

to_dict(include_responses=False)

Return a dictionary representation of this result.

Returns a plain dict produced by dataclasses.asdict, preserving native Python types (datetime, UPath, etc.). This is suitable for programmatic access and internal data processing.

For JSON output, use to_json(), which delegates to llmeter_default_serializer for non-serializable types, or pass the dict through json.dumps(result.to_dict(), default=llmeter_default_serializer).

Parameters:

Name Type Description Default
include_responses bool

If True, include the full list of InvocationResponse dicts and the stats key. Defaults to False.

False

Returns:

Name Type Description
dict

A dictionary of result fields with native Python types.

Source code in llmeter/results.py
def to_dict(self, include_responses: bool = False):
    """Return a dictionary representation of this result.

    Returns a plain ``dict`` produced by :func:`dataclasses.asdict`,
    preserving native Python types (``datetime``, ``UPath``, etc.).
    This is suitable for programmatic access and internal data
    processing.

    For JSON output, use :meth:`to_json` which delegates to
    :func:`~llmeter.json_utils.llmeter_default_serializer` for
    non-serializable types, or pass the dict through
    ``json.dumps(result.to_dict(), default=llmeter_default_serializer)``.

    Args:
        include_responses: If ``True``, include the full list of
            :class:`~llmeter.endpoints.base.InvocationResponse` dicts
            and the ``stats`` key.  Defaults to ``False``.

    Returns:
        dict: A dictionary of result fields with native Python types.
    """
    data = asdict(self)
    if include_responses:
        return data
    return {k: v for k, v in data.items() if k not in ["responses", "stats"]}
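The field filtering is a simple dict comprehension over dataclasses.asdict. A minimal stand-in (MiniResult is hypothetical; the real Result has many more fields, but the logic is identical):

```python
from dataclasses import dataclass, asdict, field

# MiniResult is a hypothetical stand-in for the real Result dataclass.
@dataclass
class MiniResult:
    responses: list = field(default_factory=list)
    model_id: str = "my-model"

r = MiniResult(responses=[{"id": "req-1"}])
data = asdict(r)

# include_responses=False drops the heavyweight keys:
summary = {k: v for k, v in data.items() if k not in ["responses", "stats"]}
print(summary)  # {'model_id': 'my-model'}
```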

to_json

to_json(default=llmeter_default_serializer, **kwargs)

Return the results as a JSON string.

Parameters:

Name Type Description Default
default

Fallback serializer. Defaults to llmeter_default_serializer.

llmeter_default_serializer
**kwargs

Extra keyword arguments passed to json.dumps.

{}
Source code in llmeter/results.py
def to_json(self, default=llmeter_default_serializer, **kwargs):
    """Return the results as a JSON string.

    Args:
        default: Fallback serializer. Defaults to
            :func:`~llmeter.json_utils.llmeter_default_serializer`.
        **kwargs: Extra keyword arguments passed to :func:`json.dumps`.
    """
    summary = {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }
    return json.dumps(summary, default=default, **kwargs)
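The default= hook is standard json.dumps behavior: it is invoked for any object json cannot serialize natively. A simplified stand-in for llmeter_default_serializer (the fallback function below is an assumption for illustration, not the library's actual implementation):

```python
import json
from datetime import datetime, timezone

# Simplified fallback serializer in the spirit of llmeter_default_serializer
# (hypothetical: the real function may handle more types). json.dumps calls
# it for every object it cannot serialize natively.
def fallback(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)

summary = {
    "model_id": "my-model",
    "start_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
s = json.dumps(summary, default=fallback)
print(s)  # {"model_id": "my-model", "start_time": "2024-01-01T00:00:00+00:00"}
```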