Results

results

Result `dataclass`

Result(responses, total_requests, clients, n_requests, total_test_time=None, model_id=None, output_path=None, endpoint_name=None, provider=None, run_name=None, run_description=None, start_time=None, end_time=None)

Results of a test run.

stats `property`

stats

Run metrics and aggregated statistics over the individual requests

This combined view includes:

- Basic information about the run (from the Result's dictionary representation)
- Aggregated statistics ('average', 'p50', 'p90', 'p99') for:
    - Time to last token
    - Time to first token
    - Number of tokens output
    - Number of tokens input

Aggregated statistics are keyed in the format "{stat_name}-{aggregation_type}"
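The key format can be illustrated with a minimal sketch that uses only the standard library. The metric names (`time_to_first_token`, `time_to_last_token`), the sample values, and the `aggregate` helper are assumptions for illustration; the percentile method LLMeter uses internally may differ.

```python
from statistics import mean, quantiles


def aggregate(name, values):
    """Hypothetical helper: build '{stat_name}-{aggregation_type}' keys."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        f"{name}-average": mean(values),
        f"{name}-p50": cuts[49],
        f"{name}-p90": cuts[89],
        f"{name}-p99": cuts[98],
    }


# Made-up latency samples, in seconds
samples = {
    "time_to_first_token": [0.21, 0.25, 0.19, 0.40, 0.22],
    "time_to_last_token": [1.10, 1.32, 0.98, 2.05, 1.15],
}

stats = {}
for stat_name, values in samples.items():
    stats.update(aggregate(stat_name, values))

print(sorted(stats))
```

Each metric contributes four keys, so a lookup such as `stats["time_to_first_token-p90"]` needs no nested dictionaries.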

This property is read-only and returns a new shallow copy of the data on each access. Default stats provided by LLMeter are calculated on first access and then cached. Callbacks or other mechanisms needing to augment stats should use the _update_contributed_stats() method.

When the Result was loaded with load_responses=False, pre-computed stats from stats.json are returned if available. Call load_responses() to load the individual responses and recompute stats from the raw data.

__post_init__

__post_init__()

Initialize the Result instance.

Source code in llmeter/results.py

def __post_init__(self):
    """Initialize the Result instance."""
    self._contributed_stats = {}
    if not hasattr(self, "_preloaded_stats"):
        self._preloaded_stats = None

get_dimension

get_dimension(dimension, filter_dimension=None, filter_value=None)

Get the values of a specific dimension from the responses.

Parameters:

Name	Type	Description	Default
`dimension`	`str`	The name of the dimension to retrieve.	required
`filter_dimension`	`str`	Name of dimension to filter on. Defaults to None.	`None`
`filter_value`	`any`	Value to match for the filter dimension. Defaults to None.	`None`

Returns:

Name	Type	Description
`list`	`list`	A list of values for the specified dimension across all responses.

Raises:

Type	Description
`ValueError`	Documented for a dimension that is not found in any response. Note that the source listing below logs a warning and returns the values instead of raising.

Source code in llmeter/results.py

def get_dimension(
    self,
    dimension: str,
    filter_dimension: str | None = None,
    filter_value: Any = None,
) -> list:
    """
    Get the values of a specific dimension from the responses.

    Args:
        dimension (str): The name of the dimension to retrieve.
        filter_dimension (str, optional): Name of dimension to filter on. Defaults to None.
        filter_value (any, optional): Value to match for the filter dimension. Defaults to None.

    Returns:
        list: A list of values for the specified dimension across all responses.

    Raises:
        ValueError: If the specified dimension is not found in any response.
    """
    if filter_dimension is not None:
        values = [
            getattr(response, dimension)
            for response in self.responses
            if getattr(response, filter_dimension) == filter_value
        ]
    else:
        values = [getattr(response, dimension) for response in self.responses]

    if not any(values):
        # raise ValueError(f"Dimension {dimension} not found in any response")
        logger.warning(f"Dimension {dimension} not found in any response")
    return values
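The selection logic in the listing above can be tried out without LLMeter installed. The sketch below is an assumption-laden stand-in: `StubResponse` and its fields are hypothetical replacements for `InvocationResponse`, and the free function mirrors only the filtering, not the logging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubResponse:
    """Hypothetical stand-in for InvocationResponse."""
    id: str
    num_tokens_output: Optional[int]
    error: Optional[str] = None


def get_dimension(responses, dimension, filter_dimension=None, filter_value=None):
    """Mirror of the listing's selection logic, without the warning."""
    return [
        getattr(r, dimension)
        for r in responses
        if filter_dimension is None or getattr(r, filter_dimension) == filter_value
    ]


responses = [
    StubResponse("a", 12),
    StubResponse("b", None, error="timeout"),
    StubResponse("c", 30),
]

# No filter: one value per response, including None for the failed one
all_tokens = get_dimension(responses, "num_tokens_output")

# Filter on error == None: keep only the successful responses
ok_tokens = get_dimension(responses, "num_tokens_output", "error", None)

# The listing warns when `not any(values)`; that is also true when every
# value is 0 or None, not only when the dimension is missing
all_falsy = not any([0, 0, None])
```

Because the check in the listing is `not any(values)`, a dimension whose values are all zero would trigger the same warning as one that is absent.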

load `classmethod`

load(result_path, load_responses=True)

Load run results from disk or cloud storage.

Reads previously saved run results from the specified path. It expects 'summary.json' to be present in the given directory. By default, also loads 'responses.jsonl' containing individual invocation responses.

Parameters:

Name	Type	Description	Default
`result_path`	`UPath \| str`	The path to the directory containing the result files. Can be a string or a UPath object.	required
`load_responses`	`bool`	Whether to load individual invocation responses from 'responses.jsonl'. Defaults to True. When False, only the summary and pre-computed stats are loaded, which is significantly faster for large result sets. Use `result.load_responses()` to load them on demand later.	`True`

Returns:

Name	Type	Description
`Result`	`Result`	An instance of the Result class containing the loaded responses and summary data.

Raises:

Type	Description
`FileNotFoundError`	If required files are not found in the specified directory.
`JSONDecodeError`	If there's an issue parsing the JSON data in either file.

Source code in llmeter/results.py

@classmethod
def load(
    cls, result_path: os.PathLike | str, load_responses: bool = True
) -> "Result":
    """
    Load run results from disk or cloud storage.

    Reads previously saved run results from the specified
    path. It expects 'summary.json' to be present in the given directory.
    By default, also loads 'responses.jsonl' containing individual invocation
    responses.

    Args:
        result_path (UPath | str): The path to the directory containing the
            result files. Can be a string or a UPath object.
        load_responses (bool): Whether to load individual invocation responses
            from 'responses.jsonl'. Defaults to True. When False, only the
            summary and pre-computed stats are loaded, which is significantly
            faster for large result sets. Use ``result.load_responses()`` to
            load them on demand later.

    Returns:
        Result: An instance of the Result class containing the loaded
        responses and summary data.

    Raises:
        FileNotFoundError: If required files are not found in the specified
            directory.
        JSONDecodeError: If there's an issue parsing the JSON data in
            either file.

    """
    result_path = Path(result_path)
    summary_path = result_path / "summary.json"

    with summary_path.open("r") as g:
        summary = json.load(g)

    # Convert datetime strings back to datetime objects
    for key in ["start_time", "end_time"]:
        if key in summary and summary[key] and isinstance(summary[key], str):
            try:
                summary[key] = datetime.fromisoformat(summary[key])
            except ValueError:
                pass

    # Ensure output_path is set so load_responses() can find the files later
    if "output_path" not in summary or summary["output_path"] is None:
        summary["output_path"] = str(result_path)

    if load_responses:
        responses_path = result_path / "responses.jsonl"
        with responses_path.open("r") as f:
            responses = [
                InvocationResponse(**json.loads(line)) for line in f if line
            ]
    else:
        responses = []
        responses_path = result_path / "responses.jsonl"
        logger.info(
            "Loaded summary only (responses not loaded). "
            "Individual responses are stored at: %s. "
            "Call result.load_responses() to load them on demand.",
            responses_path,
        )

    result = cls(responses=responses, **summary)

    # When skipping responses, load pre-computed stats from stats.json if available
    # so that result.stats works without needing the responses
    if not load_responses:
        stats_path = result_path / "stats.json"
        if stats_path.exists():
            with stats_path.open("r") as s:
                result._preloaded_stats = json.loads(s.read())
                # Convert datetime strings in stats
                for key in ["start_time", "end_time"]:
                    val = result._preloaded_stats.get(key)
                    if val and isinstance(val, str):
                        try:
                            result._preloaded_stats[key] = datetime.fromisoformat(
                                val
                            )
                        except ValueError:
                            pass
        else:
            result._preloaded_stats = None

    return result
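The timestamp handling in the listing above can be sketched in isolation. `restore_times` is a hypothetical helper name and the summary values are made up; the loop itself follows the listing.

```python
from datetime import datetime


def restore_times(summary):
    """Mirror of the conversion loop in the listing above."""
    for key in ["start_time", "end_time"]:
        value = summary.get(key)
        if value and isinstance(value, str):
            try:
                summary[key] = datetime.fromisoformat(value)
            except ValueError:
                pass  # strings that cannot be parsed are kept unchanged
    return summary


summary = restore_times(
    {
        "start_time": "2024-01-02T01:04:05+00:00",
        "end_time": "not-a-timestamp",
        "model_id": "my-model",
    }
)

print(type(summary["start_time"]).__name__, summary["end_time"])
```

One consequence worth knowing: `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so on older interpreters a value such as `2024-01-02T01:04:05Z` would fall into the `except` branch and stay a string.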

load_responses

load_responses()

Load individual invocation responses from disk or cloud storage.

Reads the 'responses.jsonl' file from the result's output_path directory. This is useful when the Result was loaded with load_responses=False and you need to access the individual responses on demand.

Returns:

Type	Description
`list[InvocationResponse]`	The loaded responses. Also updates `self.responses` in place.

Raises:

Type	Description
`ValueError`	If no output_path is set on this Result.
`FileNotFoundError`	If 'responses.jsonl' is not found at the output_path.

Source code in llmeter/results.py

def load_responses(self) -> list[InvocationResponse]:
    """
    Load individual invocation responses from disk or cloud storage.

    Reads the 'responses.jsonl' file from the result's output_path directory.
    This is useful when the Result was loaded with ``load_responses=False`` and
    you need to access the individual responses on demand.

    Returns:
        list[InvocationResponse]: The loaded responses. Also updates ``self.responses``
        in place.

    Raises:
        ValueError: If no output_path is set on this Result.
        FileNotFoundError: If 'responses.jsonl' is not found at the output_path.
    """
    if not self.output_path:
        raise ValueError(
            "No output_path set on this Result. Cannot locate responses file."
        )
    responses_path = Path(self.output_path) / "responses.jsonl"
    with responses_path.open("r") as f:
        self.responses = [
            InvocationResponse(**json.loads(line)) for line in f if line
        ]
    logger.info("Loaded %d responses from %s", len(self.responses), responses_path)
    # Invalidate cached stats so they are recomputed with the loaded responses
    self.__dict__.pop("_builtin_stats", None)
    return self.responses
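As a minimal sketch of the JSON Lines parsing step, the example below reads records from an in-memory buffer. The record fields are assumptions, and the `line.strip()` guard is a variation on the listing rather than LLMeter's behavior: lines iterated from a file keep their trailing newline, so a bare `if line` does not skip blank lines.

```python
import io
import json

# Stand-in for an open responses.jsonl file, with a blank line in the middle
raw = io.StringIO(
    '{"id": "a", "num_tokens_output": 12}\n'
    "\n"
    '{"id": "b", "num_tokens_output": 30}\n'
)

# Strip before testing so whitespace-only lines are skipped
records = [json.loads(line) for line in raw if line.strip()]

print(len(records))
```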

save

save(output_path=None)

Save the results to disk or cloud storage.

Saves the run results to the specified output path or the instance's default output path. It creates three files:

1. 'summary.json': Contains the overall summary of the results.
2. 'stats.json': Contains detailed statistics of the run.
3. 'responses.jsonl': Contains individual invocation responses. Written only if the responses are not already saved at the indicated path.

Parameters:

Name	Type	Description	Default
`output_path`	`UPath \| str \| None`	The path where the result files will be saved. If None, the instance's default output_path will be used. Defaults to None.	`None`

Raises:

Type	Description
`ValueError`	If no output path is provided and the instance doesn't have a default output_path set.
`TypeError`	If the provided output_path is not a valid type.
`IOError`	If there's an error writing to the output files.

Note

The method uses the Universal Path (UPath) library for file operations, which provides a unified interface for working with different file systems.

Source code in llmeter/results.py

def save(self, output_path: os.PathLike | str | None = None):
    """
    Save the results to disk or cloud storage.

    Saves the run results to the specified output path or the
    instance's default output path. It creates three files:
    1. 'summary.json': Contains the overall summary of the results.
    2. 'stats.json': Contains detailed statistics of the run.
    3. 'responses.jsonl': Contains individual invocation responses
        - Only if the responses are not already saved at the indicated path.


    Args:
        output_path (UPath | str | None, optional): The path where the result
            files will be saved. If None, the instance's default output_path
            will be used. Defaults to None.

    Raises:
        ValueError: If no output path is provided and the instance doesn't
            have a default output_path set.
        TypeError: If the provided output_path is not a valid type.
        IOError: If there's an error writing to the output files.

    Note:
        The method uses the Universal Path (UPath) library for file operations,
        which provides a unified interface for working with different file systems.
    """

    try:
        output_path = Path(self.output_path or output_path)
    except TypeError:
        raise ValueError("No output path provided")

    output_path.mkdir(parents=True, exist_ok=True)

    summary_path = output_path / "summary.json"
    stats_path = output_path / "stats.json"
    with summary_path.open("w") as f, stats_path.open("w") as s:
        f.write(self.to_json(indent=4))
        s.write(json.dumps(self.stats, indent=4, default=utc_datetime_serializer))

    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in self.responses:
                f.write(json.dumps(asdict(response)) + "\n")
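The file layout and the "write responses only once" rule can be reproduced with the standard library alone. In this sketch `save_sketch` is a hypothetical function, plain dictionaries stand in for the Result and its responses, and `pathlib.Path` replaces the UPath-based path handling.

```python
import json
import tempfile
from pathlib import Path


def save_sketch(output_path, summary, stats, responses):
    """Hypothetical re-creation of the three-file layout."""
    output_path = Path(output_path)
    output_path.mkdir(parents=True, exist_ok=True)

    # summary.json and stats.json are rewritten on every save
    (output_path / "summary.json").write_text(json.dumps(summary, indent=4))
    (output_path / "stats.json").write_text(json.dumps(stats, indent=4))

    # responses.jsonl is written only if it does not exist yet
    responses_path = output_path / "responses.jsonl"
    if not responses_path.exists():
        with responses_path.open("w") as f:
            for response in responses:
                f.write(json.dumps(response) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run-1"
    save_sketch(run_dir, {"total_requests": 2}, {"total_requests": 2},
                [{"id": "a"}, {"id": "b"}])
    # A second save updates the summary but leaves responses.jsonl untouched
    save_sketch(run_dir, {"total_requests": 1}, {"total_requests": 1},
                [{"id": "z"}])

    files = sorted(p.name for p in run_dir.iterdir())
    kept_lines = (run_dir / "responses.jsonl").read_text().splitlines()
    summary_after = json.loads((run_dir / "summary.json").read_text())
```

After the second call the directory still holds the two original response lines, which shows why saving into a directory that already has a `responses.jsonl` does not refresh it.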

to_dict

to_dict(include_responses=False)

Return the results as a dictionary.

Source code in llmeter/results.py

def to_dict(self, include_responses: bool = False):
    """Return the results as a dictionary."""
    if include_responses:
        return asdict(self)
    return {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }

to_json

to_json(**kwargs)

Return the results as a JSON string.

Source code in llmeter/results.py

def to_json(self, **kwargs):
    """Return the results as a JSON string."""
    summary = {
        k: o for k, o in asdict(self).items() if k not in ["responses", "stats"]
    }
    return json.dumps(summary, default=utc_datetime_serializer, **kwargs)
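The effect of `include_responses` in `to_dict`, and of the filtering in `to_json`, can be seen on a small stand-in. `MiniResult` is hypothetical and carries only a few of Result's fields; it is a sketch of the `asdict`-and-filter pattern, not the library class.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in with a subset of Result's fields."""
    responses: list = field(default_factory=list)
    total_requests: int = 0
    model_id: Optional[str] = None

    def to_dict(self, include_responses=False):
        data = asdict(self)
        if include_responses:
            return data
        # Drop the potentially large list of individual responses
        return {k: v for k, v in data.items() if k != "responses"}


result = MiniResult(responses=[{"id": "a"}], total_requests=1, model_id="my-model")

summary = result.to_dict()
full = result.to_dict(include_responses=True)
as_json = json.dumps(summary, indent=2)
```

Keeping the responses out of the summary is what lets `summary.json` stay small while the per-request data lives in `responses.jsonl`.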

utc_datetime_serializer

utc_datetime_serializer(obj)

Serialize datetime objects to UTC ISO format strings.

Parameters:

Name	Type	Description	Default
`obj`	`Any`	Object to serialize. If datetime, converts to ISO format string with 'Z' timezone. Otherwise returns string representation.	required

Returns:

Name	Type	Description
`str`	`str`	ISO format string with 'Z' timezone for datetime objects, or string representation for other objects.

Source code in llmeter/results.py

def utc_datetime_serializer(obj: Any) -> str:
    """
    Serialize datetime objects to UTC ISO format strings.

    Args:
        obj: Object to serialize. If datetime, converts to ISO format string with 'Z' timezone.
             Otherwise returns string representation.

    Returns:
        str: ISO format string with 'Z' timezone for datetime objects, or string representation
             for other objects.
    """
    if isinstance(obj, datetime):
        # Convert to UTC if timezone is set
        if obj.tzinfo is not None:
            obj = obj.astimezone(timezone.utc)
        return obj.isoformat(timespec="seconds").replace("+00:00", "Z")
    return str(obj)
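A short example of the serializer's behavior, with the function body copied from the listing above so the snippet runs on its own. The timestamps are made up.

```python
import json
from datetime import datetime, timedelta, timezone


def utc_datetime_serializer(obj):
    # Copied from the listing above so the example is self-contained
    if isinstance(obj, datetime):
        if obj.tzinfo is not None:
            obj = obj.astimezone(timezone.utc)
        return obj.isoformat(timespec="seconds").replace("+00:00", "Z")
    return str(obj)


# Timezone-aware value at UTC+2: converted to UTC and suffixed with "Z"
aware = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=2)))
aware_out = utc_datetime_serializer(aware)  # "2024-01-02T01:04:05Z"

# Naive value: no conversion is applied and no "Z" suffix is added
naive = datetime(2024, 1, 2, 3, 4, 5)
naive_out = utc_datetime_serializer(naive)  # "2024-01-02T03:04:05"

# Typical use: as the `default` hook of json.dumps
payload = json.dumps({"start_time": aware}, default=utc_datetime_serializer)
```

Note that naive datetimes are serialized without a timezone marker, so the `Z` suffix appears only when the input carries timezone information.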
