Bedrock invoke

bedrock_invoke

BedrockInvoke

BedrockInvoke(model_id, endpoint_name=None, region=None, bedrock_boto3_client=None, max_attempts=3, generated_text_jmespath='choices[0].message.content', generated_token_count_jmespath='usage.completion_tokens', input_text_jmespath='messages[].content[].text', input_token_count_jmespath='usage.prompt_tokens')

Bases: Endpoint

LLMeter Endpoint for the Amazon Bedrock InvokeModel API (non-streaming).

The default `..._jmespath` parameters assume your target model uses an OpenAI ChatCompletions-like API, which is true for many (but not all) Bedrock models. You'll need to override them if targeting a model with a different request/response format.
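As a quick illustration of what those default queries select, here is an OpenAI-style response body with plain-dict access shown next to each JMESPath expression (the sample values are made up):

```python
# Sample ChatCompletions-style response body (values are illustrative only).
response = {
    "choices": [{"message": {"content": "Hello from the model!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 5},
}

# Plain-dict equivalents of the default JMESPath queries:
generated_text = response["choices"][0]["message"]["content"]  # 'choices[0].message.content'
output_tokens = response["usage"]["completion_tokens"]         # 'usage.completion_tokens'
input_tokens = response["usage"]["prompt_tokens"]              # 'usage.prompt_tokens'
```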

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model_id` | `str` | The identifier of the model to use. | *required* |
| `endpoint_name` | `str \| None` | Name of the endpoint. | `None` |
| `region` | `str \| None` | AWS region to use. Defaults to the `bedrock_boto3_client`'s region, or the region configured via the AWS CLI. | `None` |
| `bedrock_boto3_client` | `Any` | Optional pre-configured boto3 client; otherwise one will be created. | `None` |
| `max_attempts` | `int` | Maximum number of retry attempts. | `3` |
| `generated_text_jmespath` | `str` | JMESPath query to extract generated text from the model response. | `'choices[0].message.content'` |
| `generated_token_count_jmespath` | `str \| None` | JMESPath query to extract the generated token count from the model response. | `'usage.completion_tokens'` |
| `input_text_jmespath` | `str` | JMESPath query to extract input text from the model request payload. | `'messages[].content[].text'` |
| `input_token_count_jmespath` | `str \| None` | JMESPath query to extract the input token count from the response. | `'usage.prompt_tokens'` |
Source code in llmeter/endpoints/bedrock_invoke.py
def __init__(
    self,
    model_id: str,
    endpoint_name: str | None = None,
    region: str | None = None,
    bedrock_boto3_client: Any = None,
    max_attempts: int = 3,
    generated_text_jmespath: str = "choices[0].message.content",
    generated_token_count_jmespath: str | None = "usage.completion_tokens",
    input_text_jmespath: str = "messages[].content[].text",
    input_token_count_jmespath: str | None = "usage.prompt_tokens",
):
    """Create a BedrockInvoke Endpoint

    The default ..._jmespath parameters assume your target model uses an OpenAI
    ChatCompletions-like API, which is true for many (but not all) Bedrock models. You'll need
    to override these if targeting a model with a different request/response format.

    Args:
        model_id:
            The identifier for the model to use
        endpoint_name:
            Name of the endpoint. Defaults to None.
        region:
            AWS region to use. Defaults to bedrock_boto3_client's, or configured from AWS CLI.
        bedrock_boto3_client:
            Optional pre-configured boto3 client, otherwise one will be created.
        max_attempts:
            Maximum number of retry attempts. Defaults to 3.
        generated_text_jmespath:
            JMESPath query to extract generated text from model response.
        generated_token_count_jmespath:
            JMESPath query to extract generated token count from model response.
        input_text_jmespath:
            JMESPath query to extract input text from the model request payload.
        input_token_count_jmespath:
            JMESPath query to extract input token count from the response.
    """
    super().__init__(
        model_id=model_id,
        endpoint_name=endpoint_name or "amazon bedrock",
        provider="bedrock",
    )

    self.generated_text_jmespath = generated_text_jmespath
    self.generated_token_count_jmespath = generated_token_count_jmespath
    self.input_text_jmespath = input_text_jmespath
    self.input_token_count_jmespath = input_token_count_jmespath

    self.region = (
        region
        or (bedrock_boto3_client and bedrock_boto3_client.meta.region_name)
        or boto3.session.Session().region_name
    )
    logger.info(f"Using AWS region: {self.region}")

    self._bedrock_client = bedrock_boto3_client
    if self._bedrock_client is None:
        config = Config(retries={"max_attempts": max_attempts, "mode": "standard"})
        self._bedrock_client = boto3.client(
            "bedrock-runtime", region_name=self.region, config=config
        )

create_payload staticmethod

create_payload(user_message, max_tokens=256, **kwargs)

Create a payload, assuming your target Bedrock model supports a ChatCompletions-like API

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `user_message` | `str \| list[str]` | The user's message or a sequence of messages. | *required* |
| `max_tokens` | `int \| None` | The maximum number of tokens to generate. | `256` |
| `**kwargs` | `Any` | Additional keyword arguments to include in the payload. | `{}` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `dict` | `dict` | The formatted payload for the Bedrock API request. |

Raises:

| Type | Description |
|------|-------------|
| `TypeError` | If `user_message` is not a string or list of strings. |
| `ValueError` | If `max_tokens` is not a positive integer. |

Source code in llmeter/endpoints/bedrock_invoke.py
@staticmethod
def create_payload(
    user_message: str | list[str], max_tokens: int | None = 256, **kwargs: Any
) -> dict:
    """
    Create a payload, assuming your target Bedrock model supports a ChatCompletions-like API

    Args:
        user_message: The user's message or a sequence of messages.
        max_tokens: The maximum number of tokens to generate. Defaults to 256.
        **kwargs: Additional keyword arguments to include in the payload.

    Returns:
        dict: The formatted payload for the Bedrock API request.

    Raises:
        TypeError: If user_message is not a string or list of strings
        ValueError: If max_tokens is not a positive integer
    """
    if not isinstance(user_message, (str, list)):
        raise TypeError("user_message must be a string or list of strings")

    if isinstance(user_message, list):
        if not all(isinstance(msg, str) for msg in user_message):
            raise TypeError("All messages must be strings")
        if not user_message:
            raise ValueError("user_message list cannot be empty")

    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    if isinstance(user_message, str):
        user_message = [user_message]

    try:
        payload: dict = {
            "messages": [
                {"role": "user", "content": [{"text": k, "type": "text"}]}
                for k in user_message
            ],
        }

        if max_tokens:
            payload["max_tokens"] = max_tokens

        payload.update(kwargs)
        return payload

    except Exception as e:
        logger.exception("Failed to create InvokeModel payload")
        raise RuntimeError(f"Failed to create payload: {str(e)}") from e
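Mirroring the payload-building logic above in a standalone sketch (`build_chat_payload` is a hypothetical helper for illustration, not part of LLMeter), a single user message produces a body like this:

```python
def build_chat_payload(user_messages, max_tokens=256, **kwargs):
    """Standalone sketch mirroring BedrockInvoke.create_payload for a list of messages."""
    payload = {
        "messages": [
            {"role": "user", "content": [{"text": m, "type": "text"}]}
            for m in user_messages
        ]
    }
    if max_tokens:
        payload["max_tokens"] = max_tokens
    payload.update(kwargs)  # extra keys, e.g. temperature, pass straight through
    return payload

payload = build_chat_payload(["What is LLMeter?"], temperature=0.2)
```

Note that, as in the source above, extra keyword arguments are merged into the top level of the payload, so they can overwrite `messages` or `max_tokens` if they share a key.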

invoke

invoke(payload)

Invoke the Bedrock InvokeModel API with the given payload.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `payload` | `dict` | The payload containing the request parameters. | *required* |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `InvocationResponse` | `InvocationResponse` | Response object containing generated text and metadata. |

Raises:

| Type | Description |
|------|-------------|
| `ClientError` | If there is an error calling the Bedrock API. |
| `ValueError` | If the payload is invalid. |
| `TypeError` | If the payload is not a dictionary. |

Source code in llmeter/endpoints/bedrock_invoke.py
def invoke(self, payload: dict) -> InvocationResponse:
    """Invoke the Bedrock InvokeModel API with the given payload.

    Args:
        payload (dict): The payload containing the request parameters

    Returns:
        InvocationResponse: Response object containing generated text and metadata

    Raises:
        ClientError: If there is an error calling the Bedrock API
        ValueError: If payload is invalid
        TypeError: If payload is not a dictionary
    """
    if not isinstance(payload, dict):
        raise TypeError("Payload must be a dictionary")

    try:
        req_body = json.dumps(payload).encode("utf-8")
        try:
            start_t = time.perf_counter()
            client_response = self._bedrock_client.invoke_model(  # type: ignore
                accept="application/json",
                body=req_body,
                contentType="application/json",
                modelId=self.model_id,
                # TODO: Provide config for other optional arguments
                # trace, guardrailIdentifier/Version, performanceConfigLatency, serviceTier
            )
            time_to_last_token = time.perf_counter() - start_t
        except ClientError as e:
            logger.error(f"Bedrock API error: {e}")
            return InvocationResponse.error_output(
                input_payload=payload, id=uuid4().hex, error=str(e)
            )
        except Exception as e:
            logger.error(f"Unexpected error during API call: {e}")
            return InvocationResponse.error_output(
                input_payload=payload, id=uuid4().hex, error=str(e)
            )

        response = self._parse_response(client_response)  # type: ignore
        response.input_payload = payload
        response.input_prompt = self._parse_payload(payload)
        response.time_to_last_token = time_to_last_token
        return response

    except Exception as e:
        logger.error(f"Error in invoke method: {e}")
        return InvocationResponse.error_output(
            input_payload=payload, id=uuid4().hex, error=str(e)
        )
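The request/response handling above can be sketched without a live endpoint: the payload is JSON-encoded before the `InvokeModel` call, and the response body (a streaming object in the real API, simulated here with `io.BytesIO`) is decoded back to a dict for extraction. The parsing step is an assumption based on the default JMESPath queries, since `_parse_response` is not shown on this page:

```python
import io
import json

payload = {"messages": [{"role": "user", "content": [{"text": "Hi", "type": "text"}]}]}
req_body = json.dumps(payload).encode("utf-8")  # what invoke() sends as the InvokeModel body

# Simulated response body: the real API returns a streaming object yielding JSON bytes.
raw = io.BytesIO(json.dumps({
    "choices": [{"message": {"content": "Hi there"}}],
    "usage": {"prompt_tokens": 7, "completion_tokens": 2},
}).encode("utf-8"))

body = json.loads(raw.read())
generated_text = body["choices"][0]["message"]["content"]  # default generated_text_jmespath
```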

BedrockInvokeStream

BedrockInvokeStream(model_id, endpoint_name=None, region=None, bedrock_boto3_client=None, max_attempts=3, generated_text_jmespath='choices[0].delta.content', generated_token_count_jmespath='"amazon-bedrock-invocationMetrics".outputTokenCount', input_text_jmespath='messages[].content[].text', input_token_count_jmespath='"amazon-bedrock-invocationMetrics".inputTokenCount')

Bases: BedrockInvoke

LLMeter Endpoint for the Amazon Bedrock InvokeModelWithResponseStream API.

The default `..._jmespath` parameters assume your target model uses an OpenAI ChatCompletions-like streaming API, which is true for many (but not all) Bedrock models. You'll need to override them if targeting a model with a different request/response format.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model_id` | `str` | The identifier of the model to use. | *required* |
| `endpoint_name` | `str \| None` | Name of the endpoint. | `None` |
| `region` | `str \| None` | AWS region to use. Defaults to the `bedrock_boto3_client`'s region, or the region configured via the AWS CLI. | `None` |
| `bedrock_boto3_client` | `Any` | Optional pre-configured boto3 client; otherwise one will be created. | `None` |
| `max_attempts` | `int` | Maximum number of retry attempts. | `3` |
| `generated_text_jmespath` | `str` | JMESPath query to extract incremental text from a chunk of the model response. | `'choices[0].delta.content'` |
| `generated_token_count_jmespath` | `str \| None` | JMESPath query to extract the generated token count from a chunk of the model response. | `'"amazon-bedrock-invocationMetrics".outputTokenCount'` |
| `input_text_jmespath` | `str` | JMESPath query to extract input text from the model request payload. | `'messages[].content[].text'` |
| `input_token_count_jmespath` | `str \| None` | JMESPath query to extract the input token count from a chunk of the model response. | `'"amazon-bedrock-invocationMetrics".inputTokenCount'` |
Source code in llmeter/endpoints/bedrock_invoke.py
def __init__(
    self,
    model_id: str,
    endpoint_name: str | None = None,
    region: str | None = None,
    bedrock_boto3_client: Any = None,
    max_attempts: int = 3,
    generated_text_jmespath: str = "choices[0].delta.content",
    generated_token_count_jmespath: str
    | None = '"amazon-bedrock-invocationMetrics".outputTokenCount',
    input_text_jmespath: str = "messages[].content[].text",
    input_token_count_jmespath: str
    | None = '"amazon-bedrock-invocationMetrics".inputTokenCount',
):
    """Create a BedrockInvokeStream Endpoint

    The default ..._jmespath parameters assume your target model uses an OpenAI
    ChatCompletions-like streaming API, which is true for many (but not all) Bedrock models.
    You'll need to override these if targeting a model with a different request/response format.

    Args:
        model_id:
            The identifier for the model to use
        endpoint_name:
            Name of the endpoint. Defaults to None.
        region:
            AWS region to use. Defaults to bedrock_boto3_client's, or configured from AWS CLI.
        bedrock_boto3_client:
            Optional pre-configured boto3 client, otherwise one will be created.
        max_attempts:
            Maximum number of retry attempts. Defaults to 3.
        generated_text_jmespath:
            JMESPath query to extract incremental text from *a chunk of* the model response.
        generated_token_count_jmespath:
            JMESPath query to extract generated token count from *a chunk of* model response.
        input_text_jmespath:
            JMESPath query to extract input text from the model request payload.
        input_token_count_jmespath:
            JMESPath query to extract input token count from *a chunk of* the model response.
    """
    super().__init__(
        model_id=model_id,
        endpoint_name=endpoint_name,
        region=region,
        bedrock_boto3_client=bedrock_boto3_client,
        max_attempts=max_attempts,
        generated_text_jmespath=generated_text_jmespath,
        generated_token_count_jmespath=generated_token_count_jmespath,
        input_text_jmespath=input_text_jmespath,
        input_token_count_jmespath=input_token_count_jmespath,
    )
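Relative to the non-streaming defaults, the streaming queries read each event chunk: incremental text arrives under `choices[0].delta.content`, and token counts appear in the chunk that carries the `"amazon-bedrock-invocationMetrics"` block (typically the last one). A plain-dict sketch with made-up chunk data shows the accumulation:

```python
# Illustrative chunk sequence for a ChatCompletions-style streaming response.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {
        "choices": [{"delta": {}}],
        "amazon-bedrock-invocationMetrics": {"inputTokenCount": 7, "outputTokenCount": 2},
    },
]

text = ""
metrics = {}
for chunk in chunks:
    piece = chunk["choices"][0]["delta"].get("content")  # 'choices[0].delta.content'
    if piece:
        text += piece
    # '"amazon-bedrock-invocationMetrics".inputTokenCount' / '.outputTokenCount'
    metrics = chunk.get("amazon-bedrock-invocationMetrics", metrics)
```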