Bedrock invoke

bedrock_invoke

BedrockInvoke

BedrockInvoke(model_id, endpoint_name=None, region=None, bedrock_boto3_client=None, max_attempts=3, generated_text_jmespath='choices[0].message.content', generated_token_count_jmespath='usage.completion_tokens', input_text_jmespath='messages[].content[].text', input_token_count_jmespath='usage.prompt_tokens')

Bases: Endpoint

LLMeter Endpoint for the Amazon Bedrock InvokeModel API (non-streaming).

The default `..._jmespath` parameters assume your target model uses an OpenAI ChatCompletions-like API, which is true for many (but not all) Bedrock models. You'll need to override them if targeting a model with a different request/response format.
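As a quick illustration of what those default queries select, here is an OpenAI-style response body with plain-dict access shown next to each JMESPath expression (the sample values are made up):

```python
# Sample ChatCompletions-style response body (values are illustrative only).
response = {
    "choices": [{"message": {"content": "Hello from the model!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 5},
}

# Plain-dict equivalents of the default JMESPath queries:
generated_text = response["choices"][0]["message"]["content"]  # 'choices[0].message.content'
output_tokens = response["usage"]["completion_tokens"]         # 'usage.completion_tokens'
input_tokens = response["usage"]["prompt_tokens"]              # 'usage.prompt_tokens'
```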

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model_id` | `str` | The identifier of the model to use. | *required* |
| `endpoint_name` | `str \| None` | Name of the endpoint. | `None` |
| `region` | `str \| None` | AWS region to use. Defaults to the `bedrock_boto3_client`'s region, or the region configured via the AWS CLI. | `None` |
| `bedrock_boto3_client` | `Any` | Optional pre-configured boto3 client; otherwise one will be created. | `None` |
| `max_attempts` | `int` | Maximum number of retry attempts. | `3` |
| `generated_text_jmespath` | `str` | JMESPath query to extract generated text from the model response. | `'choices[0].message.content'` |
| `generated_token_count_jmespath` | `str \| None` | JMESPath query to extract the generated token count from the model response. | `'usage.completion_tokens'` |
| `input_text_jmespath` | `str` | JMESPath query to extract input text from the model request payload. | `'messages[].content[].text'` |
| `input_token_count_jmespath` | `str \| None` | JMESPath query to extract the input token count from the response. | `'usage.prompt_tokens'` |
Source code in llmeter/endpoints/bedrock_invoke.py
def __init__(
    self,
    model_id: str,
    endpoint_name: str | None = None,
    region: str | None = None,
    bedrock_boto3_client: Any = None,
    max_attempts: int = 3,
    generated_text_jmespath: str = "choices[0].message.content",
    generated_token_count_jmespath: str | None = "usage.completion_tokens",
    input_text_jmespath: str = "messages[].content[].text",
    input_token_count_jmespath: str | None = "usage.prompt_tokens",
):
    """Create a BedrockInvoke Endpoint

    The default ..._jmespath parameters assume your target model uses an OpenAI
    ChatCompletions-like API, which is true for many (but not all) Bedrock models. You'll need
    to override these if targeting a model with a different request/response format.

    Args:
        model_id:
            The identifier for the model to use
        endpoint_name:
            Name of the endpoint. Defaults to None.
        region:
            AWS region to use. Defaults to bedrock_boto3_client's, or configured from AWS CLI.
        bedrock_boto3_client:
            Optional pre-configured boto3 client, otherwise one will be created.
        max_attempts:
            Maximum number of retry attempts. Defaults to 3.
        generated_text_jmespath:
            JMESPath query to extract generated text from model response.
        generated_token_count_jmespath:
            JMESPath query to extract generated token count from model response.
        input_text_jmespath:
            JMESPath query to extract input text from the model request payload.
        input_token_count_jmespath:
            JMESPath query to extract input token count from the response.
    """
    super().__init__(
        model_id=model_id,
        endpoint_name=endpoint_name or "amazon bedrock",
        provider="bedrock",
    )

    self.generated_text_jmespath = generated_text_jmespath
    self.generated_token_count_jmespath = generated_token_count_jmespath
    self.input_text_jmespath = input_text_jmespath
    self.input_token_count_jmespath = input_token_count_jmespath

    self.region = (
        region
        or (bedrock_boto3_client and bedrock_boto3_client.meta.region_name)
        or boto3.session.Session().region_name
    )
    logger.info(f"Using AWS region: {self.region}")

    self._bedrock_client = bedrock_boto3_client
    if self._bedrock_client is None:
        config = Config(retries={"max_attempts": max_attempts, "mode": "standard"})
        self._bedrock_client = boto3.client(
            "bedrock-runtime", region_name=self.region, config=config
        )

create_payload staticmethod

create_payload(user_message, max_tokens=256, **kwargs)

Create a payload, assuming your target Bedrock model supports a ChatCompletions-like API

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `user_message` | `str \| list[str]` | The user's message or a sequence of messages. | *required* |
| `max_tokens` | `int \| None` | The maximum number of tokens to generate. | `256` |
| `**kwargs` | `Any` | Additional keyword arguments to include in the payload. | `{}` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `dict` | `dict` | The formatted payload for the Bedrock API request. |

Raises:

| Type | Description |
|------|-------------|
| `TypeError` | If `user_message` is not a string or list of strings. |
| `ValueError` | If `max_tokens` is not a positive integer. |

Source code in llmeter/endpoints/bedrock_invoke.py
@staticmethod
def create_payload(
    user_message: str | list[str], max_tokens: int | None = 256, **kwargs: Any
) -> dict:
    """
    Create a payload, assuming your target Bedrock model supports a ChatCompletions-like API

    Args:
        user_message: The user's message or a sequence of messages.
        max_tokens: The maximum number of tokens to generate. Defaults to 256.
        **kwargs: Additional keyword arguments to include in the payload.

    Returns:
        dict: The formatted payload for the Bedrock API request.

    Raises:
        TypeError: If user_message is not a string or list of strings
        ValueError: If max_tokens is not a positive integer
    """
    if not isinstance(user_message, (str, list)):
        raise TypeError("user_message must be a string or list of strings")

    if isinstance(user_message, list):
        if not all(isinstance(msg, str) for msg in user_message):
            raise TypeError("All messages must be strings")
        if not user_message:
            raise ValueError("user_message list cannot be empty")

    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    if isinstance(user_message, str):
        user_message = [user_message]

    try:
        payload: dict = {
            "messages": [
                {"role": "user", "content": [{"text": k, "type": "text"}]}
                for k in user_message
            ],
        }

        if max_tokens:
            payload["max_tokens"] = max_tokens

        payload.update(kwargs)
        return payload

    except Exception as e:
        logger.exception("Failed to create InvokeModel payload")
        raise RuntimeError(f"Failed to create payload: {str(e)}") from e
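Mirroring the payload-building logic above in a standalone sketch (`build_chat_payload` is a hypothetical helper for illustration, not part of LLMeter), a single user message produces a body like this:

```python
def build_chat_payload(user_messages, max_tokens=256, **kwargs):
    """Standalone sketch mirroring BedrockInvoke.create_payload for a list of messages."""
    payload = {
        "messages": [
            {"role": "user", "content": [{"text": m, "type": "text"}]}
            for m in user_messages
        ]
    }
    if max_tokens:
        payload["max_tokens"] = max_tokens
    payload.update(kwargs)  # extra keys, e.g. temperature, pass straight through
    return payload

payload = build_chat_payload(["What is LLMeter?"], temperature=0.2)
```

Note that, as in the source above, extra keyword arguments are merged into the top level of the payload, so they can overwrite `messages` or `max_tokens` if they share a key.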

invoke

invoke(payload)

Invoke the Bedrock InvokeModel API with the given payload.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `payload` | `dict` | The payload containing the request parameters. | *required* |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `InvocationResponse` | `InvocationResponse` | Response object containing generated text and metadata. |

Raises:

| Type | Description |
|------|-------------|
| `ClientError` | If there is an error calling the Bedrock API. |
| `ValueError` | If the payload is invalid. |
| `TypeError` | If the payload is not a dictionary. |

Source code in llmeter/endpoints/bedrock_invoke.py
def invoke(self, payload: dict) -> InvocationResponse:
    """Invoke the Bedrock InvokeModel API with the given payload.

    Args:
        payload (dict): The payload containing the request parameters

    Returns:
        InvocationResponse: Response object containing generated text and metadata

    Raises:
        ClientError: If there is an error calling the Bedrock API
        ValueError: If payload is invalid
        TypeError: If payload is not a dictionary
    """
    if not isinstance(payload, dict):
        raise TypeError("Payload must be a dictionary")

    try:
        req_body = json.dumps(payload).encode("utf-8")
        try:
            start_t = time.perf_counter()
            client_response = self._bedrock_client.invoke_model(  # type: ignore
                accept="application/json",
                body=req_body,
                contentType="application/json",
                modelId=self.model_id,
                # TODO: Provide config for other optional arguments
                # trace, guardrailIdentifier/Version, performanceConfigLatency, serviceTier
            )
            time_to_last_token = time.perf_counter() - start_t
        except ClientError as e:
            logger.error(f"Bedrock API error: {e}")
            return InvocationResponse.error_output(
                input_payload=payload, id=uuid4().hex, error=str(e)
            )
        except Exception as e:
            logger.error(f"Unexpected error during API call: {e}")
            return InvocationResponse.error_output(
                input_payload=payload, id=uuid4().hex, error=str(e)
            )

        response = self._parse_response(client_response)  # type: ignore
        response.input_payload = payload
        response.input_prompt = self._parse_payload(payload)
        response.time_to_last_token = time_to_last_token
        return response

    except Exception as e:
        logger.error(f"Error in invoke method: {e}")
        return InvocationResponse.error_output(
            input_payload=payload, id=uuid4().hex, error=str(e)
        )
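The request/response handling above can be sketched without a live endpoint: the payload is JSON-encoded before the `InvokeModel` call, and the response body (a streaming object in the real API, simulated here with `io.BytesIO`) is decoded back to a dict for extraction. The parsing step is an assumption based on the default JMESPath queries, since `_parse_response` is not shown on this page:

```python
import io
import json

payload = {"messages": [{"role": "user", "content": [{"text": "Hi", "type": "text"}]}]}
req_body = json.dumps(payload).encode("utf-8")  # what invoke() sends as the InvokeModel body

# Simulated response body: the real API returns a streaming object yielding JSON bytes.
raw = io.BytesIO(json.dumps({
    "choices": [{"message": {"content": "Hi there"}}],
    "usage": {"prompt_tokens": 7, "completion_tokens": 2},
}).encode("utf-8"))

body = json.loads(raw.read())
generated_text = body["choices"][0]["message"]["content"]  # default generated_text_jmespath
```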

BedrockInvokeStream

BedrockInvokeStream(model_id, endpoint_name=None, region=None, bedrock_boto3_client=None, max_attempts=3, generated_text_jmespath='choices[0].delta.content', generated_token_count_jmespath='"amazon-bedrock-invocationMetrics".outputTokenCount', input_text_jmespath='messages[].content[].text', input_token_count_jmespath='"amazon-bedrock-invocationMetrics".inputTokenCount')

Bases: BedrockInvoke

LLMeter Endpoint for the Amazon Bedrock InvokeModelWithResponseStream API.

The default `..._jmespath` parameters assume your target model uses an OpenAI ChatCompletions-like streaming API, which is true for many (but not all) Bedrock models. You'll need to override them if targeting a model with a different request/response format.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model_id` | `str` | The identifier of the model to use. | *required* |
| `endpoint_name` | `str \| None` | Name of the endpoint. | `None` |
| `region` | `str \| None` | AWS region to use. Defaults to the `bedrock_boto3_client`'s region, or the region configured via the AWS CLI. | `None` |
| `bedrock_boto3_client` | `Any` | Optional pre-configured boto3 client; otherwise one will be created. | `None` |
| `max_attempts` | `int` | Maximum number of retry attempts. | `3` |
| `generated_text_jmespath` | `str` | JMESPath query to extract incremental text from a chunk of the model response. | `'choices[0].delta.content'` |
| `generated_token_count_jmespath` | `str \| None` | JMESPath query to extract the generated token count from a chunk of the model response. | `'"amazon-bedrock-invocationMetrics".outputTokenCount'` |
| `input_text_jmespath` | `str` | JMESPath query to extract input text from the model request payload. | `'messages[].content[].text'` |
| `input_token_count_jmespath` | `str \| None` | JMESPath query to extract the input token count from a chunk of the model response. | `'"amazon-bedrock-invocationMetrics".inputTokenCount'` |
Source code in llmeter/endpoints/bedrock_invoke.py
def __init__(
    self,
    model_id: str,
    endpoint_name: str | None = None,
    region: str | None = None,
    bedrock_boto3_client: Any = None,
    max_attempts: int = 3,
    generated_text_jmespath: str = "choices[0].delta.content",
    generated_token_count_jmespath: str
    | None = '"amazon-bedrock-invocationMetrics".outputTokenCount',
    input_text_jmespath: str = "messages[].content[].text",
    input_token_count_jmespath: str
    | None = '"amazon-bedrock-invocationMetrics".inputTokenCount',
):
    """Create a BedrockInvokeStream Endpoint

    The default ..._jmespath parameters assume your target model uses an OpenAI
    ChatCompletions-like streaming API, which is true for many (but not all) Bedrock models.
    You'll need to override these if targeting a model with a different request/response format.

    Args:
        model_id:
            The identifier for the model to use
        endpoint_name:
            Name of the endpoint. Defaults to None.
        region:
            AWS region to use. Defaults to bedrock_boto3_client's, or configured from AWS CLI.
        bedrock_boto3_client:
            Optional pre-configured boto3 client, otherwise one will be created.
        max_attempts:
            Maximum number of retry attempts. Defaults to 3.
        generated_text_jmespath:
            JMESPath query to extract incremental text from *a chunk of* the model response.
        generated_token_count_jmespath:
            JMESPath query to extract generated token count from *a chunk of* model response.
        input_text_jmespath:
            JMESPath query to extract input text from the model request payload.
        input_token_count_jmespath:
            JMESPath query to extract input token count from *a chunk of* the model response.
    """
    super().__init__(
        model_id=model_id,
        endpoint_name=endpoint_name,
        region=region,
        bedrock_boto3_client=bedrock_boto3_client,
        max_attempts=max_attempts,
        generated_text_jmespath=generated_text_jmespath,
        generated_token_count_jmespath=generated_token_count_jmespath,
        input_text_jmespath=input_text_jmespath,
        input_token_count_jmespath=input_token_count_jmespath,
    )
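Relative to the non-streaming defaults, the streaming queries read each event chunk: incremental text arrives under `choices[0].delta.content`, and token counts appear in the chunk that carries the `"amazon-bedrock-invocationMetrics"` block (typically the last one). A plain-dict sketch with made-up chunk data shows the accumulation:

```python
# Illustrative chunk sequence for a ChatCompletions-style streaming response.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {
        "choices": [{"delta": {}}],
        "amazon-bedrock-invocationMetrics": {"inputTokenCount": 7, "outputTokenCount": 2},
    },
]

text = ""
metrics = {}
for chunk in chunks:
    piece = chunk["choices"][0]["delta"].get("content")  # 'choices[0].delta.content'
    if piece:
        text += piece
    # '"amazon-bedrock-invocationMetrics".inputTokenCount' / '.outputTokenCount'
    metrics = chunk.get("amazon-bedrock-invocationMetrics", metrics)
```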