Amazon SageMaker endpoints
Amazon SageMaker is a fully managed service for preparing data and building, training, and deploying machine learning (ML) models for any use case, with managed infrastructure, tools, and workflows. SageMaker endpoints host ML models for real-time inference use cases. For more information, see the Amazon SageMaker documentation.
Info

The current implementation uses JSON to encode the data sent in requests and to decode the data received in responses. Before proceeding, ensure that your SageMaker endpoint can handle requests and responses with the Content-Type and Accept headers set to application/json.
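If you want to verify this before configuring a target, a quick manual invocation is one way to check. The sketch below uses boto3 with a placeholder endpoint name and request body; adjust both to match your deployment.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Invoke the endpoint with the same JSON headers the Evaluator will send.
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint-name",  # placeholder
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps({"input_text": "Hello!", "temperature": 0.1}),  # placeholder body
)

# If the endpoint is JSON-compatible, this parses without error.
print(json.loads(response["Body"].read()))
```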
Prerequisites
The principal must have the following permissions:

- sagemaker:InvokeEndpoint
Configurations
```yaml
target:
  type: sagemaker-endpoint
  endpoint_name: my-endpoint-name
  request_body:
    input_text: None
    temperature: 0.1
  input_path: $.input_text
  output_path: $.[0].generated_text
  custom_attributes: my-attributes
  target_model: my-model
  target_variant: my-variant
  target_container_hostname: my-hostname
  inference_component_name: my-component-name
```
endpoint_name
(string)
The name of the Amazon SageMaker endpoint.
request_body
(map)
The data that is sent to the endpoint, which includes a placeholder for the prompt. During a run, the placeholder will be replaced by a prompt generated by the Evaluator. For example:
```yaml
request_body:
  input_text: None # prompt
  temperature: 0.1
```
input_path
(string)
A JSONPath expression to match the field for the input prompt in the request body. For the request_body below:

```yaml
request_body:
  input_text: None # prompt
  temperature: 0.1
```

The input_path would be `$.input_text`.
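To make the substitution concrete, here is a minimal sketch of the mechanic using the jsonpath-ng library (an assumption for illustration; it is not necessarily the JSONPath engine the Evaluator uses):

```python
import json

from jsonpath_ng import parse  # pip install jsonpath-ng

request_body = {"input_text": None, "temperature": 0.1}

# Replace the value matched by input_path with the generated prompt.
parse("$.input_text").update(request_body, "What is the capital of France?")

print(json.dumps(request_body))
# {"input_text": "What is the capital of France?", "temperature": 0.1}
```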
output_path
(string)
A JSONPath expression to match the generated text in the response body. For example, if the endpoint returns the following:
[{ "generated_text": "Hello!" }]
The output_path
would be $.[0].generated_text
.
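Again as an illustration with jsonpath-ng (the same assumption as above), the expression resolves like this:

```python
from jsonpath_ng import parse  # pip install jsonpath-ng

response_body = [{"generated_text": "Hello!"}]

# find() returns the matches; the first match holds the generated text.
matches = parse("$.[0].generated_text").find(response_body)
print(matches[0].value)  # Hello!
```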
custom_attributes
(string; optional)
Provides additional information about a request for an inference submitted to a model hosted at an Amazon SageMaker endpoint.
target_model
(string; optional)
The model to request for inference when invoking a multi-model endpoint.
target_variant
(string; optional)
The production variant to send the inference request to when invoking an endpoint that is running two or more variants.
target_container_hostname
(string; optional)
The hostname of the container to invoke if the endpoint hosts multiple containers and is configured to use direct invocation.
inference_component_name
(string; optional)
The name of the inference component to invoke if the endpoint hosts one or more inference components.
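These optional fields mirror parameters of the SageMaker Runtime InvokeEndpoint API. As a rough sketch of how a fully specified target plausibly translates into a boto3 call (the field-to-parameter mapping is inferred from the names, and all values are placeholders; in practice you would set only the parameters that apply to your endpoint type):

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint-name",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps({"input_text": "Hello!", "temperature": 0.1}),
    # Optional routing and metadata parameters; set only those
    # relevant to how your endpoint is deployed.
    CustomAttributes="my-attributes",            # custom_attributes
    TargetModel="my-model",                      # target_model (multi-model endpoints)
    TargetVariant="my-variant",                  # target_variant
    TargetContainerHostname="my-hostname",       # target_container_hostname
    InferenceComponentName="my-component-name",  # inference_component_name
)
```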