sagemaker

Utilities for automating cost modelling on Amazon SageMaker endpoints

SageMakerRTEndpointCompute `dataclass`

SageMakerRTEndpointCompute(instance_type, instance_count=1, price_per_hour=None, region=None)

Bases: RunCostDimensionBase

Run cost dimension to estimate Amazon SageMaker real-time endpoint compute charges

NOTE: To auto-discover price_per_hour from instance type, you'll need IAM permissions to call the pricing:GetProducts API. For more information, see: https://docs.aws.amazon.com/service-authorization/latest/reference/list_awspricelist.html

Calculated rates do not currently include EBS storage volume costs. See SageMakerRTEndpointStorage for estimating this.

See .fetch_sm_hosting_od_price() for more details on how price is looked up when not explicitly provided. This lookup is provided on a best-effort basis, and may not accurately reflect all possible scenarios.

Use .from_endpoint() to automatically discover compute cost dimensions from a deployed SageMaker real-time inference endpoint.

Parameters:

Name	Type	Description	Default
`instance_type`	`str`	Amazon SageMaker instance type e.g. 'ml.g5.4xlarge'	required
`instance_count`	`float`	Number of instances running (default: 1)	`1`
`price_per_hour`	`float \| None`	Price per hour per instance (default: attempt to fetch from pricing API)	`None`
`region`	`str \| None`	AWS region where the endpoint is running (default: current region)	`None`
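The sketch below illustrates the arithmetic behind this dimension. It assumes compute charges are pro-rated linearly by elapsed time, i.e. `price_per_hour * instance_count * hours`. `estimate_compute_cost` is a hypothetical helper, not part of LLMeter, and the $1.50/hour rate in the usage line is made up.

```python
def estimate_compute_cost(
    price_per_hour: float, instance_count: float, duration_s: float
) -> float:
    """Hypothetical helper: estimated on-demand compute charge for one run.

    Assumes the charge is simply hourly rate x instance count x elapsed hours.
    """
    if price_per_hour < 0 or instance_count < 0 or duration_s < 0:
        raise ValueError("Price, instance count and duration must be non-negative")
    hours = duration_s / 3600  # seconds -> hours
    return price_per_hour * instance_count * hours


# 2 instances at an illustrative $1.50/hour, for a 30-minute test run
print(estimate_compute_cost(price_per_hour=1.50, instance_count=2, duration_s=1800))
```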

fetch_sm_hosting_od_price `staticmethod`

fetch_sm_hosting_od_price(instance_type, region)

Look up USD hourly rates for on-demand SM hosting instances from the AWS Price List API

This function assumes:

1. You're using standard "on-demand" pricing, with no savings plans or private pricing
2. No free tier or volume discounts are applicable to this usage
3. Your pricing is provided in USD

Parameters:

Name	Type	Description	Default
`instance_type`	`str`	Amazon SageMaker instance type, e.g. 'ml.g5.4xlarge'	required
`region`	`str`	AWS region where the endpoint is running, e.g. 'us-east-1'	required

Returns:

Name	Type	Description
`price_per_hour`	`float`	The standard, on-demand hourly price for the given instance type in USD

Source code in llmeter/callbacks/cost/providers/sagemaker.py

@staticmethod
def fetch_sm_hosting_od_price(instance_type: str, region: str) -> float:
    """Look up USD hourly rates for on-demand SM hosting instances from the AWS Price List API

    This function assumes:

    1. You're using standard "on-demand" pricing - no savings plans or private pricing
    2. No free tier or volume discounts are applicable to this usage
    3. Your pricing is provided in USD

    Args:
        instance_type: Amazon SageMaker instance type, e.g. 'ml.g5.4xlarge'
        region: AWS region where the endpoint is running, e.g. 'us-east-1'

    Returns:
        price_per_hour: The standard, on-demand hourly price for the given instance type in USD
    """
    unit_price, _ = _fetch_single_product_ondemand_unit_price(
        service_code="AmazonSageMaker",
        attribute_values={"component": "Hosting", "instanceName": instance_type},
        region=region,
        dim_unit="Hrs",
    )
    return unit_price
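This section does not show the private helper `_fetch_single_product_ondemand_unit_price`. The code below is a hedged sketch of the general approach, not a reimplementation of that helper. It builds `TERM_MATCH` filters of the kind accepted by the Price List API's `GetProducts` operation, then extracts the single on-demand USD price with the expected unit from one `PriceList` entry. It assumes region matching via the `regionCode` attribute. `build_term_match_filters`, `parse_ondemand_unit_price`, and the sample entry (including its SKU codes and price) are hypothetical, and no AWS call is made.

```python
import json


def build_term_match_filters(attribute_values: dict, region: str) -> list:
    """Build GetProducts-style filters: one exact (TERM_MATCH) filter per attribute."""
    # Assumption: the region is matched on the "regionCode" product attribute
    attrs = {**attribute_values, "regionCode": region}
    return [{"Type": "TERM_MATCH", "Field": k, "Value": v} for k, v in attrs.items()]


def parse_ondemand_unit_price(price_item_json: str, dim_unit: str) -> float:
    """Return the single on-demand USD unit price with unit `dim_unit` from one PriceList entry."""
    item = json.loads(price_item_json)
    prices = [
        float(dim["pricePerUnit"]["USD"])
        for term in item["terms"]["OnDemand"].values()
        for dim in term["priceDimensions"].values()
        if dim["unit"] == dim_unit
    ]
    if len(prices) != 1:
        raise ValueError(f"Expected exactly one '{dim_unit}' price, found {len(prices)}")
    return prices[0]


# In practice, entries would come from something like:
#   boto3.client("pricing", region_name="us-east-1").get_products(
#       ServiceCode="AmazonSageMaker", Filters=filters)["PriceList"]
filters = build_term_match_filters(
    {"component": "Hosting", "instanceName": "ml.g5.4xlarge"}, region="us-east-1"
)
# Illustrative, trimmed-down PriceList entry (SKU codes and price are made up)
sample_entry = json.dumps({
    "product": {"attributes": {"component": "Hosting", "instanceName": "ml.g5.4xlarge"}},
    "terms": {"OnDemand": {"SKU1.TERM1": {"priceDimensions": {
        "SKU1.TERM1.RATE1": {"unit": "Hrs", "pricePerUnit": {"USD": "2.0300000000"}},
    }}}},
})
print(parse_ondemand_unit_price(sample_entry, dim_unit="Hrs"))
```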

from_endpoint `classmethod`

from_endpoint(endpoint_name, region=None)

Configure SageMakerRTEndpointCompute dimension(s) from an existing SageMaker endpoint

NOTE: You'll need IAM permissions for sagemaker:DescribeEndpoint and sagemaker:DescribeEndpointConfig to use this method.

This function returns a dictionary rather than a single SageMakerRTEndpointCompute, because different "variants" deployed behind an endpoint may be backed by clusters of different instance types, and therefore need separate dimensions.

Instance counts will be retrieved at the point in time this method is called, so watch out if you have auto-scaling enabled on your endpoint.

Parameters:

Name	Type	Description	Default
`endpoint_name`	`str`	Name of the SageMaker endpoint	required
`region`	`str \| None`	AWS region where the endpoint is running (default: current region)	`None`

Returns:

Name	Type	Description
`run_dims`	`dict[str, SageMakerRTEndpointCompute]`	A dictionary containing one or more dimensions, to pass to your `CostModel(run_dims=...)`. If the endpoint has only one "Variant", the returned dict will have a single key (cost dimension name) `SageMakerRTEndpointCompute`. Otherwise, keys will be generated as `{variant_name}_SageMakerRTEndpointCompute`

Source code in llmeter/callbacks/cost/providers/sagemaker.py

@classmethod
def from_endpoint(
    cls, endpoint_name: str, region: str | None = None
) -> dict[str, SageMakerRTEndpointCompute]:
    """Configure SageMakerRTEndpointCompute dimension(s) from an existing SageMaker endpoint

    NOTE: You'll need IAM permissions to `sagemaker:DescribeEndpoint` and
    `sagemaker:DescribeEndpointConfig` to use this method.

    This function returns a dictionary rather than a single `SageMakerRTEndpointCompute`,
    because different "variants" deployed behind an endpoint may be backed by clusters of
    different instance types, and therefore need separate dimensions.

    Instance counts will be retrieved at the point in time this method is called, so watch out
    if you have auto-scaling enabled on your endpoint.

    Args:
        endpoint_name: Name of the SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)

    Returns:
        run_dims: A dictionary containing one or more dimensions, to pass to your
            `CostModel(run_dims=...)`. If the endpoint has only one "Variant", the returned
            dict will have a single key (cost dimension name) `SageMakerRTEndpointCompute`.
            Otherwise, keys will be generated as `{variant_name}_SageMakerRTEndpointCompute`
    """
    sagemaker = boto3.client("sagemaker", region_name=region)
    endpoint_desc = sagemaker.describe_endpoint(EndpointName=endpoint_name)
    ep_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_desc["ProductionVariants"]
    }
    endpoint_config = sagemaker.describe_endpoint_config(
        EndpointConfigName=endpoint_desc["EndpointConfigName"]
    )
    cfg_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_config["ProductionVariants"]
    }

    if len(ep_vars) == 1:
        variant = next(iter(ep_vars.values()))
        variant_config = cfg_vars[variant["VariantName"]]
        return {
            cls.__name__: cls(
                instance_type=variant_config["InstanceType"],
                instance_count=variant["CurrentInstanceCount"],
                region=sagemaker.meta.region_name,
            )
        }
    else:
        return {
            f"{var_name}_{cls.__name__}": cls(
                instance_type=cfg_vars[var_name]["InstanceType"],
                instance_count=ep_vars[var_name]["CurrentInstanceCount"],
                region=sagemaker.meta.region_name,
            )
            for var_name in ep_vars.keys()
        }
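The sketch below makes the variant-keying logic above concrete without calling AWS. It reproduces that logic on hand-written stand-ins for the `describe_endpoint` and `describe_endpoint_config` responses, using only the response fields the code above reads. `compute_dims_from_descriptions` is a hypothetical helper that returns `(instance_type, instance_count)` tuples instead of `SageMakerRTEndpointCompute` objects. The variant names and counts are illustrative.

```python
def compute_dims_from_descriptions(
    endpoint_desc: dict,
    endpoint_config: dict,
    cls_name: str = "SageMakerRTEndpointCompute",
) -> dict:
    """Hypothetical mirror of from_endpoint(): map dimension name -> (instance_type, count)."""
    ep_vars = {v["VariantName"]: v for v in endpoint_desc["ProductionVariants"]}
    cfg_vars = {v["VariantName"]: v for v in endpoint_config["ProductionVariants"]}
    if len(ep_vars) == 1:
        # Single variant: the dimension is keyed by the bare class name
        [(name, variant)] = ep_vars.items()
        return {cls_name: (cfg_vars[name]["InstanceType"], variant["CurrentInstanceCount"])}
    # Multiple variants: one dimension per variant, keyed "{variant_name}_{cls_name}"
    return {
        f"{name}_{cls_name}": (
            cfg_vars[name]["InstanceType"],
            ep_vars[name]["CurrentInstanceCount"],
        )
        for name in ep_vars
    }


# Illustrative stand-ins for the two boto3 responses (only the fields used above)
endpoint_desc = {
    "EndpointConfigName": "my-endpoint-config",
    "ProductionVariants": [
        {"VariantName": "VariantA", "CurrentInstanceCount": 2},
        {"VariantName": "VariantB", "CurrentInstanceCount": 1},
    ],
}
endpoint_config = {
    "ProductionVariants": [
        {"VariantName": "VariantA", "InstanceType": "ml.g5.4xlarge"},
        {"VariantName": "VariantB", "InstanceType": "ml.g5.12xlarge"},
    ]
}
print(compute_dims_from_descriptions(endpoint_desc, endpoint_config))
```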

SageMakerRTEndpointStorage `dataclass`

SageMakerRTEndpointStorage(gbs_provisioned, price_per_gb_hour=None, region=None)

Bases: RunCostDimensionBase

Run cost dimension to estimate EBS charges for Amazon SageMaker real-time endpoints

NOTE: To auto-discover price_per_gb_hour, you'll need IAM permissions to call the pricing:GetProducts API. For more information, see: https://docs.aws.amazon.com/service-authorization/latest/reference/list_awspricelist.html

See .fetch_sm_hosting_ebs_price() for more details on how price is looked up when not explicitly provided. This lookup is provided on a best-effort basis, and may not accurately reflect all possible scenarios.

Parameters:

Name	Type	Description	Default
`gbs_provisioned`	`float`	Total size of provisioned EBS volume(s) for the endpoint in Gigabytes	required
`price_per_gb_hour`	`float \| None`	Price per hour per GB (default: attempt to fetch from pricing API)	`None`
`region`	`str \| None`	AWS region where the endpoint is running (default: current region)	`None`

fetch_sm_hosting_ebs_price `staticmethod`

fetch_sm_hosting_ebs_price(region)

Look up hourly USD rate for SageMaker hosting EBS storage from the AWS Price List API

The API lists this rate per GB-month, so it is converted to a per-GB-hour rate by assuming a 30-day month (30 × 24 = 720 hours).

Parameters:

Name	Type	Description	Default
`region`	`str`	AWS region where the endpoint is running, e.g. 'us-east-1'	required

Returns:

Name	Type	Description
`price_per_gb_hour`	`float`	The standard, on-demand price per GB-hour in USD

Source code in llmeter/callbacks/cost/providers/sagemaker.py

@staticmethod
def fetch_sm_hosting_ebs_price(region: str) -> float:
    """Look up hourly USD rate for SageMaker hosting EBS storage from the AWS Price List API

    The API lists this rate per GB-month, so it is converted to a per-GB-hour rate by assuming
    a 30-day month (30 * 24 = 720 hours).

    Args:
        region: AWS region where the endpoint is running, e.g. 'us-east-1'

    Returns:
        price_per_gb_hour: The standard, on-demand price per GB-hour in USD
    """
    unit_price, _ = _fetch_single_product_ondemand_unit_price(
        service_code="AmazonSageMaker",
        attribute_values={"volumeType": "General Purpose-Hosting"},
        region=region,
        dim_unit="GB-Mo",
    )

    return unit_price / (30 * 24)  # Convert GB-mo to GB-hr
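The sketch below works through the GB-month to GB-hour conversion above and shows how the resulting rate could be applied to a run. It assumes a made-up rate of $0.144 per GB-month and storage charges pro-rated linearly by elapsed time. `gb_month_to_gb_hour` and `estimate_storage_cost` are hypothetical helpers, not LLMeter functions.

```python
HOURS_PER_MONTH = 30 * 24  # same 30-day-month assumption as fetch_sm_hosting_ebs_price()


def gb_month_to_gb_hour(price_per_gb_month: float) -> float:
    """Convert a per-GB-month rate to a per-GB-hour rate."""
    return price_per_gb_month / HOURS_PER_MONTH


def estimate_storage_cost(
    gbs_provisioned: float, price_per_gb_hour: float, duration_s: float
) -> float:
    """Hypothetical helper: EBS charge for `gbs_provisioned` GB held for `duration_s` seconds."""
    return gbs_provisioned * price_per_gb_hour * duration_s / 3600


rate = gb_month_to_gb_hour(0.144)  # illustrative price, roughly 0.0002 USD per GB-hour
# e.g. 2 instances x 100 GB volume each, held for a 1-hour run
print(estimate_storage_cost(gbs_provisioned=200, price_per_gb_hour=rate, duration_s=3600))
```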

from_endpoint `classmethod`

from_endpoint(endpoint_name, region=None, merge_variants=False)

Configure SageMakerRTEndpointStorage dimension(s) from an existing SageMaker endpoint

NOTE: You'll need IAM permissions for sagemaker:DescribeEndpoint and sagemaker:DescribeEndpointConfig to use this method.

This function returns a dictionary rather than a single SageMakerRTEndpointStorage because, when more than one "variant" is deployed behind an endpoint, each variant may be reported as a separate dimension (unless merge_variants is set).

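Provisioned storage is calculated as each variant's per-instance volume size multiplied by its current instance count. Instance counts will be retrieved at the point in time this method is called, so watch out if you have auto-scaling enabled on your endpoint.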

Parameters:

Name	Type	Description	Default
`endpoint_name`	`str`	Name of the SageMaker endpoint	required
`region`	`str \| None`	AWS region where the endpoint is running (default: current region)	`None`
`merge_variants`	`bool`	Set `True` to merge multiple "variants" into a single storage dimension (since the hourly rate is constant across instance types). By default, if multiple variants are configured these will be reported as separate cost dimensions.	`False`

Returns:

Name	Type	Description
`run_dims`	`dict[str, SageMakerRTEndpointStorage]`	A dictionary containing zero or more dimensions, to pass to your `CostModel(run_dims=...)`. If the endpoint has only one "Variant" (or `merge_variants` is `True`), the returned dict will have a single key (cost dimension name) `SageMakerRTEndpointStorage`. Otherwise, keys will be generated as `{variant_name}_SageMakerRTEndpointStorage`. Any variants with 0 EBS storage configured will be omitted from the result.

Source code in llmeter/callbacks/cost/providers/sagemaker.py

@classmethod
def from_endpoint(
    cls,
    endpoint_name: str,
    region: str | None = None,
    merge_variants: bool = False,
) -> dict[str, SageMakerRTEndpointStorage]:
    """Configure SageMakerRTEndpointStorage dimension(s) from an existing SageMaker endpoint

    NOTE: You'll need IAM permissions to `sagemaker:DescribeEndpoint` and
    `sagemaker:DescribeEndpointConfig` to use this method.

    This function returns a dictionary rather than a single `SageMakerRTEndpointStorage`
    because, when more than one "variant" is deployed behind an endpoint, each variant may be
    reported as a separate dimension (unless `merge_variants` is set).

    Instance counts will be retrieved at the point in time this method is called, so watch out
    if you have auto-scaling enabled on your endpoint.

    Args:
        endpoint_name: Name of the SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)
        merge_variants: Set `True` to merge multiple "variants" into a single storage dimension
            (since the hourly rate is constant across instance types). By default, if multiple
            variants are configured these will be reported as separate cost dimensions.

    Returns:
        run_dims: A dictionary containing zero or more dimensions, to pass to your
            `CostModel(run_dims=...)`. If the endpoint has only one "Variant" (or `merge_variants`
            is `True`), the returned dict will have a single key (cost dimension name)
            `SageMakerRTEndpointStorage`. Otherwise, keys will be generated as
            `{variant_name}_SageMakerRTEndpointStorage`. Any variants with 0 EBS storage
            configured will be omitted from the result.
    """
    sagemaker = boto3.client("sagemaker", region_name=region)
    region_final = sagemaker.meta.region_name
    endpoint_desc = sagemaker.describe_endpoint(EndpointName=endpoint_name)
    ep_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_desc["ProductionVariants"]
    }
    endpoint_config = sagemaker.describe_endpoint_config(
        EndpointConfigName=endpoint_desc["EndpointConfigName"]
    )
    cfg_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_config["ProductionVariants"]
    }

    # variant.VolumeSizeInGB seems to be set automatically for EBS-enabled instance types, but
    # missing for non-EBS (storage-included) instance types.
    vars_with_ebs = [
        name for name, cfg in cfg_vars.items() if cfg.get("VolumeSizeInGB")
    ]

    if len(vars_with_ebs) == 0:
        return {}
    elif len(ep_vars) == 1:
        variant = next(iter(ep_vars.values()))
        variant_config = cfg_vars[variant["VariantName"]]
        return {
            cls.__name__: cls(
                gbs_provisioned=variant_config["VolumeSizeInGB"]
                * variant["CurrentInstanceCount"],
                region=region_final,
            )
        }
    elif merge_variants:
        total_gbs = sum(
            [
                cfg_vars[var_name].get("VolumeSizeInGB", 0)
                * ep_vars[var_name]["CurrentInstanceCount"]
                for var_name in ep_vars.keys()
            ]
        )
        return {cls.__name__: cls(gbs_provisioned=total_gbs, region=region_final)}
    else:
        return {
            f"{var_name}_{cls.__name__}": cls(
                gbs_provisioned=cfg_vars[var_name]["VolumeSizeInGB"]
                * ep_vars[var_name]["CurrentInstanceCount"],
                region=region_final,
            )
            for var_name in ep_vars.keys()
            if var_name in vars_with_ebs
        }
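The branching above can be summarised as a pure-Python sketch on hand-written stand-ins for the two describe responses. Per-variant storage is `VolumeSizeInGB * CurrentInstanceCount`, variants without `VolumeSizeInGB` are skipped, and `merge_variants` collapses everything into one key. `storage_dims_from_descriptions` is hypothetical and returns plain GB totals rather than `SageMakerRTEndpointStorage` objects. The variant names and sizes are illustrative.

```python
def storage_dims_from_descriptions(
    endpoint_desc: dict,
    endpoint_config: dict,
    merge_variants: bool = False,
    cls_name: str = "SageMakerRTEndpointStorage",
) -> dict:
    """Hypothetical mirror of from_endpoint(): map dimension name -> total GB provisioned."""
    ep_vars = {v["VariantName"]: v for v in endpoint_desc["ProductionVariants"]}
    cfg_vars = {v["VariantName"]: v for v in endpoint_config["ProductionVariants"]}
    # Per-instance volume size x current instance count, only for variants with EBS
    gbs = {
        name: cfg_vars[name]["VolumeSizeInGB"] * ep_vars[name]["CurrentInstanceCount"]
        for name in ep_vars
        if cfg_vars[name].get("VolumeSizeInGB")
    }
    if not gbs:
        return {}  # no variant has EBS storage configured
    if len(ep_vars) == 1 or merge_variants:
        return {cls_name: sum(gbs.values())}
    return {f"{name}_{cls_name}": total for name, total in gbs.items()}


endpoint_desc = {"ProductionVariants": [
    {"VariantName": "VariantA", "CurrentInstanceCount": 2},
    {"VariantName": "VariantB", "CurrentInstanceCount": 1},
    {"VariantName": "VariantC", "CurrentInstanceCount": 3},
]}
endpoint_config = {"ProductionVariants": [
    {"VariantName": "VariantA", "VolumeSizeInGB": 100},
    {"VariantName": "VariantB"},  # storage-included instance type: no VolumeSizeInGB
    {"VariantName": "VariantC", "VolumeSizeInGB": 50},
]}
print(storage_dims_from_descriptions(endpoint_desc, endpoint_config))
print(storage_dims_from_descriptions(endpoint_desc, endpoint_config, merge_variants=True))
```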

cost_model_from_sagemaker_realtime_endpoint

cost_model_from_sagemaker_realtime_endpoint(endpoint_name, region=None)

Automatically infer an LLMeter CostModel from a deployed SageMaker real-time endpoint

This method builds a basic cost estimation model for SageMaker real-time inference endpoints, including compute and EBS storage costs but excluding data transfer costs. Standard on-demand pricing is used, without accounting for private pricing, pricing tiers, savings plans, or similar discounts.

NOTE: You'll need IAM permissions for pricing:GetProducts, sagemaker:DescribeEndpoint, and sagemaker:DescribeEndpointConfig to use this method.

Parameters:

Name	Type	Description	Default
`endpoint_name`	`str`	Name of the deployed SageMaker endpoint	required
`region`	`str \| None`	AWS region where the endpoint is running (default: current region)	`None`

Returns:

Name	Type	Description
`cost_model`	`CostModel`	A `CostModel` instance with inferred cost dimensions capturing instance compute and EBS storage. If multiple "variants" are deployed on the endpoint, separate dimensions will be created for each one, with the variant name as a prefix.

Source code in llmeter/callbacks/cost/providers/sagemaker.py

def cost_model_from_sagemaker_realtime_endpoint(
    endpoint_name: str, region: str | None = None
) -> CostModel:
    """Automatically infer an LLMeter CostModel from a deployed SageMaker real-time endpoint

    This method builds a basic cost estimation model for SageMaker real-time inference endpoints,
    including compute and EBS storage costs but excluding data transfer costs. Standard on-demand
    pricing is used, without accounting for private pricing, pricing tiers, savings plans, or
    similar discounts.

    NOTE: You'll need IAM permissions to `pricing:GetProducts`, `sagemaker:DescribeEndpoint`, and
    `sagemaker:DescribeEndpointConfig` to use this method.

    Args:
        endpoint_name: Name of the deployed SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)

    Returns:
        cost_model: A `CostModel` instance with inferred cost dimensions capturing instance compute
            and EBS storage. If multiple "variants" are deployed on the endpoint, separate
            dimensions will be created for each one with variant name as a prefix.
    """
    return CostModel(
        run_dims={
            **SageMakerRTEndpointCompute.from_endpoint(endpoint_name, region=region),
            **SageMakerRTEndpointStorage.from_endpoint(endpoint_name, region=region),
        }
    )
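The function above merges the compute and storage dictionaries with `**`. This is safe because compute keys end in `SageMakerRTEndpointCompute` and storage keys end in `SageMakerRTEndpointStorage`, so the two sets cannot collide. The sketch below gives a back-of-envelope check of the combined hourly rate. It uses hypothetical stand-in classes (`ComputeDim`, `StorageDim`) and made-up prices, and it does not reproduce how `CostModel` itself computes or reports costs.

```python
from dataclasses import dataclass


@dataclass
class ComputeDim:  # hypothetical stand-in for SageMakerRTEndpointCompute
    instance_type: str
    instance_count: float
    price_per_hour: float

    def hourly_rate(self) -> float:
        return self.price_per_hour * self.instance_count


@dataclass
class StorageDim:  # hypothetical stand-in for SageMakerRTEndpointStorage
    gbs_provisioned: float
    price_per_gb_hour: float

    def hourly_rate(self) -> float:
        return self.gbs_provisioned * self.price_per_gb_hour


# Illustrative dimension dicts, keyed the way the from_endpoint() methods key them
compute_dims = {"SageMakerRTEndpointCompute": ComputeDim("ml.g5.4xlarge", 2, 2.03)}
storage_dims = {"SageMakerRTEndpointStorage": StorageDim(200, 0.144 / (30 * 24))}

run_dims = {**compute_dims, **storage_dims}  # same merge pattern as above
total_per_hour = sum(dim.hourly_rate() for dim in run_dims.values())
print(sorted(run_dims), total_per_hour)
```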
