sagemaker

Utilities for automating cost modelling on Amazon SageMaker endpoints

SageMakerRTEndpointCompute dataclass

SageMakerRTEndpointCompute(instance_type, instance_count=1, price_per_hour=None, region=None)

Bases: RunCostDimensionBase

Run cost dimension to estimate Amazon SageMaker real-time endpoint compute charges

NOTE: To auto-discover price_per_hour from instance type, you'll need IAM permissions to call the pricing:GetProducts API. For more information, see: https://docs.aws.amazon.com/service-authorization/latest/reference/list_awspricelist.html

Calculated rates do not currently include EBS storage volume costs. See SageMakerRTEndpointStorage for estimating this.

See .fetch_sm_hosting_od_price() for more details on how price is looked up when not explicitly provided. This lookup is provided on a best-effort basis, and may not accurately reflect all possible scenarios.

Use .from_endpoint() to automatically discover compute cost dimensions from a deployed SageMaker real-time inference endpoint.

Parameters:

  instance_type (str, required): Amazon SageMaker instance type, e.g. 'ml.g5.4xlarge'
  instance_count (float, default: 1): Number of instances running
  price_per_hour (float | None, default: None): Price per hour per instance (default: attempt to fetch from pricing API)
  region (str | None, default: None): AWS region where the endpoint is running (default: current region)
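
As a rough illustration of what this dimension represents, the sketch below multiplies an hourly instance rate by the instance count and the run duration. The helper name `estimate_compute_cost` and the rate used are hypothetical, and the formula is an assumption about how a per-hour run dimension is typically charged, not a copy of LLMeter's internal calculation.

```python
def estimate_compute_cost(
    price_per_hour: float, instance_count: float, duration_seconds: float
) -> float:
    """Hypothetical helper: cost of keeping `instance_count` instances up for one run."""
    hours = duration_seconds / 3600
    return price_per_hour * instance_count * hours


# Illustrative numbers only: 2 instances at a made-up $2.00/hour for a 15-minute run
cost = estimate_compute_cost(price_per_hour=2.0, instance_count=2, duration_seconds=900)
print(f"${cost:.2f}")  # prints "$1.00"
```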

fetch_sm_hosting_od_price staticmethod

fetch_sm_hosting_od_price(instance_type, region)

Look up USD hourly rates for on-demand SM hosting instances from the AWS Price List API

This function assumes:

  1. You're using standard "on-demand" pricing - no savings plans or private pricing
  2. No free tier or volume discounts are applicable to this usage
  3. Your pricing is provided in USD

Parameters:

  instance_type (str, required): Amazon SageMaker instance type, e.g. 'ml.g5.4xlarge'
  region (str, required): AWS region where the endpoint is running, e.g. 'us-east-1'

Returns:

  price_per_hour (float): The standard, on-demand hourly price for the given instance type in USD

Source code in llmeter/callbacks/cost/providers/sagemaker.py
@staticmethod
def fetch_sm_hosting_od_price(instance_type: str, region: str) -> float:
    """Look up USD hourly rates for on-demand SM hosting instances from the AWS Price List API

    This function assumes:

    1. You're using standard "on-demand" pricing - no savings plans or private pricing
    2. No free tier or volume discounts are applicable to this usage
    3. Your pricing is provided in USD

    Args:
        instance_type: Amazon SageMaker instance type, e.g. 'ml.g5.4xlarge'
        region: AWS region where the endpoint is running, e.g. 'us-east-1'

    Returns:
        price_per_hour: The standard, on-demand hourly price for the given instance type in USD
    """
    unit_price, _ = _fetch_single_product_ondemand_unit_price(
        service_code="AmazonSageMaker",
        attribute_values={"component": "Hosting", "instanceName": instance_type},
        region=region,
        dim_unit="Hrs",
    )
    return unit_price
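
The private helper `_fetch_single_product_ondemand_unit_price` is not shown on this page. The sketch below is one plausible way such a lookup could work against the Price List `GetProducts` API, and should not be read as the library's actual implementation. The function names are hypothetical, and filtering on a `regionCode` attribute is an assumption. The client is passed in as an argument (for example, a boto3 `pricing` client), so the parsing logic can be exercised without AWS access.

```python
import json


def build_term_match_filters(attribute_values: dict[str, str]) -> list[dict[str, str]]:
    """Turn attribute name/value pairs into Price List API TERM_MATCH filters."""
    return [
        {"Type": "TERM_MATCH", "Field": field, "Value": value}
        for field, value in attribute_values.items()
    ]


def extract_ondemand_usd_price(price_list_item: str, unit: str) -> float:
    """Pull the single on-demand USD rate with the expected unit from one product entry."""
    product = json.loads(price_list_item)  # each PriceList entry is a JSON string
    prices = [
        float(dim["pricePerUnit"]["USD"])
        for term in product["terms"]["OnDemand"].values()
        for dim in term["priceDimensions"].values()
        if dim["unit"] == unit
    ]
    if len(prices) != 1:
        raise ValueError(f"Expected exactly one {unit} price, found {len(prices)}")
    return prices[0]


def lookup_hosting_price(pricing_client, instance_type: str, region: str) -> float:
    """Assumed flow: query GetProducts, require a single match, return its hourly rate."""
    response = pricing_client.get_products(
        ServiceCode="AmazonSageMaker",
        Filters=build_term_match_filters(
            # "regionCode" as the region filter is an assumption of this sketch
            {"component": "Hosting", "instanceName": instance_type, "regionCode": region}
        ),
    )
    if len(response["PriceList"]) != 1:
        raise ValueError(f"Expected one product, found {len(response['PriceList'])}")
    return extract_ondemand_usd_price(response["PriceList"][0], unit="Hrs")
```

Requiring exactly one matching product and one matching price dimension makes the lookup fail loudly when the filters are ambiguous, rather than silently returning the wrong rate.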

from_endpoint classmethod

from_endpoint(endpoint_name, region=None)

Configure SageMakerRTEndpointCompute dimension(s) from an existing SageMaker endpoint

NOTE: You'll need IAM permissions to sagemaker:DescribeEndpoint and sagemaker:DescribeEndpointConfig to use this method.

This function returns a dictionary rather than a single SageMakerRTEndpointCompute, because different "variants" deployed behind an endpoint may be backed by clusters of different instance types, and therefore need separate dimensions.

Instance counts will be retrieved at the point in time this method is called, so watch out if you have auto-scaling enabled on your endpoint.

Parameters:

  endpoint_name (str, required): Name of the SageMaker endpoint
  region (str | None, default: None): AWS region where the endpoint is running (default: current region)

Returns:

  run_dims (dict[str, SageMakerRTEndpointCompute]): A dictionary containing one or more dimensions, to pass to your CostModel(run_dims=...). If the endpoint has only one "Variant", the returned dict will have a single key (cost dimension name) SageMakerRTEndpointCompute. Otherwise, keys will be generated as {variant_name}_SageMakerRTEndpointCompute.

Source code in llmeter/callbacks/cost/providers/sagemaker.py
@classmethod
def from_endpoint(
    cls, endpoint_name: str, region: str | None = None
) -> dict[str, SageMakerRTEndpointCompute]:
    """Configure SageMakerRTEndpointCompute dimension(s) from an existing SageMaker endpoint

    NOTE: You'll need IAM permissions to `sagemaker:DescribeEndpoint` and
    `sagemaker:DescribeEndpointConfig` to use this method.

    This function returns a dictionary rather than a single `SageMakerRTEndpointCompute`,
    because different "variants" deployed behind an endpoint may be backed by clusters of
    different instance types, and therefore need separate dimensions.

    Instance counts will be retrieved at the point in time this method is called, so watch out
    if you have auto-scaling enabled on your endpoint.

    Args:
        endpoint_name: Name of the SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)

    Returns:
        run_dims: A dictionary containing one or more dimensions, to pass to your
            `CostModel(run_dims=...)`. If the endpoint has only one "Variant", the returned
            dict will have a single key (cost dimension name) `SageMakerRTEndpointCompute`.
            Otherwise, keys will be generated as `{variant_name}_SageMakerRTEndpointCompute`.
    """
    sagemaker = boto3.client("sagemaker", region_name=region)
    endpoint_desc = sagemaker.describe_endpoint(EndpointName=endpoint_name)
    ep_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_desc["ProductionVariants"]
    }
    endpoint_config = sagemaker.describe_endpoint_config(
        EndpointConfigName=endpoint_desc["EndpointConfigName"]
    )
    cfg_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_config["ProductionVariants"]
    }

    if len(ep_vars) == 1:
        variant = next(iter(ep_vars.values()))
        variant_config = cfg_vars[variant["VariantName"]]
        return {
            cls.__name__: cls(
                instance_type=variant_config["InstanceType"],
                instance_count=variant["CurrentInstanceCount"],
                region=sagemaker.meta.region_name,
            )
        }
    else:
        return {
            f"{var_name}_{cls.__name__}": cls(
                instance_type=cfg_vars[var_name]["InstanceType"],
                instance_count=ep_vars[var_name]["CurrentInstanceCount"],
                region=sagemaker.meta.region_name,
            )
            for var_name in ep_vars.keys()
        }
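
The key-naming rule used above can be reproduced on plain dictionaries shaped like the `DescribeEndpoint` and `DescribeEndpointConfig` responses. This is a minimal sketch: the helper name `summarize_compute_dims` and the variant names are hypothetical, and it returns simple `(instance_type, instance_count)` tuples instead of real cost dimensions.

```python
def summarize_compute_dims(
    endpoint_desc: dict, endpoint_config: dict, class_name: str
) -> dict[str, tuple[str, int]]:
    """Map each dimension key to (instance type, current instance count)."""
    counts = {
        v["VariantName"]: v["CurrentInstanceCount"]
        for v in endpoint_desc["ProductionVariants"]
    }
    types = {
        v["VariantName"]: v["InstanceType"]
        for v in endpoint_config["ProductionVariants"]
    }
    single = len(counts) == 1  # one variant: use the bare class name as the key
    return {
        (class_name if single else f"{name}_{class_name}"): (types[name], count)
        for name, count in counts.items()
    }


# Hand-written sample responses with two made-up variants
desc = {
    "ProductionVariants": [
        {"VariantName": "blue", "CurrentInstanceCount": 2},
        {"VariantName": "green", "CurrentInstanceCount": 1},
    ]
}
config = {
    "ProductionVariants": [
        {"VariantName": "blue", "InstanceType": "ml.g5.4xlarge"},
        {"VariantName": "green", "InstanceType": "ml.g5.2xlarge"},
    ]
}
dims = summarize_compute_dims(desc, config, "SageMakerRTEndpointCompute")
print(sorted(dims))
# ['blue_SageMakerRTEndpointCompute', 'green_SageMakerRTEndpointCompute']
```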

SageMakerRTEndpointStorage dataclass

SageMakerRTEndpointStorage(gbs_provisioned, price_per_gb_hour=None, region=None)

Bases: RunCostDimensionBase

Run cost dimension to estimate EBS charges for Amazon SageMaker real-time endpoints

NOTE: To auto-discover price_per_gb_hour, you'll need IAM permissions to call the pricing:GetProducts API. For more information, see: https://docs.aws.amazon.com/service-authorization/latest/reference/list_awspricelist.html

See .fetch_sm_hosting_ebs_price() for more details on how price is looked up when not explicitly provided. This lookup is provided on a best-effort basis, and may not accurately reflect all possible scenarios.

Parameters:

  gbs_provisioned (float, required): Total size of provisioned EBS volume(s) for the endpoint in Gigabytes
  price_per_gb_hour (float | None, default: None): Price per hour per GB (default: attempt to fetch from pricing API)
  region (str | None, default: None): AWS region where the endpoint is running (default: current region)

fetch_sm_hosting_ebs_price staticmethod

fetch_sm_hosting_ebs_price(region)

Look up hourly USD rate for SageMaker hosting EBS storage from the AWS Price List API

The API lists this rate per GB-month, so we assume a month of 30 days × 24 hours (720 hours) to convert it to an hourly rate.

Parameters:

  region (str, required): AWS region where the endpoint is running, e.g. 'us-east-1'

Returns:

  price_per_gb_hour (float): The standard, on-demand price per GB-hour in USD

Source code in llmeter/callbacks/cost/providers/sagemaker.py
@staticmethod
def fetch_sm_hosting_ebs_price(region: str) -> float:
    """Look up hourly USD rate for SageMaker hosting EBS storage from the AWS Price List API

    The API lists this rate per GB-month, so we assume a month of 30 days * 24 hours (720
    hours) to convert it to an hourly rate.

    Args:
        region: AWS region where the endpoint is running, e.g. 'us-east-1'

    Returns:
        price_per_gb_hour: The standard, on-demand price per GB-hour in USD
    """
    unit_price, _ = _fetch_single_product_ondemand_unit_price(
        service_code="AmazonSageMaker",
        attribute_values={"volumeType": "General Purpose-Hosting"},
        region=region,
        dim_unit="GB-Mo",
    )

    return unit_price / (30 * 24)  # Convert GB-mo to GB-hr
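
The unit conversion on the last line can be checked in isolation. The monthly rate below is a made-up figure chosen for easy arithmetic, not a real AWS price, and the helper name `gb_month_to_gb_hour` is hypothetical.

```python
HOURS_PER_BILLING_MONTH = 30 * 24  # 720, the same 30-day month assumed above


def gb_month_to_gb_hour(price_per_gb_month: float) -> float:
    """Convert a per-GB-month rate to a per-GB-hour rate."""
    return price_per_gb_month / HOURS_PER_BILLING_MONTH


hourly = gb_month_to_gb_hour(0.144)  # illustrative rate only
print(f"{hourly:.6f}")  # prints "0.000200"
```

Because real months range from 28 to 31 days, an estimate based on a fixed 720-hour month can differ slightly from a billed amount.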

from_endpoint classmethod

from_endpoint(endpoint_name, region=None, merge_variants=False)

Configure SageMakerRTEndpointStorage dimension(s) from an existing SageMaker endpoint

NOTE: You'll need IAM permissions to sagemaker:DescribeEndpoint and sagemaker:DescribeEndpointConfig to use this method.

This function returns a dictionary rather than a single SageMakerRTEndpointStorage, because when more than one "variant" is deployed behind an endpoint, each variant may be reported as a separate dimension.

Instance counts will be retrieved at the point in time this method is called, so watch out if you have auto-scaling enabled on your endpoint.

Parameters:

  endpoint_name (str, required): Name of the SageMaker endpoint
  region (str | None, default: None): AWS region where the endpoint is running (default: current region)
  merge_variants (bool, default: False): Set True to merge multiple "variants" into a single storage dimension (since the hourly rate is constant across instance types). By default, if multiple variants are configured these will be reported as separate cost dimensions.

Returns:

  run_dims (dict[str, SageMakerRTEndpointStorage]): A dictionary containing zero or more dimensions, to pass to your CostModel(run_dims=...). If the endpoint has only one "Variant", or merge_variants is True, the returned dict will have a single key (cost dimension name) SageMakerRTEndpointStorage. Otherwise, keys will be generated as {variant_name}_SageMakerRTEndpointStorage. Any variants with 0 EBS storage configured will be omitted from the result.

Source code in llmeter/callbacks/cost/providers/sagemaker.py
@classmethod
def from_endpoint(
    cls,
    endpoint_name: str,
    region: str | None = None,
    merge_variants: bool = False,
) -> dict[str, SageMakerRTEndpointStorage]:
    """Configure SageMakerRTEndpointStorage dimension(s) from an existing SageMaker endpoint

    NOTE: You'll need IAM permissions to `sagemaker:DescribeEndpoint` and
    `sagemaker:DescribeEndpointConfig` to use this method.

    This function returns a dictionary rather than a single `SageMakerRTEndpointStorage`,
    because different "variants" deployed behind an endpoint may be reported separately, if
    more than one are present.

    Instance counts will be retrieved at the point in time this method is called, so watch out
    if you have auto-scaling enabled on your endpoint.

    Args:
        endpoint_name: Name of the SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)
        merge_variants: Set `True` to merge multiple "variants" into a single storage dimension
            (since the hourly rate is constant across instance types). By default, if multiple
            variants are configured these will be reported as separate cost dimensions.

    Returns:
        run_dims: A dictionary containing zero or more dimensions, to pass to your
            `CostModel(run_dims=...)`. If the endpoint has only one "Variant" (or
            `merge_variants` is set), the returned dict will have a single key (cost dimension
            name) `SageMakerRTEndpointStorage`. Otherwise, keys will be generated as
            `{variant_name}_SageMakerRTEndpointStorage`.
            Any variants with 0 EBS storage configured will be omitted from the result.
    """
    sagemaker = boto3.client("sagemaker", region_name=region)
    region_final = sagemaker.meta.region_name
    endpoint_desc = sagemaker.describe_endpoint(EndpointName=endpoint_name)
    ep_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_desc["ProductionVariants"]
    }
    endpoint_config = sagemaker.describe_endpoint_config(
        EndpointConfigName=endpoint_desc["EndpointConfigName"]
    )
    cfg_vars: dict[str, dict] = {
        e["VariantName"]: e for e in endpoint_config["ProductionVariants"]
    }

    # variant.VolumeSizeInGB seems to be set automatically for EBS-enabled instance types, but
    # missing for non-EBS (storage-included) instance types.
    vars_with_ebs = [
        name for name, cfg in cfg_vars.items() if cfg.get("VolumeSizeInGB")
    ]

    if len(vars_with_ebs) == 0:
        return {}
    elif len(ep_vars) == 1:
        variant = next(iter(ep_vars.values()))
        variant_config = cfg_vars[variant["VariantName"]]
        return {
            cls.__name__: cls(
                gbs_provisioned=variant_config["VolumeSizeInGB"]
                * variant["CurrentInstanceCount"],
                region=region_final,
            )
        }
    elif merge_variants:
        total_gbs = sum(
            [
                cfg_vars[var_name].get("VolumeSizeInGB", 0)
                * ep_vars[var_name]["CurrentInstanceCount"]
                for var_name in ep_vars.keys()
            ]
        )
        return {cls.__name__: cls(gbs_provisioned=total_gbs, region=region_final)}
    else:
        return {
            f"{var_name}_{cls.__name__}": cls(
                gbs_provisioned=cfg_vars[var_name]["VolumeSizeInGB"]
                * ep_vars[var_name]["CurrentInstanceCount"],
                region=region_final,
            )
            for var_name in ep_vars.keys()
            if var_name in vars_with_ebs
        }
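
The arithmetic behind `merge_variants=True` is a sum of per-instance volume size times instance count over all variants. The sketch below shows this on plain dictionaries; the helper name `total_provisioned_gb` and the variant data are hypothetical, with `None` standing in for a variant whose config has no `VolumeSizeInGB`.

```python
from typing import Optional


def total_provisioned_gb(
    volume_gb_by_variant: dict[str, Optional[int]],
    instance_count_by_variant: dict[str, int],
) -> int:
    """Sum volume size x instance count, treating missing volume sizes as 0 GB."""
    return sum(
        (volume_gb_by_variant.get(name) or 0) * count
        for name, count in instance_count_by_variant.items()
    )


volumes = {"blue": 30, "green": None}  # "green" mimics a storage-included instance type
counts = {"blue": 2, "green": 3}
print(total_provisioned_gb(volumes, counts))  # prints 60
```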

cost_model_from_sagemaker_realtime_endpoint

cost_model_from_sagemaker_realtime_endpoint(endpoint_name, region=None)

Automatically infer an LLMeter CostModel from a deployed SageMaker real-time endpoint

This function builds a basic cost-estimation model for SageMaker real-time inference endpoints, including compute and EBS storage costs but excluding data transfer costs. Standard on-demand pricing is used, without accounting for private pricing, tiers, savings plans, etc.

NOTE: You'll need IAM permissions to pricing:GetProducts, sagemaker:DescribeEndpoint, and sagemaker:DescribeEndpointConfig to use this method.
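
For reference, the three permissions named above could be granted with an identity policy along the following lines. This is a sketch rather than a recommended policy: `"Resource": "*"` is used for brevity, and you may prefer to scope the SageMaker actions to specific endpoint and endpoint-config ARNs.

```python
import json

# Minimal policy document covering the three actions listed in the note above
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "pricing:GetProducts",
                "sagemaker:DescribeEndpoint",
                "sagemaker:DescribeEndpointConfig",
            ],
            "Resource": "*",  # broad on purpose for this sketch; tighten as needed
        }
    ],
}

print(json.dumps(policy, indent=2))
```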

Parameters:

  endpoint_name (str, required): Name of the deployed SageMaker endpoint
  region (str | None, default: None): AWS region where the endpoint is running (default: current region)

Returns:

  cost_model (CostModel): A CostModel instance with inferred cost dimensions capturing instance compute and EBS storage. If multiple "variants" are deployed on the endpoint, separate dimensions will be created for each one with variant name as a prefix.

Source code in llmeter/callbacks/cost/providers/sagemaker.py
def cost_model_from_sagemaker_realtime_endpoint(
    endpoint_name: str, region: str | None = None
) -> CostModel:
    """Automatically infer an LLMeter CostModel from a deployed SageMaker real-time endpoint

    This method builds a basic cost estimating model for SageMaker real-time inference endpoints
    including compute and EBS storage costs, but excluding data transfer costs. Standard on-demand
    pricing is used, without accounting for private pricing, tiers, savings plans, etc.

    NOTE: You'll need IAM permissions to `pricing:GetProducts`, `sagemaker:DescribeEndpoint`, and
    `sagemaker:DescribeEndpointConfig` to use this method.

    Args:
        endpoint_name: Name of the deployed SageMaker endpoint
        region: AWS region where the endpoint is running (default: current region)

    Returns:
        cost_model: A `CostModel` instance with inferred cost dimensions capturing instance compute
            and EBS storage. If multiple "variants" are deployed on the endpoint, separate
            dimensions will be created for each one with variant name as a prefix.
    """
    return CostModel(
        run_dims={
            **SageMakerRTEndpointCompute.from_endpoint(endpoint_name, region=region),
            **SageMakerRTEndpointStorage.from_endpoint(endpoint_name, region=region),
        }
    )
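
The dictionary unpacking above relies on the compute and storage keys never colliding, since each key ends in its own class name. The sketch below shows the merge with placeholder strings in place of real dimension objects; the variant names are hypothetical, and the storage dict omits "green" to mimic a variant with no EBS volume configured.

```python
# Placeholder values stand in for real cost dimension objects
compute_dims = {
    "blue_SageMakerRTEndpointCompute": "compute dimension for blue",
    "green_SageMakerRTEndpointCompute": "compute dimension for green",
}
storage_dims = {
    "blue_SageMakerRTEndpointStorage": "storage dimension for blue",
}

# Same merge pattern as the function body: later dicts would win on duplicate keys
run_dims = {**compute_dims, **storage_dims}
print(len(run_dims))  # prints 3
```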