Capacity Reservations and Capacity Blocks

Note

Capacity Reservation support can be enabled or disabled using the /configuration/FeatureFlags/EnableCapacityReservation configuration flag.

Capacity Reservations in AWS allow you to reserve a specific number of instances of a particular type in a specific Availability Zone, ensuring that the capacity you need is available when you launch your instances. By reserving capacity in advance, you can avoid InsufficientInstanceCapacity errors that occur when AWS doesn’t have enough available instances of the desired type at launch time.

SOCA handles Capacity Reservations differently depending on the workload type.

VDI Workloads (Capacity Probing Only)

For Virtual Desktop Infrastructure (VDI), SOCA performs capacity probing only.

When a user launches a virtual desktop:

  1. SOCA checks whether EC2 capacity is available in the selected subnet.

  2. If capacity is available, SOCA creates a temporary Capacity Reservation to validate availability.

  3. The reservation is immediately canceled once capacity is confirmed.

  4. The virtual desktop is then provisioned.

No Capacity Reservation remains active after the desktop is launched.
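
The probing steps above can be sketched roughly as follows. This is a minimal illustration, not SOCA's actual implementation: the function name probe_capacity is hypothetical, the ec2 argument is assumed to be a boto3 EC2 client (or any object exposing the same two methods), the caller is assumed to have resolved the selected subnet to its Availability Zone, and the Linux/UNIX platform is an assumption.

```python
from datetime import datetime, timedelta, timezone


def probe_capacity(ec2, instance_type, availability_zone, count=1):
    """Return True if a short-lived Capacity Reservation could be created.

    Hypothetical helper: `ec2` is assumed to behave like a boto3 EC2 client.
    """
    try:
        response = ec2.create_capacity_reservation(
            InstanceType=instance_type,
            InstancePlatform="Linux/UNIX",  # assumption: Linux desktops
            AvailabilityZone=availability_zone,
            InstanceCount=count,
            EndDateType="limited",
            # Safety net in case the explicit cancel below never runs
            EndDate=datetime.now(timezone.utc) + timedelta(minutes=2),
        )
    except Exception as err:
        # boto3 raises botocore.exceptions.ClientError, which exposes the
        # AWS error code under err.response["Error"]["Code"].
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "InsufficientInstanceCapacity":
            return False  # no capacity: the desktop request fails
        raise  # any other error is unexpected and is propagated

    # Capacity is confirmed, so release the reservation right away.
    reservation_id = response["CapacityReservation"]["CapacityReservationId"]
    ec2.cancel_capacity_reservation(CapacityReservationId=reservation_id)
    return True
```

Passing the client in as an argument keeps the sketch testable without AWS credentials; in practice something like boto3.client("ec2") would be supplied.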

VDI Capacity Probing Flow

graph TD
    A[User requests Virtual Desktop]
    B[Check EC2 capacity in subnet]
    C[Create temporary Capacity Reservation]
    D[Capacity confirmed]
    E[Cancel Capacity Reservation]
    F[Provision Virtual Desktop]
    G[Request fails]

    A --> B
    B -->|Capacity available| C
    C --> D
    D --> E
    E --> F
    B -->|No capacity| G

HPC Workloads (Actual Capacity Reservation)

Warning

SOCA does NOT support Capacity Reservations when jobs are submitted with multiple subnet IDs and/or instance types. In this scenario, SOCA attempts to provision the full capacity within a single subnet. If no reservation succeeds, SOCA proceeds with host provisioning without enforcing the reservation, as EC2 Fleet may still be able to fulfill the capacity by distributing instances across multiple subnets. Enable /configuration/FeatureFlags/EnforceStrictCapacityReservation if you want to block host provisioning unless SOCA can confirm that capacity is fully available (e.g., 100% of instance_type1 is available in subnet1).

Capacity Reservations can only be enforced when a job is submitted with a single subnet ID.
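
The fallback behavior described in the warning can be summarized as a small decision function. This is only a sketch of the stated rule; the function and parameter names are hypothetical, and enforce_strict stands in for the value of the EnforceStrictCapacityReservation flag.

```python
def should_provision_hosts(reservation_id, enforce_strict):
    """Decide whether host provisioning may start (illustrative only)."""
    # A confirmed reservation always allows provisioning.
    if reservation_id is not None:
        return True
    # Without a reservation, proceed only when strict enforcement is off:
    # EC2 Fleet may still spread the instances across several subnets.
    return not enforce_strict
```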

The provisioning workflow is:

  1. SOCA checks EC2 capacity availability in the specified subnet(s).
  2. If capacity is available, SOCA creates a Capacity Reservation and retrieves a Reservation ID.
  3. If the job was submitted with multiple subnet IDs, SOCA rewrites the subnet_id attribute to include only the subnet ID where capacity is available.
  4. The Reservation ID is mapped to the EC2 Launch Template.

  5. EC2 Fleet provisions the required instances using the reservation.
  6. After successful provisioning, the Capacity Reservation is canceled.
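
As a rough illustration of the subnet rewrite and the Launch Template mapping, the sketch below uses plain dictionaries. The helper names, the shape of the job dictionary, and the try_reserve callback are assumptions made for the example; only the CapacityReservationSpecification structure follows the EC2 launch template data format.

```python
def first_subnet_with_capacity(subnet_ids, try_reserve):
    """Return (subnet_id, reservation_id) for the first subnet that can hold
    the full request, or (None, None) if none can.

    `try_reserve` is a hypothetical callback that returns a reservation ID,
    or None when the subnet has no capacity.
    """
    for subnet_id in subnet_ids:
        reservation_id = try_reserve(subnet_id)
        if reservation_id is not None:
            return subnet_id, reservation_id
    return None, None


def apply_reservation(job, launch_template_data, subnet_id, reservation_id):
    """Pin the job to one subnet and target the reservation at launch."""
    # Rewrite subnet_id so only the subnet with confirmed capacity remains.
    job = {**job, "subnet_id": [subnet_id]}
    # Instances launched from this template will target the reservation.
    launch_template_data = {
        **launch_template_data,
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": reservation_id
            }
        },
    }
    return job, launch_template_data
```

Both helpers return new dictionaries instead of mutating their inputs, which keeps the original job parameters available if provisioning has to be retried.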

HPC Capacity Reservation Flow

graph TD
    A[User submits HPC job]
    C[Check EC2 capacity in subnet]
    D[Create Capacity Reservation]
    E[Attach Reservation ID to Launch Template]
    F[EC2 Fleet provisions instances]
    G[Instances running]
    H[Cancel Capacity Reservation]
    I[Job stays in queue]

    A --> C
    C -->|Capacity available| D
    D --> E
    E --> F
    F --> G
    G --> H
    C -->|No capacity| I

Existing Capacity Reservations or Capacity Blocks for ML

You can reference an existing Capacity Reservation or Capacity Block for ML created outside of SOCA when submitting your HPC job via -l capacity_reservation_id (Learn more).

Capacity Reservation Cleanup and Cost Control

AWS charges for Capacity Reservations even when no EC2 instances are running. SOCA includes multiple safeguards to minimize unnecessary costs by ensuring reservations are canceled as quickly as possible.

Automatic EndDate

Each Capacity Reservation is created with an EndDate to enforce automatic expiration:

  • VDI capacity probing: now + 2 minutes
  • HPC capacity provisioning: now + 5 minutes

For additional details, see the AWS documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cr-concepts.html#cr-end-date
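
A minimal sketch of how the two expiration windows could be computed is shown below. The dictionary keys and the function name are hypothetical; EndDateType and EndDate are the parameter names accepted by the EC2 CreateCapacityReservation API.

```python
from datetime import datetime, timedelta, timezone

# Lifetimes listed above; the "vdi" and "hpc" keys are hypothetical labels.
RESERVATION_TTL = {
    "vdi": timedelta(minutes=2),
    "hpc": timedelta(minutes=5),
}


def end_date_params(workload, now=None):
    """Build the expiration arguments for a new Capacity Reservation."""
    now = now or datetime.now(timezone.utc)
    return {
        "EndDateType": "limited",  # the reservation expires at EndDate
        "EndDate": now + RESERVATION_TTL[workload],
    }
```

The returned dictionary could be merged into the other keyword arguments of a create_capacity_reservation call.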

Immediate Cancellation After Probing

For VDI capacity probing, SOCA explicitly attempts to cancel the Capacity Reservation immediately after the capacity check completes, without waiting for the EndDate.

Automated Cleanup with ODCRCleaner Lambda

SOCA includes a Lambda function (ODCRCleaner) that continuously cleans up idle or orphaned Capacity Reservations associated with a SOCA cluster.

The Lambda function is triggered:

  • On a scheduled interval (every 5 minutes)
  • On CloudFormation stack events

This guarantees that unused Capacity Reservations are removed promptly.
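
The selection step of such a cleaner might look like the sketch below. It is not the ODCRCleaner source: the tag key soca:ClusterId is a hypothetical placeholder, and treating a reservation as idle when none of its instance slots are in use is an assumption. The input is assumed to be the CapacityReservations list returned by the EC2 DescribeCapacityReservations API.

```python
def find_idle_reservations(reservations, cluster_id, tag_key="soca:ClusterId"):
    """Return IDs of active, unused reservations tagged for this cluster.

    `tag_key` is a hypothetical tag name used only for illustration.
    """
    idle = []
    for cr in reservations:
        tags = {t["Key"]: t["Value"] for t in cr.get("Tags", [])}
        if tags.get(tag_key) != cluster_id:
            continue  # belongs to another cluster or is not managed here
        if cr.get("State") != "active":
            continue  # already canceled, expired, or still pending
        # Assumption: "idle" means every reserved slot is still available.
        if cr["AvailableInstanceCount"] == cr["TotalInstanceCount"]:
            idle.append(cr["CapacityReservationId"])
    return idle
```

Each returned ID could then be passed to cancel_capacity_reservation, with the EndDate acting as the last line of defense if that call fails.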

Capacity Reservation Cleanup Flow

graph TD
    A[Capacity Reservation created]
    B[EndDate applied]
    C[SOCA attempts early cancellation]
    D[ODCRCleaner Lambda triggered]
    E[Identify idle/orphaned reservations]
    F[Cancel reservations]
    G[EndDate reached]
    H[Capacity Reservation is canceled]

    A --> B
    B --> C
    C -->|Cancellation fails for any reason| D
    D --> E
    E --> F
    F -->|Cancellation fails| G
    F -->|Success| H
    G --> H