Capacity Reservations and Capacity Blocks
Note
Capacity Reservation support can be enabled or disabled using the /configuration/FeatureFlags/EnableCapacityReservation configuration flag.
Capacity Reservations in AWS allow you to reserve a specific number of instances of a particular type in a specific Availability Zone, ensuring that the capacity you need is available when you launch your instances. By reserving capacity in advance, you can avoid InsufficientInstanceCapacity errors that occur when AWS doesn’t have enough available instances of the desired type at launch time.
SOCA handles Capacity Reservations differently depending on the workload type.
VDI Workloads (Capacity Probing Only)
For Virtual Desktop Infrastructure (VDI), SOCA performs capacity probing only.
When a user launches a virtual desktop:
- SOCA checks whether EC2 capacity is available in the selected subnet.
- If capacity is available, SOCA creates a temporary Capacity Reservation to validate availability.
- The reservation is immediately canceled once capacity is confirmed.
- The virtual desktop is then provisioned.
No Capacity Reservation remains active after the desktop is launched.
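The probing steps above can be sketched roughly as follows. This is an illustration only, not SOCA's actual code: the function name `probe_capacity`, its parameters, and the `Linux/UNIX` platform are assumptions, and `ec2` is expected to behave like a boto3 EC2 client (`create_capacity_reservation` and `cancel_capacity_reservation` are real boto3 calls).

```python
from datetime import datetime, timedelta, timezone


def probe_capacity(ec2, instance_type, availability_zone, count=1, ttl_minutes=2):
    """Return True if a short-lived Capacity Reservation could be created.

    Hypothetical helper: `ec2` is any object exposing the boto3 EC2 client
    methods used below.
    """
    end_date = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
    try:
        response = ec2.create_capacity_reservation(
            InstanceType=instance_type,
            InstancePlatform="Linux/UNIX",  # assumption: Linux desktops
            AvailabilityZone=availability_zone,
            InstanceCount=count,
            EndDateType="limited",
            EndDate=end_date,  # safety net if the cancel call below fails
        )
    except Exception as err:
        # botocore's ClientError exposes the AWS error code under .response
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "InsufficientInstanceCapacity":
            return False
        raise
    reservation_id = response["CapacityReservation"]["CapacityReservationId"]
    # Capacity is confirmed: release the reservation right away.
    ec2.cancel_capacity_reservation(CapacityReservationId=reservation_id)
    return True
```

With a real client this would be called as `probe_capacity(boto3.client("ec2"), "m5.large", "us-east-1a")`. The instance type and Availability Zone here are placeholders.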
VDI Capacity Probing Flow
graph TD
A[User requests Virtual Desktop]
B[Check EC2 capacity in subnet]
C[Create temporary Capacity Reservation]
D[Capacity confirmed]
E[Cancel Capacity Reservation]
F[Provision Virtual Desktop]
G[Request fails]
A --> B
B -->|Capacity available| C
C --> D
D --> E
E --> F
B -->|No capacity| G
HPC Workloads (Actual Capacity Reservation)
Warning
SOCA does NOT support Capacity Reservations when jobs are submitted with multiple subnet IDs and/or instance types. In this scenario, SOCA attempts to provision the full capacity within a single subnet. If no reservation succeeds, SOCA continues with host provisioning without enforcing the reservation, because EC2 Fleet may still be able to fulfill the capacity by distributing instances across multiple subnets. Enable /configuration/FeatureFlags/EnforceStrictCapacityReservation if you want to block host provisioning unless SOCA can confirm that capacity is fully available (e.g., 100% of instance_type1 is available in subnet1).
Capacity Reservations can only be enforced when a job is submitted with a single subnet ID.
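As a rough illustration of the fallback described in the warning, the decision for a job submitted with multiple subnet IDs and/or instance types could be modeled as below. The `Action` enum and `next_action` function are hypothetical names, not SOCA internals. `enforce_strict` stands for the value of the EnforceStrictCapacityReservation flag.

```python
from enum import Enum


class Action(Enum):
    PROVISION_WITH_RESERVATION = "provision_with_reservation"
    PROVISION_WITHOUT_RESERVATION = "provision_without_reservation"
    BLOCK = "block"


def next_action(reservation_id, enforce_strict):
    """Decide how to proceed once the reservation attempts have finished.

    reservation_id: ID of the reservation that succeeded, or None if none did.
    enforce_strict: value of the EnforceStrictCapacityReservation flag.
    """
    if reservation_id is not None:
        return Action.PROVISION_WITH_RESERVATION
    if enforce_strict:
        # Strict mode: do not provision hosts without confirmed capacity.
        return Action.BLOCK
    # Default: let EC2 Fleet try to spread instances across subnets.
    return Action.PROVISION_WITHOUT_RESERVATION
```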
The provisioning workflow is:
- SOCA checks EC2 capacity availability in the specified subnet(s).
- If capacity is available, SOCA creates a Capacity Reservation and retrieves a Reservation ID.
  - If the job was submitted with multiple subnet IDs, SOCA rewrites the subnet_id attribute to include only the subnet ID where capacity is available.
- The Reservation ID is mapped to the EC2 Launch Template.
- EC2 Fleet provisions the required instances using the reservation.
- After successful provisioning, the Capacity Reservation is canceled.
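Two steps of this workflow, narrowing the subnet list and mapping the Reservation ID to the Launch Template, can be sketched as below. The helper names and the `has_capacity` callback are assumptions made for illustration. The `CapacityReservationSpecification` / `CapacityReservationTarget` structure is the one EC2 accepts in launch template data.

```python
def pick_subnet_with_capacity(subnet_ids, has_capacity):
    """Return the first subnet ID for which has_capacity(subnet_id) is true.

    Returns None when no subnet has capacity. `has_capacity` is a
    hypothetical callback wrapping the capacity check.
    """
    for subnet_id in subnet_ids:
        if has_capacity(subnet_id):
            return subnet_id
    return None


def with_capacity_reservation(launch_template_data, reservation_id):
    """Return a copy of the launch template data targeting one reservation."""
    updated = dict(launch_template_data)
    updated["CapacityReservationSpecification"] = {
        "CapacityReservationTarget": {"CapacityReservationId": reservation_id}
    }
    return updated


# Example with placeholder IDs: only subnet-b has capacity.
job = {"subnet_id": ["subnet-a", "subnet-b"]}
chosen = pick_subnet_with_capacity(job["subnet_id"], lambda s: s == "subnet-b")
if chosen is not None:
    job["subnet_id"] = [chosen]  # mirrors the subnet_id rewrite step
template_data = with_capacity_reservation({"InstanceType": "c5.large"}, "cr-0abc")
```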
HPC Capacity Reservation Flow
graph TD
A[User submits HPC job]
C[Check EC2 capacity in subnet]
D[Create Capacity Reservation]
E[Attach Reservation ID to Launch Template]
F[EC2 Fleet provisions instances]
G[Instances running]
H[Cancel Capacity Reservation]
I[Job stays in queue]
A --> C
C -->|Capacity available| D
D --> E
E --> F
F --> G
G --> H
C -->|No capacity| I
Existing Capacity Reservations or Capacity Blocks for ML
You can reference an existing Capacity Reservation or Capacity Block for ML created outside of SOCA when submitting your HPC job with -l capacity_reservation_id (Learn more).
Capacity Reservation Cleanup and Cost Control
AWS charges for Capacity Reservations even when no EC2 instances are running. SOCA includes multiple safeguards to minimize unnecessary costs by ensuring reservations are canceled as quickly as possible.
Automatic EndDate
Each Capacity Reservation is created with an EndDate to enforce automatic expiration:
- VDI capacity probing: now + 2 minutes
- HPC capacity provisioning: now + 5 minutes
For additional details, see the AWS documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cr-concepts.html#cr-end-date
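A minimal sketch of how these expiration times could be computed and passed to the EC2 API. `RESERVATION_TTL` and `end_date_params` are hypothetical names. `EndDateType` and `EndDate` are real parameters of the EC2 CreateCapacityReservation call.

```python
from datetime import datetime, timedelta, timezone

# Lifetimes taken from the list above.
RESERVATION_TTL = {
    "vdi": timedelta(minutes=2),
    "hpc": timedelta(minutes=5),
}


def end_date_params(workload, now=None):
    """Build the EndDate-related arguments for a new Capacity Reservation.

    workload: "vdi" or "hpc".
    now: optional timezone-aware datetime, defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "EndDateType": "limited",  # a limited reservation requires an EndDate
        "EndDate": now + RESERVATION_TTL[workload],
    }
```

The returned dictionary can be merged into the other arguments of a `create_capacity_reservation` call, for example `ec2.create_capacity_reservation(**other_params, **end_date_params("hpc"))`, where `other_params` is a placeholder.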
Immediate Cancellation After Probing
For VDI capacity probing, SOCA explicitly attempts to cancel the Capacity Reservation immediately after the capacity check completes, without waiting for the EndDate.
Automated Cleanup with ODCRCleaner Lambda
SOCA includes a Lambda function (ODCRCleaner) that regularly cleans up idle or orphaned Capacity Reservations associated with a SOCA cluster.
The Lambda function is triggered:
- On a scheduled interval (every 5 minutes)
- On CloudFormation stack events
This guarantees that unused Capacity Reservations are removed promptly.
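The selection step of such a cleaner might look like the sketch below. It is not the ODCRCleaner source. The tag key `soca:ClusterId` and the definition of "idle" (an active reservation with no instance consuming it) are assumptions. The dictionary fields match the shape returned by the EC2 DescribeCapacityReservations call.

```python
def find_idle_reservations(reservations, cluster_id, tag_key="soca:ClusterId"):
    """Return the IDs of active, unused reservations tagged for one cluster.

    reservations: list of dicts shaped like DescribeCapacityReservations items.
    tag_key: hypothetical tag used to associate a reservation with a cluster.
    """
    idle = []
    for cr in reservations:
        tags = {t["Key"]: t["Value"] for t in cr.get("Tags", [])}
        if tags.get(tag_key) != cluster_id:
            continue  # belongs to another cluster or is untagged
        if cr.get("State") != "active":
            continue  # already expired, canceled, or still pending
        # Assumption: "idle" means every reserved slot is still available.
        if cr.get("AvailableInstanceCount") == cr.get("TotalInstanceCount"):
            idle.append(cr["CapacityReservationId"])
    return idle
```

Each returned ID could then be passed to `cancel_capacity_reservation` on a boto3 EC2 client.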
Capacity Reservation Cleanup Flow
graph TD
A[Capacity Reservation created]
B[EndDate applied]
C[SOCA attempts early cancellation]
D[ODCRCleaner Lambda triggered]
E[Identify idle/orphaned reservations]
F[Cancel reservations]
G[EndDate reached]
H[Capacity Reservation is canceled]
A --> B
B --> C
C -->|Cancellation fails for any reason| D
D --> E
E --> F
F -->|Cancellation fails| G
F -->|Success| H
G --> H