s3torchconnector.s3reader.constructor
Attributes
Classes
DCPOptimizedConstructor | Constructor for DCPOptimizedS3Reader instances with range metadata injection.
S3ReaderConstructor | Constructor for creating partial(S3Reader) instances.
Module Contents
- class s3torchconnector.s3reader.constructor.DCPOptimizedConstructor(max_gap_size: int | float = DEFAULT_MAX_GAP_SIZE)
Constructor for DCPOptimizedS3Reader instances with range metadata injection.
Created by S3ReaderConstructor.dcp_optimized() and used by S3StorageReader for PyTorch Distributed Checkpoint (DCP) loading. Requires range metadata from DCP load plans to function properly.
- Usage Flow:
# User Setup
reader_constructor = S3ReaderConstructor.dcp_optimized() -> DCPOptimizedConstructor
S3StorageReader(…, reader_constructor) -> stores in S3FileSystem

# During DCP.load()
S3StorageReader.prepare_local_plan(plan)
    -> set_item_ranges_by_file(plan.items, storage_data, path)
    -> builds _item_ranges_by_file: {s3_uri: [ItemRange, …]}

# During read_data(), per file
S3FileSystem.create_stream(path, "rb")
    -> S3Client.get_object(bucket, key, reader_constructor)
    -> __call__(bucket, key, get_object_info, get_stream)
        -> .metadata: SequentialS3Reader (no ranges available for .metadata file)
        -> .distcp: DCPOptimizedS3Reader(item_ranges) ranges from _item_ranges_by_file
- set_item_ranges_by_file(plan_items: List[ReadItem], storage_data: Dict[MetadataIndex, _StorageInfo], base_path: str | os.PathLike) -> None
Extract and store item ranges from DCP load plan. Called by S3StorageReader.prepare_local_plan() to inject range metadata.
Note: This replaces any previously stored ranges (intentional for multi-call scenarios).
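The range-injection step described above can be pictured with a small standalone sketch. It is not the library's implementation: `ItemRange`, `StorageInfo`, and `group_ranges_by_file` are hypothetical stand-ins, and plan items are simplified to string keys rather than DCP `ReadItem` objects. It only illustrates how per-item offsets and lengths might be grouped into a `{s3_uri: [ItemRange, …]}` mapping.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, NamedTuple


class ItemRange(NamedTuple):
    """Byte range [start, end) of one checkpoint item (hypothetical stand-in)."""
    start: int
    end: int


@dataclass(frozen=True)
class StorageInfo:
    """Hypothetical stand-in for DCP's per-item storage record."""
    relative_path: str
    offset: int
    length: int


def group_ranges_by_file(
    plan_items: List[str],
    storage_data: Dict[str, StorageInfo],
    base_path: str,
) -> Dict[str, List[ItemRange]]:
    """Map each file URI to the byte ranges requested from it."""
    ranges_by_file: Dict[str, List[ItemRange]] = defaultdict(list)
    for item in plan_items:
        info = storage_data[item]
        uri = f"{base_path.rstrip('/')}/{info.relative_path}"
        ranges_by_file[uri].append(ItemRange(info.offset, info.offset + info.length))
    # A fresh mapping is returned on every call, mirroring the note that
    # previously stored ranges are replaced rather than merged.
    return dict(ranges_by_file)


storage_data = {
    "layer1.weight": StorageInfo("__0_0.distcp", 0, 100),
    "layer1.bias": StorageInfo("__0_0.distcp", 100, 20),
    "layer2.weight": StorageInfo("__1_0.distcp", 0, 50),
}
ranges = group_ranges_by_file(
    ["layer1.weight", "layer1.bias", "layer2.weight"],
    storage_data,
    "s3://bucket/checkpoint/",
)
```

In this sketch the two items stored in `__0_0.distcp` end up in one list, which is the shape a per-file reader would need to plan its byte-range requests.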
- class s3torchconnector.s3reader.constructor.S3ReaderConstructor
Constructor for creating partial(S3Reader) instances.
Creates partial S3Reader instances that will be completed by S3Client with the remaining required parameters (e.g. bucket, key, get_object_info, get_stream).
The constructor provides factory methods for different reader types:
sequential(): Creates a constructor for sequential readers that buffer the entire object. Best for full reads and repeated access.
range_based(): Creates a constructor for range-based readers that fetch specific byte ranges. Suitable for sparse partial reads for large objects.
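The two-stage construction described above (reader options fixed first, object-specific parameters supplied later by the client) can be sketched with functools.partial. `ToyRangedReader` is a hypothetical class written for this illustration and is not the library's reader; only the partial-application pattern is the point.

```python
from functools import partial
from typing import Callable


class ToyRangedReader:
    """Hypothetical reader used only for illustration; not RangedS3Reader."""

    def __init__(
        self,
        bucket: str,
        key: str,
        get_stream: Callable[[], bytes],
        buffer_size: int = 8 * 1024 * 1024,
    ):
        self.bucket = bucket
        self.key = key
        self.buffer_size = buffer_size
        self._get_stream = get_stream

    def read(self) -> bytes:
        return self._get_stream()


# Stage 1 (user side): fix only the reader-specific option.
reader_constructor = partial(ToyRangedReader, buffer_size=0)

# Stage 2 (client side): supply the remaining required parameters.
reader = reader_constructor("my-bucket", "data/object.bin", lambda: b"payload")
```

Keeping the stages separate lets the user choose a reader type once while the client decides, per object, which bucket, key, and stream callbacks to pass in.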
- static sequential() -> s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates a constructor for sequential (generic) readers.
This reader is the generic reader that supports all access patterns.
- Returns:
Partial constructor for SequentialS3Reader
- Return type:
S3ReaderConstructorProtocol
Example:
reader_constructor = S3ReaderConstructor.sequential()
- static range_based(buffer_size: int | None = None) -> s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates a constructor for range-based readers.
- Parameters:
buffer_size – Internal buffer size in bytes. If None, uses default 8MB. Set to 0 to disable buffering.
- Returns:
Partial constructor for RangedS3Reader
- Return type:
S3ReaderConstructorProtocol
The range-based reader issues a byte-range request for each read/readinto call, so it reads specific portions of an S3 object without downloading the entire file.
Buffer size affects read performance:
Small reads (< buffer_size): Loads buffer_size bytes into the buffer to reduce S3 API calls for small, sequential reads
Large reads (≥ buffer_size): Bypasses the buffer for direct transfer from S3
Forward overlap reads: Reuses buffered data when reading ranges that extend beyond the current buffer, and processes the remaining data according to its size using the logic above
Configuration Guide:
Use larger buffer sizes for workloads with many small, sequential reads of nearby bytes
Use smaller buffer sizes or disable buffering for sparse partial reads
Buffer can be disabled by setting buffer_size to 0
If buffer_size is None, uses default 8MB buffer
Examples:
# Range-based reader with default 8MB buffer
reader_constructor = S3ReaderConstructor.range_based()
# Range-based reader with custom buffer size
reader_constructor = S3ReaderConstructor.range_based(buffer_size=16*1024*1024)
# Range-based reader with buffering disabled
reader_constructor = S3ReaderConstructor.range_based(buffer_size=0)
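To make the small-read and large-read rules concrete, here is a simplified sketch over an in-memory byte string. `ToyBufferedRangeReader` is a hypothetical class, not RangedS3Reader: it counts simulated range requests instead of calling S3, and it omits the forward-overlap case for brevity.

```python
from typing import Optional

DEFAULT_BUFFER_SIZE = 8 * 1024 * 1024  # 8MB default stated in the text


class ToyBufferedRangeReader:
    """Hypothetical illustration of the buffering rules; not RangedS3Reader."""

    def __init__(self, data: bytes, buffer_size: Optional[int] = None):
        self._data = data  # stands in for the S3 object
        self._buffer_size = DEFAULT_BUFFER_SIZE if buffer_size is None else buffer_size
        self._buf_start = 0
        self._buf = b""
        self.requests = 0  # number of simulated byte-range requests

    def _fetch(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self._data[start:end]

    def read_range(self, start: int, size: int) -> bytes:
        end = min(start + size, len(self._data))
        # Large read, or buffering disabled: bypass the buffer entirely.
        if self._buffer_size == 0 or size >= self._buffer_size:
            return self._fetch(start, end)
        # Small read fully inside the current buffer: no new request.
        buf_end = self._buf_start + len(self._buf)
        if self._buf and self._buf_start <= start and end <= buf_end:
            offset = start - self._buf_start
            return self._buf[offset:offset + (end - start)]
        # Small read outside the buffer: load buffer_size bytes from `start`.
        self._buf_start = start
        self._buf = self._fetch(start, min(start + self._buffer_size, len(self._data)))
        return self._buf[:end - start]


reader = ToyBufferedRangeReader(bytes(range(100)), buffer_size=16)
first = reader.read_range(0, 4)    # miss: fills the buffer with bytes 0..15
second = reader.read_range(4, 4)   # hit: served from the buffer
third = reader.read_range(50, 32)  # large read: bypasses the buffer
```

Three reads cost two simulated requests here, which is the trade-off behind the configuration guide: a larger buffer helps nearby small reads, while sparse reads gain little from it.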
- static dcp_optimized(max_gap_size: int | float = DEFAULT_MAX_GAP_SIZE) -> s3torchconnector.s3reader.protocol.DCPS3ReaderConstructorProtocol
Creates a constructor for DCP-optimized readers for faster checkpoint loading.
The DCP-optimized reader provides performance improvements for DCP reading through:
Selective data fetching with range coalescing to only fetch required byte ranges
Per-item buffer management to reduce buffer allocation costs
Eliminating buffer copy by storing S3 chunks as memoryview references
- Parameters:
max_gap_size – Maximum gap size in bytes between ranges to coalesce into the same S3 read stream. Most users should use the default value.
Default: 32MB (32 * 1024 * 1024)
Use float("inf") to coalesce all ranges regardless of gaps
Use 0 to disable coalescing, which creates a new range-based stream for each gap
- Returns:
Constructor that creates DCPOptimizedS3Reader when ranges are available, falling back to SequentialS3Reader otherwise.
- Return type:
DCPS3ReaderConstructorProtocol
- Requirements:
Should be used with S3StorageReader, in which prepare_local_plan() automatically handles:
Load ordering: Sorts items by storage offset for sequential access
Range injection: Provides byte ranges from DCP load plan to the reader
Advanced users implementing custom readers must include these optimizations in their prepare_local_plan()/read_data() implementation to use the DCP-optimized reader.
Example:
reader_constructor = S3ReaderConstructor.dcp_optimized()
storage_reader = S3StorageReader(region, path, reader_constructor=reader_constructor)
DCP.load(state_dict, storage_reader=storage_reader)
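The effect of max_gap_size on range coalescing can be illustrated with a short standalone sketch. `coalesce_ranges` is a hypothetical helper, not the reader's internal code, and treating a gap exactly equal to max_gap_size as mergeable is an assumption made for this example.

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]  # (start, end), end exclusive


def coalesce_ranges(ranges: List[Range], max_gap_size: Union[int, float]) -> List[Range]:
    """Merge byte ranges whose gap to the previous group is at most max_gap_size."""
    groups: List[Range] = []
    for start, end in sorted(ranges):
        # Assumption: a gap equal to max_gap_size still coalesces.
        if groups and start - groups[-1][1] <= max_gap_size:
            prev_start, prev_end = groups[-1]
            groups[-1] = (prev_start, max(prev_end, end))
        else:
            groups.append((start, end))
    return groups


ranges = [(0, 100), (150, 200), (5000, 5100)]
small_gap = coalesce_ranges(ranges, 64)             # merges only the 50-byte gap
everything = coalesce_ranges(ranges, float("inf"))  # one stream for all ranges
disabled = coalesce_ranges(ranges, 0)               # a new group at every gap
```

Each resulting group corresponds to one read stream in this model. A larger max_gap_size means fewer streams but more unneeded bytes fetched across the gaps, while 0 fetches only the requested bytes at the cost of more streams.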
- static default() -> s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates the default generic reader constructor.
This creates a sequential (generic) reader that supports all access patterns.
- Returns:
Partial constructor for SequentialS3Reader
- Return type:
S3ReaderConstructorProtocol
- static get_reader_type_string(constructor: s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol | None) -> str
Returns the reader type string for the given constructor.