s3torchconnector.s3reader.constructor

Attributes

log

Classes

DCPOptimizedConstructor

Constructor for DCPOptimizedS3Reader instances with range metadata injection.

S3ReaderConstructor

Constructor for creating partial(S3Reader) instances.

Module Contents

s3torchconnector.s3reader.constructor.log[source]
class s3torchconnector.s3reader.constructor.DCPOptimizedConstructor(max_gap_size: int | float = DEFAULT_MAX_GAP_SIZE)[source]

Constructor for DCPOptimizedS3Reader instances with range metadata injection.

Created from S3ReaderConstructor, and used by S3StorageReader for PyTorch DCP. Requires range metadata from DCP load plans to function properly.

Usage Flow:

# User Setup
reader_constructor = S3ReaderConstructor.dcp_optimized()  -> DCPOptimizedConstructor
S3StorageReader(…, reader_constructor)  -> stores in S3FileSystem

# During DCP.load()
S3StorageReader.prepare_local_plan(plan)
  -> set_item_ranges_by_file(plan.items, storage_data, path)
  -> builds _item_ranges_by_file: {s3_uri: [ItemRange, …]}

# During read_data(), per file
S3FileSystem.create_stream(path, "rb")
  -> S3Client.get_object(bucket, key, reader_constructor)
  -> __call__(bucket, key, get_object_info, get_stream)
    -> .metadata: SequentialS3Reader (no ranges available for .metadata file)
    -> .distcp: DCPOptimizedS3Reader(item_ranges) ranges from _item_ranges_by_file

set_item_ranges_by_file(plan_items: List[ReadItem], storage_data: Dict[MetadataIndex, _StorageInfo], base_path: str | os.PathLike) None[source]

Extract and store item ranges from DCP load plan. Called by S3StorageReader.prepare_local_plan() to inject range metadata.

Note: This replaces any previously stored ranges (intentional for multi-call scenarios).
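
The per-file selection and the replace-on-each-call behaviour described above can be sketched roughly as follows. This is an illustration only: ToyDCPConstructor, its tuple return values, and the s3://bucket/key URI format are hypothetical stand-ins, not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # hypothetical (start, end) byte range


class ToyDCPConstructor:
    """Hypothetical sketch of per-file reader selection; not the library's class."""

    def __init__(self) -> None:
        self._item_ranges_by_file: Dict[str, List[Range]] = {}

    def set_item_ranges_by_file(self, ranges_by_file: Dict[str, List[Range]]) -> None:
        # Replace (do not merge) any previously stored ranges.
        self._item_ranges_by_file = dict(ranges_by_file)

    def __call__(self, bucket: str, key: str) -> Tuple[str, Optional[List[Range]]]:
        uri = f"s3://{bucket}/{key}"  # assumed URI format
        ranges = self._item_ranges_by_file.get(uri)
        if ranges:
            # Ranges are known for this file: use the DCP-optimized path.
            return ("dcp_optimized", ranges)
        # No ranges (e.g. the .metadata file): fall back to a sequential reader.
        return ("sequential", None)


ctor = ToyDCPConstructor()
ctor.set_item_ranges_by_file({"s3://b/ckpt/__0_0.distcp": [(0, 10)]})
distcp_choice = ctor("b", "ckpt/__0_0.distcp")
metadata_choice = ctor("b", "ckpt/.metadata")

# A second call discards the earlier mapping entirely.
ctor.set_item_ranges_by_file({"s3://b/ckpt/__1_0.distcp": [(5, 20)]})
after_replace = ctor("b", "ckpt/__0_0.distcp")
```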

class s3torchconnector.s3reader.constructor.S3ReaderConstructor[source]

Constructor for creating partial(S3Reader) instances.

Creates partial S3Reader instances that will be completed by S3Client with the remaining required parameters (e.g. bucket, key, get_object_info, get_stream).

The constructor provides factory methods for different reader types:

  • sequential(): Creates a constructor for sequential readers that buffer the entire object. Best for full reads and repeated access.

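  • range_based(): Creates a constructor for range-based readers that fetch specific byte ranges. Suitable for sparse partial reads of large objects.

  • dcp_optimized(): Creates a constructor for DCP-optimized readers that fetch only the byte ranges required by a DCP load plan. Intended for use with S3StorageReader.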
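
As a rough illustration of the partial-constructor pattern described above, the sketch below uses functools.partial. ToyReader and its arguments are hypothetical stand-ins, not the library's S3Reader classes or their real signatures.

```python
from functools import partial
from typing import Optional


class ToyReader:
    """Hypothetical stand-in for an S3Reader; not the library's class."""

    def __init__(self, bucket: str, key: str, buffer_size: Optional[int] = None) -> None:
        self.bucket = bucket
        self.key = key
        self.buffer_size = buffer_size


# "Constructor" step: fix the reader-specific options up front.
reader_constructor = partial(ToyReader, buffer_size=0)

# "Client" step: supply the remaining required parameters later.
reader = reader_constructor("my-bucket", "my-key")
```

The caller picks the reader type and its options once, and the code that knows the bucket and key completes the construction for each object.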

static sequential() s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol[source]

Creates a constructor for sequential (generic) readers.

This reader is the generic reader that supports all access patterns.

Returns:

Partial constructor for SequentialS3Reader

Return type:

S3ReaderConstructorProtocol

Example:

reader_constructor = S3ReaderConstructor.sequential()
static range_based(buffer_size: int | None = None) s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol[source]

Creates a constructor for range-based readers.

Parameters:

buffer_size – Internal buffer size in bytes. If None, uses default 8MB. Set to 0 to disable buffering.

Returns:

Partial constructor for RangedS3Reader

Return type:

S3ReaderConstructorProtocol

The range-based reader issues a byte-range request for each read/readinto call, so it reads specific portions of an S3 object without downloading the entire file.

Buffer size affects read performance:

  • Small reads (< buffer_size): Loads buffer_size bytes to buffer to reduce S3 API calls for small, sequential reads

  • Large reads (≥ buffer_size): Bypasses the buffer for direct transfer from S3

  • Forward overlap reads: Reuses buffered data when reading ranges that extend beyond the current buffer, and processes the remaining data according to its size using the logic above.
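
The three buffering cases above can be sketched with a toy reader over an in-memory bytes object. ToyRangedReader, read_range, and the requests list are hypothetical names for illustration; this is not the library's RangedS3Reader, and a tiny buffer size is used so the behaviour is easy to follow.

```python
class ToyRangedReader:
    """Hypothetical sketch of the buffering rules; not the library's reader."""

    def __init__(self, data: bytes, buffer_size: int = 8) -> None:
        self._data = data              # stands in for the S3 object
        self._buffer_size = buffer_size
        self._buf_start = 0
        self._buf = b""
        self.requests = []             # (start, end) of each simulated range request

    def _fetch(self, start: int, end: int) -> bytes:
        end = min(end, len(self._data))
        self.requests.append((start, end))
        return self._data[start:end]

    def read_range(self, start: int, size: int) -> bytes:
        end = min(start + size, len(self._data))
        out = b""
        buf_end = self._buf_start + len(self._buf)
        # Forward overlap: reuse whatever part of the request is already buffered.
        if self._buf_start <= start < buf_end:
            take = min(end, buf_end)
            out = self._buf[start - self._buf_start:take - self._buf_start]
            start = take
        remaining = end - start
        if remaining <= 0:
            return out
        if self._buffer_size == 0 or remaining >= self._buffer_size:
            # Large read (or buffering disabled): bypass the buffer.
            return out + self._fetch(start, end)
        # Small read: load buffer_size bytes, then serve from the buffer.
        self._buf_start = start
        self._buf = self._fetch(start, start + self._buffer_size)
        return out + self._buf[:remaining]


reader = ToyRangedReader(bytes(range(64)), buffer_size=8)
first = reader.read_range(0, 4)    # small read: one 8-byte request fills the buffer
second = reader.read_range(4, 2)   # served from the buffer, no new request
third = reader.read_range(6, 20)   # forward overlap: 2 bytes reused, 18 fetched directly
```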

Configuration Guide:

  • Use larger buffer sizes for workloads with many small, sequential reads of nearby bytes

  • Use smaller buffer sizes or disable buffering for sparse partial reads

  • Buffer can be disabled by setting buffer_size to 0

  • If buffer_size is None, uses default 8MB buffer

Examples:

# Range-based reader with default 8MB buffer
reader_constructor = S3ReaderConstructor.range_based()

# Range-based reader with custom buffer size
reader_constructor = S3ReaderConstructor.range_based(buffer_size=16*1024*1024)

# Range-based reader with buffering disabled
reader_constructor = S3ReaderConstructor.range_based(buffer_size=0)
static dcp_optimized(max_gap_size: int | float = DEFAULT_MAX_GAP_SIZE) s3torchconnector.s3reader.protocol.DCPS3ReaderConstructorProtocol[source]

Creates a constructor for DCP-optimized readers for faster checkpoint loading.

The DCP-optimized reader provides performance improvements for DCP reading through:

  • Selective data fetching with range coalescing to only fetch required byte ranges

  • Per-item buffer management to reduce buffer allocation costs

  • Eliminating buffer copy by storing S3 chunks as memoryview references

Parameters:

max_gap_size

Maximum gap size in bytes between ranges to coalesce into the same S3 read stream. Most users should use the default value.

  • Default: 32MB (32 * 1024 * 1024)

  • Use float("inf") to coalesce all ranges regardless of gaps

  • Use 0 to disable coalescing, which creates a new range-based stream for each gap
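
The effect of max_gap_size can be sketched as a simple range-merging pass. The function name coalesce_ranges, the (start, end) tuple representation, and the inclusive gap comparison are assumptions made for illustration; the library's actual grouping logic may differ.

```python
from typing import List, Tuple


def coalesce_ranges(
    ranges: List[Tuple[int, int]], max_gap_size: float
) -> List[Tuple[int, int]]:
    """Merge (start, end) byte ranges whose gap is at most max_gap_size.

    Illustrative sketch only, not the library's implementation.
    """
    groups: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if groups and start - groups[-1][1] <= max_gap_size:
            # Gap is small enough: extend the current group.
            prev_start, prev_end = groups[-1]
            groups[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap is too large: start a new group (a new stream).
            groups.append((start, end))
    return groups


ranges = [(0, 100), (150, 200), (1000, 1100)]
small_gap = coalesce_ranges(ranges, max_gap_size=64)
no_limit = coalesce_ranges(ranges, max_gap_size=float("inf"))
no_merge = coalesce_ranges(ranges, max_gap_size=0)
```

In this sketch a larger max_gap_size trades extra bytes downloaded (the gaps) for fewer separate streams.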

Returns:

Constructor that creates DCPOptimizedS3Reader when ranges are available, falling back to SequentialS3Reader otherwise.

Return type:

DCPS3ReaderConstructorProtocol

Requirements:

Should be used with S3StorageReader, in which prepare_local_plan() automatically handles:

  • Load ordering: Sorts items by storage offset for sequential access

  • Range injection: Provides byte ranges from DCP load plan to the reader

Advanced users implementing custom readers must include these optimizations in their prepare_local_plan()/read_data() implementation to use the DCP-optimized reader.
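
For custom implementations, the load-ordering step might look roughly like the sketch below. ToyItem and sort_by_storage_offset are hypothetical names; real DCP load plans use ReadItem and storage metadata objects rather than this simplified dataclass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyItem:
    """Hypothetical stand-in for a load-plan item with its storage location."""

    name: str
    relative_path: str
    offset: int
    length: int


def sort_by_storage_offset(items: List[ToyItem]) -> List[ToyItem]:
    # Group items by file, then order them by byte offset within each file
    # so that each file is read front to back.
    return sorted(items, key=lambda item: (item.relative_path, item.offset))


items = [
    ToyItem("w2", "__0_0.distcp", offset=4096, length=1024),
    ToyItem("b1", "__1_0.distcp", offset=0, length=64),
    ToyItem("w1", "__0_0.distcp", offset=0, length=4096),
]
ordered_names = [item.name for item in sort_by_storage_offset(items)]
```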

Example:

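import torch.distributed.checkpoint as DCP
from s3torchconnector import S3ReaderConstructor
from s3torchconnector.dcp import S3StorageReader
reader_constructor = S3ReaderConstructor.dcp_optimized()
storage_reader = S3StorageReader(region, path, reader_constructor=reader_constructor)
DCP.load(state_dict, storage_reader=storage_reader)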
static default() s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol[source]

Creates the default generic reader constructor.

This creates a sequential (generic) reader that supports all access patterns.

Returns:

Partial constructor for SequentialS3Reader

Return type:

S3ReaderConstructorProtocol

static get_reader_type_string(constructor: s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol | None) str[source]

Returns the reader type string for the given constructor.