s3torchconnector.s3reader
Submodules
Classes
S3Reader – An abstract base class for read-only, file-like representation of a single object stored in S3.
S3ReaderConstructor – Constructor for creating partial(S3Reader) instances.
SequentialS3Reader – Sequential S3 reader implementation
RangedS3Reader – Range-based S3 reader implementation with adaptive buffering.
Package Contents
- class s3torchconnector.s3reader.S3Reader
Bases: abc.ABC, io.BufferedIOBase
An abstract base class for read-only, file-like representation of a single object stored in S3.
This class defines the interface for S3 readers. Concrete implementations (SequentialS3Reader or RangedS3Reader) extend this class. S3ReaderConstructor creates partial functions of these implementations, which are then completed by S3Client with the remaining required parameters.
- abstract property bucket: str
- abstract property key: str
- abstract read(size: int | None = None) → bytes
Read and return up to size bytes.
If the argument is omitted, None, or negative, reads and returns all data until EOF.
If the argument is positive, and the underlying raw stream is not ‘interactive’, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first). But for interactive raw streams (as well as sockets and pipes), at most one raw read will be issued, and a short result does not imply that EOF is imminent.
Returns an empty bytes object on EOF.
Returns None if the underlying raw stream was open in non-blocking mode and no data is available at the moment.
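The read contract above is the standard io.BufferedIOBase one, so it can be tried out without S3. The sketch below uses io.BytesIO purely as an in-memory stand-in for a reader; it is not an S3Reader, and the sample bytes are made up for illustration.

```python
import io

# io.BytesIO is only a stand-in here; it follows the same
# io.BufferedIOBase read() contract that S3Reader declares.
stream = io.BytesIO(b"0123456789")

first = stream.read(4)    # positive size: up to 4 bytes
rest = stream.read()      # size omitted: everything until EOF
at_eof = stream.read(4)   # already at EOF: empty bytes object

print(first, rest, at_eof)
```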
- abstract seek(offset: int, whence: int = SEEK_SET, /) → int
Change stream position.
Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:
0 – start of stream (the default); offset should be zero or positive
1 – current stream position; offset may be negative
2 – end of stream; offset is usually negative
Return the new absolute position.
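The three whence values listed above map to the constants io.SEEK_SET, io.SEEK_CUR, and io.SEEK_END. As a small illustration, again using io.BytesIO as an assumed stand-in rather than a real S3 reader:

```python
import io

stream = io.BytesIO(b"0123456789")  # in-memory stand-in, 10 bytes long

pos_set = stream.seek(3, io.SEEK_SET)   # 0: relative to start of stream
pos_cur = stream.seek(-2, io.SEEK_CUR)  # 1: relative to current position
pos_end = stream.seek(-4, io.SEEK_END)  # 2: relative to end of stream
tail = stream.read()                    # bytes from the new position to EOF

print(pos_set, pos_cur, pos_end, tail)
```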
- class s3torchconnector.s3reader.S3ReaderConstructor
Constructor for creating partial(S3Reader) instances.
Creates partial S3Reader instances that will be completed by S3Client with the remaining required parameters (e.g. bucket, key, get_object_info, get_stream).
The constructor provides factory methods for different reader types:
sequential(): Creates a constructor for sequential readers that buffer the entire object. Best for full reads and repeated access.
range_based(): Creates a constructor for range-based readers that fetch specific byte ranges. Suitable for sparse partial reads of large objects.
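The partial-constructor pattern described above can be sketched with functools.partial. This is a minimal illustration and not the library's implementation: ToyReader, toy_range_based, and the argument values are hypothetical stand-ins, and the completion step that S3Client performs is simulated by a direct call.

```python
from functools import partial
from typing import Callable, Optional


class ToyReader:
    """Hypothetical stand-in for a concrete reader class (not the library's)."""

    def __init__(
        self,
        bucket: str,
        key: str,
        get_stream: Callable[[], bytes],
        buffer_size: Optional[int] = None,
    ) -> None:
        self.bucket = bucket
        self.key = key
        self.get_stream = get_stream
        self.buffer_size = buffer_size


def toy_range_based(buffer_size: Optional[int] = None):
    """Bind only the reader-specific option; the rest is supplied later."""
    return partial(ToyReader, buffer_size=buffer_size)


# Step 1: the user picks a reader type and its options up front.
constructor = toy_range_based(buffer_size=1024)

# Step 2: the client (simulated here) completes the remaining parameters.
reader = constructor(
    bucket="example-bucket",       # assumed value
    key="example/key.bin",         # assumed value
    get_stream=lambda: b"payload"  # assumed in-memory stream
)

print(type(reader).__name__, reader.buffer_size)
```

Splitting construction this way lets the caller choose a read strategy without knowing the bucket, key, or stream callables, which only the client has.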
- static sequential() → s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates a constructor for sequential readers
- Returns:
Partial constructor for SequentialS3Reader
- Return type:
s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Example:
reader_constructor = S3ReaderConstructor.sequential()
- static range_based(buffer_size: int | None = None) → s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates a constructor for range-based readers
- Parameters:
buffer_size – Internal buffer size in bytes. If None, uses default 8MB. Set to 0 to disable buffering.
- Returns:
Partial constructor for RangedS3Reader
- Return type:
s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
The range-based reader performs byte-range requests to read specific portions of S3 objects without downloading the entire file.
Buffer size affects read performance:
Small reads (< buffer_size): Loads buffer_size bytes into the buffer to reduce S3 API calls for small, sequential reads
Large reads (≥ buffer_size): Bypasses the buffer for direct transfer from S3
Forward overlap reads: Reuses buffered data when reading ranges that extend beyond the current buffer, and processes the remaining data according to size using the rules above
Configuration Guide:
Use larger buffer sizes for workloads with many small, sequential reads of nearby bytes
Use smaller buffer sizes or disable buffering for sparse partial reads
Buffer can be disabled by setting buffer_size to 0
If buffer_size is None, uses default 8MB buffer
Examples:
# Range-based reader with default 8MB buffer
reader_constructor = S3ReaderConstructor.range_based()
# Range-based reader with custom buffer size
reader_constructor = S3ReaderConstructor.range_based(buffer_size=16*1024*1024)
# Range-based reader with buffering disabled
reader_constructor = S3ReaderConstructor.range_based(buffer_size=0)
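The small-read and large-read rules above can be illustrated with a toy in-memory reader that counts simulated range requests. This is a sketch under stated assumptions, not RangedS3Reader itself: ToyRangedReader, its requests counter, and the in-memory _fetch are hypothetical, and forward-overlap reuse is left out for brevity.

```python
from typing import Optional

DEFAULT_BUFFER_SIZE = 8 * 1024 * 1024  # 8MB default, as stated in the text


class ToyRangedReader:
    """Hypothetical model of the buffering rules; reads from bytes, not S3."""

    def __init__(self, data: bytes, buffer_size: Optional[int] = None) -> None:
        self._data = data
        self._buffer_size = (
            DEFAULT_BUFFER_SIZE if buffer_size is None else buffer_size
        )
        self._buf = b""
        self._buf_start = 0
        self._pos = 0
        self.requests = 0  # number of simulated range requests

    def _fetch(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self._data[start:end]

    def read(self, size: int) -> bytes:
        start = self._pos
        end = min(start + size, len(self._data))
        if start >= end:
            return b""
        buf_end = self._buf_start + len(self._buf)
        if self._buf_start <= start and end <= buf_end:
            # Fully buffered: serve from memory, no request.
            offset = start - self._buf_start
            out = self._buf[offset:offset + (end - start)]
        elif self._buffer_size == 0 or (end - start) >= self._buffer_size:
            # Buffering disabled or large read: bypass the buffer.
            out = self._fetch(start, end)
        else:
            # Small read: load buffer_size bytes, then copy out the slice.
            self._buf = self._fetch(start, start + self._buffer_size)
            self._buf_start = start
            out = self._buf[: end - start]
        self._pos = end
        return out


data = bytes(range(256)) * 4  # 1024 bytes of sample data

buffered = ToyRangedReader(data, buffer_size=64)
for _ in range(4):
    buffered.read(16)  # four small sequential reads share one fetch

unbuffered = ToyRangedReader(data, buffer_size=0)
for _ in range(4):
    unbuffered.read(16)  # each read issues its own request

print(buffered.requests, unbuffered.requests)
```

In this toy model, the buffered reader needs one request for the four small reads while the unbuffered reader needs four, which is the trade-off the configuration guide describes.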
- static default() → s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
Creates default reader constructor (sequential)
- Returns:
Partial constructor for SequentialS3Reader
- Return type:
s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol
- static get_reader_type_string(constructor: s3torchconnector.s3reader.protocol.S3ReaderConstructorProtocol | None) → str
Returns the reader type string for the given constructor.
- class s3torchconnector.s3reader.SequentialS3Reader(bucket: str, key: str, get_object_info: Callable[[], s3torchconnectorclient._mountpoint_s3_client.ObjectInfo | s3torchconnectorclient._mountpoint_s3_client.HeadObjectResult], get_stream: Callable[[], s3torchconnectorclient._mountpoint_s3_client.GetObjectStream])
Bases: s3torchconnector.s3reader.s3reader.S3Reader
Sequential S3 reader implementation
Maintains an internal buffer for efficient sequential reads and repeated access. Optimal for most use cases, including full object reads.
- property bucket: str
- property key: str
- prefetch() → None
Start fetching data from S3.
- Raises:
S3Exception – An error occurred accessing S3.
- readinto(buf) → int
Read up to len(buf) bytes into a pre-allocated, writable bytes-like object buf. Return the number of bytes read. If no bytes are available, zero is returned.
- Parameters:
buf – writable bytes-like object
- Returns:
Number of bytes read, or zero if no bytes are available
- Return type:
int
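The readinto contract described above (fill a caller-owned buffer, return the byte count, return zero when nothing is left) can be tried with any io.BufferedIOBase object. The sketch below uses io.BytesIO as an assumed stand-in for a reader; the buffer size and sample bytes are arbitrary.

```python
import io

stream = io.BytesIO(b"hello world")  # 11 bytes, in-memory stand-in
buf = bytearray(8)                   # pre-allocated, writable buffer

n1 = stream.readinto(buf)            # fills the whole 8-byte buffer
first = bytes(buf[:n1])

n2 = stream.readinto(buf)            # only 3 bytes remain
second = bytes(buf[:n2])             # slice: the buffer tail holds stale data

n3 = stream.readinto(buf)            # nothing left to read

print(n1, first, n2, second, n3)
```

Slicing with the returned count matters because a short read leaves the rest of the buffer untouched.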
- read(size: int | None = None) → bytes
Read up to size bytes from the object and return them.
If size is zero or positive, read that many bytes from S3, or until the end of the object. If size is None or negative, read the entire file.
- Parameters:
size (int | None) – how many bytes to read.
- Returns:
Bytes read from S3 Object
- Return type:
bytes
- Raises:
S3Exception – An error occurred accessing S3.
- seek(offset: int, whence: int = SEEK_SET, /) → int
Change the stream position to the given byte offset, interpreted relative to whence.
When seeking beyond the end of the file, always stay at EOF. Seeking before the start of the file results in a ValueError.
- Parameters:
offset (int) – How many bytes to seek relative to whence.
whence (int) – One of SEEK_SET, SEEK_CUR, and SEEK_END. Default: SEEK_SET
- Returns:
Current position of the stream
- Return type:
int
- Raises:
S3Exception – An error occurred accessing S3.
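The seek rules described above (stay at EOF when seeking past the end, raise ValueError when seeking before the start) can be modelled as a pure function. This is a hedged sketch of the documented behavior only: resolve_seek and its parameters are hypothetical, and the real reader may determine the object size and position differently.

```python
import io


def resolve_seek(
    current: int, size: int, offset: int, whence: int = io.SEEK_SET
) -> int:
    """Hypothetical helper: compute the new position for an object of `size` bytes."""
    if whence == io.SEEK_SET:
        target = offset
    elif whence == io.SEEK_CUR:
        target = current + offset
    elif whence == io.SEEK_END:
        target = size + offset
    else:
        raise ValueError(f"unsupported whence: {whence}")
    if target < 0:
        # Seeking before the start of the file is an error.
        raise ValueError("seek before start of file")
    # Seeking beyond the end of the file stays at EOF.
    return min(target, size)


clamped = resolve_seek(current=0, size=100, offset=250)
relative = resolve_seek(current=10, size=100, offset=-5, whence=io.SEEK_CUR)
from_end = resolve_seek(current=0, size=100, offset=-20, whence=io.SEEK_END)

print(clamped, relative, from_end)
```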
- class s3torchconnector.s3reader.RangedS3Reader(bucket: str, key: str, get_object_info: Callable[[], s3torchconnectorclient._mountpoint_s3_client.ObjectInfo | s3torchconnectorclient._mountpoint_s3_client.HeadObjectResult], get_stream: Callable[[int | None, int | None], s3torchconnectorclient._mountpoint_s3_client.GetObjectStream], buffer_size: int | None = None)
Bases: s3torchconnector.s3reader.s3reader.S3Reader
Range-based S3 reader implementation with adaptive buffering.
Performs byte-range requests to read specific portions of S3 objects without downloading the entire file. Includes optional adaptive buffer to reduce S3 API calls for small, sequential reads while bypassing buffering for large reads. Optimal for sparse partial reads of large objects.
Buffering behavior:
Small reads (< buffer_size): Loads buffer_size bytes to buffer, copies to user
Large reads (>= buffer_size): Direct S3 access, bypass buffer
Forward overlapping reads: Reuses existing buffer data if possible when read range extends beyond current buffer
Buffer can be disabled by setting buffer_size to 0
If buffer_size is None, uses default 8MB buffer
- Parameters:
bucket – S3 bucket name
key – S3 object key
get_object_info – Callable that returns object metadata
get_stream – Callable that returns stream for byte range requests
buffer_size – Internal buffer size in bytes, defaults to 8MB
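The forward overlapping case in the buffering behavior above can be pictured as splitting a requested range into a part served from the existing buffer and a remainder that still has to be fetched. The helper below is a hypothetical sketch of that split, not the library's code; it covers only the forward-overlap case and treats every other read as needing a full fetch.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def forward_overlap_split(
    buf_start: int, buf_len: int, read_start: int, read_end: int
) -> Tuple[Optional[Range], Range]:
    """Hypothetical helper: return (reused_range, remaining_range).

    reused_range is None when the read does not begin inside the buffer
    and extend past its end, i.e. when there is no forward overlap.
    """
    buf_end = buf_start + buf_len
    if buf_start <= read_start < buf_end < read_end:
        # The head comes from the buffer; the tail still needs a fetch.
        return (read_start, buf_end), (buf_end, read_end)
    return None, (read_start, read_end)


# Buffer holds bytes [0, 64); the read wants bytes [48, 100).
overlap = forward_overlap_split(0, 64, 48, 100)

# Buffer holds bytes [0, 64); the read wants bytes [80, 100): no overlap.
no_overlap = forward_overlap_split(0, 64, 80, 100)

print(overlap, no_overlap)
```

The remaining range would then be handled by the small-read or large-read rule, depending on how its length compares with buffer_size.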
- property bucket: str
- property key: str
- readinto(buf) → int
Read up to len(buf) bytes into a pre-allocated, writable bytes-like object buf. Return the number of bytes read. If no bytes are available, zero is returned.
- Parameters:
buf – writable bytes-like object
- Returns:
Number of bytes read, or zero if no bytes are available
- Return type:
int
- read(size: int | None = None) → bytes
Read up to size bytes from the current position.
If size is zero or positive, read that many bytes from S3, or until the end of the object. If size is None or negative, read until the end of the object.
- Parameters:
size (int | None) – how many bytes to read.
- Returns:
Bytes read from specified range.
- Return type:
bytes
- Raises:
S3Exception – An error occurred accessing S3.
- seek(offset: int, whence: int = SEEK_SET, /) → int
Change the stream position to the given byte offset, interpreted relative to whence.
When seeking beyond the end of the file, always stay at EOF. Seeking before the start of the file results in a ValueError.
- Parameters:
offset (int) – How many bytes to seek relative to whence.
whence (int) – One of SEEK_SET, SEEK_CUR, and SEEK_END. Default: SEEK_SET
- Returns:
Current position of the stream
- Return type:
int
- Raises:
S3Exception – An error occurred accessing S3.