s3torchconnector
Submodules
Classes
S3Reader | A read-only, file-like representation of a single object stored in S3.
S3Writer | A write-only, file-like representation of a single object stored in S3.
S3IterableDataset | An Iterable-Style dataset created from S3 objects.
S3MapDataset | A Map-Style dataset created from S3 objects.
S3Checkpoint | A checkpoint manager for S3.
S3ClientConfig | A dataclass exposing configurable parameters for the S3 client.
Package Contents
- class s3torchconnector.S3Reader(bucket: str, key: str, get_object_info: Callable[[], s3torchconnectorclient._mountpoint_s3_client.ObjectInfo | s3torchconnectorclient._mountpoint_s3_client.HeadObjectResult], get_stream: Callable[[], s3torchconnectorclient._mountpoint_s3_client.GetObjectStream])[source]
Bases:
io.BufferedIOBase
A read-only, file-like representation of a single object stored in S3.
- property bucket
- property key
- prefetch() None [source]
Start fetching data from S3.
- Raises:
S3Exception – An error occurred accessing S3.
- readinto(buf) int [source]
Read up to len(buf) bytes into a pre-allocated, writable bytes-like object buf. Return the number of bytes read. If no bytes are available, zero is returned.
- Parameters:
buf – writable bytes-like object
- Returns:
Number of bytes read, or zero if no bytes are available
- Return type:
int
- read(size: int | None = None) bytes [source]
Read up to size bytes from the object and return them.
If size is zero or positive, read that many bytes from S3, or until the end of the object. If size is None or negative, read the entire file.
- Parameters:
size (int | None) – how many bytes to read.
- Returns:
Bytes read from S3 Object
- Return type:
bytes
- Raises:
S3Exception – An error occurred accessing S3.
- seek(offset: int, whence: int = SEEK_SET, /) int [source]
Change the stream position to the given byte offset, interpreted relative to whence.
When seeking beyond the end of the file, always stay at EOF. Seeking before the start of the file results in a ValueError.
- Parameters:
offset (int) – How many bytes to seek relative to whence.
whence (int) – One of SEEK_SET, SEEK_CUR, and SEEK_END. Default: SEEK_SET
- Returns:
Current position of the stream
- Return type:
int
- Raises:
S3Exception – An error occurred accessing S3.
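Because S3Reader derives from io.BufferedIOBase, code written against the standard binary-stream interface should work with it unchanged. The sketch below exercises readinto and seek as described above. io.BytesIO is used purely as a local stand-in so the example runs without S3 access, and the helper names iter_chunks and read_tail are illustrative, not part of the package.

```python
import io
import os
from typing import Iterator


def iter_chunks(stream: io.BufferedIOBase, chunk_size: int = 4) -> Iterator[bytes]:
    """Yield the stream's contents in pieces of at most chunk_size bytes."""
    buf = bytearray(chunk_size)      # pre-allocated, writable buffer
    view = memoryview(buf)
    while True:
        n = stream.readinto(view)    # number of bytes actually read
        if n == 0:                   # zero means no more bytes are available
            break
        yield bytes(view[:n])


def read_tail(stream: io.BufferedIOBase, n: int) -> bytes:
    """Return the last n bytes by seeking relative to the end of the stream."""
    stream.seek(-n, os.SEEK_END)
    return stream.read(n)


# Local stand-in; with the package installed, an S3Reader would be passed instead.
demo = io.BytesIO(b"0123456789")
chunks = list(iter_chunks(demo))
tail = read_tail(demo, 3)
print(chunks, tail)
```

Reusing one pre-allocated buffer with readinto avoids creating a new bytes object per read, which can matter when streaming large objects.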
- class s3torchconnector.S3Writer(stream: s3torchconnectorclient._mountpoint_s3_client.PutObjectStream)[source]
Bases:
io.BufferedIOBase
A write-only, file-like representation of a single object stored in S3.
- stream
- write(data: bytes | memoryview) int [source]
Write bytes to the S3 object specified by bucket and key.
- Parameters:
data (bytes | memoryview) – bytes to write
- Returns:
Number of bytes written
- Return type:
int
- Raises:
S3Exception – An error occurred accessing S3.
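Since write accepts both bytes and memoryview, a large payload can be sent in slices without copying it. The following is a minimal sketch under that assumption. The helper write_in_chunks is hypothetical, and io.BytesIO stands in for an S3Writer so the example runs locally.

```python
import io


def write_in_chunks(stream: io.BufferedIOBase, data: bytes, chunk_size: int) -> int:
    """Write data to stream in slices of chunk_size bytes; return total bytes written."""
    view = memoryview(data)          # slicing a memoryview does not copy the payload
    total = 0
    for start in range(0, len(view), chunk_size):
        total += stream.write(view[start:start + chunk_size])
    return total


# Local stand-in; with the package installed, an S3Writer would be passed instead.
sink = io.BytesIO()
written = write_in_chunks(sink, b"abcdefghij", chunk_size=4)
print(written)
```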
- class s3torchconnector.S3IterableDataset(region: str, get_dataset_objects: Callable[[s3torchconnector._s3client.S3Client], Iterable[s3torchconnector._s3bucket_key_data.S3BucketKeyData]], endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)[source]
Bases:
torch.utils.data.IterableDataset
An Iterable-Style dataset created from S3 objects.
To create an instance of S3IterableDataset, use the from_prefix or from_objects class method.
- property region
- property endpoint
- classmethod from_objects(object_uris: str | Iterable[str], *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)[source]
Returns an instance of S3IterableDataset using the S3 URI(s) provided.
- Parameters:
object_uris (str | Iterable[str]) – S3 URI of the object(s) desired.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.
- Returns:
An Iterable-Style dataset created from S3 objects.
- Return type:
S3IterableDataset
- Raises:
S3Exception – An error occurred accessing S3.
- classmethod from_prefix(s3_uri: str, *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)[source]
Returns an instance of S3IterableDataset using the S3 URI provided.
- Parameters:
s3_uri (str) – An S3 URI (prefix) of the object(s) desired. Objects matching the prefix will be included in the returned dataset.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.
- Returns:
An Iterable-Style dataset created from S3 objects.
- Return type:
S3IterableDataset
- Raises:
S3Exception – An error occurred accessing S3.
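The transform callable receives one S3Reader per object and decides what the dataset yields. A minimal sketch of that pattern follows, using only the from_prefix signature documented above. The names key_and_size, build_iterable_dataset, and FakeReader are hypothetical, and FakeReader exists only so the transform can be tried without S3 access.

```python
import io


def key_and_size(reader):
    """Transform: turn an S3Reader-like object into a (key, byte count) pair."""
    return reader.key, len(reader.read())


def build_iterable_dataset(prefix_uri: str, region: str):
    """Create a sharded S3IterableDataset over every object under prefix_uri."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from s3torchconnector import S3IterableDataset

    return S3IterableDataset.from_prefix(
        prefix_uri,
        region=region,
        transform=key_and_size,
        enable_sharding=True,   # split objects across workers instead of repeating them
    )


class FakeReader(io.BytesIO):
    """Hypothetical stand-in exposing the key property used by the transform."""

    def __init__(self, key: str, data: bytes):
        super().__init__(data)
        self.key = key


sample = key_and_size(FakeReader("images/cat.jpg", b"\xff\xd8\xff"))
print(sample)
```

With the package installed and credentials configured, the object returned by build_iterable_dataset could be iterated directly or wrapped in a torch.utils.data.DataLoader.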
- class s3torchconnector.S3MapDataset(region: str, get_dataset_objects: Callable[[s3torchconnector._s3client.S3Client], Iterable[s3torchconnector._s3bucket_key_data.S3BucketKeyData]], endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None)[source]
Bases:
torch.utils.data.Dataset
A Map-Style dataset created from S3 objects.
To create an instance of S3MapDataset, use the from_prefix or from_objects class method.
- property region
- property endpoint
- classmethod from_objects(object_uris: str | Iterable[str], *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None)[source]
Returns an instance of S3MapDataset using the S3 URI(s) provided.
- Parameters:
object_uris (str | Iterable[str]) – S3 URI of the object(s) desired.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
- Returns:
A Map-Style dataset created from S3 objects.
- Return type:
S3MapDataset
- Raises:
S3Exception – An error occurred accessing S3.
- classmethod from_prefix(s3_uri: str, *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None)[source]
Returns an instance of S3MapDataset using the S3 URI provided.
- Parameters:
s3_uri (str) – An S3 URI (prefix) of the object(s) desired. Objects matching the prefix will be included in the returned dataset.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
- Returns:
A Map-Style dataset created from S3 objects.
- Return type:
S3MapDataset
- Raises:
S3Exception – An error occurred accessing S3.
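When the set of objects is known in advance, from_objects takes the URIs explicitly. The sketch below pairs it with a text-decoding transform. It assumes the objects hold UTF-8 text, and decode_text and build_map_dataset are illustrative names rather than package functions.

```python
import io
from typing import Iterable


def decode_text(reader) -> str:
    """Transform: read the whole object and decode it as UTF-8 text."""
    return reader.read().decode("utf-8")


def build_map_dataset(object_uris: Iterable[str], region: str):
    """Create an S3MapDataset over an explicit list of S3 URIs."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from s3torchconnector import S3MapDataset

    return S3MapDataset.from_objects(
        object_uris,
        region=region,
        transform=decode_text,
    )


# Try the transform locally; io.BytesIO stands in for an S3Reader here.
text = decode_text(io.BytesIO("héllo".encode("utf-8")))
print(text)
```

As a torch.utils.data.Dataset subclass, the returned dataset is expected to support the usual Map-Style access by index, which is what lets a DataLoader shuffle it.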
- class s3torchconnector.S3Checkpoint(region: str, endpoint: str | None = None, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None)[source]
A checkpoint manager for S3.
To read a checkpoint from S3, create an S3Reader by providing the s3_uri of the checkpoint stored in S3. Similarly, to save a checkpoint to S3, create an S3Writer by providing an s3_uri. S3Reader and S3Writer implement io.BufferedIOBase, so they can be passed to torch.load and torch.save.
- region
- endpoint = None
- reader(s3_uri: str) s3torchconnector.S3Reader [source]
Creates an S3Reader from a given s3_uri.
- Parameters:
s3_uri (str) – A valid S3 URI of the form s3://<BUCKET>/<KEY>.
- Returns:
A read-only binary stream of the S3 object’s contents, specified by the s3_uri.
- Return type:
S3Reader
- Raises:
S3Exception – An error occurred accessing S3.
- writer(s3_uri: str) s3torchconnector.S3Writer [source]
Creates an S3Writer from a given s3_uri.
- Parameters:
s3_uri (str) – A valid S3 URI of the form s3://<BUCKET>/<KEY>.
- Returns:
A write-only binary stream. The content is saved to S3 using the specified s3_uri.
- Return type:
S3Writer
- Raises:
S3Exception – An error occurred accessing S3.
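The save and load round trip described for S3Checkpoint can be sketched as two small helpers. This assumes the reader and writer can be used as context managers (behaviour inherited from io.BufferedIOBase). save_state and load_state are hypothetical names, and the serializer is injectable only so the sketch can be tried without torch installed.

```python
from typing import Any, Callable, Optional


def save_state(state: Any, checkpoint, s3_uri: str,
               save_fn: Optional[Callable[[Any, Any], None]] = None) -> None:
    """Serialize state to s3_uri through checkpoint.writer (torch.save by default)."""
    if save_fn is None:
        import torch  # imported lazily; only needed for the default serializer
        save_fn = torch.save
    with checkpoint.writer(s3_uri) as writer:
        save_fn(state, writer)


def load_state(checkpoint, s3_uri: str,
               load_fn: Optional[Callable[[Any], Any]] = None) -> Any:
    """Deserialize the object stored at s3_uri through checkpoint.reader (torch.load by default)."""
    if load_fn is None:
        import torch  # imported lazily; only needed for the default deserializer
        load_fn = torch.load
    with checkpoint.reader(s3_uri) as reader:
        return load_fn(reader)
```

With the package installed, checkpoint would be an S3Checkpoint created with a region, for example S3Checkpoint(region="us-east-1"), and state would typically be a model's state_dict.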
- class s3torchconnector.S3ClientConfig[source]
A dataclass exposing configurable parameters for the S3 client.
- Parameters:
throughput_target_gbps (float) – Throughput target in gigabits per second (Gbps) that the client tries to reach. 10.0 Gbps by default (may change in the future).
part_size (int) – Size in bytes of the file parts that will be uploaded or downloaded. Must be between 5 MiB and 5 GiB. 8 MiB by default (may change in the future). Note: when saving checkpoints, the inner client adjusts the part size to meet the service limits (at most 10,000 parts per upload, minimum upload part size of 5 MiB).
force_path_style (bool) – Force path-style addressing for the S3 client.
- throughput_target_gbps: float = 10.0
- part_size: int = 8388608
- unsigned: bool = False
- force_path_style: bool = False
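The part-size limits above imply a simple calculation: with at most 10,000 parts per upload, the default 8 MiB part size covers objects up to roughly 78 GiB. The sketch below works through that arithmetic and shows how a larger part_size could be passed explicitly. It illustrates the stated limits only and is not the inner client's actual adjustment logic. parts_needed, min_part_size, and make_config are hypothetical helpers.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MAX_PARTS = 10_000            # maximum number of parts per upload
MIN_PART_SIZE = 5 * MIB       # minimum upload part size
MAX_PART_SIZE = 5 * GIB       # upper bound on part_size


def parts_needed(object_size: int, part_size: int) -> int:
    """Number of parts required to upload object_size bytes (ceiling division)."""
    return -(-object_size // part_size)


def min_part_size(object_size: int) -> int:
    """Smallest allowed part size that keeps the upload within MAX_PARTS parts."""
    needed = -(-object_size // MAX_PARTS)
    if needed > MAX_PART_SIZE:
        raise ValueError("object too large for 10,000 parts of at most 5 GiB each")
    return max(MIN_PART_SIZE, needed)


def make_config(object_size: int):
    """Build an S3ClientConfig whose part_size fits an object of object_size bytes."""
    # Imported lazily so the arithmetic above runs without the package installed.
    from s3torchconnector import S3ClientConfig

    return S3ClientConfig(part_size=min_part_size(object_size))


default_parts = parts_needed(100 * GIB, 8 * MIB)   # parts at the 8 MiB default
suggested = min_part_size(100 * GIB)               # smallest size that fits the limit
print(default_parts, suggested)
```

For a 100 GiB checkpoint the default part size would need more than 10,000 parts, which is why the part size has to grow for very large uploads.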