s3torchconnector.s3iterable_dataset

Attributes

log

Classes

S3IterableDataset

An iterable-style dataset created from S3 objects.

Module Contents

s3torchconnector.s3iterable_dataset.log
class s3torchconnector.s3iterable_dataset.S3IterableDataset(region: str, get_dataset_objects: Callable[[s3torchconnector._s3client.S3Client], Iterable[s3torchconnector._s3bucket_key_data.S3BucketKeyData]], endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)

Bases: torch.utils.data.IterableDataset

An iterable-style dataset created from S3 objects.

To create an instance of S3IterableDataset, use the from_prefix or from_objects class methods.
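
As a minimal sketch of this construction path, the helper below builds a dataset with from_prefix and iterates over it. The helper name list_object_sizes is hypothetical, and the sketch assumes that, with the default identity transform, each yielded item is an S3Reader exposing key and read(). The import sits inside the function so the snippet can be loaded without the package or AWS credentials.

```python
def list_object_sizes(prefix_uri: str, region: str) -> dict:
    """Return {object key: size in bytes} for every object under prefix_uri."""
    # Imported lazily so that defining this helper needs neither the
    # package nor AWS credentials.
    from s3torchconnector import S3IterableDataset

    dataset = S3IterableDataset.from_prefix(prefix_uri, region=region)
    sizes = {}
    for reader in dataset:  # assumption: items are S3Reader objects
        sizes[reader.key] = len(reader.read())
    return sizes
```

A call might look like list_object_sizes("s3://my-bucket/train/", region="us-east-1"), where the bucket and region are placeholders.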

property region
property endpoint
classmethod from_objects(object_uris: str | Iterable[str], *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)

Returns an instance of S3IterableDataset using the S3 URI(s) provided.

Parameters:
  • object_uris (str | Iterable[str]) – S3 URI of the object(s) desired.

  • region (str) – AWS region of the S3 bucket where the objects are stored.

  • endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.

  • transform – Optional callable which is used to transform an S3Reader into the desired type.

  • s3client_config – Optional S3ClientConfig with parameters for S3 client.

  • enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.

Returns:

An iterable-style dataset created from S3 objects.

Return type:

S3IterableDataset

Raises:

S3Exception – An error occurred accessing S3.
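
The transform parameter can be illustrated with a short sketch. The names to_key_and_text, build_text_dataset, and FakeReader are hypothetical; FakeReader is only a local stand-in for S3Reader so the transform can be exercised without S3 access. The sketch assumes S3Reader exposes key and a file-like read().

```python
import io


def to_key_and_text(reader):
    """Transform: read the whole object and decode it as UTF-8 text."""
    return reader.key, reader.read().decode("utf-8")


def build_text_dataset(object_uris, region):
    """Build a dataset whose items are (key, text) tuples."""
    # Lazy import: only needed when a real dataset is constructed.
    from s3torchconnector import S3IterableDataset

    return S3IterableDataset.from_objects(
        object_uris, region=region, transform=to_key_and_text
    )


class FakeReader(io.BytesIO):
    """Local stand-in for S3Reader (an assumption made for testing only)."""

    def __init__(self, key, data):
        super().__init__(data)
        self.key = key


# Exercise the transform locally, without touching S3.
sample = to_key_and_text(FakeReader("notes/a.txt", b"hello"))
```

Because the transform receives the reader rather than raw bytes, it can also stream or partially read large objects instead of loading them whole.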

classmethod from_prefix(s3_uri: str, *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)

Returns an instance of S3IterableDataset using the S3 URI provided.

Parameters:
  • s3_uri (str) – An S3 URI (prefix) of the object(s) desired. Objects matching the prefix will be included in the returned dataset.

  • region (str) – AWS region of the S3 bucket where the objects are stored.

  • endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored.

  • transform – Optional callable which is used to transform an S3Reader into the desired type.

  • s3client_config – Optional S3ClientConfig with parameters for S3 client.

  • enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.

Returns:

An iterable-style dataset created from S3 objects.

Return type:

S3IterableDataset

Raises:

S3Exception – An error occurred accessing S3.
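
The effect of enable_sharding can be pictured with a small, library-independent sketch. It assumes a round-robin split by worker index, which is an illustrative assumption and not necessarily the strategy s3torchconnector uses internally; items_for_worker and the key names are hypothetical.

```python
from itertools import islice


def items_for_worker(keys, worker_id, num_workers, enable_sharding):
    """Return the keys a single worker would iterate over."""
    if not enable_sharding:
        # Without sharding, every worker sees the full dataset.
        return list(keys)
    # With sharding (assumed round-robin), worker i takes every
    # num_workers-th key, starting at offset i.
    return list(islice(keys, worker_id, None, num_workers))


keys = [f"train/part-{i}.bin" for i in range(5)]
sharded = [items_for_worker(keys, w, 2, True) for w in range(2)]
unsharded = [items_for_worker(keys, w, 2, False) for w in range(2)]
```

Here the two sharded workers together cover each key exactly once, while the unsharded workers each yield all five keys, so a consumer reading from both would see every object twice. With a real dataset, the corresponding setup would be to pass enable_sharding=True to from_prefix and load it through a torch.utils.data.DataLoader with num_workers greater than zero.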