s3torchconnector.s3iterable_dataset
Classes
S3IterableDataset: An IterableStyle dataset created from S3 objects.
Module Contents
- class s3torchconnector.s3iterable_dataset.S3IterableDataset(region: str, get_dataset_objects: Callable[[s3torchconnector._s3client.S3Client], Iterable[s3torchconnector._s3bucket_key_data.S3BucketKeyData]], endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)
Bases: torch.utils.data.IterableDataset
An IterableStyle dataset created from S3 objects.
To create an instance of S3IterableDataset, use the from_prefix or from_objects class method.
- classmethod from_objects(object_uris: str | Iterable[str], *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)
Returns an instance of S3IterableDataset using the S3 URI(s) provided.
- Parameters:
object_uris (str | Iterable[str]) – S3 URI of the object(s) desired.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored. Defaults to None.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.
- Returns:
An IterableStyle dataset created from S3 objects.
- Return type:
S3IterableDataset
- Raises:
S3Exception – An error occurred accessing S3.
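The following is a minimal sketch of how from_objects might be called with a custom transform. The bucket name, object keys, region, and the helper names key_and_size, build_dataset, and main are hypothetical; the sketch also assumes the S3Reader passed to the transform exposes a key attribute and a file-like read() method.

```python
from typing import Iterable, Tuple, Union


def key_and_size(reader) -> Tuple[str, int]:
    """Transform: map a reader to (object key, payload size in bytes).

    `reader` is assumed to be an s3torchconnector.S3Reader, which is
    expected to expose `key` and a file-like `read()`.
    """
    return reader.key, len(reader.read())


def build_dataset(object_uris: Union[str, Iterable[str]], region: str):
    """Create an S3IterableDataset from explicit S3 URIs."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from s3torchconnector import S3IterableDataset

    return S3IterableDataset.from_objects(
        object_uris,
        region=region,
        transform=key_and_size,
    )


def main() -> None:
    # Hypothetical bucket, keys, and region; replace with real values.
    dataset = build_dataset(
        ["s3://example-bucket/train/a.bin", "s3://example-bucket/train/b.bin"],
        region="us-east-1",
    )
    # Each item is whatever the transform returned for one S3 object.
    for key, size in dataset:
        print(key, size)
```

Calling main() requires valid AWS credentials and existing objects; the transform itself can be exercised offline with any object that has a key attribute and a read() method.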
- classmethod from_prefix(s3_uri: str, *, region: str, endpoint: str | None = None, transform: Callable[[s3torchconnector.S3Reader], Any] = identity, s3client_config: s3torchconnector._s3client.S3ClientConfig | None = None, enable_sharding: bool = False)
Returns an instance of S3IterableDataset using the S3 URI provided.
- Parameters:
s3_uri (str) – An S3 URI (prefix) of the object(s) desired. Objects matching the prefix will be included in the returned dataset.
region (str) – AWS region of the S3 bucket where the objects are stored.
endpoint (str | None) – Optional AWS endpoint of the S3 bucket where the objects are stored. Defaults to None.
transform – Optional callable which is used to transform an S3Reader into the desired type.
s3client_config – Optional S3ClientConfig with parameters for S3 client.
enable_sharding – If True, shard the dataset across multiple workers for parallel data loading. If False (default), each worker loads the entire dataset independently.
- Returns:
An IterableStyle dataset created from S3 objects.
- Return type:
S3IterableDataset
- Raises:
S3Exception – An error occurred accessing S3.
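As a rough illustration of from_prefix combined with enable_sharding, the sketch below wraps the dataset in a multi-worker torch.utils.data.DataLoader. The prefix URI, region, worker count, and the helper names read_bytes, build_loader, and main are hypothetical, and the sketch assumes the S3Reader has a file-like read() method. With enable_sharding=True, the parameter description above indicates the objects are split across workers instead of each worker loading the entire dataset.

```python
def read_bytes(reader) -> bytes:
    """Transform: return the full object body as bytes.

    Defined at module level so it can be pickled for DataLoader workers.
    """
    return reader.read()


def build_loader(prefix_uri: str, region: str, num_workers: int = 4):
    """Create a DataLoader over all objects under an S3 prefix."""
    # Imported lazily so this sketch can be loaded without torch or
    # s3torchconnector installed.
    from torch.utils.data import DataLoader
    from s3torchconnector import S3IterableDataset

    dataset = S3IterableDataset.from_prefix(
        prefix_uri,
        region=region,
        transform=read_bytes,
        enable_sharding=True,  # split objects across workers
    )
    # batch_size=None disables automatic batching, so each transformed
    # object is yielded individually.
    return DataLoader(dataset, batch_size=None, num_workers=num_workers)


def main() -> None:
    # Hypothetical prefix and region; replace with real values.
    loader = build_loader("s3://example-bucket/train/", region="us-east-1")
    for payload in loader:
        print(len(payload))
```

Leaving enable_sharding at its default of False with num_workers greater than zero would, per the parameter description, have every worker iterate over the entire dataset, so each object would be yielded once per worker.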