IO Factory

class aws.osml.io.IO

Bases: object

Entry point for opening geospatial datasets for reading or writing.

The IO class provides a single static method, open, that accepts a file path string, a list of file paths, a file-like object (stream), or a list of file-like objects, and returns either a DatasetReader or a DatasetWriter depending on the requested mode. The file format is auto-detected from the extension and file header bytes when reading from paths; when reading from a stream, the format parameter must be specified explicitly. Both local file paths and file:// URIs are supported.
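The auto-detection described above can be pictured with a minimal sketch. This is not the library's implementation: `sniff_format`, the extension table, and the choice to check header bytes before the extension are all assumptions made for illustration. Only the leading byte signatures themselves (the PNG signature, the two TIFF byte-order marks, and the `NITF` header prefix) are standard.

```python
import os
from typing import Optional

# Standard leading bytes for a few raster formats. The TIFF signatures identify
# any TIFF file; mapping them to "geotiff" is an assumption of this sketch.
_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"NITF", "nitf"),
    (b"II*\x00", "geotiff"),  # little-endian TIFF
    (b"MM\x00*", "geotiff"),  # big-endian TIFF
]

# Hypothetical extension table; the real library may recognize other extensions.
_EXTENSIONS = {
    ".png": "png",
    ".ntf": "nitf",
    ".tif": "geotiff",
    ".tiff": "geotiff",
}


def sniff_format(path: str, header: bytes) -> Optional[str]:
    """Guess a format identifier from header bytes, then from the extension."""
    for magic, name in _MAGIC:
        if header.startswith(magic):
            return name
    extension = os.path.splitext(path)[1].lower()
    return _EXTENSIONS.get(extension)  # None when nothing matches
```

A stream has no filename and, in this sketch, would only ever match on header bytes, which is one way to see why the documented API asks for an explicit format in that case.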

Example:

```python
import io

from aws.osml.io import IO

# Read mode: single string path (format auto-detected)
with IO.open("image.ntf", "r") as dataset:
    keys = dataset.get_asset_keys()
    asset = dataset.get_asset(keys[0])

# Read mode: from an in-memory byte buffer
with IO.open(io.BytesIO(raw_bytes), "r", format="png") as dataset:
    keys = dataset.get_asset_keys()

# Write mode: returns a DatasetWriter
with IO.open("output.ntf", "w", "nitf") as writer:
    writer.add_asset("image", provider, "Title", "Description", ["data"])
```

static open(paths, mode='r', format=None, roles=None)

Open a dataset for reading or writing.

The format is auto-detected from the file extension when reading from a file path. When writing to a file, a format string is inferred from the extension or may be provided explicitly. When reading from or writing to a file-like object (stream), the format parameter is required since there is no filename to inspect. Use a context manager (with statement) on the returned object to ensure file handles are released.

Parameters:
  • paths (str | list[str] | BinaryIO | list[BinaryIO]) – A file path, list of file paths, file-like object, or list of file-like objects. For single-file formats a bare string is accepted ("image.ntf"). For multi-file R-set datasets a list is required. File-like objects must implement .read() for read mode and .write() + .flush() for write mode (e.g., io.BytesIO, fsspec file handles). Accepts local paths, file:// URIs, and s3:// URIs.

  • mode (str) – "r" for reading or "w" for writing. Defaults to "r".

  • format (str or None) – Format identifier (e.g., "nitf", "geotiff", "png"). Required when paths is a stream or list of streams. Required when writing to a file with an unrecognized extension. Optional otherwise.

  • roles (list[str] or list[list[str]] or None) – Explicit role strings for each source. list[str] when paths is a single source, list[list[str]] when paths is a list. Recognised roles: "data" designates the base source; "overview:N" (N >= 1) designates an R-set overview at resolution level N. roles is required when paths is a list of streams (no filename to derive roles from). For a list of file paths, roles is optional; if omitted, the library falls back to .rN filename detection for backward compatibility.
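The `.rN` fallback for a list of file paths can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `infer_roles` is a hypothetical helper, and the rule that a path ending in `.rN` (N >= 1) maps to `"overview:N"` while every other path maps to `"data"` is inferred from the examples in this page.

```python
import re
from typing import List

# Matches a trailing ".rN" suffix such as "image.ntf.r1" (assumed convention).
_RSET_SUFFIX = re.compile(r"\.r(\d+)$")


def infer_roles(paths: List[str]) -> List[List[str]]:
    """Derive one role list per path from its filename suffix."""
    roles = []
    for path in paths:
        match = _RSET_SUFFIX.search(path)
        if match and int(match.group(1)) >= 1:
            roles.append([f"overview:{int(match.group(1))}"])
        else:
            # No usable suffix: treat the path as the base source.
            roles.append(["data"])
    return roles


derived = infer_roles(["image.ntf", "image.ntf.r1", "image.ntf.r2"])
```

The result has the `list[list[str]]` shape that `roles` takes when `paths` is a list. Streams carry no filename, so nothing comparable can be derived for them.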

Returns:

A DatasetReader when mode is "r", or a DatasetWriter when mode is "w".

Return type:

DatasetReader or DatasetWriter

Raises:
  • ValueError – If paths is empty, the mode is invalid, the file format is not supported, or format/roles is missing when required.

  • TypeError – If paths has an invalid type, or a file-like object is missing the required methods.

  • IOError – If the file cannot be opened.
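The method requirements behind the `TypeError` case can be checked up front with a small duck-typing test. The sketch below is a hypothetical pre-flight helper (`check_stream` is not part of the library) that mirrors the documented rule: `.read()` for read mode, `.write()` and `.flush()` for write mode.

```python
import io
from typing import Any


def check_stream(stream: Any, mode: str) -> None:
    """Raise if `stream` lacks the methods the given mode needs."""
    if mode not in ("r", "w"):
        raise ValueError(f"invalid mode: {mode!r}")
    required = ("read",) if mode == "r" else ("write", "flush")
    missing = [
        name for name in required
        if not callable(getattr(stream, name, None))
    ]
    if missing:
        raise TypeError(
            f"stream is missing required method(s): {', '.join(missing)}"
        )


# io.BytesIO implements read, write, and flush, so both modes pass.
check_stream(io.BytesIO(b"payload"), "r")
check_stream(io.BytesIO(), "w")
```

Running a check like this before calling `IO.open` lets application code report a bad argument with its own message instead of relying on the library's exception text.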

Note

When reading from a stream, the entire content is loaded into memory via .read(). For large files (multi-GB NITF) this is significantly more expensive than the memory-mapped file path. Consider downloading large files to the local filesystem, or using the library’s VirtualiZarr-based tile index for cloud-native range-read access.
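One way to follow the "download to the local filesystem" suggestion is to copy the stream to a temporary file in fixed-size chunks and then open that path instead. The helper below is a hypothetical sketch (`spool_to_tempfile` and the 8 MiB chunk size are assumptions, not library features); it uses only the standard library.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def spool_to_tempfile(
    stream: BinaryIO, suffix: str = "", chunk_size: int = 8 * 1024 * 1024
) -> str:
    """Copy `stream` to a named temporary file in chunks and return its path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        # copyfileobj reads at most chunk_size bytes at a time, so the whole
        # payload is never held in memory at once.
        shutil.copyfileobj(stream, tmp, chunk_size)
        return tmp.name


# Usage with a stand-in payload; a real caller would pass `path` to IO.open().
path = spool_to_tempfile(io.BytesIO(b"\x00" * 1024), suffix=".ntf")
try:
    size = os.path.getsize(path)
finally:
    os.remove(path)  # the caller owns cleanup because delete=False
```

Keeping the original extension via `suffix` preserves whatever extension-based format detection applies to file paths.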

Example:

```python
import io

from aws.osml.io import IO

# Read mode: single string path
with IO.open("image.ntf", "r") as dataset:
    print(type(dataset))  # DatasetReader

# Read mode: list of paths (R-set, .rN detection)
with IO.open(["image.ntf", "image.ntf.r1"], "r") as dataset:
    print(type(dataset))  # DatasetReader

# Read mode: list of streams with explicit roles
streams = [open("image.ntf", "rb"), open("image.ntf.r1", "rb")]
with IO.open(streams, "r", format="nitf",
             roles=[["data"], ["overview:1"]]) as dataset:
    print(type(dataset))  # DatasetReader

# Write mode: to an in-memory buffer
buf = io.BytesIO()
with IO.open(buf, "w", "png") as writer:
    writer.add_asset("image", provider, "Title", "Description", ["data"])

encoded_bytes = buf.getvalue()
```