Asset Providers

Every asset in a dataset is represented by a typed provider. When you call DatasetReader.get_asset, the library returns a specialised provider whose type matches the asset’s category:

| Asset category | Returned type |
| --- | --- |
| Image | ImageAssetProvider |
| Text | TextAssetProvider |
| Data | DataAssetProvider |
| Graphics | GraphicsAssetProvider |

All provider types share a common set of properties — key, title, description, media_type, roles, and asset_type — while each specialised type adds format-specific access methods (e.g. block-level image reads, text content, structured data parsing).
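Because every provider type exposes the same common properties, generic code can summarise any asset without checking its concrete type. The following is a minimal sketch that assumes only the property names listed above. The summarise_asset helper and the SimpleNamespace stand-in are hypothetical and exist only so the snippet runs without a dataset; a real provider may report asset_type as an enum value rather than a string.

```python
from types import SimpleNamespace


def summarise_asset(asset):
    """Build a one-line summary from the common provider properties."""
    roles = ", ".join(asset.roles) if asset.roles else "none"
    return f"{asset.key} [{asset.asset_type}] {asset.title!r} ({asset.media_type}; roles: {roles})"


# Stand-in object with the same attribute names as a real provider (illustrative only).
fake_asset = SimpleNamespace(
    key="text:0",
    title="Mission Report",
    description="Operational text",
    media_type="text/plain",
    roles=["data"],
    asset_type="text",
)
summary = summarise_asset(fake_asset)
print(summary)
```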

AssetProvider

AssetProvider is the base class exposing the common metadata properties. Use AssetProvider.from_bytes() to create an asset from raw bytes for writing (for image assets, use BufferedImageAssetProvider instead).

class aws.osml.io.AssetProvider

Bases: object

Base class for all asset types within a dataset.

An AssetProvider represents a single named asset inside a geospatial dataset. Every dataset opened through IO contains one or more assets, each identified by a unique key and categorised as image, text, data, or graphics. This class exposes the common properties shared by all asset types — key, title, description, MIME type, roles, and raw bytes — while specialised subclasses such as ImageAssetProvider, TextAssetProvider, DataAssetProvider, and GraphicsAssetProvider add format-specific access methods.

You typically obtain an AssetProvider by calling DatasetReader.get_asset(). To create an asset from raw bytes for writing, use the AssetProvider.from_bytes() static method.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    keys = dataset.get_asset_keys(asset_type="image")
    asset = dataset.get_asset(keys[0])
    print(asset.key, asset.title, asset.asset_type)
```

asset_type

Category of this asset: image, text, data, or graphics.

description

Detailed description of the asset.

static from_bytes(key, data, asset_type, title=None, description=None, roles=None, media_type=None)

Create a new AssetProvider from raw bytes.

Use this factory when you need to build an asset in memory for writing to a dataset via DatasetWriter.add_asset().

Parameters:
  • key (str) – Unique identifier for the asset.

  • data (bytes) – Raw bytes of the asset content.

  • asset_type (AssetType) – The type of asset (Image, Text, Graphics, Data).

  • title (str, optional) – Human-readable title. Defaults to key when omitted.

  • description (str, optional) – Detailed description. Defaults to empty.

  • roles (list[str], optional) – Semantic roles. Defaults to ["data"].

  • media_type (str, optional) – MIME type. Auto-detected from asset_type when omitted.

Returns:

A new asset that can be passed to DatasetWriter.add_asset().

Return type:

AssetProvider

Raises:

ValueError – If asset_type is AssetType.Image — use BufferedImageAssetProvider instead.

Example:

```python
from aws.osml.io import AssetProvider, AssetType

asset = AssetProvider.from_bytes(
    key="my_text",
    data=b"Hello, world!",
    asset_type=AssetType.Text,
    title="Greeting",
)
```

key

Unique identifier for this asset within the dataset.

media_type

MIME type of the asset content (e.g. "application/vnd.nitf.image").

metadata

The MetadataProvider for this asset’s metadata.

raw_asset

The raw asset bytes as a BytesIO object.

Returns:

The raw bytes of the asset content.

Return type:

io.BytesIO

Raises:

IOError – If the asset data cannot be read.

Example:

```python
data = asset.raw_asset.read()
```
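raw_asset is documented as returning an io.BytesIO, which is a seekable in-memory stream: read() advances the position, so a second read() returns empty bytes unless you seek(0) first. The sketch below uses a plain BytesIO as a stand-in for asset.raw_asset. Whether the property returns a fresh stream on each access is not stated here, so rewinding explicitly is the cautious choice; read_all is a hypothetical helper.

```python
import io


def read_all(stream):
    """Return the full contents of a seekable stream, regardless of its current position."""
    stream.seek(0)
    return stream.read()


# Stand-in for asset.raw_asset (illustrative only).
stream = io.BytesIO(b"Hello, world!")

first = stream.read()       # consumes the stream
second = stream.read()      # position is at the end, so this is empty
rewound = read_all(stream)  # rewinds before reading
```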

roles

Semantic roles assigned to this asset (e.g. ["data"], ["thumbnail"]).

title

Human-readable title for the asset.

ImageAssetProvider

Returned by get_asset() for image assets. Provides block-level and full-image read access.

class aws.osml.io.ImageAssetProvider

Bases: object

Provides blocked (tiled) access to the pixel data of an image asset.

Large geospatial images are divided into a regular grid of fixed-size rectangular blocks. ImageAssetProvider lets you read individual blocks as NumPy arrays without loading the entire image into memory. Use DatasetReader.get_asset() to obtain an instance for a specific image asset in the dataset.

All arrays returned by get_block() use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(block, (1, 2, 0)).

Example:

```python
import numpy as np
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    image = dataset.get_asset("image:0")

    # Read an RGB composite from a multispectral image
    rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

    # Convert CHW to HWC for display with matplotlib or Pillow
    rgb_hwc = np.transpose(rgb, (1, 2, 0))

    # Iterate over all blocks, skipping masked regions
    grid_rows, grid_cols = image.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if image.has_block(row, col, resolution_level=0):
                block = image.get_block(row, col, resolution_level=0)
```
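The CHW to HWC conversion can be checked without a dataset. The sketch below uses a synthetic array as a stand-in for a block returned by get_block(); the array contents are made up for illustration.

```python
import numpy as np

# Stand-in for a 3-band block in channels-first (CHW) layout: (bands, rows, cols).
block = np.arange(3 * 4 * 5, dtype=np.uint8).reshape(3, 4, 5)

# Channels-last (HWC) view for libraries such as Pillow or OpenCV: (rows, cols, bands).
block_hwc = np.transpose(block, (1, 2, 0))

# The same pixel is addressed as [band, row, col] in CHW and [row, col, band] in HWC.
same_pixel = block[2, 1, 3] == block_hwc[1, 3, 2]

print(block.shape, block_hwc.shape)
```

Note that np.transpose returns a view, so no pixel data is copied by the conversion itself.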

actual_bits_per_pixel

Actual bits per pixel.

asset_type

Asset category.

block_grid_size

Number of blocks in each dimension as (rows, cols).

block_shape

Block dimensions as (bands, rows, columns) in CHW format.
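The relationship between image size, block size, and block_grid_size is usually a ceiling division, with partial edge blocks padded up to the full block size. The following is a minimal sketch under that assumption (the library may compute the grid differently for some formats); expected_grid_size is a hypothetical helper.

```python
import math


def expected_grid_size(num_rows, num_columns, block_height, block_width):
    """Return (grid_rows, grid_cols) assuming partial edge blocks count as whole blocks."""
    return (math.ceil(num_rows / block_height), math.ceil(num_columns / block_width))


# A 1000 x 1500 pixel image with 256 x 256 blocks needs a 4 x 6 grid.
grid = expected_grid_size(num_rows=1000, num_columns=1500, block_height=256, block_width=256)
print(grid)
```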

codec_configuration()

Return opaque codec configuration for independent tile decoding.

The returned dictionary contains format-specific key-value pairs needed to decode tiles independently. For JPEG 2000 images this includes a "main_header" key whose value is the raw codestream main header bytes.

Returns None if no configuration is needed (e.g. uncompressed images).

Returns:

Codec parameters, or None.

Return type:

dict[str, bytes] | None
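A caller typically needs to handle both outcomes: a dictionary of codec parameters or None. The following is a minimal sketch that assumes only the documented "main_header" key for JPEG 2000. get_main_header is a hypothetical helper, and the dictionary below is a stand-in for a real return value.

```python
def get_main_header(codec_config):
    """Return the JPEG 2000 main header bytes, or None when no configuration applies."""
    if codec_config is None:
        # Uncompressed images need no codec configuration.
        return None
    return codec_config.get("main_header")


# Stand-ins for image.codec_configuration() results (illustrative only).
# 0xFF4F is the JPEG 2000 start-of-codestream (SOC) marker.
j2k_config = {"main_header": b"\xff\x4f\xff\x51"}
header = get_main_header(j2k_config)
no_header = get_main_header(None)
```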

description

Detailed description of the asset.

get_block(block_row, block_col, resolution_level=0, bands=None)

Read a block of pixel data as a NumPy array.

Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.

Parameters:
  • block_row (int) – Row index in the block grid.

  • block_col (int) – Column index in the block grid.

  • resolution_level (int) – Resolution level (0 = full resolution).

  • bands (list[int], optional) – Zero-based band indices to retrieve. If None, all bands are returned.

Returns:

Pixel data with shape (bands, rows, cols).

Return type:

numpy.ndarray

Raises:
  • IndexError – If the block coordinates are out of bounds.

  • ValueError – If the resolution level is invalid.

Example:

```python
# All bands at full resolution
block = image.get_block(0, 0, resolution_level=0)

# Natural color from a multispectral image (R, G, B)
rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

# Near-infrared band for vegetation analysis
nir = image.get_block(0, 0, resolution_level=0, bands=[4])
```

has_block(block_row, block_col, resolution_level=0)

Check whether a block exists at the given grid coordinates.

Some formats (notably NITF) support masked (sparse) images where not every position in the block grid contains data. Use this method to skip empty regions when iterating over blocks.

Parameters:
  • block_row (int) – Row index in the block grid.

  • block_col (int) – Column index in the block grid.

  • resolution_level (int) – Resolution level (0 = full resolution).

Returns:

True if the block contains data, False otherwise.

Return type:

bool

Example:

```python
grid_rows, grid_cols = image.block_grid_size
for row in range(grid_rows):
    for col in range(grid_cols):
        if image.has_block(row, col, resolution_level=0):
            block = image.get_block(row, col, resolution_level=0)
```
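The iteration pattern can be wrapped in a generator so that callers only see blocks that contain data. The sketch below uses a small stand-in class in place of a real ImageAssetProvider. FakeImage and iter_blocks are hypothetical names, and only block_grid_size, has_block(), and get_block() from the interface above are assumed.

```python
def iter_blocks(image, resolution_level=0):
    """Yield (row, col, block) for every grid position that contains data."""
    grid_rows, grid_cols = image.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if image.has_block(row, col, resolution_level=resolution_level):
                yield row, col, image.get_block(row, col, resolution_level=resolution_level)


class FakeImage:
    """Stand-in for an image provider with one masked (missing) block."""

    block_grid_size = (2, 2)
    _missing = {(0, 1)}

    def has_block(self, block_row, block_col, resolution_level=0):
        return (block_row, block_col) not in self._missing

    def get_block(self, block_row, block_col, resolution_level=0, bands=None):
        return f"block({block_row},{block_col})"


visited = [(row, col) for row, col, _ in iter_blocks(FakeImage())]
print(visited)
```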

image_shape

Image dimensions as (bands, rows, columns) in CHW format.

key

Unique identifier for this asset within the dataset.

media_type

MIME type of the asset content.

metadata

Asset-level metadata as a MetadataProvider.

num_bands

Number of spectral bands.

num_bits_per_pixel

Nominal bits per pixel.

num_columns

Image width at full resolution in pixels.

num_pixels_per_block_horizontal

Block width in pixels.

num_pixels_per_block_vertical

Block height in pixels.

num_resolution_levels

Number of resolution levels in the image pyramid.

num_rows

Image height at full resolution in pixels.

pad_pixel_value

Value used for padding incomplete edge blocks.

pixel_value_type

Pixel data type.

raw_asset

Raw asset bytes as a BytesIO object.

roles

Semantic roles for this asset.

tile_byte_ranges()

Return per-tile byte ranges relative to the source file.

Returns a dictionary mapping (block_row, block_col) tuples to a list of (byte_offset, byte_length) tuples, where offsets are relative to the start of the source file. Each list contains one entry per tile-part; for most formats this is a single-element list.

Returns None for providers without a backing file (e.g. in-memory images created with BufferedImageAssetProvider).

Returns:

Mapping of tile coordinates to byte range lists, or None.

Return type:

dict[tuple[int, int], list[tuple[int, int]]] | None
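The byte ranges can be used to read a tile's encoded bytes straight from the source file, for example to serve range requests. The following is a minimal sketch that assumes the documented mapping shape. read_tile_bytes is a hypothetical helper, and the file and ranges below are synthetic stand-ins for a real image and its tile_byte_ranges() result.

```python
import os
import tempfile


def read_tile_bytes(path, ranges):
    """Concatenate the tile-parts described by (byte_offset, byte_length) pairs."""
    parts = []
    with open(path, "rb") as f:
        for offset, length in ranges:
            f.seek(offset)
            parts.append(f.read(length))
    return b"".join(parts)


# Synthetic source file standing in for an image file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERtile00tile01")
    source_path = tmp.name

# Stand-in for image.tile_byte_ranges(); a real call may return None.
byte_ranges = {(0, 0): [(6, 6)], (0, 1): [(12, 6)]}

tile_00 = read_tile_bytes(source_path, byte_ranges[(0, 0)])
tile_01 = read_tile_bytes(source_path, byte_ranges[(0, 1)])
os.remove(source_path)
```

Real code should check for a None result first, since in-memory providers have no backing file.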

title

Human-readable title for the asset.

BufferedImageAssetProvider

An in-memory image asset provider. Use this to create image assets for writing via add_asset().

class aws.osml.io.BufferedImageAssetProvider

Bases: object

Constructs image assets entirely in memory.

Use BufferedImageAssetProvider to create synthetic test data, assemble mosaics, or build images from processed results. The provider implements the same interface as ImageAssetProvider, so in-memory images can be passed to any API that accepts an image asset, including DatasetWriter.

All pixel arrays use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(array, (1, 2, 0)). To convert from HWC back to CHW, use np.transpose(array, (2, 0, 1)).

You can populate the image all at once with set_full_image() or set individual blocks with set_block() for large or sparse images. Optionally attach a BufferedMetadataProvider to supply encoding hints such as compression type (IC) and interleave mode (IMODE).

Example:

```python
import numpy as np
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

# Create a 512x512 RGB image with 256x256 blocks
provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)

# Populate the full image at once
image_data = np.random.randint(0, 255, (3, 512, 512), dtype=np.uint8)
provider.set_full_image(image_data)

# Or set blocks individually for large/sparse images
for row in range(2):
    for col in range(2):
        block = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```

actual_bits_per_pixel

Actual bits per pixel.

asset_type

Asset category.

block_grid_size

Number of blocks in each dimension as (rows, cols).

block_shape

Block dimensions as (bands, rows, columns) in CHW format.

static create(key, num_columns=512, num_rows=512, num_bands=1, block_width=256, block_height=256, pixel_type=Ellipsis, num_bits_per_pixel=None, actual_bits_per_pixel=None, metadata=None, title=None, description=None)

Create a new in-memory image asset with the specified dimensions and pixel format.

Parameters:
  • key (str) – Unique identifier for this asset.

  • num_columns (int, optional) – Image width in pixels.

  • num_rows (int, optional) – Image height in pixels.

  • num_bands (int, optional) – Number of spectral bands.

  • block_width (int, optional) – Block width in pixels.

  • block_height (int, optional) – Block height in pixels.

  • pixel_type (PixelType, optional) – Pixel data type.

  • num_bits_per_pixel (int, optional) – Nominal bits per pixel.

  • actual_bits_per_pixel (int, optional) – Actual bits per pixel; may be less than the nominal size. None uses the full range for the pixel type.

  • metadata (MetadataProvider, optional) – Encoding hints such as compression type (IC) and interleave mode (IMODE). See BufferedMetadataProvider.

  • title (str, optional) – Human-readable title. Auto-generated if omitted.

  • description (str, optional) – Detailed description. Auto-generated if omitted.

Returns:

A new in-memory image asset.

Return type:

BufferedImageAssetProvider

Example:

```python
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)
```

description

Detailed description of the asset.

static from_provider(provider, key=None, block_width=None, block_height=None, metadata=None)

Create a mutable copy of an existing ImageAssetProvider.

The returned BufferedImageAssetProvider lazily delegates get_block() calls to the source provider. Only blocks explicitly set via set_block() are stored in memory; all others are read on demand from the source. This enables copy-on-write semantics without loading the entire image into memory.

Because the returned provider holds a reference to the source, the source must remain open for the lifetime of the copy. If you need a fully independent snapshot, iterate over the blocks and call set_block() for each one.

Parameters:
  • provider (ImageAssetProvider | BufferedImageAssetProvider) – The source image asset to delegate to. Accepts ImageAssetProvider, BufferedImageAssetProvider, or any duck-typed object with the required image provider interface.

  • key (str, optional) – New key for the copy. If None, the source key is reused.

  • block_width (int, optional) – Block width for the copy. If None, uses the source block width.

  • block_height (int, optional) – Block height for the copy. If None, uses the source block height.

  • metadata (MetadataProvider, optional) – Metadata for the copy. If None, copies the source metadata.

Returns:

A new mutable provider backed by the source.

Return type:

BufferedImageAssetProvider

Example:

```python
from aws.osml.io import IO, BufferedImageAssetProvider

with IO.open(["input.ntf"], "r") as reader:
    source = reader.get_asset("image:0")
    copy = BufferedImageAssetProvider.from_provider(source)
    # Override specific blocks or metadata, then write
```
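The copy-on-write behaviour described for from_provider() can be pictured as an overlay: blocks that have been set are stored locally, and every other read falls through to the source. The classes below are a conceptual model of that idea in plain Python, not the library's implementation; OverlayBlocks and DictSource are hypothetical names.

```python
class DictSource:
    """Stand-in for a source provider that serves blocks from a dictionary."""

    def __init__(self, blocks):
        self._blocks = blocks
        self.reads = 0

    def get_block(self, block_row, block_col):
        self.reads += 1
        return self._blocks[(block_row, block_col)]


class OverlayBlocks:
    """Serve overridden blocks from memory and delegate the rest to the source."""

    def __init__(self, source):
        self._source = source
        self._overrides = {}

    def set_block(self, block_row, block_col, data):
        self._overrides[(block_row, block_col)] = data

    def get_block(self, block_row, block_col):
        key = (block_row, block_col)
        if key in self._overrides:
            return self._overrides[key]
        # Not overridden: read on demand, so the source must still be open.
        return self._source.get_block(block_row, block_col)


source = DictSource({(0, 0): "original 0,0", (0, 1): "original 0,1"})
overlay = OverlayBlocks(source)
overlay.set_block(0, 0, "edited 0,0")

edited = overlay.get_block(0, 0)     # served from memory
untouched = overlay.get_block(0, 1)  # delegated to the source
```

The model also shows why the source has to outlive the copy: any block that was never overridden is still fetched from it at read time.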

get_block(block_row, block_col, resolution_level=0, bands=None)

Read a block of pixel data as a NumPy array.

Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.

Parameters:
  • block_row (int) – Row index in the block grid.

  • block_col (int) – Column index in the block grid.

  • resolution_level (int) – Resolution level (0 = full resolution).

  • bands (list[int], optional) – Zero-based band indices to retrieve. If None, all bands are returned.

Returns:

Pixel data with shape (bands, rows, cols).

Return type:

numpy.ndarray

Raises:
  • IndexError – If the block coordinates are out of bounds.

  • ValueError – If the resolution level is invalid.

Example:

```python
# Get full block with all bands
block = provider.get_block(0, 0, 0)
print(block.shape)  # (3, 256, 256) for RGB with 256x256 blocks

# Get only the red channel (band 0)
red_band = provider.get_block(0, 0, 0, bands=[0])
print(red_band.shape)  # (1, 256, 256)
```

has_block(block_row, block_col, resolution_level=0)

Check whether a block exists at the given grid coordinates.

Parameters:
  • block_row (int) – Row index in the block grid.

  • block_col (int) – Column index in the block grid.

  • resolution_level (int) – Resolution level (0 = full resolution).

Returns:

True if the block contains data, False otherwise.

Return type:

bool

image_shape

Image dimensions as (bands, rows, columns) in CHW format.

irep

Image representation (MONO, RGB, MULTI, etc.).

key

Unique identifier for this asset within the dataset.

media_type

MIME type of the asset content.

metadata

Asset-level metadata as a MetadataProvider.

num_bands

Number of spectral bands.

num_bits_per_pixel

Nominal bits per pixel.

num_columns

Image width at full resolution in pixels.

num_pixels_per_block_horizontal

Block width in pixels.

num_pixels_per_block_vertical

Block height in pixels.

num_resolution_levels

Number of resolution levels in the image pyramid.

num_rows

Image height at full resolution in pixels.

pad_pixel_value

Value used for padding incomplete edge blocks.

pixel_value_type

Pixel data type.

raw_asset

Raw asset bytes as a BytesIO object.

roles

Semantic roles for this asset.

set_block(block_row, block_col, data)

Set pixel data for a single block at the given grid coordinates.

The array must use channels-first (CHW) layout with shape (bands, block_rows, block_cols). For large or sparse images, setting blocks individually avoids loading the full image into memory.

Parameters:
  • block_row (int) – Row index in the block grid (0-indexed).

  • block_col (int) – Column index in the block grid (0-indexed).

  • data (numpy.ndarray) – Pixel data with shape (bands, block_rows, block_cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.

Raises:
  • ValueError – If the array is not contiguous or block coordinates are out of range.

  • TypeError – If the array dtype is not supported.

Example:

```python
import numpy as np

# Set blocks individually for a 1024x1024 image with 256x256 blocks
for row in range(4):
    for col in range(4):
        block = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```
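The contiguity requirement is easy to violate by accident, because np.transpose returns a view with swapped strides rather than a new array. The sketch below converts an HWC array (as produced by Pillow or OpenCV) into a contiguous CHW array before it would be passed to set_block(). It assumes that "contiguous" means C-contiguous (row-major), and the array contents are synthetic.

```python
import numpy as np

# Stand-in for an HWC block from an imaging library: (rows, cols, bands).
block_hwc = np.zeros((256, 256, 3), dtype=np.uint8)

# Transposing to CHW returns a strided view, which is not C-contiguous.
block_chw_view = np.transpose(block_hwc, (2, 0, 1))
view_is_contiguous = block_chw_view.flags["C_CONTIGUOUS"]

# Copy into a C-contiguous array with the same shape and dtype.
block_chw = np.ascontiguousarray(block_chw_view)
copy_is_contiguous = block_chw.flags["C_CONTIGUOUS"]

print(block_chw.shape, view_is_contiguous, copy_is_contiguous)
```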

set_full_image(data)

Set the full image data from a NumPy array.

The array must use channels-first (CHW) layout with shape (bands, rows, cols). The dimensions must match the values specified when the provider was created.

Parameters:

data (numpy.ndarray) – Pixel data with shape (bands, rows, cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.

Raises:
  • ValueError – If the array size does not match the image configuration (expected size = bands x rows x cols x bytes_per_pixel).

  • TypeError – If the array dtype is not supported.

Example:

```python
import numpy as np

# Create RGB image data in CHW format
image_data = np.zeros((3, 512, 512), dtype=np.uint8)
image_data[0, :, :] = 255  # Red channel
provider.set_full_image(image_data)
```

title

Human-readable title for the asset.

TextAssetProvider

Returned by get_asset() for text assets. Provides text content and encoding information.

class aws.osml.io.TextAssetProvider

Bases: object

Provides access to text content stored within a geospatial dataset.

Geospatial datasets can embed plain text alongside imagery — mission reports, processing notes, annotations, and similar human-readable data. TextAssetProvider exposes the decoded text through the text property, along with the character encoding and format metadata. Use DatasetReader.get_asset() to obtain an instance for a specific text asset in the dataset.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="text"):
        text_asset = dataset.get_asset(key)
        print(f"Encoding: {text_asset.encoding}")
        print(f"Format: {text_asset.format}")
        print(f"Text: {text_asset.text[:200]}...")
```

asset_type

The asset category.

description

A detailed description of the asset.

encoding

The character encoding of the text content (e.g., "UTF-8", "ASCII").

format

The text format identifier.

key

The unique identifier for this asset within the dataset.

media_type

The MIME type of the asset content.

metadata

The asset-level MetadataProvider.

raw_asset

The raw asset bytes as a BytesIO object.

roles

The semantic roles for this asset.

text

The decoded text content as a string.

Raises:

ValueError – If the text cannot be decoded using the asset’s character encoding.
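Code that must keep going when a text segment is mis-encoded can catch the documented ValueError and fall back to the raw bytes. The following is a sketch under that assumption. text_or_fallback and the two stand-in classes are hypothetical, and the replacement-character fallback is one possible policy rather than anything the library prescribes.

```python
import io


def text_or_fallback(text_asset):
    """Return the decoded text, or a lossy UTF-8 decoding of the raw bytes on failure."""
    try:
        return text_asset.text
    except ValueError:
        raw = text_asset.raw_asset.read()
        # errors="replace" substitutes U+FFFD for undecodable bytes.
        return raw.decode("utf-8", errors="replace")


class GoodText:
    """Stand-in for a text asset that decodes cleanly."""

    text = "Mission report"
    raw_asset = io.BytesIO(b"Mission report")


class BadText:
    """Stand-in for a text asset whose bytes do not match its encoding."""

    raw_asset = io.BytesIO(b"caf\xe9")

    @property
    def text(self):
        raise ValueError("cannot decode text")


good = text_or_fallback(GoodText())
bad = text_or_fallback(BadText())
```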

title

A human-readable title for the asset.

BufferedTextAssetProvider

An in-memory text asset provider. Use this to create text assets for writing via add_asset().

class aws.osml.io.BufferedTextAssetProvider

Bases: object

Constructs text assets entirely in memory.

Use BufferedTextAssetProvider to create text content for inclusion in a dataset — mission reports, annotations, processing notes, and similar human-readable data. The provider implements the same interface as TextAssetProvider, so in-memory text assets can be passed to any API that accepts a text asset, including DatasetWriter.

Supported character encodings are "UTF-8", "ASCII", "ECS", and "MTF". You can optionally attach a title, description, and semantic roles to describe the text asset’s purpose within the dataset.

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

# Create a UTF-8 text asset
provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
)

# Access text content and metadata
print(provider.text)      # "Hello, World!"
print(provider.encoding)  # "UTF-8"
print(provider.format)    # "U8S"

# Create a text asset with title, description, and roles
provider = BufferedTextAssetProvider.create(
    key="text:0",
    text_content="Mission report content...",
    encoding="UTF-8",
    title="Mission Report",
    description="Operational text",
    roles=["data", "annotation"],
)
```

asset_type

Asset category.

static create(key, text_content, encoding='UTF-8', title=None, description=None, roles=None, metadata=None)

Create a new in-memory text asset with the specified content and encoding.

Parameters:
  • key (str) – Unique identifier for this asset.

  • text_content (str) – The text content as a string.

  • encoding (str, optional) – Character encoding. Supported values are "UTF-8", "ASCII", "ECS", and "MTF". Defaults to "UTF-8".

  • title (str, optional) – Human-readable title for the asset.

  • description (str, optional) – Detailed description of the asset.

  • roles (list[str], optional) – Semantic roles describing the asset’s purpose. Defaults to ["data"] if not specified.

  • metadata (MetadataProvider, optional) – Additional metadata to attach to the asset.

Returns:

A new in-memory text asset.

Return type:

BufferedTextAssetProvider

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
    title="Sample Text",
    description="A sample text asset",
)
```

description

Detailed description of the asset.

encoding

The character encoding of the text content (e.g., "UTF-8", "ASCII").

format

The text format identifier (e.g., "U8S", "STA").

key

Unique identifier for this asset within the dataset.

media_type

MIME type of the asset content.

metadata

Asset-level metadata as a MetadataProvider.

raw_asset

Raw asset bytes as a BytesIO object.

The raw bytes have CR/LF line delimiters as required by NITF.
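When comparing raw_asset bytes with text held in Python, which normally uses "\n" line endings, the delimiters need normalising first. The following is a small sketch of converting in both directions. to_crlf and from_crlf are hypothetical helpers; the library presumably performs the CR/LF conversion itself when building raw_asset.

```python
def to_crlf(text):
    """Convert LF, CR, and CR/LF line endings to CR/LF."""
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "\r\n")


def from_crlf(text):
    """Convert CR/LF line endings to LF."""
    return text.replace("\r\n", "\n")


crlf = to_crlf("line one\nline two\r\nline three")
round_trip = from_crlf(crlf)
```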

roles

Semantic roles for this asset.

text

The decoded text content as a string.

title

Human-readable title for the asset.

GraphicsAssetProvider

Returned by get_asset() for graphics assets.

class aws.osml.io.GraphicsAssetProvider

Bases: object

Python wrapper for GraphicsAssetProvider trait objects.

This class provides access to graphics asset properties and content. Graphics data is accessed through the raw_asset property.

asset_type

The asset category.

description

A detailed description of the asset.

key

The unique identifier for this asset within the dataset.

media_type

The MIME type of the asset content.

metadata

The asset-level MetadataProvider.

raw_asset

The raw graphics bytes as a BytesIO object.

Returns the complete vector graphics payload (typically CGM format) wrapped in a BytesIO stream. Read the returned object to access the raw bytes for further processing or rendering.

Returns:

A seekable stream containing the raw graphics bytes.

Return type:

io.BytesIO

Raises:

IOError – If the graphics data cannot be read from the dataset.

roles

The semantic roles for this asset.

title

A human-readable title for the asset.

DataAssetProvider

Returned by get_asset() for structured data assets. Exposes the raw bytes and MIME type so that XML or JSON content can be parsed with standard libraries.

class aws.osml.io.DataAssetProvider

Bases: object

Provides access to structured data stored within a geospatial dataset.

Geospatial datasets can embed structured payloads alongside imagery, such as XML metadata (for example SICD/SIDD), JSON configuration, overflow TREs, and application-specific data. DataAssetProvider exposes the raw bytes through raw_asset, and its mime_type property indicates the content format. Use DatasetReader.get_asset() to obtain an instance for a specific data asset in the dataset.

Example:

```python
import json
import xml.etree.ElementTree as ET
from aws.osml.io import IO

with IO.open(["sicd_image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="data"):
        data = dataset.get_asset(key)
        print(f"Data '{key}': mime_type={data.mime_type}")

        raw = data.raw_asset.read()
        if data.mime_type == "application/xml":
            root = ET.fromstring(raw)
            print(f"XML root tag: {root.tag}")
        elif data.mime_type == "application/json":
            obj = json.loads(raw)
            print(f"JSON keys: {list(obj.keys())}")
```
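The MIME-type dispatch can be factored into a helper that is testable without a dataset. The following is a sketch using only the standard library. parse_payload is a hypothetical helper, and accepting "text/xml" and stripping parameters such as "; charset=utf-8" are defensive assumptions, since the exact strings reported by mime_type are not specified beyond the two examples above.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(raw, mime_type):
    """Parse raw bytes according to the MIME type; return bytes unchanged if unrecognised."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    if base_type in ("application/xml", "text/xml"):
        return ET.fromstring(raw)
    if base_type == "application/json":
        return json.loads(raw)
    return raw


# Synthetic payloads standing in for data.raw_asset.read() results.
root = parse_payload(b"<SICD><CollectionInfo/></SICD>", "application/xml")
config = parse_payload(b'{"gain": 2}', "application/json; charset=utf-8")
other = parse_payload(b"\x00\x01", "application/octet-stream")
```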

asset_type

The asset category.

description

A detailed description of the asset.

key

The unique identifier for this asset within the dataset.

media_type

The MIME type of the asset content.

metadata

The asset-level MetadataProvider.

mime_type

The MIME type of the data content (e.g., "application/xml", "application/json").

raw_asset

The raw asset bytes as a BytesIO object.

roles

The semantic roles for this asset.

title

A human-readable title for the asset.