Asset Providers

Every asset in a dataset is represented by a typed provider. When you call DatasetReader.get_asset(), the library returns a specialised provider whose type matches the asset’s category:

Asset category	Returned type
Image	`ImageAssetProvider`
Text	`TextAssetProvider`
Data	`DataAssetProvider`
Graphics	`GraphicsAssetProvider`

All provider types share a common set of properties (key, title, description, media_type, roles, and asset_type, plus metadata and raw_asset), while each specialised type adds format-specific access such as block-level image reads, decoded text content, or the MIME type of a structured data payload.
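Because every provider exposes the same common properties, code that only needs asset metadata can treat all provider types uniformly. The sketch below is a hypothetical helper (describe_asset is not part of the library) that gathers the shared properties into a dict using plain attribute access. A types.SimpleNamespace with illustrative values stands in for a real provider, so the actual value types returned by the library (for example, whether asset_type is an enum or a string) are not asserted here.

```python
from types import SimpleNamespace

# Properties documented as common to every provider type.
COMMON_PROPERTIES = ("key", "title", "description", "media_type", "roles", "asset_type")


def describe_asset(asset):
    """Collect the shared provider properties into a plain dict.

    Hypothetical helper: it only reads attributes, so it accepts any
    provider type (or any object exposing the same attribute names).
    """
    return {name: getattr(asset, name) for name in COMMON_PROPERTIES}


# Stand-in with illustrative values; in real code, pass the result of
# DatasetReader.get_asset().
fake = SimpleNamespace(
    key="text:0",
    title="Mission Report",
    description="Operational text",
    media_type="text/plain",
    roles=["data"],
    asset_type="text",
)
summary = describe_asset(fake)
print(summary["key"], summary["roles"])
```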

AssetProvider

AssetProvider is the base class exposing the common metadata properties. Use AssetProvider.from_bytes() to create an asset from raw bytes for writing (for image assets, use BufferedImageAssetProvider instead).

class aws.osml.io.AssetProvider

Bases: object

Base class for all asset types within a dataset.

An AssetProvider represents a single named asset inside a geospatial dataset. Every dataset opened through IO contains one or more assets, each identified by a unique key and categorised as image, text, data, or graphics. This class exposes the common properties shared by all asset types — key, title, description, MIME type, roles, and raw bytes — while specialised subclasses such as ImageAssetProvider, TextAssetProvider, DataAssetProvider, and GraphicsAssetProvider add format-specific access methods.

You typically obtain an AssetProvider by calling DatasetReader.get_asset(). To create an asset from raw bytes for writing, use the AssetProvider.from_bytes() static method.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    keys = dataset.get_asset_keys(asset_type="image")
    asset = dataset.get_asset(keys[0])
    print(asset.key, asset.title, asset.asset_type)
```

asset_type: Category of this asset (image, text, data, or graphics).

description: Detailed description of the asset.

static from_bytes(key, data, asset_type, title=None, description=None, roles=None, media_type=None)

Create a new AssetProvider from raw bytes.

Use this factory when you need to build an asset in memory for writing to a dataset via DatasetWriter.add_asset().

Parameters:

key (str) – Unique identifier for the asset.
data (bytes) – Raw bytes of the asset content.
asset_type (AssetType) – The type of asset (Image, Text, Graphics, Data).
title (str, optional) – Human-readable title. Defaults to key when omitted.
description (str, optional) – Detailed description. Defaults to empty.
roles (list[str], optional) – Semantic roles. Defaults to ["data"].
media_type (str, optional) – MIME type. Auto-detected from asset_type when omitted.

Returns:

A new asset that can be passed to DatasetWriter.add_asset().

Return type:

AssetProvider

Raises:

ValueError – If asset_type is AssetType.Image — use BufferedImageAssetProvider instead.

Example:

```python
from aws.osml.io import AssetProvider, AssetType

asset = AssetProvider.from_bytes(
    key="my_text",
    data=b"Hello, world!",
    asset_type=AssetType.Text,
    title="Greeting",
)
```

key: Unique identifier for this asset within the dataset.

media_type: MIME type of the asset content (e.g. "application/vnd.nitf.image").

metadata: The MetadataProvider for this asset’s metadata.

raw_asset

The raw asset bytes as a BytesIO object.

Returns: The raw bytes of the asset content.
Return type: io.BytesIO
Raises: IOError – If the asset data cannot be read.

Example:

```python
data = asset.raw_asset.read()
```
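raw_asset is an io.BytesIO, so ordinary stream semantics apply: read() consumes the stream, and a second read() on the same object returns b"" unless you rewind with seek(0). This section does not say whether each property access returns a fresh stream, so the sketch below keeps one reference and rewinds explicitly. A plain BytesIO stands in for asset.raw_asset.

```python
import io

# Stand-in for `stream = asset.raw_asset` (any io.BytesIO behaves the same way).
stream = io.BytesIO(b"Hello, world!")

first = stream.read()    # consumes the stream
second = stream.read()   # already at end of stream, so this returns b""

stream.seek(0)           # rewind before reading again
third = stream.read()

# getvalue() returns the whole buffer regardless of the current position.
whole = stream.getvalue()
```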

roles: Semantic roles assigned to this asset (e.g. ["data"], ["thumbnail"]).

title: Human-readable title for the asset.

ImageAssetProvider

Returned by get_asset() for image assets. Provides block-level read access to pixel data.

class aws.osml.io.ImageAssetProvider

Bases: object

Provides blocked (tiled) access to the pixel data of an image asset.

Large geospatial images are divided into a regular grid of fixed-size rectangular blocks. ImageAssetProvider lets you read individual blocks as NumPy arrays without loading the entire image into memory. Use DatasetReader.get_asset() to obtain an instance for a specific image asset in the dataset.
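For a regular grid, the number of blocks in each dimension is normally the image size divided by the block size, rounded up. The last row or column of blocks then extends past the image edge, which is where pad_pixel_value applies. The sketch below assumes this ceiling-division layout to show how block_grid_size relates to num_rows, num_columns, and the block dimensions, and how to find the valid (unpadded) region of an edge block. grid_size and valid_extent are hypothetical helpers, not library functions.

```python
import math


def grid_size(num_rows, num_columns, block_height, block_width):
    """Blocks per dimension as (rows, cols), assuming ceiling division."""
    return (math.ceil(num_rows / block_height), math.ceil(num_columns / block_width))


def valid_extent(block_row, block_col, num_rows, num_columns, block_height, block_width):
    """Rows and columns of real image data inside a block; the rest is padding."""
    rows = min(block_height, num_rows - block_row * block_height)
    cols = min(block_width, num_columns - block_col * block_width)
    return rows, cols


# A 1000-row x 700-column image stored in 256 x 256 blocks.
print(grid_size(1000, 700, 256, 256))           # (4, 3)
print(valid_extent(3, 2, 1000, 700, 256, 256))  # (232, 188)
```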

All arrays returned by get_block() use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(block, (1, 2, 0)).
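The layout conversion is a pure axis permutation. This NumPy-only sketch uses a random array as a stand-in for a get_block() result. It shows that transposing to HWC moves the band axis last without copying pixel data: the result is a view that shares memory with the original and is not C-contiguous.

```python
import numpy as np

# Stand-in for image.get_block(...): 3 bands, 256 rows, 256 columns (CHW).
chw = np.random.randint(0, 256, size=(3, 256, 256), dtype=np.uint8)

# CHW -> HWC for OpenCV, Pillow, or matplotlib.
hwc = np.transpose(chw, (1, 2, 0))

print(chw.shape, hwc.shape)        # (3, 256, 256) (256, 256, 3)
print(np.shares_memory(chw, hwc))  # True: transpose returns a view
# Pixel (row=10, col=20) holds the same band values in both layouts.
print(np.array_equal(chw[:, 10, 20], hwc[10, 20, :]))  # True
```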

Example:

```python
import numpy as np
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    image = dataset.get_asset("image:0")

    # Read an RGB composite from a multispectral image
    rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

    # Convert CHW to HWC for display with matplotlib or Pillow
    rgb_hwc = np.transpose(rgb, (1, 2, 0))

    # Iterate over all blocks, skipping masked regions
    grid_rows, grid_cols = image.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if image.has_block(row, col, resolution_level=0):
                block = image.get_block(row, col, resolution_level=0)
```

actual_bits_per_pixel: Actual bits per pixel.

asset_type: Asset category.

block_grid_size: Number of blocks in each dimension as (rows, cols).

block_shape: Block dimensions as (bands, rows, columns) in CHW format.

codec_configuration()

Return opaque codec configuration for independent tile decoding.

The returned dictionary contains format-specific key-value pairs needed to decode tiles independently. For JPEG 2000 images this includes a "main_header" key whose value is the raw codestream main header bytes.

Returns None if no configuration is needed (e.g. uncompressed images).

Returns: Codec parameters, or None.
Return type: dict[str, bytes] | None

description: Detailed description of the asset.

get_block(block_row, block_col, resolution_level=0, bands=None)

Read a block of pixel data as a NumPy array.

Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.

Parameters:

block_row (int) – Row index in the block grid.
block_col (int) – Column index in the block grid.
resolution_level (int) – Resolution level (0 = full resolution).
bands (list[int], optional) – Zero-based band indices to retrieve. If None, all bands are returned.

Returns:

Pixel data with shape (bands, rows, cols).

Return type:

numpy.ndarray

Raises:

IndexError – If the block coordinates are out of bounds.
ValueError – If the resolution level is invalid.

Example:

```python
# All bands at full resolution
block = image.get_block(0, 0, resolution_level=0)

# Natural color from a multispectral image (R, G, B)
rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

# Near-infrared band for vegetation analysis
nir = image.get_block(0, 0, resolution_level=0, bands=[4])
```

has_block(block_row, block_col, resolution_level=0)

Check whether a block exists at the given grid coordinates.

Some formats (notably NITF) support masked (sparse) images where not every position in the block grid contains data. Use this method to skip empty regions when iterating over blocks.

Parameters:

block_row (int) – Row index in the block grid.
block_col (int) – Column index in the block grid.
resolution_level (int) – Resolution level (0 = full resolution).

Returns:

True if the block contains data, False otherwise.

Return type:

bool

Example:

```python
grid_rows, grid_cols = image.block_grid_size
for row in range(grid_rows):
    for col in range(grid_cols):
        if image.has_block(row, col, resolution_level=0):
            block = image.get_block(row, col, resolution_level=0)
```

image_shape: Image dimensions as (bands, rows, columns) in CHW format.

key: Unique identifier for this asset within the dataset.

media_type: MIME type of the asset content.

metadata: Asset-level metadata as a MetadataProvider.

num_bands: Number of spectral bands.

num_bits_per_pixel: Nominal bits per pixel.

num_columns: Image width at full resolution in pixels.

num_pixels_per_block_horizontal: Block width in pixels.

num_pixels_per_block_vertical: Block height in pixels.

num_resolution_levels: Number of resolution levels in the image pyramid.

num_rows: Image height at full resolution in pixels.

pad_pixel_value: Value used for padding incomplete edge blocks.

pixel_value_type: Pixel data type.

raw_asset: Raw asset bytes as a BytesIO object.

roles: Semantic roles for this asset.

tile_byte_ranges()

Return per-tile byte ranges relative to the source file.

Returns a dictionary mapping (block_row, block_col) tuples to a list of (byte_offset, byte_length) tuples, where offsets are relative to the start of the source file. Each list contains one entry per tile-part; for most formats this is a single-element list.

Returns None for providers without a backing file (e.g. in-memory images created with BufferedImageAssetProvider).

Returns: Mapping of tile coordinates to byte range lists, or None.
Return type: dict[tuple[int, int], list[tuple[int, int]]] | None
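Because the offsets are relative to the start of the source file, the compressed bytes of a single tile can be fetched without touching the rest of the image, for example with seek/read on a local file or an HTTP Range request against object storage. The sketch below reads and concatenates the tile-parts for one tile from a local file using only the standard library. The file contents and the ranges mapping are fabricated stand-ins shaped like the documented return value, and read_tile_bytes is a hypothetical helper.

```python
import os
import tempfile


def read_tile_bytes(path, ranges, block_row, block_col):
    """Concatenate every (offset, length) tile-part for one tile.

    `ranges` has the documented shape of tile_byte_ranges():
    {(block_row, block_col): [(byte_offset, byte_length), ...]}.
    """
    parts = []
    with open(path, "rb") as f:
        for offset, length in ranges[(block_row, block_col)]:
            f.seek(offset)
            parts.append(f.read(length))
    return b"".join(parts)


# Stand-in file and mapping for illustration only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERtile00tile01-part1tile01-part2")
    path = tmp.name

ranges = {
    (0, 0): [(6, 6)],              # single tile-part
    (0, 1): [(12, 12), (24, 12)],  # two tile-parts
}
tile_00 = read_tile_bytes(path, ranges, 0, 0)
tile_01 = read_tile_bytes(path, ranges, 0, 1)
os.remove(path)
print(tile_00, tile_01)  # b'tile00' b'tile01-part1tile01-part2'
```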

title: Human-readable title for the asset.

BufferedImageAssetProvider

An in-memory image asset provider. Use this to create image assets for writing via add_asset().

class aws.osml.io.BufferedImageAssetProvider

Bases: object

Constructs image assets entirely in memory.

Use BufferedImageAssetProvider to create synthetic test data, assemble mosaics, or build images from processed results. The provider implements the same interface as ImageAssetProvider, so in-memory images can be passed to any API that accepts an image asset, including DatasetWriter.

All pixel arrays use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(array, (1, 2, 0)). To convert from HWC back to CHW, use np.transpose(array, (2, 0, 1)).
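Images loaded with Pillow or OpenCV arrive in HWC order, so they need the reverse permutation before being stored. np.transpose returns a non-contiguous view, and set_block() is documented to raise ValueError for arrays that are not contiguous, so wrapping the result in np.ascontiguousarray is a sensible precaution. This sketch uses a random array in place of a decoded image and does not call the library itself.

```python
import numpy as np

# Stand-in for an HWC image from Pillow/OpenCV: 256 rows, 256 cols, 3 bands.
hwc = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

chw_view = np.transpose(hwc, (2, 0, 1))  # shape (3, 256, 256), but a strided view
chw = np.ascontiguousarray(chw_view)     # copy into C-contiguous memory

print(chw_view.flags["C_CONTIGUOUS"])  # False
print(chw.flags["C_CONTIGUOUS"])       # True
# chw can now be passed to provider.set_block(row, col, chw)
```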

You can populate the image all at once with set_full_image() or set individual blocks with set_block() for large or sparse images. Optionally attach a BufferedMetadataProvider to supply encoding hints such as compression type (IC) and interleave mode (IMODE).

Example:

```python
import numpy as np
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

# Create a 512x512 RGB image with 256x256 blocks
provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)

# Populate the full image at once (randint's upper bound is exclusive)
image_data = np.random.randint(0, 256, (3, 512, 512), dtype=np.uint8)
provider.set_full_image(image_data)

# Or set blocks individually for large/sparse images
for row in range(2):
    for col in range(2):
        block = np.random.randint(0, 256, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```

actual_bits_per_pixel: Actual bits per pixel.

asset_type: Asset category.

block_grid_size: Number of blocks in each dimension as (rows, cols).

block_shape: Block dimensions as (bands, rows, columns) in CHW format.

static create(key, num_columns=512, num_rows=512, num_bands=1, block_width=256, block_height=256, pixel_type=Ellipsis, num_bits_per_pixel=None, actual_bits_per_pixel=None, metadata=None, title=None, description=None)

Create a new in-memory image asset with the specified dimensions and pixel format.

Parameters:

key (str) – Unique identifier for this asset.
num_columns (int, optional) – Image width in pixels.
num_rows (int, optional) – Image height in pixels.
num_bands (int, optional) – Number of spectral bands.
block_width (int, optional) – Block width in pixels.
block_height (int, optional) – Block height in pixels.
pixel_type (PixelType, optional) – Pixel data type.
num_bits_per_pixel (int, optional) – Nominal bits per pixel.
actual_bits_per_pixel (int, optional) – Actual bits per pixel; may be less than the nominal size. If None, the full range of the pixel type is used.
metadata (MetadataProvider, optional) – Encoding hints such as compression type (IC) and interleave mode (IMODE). See BufferedMetadataProvider.
title (str, optional) – Human-readable title. Auto-generated if omitted.
description (str, optional) – Detailed description. Auto-generated if omitted.

Returns:

A new in-memory image asset.

Return type:

BufferedImageAssetProvider

Example:

```python
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)
```

description: Detailed description of the asset.

static from_provider(provider, key=None, block_width=None, block_height=None, metadata=None)

Create a mutable copy of an existing ImageAssetProvider.

The returned BufferedImageAssetProvider lazily delegates get_block() calls to the source provider. Only blocks explicitly set via set_block() are stored in memory; all others are read on demand from the source. This enables copy-on-write semantics without loading the entire image into memory.

Because the returned provider holds a reference to the source, the source must remain open for the lifetime of the copy. If you need a fully independent snapshot, iterate over the blocks and call set_block() for each one.
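The snapshot approach described above can be written generically against the documented methods (block_grid_size, has_block, get_block, set_block). The materialize function below is a hypothetical helper, not a library API. It reads each existing full-resolution block from the copy and writes it back with set_block(), so that afterwards no reads should need to reach the source. To keep the sketch runnable without a dataset, it is exercised with a minimal stand-in class that mimics the copy-on-write behaviour described here; the real provider's internals may differ.

```python
import numpy as np


def materialize(provider):
    """Store every existing full-resolution block in the provider itself.

    Hypothetical helper, intended for a BufferedImageAssetProvider returned
    by from_provider(). Masked blocks (has_block() is False) are skipped.
    """
    grid_rows, grid_cols = provider.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if provider.has_block(row, col, resolution_level=0):
                block = provider.get_block(row, col, resolution_level=0)
                # set_block() rejects non-contiguous arrays, so be explicit.
                provider.set_block(row, col, np.ascontiguousarray(block))


class StubCopy:
    """Stand-in mimicking a copy that delegates unset blocks to its source."""

    def __init__(self, source_blocks, grid):
        self.source_blocks = source_blocks  # plays the role of the source provider
        self.local_blocks = {}              # blocks set via set_block()
        self.block_grid_size = grid

    def has_block(self, row, col, resolution_level=0):
        return (row, col) in self.local_blocks or (row, col) in self.source_blocks

    def get_block(self, row, col, resolution_level=0, bands=None):
        if (row, col) in self.local_blocks:
            return self.local_blocks[(row, col)]
        return self.source_blocks[(row, col)]

    def set_block(self, row, col, data):
        self.local_blocks[(row, col)] = data


source = {(0, 0): np.zeros((1, 4, 4), np.uint8), (1, 1): np.ones((1, 4, 4), np.uint8)}
snapshot = StubCopy(source, grid=(2, 2))
materialize(snapshot)
source.clear()  # simulate the source no longer being available
print(sorted(snapshot.local_blocks))  # [(0, 0), (1, 1)]
```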

Parameters:

provider (ImageAssetProvider | BufferedImageAssetProvider) – The source image asset to delegate to. Accepts ImageAssetProvider, BufferedImageAssetProvider, or any duck-typed object with the required image provider interface.
key (str, optional) – Optional new key for the copy. If None, the source key is reused.
block_width (int, optional) – Block width for the copy. If None, uses the source block width.
block_height (int, optional) – Block height for the copy. If None, uses the source block height.
metadata (MetadataProvider, optional) – Metadata for the copy. If None, copies the source metadata.

Returns:

A new mutable provider backed by the source.

Return type:

BufferedImageAssetProvider

Example:

```python
from aws.osml.io import IO, BufferedImageAssetProvider

with IO.open(["input.ntf"], "r") as reader:
    source = reader.get_asset("image:0")
    copy = BufferedImageAssetProvider.from_provider(source)
    # Override specific blocks or metadata, then write
```

get_block(block_row, block_col, resolution_level=0, bands=None)

Read a block of pixel data as a NumPy array.

Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.

Parameters:

block_row (int) – Row index in the block grid.
block_col (int) – Column index in the block grid.
resolution_level (int) – Resolution level (0 = full resolution).
bands (list[int], optional) – Zero-based band indices to retrieve. If None, all bands are returned.

Returns:

Pixel data with shape (bands, rows, cols).

Return type:

numpy.ndarray

Raises:

IndexError – If the block coordinates are out of bounds.
ValueError – If the resolution level is invalid.

Example:

```python
# Get full block with all bands
block = provider.get_block(0, 0, 0)
print(block.shape)  # (3, 256, 256) for RGB with 256x256 blocks

# Get only the red channel (band 0)
red_band = provider.get_block(0, 0, 0, bands=[0])
print(red_band.shape)  # (1, 256, 256)
```

has_block(block_row, block_col, resolution_level=0)

Check whether a block exists at the given grid coordinates.

Parameters:

block_row (int) – Row index in the block grid.
block_col (int) – Column index in the block grid.
resolution_level (int) – Resolution level (0 = full resolution).

Returns:

True if the block contains data, False otherwise.

Return type:

bool

image_shape: Image dimensions as (bands, rows, columns) in CHW format.

irep: Image representation (MONO, RGB, MULTI, etc.).

key: Unique identifier for this asset within the dataset.

media_type: MIME type of the asset content.

metadata: Asset-level metadata as a MetadataProvider.

num_bands: Number of spectral bands.

num_bits_per_pixel: Nominal bits per pixel.

num_columns: Image width at full resolution in pixels.

num_pixels_per_block_horizontal: Block width in pixels.

num_pixels_per_block_vertical: Block height in pixels.

num_resolution_levels: Number of resolution levels in the image pyramid.

num_rows: Image height at full resolution in pixels.

pad_pixel_value: Value used for padding incomplete edge blocks.

pixel_value_type: Pixel data type.

raw_asset: Raw asset bytes as a BytesIO object.

roles: Semantic roles for this asset.

set_block(block_row, block_col, data)

Set pixel data for a single block at the given grid coordinates.

The array must use channels-first (CHW) layout with shape (bands, block_rows, block_cols). For large or sparse images, setting blocks individually avoids loading the full image into memory.

Parameters:

block_row (int) – Row index in the block grid (0-indexed).
block_col (int) – Column index in the block grid (0-indexed).
data (numpy.ndarray) – Pixel data with shape (bands, block_rows, block_cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.

Raises:

ValueError – If the array is not contiguous or block coordinates are out of range.
TypeError – If the array dtype is not supported.
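When the source data is one large CHW array but you want to populate the provider block by block, every array passed to set_block() must have the full (bands, block_rows, block_cols) shape, including the blocks on the right and bottom edges. The sketch below assumes those edge blocks should be filled out with a constant padding value (mirroring pad_pixel_value on the read side). iter_blocks is a hypothetical helper, and the set_block() call is left as a comment because no provider is constructed here.

```python
import math

import numpy as np


def iter_blocks(image, block_height, block_width, pad_value=0):
    """Yield (block_row, block_col, block) tiles of a CHW array.

    Edge blocks are padded with `pad_value` so every block has shape
    (bands, block_height, block_width). Each yielded block is a fresh,
    C-contiguous array.
    """
    bands, rows, cols = image.shape
    for block_row in range(math.ceil(rows / block_height)):
        for block_col in range(math.ceil(cols / block_width)):
            r0, c0 = block_row * block_height, block_col * block_width
            tile = image[:, r0:r0 + block_height, c0:c0 + block_width]
            block = np.full((bands, block_height, block_width), pad_value, dtype=image.dtype)
            block[:, :tile.shape[1], :tile.shape[2]] = tile
            yield block_row, block_col, block


image = np.random.randint(0, 256, size=(3, 600, 500), dtype=np.uint8)
blocks = list(iter_blocks(image, 256, 256))
print(len(blocks))  # 6: a 3 x 2 grid
# for row, col, block in blocks:
#     provider.set_block(row, col, block)
```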

Example:

```python
import numpy as np

# Set blocks individually for a 1024x1024 image with 256x256 blocks
for row in range(4):
    for col in range(4):
        # randint's upper bound is exclusive, so 256 covers the full uint8 range
        block = np.random.randint(0, 256, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```

set_full_image(data)

Set the full image data from a NumPy array.

The array must use channels-first (CHW) layout with shape (bands, rows, cols). The dimensions must match the values specified when the provider was created.

Parameters:

data (numpy.ndarray) – Pixel data with shape (bands, rows, cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.

Raises:

ValueError – If the array size does not match the image configuration (expected size = bands x rows x cols x bytes_per_pixel).
TypeError – If the array dtype is not supported.
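The size check behind the ValueError above can be reproduced before calling set_full_image(), which gives a clearer error in your own code. If the library compares only the total size, as the error description suggests, an HWC array with the same number of elements would not trigger it. The hypothetical check_full_image helper below therefore checks the exact CHW shape as well as the byte count (bands x rows x cols x bytes per pixel). It also requires one of the documented dtypes and, more strictly than the library's "should match" wording, rejects a dtype that differs from the configured one. Target dimensions and dtype are passed in explicitly because mapping a PixelType to a NumPy dtype is not covered in this section.

```python
import numpy as np

# Dtypes listed as supported by set_full_image() and set_block().
SUPPORTED_DTYPES = {np.dtype(name) for name in (
    "uint8", "int8", "uint16", "int16", "uint32", "int32", "float32", "float64")}


def check_full_image(data, num_bands, num_rows, num_columns, dtype):
    """Raise early if `data` cannot match the configured image (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if data.dtype not in SUPPORTED_DTYPES:
        raise TypeError(f"unsupported dtype {data.dtype}")
    if data.dtype != dtype:
        raise TypeError(f"expected dtype {dtype}, got {data.dtype}")
    expected_shape = (num_bands, num_rows, num_columns)
    expected_bytes = num_bands * num_rows * num_columns * dtype.itemsize
    if data.shape != expected_shape or data.nbytes != expected_bytes:
        raise ValueError(
            f"expected shape {expected_shape} ({expected_bytes} bytes), "
            f"got {data.shape} ({data.nbytes} bytes)")


# 3 x 512 x 512 uint8 is 786,432 bytes and passes the check.
check_full_image(np.zeros((3, 512, 512), np.uint8), 3, 512, 512, "uint8")
```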

Example:

```python
import numpy as np

# Create RGB image data in CHW format
image_data = np.zeros((3, 512, 512), dtype=np.uint8)
image_data[0, :, :] = 255  # Red channel
provider.set_full_image(image_data)
```

title: Human-readable title for the asset.

TextAssetProvider

Returned by get_asset() for text assets. Provides text content and encoding information.

class aws.osml.io.TextAssetProvider

Bases: object

Provides access to text content stored within a geospatial dataset.

Geospatial datasets can embed plain text alongside imagery — mission reports, processing notes, annotations, and similar human-readable data. TextAssetProvider exposes the decoded text through the text property, along with the character encoding and format metadata. Use DatasetReader.get_asset() to obtain an instance for a specific text asset in the dataset.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="text"):
        text_asset = dataset.get_asset(key)
        print(f"Encoding: {text_asset.encoding}")
        print(f"Format: {text_asset.format}")
        print(f"Text: {text_asset.text[:200]}...")
```

asset_type: The asset category.

description: A detailed description of the asset.

encoding: The character encoding of the text content (e.g., "UTF-8", "ASCII").

format: The text format identifier.

key: The unique identifier for this asset within the dataset.

media_type: The MIME type of the asset content.

metadata: The asset-level MetadataProvider.

raw_asset: The raw asset bytes as a BytesIO object.

roles: The semantic roles for this asset.

text

The decoded text content as a string.

Raises: ValueError – If the text cannot be decoded using the asset’s character encoding.

title: A human-readable title for the asset.

BufferedTextAssetProvider

An in-memory text asset provider. Use this to create text assets for writing via add_asset().

class aws.osml.io.BufferedTextAssetProvider

Bases: object

Constructs text assets entirely in memory.

Use BufferedTextAssetProvider to create text content for inclusion in a dataset — mission reports, annotations, processing notes, and similar human-readable data. The provider implements the same interface as TextAssetProvider, so in-memory text assets can be passed to any API that accepts a text asset, including DatasetWriter.

Supported character encodings are "UTF-8", "ASCII", "ECS", and "MTF". You can optionally attach a title, description, and semantic roles to describe the text asset’s purpose within the dataset.
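When text comes from an arbitrary source, it can be useful to choose "ASCII" only when the content is pure 7-bit ASCII and fall back to "UTF-8" otherwise. This standard-library sketch is a hypothetical pick_encoding helper that tries a strict ASCII encode. It deliberately ignores "ECS" and "MTF", whose rules are format-specific and not described in this section. The returned value would be passed as the encoding argument of BufferedTextAssetProvider.create().

```python
def pick_encoding(text: str) -> str:
    """Return "ASCII" if the text is pure 7-bit ASCII, else "UTF-8"."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return "UTF-8"
    return "ASCII"


print(pick_encoding("Mission report"))   # ASCII
print(pick_encoding("Zürich overpass"))  # UTF-8
```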

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

# Create a UTF-8 text asset
provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
)

# Access text content and metadata
print(provider.text)      # "Hello, World!"
print(provider.encoding)  # "UTF-8"
print(provider.format)    # "U8S"

# Create a text asset with title, description, and roles
provider = BufferedTextAssetProvider.create(
    key="text:0",
    text_content="Mission report content...",
    encoding="UTF-8",
    title="Mission Report",
    description="Operational text",
    roles=["data", "annotation"],
)
```

asset_type: Asset category.

static create(key, text_content, encoding='UTF-8', title=None, description=None, roles=None, metadata=None)

Create a new in-memory text asset with the specified content and encoding.

Parameters:

key (str) – Unique identifier for this asset.
text_content (str) – The text content as a string.
encoding (str, optional) – Character encoding. Supported values are "UTF-8", "ASCII", "ECS", and "MTF". Defaults to "UTF-8".
title (str, optional) – Human-readable title for the asset.
description (str, optional) – Detailed description of the asset.
roles (list[str], optional) – Semantic roles describing the asset’s purpose. Defaults to ["data"] if not specified.
metadata (MetadataProvider, optional) – Additional metadata to attach to the asset.

Returns:

A new in-memory text asset.

Return type:

BufferedTextAssetProvider

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
    title="Sample Text",
    description="A sample text asset",
)
```

description: Detailed description of the asset.

encoding: The character encoding of the text content (e.g., "UTF-8", "ASCII").

format: The text format identifier (e.g., "U8S", "STA").

key: Unique identifier for this asset within the dataset.

media_type: MIME type of the asset content.

metadata: Asset-level metadata as a MetadataProvider.

raw_asset

Raw asset bytes as a BytesIO object.

The raw bytes have CR/LF line delimiters as required by NITF.
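Since raw_asset carries CR/LF line delimiters, code that compares those bytes with Python strings or writes them to a Unix-style file may want to normalise line endings. The helpers below are a standard-library sketch. The assumption that content supplied with plain "\n" endings appears as "\r\n" in raw_asset is inferred from the sentence above, not verified against the library.

```python
def to_crlf(text: str) -> str:
    """Normalise CR/LF, lone CR, and lone LF line endings to CR/LF."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


def from_crlf(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes and convert CR/LF line endings back to LF."""
    return raw.decode(encoding).replace("\r\n", "\n")


report = "Line one\nLine two\n"
raw = to_crlf(report).encode("utf-8")
print(raw)                        # b'Line one\r\nLine two\r\n'
print(from_crlf(raw) == report)   # True
```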

roles: Semantic roles for this asset.

text: The decoded text content as a string.

title: Human-readable title for the asset.

GraphicsAssetProvider

Returned by get_asset() for graphics assets.

class aws.osml.io.GraphicsAssetProvider

Bases: object

Python wrapper for GraphicsAssetProvider trait objects.

This class provides access to graphics asset properties and content. Graphics data is accessed through the raw_asset property.

asset_type: The asset category.

description: A detailed description of the asset.

key: The unique identifier for this asset within the dataset.

media_type: The MIME type of the asset content.

metadata: The asset-level MetadataProvider.

raw_asset

The raw graphics bytes as a BytesIO object.

Returns the complete vector graphics payload (typically CGM format) wrapped in a BytesIO stream. Read the returned object to access the raw bytes for further processing or rendering.

Returns: A seekable stream containing the raw graphics bytes.
Return type: io.BytesIO
Raises: IOError – If the graphics data cannot be read from the dataset.
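To hand a graphics payload to an external CGM viewer or converter, the stream can be copied to disk. This standard-library sketch uses a BytesIO stand-in for asset.raw_asset and an illustrative file name. The payload bytes are placeholders rather than a valid CGM file, and save_graphics is a hypothetical helper.

```python
import io
import os
import shutil
import tempfile


def save_graphics(stream: io.BytesIO, path: str) -> int:
    """Copy a graphics stream to `path` and return the number of bytes written."""
    stream.seek(0)  # start from the beginning even if the stream was read before
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out)
    return os.path.getsize(path)


# Stand-in for `asset.raw_asset`; the bytes are placeholders, not real CGM.
payload = io.BytesIO(b"placeholder graphics bytes")
out_path = os.path.join(tempfile.mkdtemp(), "graphic_0.cgm")
written = save_graphics(payload, out_path)
print(written)  # 26
```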

roles: The semantic roles for this asset.

title: A human-readable title for the asset.

DataAssetProvider

Returned by get_asset() for structured data assets such as XML and JSON. Exposes the payload bytes and its MIME type; parse them with a library such as xml.etree.ElementTree or json.

class aws.osml.io.DataAssetProvider

Bases: object

Provides access to structured data stored within a geospatial dataset.

Geospatial datasets can embed structured payloads alongside imagery — XML metadata (such as SICD/SIDD), JSON configuration, overflow TREs, and application-specific data. DataAssetProvider exposes the raw bytes through raw_asset and the mime_type property indicates the content format. Use DatasetReader.get_asset() to obtain an instance for a specific data asset in the dataset.

Example:

```python
import json
import xml.etree.ElementTree as ET
from aws.osml.io import IO

with IO.open(["sicd_image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="data"):
        data = dataset.get_asset(key)
        print(f"Data '{key}': mime_type={data.mime_type}")

        raw = data.raw_asset.read()
        if data.mime_type == "application/xml":
            root = ET.fromstring(raw)
            print(f"XML root tag: {root.tag}")
        elif data.mime_type == "application/json":
            obj = json.loads(raw)
            print(f"JSON keys: {list(obj.keys())}")
```
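SICD and SIDD XML documents are namespaced, so root.tag in the example above is printed in ElementTree's "{namespace}LocalName" form rather than as a bare element name. The standard-library sketch below splits that form apart, which helps when dispatching on the metadata type or version. The sample document and namespace URI are made up for illustration and are not real SICD content.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split an ElementTree tag like '{urn:example}Root' into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


# Made-up namespaced document standing in for a real SICD payload.
raw = b'<SICD xmlns="urn:example:sicd:1.0"><CollectionInfo/></SICD>'
root = ET.fromstring(raw)
print(root.tag)             # {urn:example:sicd:1.0}SICD
print(split_tag(root.tag))  # ('urn:example:sicd:1.0', 'SICD')
```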

asset_type: The asset category.

description: A detailed description of the asset.

key: The unique identifier for this asset within the dataset.

media_type: The MIME type of the asset content.

metadata: The asset-level MetadataProvider.

mime_type: The MIME type of the data content (e.g., "application/xml", "application/json").

raw_asset: The raw asset bytes as a BytesIO object.

roles: The semantic roles for this asset.

title: A human-readable title for the asset.