Asset Providers
Every asset in a dataset is represented by a typed provider. When you call DatasetReader.get_asset(), the library returns a specialised provider whose type matches the asset’s category:

| Asset category | Returned type |
|---|---|
| Image | ImageAssetProvider |
| Text | TextAssetProvider |
| Data | DataAssetProvider |
| Graphics | GraphicsAssetProvider |
All provider types share a common set of properties (key, title, description, media_type, roles, asset_type, metadata, and raw_asset), while each specialised type adds format-specific access methods (e.g. block-level image reads, decoded text content, and MIME-typed structured data).
AssetProvider
AssetProvider is the base class exposing the common metadata properties.
Use AssetProvider.from_bytes() to create an asset from raw bytes for
writing (for image assets, use BufferedImageAssetProvider
instead).
- class aws.osml.io.AssetProvider
Bases: object

Base class for all asset types within a dataset.

An AssetProvider represents a single named asset inside a geospatial dataset. Every dataset opened through IO contains one or more assets, each identified by a unique key and categorised as image, text, data, or graphics. This class exposes the common properties shared by all asset types (key, title, description, MIME type, roles, and raw bytes), while specialised provider types such as ImageAssetProvider, TextAssetProvider, DataAssetProvider, and GraphicsAssetProvider add format-specific access methods.

You typically obtain an AssetProvider by calling DatasetReader.get_asset(). To create an asset from raw bytes for writing, use the AssetProvider.from_bytes() static method.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    keys = dataset.get_asset_keys(asset_type="image")
    asset = dataset.get_asset(keys[0])
    print(asset.key, asset.title, asset.asset_type)
```

- asset_type
Category of this asset: image, text, data, or graphics.
- Type:
AssetType
- description
Detailed description of the asset.
- static from_bytes(key, data, asset_type, title=None, description=None, roles=None, media_type=None)
Create a new AssetProvider from raw bytes.
Use this factory when you need to build an asset in memory for writing to a dataset via DatasetWriter.add_asset().
- Parameters:
key (str) – Unique identifier for the asset.
data (bytes) – Raw bytes of the asset content.
asset_type (AssetType) – The type of asset (Image, Text, Graphics, Data).
title (str, optional) – Human-readable title. Defaults to key when omitted.
description (str, optional) – Detailed description. Defaults to empty.
roles (list[str], optional) – Semantic roles. Defaults to ["data"].
media_type (str, optional) – MIME type. Auto-detected from asset_type when omitted.
- Returns:
A new asset that can be passed to DatasetWriter.add_asset().
- Return type:
AssetProvider
- Raises:
ValueError – If asset_type is AssetType.Image; use BufferedImageAssetProvider instead.
Example:

```python
from aws.osml.io import AssetProvider, AssetType

asset = AssetProvider.from_bytes(
    key="my_text",
    data=b"Hello, world!",
    asset_type=AssetType.Text,
    title="Greeting",
)
```

- key
Unique identifier for this asset within the dataset.
- media_type
MIME type of the asset content (e.g. "application/vnd.nitf.image").
- metadata
The MetadataProvider for this asset’s metadata.
- raw_asset
The raw asset bytes as a BytesIO object.
- Returns:
The raw bytes of the asset content.
- Return type:
BytesIO
- Raises:
IOError – If the asset data cannot be read.

Example:

```python
data = asset.raw_asset.read()
```

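Because raw_asset is a file-like BytesIO object, the usual stream rules apply: read() advances the position, so a second read() on the same object returns b"" until you seek back. The sketch below uses a plain io.BytesIO as a stand-in for asset.raw_asset (whether each property access returns a fresh stream is not stated here, so treat that as an assumption) and shows getvalue() as a position-independent alternative.

```python
import io

# Stand-in for asset.raw_asset; in real code this object comes from the provider.
stream = io.BytesIO(b"Hello, world!")

first = stream.read()   # consumes the stream
second = stream.read()  # position is now at the end, so this is empty

stream.seek(0)          # rewind to read the same bytes again
again = stream.read()

# getvalue() returns the whole buffer regardless of the current position
whole = stream.getvalue()
```
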
- roles
Semantic roles assigned to this asset (e.g. ["data"], ["thumbnail"]).
- title
Human-readable title for the asset.
ImageAssetProvider
Returned by get_asset() for image assets.
Provides block-level (tiled) read access to pixel data.
- class aws.osml.io.ImageAssetProvider
Bases: object

Provides blocked (tiled) access to the pixel data of an image asset.

Large geospatial images are divided into a regular grid of fixed-size rectangular blocks. ImageAssetProvider lets you read individual blocks as NumPy arrays without loading the entire image into memory. Use DatasetReader.get_asset() to obtain an instance for a specific image asset in the dataset.

All arrays returned by get_block() use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(block, (1, 2, 0)).

Example:

```python
import numpy as np
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    image = dataset.get_asset("image:0")

    # Read an RGB composite from a multispectral image
    rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

    # Convert CHW to HWC for display with matplotlib or Pillow
    rgb_hwc = np.transpose(rgb, (1, 2, 0))

    # Iterate over all blocks, skipping masked regions
    grid_rows, grid_cols = image.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if image.has_block(row, col, resolution_level=0):
                block = image.get_block(row, col, resolution_level=0)
```

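The block grid has to cover the whole image, so when the image size is not a multiple of the block size the grid is rounded up. The sketch below assumes block_grid_size follows that ceiling-division convention (as NITF blocked images do) and enumerates blocks in the same row-major order as the nested loops in the example above; block_grid is a hypothetical helper, not part of aws.osml.io.

```python
def block_grid(num_rows, num_columns, block_height, block_width):
    """Hypothetical helper: (grid_rows, grid_cols), assuming ceiling division."""
    grid_rows = (num_rows + block_height - 1) // block_height
    grid_cols = (num_columns + block_width - 1) // block_width
    return grid_rows, grid_cols

# A 1000-row x 1500-column image with 256 x 256 blocks
grid_rows, grid_cols = block_grid(1000, 1500, 256, 256)
total_blocks = grid_rows * grid_cols

# Row-major visiting order: all columns of row 0, then row 1, and so on
order = [(r, c) for r in range(grid_rows) for c in range(grid_cols)]
```
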
- actual_bits_per_pixel
Actual bits per pixel.
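actual_bits_per_pixel can be smaller than num_bits_per_pixel, for example 11-bit sensor data stored in 16-bit containers. One common way to preview such data is to drop the low-order bits so the significant range fits in 8 bits. This is a minimal NumPy sketch that assumes the unused high bits are zero; to_uint8 is a hypothetical helper.

```python
import numpy as np

def to_uint8(block: np.ndarray, actual_bits: int) -> np.ndarray:
    """Scale integer pixels with `actual_bits` significant bits down to 8 bits."""
    shift = max(actual_bits - 8, 0)  # no shift needed for 8 bits or fewer
    return (block.astype(np.uint32) >> shift).astype(np.uint8)

# 11-bit values (0..2047) stored in a uint16 CHW block
block = np.array([[[0, 1024], [2047, 8]]], dtype=np.uint16)
preview = to_uint8(block, actual_bits=11)
```
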
- asset_type
Asset category.
- block_grid_size
Number of blocks in each dimension as (rows, cols).
- block_shape
Block dimensions as (bands, rows, columns) in CHW format.
- codec_configuration()
Return opaque codec configuration for independent tile decoding.
The returned dictionary contains format-specific key-value pairs needed to decode tiles independently. For JPEG 2000 images this includes a "main_header" key whose value is the raw codestream main header bytes.
Returns None if no configuration is needed (e.g. uncompressed images).
- description
Detailed description of the asset.
- get_block(block_row, block_col, resolution_level=0, bands=None)
Read a block of pixel data as a NumPy array.
Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.
- Parameters:
block_row (int) – Row index in the block grid (0-indexed).
block_col (int) – Column index in the block grid (0-indexed).
resolution_level (int, optional) – Level in the image pyramid; 0 is full resolution.
bands (list[int], optional) – Indices of the bands to read (0-indexed), in the order they should appear in the result. None reads all bands.
- Returns:
Pixel data with shape (bands, rows, cols).
- Return type:
numpy.ndarray
- Raises:
IndexError – If the block coordinates are out of bounds.
ValueError – If the resolution level is invalid.

Example:

```python
# All bands at full resolution
block = image.get_block(0, 0, resolution_level=0)

# Natural color from a multispectral image (R, G, B)
rgb = image.get_block(0, 0, resolution_level=0, bands=[3, 2, 1])

# Near-infrared band for vegetation analysis
nir = image.get_block(0, 0, resolution_level=0, bands=[4])
```

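Judging from the examples, bands behaves like NumPy fancy indexing on the band axis: it picks bands by 0-based index and returns them in the requested order, keeping the band axis even for a single band. The sketch below reproduces that on a synthetic CHW array so band ordering can be checked offline; the equivalence is an assumption, not a statement about how get_block is implemented.

```python
import numpy as np

# Synthetic 5-band CHW block; band i is filled with the value i
full = np.stack([np.full((4, 4), i, dtype=np.uint16) for i in range(5)])

# Equivalent of get_block(..., bands=[3, 2, 1]) under the stated assumption
rgb = full[[3, 2, 1]]

# A single-band selection keeps the band axis, like bands=[4]
nir = full[[4]]
```
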
- has_block(block_row, block_col, resolution_level=0)
Check whether a block exists at the given grid coordinates.
Some formats (notably NITF) support masked (sparse) images where not every position in the block grid contains data. Use this method to skip empty regions when iterating over blocks.
- Parameters:
block_row (int) – Row index in the block grid (0-indexed).
block_col (int) – Column index in the block grid (0-indexed).
resolution_level (int, optional) – Level in the image pyramid; 0 is full resolution.
- Returns:
True if the block contains data, False otherwise.
- Return type:
bool

Example:

```python
grid_rows, grid_cols = image.block_grid_size
for row in range(grid_rows):
    for col in range(grid_cols):
        if image.has_block(row, col, resolution_level=0):
            block = image.get_block(row, col, resolution_level=0)
```

- image_shape
Image dimensions as (bands, rows, columns) in CHW format.
- key
Unique identifier for this asset within the dataset.
- media_type
MIME type of the asset content.
- metadata
Asset-level metadata as a MetadataProvider.
- num_bands
Number of spectral bands.
- num_bits_per_pixel
Nominal bits per pixel.
- num_columns
Image width at full resolution in pixels.
- num_pixels_per_block_horizontal
Block width in pixels.
- num_pixels_per_block_vertical
Block height in pixels.
- num_resolution_levels
Number of resolution levels in the image pyramid.
- num_rows
Image height at full resolution in pixels.
- pad_pixel_value
Value used for padding incomplete edge blocks.
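When the image size is not a multiple of the block size, blocks in the last grid row and column extend past the image edge. Assuming get_block returns such edge blocks at full block size, with the area outside the image filled with pad_pixel_value, you can crop them back to real pixels as below. crop_to_image is a hypothetical helper; its num_rows and num_columns arguments correspond to the properties of the same name.

```python
import numpy as np

def crop_to_image(block, block_row, block_col, num_rows, num_columns):
    """Trim a full-size (bands, block_h, block_w) block to pixels inside the image."""
    _, block_h, block_w = block.shape
    valid_rows = min(block_h, num_rows - block_row * block_h)
    valid_cols = min(block_w, num_columns - block_col * block_w)
    return block[:, :valid_rows, :valid_cols]

# 300 x 400 image with 256 x 256 blocks: block (1, 1) holds 44 x 144 real pixels
pad_value = 0
edge = np.full((3, 256, 256), pad_value, dtype=np.uint8)
edge[:, :44, :144] = 200  # stand-in for real image content
cropped = crop_to_image(edge, 1, 1, num_rows=300, num_columns=400)
```
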
- pixel_value_type
Pixel data type.
- raw_asset
Raw asset bytes as a BytesIO object.
- roles
Semantic roles for this asset.
- tile_byte_ranges()
Return per-tile byte ranges relative to the source file.
Returns a dictionary mapping (block_row, block_col) tuples to a list of (byte_offset, byte_length) tuples, where offsets are relative to the start of the source file. Each list contains one entry per tile-part; for most formats this is a single-element list.
Returns None for providers without a backing file (e.g. in-memory images created with BufferedImageAssetProvider).
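A typical use of tile_byte_ranges() is fetching a tile's compressed bytes directly, for example with ranged reads against object storage. The sketch below reads one tile from a seekable file-like object and joins its tile-parts in list order (an assumption about how multi-part tiles should be reassembled). It uses an in-memory buffer and a hand-written ranges dictionary in place of a real file and provider; read_tile_bytes is a hypothetical helper, and decoding the result is out of scope.

```python
import io

def read_tile_bytes(f, ranges, block_row, block_col):
    """Concatenate all (offset, length) tile-parts for one block."""
    parts = []
    for offset, length in ranges[(block_row, block_col)]:
        f.seek(offset)
        chunk = f.read(length)
        if len(chunk) != length:
            raise IOError(f"short read at offset {offset}")
        parts.append(chunk)
    return b"".join(parts)

# Stand-ins: a fake "file" and a mapping shaped like tile_byte_ranges() output
source = io.BytesIO(b"HEADER" + b"AAAA" + b"BBBB" + b"CC")
ranges = {
    (0, 0): [(6, 4)],            # single tile-part
    (0, 1): [(10, 4), (14, 2)],  # two tile-parts
}
tile_00 = read_tile_bytes(source, ranges, 0, 0)
tile_01 = read_tile_bytes(source, ranges, 0, 1)
```

With a real dataset you would open the source file in binary mode ("rb") and pass the dictionary returned by image.tile_byte_ranges(), after checking that it is not None.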
- title
Human-readable title for the asset.
BufferedImageAssetProvider
An in-memory image asset provider. Use this to create image assets for writing
via add_asset().
- class aws.osml.io.BufferedImageAssetProvider
Bases: object

Constructs image assets entirely in memory.

Use BufferedImageAssetProvider to create synthetic test data, assemble mosaics, or build images from processed results. The provider implements the same interface as ImageAssetProvider, so in-memory images can be passed to any API that accepts an image asset, including DatasetWriter.

All pixel arrays use a channels-first (CHW) layout with shape (bands, rows, cols). This matches the convention used by PyTorch and many deep learning pipelines. To convert to the channels-last (HWC) layout expected by OpenCV or Pillow, use np.transpose(array, (1, 2, 0)). To convert from HWC back to CHW, use np.transpose(array, (2, 0, 1)).

You can populate the image all at once with set_full_image() or set individual blocks with set_block() for large or sparse images. Optionally attach a BufferedMetadataProvider to supply encoding hints such as compression type (IC) and interleave mode (IMODE).

Example:

```python
import numpy as np
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

# Create a 512x512 RGB image with 256x256 blocks
provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)

# Populate the full image at once
image_data = np.random.randint(0, 255, (3, 512, 512), dtype=np.uint8)
provider.set_full_image(image_data)

# Or set blocks individually for large/sparse images
for row in range(2):
    for col in range(2):
        block = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```

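Arrays coming from Pillow or OpenCV are usually HWC. np.transpose only returns a strided view, and set_block() is documented to raise ValueError for non-contiguous arrays, so it is worth copying the transposed data into a C-contiguous buffer before handing it over. A minimal sketch, with a synthetic HWC array standing in for a decoded image:

```python
import numpy as np

# Synthetic HWC array, shaped like np.asarray(pil_image) for a 256x256 RGB image
hwc = np.zeros((256, 256, 3), dtype=np.uint8)
hwc[..., 0] = 255  # red channel

# Transpose to CHW; this is a view with non-contiguous strides
chw_view = np.transpose(hwc, (2, 0, 1))

# Copy into a C-contiguous array before calling set_block()
chw = np.ascontiguousarray(chw_view)
```
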
- actual_bits_per_pixel
Actual bits per pixel.
- asset_type
Asset category.
- block_grid_size
Number of blocks in each dimension as (rows, cols).
- block_shape
Block dimensions as (bands, rows, columns) in CHW format.
- static create(key, num_columns=512, num_rows=512, num_bands=1, block_width=256, block_height=256, pixel_type=Ellipsis, num_bits_per_pixel=None, actual_bits_per_pixel=None, metadata=None, title=None, description=None)
Create a new in-memory image asset with the specified dimensions and pixel format.
- Parameters:
key (str) – Unique identifier for this asset.
num_columns (int, optional) – Image width in pixels.
num_rows (int, optional) – Image height in pixels.
num_bands (int, optional) – Number of spectral bands.
block_width (int, optional) – Block width in pixels.
block_height (int, optional) – Block height in pixels.
pixel_type (PixelType, optional) – Pixel data type.
num_bits_per_pixel (int, optional) – Nominal bits per pixel.
actual_bits_per_pixel (int, optional) – Actual bits per pixel; may be less than the nominal size. None uses the full range for the pixel type.
metadata (MetadataProvider, optional) – Encoding hints such as compression type (IC) and interleave mode (IMODE). See BufferedMetadataProvider.
title (str, optional) – Human-readable title. Auto-generated if omitted.
description (str, optional) – Detailed description. Auto-generated if omitted.
- Returns:
A new in-memory image asset.
- Return type:
BufferedImageAssetProvider

Example:

```python
from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata.set("IC", "NC")
metadata.set("IMODE", "B")

provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)
```

- description
Detailed description of the asset.
- static from_provider(provider, key=None, block_width=None, block_height=None, metadata=None)
Create a mutable copy of an existing ImageAssetProvider.
The returned BufferedImageAssetProvider lazily delegates get_block() calls to the source provider. Only blocks explicitly set via set_block() are stored in memory; all others are read on demand from the source. This enables copy-on-write semantics without loading the entire image into memory.
Because the returned provider holds a reference to the source, the source must remain open for the lifetime of the copy. If you need a fully independent snapshot, iterate over the blocks and call set_block() for each one.
- Parameters:
provider (ImageAssetProvider | BufferedImageAssetProvider) – The source image asset to delegate to. Accepts ImageAssetProvider, BufferedImageAssetProvider, or any duck-typed object with the required image provider interface.
key (str, optional) – New key for the copy. If None, the source key is reused.
block_width (int, optional) – Block width for the copy. If None, uses the source block width.
block_height (int, optional) – Block height for the copy. If None, uses the source block height.
metadata (MetadataProvider, optional) – Metadata for the copy. If None, copies the source metadata.
- Returns:
A new mutable provider backed by the source.
- Return type:
BufferedImageAssetProvider

Example:

```python
from aws.osml.io import IO, BufferedImageAssetProvider

with IO.open(["input.ntf"], "r") as reader:
    source = reader.get_asset("image:0")
    copy = BufferedImageAssetProvider.from_provider(source)
    # Override specific blocks or metadata, then write
```

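The independent-snapshot approach described above can be written as a loop over the block grid that uses only the documented block_grid_size, has_block(), get_block(), and set_block() members. copy_all_blocks is a hypothetical helper that assumes source and target share the same block layout and skips masked (absent) blocks; the _GridStub class is a hypothetical stand-in so the sketch runs without a dataset.

```python
import numpy as np

def copy_all_blocks(source, target, resolution_level=0):
    """Copy every present block from source into target via set_block().

    Assumes both providers share the same block grid and block size.
    Returns the number of blocks copied.
    """
    copied = 0
    grid_rows, grid_cols = source.block_grid_size
    for row in range(grid_rows):
        for col in range(grid_cols):
            if source.has_block(row, col, resolution_level=resolution_level):
                block = source.get_block(row, col, resolution_level=resolution_level)
                target.set_block(row, col, np.ascontiguousarray(block))
                copied += 1
    return copied


class _GridStub:
    """Hypothetical stand-in exposing only the members copy_all_blocks uses."""

    def __init__(self, grid, blocks=None):
        self.block_grid_size = grid
        self.blocks = dict(blocks or {})

    def has_block(self, row, col, resolution_level=0):
        return (row, col) in self.blocks

    def get_block(self, row, col, resolution_level=0):
        return self.blocks[(row, col)]

    def set_block(self, row, col, data):
        self.blocks[(row, col)] = data


# 2 x 2 grid with one masked (missing) block at (1, 1)
src = _GridStub((2, 2), {(r, c): np.full((1, 2, 2), r * 2 + c, dtype=np.uint8)
                         for r in range(2) for c in range(2) if (r, c) != (1, 1)})
dst = _GridStub((2, 2))
count = copy_all_blocks(src, dst)
```

With real providers, the target could be a copy created with from_provider() while the source dataset is still open; whether that makes the copy fully detached for masked blocks depends on the library's behaviour.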
- get_block(block_row, block_col, resolution_level=0, bands=None)
Read a block of pixel data as a NumPy array.
Returns an ndarray with shape (bands, rows, cols) in channels-first (CHW) format. The NumPy dtype is selected automatically based on the image’s pixel_value_type.
- Parameters:
block_row (int) – Row index in the block grid (0-indexed).
block_col (int) – Column index in the block grid (0-indexed).
resolution_level (int, optional) – Level in the image pyramid; 0 is full resolution.
bands (list[int], optional) – Indices of the bands to read (0-indexed). None reads all bands.
- Returns:
Pixel data with shape (bands, rows, cols).
- Return type:
numpy.ndarray
- Raises:
IndexError – If the block coordinates are out of bounds.
ValueError – If the resolution level is invalid.

Example:

```python
# Get full block with all bands
block = provider.get_block(0, 0, 0)
print(block.shape)  # (3, 256, 256) for RGB with 256x256 blocks

# Get only the red channel (band 0)
red_band = provider.get_block(0, 0, 0, bands=[0])
print(red_band.shape)  # (1, 256, 256)
```

- has_block(block_row, block_col, resolution_level=0)
Check whether a block exists at the given grid coordinates.
- image_shape
Image dimensions as (bands, rows, columns) in CHW format.
- irep
Image representation (MONO, RGB, MULTI, etc.).
- key
Unique identifier for this asset within the dataset.
- media_type
MIME type of the asset content.
- metadata
Asset-level metadata as a MetadataProvider.
- num_bands
Number of spectral bands.
- num_bits_per_pixel
Nominal bits per pixel.
- num_columns
Image width at full resolution in pixels.
- num_pixels_per_block_horizontal
Block width in pixels.
- num_pixels_per_block_vertical
Block height in pixels.
- num_resolution_levels
Number of resolution levels in the image pyramid.
- num_rows
Image height at full resolution in pixels.
- pad_pixel_value
Value used for padding incomplete edge blocks.
- pixel_value_type
Pixel data type.
- raw_asset
Raw asset bytes as a BytesIO object.
- roles
Semantic roles for this asset.
- set_block(block_row, block_col, data)
Set pixel data for a single block at the given grid coordinates.
The array must use channels-first (CHW) layout with shape (bands, block_rows, block_cols). For large or sparse images, setting blocks individually avoids loading the full image into memory.
- Parameters:
block_row (int) – Row index in the block grid (0-indexed).
block_col (int) – Column index in the block grid (0-indexed).
data (numpy.ndarray) – Pixel data with shape (bands, block_rows, block_cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.
- Raises:
ValueError – If the array is not contiguous or block coordinates are out of range.
TypeError – If the array dtype is not supported.

Example:

```python
import numpy as np

# Set blocks individually for a 1024x1024 image with 256x256 blocks
for row in range(4):
    for col in range(4):
        block = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)
```

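When an image arrives as one large CHW array (for instance, a memory-mapped file) but you prefer to populate the provider block by block, you can slice it into the block grid yourself. This sketch pads partial edge blocks to full block size with a fill value of your choice; that set_block() expects full-size edge blocks is an assumption, and iter_blocks is a hypothetical helper.

```python
import numpy as np

def iter_blocks(image, block_h, block_w, fill=0):
    """Yield (row, col, block) for a CHW image, padding edge blocks to full size."""
    bands, rows, cols = image.shape
    for r0 in range(0, rows, block_h):
        for c0 in range(0, cols, block_w):
            tile = image[:, r0:r0 + block_h, c0:c0 + block_w]
            # Fresh, C-contiguous buffer pre-filled with the pad value
            block = np.full((bands, block_h, block_w), fill, dtype=image.dtype)
            block[:, :tile.shape[1], :tile.shape[2]] = tile
            yield r0 // block_h, c0 // block_w, block

# 3-band 300 x 400 image with 256 x 256 blocks gives a 2 x 2 grid
image = np.ones((3, 300, 400), dtype=np.uint8)
blocks = list(iter_blocks(image, 256, 256))
# for row, col, block in blocks:
#     provider.set_block(row, col, block)
```
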
- set_full_image(data)
Set the full image data from a NumPy array.
The array must use channels-first (CHW) layout with shape (bands, rows, cols). The dimensions must match the values specified when the provider was created.
- Parameters:
data (numpy.ndarray) – Pixel data with shape (bands, rows, cols). Supported dtypes: uint8, int8, uint16, int16, uint32, int32, float32, float64. The dtype should match the provider’s pixel_type.
- Raises:
ValueError – If the array size does not match the image configuration (expected size = bands x rows x cols x bytes_per_pixel).
TypeError – If the array dtype is not supported.

Example:

```python
import numpy as np

# Create RGB image data in CHW format
image_data = np.zeros((3, 512, 512), dtype=np.uint8)
image_data[0, :, :] = 255  # Red channel
provider.set_full_image(image_data)
```

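If the size check compares bytes, as the formula above suggests, a dtype mismatch (say, float32 data for a UInt8 image) can fail even when the shape looks right. A minimal pre-flight check with NumPy; expected_nbytes is a hypothetical helper, and you would pass the NumPy dtype that corresponds to the provider's pixel_type.

```python
import numpy as np

def expected_nbytes(bands, rows, cols, dtype):
    """bands x rows x cols x bytes_per_pixel."""
    return bands * rows * cols * np.dtype(dtype).itemsize

data = np.zeros((3, 512, 512), dtype=np.uint8)
ok = data.nbytes == expected_nbytes(3, 512, 512, np.uint8)

wrong = data.astype(np.float32)  # same shape, 4 bytes per pixel
mismatch = wrong.nbytes == expected_nbytes(3, 512, 512, np.uint8)
```
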
- title
Human-readable title for the asset.
TextAssetProvider
Returned by get_asset() for text assets.
Provides text content and encoding information.
- class aws.osml.io.TextAssetProvider
Bases: object

Provides access to text content stored within a geospatial dataset.

Geospatial datasets can embed plain text alongside imagery: mission reports, processing notes, annotations, and similar human-readable data. TextAssetProvider exposes the decoded text through the text property, along with the character encoding and format metadata. Use DatasetReader.get_asset() to obtain an instance for a specific text asset in the dataset.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="text"):
        text_asset = dataset.get_asset(key)
        print(f"Encoding: {text_asset.encoding}")
        print(f"Format: {text_asset.format}")
        print(f"Text: {text_asset.text[:200]}...")
```

- asset_type
The asset category.
- description
A detailed description of the asset.
- encoding
The character encoding of the text content (e.g., "UTF-8", "ASCII").
- format
The text format identifier.
- key
The unique identifier for this asset within the dataset.
- media_type
The MIME type of the asset content.
- metadata
The asset-level MetadataProvider.
- raw_asset
The raw asset bytes as a BytesIO object.
- roles
The semantic roles for this asset.
- text
The decoded text content as a string.
- Raises:
ValueError – If the text cannot be decoded using the asset’s character encoding.
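If text raises ValueError, the raw bytes are still available through raw_asset, so you can fall back to a lenient decode for display or logging. This sketch assumes the encoding property holds "UTF-8" or "ASCII" (the two values that map directly onto Python codecs); other encodings such as ECS and MTF simply fall back to UTF-8 here, which is an assumption. decode_leniently is a hypothetical helper, and the byte strings stand in for raw_asset.read().

```python
def decode_leniently(raw: bytes, encoding: str) -> str:
    """Strict decode first; on failure, replace undecodable bytes with U+FFFD."""
    # Unknown encodings fall back to UTF-8 (an assumption for this sketch)
    codec = {"UTF-8": "utf-8", "ASCII": "ascii"}.get(encoding.upper(), "utf-8")
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return raw.decode(codec, errors="replace")

clean = decode_leniently(b"Mission report", "UTF-8")
damaged = decode_leniently(b"caf\xe9", "UTF-8")  # Latin-1 byte, invalid in UTF-8
```
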
- title
A human-readable title for the asset.
BufferedTextAssetProvider
An in-memory text asset provider. Use this to create text assets for writing
via add_asset().
- class aws.osml.io.BufferedTextAssetProvider
Bases: object

Constructs text assets entirely in memory.

Use BufferedTextAssetProvider to create text content for inclusion in a dataset: mission reports, annotations, processing notes, and similar human-readable data. The provider implements the same interface as TextAssetProvider, so in-memory text assets can be passed to any API that accepts a text asset, including DatasetWriter.

Supported character encodings are "UTF-8", "ASCII", "ECS", and "MTF". You can optionally attach a title, description, and semantic roles to describe the text asset’s purpose within the dataset.

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

# Create a UTF-8 text asset
provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
)

# Access text content and metadata
print(provider.text)      # "Hello, World!"
print(provider.encoding)  # "UTF-8"
print(provider.format)    # "U8S"

# Create a text asset with title, description, and roles
provider = BufferedTextAssetProvider.create(
    key="text:0",
    text_content="Mission report content...",
    encoding="UTF-8",
    title="Mission Report",
    description="Operational text",
    roles=["data", "annotation"],
)
```

- asset_type
Asset category.
- static create(key, text_content, encoding='UTF-8', title=None, description=None, roles=None, metadata=None)
Create a new in-memory text asset with the specified content and encoding.
- Parameters:
key (str) – Unique identifier for this asset.
text_content (str) – The text content as a string.
encoding (str, optional) – Character encoding. Supported values are "UTF-8", "ASCII", "ECS", and "MTF". Defaults to "UTF-8".
title (str, optional) – Human-readable title for the asset.
description (str, optional) – Detailed description of the asset.
roles (list[str], optional) – Semantic roles describing the asset’s purpose. Defaults to ["data"] if not specified.
metadata (MetadataProvider, optional) – Additional metadata to attach to the asset.
- Returns:
A new in-memory text asset.
- Return type:
BufferedTextAssetProvider

Example:

```python
from aws.osml.io import BufferedTextAssetProvider

provider = BufferedTextAssetProvider.create(
    key="text_0",
    text_content="Hello, World!",
    encoding="UTF-8",
    title="Sample Text",
    description="A sample text asset",
)
```

- description
Detailed description of the asset.
- encoding
The character encoding of the text content (e.g., "UTF-8", "ASCII").
- format
The text format identifier (e.g., "U8S", "STA").
- key
Unique identifier for this asset within the dataset.
- media_type
MIME type of the asset content.
- metadata
Asset-level metadata as a MetadataProvider.
- raw_asset
Raw asset bytes as a BytesIO object. The raw bytes have CR/LF line delimiters as required by NITF.
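Because the stored bytes use CR/LF line delimiters, raw_asset.read() will generally not equal the UTF-8 encoding of multi-line content written with plain "\n". The sketch below converts in both directions with plain Python strings standing in for the provider properties; to_crlf and from_crlf are hypothetical helpers, and the stored value only approximates what raw_asset would hold.

```python
def to_crlf(text: str) -> str:
    """Normalise any mix of CR/LF, CR, and LF line endings to CR/LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")

def from_crlf(data: bytes, codec: str = "utf-8") -> str:
    """Decode stored bytes and convert CR/LF line endings back to LF."""
    return data.decode(codec).replace("\r\n", "\n")

report = "Line one\nLine two\nLine three"
stored = to_crlf(report).encode("utf-8")  # roughly what raw_asset would hold
restored = from_crlf(stored)
```
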
- roles
Semantic roles for this asset.
- text
The decoded text content as a string.
- title
Human-readable title for the asset.
GraphicsAssetProvider
Returned by get_asset() for graphics assets.
- class aws.osml.io.GraphicsAssetProvider
Bases: object

Python wrapper for GraphicsAssetProvider trait objects.

This class provides access to graphics asset properties and content. Graphics data is accessed through the raw_asset property.
- asset_type
The asset category.
- description
A detailed description of the asset.
- key
The unique identifier for this asset within the dataset.
- media_type
The MIME type of the asset content.
- metadata
The asset-level MetadataProvider.
- raw_asset
The raw graphics bytes as a BytesIO object.
Returns the complete vector graphics payload (typically CGM format) wrapped in a BytesIO stream. Read the returned object to access the raw bytes for further processing or rendering.
- Returns:
A seekable stream containing the raw graphics bytes.
- Return type:
BytesIO
- Raises:
IOError – If the graphics data cannot be read from the dataset.
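Most CGM tooling works from files, so a common next step is saving the payload to disk. A minimal sketch, with an io.BytesIO of placeholder bytes standing in for graphic.raw_asset and a fresh temporary directory as the destination; the .cgm extension assumes the payload really is CGM, which the description above only says is typical.

```python
import io
import shutil
import tempfile
from pathlib import Path

# Stand-in for graphic.raw_asset (placeholder bytes, not a valid CGM file)
stream = io.BytesIO(b"\x00\x22example-graphic-bytes")

out_dir = Path(tempfile.mkdtemp())  # caller is responsible for cleanup
out_path = out_dir / "graphic_0.cgm"

stream.seek(0)
with out_path.open("wb") as f:
    shutil.copyfileobj(stream, f)  # copies in chunks instead of one large read
```
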
- roles
The semantic roles for this asset.
- title
A human-readable title for the asset.
DataAssetProvider
Returned by get_asset() for structured data assets. Exposes the raw payload and its MIME type so you can parse it as XML, JSON, or another format.
- class aws.osml.io.DataAssetProvider
Bases: object

Provides access to structured data stored within a geospatial dataset.

Geospatial datasets can embed structured payloads alongside imagery: XML metadata (such as SICD/SIDD), JSON configuration, overflow TREs, and application-specific data. DataAssetProvider exposes the raw bytes through raw_asset, and the mime_type property indicates the content format. Use DatasetReader.get_asset() to obtain an instance for a specific data asset in the dataset.

Example:

```python
import json
import xml.etree.ElementTree as ET
from aws.osml.io import IO

with IO.open(["sicd_image.ntf"], "r") as dataset:
    for key in dataset.get_asset_keys(asset_type="data"):
        data = dataset.get_asset(key)
        print(f"Data '{key}': mime_type={data.mime_type}")

        raw = data.raw_asset.read()
        if data.mime_type == "application/xml":
            root = ET.fromstring(raw)
            print(f"XML root tag: {root.tag}")
        elif data.mime_type == "application/json":
            obj = json.loads(raw)
            print(f"JSON keys: {list(obj.keys())}")
```

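The exact-match comparison in the example works for the two MIME types listed here. If a payload might carry parameters (such as "; charset=utf-8") or a structured-syntax suffix (such as "+xml"), a slightly more tolerant dispatcher helps. This is a standard-library sketch; parse_payload is a hypothetical helper, whether aws.osml.io ever reports such MIME variants is not stated in this documentation, and payloads of other types are returned unparsed.

```python
import json
import xml.etree.ElementTree as ET

def parse_payload(raw: bytes, mime_type: str):
    """Parse XML or JSON bytes based on a MIME type; return raw bytes otherwise."""
    base = mime_type.split(";", 1)[0].strip().lower()  # drop parameters
    if base in ("application/xml", "text/xml") or base.endswith("+xml"):
        return ET.fromstring(raw)
    if base == "application/json" or base.endswith("+json"):
        return json.loads(raw)
    return raw

xml_root = parse_payload(b"<SICD><CollectionInfo/></SICD>", "application/xml")
json_obj = parse_payload(b'{"version": 1}', "application/json; charset=utf-8")
other = parse_payload(b"\x01\x02", "application/octet-stream")
```

For XML from untrusted sources, consider a hardened parser rather than xml.etree.ElementTree.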
- asset_type
The asset category.
- description
A detailed description of the asset.
- key
The unique identifier for this asset within the dataset.
- media_type
The MIME type of the asset content.
- metadata
The asset-level MetadataProvider.
- mime_type
The MIME type of the data content (e.g., "application/xml", "application/json").
- raw_asset
The raw asset bytes as a BytesIO object.
- roles
The semantic roles for this asset.
- title
A human-readable title for the asset.