Metadata

MetadataProvider

class aws.osml.io.MetadataProvider

Bases: object

A read-only metadata provider implementing the collections.abc.Mapping protocol.

MetadataProvider exposes metadata as a dictionary-like object. You can access individual fields with bracket notation (metadata["IC"]), iterate keys, check membership with in, and convert to a plain dict via entries() or dict(metadata).

You typically obtain a MetadataProvider from a DatasetReader or an AssetProvider rather than creating one directly.

Example:

```python
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    meta = dataset.metadata
    ic = meta["IC"]                  # KeyError if missing
    ic = meta.get("IC", "NC")        # default if missing
    all_meta = meta.entries()        # full dict (single Rust call)
    security = meta.entries("FS")    # prefix filter
    for key in meta:
        print(key, meta[key])
```
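
The dictionary-like behaviour described above follows from the collections.abc.Mapping protocol. The sketch below illustrates that protocol with a hypothetical stand-in class, ReadOnlyFields, which is not part of aws.osml.io. It only shows which operations a read-only mapping supports and that item assignment is rejected; the real provider may differ in detail.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class ReadOnlyFields(Mapping):
    """Hypothetical stand-in for a read-only, dictionary-like provider."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self._fields = dict(fields)  # private copy of the input

    def __getitem__(self, key: str) -> Any:
        return self._fields[key]  # raises KeyError when the key is absent

    def __iter__(self) -> Iterator[str]:
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)


meta = ReadOnlyFields({"IC": "NC", "IMODE": "B"})

has_ic = "IC" in meta                 # membership comes from the Mapping mixin
fallback = meta.get("ICORDS", "N/A")  # get() also comes from the Mapping mixin
as_dict = dict(meta)                  # plain-dict copy

try:
    meta["IC"] = "C8"                 # a read-only Mapping has no __setitem__
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
```

Only three methods are written by hand here; membership tests, get(), keys(), values(), and items() are supplied by the Mapping base class.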

entries(name=None)

Return metadata as a Python dictionary, optionally filtered by key prefix.

When name is provided, only keys that start with that prefix are included. When omitted, all metadata fields are returned. This is the fast path for bulk export (single Rust→Python crossing).

Parameters:

name (str, optional) – Key prefix used to filter the returned fields.

Returns:

Metadata fields as a dictionary.

Return type:

dict
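
The prefix semantics of entries() can be sketched in plain Python. The helper name filter_by_prefix and the sample keys are illustrative assumptions; the real method performs the filtering on the Rust side, so this only models the result, not the implementation.

```python
from typing import Any, Dict, Optional


def filter_by_prefix(fields: Dict[str, Any], name: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of fields, keeping only keys that start with name."""
    if name is None:
        return dict(fields)  # no prefix given: return every field
    return {key: value for key, value in fields.items() if key.startswith(name)}


# Illustrative sample data, not taken from a real file.
fields = {"FSCLAS": "U", "FSCODE": "", "FTITLE": "Example", "IC": "NC"}

security = filter_by_prefix(fields, "FS")  # keeps FSCLAS and FSCODE only
everything = filter_by_prefix(fields)      # same content as fields, new dict
```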

get(key, default=None)

Retrieve the value for the given key, or a default if absent.

Parameters:
  • key (str) – The metadata field name.

  • default – Value to return if key is not present (default: None).

Returns:

The value for the key, or the default.

items()

Return a list of (key, value) tuples.

keys()

Return a list of all metadata keys.

raw

The underlying metadata in its original binary format, as a BytesIO object.

values()

Return a list of all metadata values.

BufferedMetadataProvider

class aws.osml.io.BufferedMetadataProvider(source=None)

Bases: MetadataProvider

A mutable metadata provider implementing collections.abc.MutableMapping.

BufferedMetadataProvider extends MetadataProvider with write operations, giving it full dictionary semantics. Use bracket notation to set any native Python type (str, int, float, list, dict, bool, None) and del to remove keys.

Example:

```python
from aws.osml.io import BufferedMetadataProvider

metadata = BufferedMetadataProvider()
metadata["IC"] = "NC"
metadata["IMODE"] = "B"
metadata["33550"] = [0.5, 0.5, 0.0]    # list
metadata["GeoProjectedCRS"] = 32618    # int

del metadata["IC"]
metadata.update({"NPPBH": "256", "NPPBV": "256"})
metadata.clear()
```

clear()

Remove all key-value pairs.

update(mapping)

Bulk update from a Python dict.
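
For readers unfamiliar with collections.abc.MutableMapping, the sketch below shows how update() and clear() fall out of the protocol once five core methods exist. EditableFields is a hypothetical stand-in, not the library class, and rejecting unsupported value types with TypeError is an assumption made for illustration; the source does not say how BufferedMetadataProvider reacts to other types.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator

# Value types listed in the class description (None is checked separately).
NATIVE_TYPES = (str, int, float, list, dict, bool)


class EditableFields(MutableMapping):
    """Hypothetical stand-in for a mutable, dictionary-like provider."""

    def __init__(self) -> None:
        self._fields: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        return self._fields[key]

    def __setitem__(self, key: str, value: Any) -> None:
        # Assumption: values outside the native types are rejected.
        if value is not None and not isinstance(value, NATIVE_TYPES):
            raise TypeError(f"unsupported metadata value type: {type(value).__name__}")
        self._fields[key] = value

    def __delitem__(self, key: str) -> None:
        del self._fields[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)


metadata = EditableFields()
metadata["IC"] = "NC"
metadata.update({"NPPBH": "256", "NPPBV": "256"})  # inherited from MutableMapping
del metadata["IC"]
remaining = sorted(metadata)

try:
    metadata["bad"] = object()  # not one of the native types
    rejected = False
except TypeError:
    rejected = True

metadata.clear()  # also inherited from MutableMapping
```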

TIFF Tag Dictionary Key Format

For TIFF/GeoTIFF files, metadata uses numeric tag ID strings as keys. Each key is the string representation of the TIFF tag number as defined in the TIFF 6.0 specification. For example, ImageWidth (tag 256) appears under the key "256", and Compression (tag 259) appears under "259".

This applies to all IFD-level tags, including GeoTIFF tags such as GeoKeyDirectory (tag 34735), ModelPixelScale (tag 33550), and private-use tags (32768+). GeoKey directory contents are not decoded into separate entries; the raw TIFF tags are stored as-is under their numeric keys.
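
Because the GeoKey directory is left undecoded, callers who need individual GeoKeys have to parse tag 34735 themselves. The sketch below follows the layout given in the GeoTIFF specification: a four-value header (KeyDirectoryVersion, KeyRevision, MinorRevision, NumberOfKeys) followed by four values per key (KeyID, TIFFTagLocation, Count, Value_Offset). It assumes the value stored under "34735" is a flat sequence of integers, which the source does not confirm, and decode_geokey_directory is a hypothetical helper rather than a library function.

```python
from typing import Dict, Sequence, Tuple, Union

# Either an inline value, or a (tag, count, offset) reference into another tag.
GeoKeyValue = Union[int, Tuple[int, int, int]]


def decode_geokey_directory(raw: Sequence[int]) -> Dict[int, GeoKeyValue]:
    """Decode a flat GeoKeyDirectory array into {KeyID: value or reference}."""
    if len(raw) < 4:
        raise ValueError("GeoKeyDirectory needs a 4-value header")
    number_of_keys = raw[3]
    if len(raw) < 4 + 4 * number_of_keys:
        raise ValueError("GeoKeyDirectory is shorter than its header claims")

    decoded: Dict[int, GeoKeyValue] = {}
    for index in range(number_of_keys):
        start = 4 + 4 * index
        key_id, location, count, value_offset = raw[start:start + 4]
        if location == 0:
            decoded[key_id] = value_offset  # value is stored inline
        else:
            # Value lives in another tag (e.g. 34736 or 34737); keep the reference.
            decoded[key_id] = (location, count, value_offset)
    return decoded


# Illustrative directory with four keys, not taken from a real file.
raw_directory = [
    1, 1, 0, 4,           # header: version 1, revision 1.0, four keys
    1024, 0, 1, 1,        # GTModelTypeGeoKey = 1 (projected)
    1025, 0, 1, 1,        # GTRasterTypeGeoKey = 1 (pixel is area)
    1026, 34737, 7, 0,    # GTCitationGeoKey: 7 chars at offset 0 of tag 34737
    3072, 0, 1, 32618,    # ProjectedCSTypeGeoKey = EPSG:32618
]
geokeys = decode_geokey_directory(raw_directory)
```

Resolving the references into tags 34736 and 34737 is left out to keep the sketch short.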

Dataset-level entries that are not TIFF tags (e.g. "ByteOrder", "NumberOfDirectories") retain descriptive string keys.

from aws.osml.io import IO

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata

    width = meta["256"]          # ImageWidth
    height = meta["257"]         # ImageLength
    compression = meta["259"]    # Compression
    byte_order = meta["ByteOrder"]  # dataset-level, not a tag

    # Prefix filtering works on numeric keys
    tags_starting_with_3 = dataset.metadata.entries("3")
    # Returns keys like "322" (TileWidth), "339" (SampleFormat),
    # "34735" (GeoKeyDirectory), etc.

For convenient name-based access, use the TagNameResolver helper described below.
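
Since numeric tag keys and descriptive dataset-level keys share one dictionary, it can help to separate them before further processing. The following is a minimal sketch on a plain dict; split_tag_keys is a hypothetical helper and the sample values are illustrative, not taken from a real file.

```python
from typing import Any, Dict, Tuple


def split_tag_keys(entries: Dict[str, Any]) -> Tuple[Dict[int, Any], Dict[str, Any]]:
    """Split entries into (numeric TIFF tags, descriptive dataset-level keys)."""
    tags: Dict[int, Any] = {}
    dataset_level: Dict[str, Any] = {}
    for key, value in entries.items():
        if key.isdecimal():
            tags[int(key)] = value      # "256" becomes tag number 256
        else:
            dataset_level[key] = value  # e.g. "ByteOrder"
    return tags, dataset_level


entries = {
    "256": 1024,
    "257": 768,
    "33550": [0.5, 0.5, 0.0],
    "ByteOrder": "II",
    "NumberOfDirectories": 1,
}
tags, dataset_level = split_tag_keys(entries)

# Tag numbers of 32768 and above fall in the private-use range.
private_tags = {number for number in tags if number >= 32768}
```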

TagNameResolver

class aws.osml.io.tiff.utils.TagNameResolver(tag_dict, custom_mapping=None)

Bases: object

Resolve TIFF tag names to numeric IDs for convenient metadata access.

Wraps a Tag_Dictionary (from MetadataProvider.entries()) and provides lookup by human-readable tag name via a configurable name-to-number mapping.

Keys that are not present in the mapping are passed through unchanged, mirroring the behaviour of __iter__(), which yields unmapped keys under their original key.

Example:

from aws.osml.io.tiff.utils import TagNameResolver

# reader is an already-open DatasetReader
meta = reader.metadata.entries()
resolver = TagNameResolver(meta)
width = resolver["ImageWidth"]       # looks up key "256"
geokeys = resolver.by_number(34735)  # direct numeric access (GeoKeyDirectory)
comp = resolver.get("Compression")   # returns None if absent

VALUE_MAPPING: Dict[int, Dict[str, int]] = {259: {'ccittfax3': 3, 'ccittfax4': 4, 'ccittrle': 2, 'deflate': 8, 'jpeg': 7, 'lzw': 5, 'none': 1, 'ojpeg': 6, 'packbits': 32773}, 262: {'mask': 4, 'minisblack': 1, 'miniswhite': 0, 'palette': 3, 'rgb': 2, 'ycbcr': 6}, 274: {'bottomleft': 4, 'bottomright': 3, 'leftbottom': 8, 'lefttop': 5, 'rightbottom': 7, 'righttop': 6, 'topleft': 1, 'topright': 2}, 284: {'chunky': 1, 'planar': 2}, 317: {'floatingpoint': 3, 'horizontal': 2, 'none': 1}, 339: {'float': 3, 'int': 2, 'uint': 1, 'void': 4}}

DEFAULT_MAPPING: Dict[str, int] = {'Artist': 315, 'BitsPerSample': 258, 'CellLength': 265, 'CellWidth': 264, 'ColorMap': 320, 'Compression': 259, 'Copyright': 33432, 'DateTime': 306, 'DocumentName': 269, 'DotRange': 336, 'ExtraSamples': 338, 'FillOrder': 266, 'FreeByteCounts': 289, 'FreeOffsets': 288, 'GDALMetadata': 42112, 'GDALNoData': 42113, 'GeoAsciiParams': 34737, 'GeoDoubleParams': 34736, 'GeoKeyDirectory': 34735, 'GrayResponseCurve': 291, 'GrayResponseUnit': 290, 'HalftoneHints': 321, 'HostComputer': 316, 'ImageDescription': 270, 'ImageLength': 257, 'ImageWidth': 256, 'InkNames': 333, 'InkSet': 332, 'JPEGTables': 347, 'Make': 271, 'MaxSampleValue': 281, 'MinSampleValue': 280, 'Model': 272, 'ModelPixelScale': 33550, 'ModelTiepoint': 33922, 'ModelTransformation': 34264, 'NewSubfileType': 254, 'NumberOfInks': 334, 'Orientation': 274, 'PageName': 285, 'PageNumber': 297, 'PhotometricInterpretation': 262, 'PlanarConfiguration': 284, 'Predictor': 317, 'PrimaryChromaticities': 319, 'ResolutionUnit': 296, 'RowsPerStrip': 278, 'SMaxSampleValue': 341, 'SMinSampleValue': 340, 'SampleFormat': 339, 'SamplesPerPixel': 277, 'Software': 305, 'StripByteCounts': 279, 'StripOffsets': 273, 'SubIFDs': 330, 'SubfileType': 255, 'TargetPrinter': 337, 'Threshholding': 263, 'TileByteCounts': 325, 'TileLength': 323, 'TileOffsets': 324, 'TileWidth': 322, 'WhitePoint': 318, 'XResolution': 282, 'YResolution': 283}

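
The source does not describe how VALUE_MAPPING is consumed. One plausible use, assumed here purely for illustration, is translating a raw numeric tag value into its symbolic name. The sketch copies a subset of the table into a local constant so it runs without the library; value_name is a hypothetical helper.

```python
from typing import Dict, Optional

# Subset of the VALUE_MAPPING table above (tags 259 and 339 only).
VALUE_MAPPING: Dict[int, Dict[str, int]] = {
    259: {"none": 1, "lzw": 5, "jpeg": 7, "deflate": 8, "packbits": 32773},
    339: {"uint": 1, "int": 2, "float": 3, "void": 4},
}


def value_name(tag_number: int, value: int) -> Optional[str]:
    """Return the symbolic name for a numeric tag value, or None if unknown."""
    for name, code in VALUE_MAPPING.get(tag_number, {}).items():
        if code == value:
            return name
    return None


compression = value_name(259, 5)    # Compression code 5
sample_format = value_name(339, 3)  # SampleFormat code 3
unknown = value_name(259, 9999)     # code not in the table
```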
__getitem__(name)

Look up a tag value by human-readable name.

If name is in the mapping it is resolved to the corresponding numeric key. Otherwise name is used directly as the dictionary key, allowing unmapped keys to pass through.

Raises:

KeyError – If the resolved key is not present in the underlying dictionary.

Return type:

Any
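
The resolution rule described for __getitem__ can be sketched with two small functions over plain dicts. resolve_key and lookup are hypothetical names, the mapping is a subset of DEFAULT_MAPPING, and the tag values are illustrative; this models the documented behaviour rather than reproducing the library's code.

```python
from typing import Any, Dict

# Subset of DEFAULT_MAPPING shown above.
NAME_TO_NUMBER: Dict[str, int] = {"ImageWidth": 256, "ImageLength": 257, "Compression": 259}


def resolve_key(name: str, mapping: Dict[str, int]) -> str:
    """Translate a tag name to its numeric string key; pass unmapped names through."""
    if name in mapping:
        return str(mapping[name])
    return name


def lookup(tag_dict: Dict[str, Any], name: str, mapping: Dict[str, int]) -> Any:
    """Fetch a value by name; KeyError if the resolved key is absent."""
    return tag_dict[resolve_key(name, mapping)]


tag_dict = {"256": 1024, "259": 5, "ByteOrder": "II"}

width = lookup(tag_dict, "ImageWidth", NAME_TO_NUMBER)     # resolved to "256"
byte_order = lookup(tag_dict, "ByteOrder", NAME_TO_NUMBER) # unmapped, passed through
by_raw_key = lookup(tag_dict, "256", NAME_TO_NUMBER)       # raw key also passes through

try:
    lookup(tag_dict, "ImageLength", NAME_TO_NUMBER)        # "257" is not in tag_dict
    missing_raises = False
except KeyError:
    missing_raises = True
```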

get(name, default=None)

Look up a tag value by name, returning default if not found.

Return type:

Any

by_number(tag_number)

Retrieve a tag by its numeric key directly.

Raises:

KeyError – If the tag number is not present in the dictionary.

Return type:

Any

__iter__()

Iterate over all (resolved_name, value) pairs.

Keys are resolved to human-readable tag names when a mapping exists. Tags without a known name are yielded with their numeric string key.

Return type:

Iterator[Tuple[str, Any]]
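
Iteration applies the mapping in reverse, from numeric key back to name. A minimal sketch of that idea, using a hypothetical iter_resolved function and illustrative data, and assuming a one-to-one mapping between names and numbers:

```python
from typing import Any, Dict, Iterator, Tuple

# Subset of DEFAULT_MAPPING shown above.
NAME_TO_NUMBER: Dict[str, int] = {"ImageWidth": 256, "Compression": 259}


def iter_resolved(tag_dict: Dict[str, Any], mapping: Dict[str, int]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, value) pairs, falling back to the original key when unmapped."""
    number_to_name = {str(number): name for name, number in mapping.items()}
    for key, value in tag_dict.items():
        yield number_to_name.get(key, key), value


tag_dict = {"256": 1024, "259": 5, "65000": "vendor", "ByteOrder": "II"}
pairs = list(iter_resolved(tag_dict, NAME_TO_NUMBER))
```

Tag 65000 has no entry in the mapping, so it is yielded under its numeric string key, and "ByteOrder" is yielded unchanged.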

__len__()

Return the number of entries in the underlying Tag_Dictionary.

Return type:

int

__contains__(name)

Check if a tag name is present in the metadata.

Returns True when the resolved key exists in the underlying dictionary. For mapped names this checks the numeric key; for unmapped names the raw key is checked directly.

Return type:

bool

set(name, value)

Set a tag value by name.

Convenience wrapper around __setitem__.

Return type:

None

The TagNameResolver wraps a TIFF Tag_Dictionary and translates human-readable tag names to their numeric keys. It ships with a default mapping covering baseline TIFF 6.0 tags, GeoTIFF tags, and common GDAL tags.

from aws.osml.io import IO
from aws.osml.io.tiff.utils import TagNameResolver

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata.entries()
    tags = TagNameResolver(meta)

    # Name-based lookup
    width = tags["ImageWidth"]        # equivalent to meta["256"]
    scale = tags["ModelPixelScale"]   # equivalent to meta["33550"]

    # Safe access with default
    nodata = tags.get("GDALNoData", "nan")

    # Direct numeric access
    raw_geokeys = tags.by_number(34735)

    # Check presence
    if "Compression" in tags:
        print(tags["Compression"])

    # Custom mapping for vendor-specific tags
    custom = TagNameResolver(meta, custom_mapping={
        "MyVendorTag": 65000,
    })
    vendor_val = custom["MyVendorTag"]