Metadata¶

The Simple Path¶

For a quick look at image properties without reading any pixels, iminfo gives you dimensions, band count, pixel type, and block layout in one call:

from aws.osml.io import iminfo

info = iminfo("image.ntf")
print(f"{info.width}x{info.height}, {info.bands} bands, {info.dtype}")
print(f"Block size: {info.block_size}")
print(f"Resolution levels: {info.num_resolution_levels}")

iminfo also includes the full format-specific metadata dictionary for the image segment, so you can inspect compression, TREs, TIFF tags, and other fields without dropping down to the low-level API:

# NITF: subheader fields and TREs
info = iminfo("image.ntf")
print(info.metadata["IC"])          # "C8" (JPEG 2000)
print(info.metadata["IGEOLO"])      # 60-char geographic location string

# TREs are nested dicts
if "GEOLOB" in info.metadata:
    print(info.metadata["GEOLOB"]["ARV"])

# GeoTIFF: IFD tags keyed by numeric tag ID
info = iminfo("image.tif")
print(info.metadata["259"])         # Compression tag value
print(info.metadata["33550"])       # ModelPixelScale

The metadata dict is a snapshot — a plain Python dictionary captured when iminfo is called. It is not a live reference to the file.

When you need more control — prefix filtering, dataset-level metadata, or write-side metadata — the full MetadataProvider interface described below gives you access to everything in the file.

The MetadataProvider Interface¶

All assets and datasets expose metadata through the MetadataProvider interface, regardless of the underlying file format. MetadataProvider implements the standard Python collections.abc.Mapping protocol, so you access metadata fields the same way you access items in a dictionary:

from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    # Dataset-level metadata — dict-style access
    dataset.metadata["FTITLE"]              # KeyError if missing
    dataset.metadata.get("FTITLE")          # None if missing
    "FTITLE" in dataset.metadata            # membership test
    len(dataset.metadata)                   # number of fields

    # Bulk export
    all_meta = dataset.metadata.entries()   # full dict (single call, fast path)
    filtered = dataset.metadata.entries("FS")  # keys starting with "FS"

    # Asset-level metadata
    image = dataset.get_asset("image:0")
    image_meta = image.metadata.entries()

When writing, use BufferedMetadataProvider to build metadata. It implements collections.abc.MutableMapping, so you write fields with dictionary syntax:

from aws.osml.io import BufferedMetadataProvider

meta = BufferedMetadataProvider()
meta["FTITLE"] = "My File Title"

The dictionary keys and value types are format-specific. A NITF file uses field names like FTITLE and ISCLAS; a GeoTIFF uses numeric tag IDs like "256" and "33550". There is no translation layer — you work directly with the native field names from whatever format you opened.

The rest of this page covers each format’s metadata conventions in detail.

NITF / NSIF Metadata¶

NITF files carry metadata in fixed-width ASCII header fields, security classification blocks, and Tagged Record Extensions (TREs). The library exposes all of these through the Mapping interface using the standard NITF field names.

Reading NITF Metadata¶

Header and Subheader Fields¶

Most NITF fields are ASCII strings — even numeric values like row counts and compression ratios. A few TREs use binary integer fields, which come through as Python int directly.

from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    # File header fields — dict-style access
    title = dataset.metadata["FTITLE"]
    classification = dataset.metadata["FSCLAS"]       # one character: "T", "S", "C", "R", or "U"

    # Image subheader fields
    image = dataset.get_asset("image:0")
    meta = image.metadata

    image_id = meta["IID1"]              # "IMG_00001"
    compression = meta["IC"]             # "C8"
    date_time = meta["IDATIM"]           # "20231215103045"

    # Numeric fields are ASCII strings — cast as needed
    num_rows = int(meta["NROWS"])         # 2048
    num_cols = int(meta["NCOLS"])         # 2048
    num_bands = int(meta["NBANDS"])       # 3

    # Coordinate strings — NITF packs 4 corners into a single field
    if "IGEOLO" in meta:
        geo = meta["IGEOLO"]              # 60-char geographic location string

    # Safe access for conditional fields
    comrat = meta.get("COMRAT")           # None if IC is "NC" or "NM"
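
The IGEOLO string is easier to use once it is split into corners. The sketch below assumes ICORDS is "G" (geographic), where each 15-character corner is laid out as ddmmssXdddmmssY; other ICORDS values use different layouts and are not handled. The helper names are hypothetical and are not part of the library.

```python
def dms_to_degrees(digits: str, hemisphere: str) -> float:
    """Convert a ddmmss or dddmmss digit string plus hemisphere letter to decimal degrees."""
    seconds = int(digits[-2:])
    minutes = int(digits[-4:-2])
    degrees = int(digits[:-4])
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def parse_igeolo_geographic(igeolo: str) -> list:
    """Split a 60-character IGEOLO (ICORDS == "G") into four (lat, lon) tuples."""
    if len(igeolo) != 60:
        raise ValueError(f"expected 60 characters, got {len(igeolo)}")
    corners = []
    for start in range(0, 60, 15):
        corner = igeolo[start:start + 15]
        lat = dms_to_degrees(corner[0:6], corner[6])     # ddmmss + N/S
        lon = dms_to_degrees(corner[7:14], corner[14])   # dddmmss + E/W
        corners.append((lat, lon))
    return corners


# Made-up footprint, used only to exercise the parser
sample = "385300N0770200W385300N0765900W385000N0765900W385000N0770200W"
corners = parse_igeolo_geographic(sample)
for lat, lon in corners:
    print(f"{lat:.4f}, {lon:.4f}")
```

In real code you would pass meta["IGEOLO"] after checking that meta["ICORDS"] is "G".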

TRE Fields as Nested Dictionaries¶

TRE (Tagged Record Extension) fields are grouped under their CETAG as nested dictionaries. Each TRE with a known definition in the StructureRegistry appears as a top-level key mapped to a dict of its fields:

# Access TRE fields through nested dictionaries
geolob = meta["GEOLOB"]              # dict
arv = geolob["ARV"]                  # "000360000"
brv = geolob["BRV"]                  # "000360000"

# Or access in one step
arv = meta["GEOLOB"]["ARV"]

# TREs with repeated fields contain arrays
j2klra = meta["J2KLRA"]              # dict
layers = j2klra["LAYERS"]            # list of dicts
first_layer = layers[0]              # {"LAYER_ID": "000", "BITRATE": "0.031250"}

Unknown TREs (those without a definition in the registry) appear with their raw data preserved:

# Unknown TRE — raw hex data and byte length
unknown = meta["UNKNWN"]             # {"_raw": "0102030405", "_length": 5}
raw_hex = unknown["_raw"]
byte_count = unknown["_length"]

Overflow TREs stored in data extension segments are resolved automatically — you don’t need to chase them across segments.
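
If you need the payload of an unknown TRE as bytes, the hex string can be decoded with the standard library. The sketch below works on a plain dict shaped like the examples above (a stand-in for image.metadata.entries()); the helper name unknown_tre_payloads is hypothetical.

```python
# Stand-in for image.metadata.entries(); the shapes follow the examples above.
meta = {
    "IC": "C8",
    "GEOLOB": {"ARV": "000360000", "BRV": "000360000"},
    "UNKNWN": {"_raw": "0102030405", "_length": 5},
}


def unknown_tre_payloads(entries: dict) -> dict:
    """Return {CETAG: payload bytes} for every entry shaped like an unknown TRE."""
    payloads = {}
    for key, value in entries.items():
        if isinstance(value, dict) and "_raw" in value and "_length" in value:
            payload = bytes.fromhex(value["_raw"])
            # Sanity check: the decoded size should match the recorded length
            if len(payload) != value["_length"]:
                raise ValueError(
                    f"{key}: decoded {len(payload)} bytes, expected {value['_length']}"
                )
            payloads[key] = payload
    return payloads


payloads = unknown_tre_payloads(meta)
print(payloads)
```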

Repeated Fields as Arrays¶

Repeated fields in the image subheader (like band info) appear as Python lists instead of individual indexed entries:

# Band info is a list of dicts, one per band
bands = meta["BAND_INFO"]            # list of dicts
for i, band in enumerate(bands):
    print(f"Band {i}: IREPBAND={band['IREPBAND']}, NLUTS={band['NLUTS']}")

# Access a specific band directly
first_band = meta["BAND_INFO"][0]
irepband = first_band["IREPBAND"]          # "R"

Prefix Filtering¶

Use entries(prefix) to retrieve a subset of metadata. For subheader fields, the prefix matches field names. For TREs, the prefix matches the CETAG:

# Get all fields starting with "FS" (file security fields)
# Returns: FSCLAS, FSCLSY, FSCODE, FSCTLH, FSREL, FSDCTP, ...
security = dataset.metadata.entries("FS")

# Get a specific TRE by CETAG
geolob_only = image.metadata.entries("GEOLOB")
# Returns: {"GEOLOB": {"ARV": "...", "BRV": "...", ...}}

Value Types Summary¶

The Python types you get back depend on how the field is defined in the underlying structure definition:

Definition	Python type	Example
`type: str` (most fields)	`str`	`"U"`, `"00002048"`
Binary integers (`u1`, `u2`, `u4`, `u8`)	`int`	`42`
Repeated fields (band info, etc.)	`list` of `dict`	`[{"IREPBAND": "R", ...}]`
Known TREs	`dict` of `dict`	`{"GEOLOB": {"ARV": "..."}}`
Unknown TREs	`dict` with `_raw`, `_length`	`{"_raw": "0102", "_length": 2}`
Binary byte fields	`str` (hex-encoded)	`"ff8000"`
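
Because values can be strings, integers, lists, or nested dicts, a small recursive walk is a convenient way to inspect everything at once. The following is a sketch over a plain dict (for example the result of entries()); the flatten helper and the dotted path notation are illustrative choices, not something the library provides.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf) pairs from nested dicts and lists of metadata values."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Sample data shaped like the rows in the table above
meta = {
    "NROWS": "00002048",
    "BAND_INFO": [{"IREPBAND": "R", "NLUTS": "0"}, {"IREPBAND": "G", "NLUTS": "0"}],
    "GEOLOB": {"ARV": "000360000"},
}

flat = dict(flatten(meta))
for path, leaf in flat.items():
    print(f"{path} = {leaf!r} ({type(leaf).__name__})")
```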

Writing NITF Metadata¶

When writing NITF files, you control header fields by setting metadata on the writer and on individual assets. The writer reads user-settable fields from the metadata provider and falls back to sensible defaults when a field is absent.

File Header Fields¶

Set file-level metadata using BufferedMetadataProvider and assign it to the writer’s metadata property:

from aws.osml.io import IO, BufferedMetadataProvider

file_meta = BufferedMetadataProvider()
file_meta["FTITLE"] = "Reconnaissance Mission 2026-03-15"
file_meta["ONAME"] = "Sensor Operator"
file_meta["OPHONE"] = "555-0100"
file_meta["FDT"] = "20260315120000"
file_meta["OSTAID"] = "STATION1"
file_meta["CLEVEL"] = "05"

# Security classification fields use the FS prefix
file_meta["FSCLAS"] = "S"
file_meta["FSCLSY"] = "US"
file_meta["FSCODE"] = "SECRET"
file_meta["FSREL"] = "USA GBR"

# FBKGC is a 3-byte binary field (RGB background color)
# Set it as a list of integers
file_meta["FBKGC"] = [255, 255, 255]

writer = IO.open(["output.ntf"], "w", "nitf")
writer.metadata = file_meta
# ... add assets and close

Fields you don’t set keep their defaults — FSCLAS defaults to "U", OSTAID defaults to "OSML_IO", CLEVEL defaults to "03", and text fields default to blank.

Image Subheader Fields¶

Image assets read several fields from metadata (IID1, IDATIM, TGTID, IID2, ISORCE). The security classification block and category fields are also metadata-driven:

image_meta = BufferedMetadataProvider()

# Identification fields
image_meta["IID1"] = "IMG_00001"
image_meta["IDATIM"] = "20260315103045"
image_meta["ISORCE"] = "Satellite XYZ"

# Security fields use the IS prefix
image_meta["ISCLAS"] = "S"
image_meta["ISCLSY"] = "US"
image_meta["ISREL"] = "USA"

# Image category and coordinate representation
image_meta["ICAT"] = "SAR"
image_meta["ICORDS"] = "G"

Fields derived from the image data itself — NROWS, NCOLS, PVTYPE, IREP, NBPP, ABPP, NBANDS, and blocking parameters — are always computed from the ImageAssetProvider and cannot be overridden through metadata.

Text, Graphic, and DES Subheader Fields¶

Text, graphic, and data extension segment subheaders follow the same pattern. Set fields on the asset’s metadata provider before adding it to the writer:

# Text asset metadata (TS prefix for security fields)
text_meta = BufferedMetadataProvider()
text_meta["TXTDT"] = "20260315120000"
text_meta["TXTFMT"] = "STA"
text_meta["TSCLAS"] = "C"

# Graphic asset metadata (SS prefix for security fields)
graphic_meta = BufferedMetadataProvider()
graphic_meta["SFMT"] = "C"
graphic_meta["SDLVL"] = "002"
graphic_meta["SLOC"] = "0050000100"
graphic_meta["SSCLAS"] = "U"

# DES metadata (DES prefix for security fields, but DECLAS for classification)
des_meta = BufferedMetadataProvider()
des_meta["DESVER"] = "02"
des_meta["DECLAS"] = "U"
des_meta["DESCLSY"] = "US"

Security Classification Fields¶

The file header and every NITF subheader contain the same 16-field security classification block. The field names use a prefix that varies by segment type:

Segment	Prefix	Example
File header	`FS`	`FSCLAS`, `FSCLSY`, `FSCODE`, …
Image	`IS`	`ISCLAS`, `ISCLSY`, `ISCODE`, …
Text	`TS`	`TSCLAS`, `TSCLSY`, `TSCODE`, …
Graphic	`SS`	`SSCLAS`, `SSCLSY`, `SSCODE`, …
DES	`DE`/`DES`	`DECLAS`, `DESCLSY`, `DESCODE`, …

The 16 fields in each block (after the prefix) are: CLAS, CLSY, CODE, CTLH, REL, DCTP, DCDT, DCXM, DG, DGDT, CLTX, CATP, CAUT, CRSN, SRDT, CTLN.

CLAS defaults to "U"; every other field defaults to blank.
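
The prefix rule, including the irregular DES case, can be captured in a few lines. This is a sketch built from the table and suffix list above; security_field_names and the segment labels it accepts are hypothetical names chosen for the example.

```python
SECURITY_SUFFIXES = [
    "CLAS", "CLSY", "CODE", "CTLH", "REL", "DCTP", "DCDT", "DCXM",
    "DG", "DGDT", "CLTX", "CATP", "CAUT", "CRSN", "SRDT", "CTLN",
]

PREFIXES = {"file": "FS", "image": "IS", "text": "TS", "graphic": "SS"}


def security_field_names(segment: str) -> list:
    """Return the security field names for a segment type, in block order."""
    if segment == "des":
        # DES is the irregular case: DECLAS, then "DES" + suffix for the rest
        return ["DECLAS"] + ["DES" + suffix for suffix in SECURITY_SUFFIXES[1:]]
    return [PREFIXES[segment] + suffix for suffix in SECURITY_SUFFIXES]


# Build a block of defaults: "U" for the classification field, blank elsewhere
image_names = security_field_names("image")
image_defaults = {name: ("U" if name.endswith("CLAS") else "") for name in image_names}
print(image_names[:3])
print(security_field_names("des")[:3])
```

A dict like image_defaults can then be copied field by field into a BufferedMetadataProvider before overriding the values you care about.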

Computed vs. User-Settable Fields¶

Some fields are always computed by the writer and cannot be overridden:

FHDR, FVER — determined by the output format (NITF 2.1 / NSIF 1.0)
FL, HL — computed from actual file and header lengths
NUMI, NUMS, NUMT, NUMDES, NUMRES — segment counts
Segment length arrays (LISH/LI, LSSH/LS, etc.)
ENCRYP — always "0" (unencrypted)
Image dimensions, pixel type, blocking parameters — derived from image data
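
When you copy metadata from an existing file into a new one, it can help to see which of your fields the writer will compute for itself. The sketch below checks a plain dict against the field names mentioned on this page; the set is deliberately incomplete (segment length arrays and blocking parameters are not listed), and split_settable is a hypothetical helper.

```python
# Field names taken from the lists on this page; not an exhaustive set.
COMPUTED_FIELDS = {
    "FHDR", "FVER", "FL", "HL",
    "NUMI", "NUMS", "NUMT", "NUMDES", "NUMRES",
    "ENCRYP",
    "NROWS", "NCOLS", "PVTYPE", "IREP", "NBPP", "ABPP", "NBANDS",
}


def split_settable(fields: dict) -> tuple:
    """Separate user-settable fields from those the writer always computes."""
    settable = {k: v for k, v in fields.items() if k not in COMPUTED_FIELDS}
    computed = {k: v for k, v in fields.items() if k in COMPUTED_FIELDS}
    return settable, computed


source_fields = {"FTITLE": "Copy", "FL": "000000123456", "FSCLAS": "U", "NUMI": "001"}
settable, computed = split_settable(source_fields)
print("will be applied:", sorted(settable))
print("computed by the writer:", sorted(computed))
```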

Writing TREs¶

Set TREs as nested dicts using dictionary syntax, matching the format returned by the reader:

image_meta["GEOLOB"] = {
    "ARV": "000360000",
    "BRV": "000180000",
    "LSO": "-077.0000000000",
    "PSO": "+038.0000000000",
}

Numeric fields (BCS-N encoding) are auto-formatted to their defined width: short values are left-padded with zeros, and overly precise values are reformatted to fit. You can pass natural representations:

image_meta["ICHIPB"] = {
    "OP_ROW_11": "0.5",     # auto-padded to "0000000000.5" (12 bytes)
    "FI_ROW": "768",        # auto-padded to "00000768" (8 bytes)
    # ...
}

Text fields (BCS-A) are right-padded with spaces if short and rejected if too long. Any value that still does not fit its field after formatting raises an error.
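
The padding rules are simple enough to illustrate with the standard library. This is a minimal sketch of the behavior described above, not the writer's implementation: it pads and rejects, but it does not attempt to shorten overly precise numbers. The function names are hypothetical.

```python
def format_numeric(value: str, width: int) -> str:
    """Left-pad a numeric string with zeros; str.zfill keeps a leading sign in front."""
    padded = value.zfill(width)
    if len(padded) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return padded


def format_text(value: str, width: int) -> str:
    """Right-pad a text value with spaces; reject values that are too long."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return value.ljust(width)


op_row = format_numeric("0.5", 12)
fi_row = format_numeric("768", 8)
txtfmt = format_text("STA", 5)
print(repr(op_row), repr(fi_row), repr(txtfmt))
```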

Encoding Tolerance¶

NITF fields declare a character encoding that constrains what bytes are valid. For example, BCS-NPI (Numeric Positive Integer) only permits digits and spaces per the JBP specification. In practice, real-world NITF producers frequently violate these constraints — the RPC00B TRE’s HEIGHT_SCALE field is commonly written with a leading + sign despite being declared BCS-NPI.

By default, the writer uses permissive validation: numeric fields (BCS-N and BCS-NPI) accept any printable ASCII character (the BCS-A range, 0x20–0x7E). This ensures that metadata read from real-world files can be written back without error — a round-trip that would otherwise fail on spec-violating values.

If you need output that is strictly compliant with the NITF encoding specifications, enable strict mode on the writer:

with IO.open(["output.ntf"], "w", "nitf") as writer:
    writer.strict_encoding = True  # reject values that violate declared encodings
    writer.add_asset("image:0", provider, "Image", "", ["data"])

In strict mode, writing "+0697" to a BCS-NPI field raises a validation error because + is not in the BCS-NPI character set. This is useful when producing files that must pass formal conformance checks.
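
The difference between the two modes comes down to which character set a value is checked against. The sketch below models only the BCS-NPI case, using the character sets as described in this section (digits and spaces when strict, the BCS-A range otherwise); is_valid_bcs_npi is a hypothetical helper and does not reflect how the writer is implemented.

```python
import re

# Character sets as described above
STRICT_NPI = re.compile(r"[0-9 ]*")          # digits and spaces only
PERMISSIVE = re.compile(r"[\x20-\x7e]*")     # printable ASCII (BCS-A range)


def is_valid_bcs_npi(value: str, strict: bool = False) -> bool:
    """Check a BCS-NPI field value under permissive or strict validation."""
    pattern = STRICT_NPI if strict else PERMISSIVE
    return pattern.fullmatch(value) is not None


for candidate in ("00697", "+0697"):
    print(
        candidate,
        "permissive:", is_valid_bcs_npi(candidate),
        "strict:", is_valid_bcs_npi(candidate, strict=True),
    )
```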

Extending NITF Metadata with Structure Definitions¶

The metadata you access from NITF files is produced by a data-driven parsing framework. The library uses declarative YAML-based structure definition files (.ksy format, inspired by Kaitai Struct) to describe binary layouts. These definitions control both reading and writing: the same file that tells the parser how to extract fields from a binary header also tells the writer how to serialize them back.

This means you can extend the metadata the library understands by adding new structure definition files. If a TRE, DES, or other NITF metadata structure isn’t already supported, you can write a .ksy definition for it and register it with the StructureRegistry.

Configuring the StructureRegistry¶

The StructureRegistry manages all structure definitions. By default it loads definitions from the package’s built-in data/structures/ directory, which includes NITF file headers, image subheaders, and many common TREs. You can extend it with your own definitions:

from aws.osml.io import StructureRegistry

# Create a registry (loads built-in definitions automatically)
registry = StructureRegistry()

# See what's already available
for name in registry.list():
    print(name)
# NITF_02.10_FileHeader, NITF_02.10_ImageSubheader, TRE_GEOLOB,
# TRE_RPC00B, TRE_SENSRB, TRE_USE00A, ... (70+ definitions)

# Add a directory containing your custom .ksy files
registry.add_search_path("/path/to/my/structures")

# Retrieve a specific definition
geolob_def = registry.get("TRE_GEOLOB")

# Reload definitions after editing .ksy files on disk
registry.reload()

You can also set the OSML_IO_STRUCTURE_PATH environment variable to add search paths without changing code. Separate multiple paths with :.

export OSML_IO_STRUCTURE_PATH="/team/shared/structures:/project/custom/structures"
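
If you want to check from Python which directories the variable contributes, splitting on ":" is enough. This is a sketch of that parsing only; how the library itself reads the variable is not shown here, and structure_search_paths is a hypothetical helper.

```python
import os


def structure_search_paths(env=None) -> list:
    """Split OSML_IO_STRUCTURE_PATH on ':' and drop empty entries."""
    env = os.environ if env is None else env
    raw = env.get("OSML_IO_STRUCTURE_PATH", "")
    return [path for path in raw.split(":") if path]


# Passing a dict keeps the example independent of the real environment
example_env = {
    "OSML_IO_STRUCTURE_PATH": "/team/shared/structures:/project/custom/structures"
}
paths = structure_search_paths(example_env)
print(paths)
```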

How Definitions Are Used¶

Structure definitions drive both directions of the pipeline:

When reading, the parser uses the definition to locate fields in the binary data, apply the correct encoding (BCS-A, BCS-N, etc.), evaluate conditional and repeated fields, and populate the metadata dictionary.
When writing, the writer uses the same definition to serialize metadata values back into the correct binary layout, validating field sizes and encodings along the way.

Adding a new .ksy file for a TRE automatically enables both reading and writing that TRE — no code changes required.
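
To see why one definition can serve both directions, consider a toy layout: an ordered list of field names and widths. The sketch below is not the .ksy format and not the library's parser; it assumes fixed-width ASCII fields only, with GEOLOB widths of 9, 9, 15, and 15 bytes matching the values shown earlier on this page.

```python
# Toy stand-in for a structure definition: ordered (field name, width in bytes)
GEOLOB_LAYOUT = [("ARV", 9), ("BRV", 9), ("LSO", 15), ("PSO", 15)]


def decode(layout, data: bytes) -> dict:
    """Slice fixed-width ASCII fields out of a payload, in layout order."""
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = data[offset:offset + width].decode("ascii")
        offset += width
    if offset != len(data):
        raise ValueError(f"layout covers {offset} bytes, payload has {len(data)}")
    return fields


def encode(layout, fields: dict) -> bytes:
    """Serialize fields in layout order, zero-padding numeric strings to width."""
    out = bytearray()
    for name, width in layout:
        text = fields[name].zfill(width)
        if len(text) != width:
            raise ValueError(f"{name}: {fields[name]!r} does not fit in {width} bytes")
        out += text.encode("ascii")
    return bytes(out)


payload = encode(GEOLOB_LAYOUT, {
    "ARV": "360000",
    "BRV": "180000",
    "LSO": "-077.0000000000",
    "PSO": "+038.0000000000",
})
decoded = decode(GEOLOB_LAYOUT, payload)
print(len(payload), decoded["ARV"])
```

Both functions walk the same layout list, so adding a field to the layout changes reading and writing together.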

Lower-Level Access with decode / encode¶

When you have a raw binary blob for a single structure — a TRE payload, a DES, or any registered header — you can parse and serialize it directly through the StructureDefinition returned by the registry. decode turns bytes into a nested dict (repeated fields become lists, nested types become dicts), and encode turns such a dict back into bytes:

from aws.osml.io import StructureRegistry

registry = StructureRegistry()
definition = registry.get("TRE_GEOLOB")

# Parse a raw TRE payload into a dict of fields
fields = definition.decode(raw_bytes)
arv = fields["ARV"]                 # "000360000"

# Serialize a dict of fields back to bytes
raw = definition.encode({"ARV": "000360000", "BRV": "000360000"})

Values come back as the same Python types the MetadataProvider interface yields — ASCII-numeric fields as strings, binary integers as int — so cast with int(...) when you need a number. decode accepts any bytes-like input (bytes, bytearray, memoryview), and encode auto-formats numeric fields to their defined widths.

Writing Your Own Structure Definitions¶

Structure definition files use a YAML-based format with support for field types, conditional presence, repeat expressions, and nested structures. For the full syntax reference, expression language details, and examples, see the Structure Definition Guide.

TIFF and GeoTIFF Metadata¶

For TIFF and GeoTIFF files, the metadata dictionary uses numeric TIFF tag IDs as keys. Each key is the string representation of the tag number from the TIFF 6.0 specification — for example, "256" for ImageWidth, "259" for Compression, "33550" for ModelPixelScale.

This design means every tag in the IFD is preserved, including private-use tags (32768+) and vendor-specific tags that would otherwise be dropped by a hardcoded name list. The raw tag values are stored directly, with no interpretation or transformation applied.

Reading TIFF Metadata¶

from aws.osml.io import IO

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata

    # Tags are keyed by their numeric ID as a string
    width = meta["256"]           # ImageWidth
    height = meta["257"]          # ImageLength
    bits = meta["258"]            # BitsPerSample
    compression = meta["259"]     # Compression

    # GeoTIFF tags use the same numeric key convention
    pixel_scale = meta["33550"]   # ModelPixelScale — e.g. [0.5, 0.5, 0.0]
    tiepoints = meta["33922"]     # ModelTiepoint
    geokeys = meta["34735"]       # GeoKeyDirectory (raw SHORT array)

    # Dataset-level entries use descriptive string keys
    byte_order = meta["ByteOrder"]              # "LittleEndian"
    num_dirs = meta["NumberOfDirectories"]       # 3

    # Prefix filtering is a string match on the key, not a numeric range
    starts_with_3 = dataset.metadata.entries("3")
    # Returns "322" (TileWidth), "323" (TileLength), "339" (SampleFormat),
    # "33550" (ModelPixelScale), "34735" (GeoKeyDirectory), etc.
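
Because the prefix is matched against the key as a string, "3" selects "33550" as well as "322". If you want a numeric tag range instead, filter the exported dict yourself. The sketch below uses a plain dict as a stand-in for dataset.metadata.entries(); tags_in_range is a hypothetical helper.

```python
def tags_in_range(entries: dict, low: int, high: int) -> dict:
    """Select entries whose key is a tag number in [low, high]."""
    selected = {}
    for key, value in entries.items():
        # Skip descriptive keys such as "ByteOrder"
        if key.isdigit() and low <= int(key) <= high:
            selected[key] = value
    return selected


# Stand-in for dataset.metadata.entries()
entries = {
    "256": 1024,
    "322": 256,
    "323": 256,
    "339": 1,
    "33550": [0.5, 0.5, 0.0],
    "ByteOrder": "LittleEndian",
}

by_prefix = {k: v for k, v in entries.items() if k.startswith("3")}
by_range = tags_in_range(entries, 300, 399)
print(sorted(by_prefix, key=int))
print(sorted(by_range, key=int))
```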

Using TagNameResolver for Name-Based Access¶

If you prefer human-readable tag names, wrap the dictionary with TagNameResolver. It translates names like "ImageWidth" to the corresponding numeric key ("256") behind the scenes.

from aws.osml.io import IO
from aws.osml.io.tiff.utils import TagNameResolver

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata.entries()
    tags = TagNameResolver(meta)

    # Look up by name — same value as meta["256"]
    width = tags["ImageWidth"]
    height = tags["ImageLength"]

    # GeoTIFF tags work the same way
    scale = tags["ModelPixelScale"]
    geokeys = tags["GeoKeyDirectory"]

    # Safe access with a default value
    nodata = tags.get("GDALNoData", "nan")

    # Direct numeric access when you know the tag number
    raw = tags.by_number(34735)

    # Check if a tag is present
    if "Compression" in tags:
        print(f"Compression: {tags['Compression']}")

    # Iterate over all entries
    for key, value in tags:
        print(f"Tag {key}: {value}")

The resolver ships with a default mapping covering baseline TIFF 6.0 tags, GeoTIFF tags, and common GDAL tags. You can extend it with custom mappings for vendor-specific or application-specific tags:

custom_tags = TagNameResolver(meta, custom_mapping={
    "MyVendorTag": 65000,
    "CloudCover": 65001,
})

vendor_val = custom_tags["MyVendorTag"]
cloud = custom_tags["CloudCover"]

# Custom mappings override defaults if there's a name collision
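
Conceptually, a name resolver is a thin wrapper that turns a tag name into the numeric string key before touching the underlying dict. The class below is a simplified sketch of that idea with only four default names; MiniTagResolver is hypothetical and is not the library's TagNameResolver.

```python
class MiniTagResolver:
    """Translate tag names to numeric string keys over a plain dict."""

    DEFAULT_NAMES = {
        "ImageWidth": 256,
        "ImageLength": 257,
        "Compression": 259,
        "ModelPixelScale": 33550,
    }

    def __init__(self, entries: dict, custom_mapping=None):
        self._entries = entries
        # Custom names are merged last so they win on a name collision
        self._names = {**self.DEFAULT_NAMES, **(custom_mapping or {})}

    def _key(self, name: str) -> str:
        return str(self._names[name])

    def __getitem__(self, name):
        return self._entries[self._key(name)]

    def __setitem__(self, name, value):
        self._entries[self._key(name)] = value

    def __contains__(self, name):
        return name in self._names and self._key(name) in self._entries

    def get(self, name, default=None):
        return self[name] if name in self else default


entries = {"256": 1024, "257": 768, "65000": "vendor-data"}
tags = MiniTagResolver(entries, custom_mapping={"MyVendorTag": 65000})

print(tags["ImageWidth"], tags["MyVendorTag"])
tags["Compression"] = 5            # stored under "259" in the wrapped dict
print(entries["259"], tags.get("ModelPixelScale", "n/a"))
```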

Writing TIFF Metadata¶

When writing TIFF files, supply metadata using the same numeric key format. The writer infers the TIFF field type from the JSON value type for common cases. For types that can’t be inferred, use an explicit type annotation.

from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata["259"] = 8                              # Compression: Deflate
metadata["33550"] = [0.5, 0.5, 0.0]             # ModelPixelScale → DOUBLE array
metadata["42113"] = "nan"                        # GDALNoData → ASCII

# For field types that can't be inferred (e.g. UNDEFINED), use an annotation:
metadata["700"] = {"value": [60, 120, 109, 108], "type": 7}  # XMP as UNDEFINED bytes

# Attach metadata to the provider — the writer sources all IFD tags from here
provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=256, block_height=256, pixel_type=PixelType.UInt8,
    metadata=metadata,
)
provider.set_full_image(image_data)

with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])

Writing with TagNameResolver¶

TagNameResolver is bidirectional — you can use it to build write metadata with human-readable names instead of numeric tag IDs. Assign values with resolver["TagName"] = value and the resolver stores them under the correct numeric key in the underlying dictionary.

For tags with well-known enumerated values (Compression, Predictor, PlanarConfiguration, SampleFormat, PhotometricInterpretation, Orientation), string values are resolved to their numeric equivalents automatically:

from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
from aws.osml.io.tiff.utils import TagNameResolver

metadata = BufferedMetadataProvider()
tag_dict = metadata.entries()
resolver = TagNameResolver(tag_dict)

# Set tags by name — stored under the correct numeric key
resolver["TileWidth"] = 512
resolver["TileLength"] = 512
resolver["ModelPixelScale"] = [0.5, 0.5, 0.0]

# Enumerated values resolve automatically
resolver["Compression"] = "LZW"           # stored as 5
resolver["Compression"] = "Deflate"       # stored as 8
resolver["Predictor"] = "Horizontal"      # stored as 2
resolver["SampleFormat"] = "Float"        # stored as 3

# Integer values pass through unchanged
resolver["Compression"] = 5               # also works

# Write resolved keys back to the metadata provider
for key, value in tag_dict.items():
    metadata[key] = value

provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=512, block_height=512, pixel_type=PixelType.Float32,
    metadata=metadata,
)
provider.set_full_image(image_data)

with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])

The supported enumerated value names (case-insensitive) are:

Tag	Accepted names
Compression (259)	None, CCITTRLE, CCITTFax3, CCITTFax4, LZW, OJPEG, JPEG, Deflate, PackBits
PhotometricInterpretation (262)	MinIsWhite, MinIsBlack, RGB, Palette, Mask, YCbCr
PlanarConfiguration (284)	Chunky, Planar
Predictor (317)	None, Horizontal, FloatingPoint
SampleFormat (339)	UInt, Int, Float, Void
Orientation (274)	TopLeft, TopRight, BottomRight, BottomLeft, LeftTop, RightTop, RightBottom, LeftBottom
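
Name resolution for an enumerated tag amounts to a case-insensitive lookup with integers passed through. The sketch below does this for Compression only, using the code values from the TIFF 6.0 specification (plus 8 for Deflate, as shown above); resolve_compression is a hypothetical helper and not part of TagNameResolver.

```python
COMPRESSION_NAMES = {
    "none": 1,
    "ccittrle": 2,
    "ccittfax3": 3,
    "ccittfax4": 4,
    "lzw": 5,
    "ojpeg": 6,
    "jpeg": 7,
    "deflate": 8,
    "packbits": 32773,
}


def resolve_compression(value):
    """Map a Compression name to its numeric code; integers pass through."""
    if isinstance(value, int):
        return value
    try:
        return COMPRESSION_NAMES[value.lower()]
    except KeyError:
        raise ValueError(f"unknown Compression name: {value!r}") from None


codes = [resolve_compression(v) for v in ("LZW", "deflate", "PackBits", 5)]
print(codes)
```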