Metadata

The Simple Path

For a quick look at image properties without reading any pixels, iminfo gives you dimensions, band count, pixel type, and block layout in one call:

from aws.osml.io import iminfo

info = iminfo("image.ntf")
print(f"{info.width}x{info.height}, {info.bands} bands, {info.dtype}")
print(f"Block size: {info.block_size}")
print(f"Resolution levels: {info.num_resolution_levels}")

iminfo also includes the full format-specific metadata dictionary for the image segment, so you can inspect compression, TREs, TIFF tags, and other fields without dropping down to the low-level API:

# NITF: subheader fields and TREs
info = iminfo("image.ntf")
print(info.metadata["IC"])          # "C8" (JPEG 2000)
print(info.metadata["IGEOLO"])      # 60-char geographic location string

# TREs are nested dicts
if "GEOLOB" in info.metadata:
    print(info.metadata["GEOLOB"]["ARV"])

# GeoTIFF: IFD tags keyed by numeric tag ID
info = iminfo("image.tif")
print(info.metadata["259"])         # Compression tag value
print(info.metadata["33550"])       # ModelPixelScale

The metadata dict is a snapshot — a plain Python dictionary captured when iminfo is called. It is not a live reference to the file.

When you need more control — prefix filtering, dataset-level metadata, or write-side metadata — the full MetadataProvider interface described below gives you access to everything in the file.

The MetadataProvider Interface

All assets and datasets expose metadata through the MetadataProvider interface, regardless of the underlying file format. MetadataProvider implements the standard Python collections.abc.Mapping protocol, so you access metadata fields the same way you access items in a dictionary:

from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    # Dataset-level metadata — dict-style access
    dataset.metadata["FTITLE"]              # KeyError if missing
    dataset.metadata.get("FTITLE")          # None if missing
    "FTITLE" in dataset.metadata            # membership test
    len(dataset.metadata)                   # number of fields

    # Bulk export
    all_meta = dataset.metadata.entries()   # full dict (single call, fast path)
    filtered = dataset.metadata.entries("FS")  # keys starting with "FS"

    # Asset-level metadata
    image = dataset.get_asset("image:0")
    image_meta = image.metadata.entries()

When writing, use BufferedMetadataProvider to build metadata. It implements collections.abc.MutableMapping, so you write fields with dictionary syntax:

from aws.osml.io import BufferedMetadataProvider

meta = BufferedMetadataProvider()
meta["FTITLE"] = "My File Title"

The dictionary keys and value types are format-specific. A NITF file uses field names like FTITLE and ISCLAS; a GeoTIFF uses numeric tag IDs like "256" and "33550". There is no translation layer — you work directly with the native field names from whatever format you opened.

The rest of this page covers each format’s metadata conventions in detail.


NITF / NSIF Metadata

NITF files carry metadata in fixed-width ASCII header fields, security classification blocks, and Tagged Record Extensions (TREs). The library exposes all of these through the Mapping interface using the standard NITF field names.

Reading NITF Metadata

Header and Subheader Fields

Most NITF fields are ASCII strings — even numeric values like row counts and compression ratios. A few TREs use binary integer fields, which come through as Python int directly.

from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    # File header fields — dict-style access
    title = dataset.metadata["FTITLE"]
    classification = dataset.metadata["FSCLAS"]       # one character: "U", "R", "C", "S", or "T"

    # Image subheader fields
    image = dataset.get_asset("image:0")
    meta = image.metadata

    image_id = meta["IID1"]              # "IMG_00001"
    compression = meta["IC"]             # "C8"
    date_time = meta["IDATIM"]           # "20231215103045"

    # Numeric fields are ASCII strings — cast as needed
    num_rows = int(meta["NROWS"])         # 2048
    num_cols = int(meta["NCOLS"])         # 2048
    num_bands = int(meta["NBANDS"])       # 3

    # Coordinate strings — NITF packs 4 corners into a single field
    if "IGEOLO" in meta:
        geo = meta["IGEOLO"]              # 60-char geographic location string

    # Safe access for conditional fields
    comrat = meta.get("COMRAT")           # None if IC is "NC" or "NM"
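
The IGEOLO string can be split into its four corners with a few lines of standard-library Python. The sketch below assumes ICORDS is "G" (geographic), where each corner is 15 characters in ddmmssXdddmmssY form; other ICORDS values use different layouts and are not handled. The helper names are hypothetical and are not part of aws.osml.io.

```python
from typing import List, Tuple


def _dms_to_degrees(digits: str, hemisphere: str) -> float:
    """Convert 'ddmmss' or 'dddmmss' plus a hemisphere letter to signed degrees."""
    degrees = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -value if hemisphere in ("S", "W") else value


def parse_igeolo_geographic(igeolo: str) -> List[Tuple[float, float]]:
    """Hypothetical helper: split a 60-char IGEOLO (ICORDS == "G") into (lat, lon) pairs."""
    if len(igeolo) != 60:
        raise ValueError(f"expected 60 characters, got {len(igeolo)}")
    corners = []
    for start in range(0, 60, 15):
        corner = igeolo[start:start + 15]
        lat = _dms_to_degrees(corner[0:6], corner[6])
        lon = _dms_to_degrees(corner[7:14], corner[14])
        corners.append((lat, lon))
    return corners
```

In practice you would check `meta["ICORDS"]` before calling a helper like this, since the same 60 characters mean something different for UTM or MGRS coordinates.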

TRE Fields as Nested Dictionaries

TRE (Tagged Record Extension) fields are grouped under their CETAG as nested dictionaries. Each TRE with a known definition in the StructureRegistry appears as a top-level key mapped to a dict of its fields:

# Access TRE fields through nested dictionaries
geolob = meta["GEOLOB"]              # dict
arv = geolob["ARV"]                  # "000360000"
brv = geolob["BRV"]                  # "000360000"

# Or access in one step
arv = meta["GEOLOB"]["ARV"]

# TREs with repeated fields contain arrays
j2klra = meta["J2KLRA"]              # dict
layers = j2klra["LAYERS"]            # list of dicts
first_layer = layers[0]              # {"LAYER_ID": "000", "BITRATE": "0.031250"}

Unknown TREs (those without a definition in the registry) appear with their raw data preserved:

# Unknown TRE — raw hex data and byte length
unknown = meta["UNKNWN"]             # {"_raw": "0102030405", "_length": 5}
raw_hex = unknown["_raw"]
byte_count = unknown["_length"]
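
If you need the original bytes of an unknown TRE, the hex string can be decoded with the standard library. This is a minimal sketch that assumes the entry has the `_raw` and `_length` keys shown above; `decode_unknown_tre` is a hypothetical helper, not a library function.

```python
def decode_unknown_tre(entry: dict) -> bytes:
    """Hypothetical helper: turn an unknown-TRE entry back into raw bytes."""
    raw = bytes.fromhex(entry["_raw"])
    # Cross-check the decoded size against the recorded byte length
    if len(raw) != entry["_length"]:
        raise ValueError(
            f"decoded {len(raw)} bytes but _length says {entry['_length']}"
        )
    return raw
```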

Overflow TREs stored in data extension segments are resolved automatically — you don’t need to chase them across segments.

Repeated Fields as Arrays

Repeated fields in the image subheader (like band info) appear as Python lists instead of individual indexed entries:

# Band info is a list of dicts, one per band
bands = meta["BAND_INFO"]            # list of dicts
for i, band in enumerate(bands):
    print(f"Band {i}: IREPBAND={band['IREPBAND']}, NLUTS={band['NLUTS']}")

# Access a specific band directly
first_band = meta["BAND_INFO"][0]
irepband = first_band["IREPBAND"]          # "R"

Prefix Filtering

Use entries(prefix) to retrieve a subset of metadata. For subheader fields, the prefix matches field names. For TREs, the prefix matches the CETAG:

# Get all fields starting with "FS" (file security fields)
# Returns: FSCLAS, FSCLSY, FSCODE, FSCTLH, FSREL, FSDCTP, ...
security = dataset.metadata.entries("FS")

# Get a specific TRE by CETAG
geolob_only = image.metadata.entries("GEOLOB")
# Returns: {"GEOLOB": {"ARV": "...", "BRV": "...", ...}}
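
To make the matching rule concrete, here is a rough stand-in that applies the same idea to a plain dictionary. It only illustrates a top-level "key starts with prefix" match as described above; it is not the library's implementation, and `entries_with_prefix` is a hypothetical name.

```python
from typing import Any, Dict, Mapping


def entries_with_prefix(metadata: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Illustrative stand-in: keep top-level keys that start with the prefix."""
    return {key: value for key, value in metadata.items() if key.startswith(prefix)}


# Sample data shaped like the metadata shown on this page
sample = {
    "FTITLE": "Example",
    "FSCLAS": "U",
    "FSCLSY": "",
    "GEOLOB": {"ARV": "000360000", "BRV": "000360000"},
}
security = entries_with_prefix(sample, "FS")      # FSCLAS and FSCLSY only
geolob_only = entries_with_prefix(sample, "GEOLOB")  # the whole nested TRE dict
```

Because the match is on top-level keys, a TRE is returned as a whole nested dict rather than field by field.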

Value Types Summary

The Python types you get back depend on how the field is defined in the underlying structure definition:

| Definition | Python type | Example |
|---|---|---|
| type: str (most fields) | str | "U", "00002048" |
| Binary integers (u1, u2, u4, u8) | int | 42 |
| Repeated fields (band info, etc.) | list of dict | [{"IREPBAND": "R", ...}] |
| Known TREs | dict of dict | {"GEOLOB": {"ARV": "..."}} |
| Unknown TREs | dict with _raw, _length | {"_raw": "0102", "_length": 2} |
| Binary byte fields | str (hex-encoded) | "ff8000" |
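
Since a numeric field may arrive either as a zero-padded ASCII string or as a Python int, a small normalizing helper can keep calling code uniform. The sketch below is one possible approach; `field_as_int` is a hypothetical helper, and treating an all-blank string as "no value" is an assumption made here, not documented library behavior.

```python
from typing import Optional, Union


def field_as_int(value: Union[str, int]) -> Optional[int]:
    """Hypothetical helper: normalize a str-or-int metadata value to int."""
    if isinstance(value, int):
        return value  # binary integer fields are already ints
    text = value.strip()
    if not text:
        return None  # assumption: a blank field means "not set"
    return int(text)  # int() accepts leading zeros, e.g. "00002048"
```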

Writing NITF Metadata

When writing NITF files, you control header fields by setting metadata on the writer and on individual assets. The writer reads user-settable fields from the metadata provider and falls back to sensible defaults when a field is absent.

File Header Fields

Set file-level metadata using BufferedMetadataProvider and assign it to the writer’s metadata property:

from aws.osml.io import IO, BufferedMetadataProvider

file_meta = BufferedMetadataProvider()
file_meta["FTITLE"] = "Reconnaissance Mission 2026-03-15"
file_meta["ONAME"] = "Sensor Operator"
file_meta["OPHONE"] = "555-0100"
file_meta["FDT"] = "20260315120000"
file_meta["OSTAID"] = "STATION1"
file_meta["CLEVEL"] = "05"

# Security classification fields use the FS prefix
file_meta["FSCLAS"] = "S"
file_meta["FSCLSY"] = "US"
file_meta["FSCODE"] = "SECRET"
file_meta["FSREL"] = "USA GBR"

# FBKGC is a 3-byte binary field (RGB background color)
# Set it as a list of integers
file_meta["FBKGC"] = [255, 255, 255]

writer = IO.open(["output.ntf"], "w", "nitf")
writer.metadata = file_meta
# ... add assets and close

Fields you don’t set keep their defaults — FSCLAS defaults to "U", OSTAID defaults to "OSML_IO", CLEVEL defaults to "03", and text fields default to blank.
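
Date fields such as FDT use the 14-character CCYYMMDDhhmmss layout seen in the example above. A minimal sketch for producing that string from a Python datetime follows; `to_nitf_datetime` is a hypothetical helper, and treating naive datetimes as already being in UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def to_nitf_datetime(moment: datetime) -> str:
    """Hypothetical helper: format a datetime as CCYYMMDDhhmmss."""
    if moment.tzinfo is not None:
        # Convert timezone-aware values to UTC before formatting
        moment = moment.astimezone(timezone.utc)
    # Naive datetimes are assumed to be in UTC already
    return moment.strftime("%Y%m%d%H%M%S")
```

The result can be assigned directly, for example `file_meta["FDT"] = to_nitf_datetime(datetime(2026, 3, 15, 12, 0, 0))`.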

Image Subheader Fields

Image assets read several fields from metadata (IID1, IDATIM, TGTID, IID2, ISORCE). The security classification block and category fields are also metadata-driven:

image_meta = BufferedMetadataProvider()

# Identification fields
image_meta["IID1"] = "IMG_00001"
image_meta["IDATIM"] = "20260315103045"
image_meta["ISORCE"] = "Satellite XYZ"

# Security fields use the IS prefix
image_meta["ISCLAS"] = "S"
image_meta["ISCLSY"] = "US"
image_meta["ISREL"] = "USA"

# Image category and coordinate representation
image_meta["ICAT"] = "SAR"
image_meta["ICORDS"] = "G"

Fields derived from the image data itself — NROWS, NCOLS, PVTYPE, IREP, NBPP, ABPP, NBANDS, and blocking parameters — are always computed from the ImageAssetProvider and cannot be overridden through metadata.

Text, Graphic, and DES Subheader Fields

Text, graphic, and data extension segment subheaders follow the same pattern. Set fields on the asset’s metadata provider before adding it to the writer:

# Text asset metadata (TS prefix for security fields)
text_meta = BufferedMetadataProvider()
text_meta["TXTDT"] = "20260315120000"
text_meta["TXTFMT"] = "STA"
text_meta["TSCLAS"] = "C"

# Graphic asset metadata (SS prefix for security fields)
graphic_meta = BufferedMetadataProvider()
graphic_meta["SFMT"] = "C"
graphic_meta["SDLVL"] = "002"
graphic_meta["SLOC"] = "0050000100"
graphic_meta["SSCLAS"] = "U"

# DES metadata (DES prefix for security fields, but DECLAS for classification)
des_meta = BufferedMetadataProvider()
des_meta["DESVER"] = "02"
des_meta["DECLAS"] = "U"
des_meta["DESCLSY"] = "US"

Security Classification Fields

Every NITF subheader contains the same 16-field security classification block. The field names use a prefix that varies by segment type:

| Segment | Prefix | Example |
|---|---|---|
| File header | FS | FSCLAS, FSCLSY, FSCODE, … |
| Image | IS | ISCLAS, ISCLSY, ISCODE, … |
| Text | TS | TSCLAS, TSCLSY, TSCODE, … |
| Graphic | SS | SSCLAS, SSCLSY, SSCODE, … |
| DES | DE/DES | DECLAS, DESCLSY, DESCODE, … |

The 16 fields in each block (after the prefix) are: CLAS, CLSY, CODE, CTLH, REL, DCTP, DCDT, DCXM, DG, DGDT, CLTX, CATP, CAUT, CRSN, SRDT, CTLN.

The classification field (CLAS) defaults to "U"; every other field in the block defaults to blank.
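
Because the block is identical apart from the prefix, the full set of field names for a segment can be generated rather than typed out. The sketch below builds the names from the suffixes listed above and applies the DES exception (DECLAS rather than DESCLAS) mentioned earlier; `security_field_names` is a hypothetical helper, not part of the library.

```python
from typing import List

# Suffixes of the security block, in the order listed on this page
SECURITY_SUFFIXES = (
    "CLAS", "CLSY", "CODE", "CTLH", "REL", "DCTP", "DCDT", "DCXM",
    "DG", "DGDT", "CLTX", "CATP", "CAUT", "CRSN", "SRDT", "CTLN",
)


def security_field_names(prefix: str) -> List[str]:
    """Hypothetical helper: list the security field names for a segment prefix."""
    names = [prefix + suffix for suffix in SECURITY_SUFFIXES]
    if prefix == "DES":
        # DES subheaders use DECLAS for the classification field
        names[0] = "DECLAS"
    return names
```

A list like this is handy for copying a whole security block from one provider to another, for example from the file header ("FS") to an image subheader ("IS").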

Computed vs. User-Settable Fields

Some fields are always computed by the writer and cannot be overridden:

  • FHDR, FVER — determined by the output format (NITF 2.1 / NSIF 1.0)

  • FL, HL — computed from actual file and header lengths

  • NUMI, NUMS, NUMT, NUMDES, NUMRES — segment counts

  • Segment length arrays (LISH/LI, LSSH/LS, etc.)

  • ENCRYP — always "0" (unencrypted)

  • Image dimensions, pixel type, blocking parameters — derived from image data

Writing TREs

Set TREs as nested dicts using dictionary syntax, matching the format returned by the reader:

image_meta["GEOLOB"] = {
    "ARV": "000360000",
    "BRV": "000180000",
    "LSO": "-077.0000000000",
    "PSO": "+038.0000000000",
}

Numeric fields (BCS-N encoding) are auto-formatted to their defined width — short values are left-padded with zeros and overly-precise values are reformatted to fit. You can pass natural representations:

image_meta["ICHIPB"] = {
    "OP_ROW_11": "0.5",     # auto-padded to "0000000000.5" (12 bytes)
    "FI_ROW": "768",        # auto-padded to "00000768" (8 bytes)
    # ...
}

Text fields (BCS-A) are right-padded with spaces if short and rejected if too long. A value that still does not fit its field after formatting raises an error.
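
The padding rules can be illustrated with plain string operations. This is a simplified sketch, not the writer's actual formatter: it pads short values as described but raises on anything too long instead of trying to reformat overly precise numbers. Both function names are hypothetical.

```python
def pad_bcs_n(value: str, width: int) -> str:
    """Simplified sketch: left-pad a numeric string with zeros to the field width."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    # str.zfill keeps a leading sign in front of the padding ("-5" -> "-005")
    return value.zfill(width)


def pad_bcs_a(value: str, width: int) -> str:
    """Simplified sketch: right-pad a text string with spaces to the field width."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return value.ljust(width)
```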

Encoding Tolerance

NITF fields declare a character encoding that constrains what bytes are valid. For example, BCS-NPI (Numeric Positive Integer) only permits digits and spaces per the JBP specification. In practice, real-world NITF producers frequently violate these constraints — the RPC00B TRE’s HEIGHT_SCALE field is commonly written with a leading + sign despite being declared BCS-NPI.

By default, the writer uses permissive validation: numeric fields (BCS-N and BCS-NPI) accept any printable ASCII character (the BCS-A range, 0x20–0x7E). This ensures that metadata read from real-world files can be written back without error — a round-trip that would otherwise fail on spec-violating values.

If you need output that is strictly compliant with the NITF encoding specifications, enable strict mode on the writer:

# provider is an image asset provider created elsewhere (not shown)
with IO.open(["output.ntf"], "w", "nitf") as writer:
    writer.strict_encoding = True  # reject values that violate declared encodings
    writer.add_asset("image:0", provider, "Image", "", ["data"])

In strict mode, writing "+0697" to a BCS-NPI field raises a validation error because + is not in the BCS-NPI character set. This is useful when producing files that must pass formal conformance checks.
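
The difference between the two modes can be sketched as two character-set checks. The code below follows the description in this section (strict BCS-NPI allows digits and spaces; permissive mode allows printable ASCII, 0x20 to 0x7E) and is only an illustration; the writer's real validation logic may differ, and `is_valid_bcs_npi` is a hypothetical name.

```python
import re

# Character sets as described in this section
_STRICT_NPI = re.compile(r"[0-9 ]*")
_PERMISSIVE = re.compile(r"[\x20-\x7E]*")


def is_valid_bcs_npi(value: str, strict: bool = False) -> bool:
    """Illustrative check of a BCS-NPI value under strict or permissive rules."""
    pattern = _STRICT_NPI if strict else _PERMISSIVE
    return pattern.fullmatch(value) is not None
```

With this sketch, `is_valid_bcs_npi("+0697")` passes in the default mode and fails with `strict=True`, mirroring the RPC00B example above.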

Extending NITF Metadata with Structure Definitions

The metadata you access from NITF files is driven by a data-driven parsing framework. The library uses declarative YAML-based structure definition files (.ksy format, inspired by Kaitai Struct) to describe binary layouts. These definitions control both reading and writing — the same file that tells the parser how to extract fields from a binary header also tells the writer how to serialize them back.

This means you can extend the metadata the library understands by adding new structure definition files. If a TRE, DES, or other NITF metadata structure isn’t already supported, you can write a .ksy definition for it and register it with the StructureRegistry.

Configuring the StructureRegistry

The StructureRegistry manages all structure definitions. By default it loads definitions from the package’s built-in data/structures/ directory, which includes NITF file headers, image subheaders, and many common TREs. You can extend it with your own definitions:

from aws.osml.io import StructureRegistry

# Create a registry (loads built-in definitions automatically)
registry = StructureRegistry()

# See what's already available
for name in registry.list():
    print(name)
# NITF_02.10_FileHeader, NITF_02.10_ImageSubheader, TRE_GEOLOB,
# TRE_RPC00B, TRE_SENSRB, TRE_USE00A, ... (70+ definitions)

# Add a directory containing your custom .ksy files
registry.add_search_path("/path/to/my/structures")

# Retrieve a specific definition
geolob_def = registry.get("TRE_GEOLOB")

# Reload definitions after editing .ksy files on disk
registry.reload()

You can also set the OSML_IO_STRUCTURE_PATH environment variable to add search paths without changing code. Separate multiple paths with :.

export OSML_IO_STRUCTURE_PATH="/team/shared/structures:/project/custom/structures"
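
For reference, a colon-separated variable like this is typically split as shown below. This is a generic sketch of the convention, not the registry's own parsing code; `structure_search_paths` is a hypothetical helper, and skipping empty segments is an assumption.

```python
import os
from typing import List, Mapping, Optional


def structure_search_paths(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: split OSML_IO_STRUCTURE_PATH into individual paths."""
    env = os.environ if environ is None else environ
    raw = env.get("OSML_IO_STRUCTURE_PATH", "")
    # Ignore empty segments such as those produced by a trailing ":"
    return [path for path in raw.split(":") if path]
```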

How Definitions Are Used

Structure definitions drive both directions of the pipeline:

  • When reading, the parser uses the definition to locate fields in the binary data, apply the correct encoding (BCS-A, BCS-N, etc.), evaluate conditional and repeated fields, and populate the metadata dictionary.

  • When writing, the writer uses the same definition to serialize metadata values back into the correct binary layout, validating field sizes and encodings along the way.

Adding a new .ksy file for a TRE automatically enables both reading and writing that TRE — no code changes required.

Lower-Level Access with StructureAccessor

For lower-level access with built-in type conversion, the StructureAccessor returns Value objects with as_str(), as_int(), and as_float() methods that handle NITF’s ASCII-numeric conventions (e.g. parsing "003" as 3).

Writing Your Own Structure Definitions

Structure definition files use a YAML-based format with support for field types, conditional presence, repeat expressions, and nested structures. For the full syntax reference, expression language details, and examples, see the Structure Definition Guide.


TIFF and GeoTIFF Metadata

For TIFF and GeoTIFF files, the metadata dictionary uses numeric TIFF tag IDs as keys. Each key is the string representation of the tag number from the TIFF 6.0 specification — for example, "256" for ImageWidth, "259" for Compression, "33550" for ModelPixelScale.

This design means every tag in the IFD is preserved, including private-use tags (32768+) and vendor-specific tags that would otherwise be dropped by a hardcoded name list. The raw tag values are stored directly, with no interpretation or transformation applied.

Reading TIFF Metadata

from aws.osml.io import IO

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata

    # Tags are keyed by their numeric ID as a string
    width = meta["256"]           # ImageWidth
    height = meta["257"]          # ImageLength
    bits = meta["258"]            # BitsPerSample
    compression = meta["259"]     # Compression

    # GeoTIFF tags use the same numeric key convention
    pixel_scale = meta["33550"]   # ModelPixelScale — e.g. [0.5, 0.5, 0.0]
    tiepoints = meta["33922"]     # ModelTiepoint
    geokeys = meta["34735"]       # GeoKeyDirectory (raw SHORT array)

    # Dataset-level entries use descriptive string keys
    byte_order = meta["ByteOrder"]              # "LittleEndian"
    num_dirs = meta["NumberOfDirectories"]       # 3

    # Prefix filtering is a plain string match on the numeric key strings,
    # so "3" matches every tag ID whose decimal form starts with 3
    tags_starting_with_3 = dataset.metadata.entries("3")
    # Returns "322" (TileWidth), "323" (TileLength), "339" (SampleFormat),
    # "33550" (ModelPixelScale), "34735" (GeoKeyDirectory), etc.
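
Because tag values are returned raw, the GeoKeyDirectory (tag 34735) arrives as a flat array of SHORTs that you decode yourself. In the GeoTIFF layout, the first four values are a header (directory version, key revision, minor revision, number of keys) and each key is then four values (key ID, TIFF tag location, count, value or offset). The sketch below decodes that layout; `parse_geokey_directory` is a hypothetical helper, and it leaves values stored in other tags unresolved.

```python
from typing import Dict, Sequence, Tuple, Union

GeoKeyValue = Union[int, Tuple[int, int, int]]


def parse_geokey_directory(shorts: Sequence[int]) -> Dict[int, GeoKeyValue]:
    """Hypothetical helper: decode the raw GeoKeyDirectory SHORT array."""
    if len(shorts) < 4:
        raise ValueError("GeoKeyDirectory header needs at least 4 values")
    num_keys = shorts[3]
    if len(shorts) < 4 + 4 * num_keys:
        raise ValueError("GeoKeyDirectory is shorter than its key count implies")
    keys: Dict[int, GeoKeyValue] = {}
    for index in range(num_keys):
        start = 4 + 4 * index
        key_id, location, count, value_offset = shorts[start:start + 4]
        if location == 0:
            # Value is stored inline as a single SHORT
            keys[key_id] = value_offset
        else:
            # Value lives in another tag (e.g. 34736 or 34737); keep the reference
            keys[key_id] = (location, count, value_offset)
    return keys
```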

Using TagNameResolver for Name-Based Access

If you prefer human-readable tag names, wrap the dictionary with TagNameResolver. It translates names like "ImageWidth" to the corresponding numeric key ("256") behind the scenes.

from aws.osml.io import IO
from aws.osml.io.tiff.utils import TagNameResolver

with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata.entries()
    tags = TagNameResolver(meta)

    # Look up by name — same value as meta["256"]
    width = tags["ImageWidth"]
    height = tags["ImageLength"]

    # GeoTIFF tags work the same way
    scale = tags["ModelPixelScale"]
    geokeys = tags["GeoKeyDirectory"]

    # Safe access with a default value
    nodata = tags.get("GDALNoData", "nan")

    # Direct numeric access when you know the tag number
    raw = tags.by_number(34735)

    # Check if a tag is present
    if "Compression" in tags:
        print(f"Compression: {tags['Compression']}")

    # Iterate over all entries
    for key, value in tags:
        print(f"Tag {key}: {value}")

The resolver ships with a default mapping covering baseline TIFF 6.0 tags, GeoTIFF tags, and common GDAL tags. You can extend it with custom mappings for vendor-specific or application-specific tags:

custom_tags = TagNameResolver(meta, custom_mapping={
    "MyVendorTag": 65000,
    "CloudCover": 65001,
})

vendor_val = custom_tags["MyVendorTag"]
cloud = custom_tags["CloudCover"]

# Custom mappings override defaults if there's a name collision

Writing TIFF Metadata

When writing TIFF files, supply metadata using the same numeric key format. The writer infers the TIFF field type from the JSON value type for common cases. For types that can’t be inferred, use an explicit type annotation.

from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType

metadata = BufferedMetadataProvider()
metadata["259"] = 8                              # Compression: Deflate
metadata["33550"] = [0.5, 0.5, 0.0]             # ModelPixelScale → DOUBLE array
metadata["42113"] = "nan"                        # GDALNoData → ASCII

# For field types that can't be inferred (e.g. UNDEFINED), use an annotation:
metadata["700"] = {"value": [60, 120, 109, 108], "type": 7}  # XMP as UNDEFINED bytes

# Attach metadata to the provider — the writer sources all IFD tags from here
provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=256, block_height=256, pixel_type=PixelType.UInt8,
    metadata=metadata,
)
# image_data is the pixel array for the image, prepared elsewhere (not shown)
provider.set_full_image(image_data)

with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])

Writing with TagNameResolver

TagNameResolver is bidirectional — you can use it to build write metadata with human-readable names instead of numeric tag IDs. Assign values with resolver["TagName"] = value and the resolver stores them under the correct numeric key in the underlying dictionary.

For tags with well-known enumerated values (Compression, Predictor, PlanarConfiguration, SampleFormat, PhotometricInterpretation, Orientation), string values are resolved to their numeric equivalents automatically:

from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
from aws.osml.io.tiff.utils import TagNameResolver

metadata = BufferedMetadataProvider()
tag_dict = metadata.entries()
resolver = TagNameResolver(tag_dict)

# Set tags by name — stored under the correct numeric key
resolver["TileWidth"] = 512
resolver["TileLength"] = 512
resolver["ModelPixelScale"] = [0.5, 0.5, 0.0]

# Enumerated values resolve automatically
resolver["Compression"] = "LZW"           # stored as 5
resolver["Compression"] = "Deflate"       # stored as 8
resolver["Predictor"] = "Horizontal"      # stored as 2
resolver["SampleFormat"] = "Float"        # stored as 3

# Integer values pass through unchanged
resolver["Compression"] = 5               # also works

# Write resolved keys back to the metadata provider
for key, value in tag_dict.items():
    metadata[key] = value

provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=512, block_height=512, pixel_type=PixelType.Float32,
    metadata=metadata,
)
# image_data is the pixel array for the image, prepared elsewhere (not shown)
provider.set_full_image(image_data)

with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])

The supported enumerated value names (case-insensitive) are:

| Tag | Accepted names |
|---|---|
| Compression (259) | None, CCITTRLE, CCITTFax3, CCITTFax4, LZW, OJPEG, JPEG, Deflate, PackBits |
| PhotometricInterpretation (262) | MinIsWhite, MinIsBlack, RGB, Palette, Mask, YCbCr |
| PlanarConfiguration (284) | Chunky, Planar |
| Predictor (317) | None, Horizontal, FloatingPoint |
| SampleFormat (339) | UInt, Int, Float, Void |
| Orientation (274) | TopLeft, TopRight, BottomRight, BottomLeft, LeftTop, RightTop, RightBottom, LeftBottom |
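
The name-to-number resolution can be pictured as a case-insensitive lookup table per tag. The sketch below covers three of the tags using the standard TIFF codes; it illustrates the idea only and is not the resolver's implementation. `resolve_enum` is a hypothetical name.

```python
from typing import Union

# Standard TIFF codes for a subset of the tags listed above
ENUM_VALUES = {
    259: {  # Compression
        "None": 1, "CCITTRLE": 2, "CCITTFax3": 3, "CCITTFax4": 4, "LZW": 5,
        "OJPEG": 6, "JPEG": 7, "Deflate": 8, "PackBits": 32773,
    },
    317: {"None": 1, "Horizontal": 2, "FloatingPoint": 3},  # Predictor
    339: {"UInt": 1, "Int": 2, "Float": 3, "Void": 4},      # SampleFormat
}

# Lower-cased copy so lookups ignore case
_LOOKUP = {
    tag: {name.lower(): code for name, code in names.items()}
    for tag, names in ENUM_VALUES.items()
}


def resolve_enum(tag: int, value: Union[str, int]) -> int:
    """Illustrative sketch: map an enumerated name to its numeric TIFF code."""
    if isinstance(value, int):
        return value  # integers pass through unchanged
    try:
        return _LOOKUP[tag][value.lower()]
    except KeyError:
        raise ValueError(f"unknown value {value!r} for tag {tag}") from None
```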