Metadata¶
The Simple Path¶
For a quick look at image properties without reading any pixels, iminfo gives
you dimensions, band count, pixel type, and block layout in one call:
from aws.osml.io import iminfo
info = iminfo("image.ntf")
print(f"{info.width}x{info.height}, {info.bands} bands, {info.dtype}")
print(f"Block size: {info.block_size}")
print(f"Resolution levels: {info.num_resolution_levels}")
iminfo also includes the full format-specific metadata dictionary for the
image segment, so you can inspect compression, TREs, TIFF tags, and other
fields without dropping down to the low-level API:
# NITF: subheader fields and TREs
info = iminfo("image.ntf")
print(info.metadata["IC"]) # "C8" (JPEG 2000)
print(info.metadata["IGEOLO"]) # 60-char geographic location string
# TREs are nested dicts
if "GEOLOB" in info.metadata:
    print(info.metadata["GEOLOB"]["ARV"])
# GeoTIFF: IFD tags keyed by numeric tag ID
info = iminfo("image.tif")
print(info.metadata["259"]) # Compression tag value
print(info.metadata["33550"]) # ModelPixelScale
The metadata dict is a snapshot — a plain Python dictionary captured when
iminfo is called. It is not a live reference to the file.
When you need more control — prefix filtering, dataset-level metadata, or
write-side metadata — the full MetadataProvider interface described below
gives you access to everything in the file.
The MetadataProvider Interface¶
All assets and datasets expose metadata through the MetadataProvider interface,
regardless of the underlying file format. MetadataProvider implements the standard
Python collections.abc.Mapping protocol, so you access metadata fields the same
way you access items in a dictionary:
from aws.osml.io import IO
with IO.open(["image.ntf"], "r") as dataset:
    # Dataset-level metadata: dict-style access
    dataset.metadata["FTITLE"]  # KeyError if missing
    dataset.metadata.get("FTITLE")  # None if missing
    "FTITLE" in dataset.metadata  # membership test
    len(dataset.metadata)  # number of fields
    # Bulk export
    all_meta = dataset.metadata.entries()  # full dict (single call, fast path)
    filtered = dataset.metadata.entries("FS")  # keys starting with "FS"
    # Asset-level metadata
    image = dataset.get_asset("image:0")
    image_meta = image.metadata.entries()
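To see what the Mapping protocol provides, here is a minimal stand-in written against the standard library only. The class name `DictMetadata` is hypothetical and this is not the library's implementation; it only sketches how `get()`, membership tests, and `len()` follow from three methods, and how a prefix-filtered `entries()` could behave on a plain dictionary.

```python
from collections.abc import Mapping


class DictMetadata(Mapping):
    """Hypothetical read-only stand-in for a metadata provider."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]  # raises KeyError if the field is missing

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)

    def entries(self, prefix=""):
        # Assumed behavior: copy every field whose key starts with the prefix
        return {k: v for k, v in self._fields.items() if k.startswith(prefix)}


meta = DictMetadata({"FTITLE": "Demo", "FSCLAS": "U", "FSCLSY": "US"})
print(meta.get("ONAME"))   # get() is inherited from Mapping
print("FTITLE" in meta)    # so is the membership test
print(meta.entries("FS"))  # only the keys starting with "FS"
```

Because `entries()` returns a new dictionary, changes to the result do not affect the provider in this sketch.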
When writing, use BufferedMetadataProvider to build metadata. It implements
collections.abc.MutableMapping, so you write fields with dictionary syntax:
from aws.osml.io import BufferedMetadataProvider
meta = BufferedMetadataProvider()
meta["FTITLE"] = "My File Title"
The dictionary keys and value types are format-specific. A NITF file uses
field names like FTITLE and ISCLAS; a GeoTIFF uses numeric tag IDs like
"256" and "33550". There is no translation layer — you work directly with
the native field names from whatever format you opened.
The rest of this page covers each format’s metadata conventions in detail.
NITF / NSIF Metadata¶
NITF files carry metadata in fixed-width ASCII header fields, security classification blocks, and Tagged Record Extensions (TREs). The library exposes all of these through the Mapping interface using the standard NITF field names.
Reading NITF Metadata¶
Header and Subheader Fields¶
Most NITF fields are ASCII strings — even numeric values like row counts and
compression ratios. A few TREs use binary integer fields, which come through
as Python int directly.
from aws.osml.io import IO
with IO.open(["image.ntf"], "r") as dataset:
    # File header fields: dict-style access
    title = dataset.metadata["FTITLE"]
    classification = dataset.metadata["FSCLAS"]  # "U", "R", "C", "S", or "T"
    # Image subheader fields
    image = dataset.get_asset("image:0")
    meta = image.metadata
    image_id = meta["IID1"]  # "IMG_00001"
    compression = meta["IC"]  # "C8"
    date_time = meta["IDATIM"]  # "20231215103045"
    # Numeric fields are ASCII strings, so cast as needed
    num_rows = int(meta["NROWS"])  # 2048
    num_cols = int(meta["NCOLS"])  # 2048
    num_bands = int(meta["NBANDS"])  # 3
    # Coordinate strings: NITF packs 4 corners into a single field
    if "IGEOLO" in meta:
        geo = meta["IGEOLO"]  # 60-char geographic location string
    # Safe access for conditional fields
    comrat = meta.get("COMRAT")  # None if IC is "NC" or "NM"
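Since the library hands back IGEOLO as a raw 60-character string, turning it into numeric corners is left to the caller. The sketch below assumes the geographic layout used when ICORDS is "G": four 15-character corners, each formatted as ddmmssXdddmmssY. Other ICORDS values use different layouts and are not handled. The helper names and the sample string are illustrative, not part of the library.

```python
def dms_to_degrees(degrees, minutes, seconds, hemisphere):
    """Convert degree/minute/second digit strings to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    return -value if hemisphere in ("S", "W") else value


def parse_igeolo_geographic(igeolo):
    """Split a 60-char IGEOLO (ICORDS == "G") into four (lat, lon) tuples."""
    if len(igeolo) != 60:
        raise ValueError("IGEOLO must be exactly 60 characters")
    corners = []
    for start in range(0, 60, 15):
        corner = igeolo[start:start + 15]
        # ddmmssX: latitude, X is N or S
        lat = dms_to_degrees(corner[0:2], corner[2:4], corner[4:6], corner[6])
        # dddmmssY: longitude, Y is E or W
        lon = dms_to_degrees(corner[7:10], corner[10:12], corner[12:14], corner[14])
        corners.append((lat, lon))
    return corners


# Made-up sample value with four corners
sample = "383000N0770000W383000N0763000W380000N0763000W380000N0770000W"
corners = parse_igeolo_geographic(sample)
print(corners[0])  # (38.5, -77.0)
```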
TRE Fields as Nested Dictionaries¶
TRE (Tagged Record Extension) fields are grouped under their CETAG as nested
dictionaries. Each TRE with a known definition in the StructureRegistry
appears as a top-level key mapped to a dict of its fields:
# Access TRE fields through nested dictionaries
geolob = meta["GEOLOB"] # dict
arv = geolob["ARV"] # "000360000"
brv = geolob["BRV"] # "000360000"
# Or access in one step
arv = meta["GEOLOB"]["ARV"]
# TREs with repeated fields contain arrays
j2klra = meta["J2KLRA"] # dict
layers = j2klra["LAYERS"] # list of dicts
first_layer = layers[0] # {"LAYER_ID": "000", "BITRATE": "0.031250"}
Unknown TREs (those without a definition in the registry) appear with their raw data preserved:
# Unknown TRE — raw hex data and byte length
unknown = meta["UNKNWN"] # {"_raw": "0102030405", "_length": 5}
raw_hex = unknown["_raw"]
byte_count = unknown["_length"]
Overflow TREs stored in data extension segments are resolved automatically — you don’t need to chase them across segments.
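If you need the payload of an unknown TRE as bytes, the hex string can be decoded with the standard library. This is a small sketch that assumes the `_raw` and `_length` keys shown above; the helper name `decode_raw_tre` is hypothetical.

```python
def decode_raw_tre(tre):
    """Decode the hex payload of an unknown TRE and check its declared length."""
    payload = bytes.fromhex(tre["_raw"])
    if len(payload) != tre["_length"]:
        raise ValueError(
            f"expected {tre['_length']} bytes, decoded {len(payload)}"
        )
    return payload


unknown = {"_raw": "0102030405", "_length": 5}
payload = decode_raw_tre(unknown)
print(list(payload))  # [1, 2, 3, 4, 5]
```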
Repeated Fields as Arrays¶
Repeated fields in the image subheader (like band info) appear as Python lists instead of individual indexed entries:
# Band info is a list of dicts, one per band
bands = meta["BAND_INFO"] # list of dicts
for i, band in enumerate(bands):
    print(f"Band {i}: IREPBAND={band['IREPBAND']}, NLUTS={band['NLUTS']}")
# Access a specific band directly
first_band = meta["BAND_INFO"][0]
irepband = first_band["IREPBAND"] # "R"
Prefix Filtering¶
Use entries(prefix) to retrieve a subset of metadata. For subheader fields,
the prefix matches field names. For TREs, the prefix matches the CETAG:
# Get all fields starting with "FS" (file security fields)
# Returns: FSCLAS, FSCLSY, FSCODE, FSCTLH, FSREL, FSDCTP, ...
security = dataset.metadata.entries("FS")
# Get a specific TRE by CETAG
geolob_only = image.metadata.entries("GEOLOB")
# Returns: {"GEOLOB": {"ARV": "...", "BRV": "...", ...}}
Value Types Summary¶
The Python types you get back depend on how the field is defined in the underlying structure definition:
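| Definition | Python type | Example |
|---|---|---|
| ASCII string fields | str | "C8" |
| Binary integers | int | |
| Repeated fields (band info, etc.) | list of dict | meta["BAND_INFO"] |
| Known TREs | dict | meta["GEOLOB"] |
| Unknown TREs | dict with _raw and _length | {"_raw": "0102030405", "_length": 5} |
| Binary byte fields | list of int | [255, 255, 255] |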
Writing NITF Metadata¶
When writing NITF files, you control header fields by setting metadata on the writer and on individual assets. The writer reads user-settable fields from the metadata provider and falls back to sensible defaults when a field is absent.
File Header Fields¶
Set file-level metadata using BufferedMetadataProvider and assign it to the
writer’s metadata property:
from aws.osml.io import IO, BufferedMetadataProvider
file_meta = BufferedMetadataProvider()
file_meta["FTITLE"] = "Reconnaissance Mission 2026-03-15"
file_meta["ONAME"] = "Sensor Operator"
file_meta["OPHONE"] = "555-0100"
file_meta["FDT"] = "20260315120000"
file_meta["OSTAID"] = "STATION1"
file_meta["CLEVEL"] = "05"
# Security classification fields use the FS prefix
file_meta["FSCLAS"] = "S"
file_meta["FSCLSY"] = "US"
file_meta["FSCODE"] = "SECRET"
file_meta["FSREL"] = "USA GBR"
# FBKGC is a 3-byte binary field (RGB background color)
# Set it as a list of integers
file_meta["FBKGC"] = [255, 255, 255]
writer = IO.open(["output.ntf"], "w", "nitf")
writer.metadata = file_meta
# ... add assets and close
Fields you don’t set keep their defaults — FSCLAS defaults to "U",
OSTAID defaults to "OSML_IO", CLEVEL defaults to "03", and text
fields default to blank.
Image Subheader Fields¶
Image assets read several fields from metadata (IID1, IDATIM, TGTID,
IID2, ISORCE). The security classification block and category fields are
also metadata-driven:
image_meta = BufferedMetadataProvider()
# Identification fields
image_meta["IID1"] = "IMG_00001"
image_meta["IDATIM"] = "20260315103045"
image_meta["ISORCE"] = "Satellite XYZ"
# Security fields use the IS prefix
image_meta["ISCLAS"] = "S"
image_meta["ISCLSY"] = "US"
image_meta["ISREL"] = "USA"
# Image category and coordinate representation
image_meta["ICAT"] = "SAR"
image_meta["ICORDS"] = "G"
Fields derived from the image data itself — NROWS, NCOLS, PVTYPE,
IREP, NBPP, ABPP, NBANDS, and blocking parameters — are always
computed from the ImageAssetProvider and cannot be overridden through
metadata.
Text, Graphic, and DES Subheader Fields¶
Text, graphic, and data extension segment subheaders follow the same pattern. Set fields on the asset’s metadata provider before adding it to the writer:
# Text asset metadata (TS prefix for security fields)
text_meta = BufferedMetadataProvider()
text_meta["TXTDT"] = "20260315120000"
text_meta["TXTFMT"] = "STA"
text_meta["TSCLAS"] = "C"
# Graphic asset metadata (SS prefix for security fields)
graphic_meta = BufferedMetadataProvider()
graphic_meta["SFMT"] = "C"
graphic_meta["SDLVL"] = "002"
graphic_meta["SLOC"] = "0050000100"
graphic_meta["SSCLAS"] = "U"
# DES metadata (DES prefix for security fields, but DECLAS for classification)
des_meta = BufferedMetadataProvider()
des_meta["DESVER"] = "02"
des_meta["DECLAS"] = "U"
des_meta["DESCLSY"] = "US"
Security Classification Fields¶
Every NITF subheader contains the same 16-field security classification block. The field names use a prefix that varies by segment type:
| Segment | Prefix | Example |
|---|---|---|
| File header | FS | FSCLAS |
| Image | IS | ISCLAS |
| Text | TS | TSCLAS |
| Graphic | SS | SSCLAS |
| DES | DES | DESCLSY (classification is DECLAS) |
The 16 fields in each block (after the prefix) are: CLAS, CLSY, CODE,
CTLH, REL, DCTP, DCDT, DCXM, DG, DGDT, CLTX, CATP,
CAUT, CRSN, SRDT, CTLN.
The classification field (CLAS) defaults to "U"; every other field defaults to blank.
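The prefix rule can be captured in a few lines. The sketch below builds the full field names for a segment from the suffixes listed above, including the DECLAS exception for DES segments, and fills in the stated defaults. The helper names are hypothetical, and representing a blank field as an empty string is an assumption made for illustration.

```python
SECURITY_SUFFIXES = [
    "CLAS", "CLSY", "CODE", "CTLH", "REL", "DCTP", "DCDT", "DCXM",
    "DG", "DGDT", "CLTX", "CATP", "CAUT", "CRSN", "SRDT", "CTLN",
]


def security_field_names(prefix):
    """Return the security field names for a segment prefix (FS, IS, TS, SS, DES)."""
    names = [prefix + suffix for suffix in SECURITY_SUFFIXES]
    if prefix == "DES":
        # The DES classification field is named DECLAS rather than DESCLAS
        names[0] = "DECLAS"
    return names


def default_security_block(prefix):
    """Build a dict of defaults: classification "U", everything else blank."""
    names = security_field_names(prefix)
    block = {name: "" for name in names}
    block[names[0]] = "U"
    return block


print(security_field_names("IS")[:3])   # ['ISCLAS', 'ISCLSY', 'ISCODE']
print(security_field_names("DES")[:2])  # ['DECLAS', 'DESCLSY']
```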
Computed vs. User-Settable Fields¶
Some fields are always computed by the writer and cannot be overridden:
- FHDR, FVER: determined by the output format (NITF 2.1 / NSIF 1.0)
- FL, HL: computed from actual file and header lengths
- NUMI, NUMS, NUMT, NUMDES, NUMRES: segment counts
- Segment length arrays (LISH/LI, LSSH/LS, etc.)
- ENCRYP: always "0" (unencrypted)
- Image dimensions, pixel type, blocking parameters: derived from image data
Writing TREs¶
Set TREs as nested dicts using dictionary syntax, matching the format returned by the reader:
image_meta["GEOLOB"] = {
    "ARV": "000360000",
    "BRV": "000180000",
    "LSO": "-077.0000000000",
    "PSO": "+038.0000000000",
}
Numeric fields (BCS-N encoding) are auto-formatted to their defined width: short values are left-padded with zeros, and overly precise values are reformatted to fit. You can pass natural representations:
image_meta["ICHIPB"] = {
    "OP_ROW_11": "0.5",  # auto-padded to "0000000000.5" (12 bytes)
    "FI_ROW": "768",  # auto-padded to "00000768" (8 bytes)
    # ...
}
Text fields (BCS-A) are right-padded with spaces if short and rejected if too long. Any value that still does not fit its field after formatting raises an error.
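The padding rules can be approximated with plain string methods. The sketch below is not the writer's code: the function names are hypothetical, and it deliberately skips the reformatting of overly precise numbers, raising an error instead whenever a value is longer than its field.

```python
def format_bcs_n(value, width):
    """Left-pad a numeric string with zeros to the field width."""
    padded = value.zfill(width)  # zfill keeps a leading sign in front
    if len(padded) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return padded


def format_bcs_a(value, width):
    """Right-pad a text string with spaces; reject values that are too long."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return value.ljust(width)


print(format_bcs_n("0.5", 12))      # 0000000000.5
print(format_bcs_n("768", 8))       # 00000768
print(repr(format_bcs_a("STA", 5)))  # 'STA  '
```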
Encoding Tolerance¶
NITF fields declare a character encoding that constrains what bytes are valid.
For example, BCS-NPI (Numeric Positive Integer) only permits digits and spaces
per the JBP specification. In practice, real-world NITF producers frequently
violate these constraints — the RPC00B TRE’s HEIGHT_SCALE field is commonly
written with a leading + sign despite being declared BCS-NPI.
By default, the writer uses permissive validation: numeric fields (BCS-N and BCS-NPI) accept any printable ASCII character (the BCS-A range, 0x20–0x7E). This ensures that metadata read from real-world files can be written back without error — a round-trip that would otherwise fail on spec-violating values.
If you need output that is strictly compliant with the NITF encoding specifications, enable strict mode on the writer:
with IO.open(["output.ntf"], "w", "nitf") as writer:
    writer.strict_encoding = True  # reject values that violate declared encodings
    writer.add_asset("image:0", provider, "Image", "", ["data"])
In strict mode, writing "+0697" to a BCS-NPI field raises a validation error
because + is not in the BCS-NPI character set. This is useful when producing
files that must pass formal conformance checks.
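The difference between the two modes comes down to which character set a value is checked against. The following sketch models that idea with the sets described above: digits and spaces for BCS-NPI in strict mode, and the printable ASCII range 0x20 to 0x7E otherwise. The function and table names are hypothetical and the library's real validation may differ in detail.

```python
import string

# Printable ASCII, 0x20 through 0x7E (the BCS-A range)
PRINTABLE_ASCII = {chr(code) for code in range(0x20, 0x7F)}

# Character sets applied in strict mode (only the encodings discussed here)
STRICT_CHARSETS = {
    "BCS-A": PRINTABLE_ASCII,
    "BCS-NPI": set(string.digits + " "),
}


def invalid_characters(value, encoding, strict=False):
    """Return the characters in value that the chosen mode would reject."""
    allowed = STRICT_CHARSETS[encoding] if strict else PRINTABLE_ASCII
    return sorted(set(value) - allowed)


print(invalid_characters("+0697", "BCS-NPI"))               # []
print(invalid_characters("+0697", "BCS-NPI", strict=True))  # ['+']
```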
Extending NITF Metadata with Structure Definitions¶
The metadata you access from NITF files is produced by a
data-driven parsing framework. The library uses declarative YAML-based
structure definition files (.ksy format, inspired by
Kaitai Struct) to describe binary layouts. These
definitions control both reading and writing — the same file that tells the
parser how to extract fields from a binary header also tells the writer how
to serialize them back.
This means you can extend the metadata the library understands by adding new
structure definition files. If a TRE, DES, or other NITF metadata structure
isn’t already supported, you can write a .ksy definition for it and register
it with the StructureRegistry.
Configuring the StructureRegistry¶
The StructureRegistry manages all structure definitions. By default it loads
definitions from the package’s built-in data/structures/ directory, which
includes NITF file headers, image subheaders, and many common TREs. You can
extend it with your own definitions:
from aws.osml.io import StructureRegistry
# Create a registry (loads built-in definitions automatically)
registry = StructureRegistry()
# See what's already available
for name in registry.list():
    print(name)
# NITF_02.10_FileHeader, NITF_02.10_ImageSubheader, TRE_GEOLOB,
# TRE_RPC00B, TRE_SENSRB, TRE_USE00A, ... (70+ definitions)
# Add a directory containing your custom .ksy files
registry.add_search_path("/path/to/my/structures")
# Retrieve a specific definition
geolob_def = registry.get("TRE_GEOLOB")
# Reload definitions after editing .ksy files on disk
registry.reload()
You can also set the OSML_IO_STRUCTURE_PATH environment variable to add
search paths without changing code. Separate multiple paths with :.
export OSML_IO_STRUCTURE_PATH="/team/shared/structures:/project/custom/structures"
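The same variable can be set from Python, for example in a test harness. This sketch only builds and assigns the value; it assumes the variable must be in place before the library reads it, which is not stated above, so set it as early as possible.

```python
import os

# Directories taken from the shell example above
custom_dirs = ["/team/shared/structures", "/project/custom/structures"]

# Join with ":" as described, then publish to the process environment
os.environ["OSML_IO_STRUCTURE_PATH"] = ":".join(custom_dirs)
print(os.environ["OSML_IO_STRUCTURE_PATH"])
```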
How Definitions Are Used¶
Structure definitions drive both directions of the pipeline:
When reading, the parser uses the definition to locate fields in the binary data, apply the correct encoding (BCS-A, BCS-N, etc.), evaluate conditional and repeated fields, and populate the metadata dictionary.
When writing, the writer uses the same definition to serialize metadata values back into the correct binary layout, validating field sizes and encodings along the way.
Adding a new .ksy file for a TRE automatically enables both reading and
writing that TRE — no code changes required.
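The idea of one definition serving both directions can be shown with a toy layout. The sketch below is not the .ksy format or the library's parser: it uses a plain list of (name, width) pairs, with widths taken from the lengths of the GEOLOB example values shown earlier, and drives both a parser and a serializer from that single list.

```python
# Toy layout: field names and byte widths (assumed from the example values)
GEOLOB_LAYOUT = [("ARV", 9), ("BRV", 9), ("LSO", 15), ("PSO", 15)]


def parse_fields(data, layout):
    """Slice fixed-width ASCII fields out of a byte string."""
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = data[offset:offset + width].decode("ascii")
        offset += width
    return fields


def serialize_fields(fields, layout):
    """Write fields back in layout order, checking each width."""
    out = bytearray()
    for name, width in layout:
        value = fields[name]
        if len(value) != width:
            raise ValueError(f"{name} must be {width} bytes, got {len(value)}")
        out += value.encode("ascii")
    return bytes(out)


geolob = {
    "ARV": "000360000",
    "BRV": "000180000",
    "LSO": "-077.0000000000",
    "PSO": "+038.0000000000",
}
raw = serialize_fields(geolob, GEOLOB_LAYOUT)
print(parse_fields(raw, GEOLOB_LAYOUT) == geolob)  # True
```

Adding a field to the layout list changes both functions at once, which is the property the structure definitions provide at a larger scale.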
Lower-Level Access with StructureAccessor¶
For lower-level access with built-in type conversion, the StructureAccessor
returns Value objects with as_str(), as_int(), and as_float() methods
that handle NITF’s ASCII-numeric conventions (e.g. parsing "003" as 3).
Writing Your Own Structure Definitions¶
Structure definition files use a YAML-based format with support for field types, conditional presence, repeat expressions, and nested structures. For the full syntax reference, expression language details, and examples, see the Structure Definition Guide.
TIFF and GeoTIFF Metadata¶
For TIFF and GeoTIFF files, the metadata dictionary uses numeric TIFF tag IDs
as keys. Each key is the string representation of the tag number from the
TIFF 6.0 specification — for example, "256" for ImageWidth, "259" for
Compression, "33550" for ModelPixelScale.
This design means every tag in the IFD is preserved, including private-use tags (32768+) and vendor-specific tags that would otherwise be dropped by a hardcoded name list. The raw tag values are stored directly, with no interpretation or transformation applied.
Reading TIFF Metadata¶
from aws.osml.io import IO
with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata
    # Tags are keyed by their numeric ID as a string
    width = meta["256"]  # ImageWidth
    height = meta["257"]  # ImageLength
    bits = meta["258"]  # BitsPerSample
    compression = meta["259"]  # Compression
    # GeoTIFF tags use the same numeric key convention
    pixel_scale = meta["33550"]  # ModelPixelScale, e.g. [0.5, 0.5, 0.0]
    tiepoints = meta["33922"]  # ModelTiepoint
    geokeys = meta["34735"]  # GeoKeyDirectory (raw SHORT array)
    # Dataset-level entries use descriptive string keys
    byte_order = meta["ByteOrder"]  # "LittleEndian"
    num_dirs = meta["NumberOfDirectories"]  # 3
    # Prefix filtering works on the numeric key strings
    tags_3xx = dataset.metadata.entries("3")
    # Returns "322" (TileWidth), "323" (TileLength), "339" (SampleFormat),
    # "33550" (ModelPixelScale), "34735" (GeoKeyDirectory), etc.
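Because tag values are returned without interpretation, the GeoKeyDirectory arrives as a flat array that you decode yourself. The sketch below follows the GeoTIFF layout: a four-value header (version, revision, minor revision, key count) followed by one four-value entry per key (key ID, tag location, count, value or offset). It assumes the value is a flat list of integers; the function name and sample array are illustrative.

```python
def parse_geokey_directory(shorts):
    """Decode a flat GeoKeyDirectory array into a header and a key table."""
    version, revision, minor, count = shorts[:4]
    if len(shorts) < 4 + 4 * count:
        raise ValueError("GeoKeyDirectory is shorter than its key count implies")
    keys = {}
    for index in range(count):
        start = 4 + 4 * index
        key_id, location, value_count, value_offset = shorts[start:start + 4]
        if location == 0:
            # Location 0 means the value is stored inline in the entry
            keys[key_id] = value_offset
        else:
            # Otherwise the value lives in another tag (for example 34736 or 34737)
            keys[key_id] = {"tag": location, "count": value_count, "offset": value_offset}
    return {"version": (version, revision, minor), "keys": keys}


sample = [
    1, 1, 0, 3,        # header: version 1.1.0, three keys
    1024, 0, 1, 2,     # GTModelTypeGeoKey = 2 (geographic)
    1025, 0, 1, 1,     # GTRasterTypeGeoKey = 1 (pixel is area)
    2048, 0, 1, 4326,  # geographic CRS key = EPSG code 4326
]
directory = parse_geokey_directory(sample)
print(directory["keys"][2048])  # 4326
```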
Using TagNameResolver for Name-Based Access¶
If you prefer human-readable tag names, wrap the dictionary with
TagNameResolver. It translates names like "ImageWidth" to the
corresponding numeric key ("256") behind the scenes.
from aws.osml.io import IO
from aws.osml.io.tiff.utils import TagNameResolver
with IO.open(["image.tif"], "r") as dataset:
    meta = dataset.metadata.entries()
    tags = TagNameResolver(meta)
    # Look up by name: same value as meta["256"]
    width = tags["ImageWidth"]
    height = tags["ImageLength"]
    # GeoTIFF tags work the same way
    scale = tags["ModelPixelScale"]
    geokeys = tags["GeoKeyDirectory"]
    # Safe access with a default value
    nodata = tags.get("GDALNoData", "nan")
    # Direct numeric access when you know the tag number
    raw = tags.by_number(34735)
    # Check if a tag is present
    if "Compression" in tags:
        print(f"Compression: {tags['Compression']}")
    # Iterate over all entries
    for key, value in tags:
        print(f"Tag {key}: {value}")
The resolver ships with a default mapping covering baseline TIFF 6.0 tags, GeoTIFF tags, and common GDAL tags. You can extend it with custom mappings for vendor-specific or application-specific tags:
custom_tags = TagNameResolver(meta, custom_mapping={
    "MyVendorTag": 65000,
    "CloudCover": 65001,
})
vendor_val = custom_tags["MyVendorTag"]
cloud = custom_tags["CloudCover"]
# Custom mappings override defaults if there's a name collision
Writing TIFF Metadata¶
When writing TIFF files, supply metadata using the same numeric key format. The writer infers the TIFF field type from the JSON value type for common cases. For types that can’t be inferred, use an explicit type annotation.
from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
metadata = BufferedMetadataProvider()
metadata["259"] = 8 # Compression: Deflate
metadata["33550"] = [0.5, 0.5, 0.0] # ModelPixelScale → DOUBLE array
metadata["42113"] = "nan" # GDALNoData → ASCII
# For field types that can't be inferred (e.g. UNDEFINED), use an annotation:
metadata["700"] = {"value": [60, 120, 109, 108], "type": 7} # XMP as UNDEFINED bytes
# Attach metadata to the provider — the writer sources all IFD tags from here
provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=256, block_height=256, pixel_type=PixelType.UInt8,
    metadata=metadata,
)
# image_data: pixel array prepared elsewhere (not shown here)
provider.set_full_image(image_data)
with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])
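To make the inference rule concrete, here is one way such a mapping from Python values to TIFF field types could look. It is a guess at the behavior, not the writer's code: the string to ASCII, float list to DOUBLE, and explicit annotation cases follow the example above, while choosing SHORT or LONG for integers by range is an assumption. The type codes are the standard TIFF ones.

```python
# Standard TIFF field type codes
TIFF_ASCII, TIFF_SHORT, TIFF_LONG, TIFF_UNDEFINED, TIFF_DOUBLE = 2, 3, 4, 7, 12


def infer_tiff_type(value):
    """Return (type_code, normalized_value) for a metadata value."""
    if isinstance(value, dict):
        # Explicit annotation: {"value": ..., "type": ...}
        return value["type"], value["value"]
    if isinstance(value, str):
        return TIFF_ASCII, value
    items = list(value) if isinstance(value, (list, tuple)) else [value]
    if not items or any(isinstance(item, bool) for item in items):
        raise TypeError(f"cannot infer a TIFF type for {value!r}")
    if all(isinstance(item, int) for item in items):
        # Assumption: use SHORT when every value fits in 16 unsigned bits
        fits_short = all(0 <= item <= 0xFFFF for item in items)
        return (TIFF_SHORT if fits_short else TIFF_LONG), items
    if all(isinstance(item, (int, float)) for item in items):
        return TIFF_DOUBLE, [float(item) for item in items]
    raise TypeError(f"cannot infer a TIFF type for {value!r}")


print(infer_tiff_type(8))                # (3, [8])
print(infer_tiff_type([0.5, 0.5, 0.0]))  # (12, [0.5, 0.5, 0.0])
print(infer_tiff_type("nan"))            # (2, 'nan')
```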
Writing with TagNameResolver¶
TagNameResolver is bidirectional — you can use it to build write metadata
with human-readable names instead of numeric tag IDs. Assign values with
resolver["TagName"] = value and the resolver stores them under the correct
numeric key in the underlying dictionary.
For tags with well-known enumerated values (Compression, Predictor, PlanarConfiguration, SampleFormat, PhotometricInterpretation, Orientation), string values are resolved to their numeric equivalents automatically:
from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
from aws.osml.io.tiff.utils import TagNameResolver
metadata = BufferedMetadataProvider()
tag_dict = metadata.entries()
resolver = TagNameResolver(tag_dict)
# Set tags by name — stored under the correct numeric key
resolver["TileWidth"] = 512
resolver["TileLength"] = 512
resolver["ModelPixelScale"] = [0.5, 0.5, 0.0]
# Enumerated values resolve automatically
resolver["Compression"] = "LZW" # stored as 5
resolver["Compression"] = "Deflate" # stored as 8
resolver["Predictor"] = "Horizontal" # stored as 2
resolver["SampleFormat"] = "Float" # stored as 3
# Integer values pass through unchanged
resolver["Compression"] = 5 # also works
# Write resolved keys back to the metadata provider
for key, value in tag_dict.items():
    metadata[key] = value
provider = BufferedImageAssetProvider.create(
    key="image:0", num_columns=512, num_rows=512, num_bands=1,
    block_width=512, block_height=512, pixel_type=PixelType.Float32,
    metadata=metadata,
)
provider.set_full_image(image_data)
with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset("image:0", provider, "Image", "desc", ["data"])
The supported enumerated value names (case-insensitive) are:
| Tag | Accepted names |
|---|---|
| Compression (259) | None, CCITTRLE, CCITTFax3, CCITTFax4, LZW, OJPEG, JPEG, Deflate, PackBits |
| PhotometricInterpretation (262) | MinIsWhite, MinIsBlack, RGB, Palette, Mask, YCbCr |
| PlanarConfiguration (284) | Chunky, Planar |
| Predictor (317) | None, Horizontal, FloatingPoint |
| SampleFormat (339) | UInt, Int, Float, Void |
| Orientation (274) | TopLeft, TopRight, BottomRight, BottomLeft, LeftTop, RightTop, RightBottom, LeftBottom |
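A case-insensitive lookup of this kind is straightforward to model. The sketch below covers three of the tags using the standard TIFF code values; the table and function names are hypothetical, and it stands in for the resolver's behavior rather than reproducing it.

```python
# Lowercased names mapped to TIFF code values, keyed by tag number
ENUM_VALUES = {
    259: {  # Compression
        "none": 1, "ccittrle": 2, "ccittfax3": 3, "ccittfax4": 4,
        "lzw": 5, "ojpeg": 6, "jpeg": 7, "deflate": 8, "packbits": 32773,
    },
    317: {"none": 1, "horizontal": 2, "floatingpoint": 3},  # Predictor
    339: {"uint": 1, "int": 2, "float": 3, "void": 4},      # SampleFormat
}


def resolve_enum(tag, value):
    """Map a name to its numeric code; integers pass through unchanged."""
    if isinstance(value, int):
        return value
    try:
        return ENUM_VALUES[tag][value.lower()]
    except KeyError:
        raise ValueError(f"unknown value {value!r} for tag {tag}") from None


print(resolve_enum(259, "LZW"))         # 5
print(resolve_enum(259, "deflate"))     # 8
print(resolve_enum(317, "Horizontal"))  # 2
```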