Working with Pixels

The Simple Path

For most pixel workflows, the convenience functions give you NumPy arrays directly without thinking about blocks or assets:

from aws.osml.io import imread, tiles

# Full image as a CHW NumPy array
pixels = imread("image.ntf")
print(pixels.shape)  # (3, 1024, 1024) — (bands, height, width)
print(pixels.dtype)  # uint8

# Process a large image in tiles without loading it all into memory
for tile in tiles("large_image.tif", tile_size=(256, 256)):
    process(tile.data)  # tile.data is a CHW NumPy array

The sections below cover the details of how pixel data is represented, how to convert between channel orderings for different libraries, and how to work with in-memory image buffers for writing.

Image Data Arrays

Block data is returned as a NumPy ndarray. NumPy is the standard array interface shared by machine learning frameworks (PyTorch, TensorFlow), computer vision libraries (OpenCV, Pillow), and the broader scientific Python ecosystem. Returning pixel data as an ndarray means you can pass blocks directly into these tools without an intermediate copy or conversion step.

For pixels wider than 8 bits (e.g. 16-bit or 32-bit imagery), the library automatically converts from the format’s stored byte order to the native byte order of your platform. NITF files store multi-byte values in big-endian order; on a little-endian machine the bytes are swapped during decode so the resulting ndarray is ready to use without manual conversion. The NumPy dtype is selected automatically based on the image’s pixel type — an 8-bit unsigned image produces a uint8 array, a 16-bit signed image produces int16, a 32-bit float produces float32, and so on.

Each array has shape (bands, rows, cols) — a channels-first (CHW) layout. This matches the convention used by PyTorch and many deep learning pipelines, where a batch of images is shaped (N, C, H, W). Channels-first ordering is also convenient for remote sensing workflows where analysis steps typically operate on a subset of spectral bands.

Other libraries expect different channel orderings:

Library

Format

Shape

osml-imagery-io

Channels First (CHW)

(bands, rows, cols)

PyTorch

Channels First (NCHW)

(batch, channels, height, width)

OpenCV

Channels Last (HWC)

(rows, cols, channels)

Pillow

Channels Last (HWC)

(height, width, channels)

Use np.transpose to reshape a block for channels-last libraries:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from aws.osml.io import IO

with IO.open(["image.ntf"], "r") as dataset:
    image = dataset.get_asset("image:0")
    block_chw = image.get_block(0, 0, resolution_level=0)

    # Convert to channels-last for display
    block_hwc = np.transpose(block_chw, (1, 2, 0))

    # Display with matplotlib
    plt.imshow(block_hwc)
    plt.title("Block (0, 0)")
    plt.show()

    # Or convert to a PIL Image for further manipulation
    pil_image = Image.fromarray(block_hwc)

Creating an Image from Scratch

BufferedImageAssetProvider and BufferedMetadataProvider let you build images and their associated metadata entirely in memory. BufferedMetadataProvider is a mutable key-value store for encoding hints and format fields — things like compression type (IC) and interleave mode (IMODE). BufferedImageAssetProvider holds the pixel data and implements the same ImageAssetProvider interface used by file-backed images, so in-memory images can be passed to any API that accepts an image asset, including the writer. This is useful for creating synthetic test data, assembling mosaics, or building images from processed results.

You can populate the image all at once with set_full_image():

from aws.osml.io import BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
import numpy as np

metadata = BufferedMetadataProvider()
metadata["IC"] = "NC"
metadata["IMODE"] = "B"

image_data = np.random.randint(0, 255, (3, 512, 512), dtype=np.uint8)

provider = BufferedImageAssetProvider.create(
    key="synthetic_image",
    num_columns=512,
    num_rows=512,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)
provider.set_full_image(image_data)

For large images or sparse data, set blocks individually with set_block() instead of loading the full image into memory:

provider = BufferedImageAssetProvider.create(
    key="tiled_image",
    num_columns=1024,
    num_rows=1024,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)

for row in range(4):
    for col in range(4):
        block = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
        provider.set_block(row, col, block)

JPEG Color Space Handling in TIFF

TIFF files compressed with JPEG (compression code 7) store RGB pixel data internally as YCbCr. This is a requirement of the JPEG-in-TIFF specification (TIFF Technical Note 2) — libtiff sets PhotometricInterpretation to YCbCr (6) for images with 3 or more bands and performs the RGB-to-YCbCr conversion automatically during encoding.

On the read side, libtiff converts YCbCr back to RGB during decoding. The pixel data returned by get_block() is always RGB, and the PhotometricInterpretation reported in metadata reflects the decoded color space (RGB), not the on-disk storage format. Callers never need to handle YCbCr data directly.

On the write side, callers provide standard RGB pixel data and select JPEG compression by setting encoding hint "259" to 7. libtiff handles the RGB-to-YCbCr conversion as part of JPEG encoding. JPEG quality is configurable via encoding hint "65537" (values 1–100, default 75).

For single-band (grayscale) images, JPEG compression uses PhotometricInterpretation MinIsBlack (1) and no color space conversion occurs.

The YCbCr conversion is an internal codec detail, similar to JPEG quantization. It is lossy — writing and reading back a JPEG-compressed TIFF will not produce an exact pixel match. Use PSNR or similar metrics to evaluate fidelity rather than exact comparison.

from aws.osml.io import IO, BufferedImageAssetProvider, BufferedMetadataProvider, PixelType
import numpy as np

# Write a JPEG-compressed TIFF
metadata = BufferedMetadataProvider()
metadata["259"] = 7               # JPEG compression
metadata["65537"] = 85            # Quality 85

image_data = np.random.randint(0, 255, (3, 256, 256), dtype=np.uint8)
provider = BufferedImageAssetProvider.create(
    key="rgb_image",
    num_columns=256,
    num_rows=256,
    num_bands=3,
    block_width=256,
    block_height=256,
    pixel_type=PixelType.UInt8,
    metadata=metadata,
)
provider.set_full_image(image_data)

with IO.open(["output.tif"], "w", "tiff") as writer:
    writer.add_asset(provider)

# Read it back — pixels are RGB, not YCbCr
with IO.open(["output.tif"], "r") as reader:
    image = reader.get_asset("image:0")
    block = image.get_block(0, 0, resolution_level=0)
    print(block.dtype)   # uint8
    print(block.shape)   # (3, 256, 256) — RGB channels

Indexed (Palette Color) Images

Some image formats store pixel values as indices into a color lookup table rather than direct color values. Both TIFF and NITF support this concept, though the mechanisms differ. In all cases, ImageAssetProvider returns the raw index values as stored in the file — it does not apply lookup tables automatically.

The library does not perform palette expansion because many workflows need the raw indices. Classification maps and thematic rasters use each index to represent a land cover class or category, not a display color. Applying the lookup table to produce RGB pixels is a separate processing step.

TIFF Palette Color

In TIFF files, palette color is indicated by PhotometricInterpretation = 3. Each pixel is a single-byte index and the actual RGB colors are defined in a separate ColorMap tag. A palette-color TIFF will report 1 band of uint8 data, and get_block() returns the index array — not the expanded RGB pixels.

from aws.osml.io import IO

with IO.open(["indexed_image.tif"], "r") as dataset:
    image = dataset.get_asset("image:0")
    print(image.num_bands)          # 1
    print(image.pixel_value_type)   # PixelType.UInt8

    block = image.get_block(0, 0, resolution_level=0)
    print(block.shape)              # (1, rows, cols) — index values, not RGB

NITF Lookup Tables (LUTs)

NITF files support per-band lookup tables through the image subheader fields NLUTSn, NELUTn, and LUTDnm (JBP §5.13.2.28–5.13.2.30). The most common case is IREP=RGB/LUT: a single-band image where each pixel is an index and three LUTs (red, green, blue) define the color mapping. Individual bands in IREP=MONO or IREP=MULTI images can also carry LUTs when IREPBANDn=LU.

LUTs are only valid for uncompressed images (IC=NC or NM) with integer or binary pixel types (PVTYPE=INT or B). For compressed formats like JPEG (IC=C3) and JPEG 2000 (IC=C8), color handling is internal to the codec — the decoder outputs final pixel values directly and NLUTSn is always 0. Vector Quantization (IC=C4) uses its own codebook-based color lookup mechanism defined in MIL-STD-188-199, which is separate from the subheader LUT fields.

As with TIFF, get_block() returns the raw stored values. For an IREP=RGB/LUT image, this means 1 band of index values. The LUT data itself is accessible through the image segment’s metadata — the subheader fields NLUTSn, NELUTn, and LUTDnm are parsed and available for applications that need to perform the lookup.