JPEG 2000 Codec

Version: 1.0
URI: https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000
Codec type: array-to-bytes

Decodes JPEG 2000 (Part 1) and HTJ2K (Part 15) codestreams into NumPy arrays. Supports both complete codestreams and single-tile codestream reconstruction from a shared main header and per-tile tile-part bytes.

This codec is format-agnostic — it decodes any valid J2K codestream regardless of whether the source data originated from NITF (IC=C8/M8/CD/MD), standalone .jp2/.j2k files, or TIFF containers.

Document Conventions

The key words “MUST”, “MUST NOT”, “SHOULD”, and “MAY” in this document are to be interpreted as described in RFC 2119.

Codec Identifier

The value of the name member in the codec metadata MUST be https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000.

Encoded Representation

When main_header is absent from the configuration, the encoded representation MUST be a valid JPEG 2000 codestream conforming to ISO/IEC 15444-1 (Part 1) or ISO/IEC 15444-15 (Part 15, HTJ2K). Such a codestream begins with an SOC marker (0xFF4F) and ends with an EOC marker (0xFFD9).

When main_header is provided in the configuration, the encoded representation is the tile-part data only (beginning with SOT). The codec reconstructs a complete codestream by prepending the main header and appending EOC.
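The two encoded forms can be told apart by their leading marker. The following is a minimal sketch, not the codec's actual validation logic; the function names classify_chunk and is_terminated are hypothetical. It relies only on the marker values defined in ISO/IEC 15444-1 (SOC = 0xFF4F, SOT = 0xFF90, EOC = 0xFFD9).

```python
SOC = b"\xff\x4f"  # start of codestream
SOT = b"\xff\x90"  # start of tile-part
EOC = b"\xff\xd9"  # end of codestream


def classify_chunk(chunk: bytes) -> str:
    """Classify a chunk by its first two bytes (hypothetical helper)."""
    if chunk[:2] == SOC:
        return "codestream"  # complete codestream, no main_header needed
    if chunk[:2] == SOT:
        return "tile-part"   # tile-part data, requires main_header
    return "unknown"


def is_terminated(chunk: bytes) -> bool:
    """True when the chunk already ends with an EOC marker."""
    return chunk.endswith(EOC)
```

A check like this only inspects markers; it does not prove that the bytes in between form a valid codestream.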

Rationale: Why Tile-Parts Are Not Self-Contained

JPEG 2000 codestreams support internal tiling, but the tiles are not self-contained. Each tile’s compressed data (the “tile-part”) contains only the wavelet coefficients. The decoding parameters — tile dimensions, quantization tables, wavelet decomposition levels, component counts — live in the codestream’s main header (the SIZ, COD, and QCD markers). A decoder cannot reconstruct pixels from a tile-part alone.
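To illustrate what the main header carries, the sketch below reads the fixed-length part of the SIZ marker segment, which must immediately follow SOC. The field layout (Lsiz, Rsiz, eight 32-bit geometry fields, Csiz) is taken from ISO/IEC 15444-1; the names SizInfo and parse_siz are hypothetical and are not part of this codec's API.

```python
import struct
from typing import NamedTuple


class SizInfo(NamedTuple):
    width: int        # image width on the reference grid (Xsiz - XOsiz)
    height: int       # image height on the reference grid (Ysiz - YOsiz)
    tile_width: int   # nominal tile width (XTsiz)
    tile_height: int  # nominal tile height (YTsiz)
    components: int   # number of components (Csiz)


def parse_siz(main_header: bytes) -> SizInfo:
    """Read image and tile geometry from the start of a J2K main header."""
    # SOC (FF4F) must be immediately followed by the SIZ marker (FF51).
    if main_header[:4] != b"\xff\x4f\xff\x51":
        raise ValueError("main header must start with SOC followed by SIZ")
    # Big-endian: Lsiz, Rsiz (uint16), eight uint32 fields, Csiz (uint16).
    (_lsiz, _rsiz, xsiz, ysiz, xosiz, yosiz,
     xtsiz, ytsiz, _xtosiz, _ytosiz, csiz) = struct.unpack_from(
        ">HH8IH", main_header, 4)
    return SizInfo(xsiz - xosiz, ysiz - yosiz, xtsiz, ytsiz, csiz)
```

None of these values appear in a tile-part, which is why a tile-part cannot be decoded without the header.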

Additionally, JPEG 2000 lets an encoder split each tile into several tile-parts and interleave tile-parts from different tiles, which is common with resolution-major progression orders (RLCP, RPCL). Instead of writing all of tile 0’s data then all of tile 1’s data, the encoder writes resolution level 0 for every tile, then resolution level 1 for every tile, and so on. As a result, a single tile’s compressed bytes may be scattered across multiple non-contiguous locations in the file. The filesystem layer gathers these byte ranges, so the codec receives the concatenated tile-part bytes.
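The gathering step can be pictured as follows. This is an illustrative sketch only: the function gather_tile_parts and the (offset, length) range representation are assumptions, not the filesystem layer's real interface.

```python
import io
from typing import BinaryIO, Iterable, Tuple


def gather_tile_parts(stream: BinaryIO,
                      ranges: Iterable[Tuple[int, int]]) -> bytes:
    """Concatenate (offset, length) byte ranges in the order given."""
    parts = []
    for offset, length in ranges:
        stream.seek(offset)
        data = stream.read(length)
        if len(data) != length:
            raise EOFError(f"short read at offset {offset}")
        parts.append(data)
    return b"".join(parts)


# Toy file: a 4-byte "header", then tile-parts of two tiles interleaved.
toy_file = io.BytesIO(b"HHHHaaaaXXXXbbbbYYYY")
tile_bytes = gather_tile_parts(toy_file, [(4, 4), (12, 4)])
```

Here tile_bytes holds the two ranges belonging to one tile, joined in tile-part order, which is the form the codec expects as its chunk.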

This codec solves the header problem by inlining the shared main header (base64-encoded, typically 100–500 bytes) in the codec configuration. At decode time the codec reconstructs a minimal single-tile codestream on the fly:

Figure: Reconstruction of a single-tile J2K codestream from main header + tile-part bytes + EOC marker.
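The reconstruction amounts to a byte concatenation. A minimal sketch, assuming the header arrives as a base64 string as described in this document; the function name reconstruct_codestream is hypothetical and the real implementation may differ:

```python
import base64
from typing import Optional

EOC = b"\xff\xd9"  # end-of-codestream marker


def reconstruct_codestream(chunk: bytes,
                           main_header_b64: Optional[str] = None) -> bytes:
    """Build a decodable codestream from a chunk and an optional header."""
    if main_header_b64 is None:
        # No shared header: the chunk is already a complete codestream.
        return chunk
    header = base64.b64decode(main_header_b64)
    # main header + tile-part bytes + EOC
    return header + chunk + EOC
```

Because the header is only a few hundred bytes, the concatenation cost is negligible next to the decode itself.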

This approach has precedent in the JPEG 2000 ecosystem. JPIP (the JPEG 2000 Interactive Protocol, ISO/IEC 15444-9) streams individual tile-parts to clients that already hold the main header.

Configuration Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| main_header | string or null | No | null | Base64-encoded J2K main header bytes (SOC, SIZ, COD, QCD markers). When present, the codec reconstructs a single-tile codestream by prepending the header to the chunk bytes and appending an EOC marker. When absent, the chunk MUST be a complete codestream. |
| resolution_level | int | No | 0 | Target resolution level. 0 = full resolution, N = 1/2^N resolution. |
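As a worked example of the resolution_level arithmetic, the sketch below computes the expected output size for a given level. It assumes an image whose origin is at (0, 0) on the reference grid and uses the ceiling rounding that JPEG 2000 applies to reduced resolutions; the function name reduced_size is hypothetical.

```python
from typing import Tuple


def reduced_size(width: int, height: int,
                 resolution_level: int) -> Tuple[int, int]:
    """Return (width, height) after discarding resolution_level levels."""
    if resolution_level < 0:
        raise ValueError("resolution_level must be >= 0")
    scale = 1 << resolution_level  # 2 ** resolution_level
    # Ceiling division: odd sizes round up rather than down.
    return (-(-width // scale), -(-height // scale))


full = reduced_size(1025, 768, 0)     # (1025, 768)
quarter = reduced_size(1025, 768, 2)  # (257, 192)
```

A level larger than the number of wavelet decomposition levels in the codestream cannot be satisfied; how the codec reports that case is not specified here.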

Algorithm

Decoding

  1. If main_header is present in the configuration, base64-decode it to obtain the raw header bytes.

  2. Reconstruct a minimal single-tile codestream: [main_header bytes] + [chunk bytes] + [EOC marker (0xFF 0xD9)].

  3. If no main_header is present, use the chunk bytes directly as a complete codestream.

  4. Decode the codestream at the specified resolution_level.

  5. Return an array with shape (bands, height, width) and dtype matching the codestream’s bit depth and signedness (e.g., 8-bit unsigned -> uint8, 16-bit signed -> int16).
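The dtype selection in step 5 can be sketched from the Ssiz byte that SIZ stores per component: the high bit flags signed samples and the low seven bits hold the bit depth minus one. The two examples in step 5 come from this document; mapping other depths (for example 12-bit) to the narrowest containing integer type is an assumption, as are the function names. The sketch returns dtype names as strings, which can be passed to numpy.dtype().

```python
from typing import Tuple


def decode_ssiz(ssiz: int) -> Tuple[int, bool]:
    """Split an Ssiz byte into (bit_depth, signed)."""
    signed = bool(ssiz & 0x80)     # high bit set means signed samples
    bit_depth = (ssiz & 0x7F) + 1  # low 7 bits store depth minus one
    return bit_depth, signed


def dtype_name(bit_depth: int, signed: bool) -> str:
    """Narrowest integer dtype name that holds the samples (assumed policy)."""
    if not 1 <= bit_depth <= 38:   # Part 1 allows 1 to 38 bits per sample
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    for bits in (8, 16, 32, 64):
        if bit_depth <= bits:
            return ("int" if signed else "uint") + str(bits)
    raise AssertionError("unreachable")
```

For a three-band 8-bit image of 512 by 512 pixels, the returned array would then have shape (3, 512, 512) and dtype uint8.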

Encoding

Encoding is not currently specified. See Implementation Notes.

Example Configuration

{
    "name": "https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000",
    "configuration": {
        "main_header": "/0//...base64-encoded main header...",
        "resolution_level": 0
    }
}
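A configuration like the one above could be produced from raw header bytes as follows. This is a sketch using only the Python standard library; the helper name build_codec_metadata is hypothetical, while the field names and codec URI are those defined in this document.

```python
import base64
import json
from typing import Optional

CODEC_NAME = "https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000"


def build_codec_metadata(main_header: Optional[bytes] = None,
                         resolution_level: int = 0) -> dict:
    """Assemble codec metadata; main_header is raw (not yet encoded) bytes."""
    encoded = None
    if main_header is not None:
        encoded = base64.b64encode(main_header).decode("ascii")
    return {
        "name": CODEC_NAME,
        "configuration": {
            "main_header": encoded,
            "resolution_level": resolution_level,
        },
    }


# SOC followed by the SIZ marker; a real header continues with more bytes.
metadata = build_codec_metadata(b"\xff\x4f\xff\x51")
metadata_json = json.dumps(metadata, indent=4)
```

Every main header starts with SOC followed by SIZ (0xFF4F 0xFF51), so the base64 string always begins with "/0//".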

References

Implementation Notes

aws.osml.io.zarr_codecs.Jpeg2000Codec — see API Reference.

Only the decode path is implemented. Calling encode() raises NotImplementedError.