# JPEG 2000 Codec

**Version:** 1.0

**URI:** `https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000`

**Codec type:** array-to-bytes

Decodes JPEG 2000 (Part 1) and HTJ2K (Part 15) codestreams into NumPy arrays. Supports both complete codestreams and single-tile codestream reconstruction from a shared main header and per-tile tile-part bytes.

This codec is format-agnostic: it decodes any valid J2K codestream regardless of whether the source data originated from NITF (IC=C8/M8/CD/MD), standalone `.jp2`/`.j2k` files, or TIFF containers.

## Document Conventions

The key words "MUST", "MUST NOT", "SHOULD", and "MAY" in this document are to be interpreted as described in [RFC 2119][rfc2119].

## Codec Identifier

The value of the `name` member in the codec metadata MUST be `https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000`.

## Encoded Representation

The encoded representation MUST be a valid JPEG 2000 codestream conforming to ISO/IEC 15444-1 (Part 1) or ISO/IEC 15444-15 (Part 15, HTJ2K). The codestream begins with an SOC marker (`0xFF4F`) and ends with an EOC marker (`0xFFD9`).

When `main_header` is provided in the configuration, the encoded representation is the tile-part data only (beginning with SOT). The codec reconstructs a complete codestream by prepending the main header and appending EOC.

## Rationale: Why Tile-Parts Are Not Self-Contained

JPEG 2000 codestreams support internal tiling, but the tiles are not self-contained. Each tile's compressed data (the "tile-part") contains only that tile's coded wavelet coefficients. The decoding parameters, such as tile dimensions, quantization step sizes, wavelet decomposition levels, and component counts, live in the codestream's main header (the SIZ, COD, and QCD markers). A decoder cannot reconstruct pixels from a tile-part alone.

Additionally, JPEG 2000 supports progression orders (such as RLCP and RPCL) that interleave tile-parts from different tiles.
Instead of writing all of tile 0's data then all of tile 1's data, the encoder writes resolution level 0 for every tile, then resolution level 1 for every tile, and so on. A single tile's compressed bytes may be scattered across multiple non-contiguous locations in the file. The filesystem layer handles gathering these byte ranges; the codec receives the concatenated tile-part bytes.

This codec solves the header problem by inlining the shared main header (base64-encoded, typically 100–500 bytes) in the codec configuration. At decode time the codec reconstructs a minimal single-tile codestream on the fly:

```{image} /_static/images/reconstructed-j2k-codestream.png
:alt: Reconstruction of a single-tile J2K codestream from main header + tile-part bytes + EOC marker.
:width: 700px
:align: center
```

This approach has precedent in the JPEG 2000 ecosystem. JPIP (the JPEG 2000 Interactive Protocol, ISO/IEC 15444-9) streams individual tile-parts to clients that already hold the main header.

## Configuration Parameters

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `main_header` | `string` or `null` | No | `null` | Base64-encoded J2K main header bytes (SOC, SIZ, COD, QCD markers). When present, the codec reconstructs a single-tile codestream by prepending the header to the chunk bytes and appending an EOC marker. When absent, the chunk MUST be a complete codestream. |
| `resolution_level` | `int` | No | `0` | Target resolution level. `0` = full resolution, `N` = 1/2^N resolution. |

## Algorithm

### Decoding

1. If `main_header` is present in the configuration, base64-decode it to obtain the raw header bytes.
2. Reconstruct a minimal single-tile codestream: `[main_header bytes] + [chunk bytes] + [EOC marker (0xFF 0xD9)]`.
3. If no `main_header` is present, use the chunk bytes directly as a complete codestream.
4. Decode the codestream at the specified `resolution_level`.
5. Return an array with shape `(bands, height, width)` and dtype matching the codestream's bit depth and signedness (e.g., 8-bit unsigned -> `uint8`, 16-bit signed -> `int16`).

### Encoding

Encoding is not currently specified. See [Implementation Notes](#implementation-notes).

## Example Configuration

```json
{
  "name": "https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000",
  "configuration": {
    "main_header": "base64:ff4f...encoded main header...",
    "resolution_level": 0
  }
}
```

## References

- [ISO/IEC 15444-1:2019][iso15444-1], JPEG 2000 image coding system: Core coding system
- [ISO/IEC 15444-15:2019][iso15444-15], JPEG 2000 image coding system: High-Throughput JPEG 2000 (HTJ2K)
- [RFC 2119][rfc2119], Key words for use in RFCs to Indicate Requirement Levels

[iso15444-1]: https://www.iso.org/standard/78321.html
[iso15444-15]: https://www.iso.org/standard/76621.html
[rfc2119]: https://www.rfc-editor.org/rfc/rfc2119

## Implementation Notes

`aws.osml.io.zarr_codecs.Jpeg2000Codec`: see [API Reference](../api/zarr-codecs.md).

Only the decode path is implemented. Calling `encode()` raises `NotImplementedError`.
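
As an illustration of steps 1 to 3 of the Decoding algorithm, the following is a minimal sketch of the codestream reconstruction, not the actual `Jpeg2000Codec` code. The function name `reconstruct_codestream` is hypothetical, the marker sanity checks are an addition of this sketch, and it assumes `main_header` is plain base64 text with no prefix. Steps 4 and 5 (the actual decode) are left to whichever JPEG 2000 library is in use.

```python
import base64
from typing import Optional

SOC = b"\xff\x4f"  # start of codestream
SOT = b"\xff\x90"  # start of tile-part
EOC = b"\xff\xd9"  # end of codestream


def reconstruct_codestream(chunk: bytes, main_header_b64: Optional[str] = None) -> bytes:
    """Return a complete J2K codestream for one chunk (hypothetical helper).

    Without a main header the chunk is expected to be a complete codestream
    already. With one, the result is header + tile-part bytes + EOC.
    """
    if main_header_b64 is None:
        # Step 3: no shared header, so the chunk itself must start with SOC.
        if not chunk.startswith(SOC):
            raise ValueError("chunk is not a complete codestream (missing SOC)")
        return chunk

    # Step 1: recover the raw main header bytes.
    # Assumption: the configuration value is plain base64 without a prefix.
    header = base64.b64decode(main_header_b64, validate=True)
    if not header.startswith(SOC):
        raise ValueError("main header does not begin with SOC")
    if not chunk.startswith(SOT):
        raise ValueError("tile-part data does not begin with SOT")

    # Step 2: main header + tile-part bytes + EOC marker.
    return header + chunk + EOC
```

The returned bytes can then be handed to a JPEG 2000 decoder together with the configured `resolution_level`. Rejecting input that lacks the expected SOC or SOT marker is a defensive choice made for this sketch; the specification above does not require it.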