Cloud Imagery Access via Zarr

The Challenge of Working with Large Geospatial Datasets

Geospatial imagery archives are massive. Government and commercial satellite programs produce hundreds of petabytes of imagery, growing at tens to hundreds of terabytes per day. The traditional workflow of downloading entire multi-GB files before accessing any pixels is giving way to distributed access patterns. Machine Learning inference, tile-level analytics pulling from global data lakehouses, and interactive visualizations, all benefit from services that can quickly fetch only the compressed bytes for the tiles they care about, decode them, and move on.

The challenge is that some well established geospatial formats were designed before cloud object stores existed. Their internal structure — headers, offset tables, interleaved bands, shared compression state — was built for fast local disk access, not HTTP range requests with network delays. At best consumers of these images execute multiple reads to gather header and pixel information from different parts of a file. In some cases entire files must be scanned sequentially to locate the region of interst.

Where Zarr Fits In

Zarr is a modern array storage format designed from the ground up for chunked, compressed, cloud-native data. Each chunk in a Zarr array is independently addressable and independently decodable. Zarr enjoys integration with a wide ecosystem: xarray for labeled array access, Dask for parallel and distributed computation, and fsspec for transparent cloud IO.

If we were starting from scratch, we might store our imagery as Zarr arrays and be done. The reality is we have petabytes of imagery governed by well-defined standards and strong communities already stored in the cloud S3. Converting it all to Zarr or something like cloud-optimized GeoTIFFs is expensive, slow, and will increase storage costs. What we need is a way to make the existing data behave like Zarr without actually transcoding it.

VirtualiZarr: Making Old Data Behave Like New

VirtualiZarr is a community project under the zarr-developers organization that creates virtual Zarr stores from archival data formats. Instead of converting files, it builds lightweight reference layers that describe where each chunk lives inside an existing file. These references map Zarr chunk coordinates to byte ranges in the original data, making archival files appear as native Zarr stores without copying or modifying a single byte.

VirtualiZarr provides a pluggable parser system and a ManifestStore abstraction that represents a virtual Zarr store backed by chunk references. The references can be serialized to several formats including Kerchunk JSON, Kerchunk Parquet, or committed to an Icechunk transactional store. On the consumer side, fsspec’s ReferenceFileSystem reads Kerchunk references directly, while Icechunk provides its own zarr-compatible store.

Zarr is a great general abstraction for raster data, and VirtualiZarr is the bridge that makes existing data behave like Zarr — without the cost of conversion. We keep our geospatial imagery files exactly where they are in S3. A small reference index (kilobytes to low megabytes) sits alongside each one, and the Zarr ecosystem treats them as native chunked arrays.

What We Built

Bridging archival imagery formats into the Zarr ecosystem required solving three problems that the existing tooling does not handle. Each problem led to a component:

  1. Multi-Resolution Tile Indexing for Geospatial Formats — Zarr needs to know where each tile lives in the source file. Archival formats store this information in format-specific structures (SOT markers, IFD tags, length-prefixed headers) that VirtualiZarr cannot parse out of the box.

  2. New Support for Non-contiguous Chunk Data — The Kerchunk reference spec assumes each chunk maps to a single contiguous byte range. JPEG 2000 codestreams with resolution-first progression orders (RLCP, RPCL) interleave tile-parts from different tiles, scattering a single tile’s data across multiple non-contiguous locations in the file. Neither Kerchunk nor fsspec’s ReferenceFileSystem can express or fetch this.

  3. Decoders for Geospatial Formats — The fetched bytes are still in the source format’s native encoding. NITF pixel data uses big-endian byte order and format-specific interleave modes. JPEG 2000 tile-parts are not self-contained codestreams. Compressed TIFF tiles depend on shared IFD tag metadata (predictor settings, JPEG tables) that lives outside the tile data. Standard Zarr codecs cannot decode any of these.

The following sections describe each component in pipeline order: the parser produces the tile index, the filesystem fetches the bytes, and the codec decodes them into pixels.

Figure showing overall flow of Zarr decode path.

Tile Indexing with VirtualiZarr Parsers

The library provides OversightMLParser, a single VirtualiZarr parser that makes it possible to access any imagery format supported by this library as a virtual dataset. The parser scans a file using the library’s native reader and walks each image segment, determining the byte offset and length of every tile relative to the start of the file.

How tile boundaries are calculated depends on the format and compression:

  • Uncompressed tiles have fixed sizes. Offsets are computed arithmetically from the image dimensions, pixel type, and interleave mode. Formats that support sparse tile arrays (ex: NITF masked images) may have existing metadata that contains these boundaries.

  • JPEG tiles are length-prefixed. The parser scans the length headers sequentially to locate each tile boundary.

  • JPEG 2000 codestreams use SOT (Start of Tile-part) markers that record the byte offset and length of each tile-part. If the codestream contains a TLM (Tile-part Length Marker) in its main header, the full tile index is available immediately without scanning. If no TLM is present, the parser performs a sequential SOT scan.

  • TIFF files store tile offsets and byte counts in IFD tags (TileOffsets and TileByteCounts). The parser reads these directly.

When a tile’s data spans multiple non-contiguous byte ranges (as with interleaved JPEG 2000 tile-parts), the parser detects this and emits a multi-range reference entry instead of a single-range entry. Contiguous multi-part tiles are merged into a single range automatically.

The parsers produce the ManifestStore — a virtual Zarr store backed by chunk references into the source file. This can be serialized to create a format-agnostic tile index.

MultiReferenceFileSystem: Scatter-Gather I/O for Non-Contiguous Chunks

The standard Kerchunk reference spec supports three forms per chunk key: inline data, whole-file references, and single byte-range references. This covers most formats, but breaks down for JPEG 2000 codestreams with interleaved tile-parts.

Figure showing how J2K tile parts grouped by tile are referenced by kerchunk index.

JPEG 2000 supports several progression orders that control how compressed data is organized in the codestream. Two of these — RLCP (Resolution-Layer-Component- Position) and RPCL (Resolution-Position-Component-Layer) — interleave tile-parts from different tiles. Instead of writing all of tile 0’s data, then all of tile 1’s data, the encoder writes resolution level 0 for every tile, then resolution level 1 for every tile, and so on. A single tile’s compressed bytes end up scattered across multiple non-contiguous locations in the file.

The standard ReferenceFileSystem has no way to express “fetch these six byte ranges and concatenate them” for a single chunk. This is not a theoretical edge case. Satellite imagery from several commercial providers uses RPCL progression order, and the interleaved tile-part layout is common in large multi-resolution JPEG 2000 files.

MultiReferenceFileSystem is a drop-in subclass of fsspec’s ReferenceFileSystem that extends the Kerchunk reference spec with a fourth form:

Form

Format

Description

Inline

"base64:..." or raw bytes

Inline data

Whole file

["url"]

Entire file

Single range

["url", offset, length]

One contiguous byte range

Multi-range

["url", [[offset, length], ...]]

Multiple non-contiguous byte ranges

A multi-range entry is a 2-element list where the first element is the URL and the second is a list of [offset, length] pairs. Each pair identifies one tile-part’s location in the file. The filesystem fetches all ranges and concatenates them in order before handing the bytes to the codec.

For example, a tile with six tile-parts scattered across a file:

{
  "0/data/0.0.0": [
    "s3://bucket/image.ntf",
    [[66132, 1518], [2534029, 3385], [7216065, 11460],
     [22566527, 38566], [74210429, 116812], [242293202, 339534]]
  ]
}

The URL appears once rather than being repeated for each sub-range. For a file with 1,722 tiles and six tile-parts each, this saves roughly 775 KB of redundant URL strings compared to a flat list of single-range entries.

Figure showing how J2K tile parts interleaved by resolution level are referenced by kerchunk index.

MultiReferenceFileSystem handles all standard reference types by delegating to the parent ReferenceFileSystem. When it encounters a multi-range entry, it performs scatter-gather I/O:

  • Sync path (_cat_common): fetches each byte range sequentially and concatenates the results. Adequate for local files and testing.

  • Async path (_cat_file): issues all byte-range fetches concurrently via asyncio.gather and concatenates in the original entry order. This is the path used by Zarr’s async store and is critical for minimizing latency when reading from S3.

Use MultiReferenceFileSystem instead of ReferenceFileSystem when your tile index may contain multi-range entries. It is fully backward-compatible — tile indexes with only standard single-range entries work identically. The index generator in this library automatically emits multi-range entries for tiles with interleaved tile-parts and single-range entries for everything else, so using MultiReferenceFileSystem as the default is the simplest approach.

from aws.osml.io.multi_reference_fs import MultiReferenceFileSystem

fs = MultiReferenceFileSystem(
    fo="s3://bucket/image.tile_index.json",
    asynchronous=True,
    remote_options={"asynchronous": True},
    skip_instance_cache=True,
)

The constructor accepts the same arguments as ReferenceFileSystem. Existing code that uses fsspec.filesystem("reference", ...) can switch by replacing the filesystem instantiation.

Note

This multi-range reference format is a novel extension to the Kerchunk reference spec introduced by this project. The standard Kerchunk and Zarr ecosystem does not handle the case of non-contiguous byte ranges for a single chunk. If you encounter JPEG 2000 imagery with interleaved tile-parts elsewhere, MultiReferenceFileSystem is the component that makes it work.

Custom Codecs

Once the filesystem delivers the raw bytes for a chunk, those bytes are still in the source format’s native encoding. Standard Zarr codecs cannot decode them. This library registers four Zarr v3 codecs that handle the format-specific decoding:

JbpBlockCodec handles uncompressed NITF tiles. NITF raw pixel data uses big-endian byte order and one of four interleave modes (band-interleaved by pixel, band-interleaved by line, band-interleaved by block, or band-sequential). The codec performs the endian swap and interleave conversion to produce standard NumPy arrays.

JpegCodec decodes JPEG tiles. It carries format-specific parameters (color space, interleave mode, bits per pixel) in its configuration and passes them to the underlying Rust decoder.

Jpeg2000Codec decodes JPEG 2000 tile data. JPEG 2000 codestreams support internal tiling, but the tiles are not self-contained. Each tile’s compressed data (the “tile-part”) contains only the wavelet coefficients. The decoding parameters — tile dimensions, quantization tables, wavelet decomposition levels, component counts — live in the codestream’s main header (the SIZ, COD, and QCD markers). A decoder cannot reconstruct pixels from a tile-part alone. Existing Zarr JPEG 2000 codecs assume each chunk is a complete, self-contained codestream. They cannot decode a bare tile-part without the header.

We solve this by inlining the shared main header (base64-encoded, typically 100–500 bytes) in the codec configuration stored in .zarray. At decode time the codec reconstructs a minimal single-tile codestream on the fly:

Figure showing reconstruction of a single tile J2K codestream.

OpenJPEG receives what looks like a normal single-tile codestream and decodes it. This approach has precedent in the JPEG 2000 ecosystem. JPIP (the JPEG 2000 Interactive Protocol) streams individual tile-parts to clients that already hold the main header. Because the codec operates on the J2K codestream directly, it works for standalone .j2k/.jp2 files and for J2K codestreams embedded in container formats like NITF.

Note that the codec layer performs no I/O. When MultiReferenceFileSystem fetches and concatenates multiple tile-parts for an interleaved codestream, the codec receives the complete concatenated bytes and reconstructs the codestream exactly the same way. The filesystem handles the scatter-gather complexity so codecs remain simple bytes-to-bytes transforms.

TiffTileCodec decodes compressed TIFF tiles. Individual compressed tiles extracted from a TIFF file cannot be decoded in isolation — the decoder needs IFD tag metadata (compression type, predictor, photometric interpretation, JPEG tables, etc.) that lives in the file header, not in the tile data itself. The codec stores the required IFD tag values in its configuration. At decode time it constructs a minimal single-tile TIFF in memory from the configuration and the compressed tile bytes, then hands it to libtiff via TIFFClientOpen + TIFFReadEncodedTile. This approach supports LZW, JPEG, Deflate, Adobe Deflate, and PackBits compression, including horizontal differencing predictors and YCbCr-to-RGB conversion for JPEG tiles. Uncompressed TIFF tiles (Compression=1) do not require a codec — Zarr reads the raw tile bytes directly.

Figure showing reconstruction of a single tile TIFF.

DtedTileCodec decodes DTED elevation data sections. DTED stores elevations as column-major signed-magnitude big-endian 16-bit integers with per-record headers and checksums interleaved in the data stream. The codec strips the record framing, converts signed-magnitude to native two’s complement, and transposes column-major to row-major — producing a standard (1, rows, cols) Int16 array.

The codec introduces a capability not found in existing Zarr codecs: overlap-aware edge trimming. DTED cells share boundary posts with their neighbors (the easternmost column of one cell duplicates the westernmost column of the next). The trim_* parameters discard these shared edges during decode, so the output chunks tile seamlessly without data duplication. This enables representing an entire DTED archive as a single contiguous Zarr array — each file becomes one chunk, edges are trimmed at decode time, and consumers see a seamless elevation surface with no preprocessing required.

{
    "name": "https://awslabs.github.io/osml-imagery-io/codecs/dted",
    "configuration": {
        "num_lat_points": 1201,
        "num_lon_lines": 1201,
        "record_size": 2414,
        "trim_bottom": 1,
        "trim_right": 1
    }
}

All five codecs are registered with the Zarr codec registry via Python entry points. They use URI-based names per the Zarr v3 specification to avoid conflicts with existing codecs:

  • https://awslabs.github.io/osml-imagery-io/codecs/jpeg2000

  • https://awslabs.github.io/osml-imagery-io/codecs/jpeg

  • https://awslabs.github.io/osml-imagery-io/codecs/jbp-block

  • https://awslabs.github.io/osml-imagery-io/codecs/tiff-tile

  • https://awslabs.github.io/osml-imagery-io/codecs/dted

The URIs resolve to human-readable documentation. Implementations do not fetch them at runtime.

Zarr Access to Image Pyramids

The GeoZarr multiscales convention

The GeoZarr multiscales convention defines how multi-resolution image pyramids are stored in Zarr groups. Each resolution level is identified by an asset path in the layout array. The asset field uses Zarr path nomenclature: it can be a simple name referencing a direct child group or array (e.g., "0", "level1", "full"), or a path with / separators for nested resources (e.g., "0/data"). All paths are relative to the group containing the multiscales metadata. In our implementation, each asset path points to a numbered subgroup ("0", "1", "2", …) where level "0" is the highest resolution and each subgroup contains a single array named "data". This structure is used consistently for all tile indexes — including single-resolution images, which are represented as a one-level pyramid. This means the access path root["0/data"] works the same way regardless of whether overviews are present. The root group’s .zattrs carries metadata that describes the relationship between levels.

The convention has three key parts:

Convention identity via zarr_conventions. The root group attributes include a zarr_conventions array that declares which conventions the store conforms to. Each entry has a UUID, schema URL, spec URL, name, and description. This makes convention detection unambiguous — a consumer can check for UUID d35379db-88df-4056-af3a-620245f8e347 to confirm the store uses the multiscales convention.

Layout via multiscales. The multiscales attribute is an object (not an array) containing a layout array. Each entry in the layout describes one resolution level with an asset field (the path to the Zarr subgroup), an optional derived_from field (the parent level it was downsampled from), and a transform object with scale and translation arrays.

Relative transforms. Scale factors are relative between adjacent levels, not absolute from level 0. If level 1 is derived from level 0 at 2× downsample, its transform.scale is [2.0, 2.0] (Y, X). If level 2 is also 2× from level 1, its scale is also [2.0, 2.0]. The scale and translation arrays have exactly two elements — the spatial Y and X axes. The bands axis is not included.

Note

The GeoZarr effort also defines proj: (CRS information) and spatial: (affine transforms, bounding boxes) conventions. These are not yet implemented in this library and are planned for a future phase. When added, their entries will appear in the zarr_conventions array alongside the multiscales entry.

How pyramids map to the chunk index

The tile index is always a Zarr group hierarchy with GeoZarr multiscales metadata, regardless of how many resolution levels exist. A single-resolution image is a one-level pyramid; adding overviews adds levels without changing how the base image is addressed:

image.tile_index.json
├── .zattrs              ← GeoZarr multiscales metadata (zarr_conventions, layout)
├── .zgroup
├── 0/                   ← Level 0 (full resolution)
│   └── data/
│       ├── .zarray      ← array metadata (shape, chunks, codecs)
│       ├── 0.0.0        ← ["s3://bucket/image.tif", offset, length]
│       ├── 0.0.1
│       ├── 0.1.0
│       └── 0.1.1
├── 1/                   ← Level 1 (2× downsampled)
│   └── data/
│       ├── .zarray
│       └── 0.0.0
└── 2/                   ← Level 2 (4× downsampled)
    └── data/
        ├── .zarray
        └── 0.0.0

Each level has its own .zarray with the correct shape and chunk dimensions for that resolution. Chunk keys within a level follow the standard Zarr bands.row.col convention. The path prefix (0/data/, 1/data/, 2/data/) is what distinguishes chunks at different resolution levels.

The root .zattrs carries the GeoZarr multiscales metadata that records the relationship between levels:

{
  "source": "s3://bucket/image.tif",
  "zarr_conventions": [
    {
      "uuid": "d35379db-88df-4056-af3a-620245f8e347",
      "schema_url": "https://raw.githubusercontent.com/zarr-conventions/multiscales/refs/tags/v1/schema.json",
      "spec_url": "https://github.com/zarr-conventions/multiscales/blob/v1/README.md",
      "name": "multiscales",
      "description": "Multiscale layout of zarr datasets"
    }
  ],
  "multiscales": {
    "layout": [
      {
        "asset": "0",
        "transform": {"scale": [1.0, 1.0], "translation": [0.0, 0.0]}
      },
      {
        "asset": "1",
        "derived_from": "0",
        "transform": {"scale": [2.0, 2.0], "translation": [0.0, 0.0]}
      },
      {
        "asset": "2",
        "derived_from": "1",
        "transform": {"scale": [2.0, 2.0], "translation": [0.0, 0.0]}
      }
    ],
    "resampling_method": "average"
  }
}

The zarr_conventions array identifies the store as conforming to the GeoZarr multiscales convention. The multiscales object describes the pyramid: each entry in the layout array corresponds to one resolution level. Level 0 has no derived_from — it is the full-resolution original. Levels 1 and 2 each declare their parent and a relative transform. A scale of [2.0, 2.0] means each pixel in that level covers 2×2 pixels in the derived_from level. For a power-of-2 pyramid, every non-base level has the same relative scale. The optional resampling_method records how the overviews were generated.

Consumers like xarray.open_datatree() read this metadata and present the pyramid as a DataTree with one node per level, each containing a lazily-loaded Dataset.

Format-specific sources of resolution levels

The parser produces the same hierarchical Zarr structure regardless of where the resolution levels come from. What differs is how the source format stores them.

Cloud Optimized GeoTIFF (COG): A single file contains the full-resolution image and its overviews as separate IFDs. The TIFF reader exposes these as image:0 (full resolution), image:0:overview:1 (first overview), image:0:overview:2 (second overview), etc. The parser detects the overview keys and builds the hierarchy automatically. All chunk references point to byte ranges within the same file, so a single URL suffices.

NITF R-sets: Each resolution level is a separate file. The caller passes all files to the parser, and IO.open() detects the .rN filename pattern to key the assets correctly (image:0 for the base, image:0:overview:N for each R-set file). Each level’s chunk references point to its own file, so the caller provides one URL per file.

JPEG 2000 native resolution levels: A single J2K codestream supports decoding at multiple resolution levels through wavelet decomposition. In the current implementation, the parser does not auto-expand these into pyramid levels — only explicitly provided assets (COG overview IFDs or R-set files) produce a hierarchy. Auto-expansion of J2K resolution levels is planned for a future phase, where each level would use the same codec with a different resolution_level parameter and reference only the tile-part byte ranges needed for that level.

In all cases, the parser is a metadata-only operation. It does not decode pixels, generate new overview levels, or resample imagery. It describes how to address tiles that already exist in the source data.

End-to-End Example: Single-Resolution NITF in S3

The following example shows the complete workflow for indexing a single NITF file and accessing its tiles through Zarr.

Step 1: Generate the tile index

Run this once per file, typically as part of an ingest pipeline. The file must be available locally for indexing.

Note

Index generation requires the virtualizarr optional dependency: pip install osml-imagery-io[virtualizarr]

When the url parameter is omitted, the index stores only filenames with a {{base}} template placeholder instead of absolute S3 URLs. This makes the index portable — it can be created before the upload destination is known and resolved at read time. The serialized JSON uses the Kerchunk v1 templates feature, which ReferenceFileSystem and MultiReferenceFileSystem both support natively.

from aws.osml.io.virtualizarr_parsers import OversightMLParser, write_tile_index

# Generate a portable index — no URL needed at index time
parser = OversightMLParser(local_paths="local/image.ntf")
store = parser()

# Serialize as Kerchunk JSON (or .parquet) with multi-range support
write_tile_index(store, "image.ntf.tile_index.json")

Upload both the image and the index to S3:

import boto3

s3 = boto3.client("s3")
s3.upload_file("local/image.ntf", "my-bucket", "imagery/image.ntf")
s3.upload_file(
    "image.ntf.tile_index.json",
    "my-bucket",
    "imagery/image.ntf.tile_index.json",
)

If you already know the final S3 location at index time, you can pass url directly and skip template_overrides at read time:

store = parser(url="s3://my-bucket/imagery/image.ntf")

Step 2: Open and access tiles

Codec registration happens automatically when the package is installed with the zarr extras (pip install osml-imagery-io[zarr]). No explicit import is needed.

Use MultiReferenceFileSystem to open the tile index. It handles both standard single-range entries and multi-range entries for JPEG 2000 images with interleaved tile-parts. When you slice into the dataset, the filesystem issues HTTP range requests for only the bytes backing the requested tiles and the registered codec decodes them into NumPy arrays.

For portable indexes, pass template_overrides to resolve the {{base}} placeholder to the S3 directory containing the image (with a trailing slash). The same index works unchanged if the image moves to a different bucket or path — just update the override.

import zarr
from aws.osml.io.multi_reference_fs import MultiReferenceFileSystem
from zarr.storage._fsspec import FsspecStore

fs = MultiReferenceFileSystem(
    fo="s3://my-bucket/imagery/image.ntf.tile_index.json",
    template_overrides={"base": "s3://my-bucket/imagery/"},
    asynchronous=True,
    remote_options={"asynchronous": True, "profile": "my-profile"},
    skip_instance_cache=True,
)

store = FsspecStore(fs=fs, read_only=True, path="")
root = zarr.open_group(store, mode="r", zarr_format=2)

# Read a single tile region
import numpy as np

arr = root["0/data"]
tile = np.asarray(arr[0:3, 768:1024, 1024:1280])
print(tile.shape)  # (3, 256, 256)
print(tile.dtype)  # uint8

AWS credentials can also be provided through environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE) or any other method supported by boto3 and fsspec.

End-to-End Example: Multi-Resolution COG Pyramid

This example shows how to index a Cloud Optimized GeoTIFF that contains embedded overview images and access tiles at different resolution levels through Zarr.

A COG stores its overview images as additional IFDs within the same file. The parser detects these automatically and produces a hierarchical tile index with one subgroup per resolution level.

Step 1: Generate the hierarchical tile index

Portable indexes work with multi-file pyramids too. Each file gets its own {{base}}filename reference, and a single template_overrides resolves all of them at read time (assuming the files are co-located).

from aws.osml.io.virtualizarr_parsers import OversightMLParser, write_tile_index

# A COG with embedded overviews — single file, portable index
parser = OversightMLParser(local_paths="local/image.tif")
store = parser()
write_tile_index(store, "image.tif.tile_index.json")

For a COG with two overview levels, the resulting JSON contains subgroups "0" (full resolution), "1" (first overview), and "2" (second overview). All chunk references point to byte ranges within the same file.

The same workflow works for multi-file NITF R-set pyramids. Pass multiple paths and the index will contain {{base}}image.ntf, {{base}}image.ntf.r1, etc.:

# Multi-file NITF pyramid (R-set convention)
parser = OversightMLParser(local_paths=[
    "local/image.ntf",
    "local/image.ntf.r1",
    "local/image.ntf.r2",
])
store = parser()
write_tile_index(store, "image.ntf.tile_index.json")

Step 2: Open the pyramid and read tiles at different levels

import zarr
from aws.osml.io.multi_reference_fs import MultiReferenceFileSystem
from zarr.storage._fsspec import FsspecStore

fs = MultiReferenceFileSystem(
    fo="s3://my-bucket/imagery/image.tif.tile_index.json",
    template_overrides={"base": "s3://my-bucket/imagery/"},
    asynchronous=True,
    remote_options={"asynchronous": True},
    skip_instance_cache=True,
)

store = FsspecStore(fs=fs, read_only=True, path="")
root = zarr.open_group(store, mode="r", zarr_format=2)

# The root group contains numbered subgroups — one per resolution level
import numpy as np

# Read a tile from the full-resolution level
level_0 = root["0/data"]
print(level_0.shape)       # e.g. (3, 8192, 8192)
tile_full = np.asarray(level_0[0:3, 0:256, 0:256])

# Read the same spatial region from the first overview (2× downsampled)
level_1 = root["1/data"]
print(level_1.shape)       # e.g. (3, 4096, 4096)
tile_ovr1 = np.asarray(level_1[0:3, 0:128, 0:128])

# Read from the second overview (4× downsampled)
level_2 = root["2/data"]
print(level_2.shape)       # e.g. (3, 2048, 2048)
tile_ovr2 = np.asarray(level_2[0:3, 0:64, 0:64])

You can also open the hierarchical index as an xarray.DataTree to get a structured view of all levels:

import xarray as xr

dt = xr.open_datatree("image.tif.tile_index.json", engine="kerchunk")
print(dt)
# DataTree('None', parent=None)
# ├── DataTree('0')
# │   └── Dataset {'data': (bands: 3, y: 8192, x: 8192)}
# ├── DataTree('1')
# │   └── Dataset {'data': (bands: 3, y: 4096, x: 4096)}
# └── DataTree('2')
#     └── Dataset {'data': (bands: 3, y: 2048, x: 2048)}

# Access a specific level's dataset
level_1_ds = dt["1"].ds
print(level_1_ds["data"].shape)  # (3, 4096, 4096)

For single-file inputs without overviews, the parser produces a one-level pyramid with a single subgroup "0". The access path root["0/data"] is the same regardless of whether overviews are present, so adding overviews later does not change how existing code addresses the full-resolution image.