Performance Benchmarks¶
This page shows benchmark results for the osml-imagery-io Python API. All benchmarks use cold-start timing: each iteration opens the dataset from scratch, with no application-level warm caches or pre-initialized state. The OS page cache is not flushed between iterations, so it may still influence results for repeated reads of the same file.
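As a rough illustration of what cold-start timing means, the sketch below times a callable that is expected to do its own open on every round. The helper name `time_cold_start` and its return layout are hypothetical and are not part of osml-imagery-io or the benchmark suite; the only assumption is that the timed callable opens the dataset itself instead of reusing a handle.

```python
import statistics
import time


def time_cold_start(open_and_read, rounds=10):
    """Time `open_and_read` once per round and return summary stats in ms.

    `open_and_read` is a hypothetical callable that must open the dataset
    itself on every call, so no handle or decoder state is reused.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    samples_ms = []
    for _ in range(rounds):
        start = time.perf_counter()
        open_and_read()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        # Sample standard deviation needs at least two rounds.
        "stddev": statistics.stdev(samples_ms) if rounds > 1 else 0.0,
        "rounds": rounds,
    }
```

Because the OS page cache is outside the process, the first round on a file can be slower than later ones even with this pattern, which is one reason to look at the median as well as the mean.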
Benchmark Groups¶
The benchmark suite produces five result groups:
Tile Read Native: Block reads through the native IO path, using access patterns (single tile, small ROI, large ROI) that match the Zarr benchmarks for direct comparison.
Tile Read Zarr Local: Tile reads through MultiReferenceFileSystem + zarr.open_group() with a local file as the backing store. Uses hierarchical Kerchunk indexes with the appropriate codec (JbpBlockCodec for NITF, TiffTileCodec for TIFF/COG).
Tile Read Zarr S3: The same Zarr path, but with S3 as the backing store. Only included when OSML_IO_BENCHMARK_S3_BUCKET is set.
Index Generation: End-to-end time to scan a local file and produce a Kerchunk JSON tile index via OversightMLParser + write_tile_index(). Includes multi-resolution index generation for COG and NITF R-set pyramids.
Metadata: Time to open a dataset and read file-level and image asset metadata.
Dataset Coverage¶
The benchmark suite exercises multiple format and compression combinations:
NITF uncompressed (NC) — various sizes from 1MB to 64MB
NITF JPEG (C3) — lossy compressed multi-band
NITF JPEG 2000 (C8) — wavelet compressed, including large real-world imagery
NITF SIDD — SAR-derived product with XML DES metadata
TIFF uncompressed — exercises the TiffTileCodec in the Zarr path
COG pyramid — multi-resolution TIFF with overview IFDs
NITF R-set pyramid — multi-file NITF with overview levels
Generating Results¶
Benchmark datasets are configured in data/benchmark/benchmark_datasets.yaml.
Dataset paths are resolved relative to data/benchmark/ by default, or relative
to the directory specified by OSML_IO_BENCHMARK_DATA.
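The resolution rule described above can be sketched as follows. This is an illustration only: the function name `resolve_dataset_path` is hypothetical, and treating an empty OSML_IO_BENCHMARK_DATA value the same as an unset one is an assumption, not documented behavior of the suite.

```python
import os
from pathlib import Path

# Default base directory named in the text above.
DEFAULT_DATA_DIR = Path("data/benchmark")


def resolve_dataset_path(relative_path, environ=None):
    """Resolve a dataset path from benchmark_datasets.yaml to a local path.

    Uses OSML_IO_BENCHMARK_DATA as the base directory when it is set
    (assumption: an empty value counts as unset), else data/benchmark/.
    """
    environ = os.environ if environ is None else environ
    base = Path(environ.get("OSML_IO_BENCHMARK_DATA") or DEFAULT_DATA_DIR)
    return base / relative_path
```

For example, with OSML_IO_BENCHMARK_DATA=data/integration, an entry listed as synthetic/synth_small_tiff.tif would resolve to data/integration/synthetic/synth_small_tiff.tif under this sketch.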
1. Generate synthetic datasets (optional)¶
python scripts/generate_benchmark_data.py
This creates synthetic NITF imagery in data/integration/synthetic/ and appends
entries to benchmark_datasets.yaml.
For multi-resolution and TIFF coverage, also generate:
# Synthetic TIFF (exercises TiffTileCodec in the Zarr path)
python scripts/generate_synthetic_image.py data/integration/synthetic/synth_small_tiff.tif \
--format tiff --width 1024 --height 1024 --bands 1 \
--tile-width 256 --tile-height 256 --compression none
# COG pyramid (multi-resolution TIFF with overview IFDs)
python scripts/generate_synthetic_image_pyramid.py data/integration/synthetic/synth_cog_pyramid.tif \
--mode cog --width 2048 --height 2048 --tile-width 256 --tile-height 256 --levels 3
# NITF R-set pyramid (multi-file NITF with overview levels)
python scripts/generate_synthetic_image_pyramid.py data/integration/synthetic/synth_rset_pyramid.ntf \
--mode rset --width 2048 --height 2048 --tile-width 256 --tile-height 256 --levels 3
Set OSML_IO_BENCHMARK_DATA=data/integration when running benchmarks to include
the synthetic datasets.
2. Upload benchmark data to S3 (optional — for S3 benchmarks)¶
The S3 Zarr benchmarks read tiles over the network from an S3 bucket. The bucket
must mirror the same relative paths that benchmark_datasets.yaml uses under
OSML_IO_BENCHMARK_DATA.
Prerequisites:
An S3 bucket you have read access to.
The s3fs Python package installed (pip install s3fs).
AWS credentials available via the standard boto3 credential chain (environment variables, ~/.aws/credentials, instance role, etc.).
Upload the data:
# Sync local benchmark data to S3 (preserves relative paths)
aws s3 sync data/integration/ s3://my-bucket/benchmark-data/
The OSML_IO_BENCHMARK_S3_BUCKET value must point to the same prefix:
OSML_IO_BENCHMARK_S3_BUCKET=s3://my-bucket/benchmark-data
If this variable is not set, S3 benchmarks are skipped automatically.
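To make the path mirroring concrete, here is a minimal sketch of how a dataset's relative path could map to an S3 URI, and how an unset variable could translate into a skip. The helper `s3_uri_for` is hypothetical and is not taken from the benchmark suite.

```python
import os


def s3_uri_for(relative_path, environ=None):
    """Map a dataset's relative path to its S3 URI.

    Returns None when OSML_IO_BENCHMARK_S3_BUCKET is not set, which a
    caller could use as the signal to skip the S3 benchmarks.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("OSML_IO_BENCHMARK_S3_BUCKET")
    if not prefix:
        return None
    # S3 keys always use forward slashes, whatever the local separator is.
    key = relative_path.replace(os.sep, "/").lstrip("/")
    return prefix.rstrip("/") + "/" + key
```

With the sync command above, a local file at data/integration/synthetic/synth_cog_pyramid.tif lands at s3://my-bucket/benchmark-data/synthetic/synth_cog_pyramid.tif, which is what this sketch returns for the relative path synthetic/synth_cog_pyramid.tif.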
3. Build with release optimizations¶
Benchmarks must be run against a release build. Debug builds (the default for
maturin develop) include no optimizations and produce results 5–15x slower than
release, which makes comparisons meaningless.
maturin develop --release
4. Run the benchmarks¶
# Local benchmarks only
OSML_IO_BENCHMARK_DATA=data/integration pytest -m benchmark --benchmark-autosave
# Include S3 benchmarks
OSML_IO_BENCHMARK_DATA=data/integration \
OSML_IO_BENCHMARK_S3_BUCKET=s3://my-bucket/benchmark-data \
pytest -m benchmark --benchmark-autosave
Results are saved to .benchmarks/.
5. Generate the results fragment¶
python scripts/generate_benchmark_report.py
This reads the latest result from .benchmarks/ and writes docs/_benchmark_results.md.
You can also point it at a specific file:
python scripts/generate_benchmark_report.py .benchmarks/Linux-CPython-3.12/0001_abc.json
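For readers who want to inspect a saved result directly, the sketch below summarizes one result file. It assumes the files follow the pytest-benchmark JSON layout (a top-level benchmarks list whose entries carry group, name, and a stats mapping with times in seconds). It is not the logic of generate_benchmark_report.py; the function names, the whole-millisecond rounding, and the choice of "latest" by modification time are all assumptions.

```python
import json
from pathlib import Path

STAT_KEYS = ("min", "max", "mean", "median", "stddev")


def summarize(result):
    """Convert one loaded result document into rows with times in ms."""
    rows = []
    for bench in result["benchmarks"]:
        stats = bench["stats"]
        row = {
            "group": bench.get("group"),
            "name": bench["name"],
            "rounds": stats["rounds"],
        }
        for key in STAT_KEYS:
            # Stats are assumed to be stored in seconds; round to whole ms.
            row[key] = round(stats[key] * 1000.0)
        rows.append(row)
    return sorted(rows, key=lambda r: (str(r["group"]), r["mean"]))


def latest_result_file(root=".benchmarks"):
    """Return the most recently modified JSON file under root, or None."""
    files = sorted(Path(root).glob("*/*.json"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def load_latest(root=".benchmarks"):
    """Load and summarize the latest result, or return an empty list."""
    path = latest_result_file(root)
    if path is None:
        return []
    return summarize(json.loads(path.read_text()))
```

Calling load_latest() from the repository root would return one row per benchmark, grouped and ordered by mean time.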
6. Rebuild the docs¶
make html -C docs
Comparison Axes¶
The Tile Read Native and Tile Read Zarr Local groups use the same access patterns (single tile, small ROI, large ROI) on the same datasets, so their results are directly comparable:
Native IO vs Zarr-from-local: Isolates the Zarr/fsspec/codec overhead. Compare tile_read_native against tile_read_zarr_local for the same dataset and access pattern.
Zarr-from-local vs Zarr-from-S3: Measures the network latency impact. Compare tile_read_zarr_local against tile_read_zarr_s3.
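One way to read the two comparisons is as ratios of mean times. The sketch below computes them for two rows copied from the comparison table on this page; the function name and the dictionary keys are hypothetical.

```python
def overhead_ratios(native_ms, zarr_local_ms, zarr_s3_ms=None):
    """Return the two comparison ratios for one dataset and access pattern."""
    ratios = {"zarr_vs_native": zarr_local_ms / native_ms}
    if zarr_s3_ms is not None:
        ratios["s3_vs_local"] = zarr_s3_ms / zarr_local_ms
    return ratios


# Mean times in ms for the "single tile" access pattern, copied from the
# Read Performance Comparison table.
cog_single_tile = overhead_ratios(1, 4, 180)        # Synth COG Pyramid
wv_pan_single_tile = overhead_ratios(198, 33, 284)  # WV Pan J2K (679MB)

print(cog_single_tile)     # {'zarr_vs_native': 4.0, 's3_vs_local': 45.0}
print(wv_pan_single_tile)  # zarr_vs_native is below 1 for this row
```

Because the published times are rounded to whole milliseconds, ratios for the smallest datasets are coarse; the raw result files give a finer comparison.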
Results¶
Read Performance Comparison¶
| Dataset | Access Pattern | Native | Zarr Local | Zarr S3 |
|---|---|---|---|---|
| Synth COG Pyramid | single tile | 1 | 4 | 180 |
| Synth Small TIFF | single tile | 2 | 4 | 253 |
| Synth Small TIFF | small roi | 3 | 6 | 216 |
| Synth COG Pyramid | small roi | 3 | 7 | 184 |
| Tiny NITF (1MB) | single tile | 3 | 5 | 411 |
| Synth NITF R-set Pyramid | small roi | 4 | 8 | 199 |
| Synth Small NC | small roi | 4 | 7 | 508 |
| Synth Small NC | single tile | 5 | 4 | 196 |
| Synth Medium C3 | single tile | 5 | 6 | 246 |
| Synth NITF R-set Pyramid | single tile | 5 | 21 | 207 |
| Tiny NITF (1MB) | small roi | 6 | 6 | 219 |
| Synth Medium C3 | small roi | 6 | 10 | 213 |
| Synth Medium C8 | single tile | 8 | 9 | 182 |
| Synth Large NC | small roi | 19 | 30 | 568 |
| Synth Medium C8 | small roi | 21 | 29 | 244 |
| Synth Large NC | single tile | 23 | 5 | 257 |
| Umbra SIDD | small roi | 32 | 45 | 4912 |
| Umbra SIDD | single tile | 44 | 47 | 5062 |
| WV 8-band J2K (354MB) | single tile | 186 | 50 | 683 |
| WV Pan J2K (679MB) | single tile | 198 | 33 | 284 |
| WV Pan J2K (679MB) | small roi | 295 | 125 | 677 |
| WV 8-band J2K (354MB) | small roi | 918 | 417 | 2144 |
| WV Pan J2K (679MB) | large roi | 1452 | 1147 | 3178 |
All times are mean values in milliseconds (ms), matching the Mean column of the per-group tables that follow.
Tile Read Native¶
| Operation | Dataset | Access Pattern | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|---|
| native_read | Synth COG Pyramid | single tile | 1 | 2 | 1 | 1 | 0 | 10 |
| native_read | Synth Small TIFF | single tile | 1 | 5 | 2 | 1 | 1 | 10 |
| native_read | Synth Small TIFF | small roi | 3 | 4 | 3 | 3 | 0 | 10 |
| native_read | Synth COG Pyramid | small roi | 3 | 4 | 3 | 3 | 0 | 10 |
| native_read | Tiny NITF (1MB) | single tile | 3 | 4 | 3 | 3 | 0 | 10 |
| native_read | Synth NITF R-set Pyramid | small roi | 4 | 5 | 4 | 4 | 0 | 10 |
| native_read | Synth Small NC | small roi | 4 | 5 | 4 | 4 | 0 | 10 |
| native_read | Synth Small NC | single tile | 4 | 7 | 5 | 4 | 1 | 10 |
| native_read | Synth Medium C3 | single tile | 4 | 7 | 5 | 4 | 1 | 10 |
| native_read | Synth NITF R-set Pyramid | single tile | 4 | 11 | 5 | 4 | 2 | 10 |
| native_read | Tiny NITF (1MB) | small roi | 4 | 19 | 6 | 4 | 5 | 10 |
| native_read | Synth Medium C3 | small roi | 6 | 7 | 6 | 6 | 0 | 10 |
| native_read | Synth Medium C8 | single tile | 6 | 20 | 8 | 7 | 4 | 10 |
| native_read | Synth Large NC | small roi | 16 | 21 | 19 | 19 | 2 | 10 |
| native_read | Synth Medium C8 | small roi | 19 | 26 | 21 | 20 | 2 | 10 |
| native_read | Synth Large NC | single tile | 15 | 88 | 23 | 16 | 23 | 10 |
| native_read | Umbra SIDD | small roi | 31 | 35 | 32 | 32 | 1 | 10 |
| native_read | Umbra SIDD | single tile | 30 | 150 | 44 | 33 | 37 | 10 |
| native_read | WV 8-band J2K (354MB) | single tile | 172 | 229 | 186 | 181 | 16 | 10 |
| native_read | WV Pan J2K (679MB) | single tile | 190 | 209 | 198 | 198 | 6 | 10 |
| native_read | WV Pan J2K (679MB) | small roi | 280 | 306 | 295 | 295 | 7 | 10 |
| native_read | WV 8-band J2K (354MB) | small roi | 902 | 946 | 918 | 914 | 15 | 10 |
| native_read | WV Pan J2K (679MB) | large roi | 1407 | 1512 | 1452 | 1445 | 37 | 10 |
All times in milliseconds (ms).
DTED Parse¶
| Operation | Dataset | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|
| dted_open_and_parse | test_bench_dted_open_and_parse | 1 | 1 | 1 | 1 | 0 | 20 |
All times in milliseconds (ms).
DTED Full Read¶
| Operation | Dataset | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|
| dted_full_read | test_bench_dted_full_read | 3 | 4 | 3 | 3 | 0 | 10 |
All times in milliseconds (ms).
Index Generation¶
| Operation | Dataset | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|
| index_generation | Synth Small TIFF | 3 | 4 | 3 | 3 | 1 | 5 |
| index_generation | Synth COG Pyramid | 4 | 6 | 5 | 5 | 1 | 5 |
| index_generation | Synth Small NC | 5 | 7 | 6 | 6 | 1 | 5 |
| index_generation | Synth Medium C8 | 5 | 7 | 7 | 7 | 1 | 5 |
| index_generation | Synth Medium C3 | 7 | 8 | 7 | 8 | 1 | 5 |
| index_generation | Synth NITF R-set Pyramid | 7 | 8 | 8 | 8 | 1 | 5 |
| index_generation | Tiny NITF (1MB) | 6 | 14 | 10 | 10 | 4 | 5 |
| index_generation | Synth Large NC | 19 | 27 | 22 | 21 | 3 | 5 |
| index_generation | Umbra SIDD | 23 | 27 | 25 | 24 | 2 | 5 |
| index_generation | WV 8-band J2K (354MB) | 92 | 106 | 97 | 95 | 6 | 5 |
| index_generation | WV Pan J2K (679MB) | 228 | 246 | 236 | 237 | 7 | 5 |
All times in milliseconds (ms).
Metadata¶
| Operation | Dataset | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|
| metadata_read | Synth Small TIFF | 1 | 1 | 1 | 1 | 0 | 10 |
| metadata_read | Synth COG Pyramid | 1 | 2 | 1 | 1 | 0 | 10 |
| metadata_read | Synth Medium C3 | 3 | 5 | 4 | 4 | 1 | 10 |
| metadata_read | Tiny NITF (1MB) | 3 | 4 | 4 | 4 | 0 | 10 |
| metadata_read | Synth NITF R-set Pyramid | 3 | 5 | 4 | 4 | 1 | 10 |
| metadata_read | Synth Small NC | 3 | 6 | 4 | 4 | 1 | 10 |
| metadata_read | Synth Medium C8 | 3 | 6 | 4 | 4 | 1 | 10 |
| metadata_read | Synth Large NC | 10 | 14 | 12 | 11 | 1 | 10 |
| metadata_read | Umbra SIDD | 13 | 28 | 16 | 15 | 4 | 10 |
| metadata_read | WV 8-band J2K (354MB) | 56 | 80 | 63 | 61 | 7 | 10 |
| metadata_read | WV Pan J2K (679MB) | 114 | 122 | 116 | 116 | 2 | 10 |
All times in milliseconds (ms).
Tile Read Zarr Local¶
| Operation | Dataset | Access Pattern | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|---|
| zarr_read | Synth Small TIFF | single tile | 3 | 5 | 4 | 4 | 1 | 10 |
| zarr_read | Synth COG Pyramid | single tile | 3 | 6 | 4 | 4 | 1 | 10 |
| zarr_read | Synth Small NC | single tile | 3 | 6 | 4 | 4 | 1 | 10 |
| zarr_read | Tiny NITF (1MB) | single tile | 4 | 6 | 5 | 4 | 1 | 10 |
| zarr_read | Synth Large NC | single tile | 3 | 9 | 5 | 5 | 2 | 10 |
| zarr_read | Tiny NITF (1MB) | small roi | 4 | 8 | 6 | 6 | 1 | 10 |
| zarr_read | Synth Small TIFF | small roi | 4 | 8 | 6 | 7 | 1 | 10 |
| zarr_read | Synth Medium C3 | single tile | 5 | 14 | 6 | 5 | 3 | 10 |
| zarr_read | Synth COG Pyramid | small roi | 6 | 7 | 7 | 6 | 0 | 10 |
| zarr_read | Synth Small NC | small roi | 5 | 10 | 7 | 7 | 1 | 10 |
| zarr_read | Synth NITF R-set Pyramid | small roi | 7 | 11 | 8 | 7 | 1 | 10 |
| zarr_read | Synth Medium C8 | single tile | 6 | 11 | 9 | 9 | 1 | 10 |
| zarr_read | Synth Medium C3 | small roi | 8 | 12 | 10 | 10 | 1 | 10 |
| zarr_read | Synth NITF R-set Pyramid | single tile | 3 | 178 | 21 | 4 | 55 | 10 |
| zarr_read | Synth Medium C8 | small roi | 26 | 33 | 29 | 28 | 2 | 10 |
| zarr_read | Synth Large NC | small roi | 11 | 180 | 30 | 13 | 53 | 10 |
| zarr_read | WV Pan J2K (679MB) | single tile | 21 | 119 | 33 | 23 | 30 | 10 |
| zarr_read | Umbra SIDD | small roi | 43 | 49 | 45 | 45 | 2 | 10 |
| zarr_read | Umbra SIDD | single tile | 43 | 55 | 47 | 46 | 3 | 10 |
| zarr_read | WV 8-band J2K (354MB) | single tile | 48 | 57 | 50 | 49 | 2 | 10 |
| zarr_read | WV Pan J2K (679MB) | small roi | 109 | 219 | 125 | 112 | 34 | 10 |
| zarr_read | WV 8-band J2K (354MB) | small roi | 406 | 453 | 417 | 416 | 14 | 10 |
| zarr_read | WV Pan J2K (679MB) | large roi | 1088 | 1191 | 1147 | 1149 | 32 | 10 |
All times in milliseconds (ms).
Tile Read Zarr S3¶
| Operation | Dataset | Access Pattern | Min | Max | Mean | Median | StdDev | Rounds |
|---|---|---|---|---|---|---|---|---|
| zarr_read | Synth COG Pyramid | single tile | 173 | 186 | 180 | 181 | 6 | 3 |
| zarr_read | Synth Medium C8 | single tile | 154 | 225 | 182 | 167 | 38 | 3 |
| zarr_read | Synth COG Pyramid | small roi | 170 | 200 | 184 | 184 | 15 | 3 |
| zarr_read | Synth Small NC | single tile | 160 | 242 | 196 | 187 | 42 | 3 |
| zarr_read | Synth NITF R-set Pyramid | small roi | 182 | 212 | 199 | 202 | 15 | 3 |
| zarr_read | Synth NITF R-set Pyramid | single tile | 173 | 265 | 207 | 183 | 51 | 3 |
| zarr_read | Synth Medium C3 | small roi | 195 | 231 | 213 | 212 | 18 | 3 |
| zarr_read | Synth Small TIFF | small roi | 202 | 231 | 216 | 216 | 15 | 3 |
| zarr_read | Tiny NITF (1MB) | small roi | 207 | 240 | 219 | 211 | 18 | 3 |
| zarr_read | Synth Medium C8 | small roi | 180 | 348 | 244 | 206 | 91 | 3 |
| zarr_read | Synth Medium C3 | single tile | 179 | 336 | 246 | 225 | 81 | 3 |
| zarr_read | Synth Small TIFF | single tile | 168 | 315 | 253 | 276 | 77 | 3 |
| zarr_read | Synth Large NC | single tile | 238 | 289 | 257 | 244 | 28 | 3 |
| zarr_read | WV Pan J2K (679MB) | single tile | 218 | 336 | 284 | 299 | 61 | 3 |
| zarr_read | Tiny NITF (1MB) | single tile | 222 | 772 | 411 | 239 | 313 | 3 |
| zarr_read | Synth Small NC | small roi | 239 | 858 | 508 | 428 | 317 | 3 |
| zarr_read | Synth Large NC | small roi | 498 | 663 | 568 | 543 | 85 | 3 |
| zarr_read | WV Pan J2K (679MB) | small roi | 433 | 850 | 677 | 748 | 217 | 3 |
| zarr_read | WV 8-band J2K (354MB) | single tile | 524 | 840 | 683 | 686 | 158 | 3 |
| zarr_read | WV 8-band J2K (354MB) | small roi | 1991 | 2267 | 2144 | 2175 | 140 | 3 |
| zarr_read | WV Pan J2K (679MB) | large roi | 2965 | 3517 | 3178 | 3053 | 297 | 3 |
| zarr_read | Umbra SIDD | small roi | 4808 | 5095 | 4912 | 4833 | 159 | 3 |
| zarr_read | Umbra SIDD | single tile | 4084 | 6116 | 5062 | 4986 | 1018 | 3 |
All times in milliseconds (ms).