From 0b474095aeef48f118eb4852046494ca2b6214c3 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 2 Feb 2026 13:23:33 +0100 Subject: [PATCH 01/10] first draft --- README.md | 15 +- USERGUIDE.md | 823 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 837 insertions(+), 1 deletion(-) create mode 100644 USERGUIDE.md diff --git a/README.md b/README.md index 5357548..ac51265 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,20 @@ This repository contains a Java implementation of Zarr version 2 and 3. -## Usage +## Documentation + +For comprehensive documentation, see the [**User Guide**](USERGUIDE.md), which includes: + +- Installation instructions +- Quick start examples +- Core concepts and API reference +- Working with arrays and groups +- Storage backends (Filesystem, HTTP, S3, ZIP, Memory) +- Compression and codecs +- Advanced topics and best practices +- Troubleshooting + +## Quick Usage Example ```java import dev.zarr.zarrjava.store.FilesystemStore; diff --git a/USERGUIDE.md b/USERGUIDE.md new file mode 100644 index 0000000..c91b8a3 --- /dev/null +++ b/USERGUIDE.md @@ -0,0 +1,823 @@ +# zarr-java User Guide +[![Maven Central](https://img.shields.io/maven-central/v/dev.zarr/zarr-java.svg)](https://search.maven.org/artifact/dev.zarr/zarr-java) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +## Table of Contents +1. [Introduction](#introduction) +2. [Installation](#installation) +3. [Quick Start](#quick-start) +4. [Core Concepts](#core-concepts) +5. [Working with Arrays](#working-with-arrays) +6. [Working with Groups](#working-with-groups) +7. [Storage Backends](#storage-backends) +8. [Compression and Codecs](#compression-and-codecs) +9. [Advanced Topics](#advanced-topics) +10. [API Reference](#api-reference) +11. [Examples](#examples) +12. [Troubleshooting](#troubleshooting) +--- +## Introduction +zarr-java is a Java implementation of the [Zarr specification](https://zarr.dev/) for chunked, compressed, N-dimensional arrays. It supports both Zarr version 2 and version 3 formats, providing a unified API for working with large scientific datasets. 
+### Key Features
+- **Full Zarr v2 and v3 support**: Read and write arrays in both formats
+- **Multiple storage backends**: Filesystem, HTTP, S3, ZIP, and in-memory storage
+- **Compression codecs**: Blosc, Gzip, Zstd, and more
+- **Sharding support**: Efficient storage for many small chunks (v3)
+- **Parallel I/O**: Optional parallel reading and writing for performance
+- **Type-safe API**: Strong typing with covariant return types
+---
+## Installation
+### Maven
+Add the following dependency to your `pom.xml`:
+```xml
+<dependency>
+    <groupId>dev.zarr</groupId>
+    <artifactId>zarr-java</artifactId>
+    <version>0.0.10</version>
+</dependency>
+```
+### Gradle
+Add the following to your `build.gradle`:
+```gradle
+dependencies {
+    implementation 'dev.zarr:zarr-java:0.0.10'
+}
+```
+### Requirements
+- Java 8 or higher
+- Maven 3.6+ (for building from source)
+---
+## Quick Start
+### Reading an Existing Array
+```java
+import dev.zarr.zarrjava.core.Array;
+// Open an array (auto-detects version)
+Array array = Array.open("/path/to/zarr/array");
+// Read the entire array
+ucar.ma2.Array data = array.read();
+// Read a subset
+ucar.ma2.Array subset = array.read(
+    new long[]{0, 0, 0},    // offset
+    new int[]{10, 100, 100} // shape
+);
+```
+### Creating and Writing an Array
+```java
+import dev.zarr.zarrjava.v3.Array;
+import dev.zarr.zarrjava.v3.DataType;
+import dev.zarr.zarrjava.store.FilesystemStore;
+// Create a new array
+Array array = Array.create(
+    new FilesystemStore("/path/to/zarr").resolve("myarray"),
+    Array.metadataBuilder()
+        .withShape(1000, 1000, 1000)
+        .withDataType(DataType.FLOAT32)
+        .withChunkShape(100, 100, 100)
+        .withFillValue(0.0f)
+        .withCodecs(c -> c.withBlosc())
+        .build()
+);
+// Create and write data
+ucar.ma2.Array data = ucar.ma2.Array.factory(
+    ucar.ma2.DataType.FLOAT,
+    new int[]{100, 100, 100}
+);
+array.write(new long[]{0, 0, 0}, data);
+```
+---
+## Core Concepts
+### Arrays
+Arrays are N-dimensional, chunked, and optionally compressed data structures.
Each array has: +- **Shape**: Dimensions of the array (e.g., `[1000, 1000, 1000]`) +- **Data Type**: Type of elements (e.g., `FLOAT32`, `INT64`, `UINT8`) +- **Chunk Shape**: How the array is divided for storage +- **Fill Value**: Default value for uninitialized chunks +- **Codecs/Compressors**: Compression and encoding configuration +### Groups +Groups are hierarchical containers for arrays and other groups: +```java +import dev.zarr.zarrjava.v3.Group; +Group group = Group.open("/path/to/zarr/group"); +Group subgroup = (Group) group.get("subgroup"); +Array array = (Array) group.get("array"); +``` +### Storage Handles +All storage operations use `StoreHandle` to abstract the storage backend: +```java +import dev.zarr.zarrjava.store.FilesystemStore; +import dev.zarr.zarrjava.store.StoreHandle; +StoreHandle handle = new FilesystemStore("/path").resolve("myarray"); +``` +--- +## Working with Arrays +### Opening Arrays +Explicitly specify version (Recommended): +```java +// Zarr v2 +dev.zarr.zarrjava.v2.Array v2Array = + dev.zarr.zarrjava.v2.Array.open("/path/to/v2/array"); +// Zarr v3 +dev.zarr.zarrjava.v3.Array v3Array = + dev.zarr.zarrjava.v3.Array.open("/path/to/v3/array"); +``` + +Auto-detect Zarr version: +```java +import dev.zarr.zarrjava.core.Array; +Array array = Array.open("/path/to/array"); +``` +### Creating Arrays +#### Zarr v3 +```java +import dev.zarr.zarrjava.v3.Array; +import dev.zarr.zarrjava.v3.DataType; +import dev.zarr.zarrjava.store.FilesystemStore; +Array array = Array.create( + new FilesystemStore("/path/to/zarr").resolve("myarray"), + Array.metadataBuilder() + .withShape(100, 200, 300) + .withDataType(DataType.INT32) + .withChunkShape(10, 20, 30) + .withFillValue(0) + .withCodecs(c -> c.withBlosc("zstd", 5)) + .build() +); +``` +#### Zarr v2 +```java +import dev.zarr.zarrjava.v2.Array; +import dev.zarr.zarrjava.v2.DataType; +Array array = Array.create( + new FilesystemStore("/path/to/zarr").resolve("myarray"), + Array.metadataBuilder() + .withShape(100, 200, 300) + .withDataType(DataType.INT32) + .withChunks(10, 20, 30) + .withFillValue(0) + .withBloscCompressor("zstd", 5) + .build() +); +``` +### Reading Data +#### Read Entire Array +```java +ucar.ma2.Array data = array.read(); +``` +#### Read Subset +```java +ucar.ma2.Array subset = array.read( + new long[]{10, 20, 30}, // offset + new int[]{50, 60, 70} // shape +); +``` +#### Read without Parallelism +```java +ucar.ma2.Array data = array.read( + new long[]{0, 0, 0}, + new int[]{100, 100, 100}, + false // disable parallel processing +); +``` +#### Using ArrayAccessor (Fluent API) +```java +ucar.ma2.Array data = array.access() + .withOffset(10, 20, 30) + .withShape(50, 60, 70) + .read(); +``` +### Writing Data +```java +// Write at origin +array.write(data); +// Write at offset +array.write(new long[]{10, 20, 30}, data); +// Write without parallelism +array.write(new long[]{0, 0, 0}, data, false); +``` +### Resizing Arrays +```java +// Resize (metadata only, default behavior) +Array resizedArray = array.resize(new long[]{200, 300, 400}); +// Resize and delete out-of-bounds chunks +Array resizedArray = array.resize( + new long[]{200, 300, 400}, + false // resizeMetadataOnly = false +); +// Resize with parallel cleanup +Array resizedArray = array.resize( + new long[]{200, 300, 400}, + false, // delete chunks + true // parallel processing +); +``` +### Managing Attributes +```java +import dev.zarr.zarrjava.core.Attributes; +// Set attributes +Attributes attrs = new Attributes(); +attrs.put("description", "My dataset"); 
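+// Attribute values are serialized as JSON in the array metadata (".zattrs" in Zarr v2, the "attributes" field of "zarr.json" in v3)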
+attrs.put("units", "meters"); +Array updatedArray = array.setAttributes(attrs); +// Update attributes +Array updatedArray = array.updateAttributes(currentAttrs -> { + currentAttrs.put("modified", "2026-02-02"); + return currentAttrs; +}); +// Read attributes +Attributes attrs = array.metadata().attributes(); +String description = (String) attrs.get("description"); +``` +--- +## Working with Groups +### Creating Groups +```java +import dev.zarr.zarrjava.v3.Group; +import dev.zarr.zarrjava.store.FilesystemStore; +// Create a group +Group group = Group.create( + new FilesystemStore("/path/to/zarr").resolve() +); +// Create with attributes +Attributes attrs = new Attributes(); +attrs.put("description", "My hierarchy"); +Group group = Group.create( + new FilesystemStore("/path/to/zarr").resolve(), + attrs +); +``` +### Navigating Groups +```java +// Open a group +Group group = Group.open("/path/to/zarr"); +// Get all members +Map members = group.members(); +// Check existence +boolean exists = group.contains("subgroup"); +// Get specific member +Node member = group.get("array"); +// Type-check and cast +if (member instanceof dev.zarr.zarrjava.v3.Array) { + dev.zarr.zarrjava.v3.Array array = + (dev.zarr.zarrjava.v3.Array) member; +} +``` +### Creating Children +```java +// Create subgroup +Group subgroup = group.createGroup("subgroup"); +// Create array in group +Array array = group.createArray( + "myarray", + Array.metadataBuilder() + .withShape(100, 100) + .withDataType(DataType.FLOAT32) + .withChunkShape(10, 10) + .build() +); +``` +### Hierarchical Example +```java +Group root = Group.create( + new FilesystemStore("/path/to/zarr").resolve() +); +// Create hierarchy +Group data = root.createGroup("data"); +Group metadata = root.createGroup("metadata"); +Array rawData = data.createArray( + "raw", + Array.metadataBuilder() + .withShape(1000, 1000) + .withDataType(DataType.UINT16) + .withChunkShape(100, 100) + .build() +); +``` +--- +## Storage Backends +### Filesystem Storage +```java +import dev.zarr.zarrjava.store.FilesystemStore; +FilesystemStore store = new FilesystemStore("/path/to/zarr"); +Array array = Array.open(store.resolve("myarray")); +``` +### HTTP Storage (Read-only) +```java +import dev.zarr.zarrjava.store.HttpStore; +HttpStore store = new HttpStore("https://example.com/data/zarr"); +Array array = Array.open(store.resolve("myarray")); +``` +### S3 Storage +```java +import dev.zarr.zarrjava.store.S3Store; +import software.amazon.awssdk.regions.Region; +S3Store store = new S3Store( + "my-bucket", + "path/prefix", + Region.US_EAST_1 +); +Array array = Array.open(store.resolve("myarray")); +// With custom S3 client +S3Client s3Client = S3Client.builder() + .region(Region.US_WEST_2) + .build(); +S3Store customStore = new S3Store( + "my-bucket", + "path/prefix", + s3Client +); +``` +### In-Memory Storage +```java +import dev.zarr.zarrjava.store.MemoryStore; +MemoryStore store = new MemoryStore(); +Array array = Array.create( + store.resolve("myarray"), + Array.metadataBuilder() + .withShape(100, 100) + .withDataType(DataType.FLOAT32) + .withChunkShape(10, 10) + .build() +); +``` +### ZIP Storage +#### Read-only ZIP +```java +import dev.zarr.zarrjava.store.ReadOnlyZipStore; +ReadOnlyZipStore store = new ReadOnlyZipStore("/path/to/archive.zip"); +Array array = Array.open(store.resolve("myarray")); +``` +#### Buffered ZIP (Read/Write) +```java +import dev.zarr.zarrjava.store.BufferedZipStore; +BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip"); +Array array = 
Array.create( + store.resolve("myarray"), + Array.metadataBuilder() + .withShape(100, 100) + .withDataType(DataType.FLOAT32) + .withChunkShape(10, 10) + .build() +); +// Important: Close to flush changes +store.close(); +``` +--- +## Compression and Codecs +### Zarr v3 Codecs +#### Blosc Compression +```java +// Default settings (zstd, level 5) +.withCodecs(c -> c.withBlosc()) +// Custom compressor and level +.withCodecs(c -> c.withBlosc("lz4", 9)) +// Full configuration +.withCodecs(c -> c.withBlosc("lz4", "shuffle", 9)) +``` +**Compressors**: `blosclz`, `lz4`, `lz4hc`, `zlib`, `zstd` +**Shuffle**: `noshuffle`, `shuffle`, `bitshuffle` +**Levels**: 0-9 (0=none, 9=max) +#### Gzip Compression +```java +.withCodecs(c -> c.withGzip(6)) // Level 1-9 +``` +#### Zstd Compression +```java +.withCodecs(c -> c.withZstd(3)) // Level 1-22 +``` +#### Transpose Codec +```java +.withCodecs(c -> c + .withTranspose(new int[]{2, 1, 0}) // Reverse dimensions + .withBlosc()) +``` +#### Sharding +Combine multiple chunks into shard files: +```java +.withCodecs(c -> c.withSharding( + new int[]{10, 10, 10}, // Chunks per shard + innerCodecs -> innerCodecs.withBlosc() +)) +``` +### Zarr v2 Compressors +#### Blosc Compressor +```java +// Default +.withBloscCompressor() +// Custom +.withBloscCompressor("lz4", "shuffle", 9) +``` +#### Zlib Compressor +```java +.withZlibCompressor(6) // Level 1-9 +``` +--- +## Advanced Topics +### Data Types +**Integer**: `INT8`, `INT16`, `INT32`, `INT64`, `UINT8`, `UINT16`, `UINT32`, `UINT64` +**Float**: `FLOAT32`, `FLOAT64` +**Other**: `BOOL`, `COMPLEX64`, `COMPLEX128` +```java +import dev.zarr.zarrjava.v3.DataType; +.withDataType(DataType.UINT16) +``` +### Working with Large Arrays +For arrays exceeding `Integer.MAX_VALUE` elements: +```java +Array array = Array.create( + storeHandle, + Array.metadataBuilder() + .withShape(10000L, 10000L, 10000L) // 1 trillion elements + .withDataType(DataType.UINT8) + .withChunkShape(100, 100, 100) + .withFillValue((byte) 0) + .build() +); +// Read from large offset +ucar.ma2.Array data = array.read( + new long[]{5000000000L, 0, 0}, // Beyond int range + new int[]{100, 100, 100} +); +``` +### Parallel I/O +```java +// Parallel reading +array.read(offset, shape, true); +// Parallel writing +array.write(offset, data, true); +// Parallel resize +array.resize(newShape, false, true); +``` +### Chunk-level Operations +```java +// Read single chunk +long[] chunkCoords = new long[]{0, 0, 0}; +ucar.ma2.Array chunk = array.readChunk(chunkCoords); +// Write single chunk +array.writeChunk(chunkCoords, chunk); +``` +### Exception Handling +```java +import dev.zarr.zarrjava.ZarrException; +import java.io.IOException; +try { + Array array = Array.open("/path/to/array"); + ucar.ma2.Array data = array.read(); +} catch (ZarrException e) { + System.err.println("Zarr error: " + e.getMessage()); +} catch (IOException e) { + System.err.println("I/O error: " + e.getMessage()); +} +``` +### Best Practices +1. **Chunk sizes for Best Performance**: + - refer to [Zarr Performance Guide]( + https://zarr.readthedocs.io/en/latest/user-guide/performance/) for recommendations +2. **Use compression**: Almost always beneficial for scientific data + - Blosc is fast and effective for most use cases + - Zstd for better compression ratios + - Gzip for compatibility +3. **Batch writes**: Write larger chunks at once rather than many small writes +4. 
**Consider sharding**: For v3 arrays with many small chunks + ```java + .withCodecs(c -> c.withSharding(new int[]{10, 10, 10}, inner -> inner.withBlosc())) + ``` +--- +## API Reference +### Array Methods +#### Creation and Opening +- `Array.open(String/Path/StoreHandle)` - Open array +- `Array.create(StoreHandle, ArrayMetadata)` - Create array +- `Array.metadataBuilder()` - Get metadata builder +#### Reading +- `read()` - Read entire array +- `read(long[] offset, int[] shape)` - Read subset +- `read(long[] offset, int[] shape, boolean parallel)` - With parallelism +- `read(boolean parallel)` - Read entire array with parallelism control +- `readChunk(long[] chunkCoords)` - Read single chunk +#### Writing +- `write(ucar.ma2.Array)` - Write at origin +- `write(long[] offset, ucar.ma2.Array)` - Write at offset +- `write(long[] offset, ucar.ma2.Array, boolean parallel)` - With parallelism +- `write(ucar.ma2.Array, boolean parallel)` - Write at origin with parallelism +- `writeChunk(long[] chunkCoords, ucar.ma2.Array)` - Write chunk +#### Metadata +- `resize(long[] newShape)` - Resize (metadata only) +- `resize(long[] newShape, boolean resizeMetadataOnly)` - Resize with cleanup option +- `resize(long[] newShape, boolean resizeMetadataOnly, boolean parallel)` - With parallelism +- `setAttributes(Attributes)` - Set attributes +- `updateAttributes(Function)` - Update attributes +- `metadata()` - Get metadata +#### Utility +- `access()` - Get fluent accessor API +### Group Methods +#### Creation and Opening +- `Group.open(String/Path/StoreHandle)` - Open group +- `Group.create(StoreHandle)` - Create group +- `Group.create(StoreHandle, Attributes)` - Create with attributes +#### Navigation +- `members()` - Get all members +- `get(String key)` - Get member by key +- `contains(String key)` - Check existence +#### Children +- `createGroup(String key)` - Create subgroup +- `createGroup(String key, Attributes)` - Create subgroup with attributes +- `createArray(String key, ArrayMetadata)` - Create array +#### Metadata +- `setAttributes(Attributes)` - Set group attributes +- `metadata()` - Get group metadata +### Store Implementations +- `FilesystemStore(String/Path)` - Local filesystem +- `HttpStore(String url)` - HTTP/HTTPS (read-only) +- `S3Store(S3Client s3client, String bucketName, String prefix)` - AWS S3 +- `MemoryStore()` - In-memory +- `ReadOnlyZipStore(String path)` - ZIP (read-only) +- `BufferedZipStore(String path)` - ZIP (read/write) +### ArrayMetadataBuilder Methods (v3) +- `withShape(long... shape)` - Set array shape +- `withDataType(DataType)` - Set data type +- `withChunkShape(int... chunkShape)` - Set chunk shape +- `withFillValue(Object)` - Set fill value +- `withCodecs(Function)` - Configure codecs +- `withAttributes(Attributes)` - Set attributes +- `build()` - Build metadata +### CodecBuilder Methods (v3) +- `withBlosc()` - Add Blosc (default settings) +- `withBlosc(String cname)` - Add Blosc with compressor +- `withBlosc(String cname, int clevel)` - Add Blosc with compressor and level +- `withBlosc(String cname, String shuffle, int clevel)` - Add Blosc fully configured +- `withGzip(int level)` - Add Gzip +- `withZstd(int level)` - Add Zstd +- `withTranspose(int[] order)` - Add transpose +- `withSharding(int[] chunksPerShard, Function)` - Add sharding +### ArrayMetadataBuilder Methods (v2) +- `withShape(long... shape)` - Set array shape +- `withDataType(DataType)` - Set data type +- `withChunks(int... 
chunks)` - Set chunk shape +- `withFillValue(Object)` - Set fill value +- `withBloscCompressor()` - Use Blosc (default) +- `withBloscCompressor(String cname)` - Blosc with compressor +- `withBloscCompressor(String cname, int clevel)` - Blosc with settings +- `withBloscCompressor(String cname, String shuffle, int clevel)` - Blosc fully configured +- `withZlibCompressor(int level)` - Use Zlib +- `withAttributes(Attributes)` - Set attributes +- `build()` - Build metadata +--- +## Examples +### Complete Example: Creating a 3D Dataset +```java +import dev.zarr.zarrjava.v3.*; +import dev.zarr.zarrjava.store.FilesystemStore; +import dev.zarr.zarrjava.core.Attributes; +public class ZarrExample { + public static void main(String[] args) throws Exception { + // Create root group + FilesystemStore store = new FilesystemStore("/tmp/my_dataset"); + Group root = Group.create(store.resolve()); + // Add attributes to root + Attributes rootAttrs = new Attributes(); + rootAttrs.put("description", "My scientific dataset"); + rootAttrs.put("created", "2026-02-02"); + root = root.setAttributes(rootAttrs); + // Create data group + Group dataGroup = root.createGroup("data"); + // Create raw data array + Array rawArray = dataGroup.createArray( + "raw", + Array.metadataBuilder() + .withShape(1000, 1000, 100) + .withDataType(DataType.UINT16) + .withChunkShape(100, 100, 10) + .withFillValue(0) + .withCodecs(c -> c.withBlosc("zstd", 5)) + .build() + ); + // Write some data + ucar.ma2.Array data = ucar.ma2.Array.factory( + ucar.ma2.DataType.USHORT, + new int[]{100, 100, 10} + ); + // Fill with data... + rawArray.write(new long[]{0, 0, 0}, data, true); + System.out.println("Dataset created successfully!"); + System.out.println("Array shape: " + + java.util.Arrays.toString(rawArray.metadata().shape)); + } +} +``` +### Reading from HTTP +```java +import dev.zarr.zarrjava.v3.*; +import dev.zarr.zarrjava.store.HttpStore; +public class ReadHttpExample { + public static void main(String[] args) throws Exception { + // Open remote Zarr store + HttpStore store = new HttpStore( + "https://static.webknossos.org/data/zarr_v3" + ); + Group hierarchy = Group.open(store.resolve("l4_sample")); + Group color = (Group) hierarchy.get("color"); + Array array = (Array) color.get("1"); + // Read a subset + ucar.ma2.Array data = array.read( + new long[]{0, 3073, 3073, 513}, + new int[]{1, 64, 64, 64} + ); + System.out.println("Read " + data.getSize() + " elements"); + } +} +``` +### Working with S3 +```java +import dev.zarr.zarrjava.v3.*; +import dev.zarr.zarrjava.store.S3Store; +import software.amazon.awssdk.regions.Region; +public class S3Example { + public static void main(String[] args) throws Exception { + // Create S3 store + S3Store store = new S3Store( + "my-bucket", + "data/zarr", + Region.US_EAST_1 + ); + // Create array + Array array = Array.create( + store.resolve("myarray"), + Array.metadataBuilder() + .withShape(1000, 1000) + .withDataType(DataType.FLOAT32) + .withChunkShape(100, 100) + .withFillValue(0.0f) + .withCodecs(c -> c.withBlosc()) + .build() + ); + // Write data + ucar.ma2.Array data = ucar.ma2.Array.factory( + ucar.ma2.DataType.FLOAT, + new int[]{100, 100} + ); + array.write(new long[]{0, 0}, data); + System.out.println("Written to S3 successfully!"); + } +} +``` +### Using Sharding (v3) +```java +import dev.zarr.zarrjava.v3.*; +import dev.zarr.zarrjava.store.FilesystemStore; +public class ShardingExample { + public static void main(String[] args) throws Exception { + Array array = Array.create( + new 
FilesystemStore("/tmp/zarr").resolve("sharded"), + Array.metadataBuilder() + .withShape(10000, 10000, 1000) + .withDataType(DataType.UINT8) + .withChunkShape(100, 100, 100) + .withFillValue((byte) 0) + .withCodecs(c -> c.withSharding( + new int[]{10, 10, 10}, // 1000 chunks per shard + innerCodecs -> innerCodecs + .withBlosc("zstd", 5) + )) + .build() + ); + System.out.println("Sharded array created!"); + System.out.println("Chunks per shard: 10 x 10 x 10 = 1000"); + } +} +``` +### Parallel I/O Example +```java +import dev.zarr.zarrjava.v3.*; +import dev.zarr.zarrjava.store.FilesystemStore; +public class ParallelIOExample { + public static void main(String[] args) throws Exception { + Array array = Array.open("/path/to/large/array"); + // Read with parallelism + long startTime = System.currentTimeMillis(); + ucar.ma2.Array data = array.read( + new long[]{0, 0, 0}, + new int[]{1000, 1000, 100}, + true // Enable parallel reading + ); + long duration = System.currentTimeMillis() - startTime; + System.out.println("Read " + data.getSize() + + " elements in " + duration + "ms (parallel)"); + // Compare with serial reading + startTime = System.currentTimeMillis(); + data = array.read( + new long[]{0, 0, 0}, + new int[]{1000, 1000, 100}, + false // Serial reading + ); + duration = System.currentTimeMillis() - startTime; + System.out.println("Read " + data.getSize() + + " elements in " + duration + "ms (serial)"); + } +} +``` +--- +## Troubleshooting +### Common Issues +**Problem**: `ZarrException: No Zarr array found at the specified location` +**Solution**: Check that the path is correct and contains `.zarray` (v2) or `zarr.json` (v3) +**Problem**: `OutOfMemoryError` when reading large arrays +**Solution**: Read smaller subsets or increase JVM heap size with `-Xmx` +```bash +java -Xmx8g -jar myapp.jar +``` +**Problem**: Slow I/O performance +**Solution**: +- Enable parallelism: `array.read(offset, shape, true)` +- Adjust chunk sizes (aim for 1-100 MB per chunk) +- Use appropriate compression (Blosc is fastest) +- Check network bandwidth (for HTTP/S3) +**Problem**: `IllegalArgumentException: 'offset' needs to have rank...` +**Solution**: Ensure offset and shape arrays match the array's number of dimensions +```java +// Correct +array.read(new long[]{0, 0, 0}, new int[]{10, 10, 10}); // 3D array +// Wrong +array.read(new long[]{0, 0}, new int[]{10, 10}); // Wrong rank! +``` +**Problem**: Data appears corrupted +**Solution**: +- Verify data type matches between write and read +- Check compression codec compatibility +- Ensure proper store closing (especially ZIP stores) +**Problem**: `ZarrException: Requested data is outside of the array's domain` +**Solution**: Check that `offset + shape <= array.shape` for all dimensions +```java +// Array shape: [1000, 1000] +// Wrong: offset[0] + shape[0] = 950 + 100 = 1050 > 1000 +array.read(new long[]{950, 0}, new int[]{100, 100}); +// Correct +array.read(new long[]{900, 0}, new int[]{100, 100}); +``` +**Problem**: S3Store connection errors +**Solution**: +- Check AWS credentials configuration +- Verify bucket name and region +- Check IAM permissions for S3 access +- Ensure network connectivity + +**Problem**: ZIP store not writing changes +**Solution**: Always close the store explicitly +```java +BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip"); +try { + // Use store +} finally { + store.close(); // Important! +} +``` +### Performance Tips +1. 
**Chunk size optimization**:
+   ```java
+   // Too small (many I/O operations)
+   .withChunkShape(10, 10, 10) // ~1KB chunks
+   // Good balance
+   .withChunkShape(100, 100, 100) // ~1MB chunks (for UINT8)
+   // May be too large (high memory usage)
+   .withChunkShape(1000, 1000, 1000) // ~1GB chunks
+   ```
+2. **Access patterns**: Align chunk shape with your access pattern
+   ```java
+   // For row-wise access
+   .withChunkShape(1, 1000, 1000) // Read entire rows efficiently
+   // For column-wise access
+   .withChunkShape(1000, 1, 1000) // Read entire columns efficiently
+   // For balanced 3D access
+   .withChunkShape(100, 100, 100) // Balanced for all dimensions
+   ```
+3. **Compression trade-offs**:
+   ```java
+   // Fastest (minimal compression)
+   .withCodecs(c -> c.withBlosc("lz4", "noshuffle", 1))
+   // Balanced (good speed and compression)
+   .withCodecs(c -> c.withBlosc("zstd", "shuffle", 5))
+   // Best compression (slower)
+   .withCodecs(c -> c.withZstd(22))
+   ```
+### Getting Help
+- **GitHub Issues**: [github.com/zarr-developers/zarr-java/issues](https://github.com/zarr-developers/zarr-java/issues)
+- **Zarr Community**: [zarr.dev](https://zarr.dev/)
+- **Specification**: [zarr-specs.readthedocs.io](https://zarr-specs.readthedocs.io/)
+- **Discussions**: [github.com/zarr-developers/zarr-specs/discussions](https://github.com/zarr-developers/zarr-specs/discussions)
+
+When reporting issues, please include:
+- zarr-java version
+- Java version
+- Zarr format version (v2 or v3)
+- Minimal reproducible example
+- Stack trace (if applicable)
+---
+
+## License
+zarr-java is licensed under the MIT License. See [LICENSE](LICENSE) for details.
+
+---
+## Contributing
+Contributions are welcome! Please see the [development guide](README.md#development-start-guide) for information on:
+- Setting up a development environment
+- Running tests
+- Code style and formatting
+- Submitting pull requests

From e3440cfdf9d2efbcc6a37d47ffe690ce74872cb7 Mon Sep 17 00:00:00 2001
From: brokkoli71
Date: Mon, 26 Jan 2026 10:24:11 +0100
Subject: [PATCH 02/10] add tests

---
 src/main/java/dev/zarr/zarrjava/v2/Array.java | 3 +-
 src/main/java/dev/zarr/zarrjava/v2/Group.java | 3 +-
 src/main/java/dev/zarr/zarrjava/v3/Array.java | 3 +-
 src/main/java/dev/zarr/zarrjava/v3/Group.java | 3 +-
 .../dev/zarr/zarrjava/ParallelWriteTest.java | 154 ++++++++++++++++++
 .../java/dev/zarr/zarrjava/ZarrV2Test.java | 53 ++++++
 .../java/dev/zarr/zarrjava/ZarrV3Test.java | 53 ++++++
 7 files changed, 268 insertions(+), 4 deletions(-)
 create mode 100644 src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java

diff --git a/src/main/java/dev/zarr/zarrjava/v2/Array.java b/src/main/java/dev/zarr/zarrjava/v2/Array.java
index a34c85a..c5ed44b 100644
--- a/src/main/java/dev/zarr/zarrjava/v2/Array.java
+++ b/src/main/java/dev/zarr/zarrjava/v2/Array.java
@@ -248,7 +248,8 @@ public Array setAttributes(Attributes newAttributes) throws ZarrException, IOExc
   * @throws IOException throws IOException if the new metadata cannot be serialized
   */
  public Array updateAttributes(Function attributeMapper) throws ZarrException, IOException {
-    return setAttributes(attributeMapper.apply(metadata.attributes));
+    Attributes currentAttributes = metadata.attributes != null ?
new Attributes(metadata.attributes) : new Attributes(); + return setAttributes(attributeMapper.apply(currentAttributes)); } @Override diff --git a/src/main/java/dev/zarr/zarrjava/v2/Group.java b/src/main/java/dev/zarr/zarrjava/v2/Group.java index d3229e4..73b8cdf 100644 --- a/src/main/java/dev/zarr/zarrjava/v2/Group.java +++ b/src/main/java/dev/zarr/zarrjava/v2/Group.java @@ -283,7 +283,8 @@ public Group setAttributes(Attributes newAttributes) throws ZarrException, IOExc */ public Group updateAttributes(Function attributeMapper) throws ZarrException, IOException { - return setAttributes(attributeMapper.apply(metadata.attributes)); + Attributes currentAttributes = metadata.attributes != null ? new Attributes(metadata.attributes) : new Attributes(); + return setAttributes(attributeMapper.apply(currentAttributes)); } diff --git a/src/main/java/dev/zarr/zarrjava/v3/Array.java b/src/main/java/dev/zarr/zarrjava/v3/Array.java index ea029db..45060e9 100644 --- a/src/main/java/dev/zarr/zarrjava/v3/Array.java +++ b/src/main/java/dev/zarr/zarrjava/v3/Array.java @@ -248,7 +248,8 @@ public Array setAttributes(Attributes newAttributes) throws ZarrException, IOExc * @throws IOException throws IOException if the new metadata cannot be serialized */ public Array updateAttributes(Function attributeMapper) throws ZarrException, IOException { - return setAttributes(attributeMapper.apply(metadata.attributes)); + Attributes currentAttributes = metadata.attributes != null ? new Attributes(metadata.attributes) : new Attributes(); + return setAttributes(attributeMapper.apply(currentAttributes)); } @Override diff --git a/src/main/java/dev/zarr/zarrjava/v3/Group.java b/src/main/java/dev/zarr/zarrjava/v3/Group.java index 1d3aa6b..8b1a81b 100644 --- a/src/main/java/dev/zarr/zarrjava/v3/Group.java +++ b/src/main/java/dev/zarr/zarrjava/v3/Group.java @@ -289,7 +289,8 @@ private Group writeMetadata(GroupMetadata newGroupMetadata) throws IOException { * @throws IOException if the metadata cannot be serialized */ public Group updateAttributes(Function attributeMapper) throws ZarrException, IOException { - return setAttributes(attributeMapper.apply(metadata.attributes)); + Attributes currentAttributes = metadata.attributes != null ? 
new Attributes(metadata.attributes) : new Attributes(); + return setAttributes(attributeMapper.apply(currentAttributes)); } /** diff --git a/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java b/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java new file mode 100644 index 0000000..60db2ee --- /dev/null +++ b/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java @@ -0,0 +1,154 @@ +package dev.zarr.zarrjava; + +import dev.zarr.zarrjava.store.FilesystemStore; +import dev.zarr.zarrjava.store.StoreHandle; +import dev.zarr.zarrjava.v3.Array; +import dev.zarr.zarrjava.v3.DataType; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.*; + +public class ParallelWriteTest extends ZarrTest { + + @Test + public void testParallelWriteDataSafety() throws IOException, ZarrException { + // Test internal parallelism of write method (using parallel=true) + Path path = TESTOUTPUT.resolve("parallel_write_safety"); + StoreHandle storeHandle = new FilesystemStore(path).resolve(); + + int shape = 1000; + int chunk = 100; + + Array array = Array.create(storeHandle, Array.metadataBuilder() + .withShape(shape, shape) + .withDataType(DataType.INT32) + .withChunkShape(chunk, chunk) + .withFillValue(0) + .build()); + + int[] data = new int[shape * shape]; + // Fill with some deterministic pattern + for (int i = 0; i < shape * shape; i++) { + data[i] = i; + } + + ucar.ma2.Array outputData = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{shape, shape}, data); + + // Write in parallel + array.write(outputData, true); + + // Read back + ucar.ma2.Array readData = array.read(); + int[] readArr = (int[]) readData.get1DJavaArray(ucar.ma2.DataType.INT); + + Assertions.assertArrayEquals(data, readArr, "Data read back should match data written in parallel"); + } + + @Test + public void testParallelWriteWithSharding() throws IOException, ZarrException { + // Test internal parallelism with Sharding (nested chunks + shared codec state potential) + Path path = TESTOUTPUT.resolve("parallel_write_sharding"); + StoreHandle storeHandle = new FilesystemStore(path).resolve(); + + int shape = 128; // 128x128 + int shardSize = 64; // Shards are 64x64 + int innerChunk = 32; // Inner chunks 32x32 + + // Metadata with sharding + // With shape 128 and shardSize 64, we have 2x2 = 4 shards. + // Array.write(parallel=true) will likely process these shards concurrently. 
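+        // Each 64x64 shard therefore holds 2x2 = 4 inner chunks of 32x32.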
+ dev.zarr.zarrjava.v3.ArrayMetadata metadata = Array.metadataBuilder() + .withShape(shape, shape) + .withDataType(DataType.INT32) + .withChunkShape(shardSize, shardSize) // This sets the shard shape (outer chunks) + .withCodecs(c -> c.withSharding(new int[]{innerChunk, innerChunk}, c2 -> c2.withBytes("LITTLE"))) + .withFillValue(0) + .build(); + + Array array = Array.create(storeHandle, metadata); + + int[] data = new int[shape * shape]; + for (int i = 0; i < shape * shape; i++) { + data[i] = i; + } + + ucar.ma2.Array outputData = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{shape, shape}, data); + + // Write in parallel + array.write(outputData, true); + + ucar.ma2.Array readData = array.read(); + int[] readArr = (int[]) readData.get1DJavaArray(ucar.ma2.DataType.INT); + + Assertions.assertArrayEquals(data, readArr, "Sharded data written in parallel should match"); + } + + @Test + public void testConcurrentWritesDifferentChunks() throws IOException, ZarrException, InterruptedException, ExecutionException { + // Test external parallelism (multiple threads calling write on same Array instance) + Path path = TESTOUTPUT.resolve("concurrent_write_safety"); + StoreHandle storeHandle = new FilesystemStore(path).resolve(); + + int chunksX = 10; + int chunksY = 10; + int chunkSize = 50; + int shapeX = chunksX * chunkSize; + int shapeY = chunksY * chunkSize; + + Array array = Array.create(storeHandle, Array.metadataBuilder() + .withShape(shapeX, shapeY) + .withDataType(DataType.INT32) + .withChunkShape(chunkSize, chunkSize) + .withFillValue(-1) + .build()); + + ExecutorService executor = Executors.newFixedThreadPool(8); + List> tasks = new ArrayList<>(); + + for (int i = 0; i < chunksX; i++) { + for (int j = 0; j < chunksY; j++) { + final int cx = i; + final int cy = j; + tasks.add(() -> { + int[] chunkData = new int[chunkSize * chunkSize]; + int val = cx * chunksY + cy; // Unique value per chunk + java.util.Arrays.fill(chunkData, val); + + ucar.ma2.Array ucarArray = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{chunkSize, chunkSize}, chunkData); + + // Write to specific chunk offset + long[] offset = new long[]{cx * chunkSize, cy * chunkSize}; + // Use internal parallelism false to isolate external concurrency test mechanism + array.write(offset, ucarArray, false); + return null; + }); + } + } + + List> futures = executor.invokeAll(tasks); + + for (Future f : futures) { + f.get(); // Check for exceptions + } + executor.shutdown(); + + // Verification + ucar.ma2.Array readData = array.read(); + for (int i = 0; i < chunksX; i++) { + for (int j = 0; j < chunksY; j++) { + int expectedVal = i * chunksY + j; + int originX = i * chunkSize; + int originY = j * chunkSize; + + // Verify a pixel in the chunk + int val = readData.getInt(readData.getIndex().set(originX, originY)); + Assertions.assertEquals(expectedVal, val, "Value at chunk " + i + "," + j + " mismatch"); + } + } + } +} diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java index c3e498a..779824c 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java @@ -332,6 +332,31 @@ public void testSetAndUpdateAttributes() throws IOException, ZarrException { assertContainsTestAttributes(array.metadata().attributes()); } + @Test + public void testUpdateAttributesBehavior() throws IOException, ZarrException { + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testUpdateAttributesBehaviorV2"); + 
ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT8) + .withChunks(5, 5) + .withAttributes(new Attributes(b -> b.set("key1", "val1"))) + .build(); + + Array array1 = Array.create(storeHandle, arrayMetadata); + Array array2 = array1.updateAttributes(attrs -> attrs.set("key2", "val2")); + + Assertions.assertNotSame(array1, array2); + Assertions.assertEquals("val1", array1.metadata().attributes().get("key1")); + Assertions.assertNull(array1.metadata().attributes().get("key2")); + + Assertions.assertEquals("val1", array2.metadata().attributes().get("key1")); + Assertions.assertEquals("val2", array2.metadata().attributes().get("key2")); + + // Re-opening should show the updated attributes + Array array3 = Array.open(storeHandle); + Assertions.assertEquals("val2", array3.metadata().attributes().get("key2")); + } + @Test public void testResizeArray() throws IOException, ZarrException { int[] testData = new int[10 * 10]; @@ -360,6 +385,34 @@ public void testResizeArray() throws IOException, ZarrException { Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } + @Test + public void testResizeArrayShrink() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkV2"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunks(5, 5) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + array = array.resize(new long[]{5, 5}); + Assertions.assertArrayEquals(new int[]{5, 5}, array.read().getShape()); + + ucar.ma2.Array data = array.read(); + int[] expectedData = new int[5 * 5]; + for (int i = 0; i < 5; i++) { + for (int j = 0; j < 5; j++) { + expectedData[i * 5 + j] = testData[i * 10 + j]; + } + } + Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + } + @Test public void testGroupAttributes() throws IOException, ZarrException { StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testGroupAttributesV2"); diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java index 3ed3c10..4e67204 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java @@ -708,6 +708,31 @@ public void testSetAndUpdateAttributes() throws IOException, ZarrException { assertContainsTestAttributes(array.metadata().attributes()); } + @Test + public void testUpdateAttributesBehavior() throws IOException, ZarrException { + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testUpdateAttributesBehaviorV3"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT8) + .withChunkShape(5, 5) + .withAttributes(new Attributes(b -> b.set("key1", "val1"))) + .build(); + + Array array1 = Array.create(storeHandle, arrayMetadata); + Array array2 = array1.updateAttributes(attrs -> attrs.set("key2", "val2")); + + Assertions.assertNotSame(array1, array2); + Assertions.assertEquals("val1", array1.metadata().attributes().get("key1")); + Assertions.assertNull(array1.metadata().attributes().get("key2")); + + Assertions.assertEquals("val1", 
array2.metadata().attributes().get("key1")); + Assertions.assertEquals("val2", array2.metadata().attributes().get("key2")); + + // Re-opening should show the updated attributes + Array array3 = Array.open(storeHandle); + Assertions.assertEquals("val2", array3.metadata().attributes().get("key2")); + } + @Test public void testResizeArray() throws IOException, ZarrException { int[] testData = new int[10 * 10]; @@ -736,6 +761,34 @@ public void testResizeArray() throws IOException, ZarrException { Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } + @Test + public void testResizeArrayShrink() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkV3"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunkShape(5, 5) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + array = array.resize(new long[]{5, 5}); + Assertions.assertArrayEquals(new int[]{5, 5}, array.read().getShape()); + + ucar.ma2.Array data = array.read(); + int[] expectedData = new int[5 * 5]; + for (int i = 0; i < 5; i++) { + for (int j = 0; j < 5; j++) { + expectedData[i * 5 + j] = testData[i * 10 + j]; + } + } + Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + } + @Test public void testGroupAttributes() throws IOException, ZarrException { StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testGroupAttributesV3"); From a8b5b6767f160e0f9f2a477d6fbf02530ceb4416 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 26 Jan 2026 13:21:36 +0100 Subject: [PATCH 03/10] test resize and reopen array --- src/test/java/dev/zarr/zarrjava/ZarrV2Test.java | 3 +++ src/test/java/dev/zarr/zarrjava/ZarrV3Test.java | 3 +++ 2 files changed, 6 insertions(+) diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java index 779824c..da66503 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java @@ -383,6 +383,9 @@ public void testResizeArray() throws IOException, ZarrException { int[] expectedData = new int[5 * 5]; Arrays.fill(expectedData, 1); Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + + Array reopenedArray = Array.open(storeHandle); + Assertions.assertArrayEquals(new int[]{20, 15}, reopenedArray.read().getShape()); } @Test diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java index 4e67204..47b26de 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java @@ -759,6 +759,9 @@ public void testResizeArray() throws IOException, ZarrException { int[] expectedData = new int[5 * 5]; Arrays.fill(expectedData, 1); Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + + Array reopenedArray = Array.open(storeHandle); + Assertions.assertArrayEquals(new int[]{20, 15}, reopenedArray.read().getShape()); } @Test From c7590860f944545734a9a58285af24f7f6f3686c Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 26 Jan 2026 16:05:07 +0100 Subject: [PATCH 04/10] add 
resizeMetadataOnly argument for resize --- .../java/dev/zarr/zarrjava/core/Array.java | 98 +++++++++++++++++++ src/main/java/dev/zarr/zarrjava/v2/Array.java | 25 ++++- src/main/java/dev/zarr/zarrjava/v3/Array.java | 25 ++++- .../java/dev/zarr/zarrjava/ZarrV2Test.java | 77 +++++++++++++++ .../java/dev/zarr/zarrjava/ZarrV3Test.java | 77 +++++++++++++++ 5 files changed, 298 insertions(+), 4 deletions(-) diff --git a/src/main/java/dev/zarr/zarrjava/core/Array.java b/src/main/java/dev/zarr/zarrjava/core/Array.java index db48551..062cab3 100644 --- a/src/main/java/dev/zarr/zarrjava/core/Array.java +++ b/src/main/java/dev/zarr/zarrjava/core/Array.java @@ -184,6 +184,104 @@ public ucar.ma2.Array readChunk(long[] chunkCoords) throws ZarrException { return codecPipeline.decode(chunkBytes); } + /** + * Deletes chunks that are completely outside the new shape and trims boundary chunks. + * + * @param newShape the new shape of the array + */ + protected void cleanupChunksForResize(long[] newShape) { + ArrayMetadata metadata = metadata(); + final int[] chunkShape = metadata.chunkShape(); + final int ndim = metadata.ndim(); + final dev.zarr.zarrjava.core.chunkkeyencoding.ChunkKeyEncoding chunkKeyEncoding = metadata.chunkKeyEncoding(); + + // Calculate max valid chunk coordinates for the new shape + long[] newMaxChunkCoords = new long[ndim]; + for (int i = 0; i < ndim; i++) { + newMaxChunkCoords[i] = (newShape[i] + chunkShape[i] - 1) / chunkShape[i]; + } + + // Iterate over all possible chunk coordinates in the old shape + long[][] allOldChunkCoords = IndexingUtils.computeChunkCoords(metadata.shape, chunkShape); + + for (long[] chunkCoords : allOldChunkCoords) { + boolean isOutsideBounds = false; + boolean isOnBoundary = false; + + for (int dimIdx = 0; dimIdx < ndim; dimIdx++) { + if (chunkCoords[dimIdx] >= newMaxChunkCoords[dimIdx]) { + isOutsideBounds = true; + break; + } + // Check if this chunk is on the boundary (partially outside new shape) + long chunkEnd = (chunkCoords[dimIdx] + 1) * chunkShape[dimIdx]; + if (chunkEnd > newShape[dimIdx]) { + isOnBoundary = true; + } + } + + String[] chunkKeys = chunkKeyEncoding.encodeChunkKey(chunkCoords); + StoreHandle chunkHandle = storeHandle.resolve(chunkKeys); + + if (isOutsideBounds) { + // Delete chunk that is completely outside + chunkHandle.delete(); + } else if (isOnBoundary) { + // Trim boundary chunk - read, clear out-of-bounds data, write back + try { + trimBoundaryChunk(chunkCoords, newShape, chunkShape); + } catch (ZarrException e) { + throw new RuntimeException(e); + } + } + } + } + + /** + * Trims a boundary chunk by reading it, clearing the out-of-bounds portion, and writing it back. 
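+   *
+   * <p>For example (using the shapes from the resize tests in this patch): shrinking a
+   * 10x10 array with 5x5 chunks to 7x7 leaves the chunk at coordinates (1, 1), which starts
+   * at element (5, 5), with a valid 2x2 region; the remaining elements of that chunk are
+   * reset to the fill value.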
+ * + * @param chunkCoords the coordinates of the chunk to trim + * @param newShape the new shape of the array + * @param chunkShape the shape of the chunks + * @throws ZarrException if reading or writing the chunk fails + */ + protected void trimBoundaryChunk(long[] chunkCoords, long[] newShape, int[] chunkShape) throws ZarrException { + ArrayMetadata metadata = metadata(); + final int ndim = metadata.ndim(); + + // Calculate the valid region within this chunk + int[] validShape = new int[ndim]; + boolean needsTrimming = false; + for (int dimIdx = 0; dimIdx < ndim; dimIdx++) { + long chunkStart = chunkCoords[dimIdx] * chunkShape[dimIdx]; + long chunkEnd = chunkStart + chunkShape[dimIdx]; + if (chunkEnd > newShape[dimIdx]) { + validShape[dimIdx] = (int) (newShape[dimIdx] - chunkStart); + needsTrimming = true; + } else { + validShape[dimIdx] = chunkShape[dimIdx]; + } + } + + if (!needsTrimming) { + return; + } + + // Read the existing chunk + ucar.ma2.Array chunkData = readChunk(chunkCoords); + + // Create a new chunk filled with fill value + ucar.ma2.Array newChunkData = metadata.allocateFillValueChunk(); + + // Copy only the valid region + MultiArrayUtils.copyRegion( + chunkData, new int[ndim], newChunkData, new int[ndim], validShape + ); + + // Write the trimmed chunk back + writeChunk(chunkCoords, newChunkData); + } + /** * Writes a ucar.ma2.Array into the Zarr array at the beginning of the Zarr array. The shape of diff --git a/src/main/java/dev/zarr/zarrjava/v2/Array.java b/src/main/java/dev/zarr/zarrjava/v2/Array.java index c5ed44b..fcadfca 100644 --- a/src/main/java/dev/zarr/zarrjava/v2/Array.java +++ b/src/main/java/dev/zarr/zarrjava/v2/Array.java @@ -201,8 +201,10 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException } /** - * Sets a new shape for the Zarr array. It only changes the metadata, no array data is modified or - * deleted. This method returns a new instance of the Zarr array class and the old instance + * Sets a new shape for the Zarr array. Old array data outside the new shape will be deleted. + * If data deletion is not desired, use {@link #resize(long[], boolean)} with + * `resizeMetadataOnly` set to true. + * This method returns a new instance of the Zarr array class and the old instance * becomes invalid. * * @param newShape the new shape of the Zarr array @@ -210,17 +212,36 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException * @throws IOException throws IOException if the new metadata cannot be serialized */ public Array resize(long[] newShape) throws ZarrException, IOException { + return resize(newShape, false); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. 
+ * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + public Array resize(long[] newShape, boolean resizeMetadataOnly) throws ZarrException, IOException { if (newShape.length != metadata.ndim()) { throw new IllegalArgumentException( "'newShape' needs to have rank '" + metadata.ndim() + "'."); } + if (!resizeMetadataOnly) { + cleanupChunksForResize(newShape); + } + ArrayMetadata newArrayMetadata = ArrayMetadataBuilder.fromArrayMetadata(metadata) .withShape(newShape) .build(); return writeMetadata(newArrayMetadata); } + /** * Sets the attributes of the Zarr array. It overwrites and removes any existing attributes. This * method returns a new instance of the Zarr array class and the old instance becomes invalid. diff --git a/src/main/java/dev/zarr/zarrjava/v3/Array.java b/src/main/java/dev/zarr/zarrjava/v3/Array.java index 45060e9..5be901b 100644 --- a/src/main/java/dev/zarr/zarrjava/v3/Array.java +++ b/src/main/java/dev/zarr/zarrjava/v3/Array.java @@ -201,8 +201,10 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException } /** - * Sets a new shape for the Zarr array. It only changes the metadata, no array data is modified or - * deleted. This method returns a new instance of the Zarr array class and the old instance + * Sets a new shape for the Zarr array. Old array data outside the new shape will be deleted. + * If data deletion is not desired, use {@link #resize(long[], boolean)} with + * `resizeMetadataOnly` set to true. + * This method returns a new instance of the Zarr array class and the old instance * becomes invalid. * * @param newShape the new shape of the Zarr array @@ -210,17 +212,36 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException * @throws IOException throws IOException if the new metadata cannot be serialized */ public Array resize(long[] newShape) throws ZarrException, IOException { + return resize(newShape, false); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. + * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + public Array resize(long[] newShape, boolean resizeMetadataOnly) throws ZarrException, IOException { if (newShape.length != metadata.ndim()) { throw new IllegalArgumentException( "'newShape' needs to have rank '" + metadata.ndim() + "'."); } + if (!resizeMetadataOnly) { + cleanupChunksForResize(newShape); + } + ArrayMetadata newArrayMetadata = ArrayMetadataBuilder.fromArrayMetadata(metadata) .withShape(newShape) .build(); return writeMetadata(newArrayMetadata); } + /** * Sets the attributes of the Zarr array. It overwrites and removes any existing attributes. This * method returns a new instance of the Zarr array class and the old instance becomes invalid. 
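For reference, a minimal usage sketch of the two-argument resize introduced above. The store path and shapes are illustrative; the call pattern follows the tests added below, and the calls would sit in code that declares `throws IOException, ZarrException`:

```java
import dev.zarr.zarrjava.store.FilesystemStore;
import dev.zarr.zarrjava.store.StoreHandle;
import dev.zarr.zarrjava.v3.Array;

StoreHandle storeHandle = new FilesystemStore("/path/to/zarr").resolve("myarray");
Array array = Array.open(storeHandle);

// Shrink to 5x5 and clean up stored chunks: chunks entirely outside the new shape are
// deleted, and chunks that straddle the new boundary have their out-of-bounds portion
// reset to the fill value.
array = array.resize(new long[]{5, 5}, false);

// Alternatively, update only the metadata and leave all chunk data on disk untouched.
array = array.resize(new long[]{5, 5}, true);
```
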
diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java index da66503..fc945e1 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java @@ -416,6 +416,83 @@ public void testResizeArrayShrink() throws IOException, ZarrException { Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } + @Test + public void testResizeArrayShrinkWithChunkCleanup() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkWithChunkCleanupV2"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunks(5, 5) + .withFillValue(99) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + // Verify all 4 chunks exist before resize + Assertions.assertTrue(storeHandle.resolve("0.0").exists()); + Assertions.assertTrue(storeHandle.resolve("0.1").exists()); + Assertions.assertTrue(storeHandle.resolve("1.0").exists()); + Assertions.assertTrue(storeHandle.resolve("1.1").exists()); + + // Resize with chunk cleanup (resizeMetadataOnly=false) + array = array.resize(new long[]{5, 5}, false); + Assertions.assertArrayEquals(new int[]{5, 5}, array.read().getShape()); + + // Verify only chunk (0,0) still exists + Assertions.assertTrue(storeHandle.resolve("0.0").exists()); + Assertions.assertFalse(storeHandle.resolve("0.1").exists()); + Assertions.assertFalse(storeHandle.resolve("1.0").exists()); + Assertions.assertFalse(storeHandle.resolve("1.1").exists()); + + ucar.ma2.Array data = array.read(); + int[] expectedData = new int[5 * 5]; + for (int i = 0; i < 5; i++) { + for (int j = 0; j < 5; j++) { + expectedData[i * 5 + j] = testData[i * 10 + j]; + } + } + Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + } + + @Test + public void testResizeArrayShrinkWithBoundaryTrimming() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkWithBoundaryTrimmingV2"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunks(5, 5) + .withFillValue(99) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + // Resize to 7x7 (crosses chunk boundary, should trim boundary chunks) + array = array.resize(new long[]{7, 7}, false); + Assertions.assertArrayEquals(new int[]{7, 7}, array.read().getShape()); + + // Verify chunks (0,0), (0,1), (1,0), (1,1) still exist (boundary trimmed, not deleted) + Assertions.assertTrue(storeHandle.resolve("0.0").exists()); + Assertions.assertTrue(storeHandle.resolve("0.1").exists()); + Assertions.assertTrue(storeHandle.resolve("1.0").exists()); + Assertions.assertTrue(storeHandle.resolve("1.1").exists()); + + // Now resize to expand again and check that trimmed area has fill value + array = array.resize(new long[]{10, 10}, true); + ucar.ma2.Array data = 
array.read(new long[]{7, 0}, new int[]{3, 10}); + // All values in rows 7-9 should be fill value (99) + int[] expectedFillData = new int[3 * 10]; + Arrays.fill(expectedFillData, 99); + Assertions.assertArrayEquals(expectedFillData, (int[]) data.get1DJavaArray(ma2DataType)); + } + @Test public void testGroupAttributes() throws IOException, ZarrException { StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testGroupAttributesV2"); diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java index 47b26de..198f4d6 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java @@ -792,6 +792,83 @@ public void testResizeArrayShrink() throws IOException, ZarrException { Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } + @Test + public void testResizeArrayShrinkWithChunkCleanup() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkWithChunkCleanupV3"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunkShape(5, 5) + .withFillValue(99) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + // Verify all 4 chunks exist before resize (v3 default encoding has "c" prefix) + Assertions.assertTrue(storeHandle.resolve("c", "0", "0").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "0", "1").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "1", "0").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "1", "1").exists()); + + // Resize with chunk cleanup (resizeMetadataOnly=false) + array = array.resize(new long[]{5, 5}, false); + Assertions.assertArrayEquals(new int[]{5, 5}, array.read().getShape()); + + // Verify only chunk (0,0) still exists + Assertions.assertTrue(storeHandle.resolve("c", "0", "0").exists()); + Assertions.assertFalse(storeHandle.resolve("c", "0", "1").exists()); + Assertions.assertFalse(storeHandle.resolve("c", "1", "0").exists()); + Assertions.assertFalse(storeHandle.resolve("c", "1", "1").exists()); + + ucar.ma2.Array data = array.read(); + int[] expectedData = new int[5 * 5]; + for (int i = 0; i < 5; i++) { + for (int j = 0; j < 5; j++) { + expectedData[i * 5 + j] = testData[i * 10 + j]; + } + } + Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); + } + + @Test + public void testResizeArrayShrinkWithBoundaryTrimming() throws IOException, ZarrException { + int[] testData = new int[10 * 10]; + Arrays.setAll(testData, p -> p); + + StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testResizeArrayShrinkWithBoundaryTrimmingV3"); + ArrayMetadata arrayMetadata = Array.metadataBuilder() + .withShape(10, 10) + .withDataType(DataType.UINT32) + .withChunkShape(5, 5) + .withFillValue(99) + .build(); + ucar.ma2.DataType ma2DataType = arrayMetadata.dataType.getMA2DataType(); + Array array = Array.create(storeHandle, arrayMetadata); + array.write(new long[]{0, 0}, ucar.ma2.Array.factory(ma2DataType, new int[]{10, 10}, testData)); + + // Resize to 7x7 (crosses chunk boundary, should trim boundary chunks) + array = array.resize(new long[]{7, 7}, false); + 
Assertions.assertArrayEquals(new int[]{7, 7}, array.read().getShape()); + + // Verify chunks (0,0), (0,1), (1,0), (1,1) still exist (boundary trimmed, not deleted) + Assertions.assertTrue(storeHandle.resolve("c", "0", "0").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "0", "1").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "1", "0").exists()); + Assertions.assertTrue(storeHandle.resolve("c", "1", "1").exists()); + + // Now resize to expand again and check that trimmed area has fill value + array = array.resize(new long[]{10, 10}, true); + ucar.ma2.Array data = array.read(new long[]{7, 0}, new int[]{3, 10}); + // All values in rows 7-9 should be fill value (99) + int[] expectedFillData = new int[3 * 10]; + Arrays.fill(expectedFillData, 99); + Assertions.assertArrayEquals(expectedFillData, (int[]) data.get1DJavaArray(ma2DataType)); + } + @Test public void testGroupAttributes() throws IOException, ZarrException { StoreHandle storeHandle = new FilesystemStore(TESTOUTPUT).resolve("testGroupAttributesV3"); From 056b3a6b114c36841c635051f2b5edda3e552e8a Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Fri, 30 Jan 2026 11:59:40 +0100 Subject: [PATCH 05/10] reformat --- .../java/dev/zarr/zarrjava/core/Array.java | 4 -- .../dev/zarr/zarrjava/ParallelWriteTest.java | 50 +++++++++---------- .../java/dev/zarr/zarrjava/TestUtils.java | 8 +-- .../java/dev/zarr/zarrjava/ZarrV2Test.java | 12 ++--- .../java/dev/zarr/zarrjava/ZarrV3Test.java | 8 +-- 5 files changed, 35 insertions(+), 47 deletions(-) diff --git a/src/main/java/dev/zarr/zarrjava/core/Array.java b/src/main/java/dev/zarr/zarrjava/core/Array.java index 062cab3..c450cee 100644 --- a/src/main/java/dev/zarr/zarrjava/core/Array.java +++ b/src/main/java/dev/zarr/zarrjava/core/Array.java @@ -3,7 +3,6 @@ import dev.zarr.zarrjava.ZarrException; import dev.zarr.zarrjava.core.codec.CodecPipeline; import dev.zarr.zarrjava.store.FilesystemStore; -import dev.zarr.zarrjava.store.Store; import dev.zarr.zarrjava.store.StoreHandle; import dev.zarr.zarrjava.utils.IndexingUtils; import dev.zarr.zarrjava.utils.MultiArrayUtils; @@ -17,9 +16,6 @@ import java.nio.file.Path; import java.nio.file.Paths; import java.util.Arrays; -import java.util.List; -import java.util.Set; -import java.util.stream.Collectors; import java.util.stream.Stream; public abstract class Array extends AbstractNode { diff --git a/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java b/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java index 60db2ee..cc14959 100644 --- a/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java +++ b/src/test/java/dev/zarr/zarrjava/ParallelWriteTest.java @@ -20,10 +20,10 @@ public void testParallelWriteDataSafety() throws IOException, ZarrException { // Test internal parallelism of write method (using parallel=true) Path path = TESTOUTPUT.resolve("parallel_write_safety"); StoreHandle storeHandle = new FilesystemStore(path).resolve(); - + int shape = 1000; int chunk = 100; - + Array array = Array.create(storeHandle, Array.metadataBuilder() .withShape(shape, shape) .withDataType(DataType.INT32) @@ -36,16 +36,16 @@ public void testParallelWriteDataSafety() throws IOException, ZarrException { for (int i = 0; i < shape * shape; i++) { data[i] = i; } - + ucar.ma2.Array outputData = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{shape, shape}, data); - + // Write in parallel array.write(outputData, true); - + // Read back ucar.ma2.Array readData = array.read(); int[] readArr = (int[]) readData.get1DJavaArray(ucar.ma2.DataType.INT); - 
+ Assertions.assertArrayEquals(data, readArr, "Data read back should match data written in parallel"); } @@ -54,11 +54,11 @@ public void testParallelWriteWithSharding() throws IOException, ZarrException { // Test internal parallelism with Sharding (nested chunks + shared codec state potential) Path path = TESTOUTPUT.resolve("parallel_write_sharding"); StoreHandle storeHandle = new FilesystemStore(path).resolve(); - + int shape = 128; // 128x128 int shardSize = 64; // Shards are 64x64 int innerChunk = 32; // Inner chunks 32x32 - + // Metadata with sharding // With shape 128 and shardSize 64, we have 2x2 = 4 shards. // Array.write(parallel=true) will likely process these shards concurrently. @@ -71,20 +71,20 @@ public void testParallelWriteWithSharding() throws IOException, ZarrException { .build(); Array array = Array.create(storeHandle, metadata); - + int[] data = new int[shape * shape]; for (int i = 0; i < shape * shape; i++) { data[i] = i; } - + ucar.ma2.Array outputData = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{shape, shape}, data); - + // Write in parallel array.write(outputData, true); - + ucar.ma2.Array readData = array.read(); int[] readArr = (int[]) readData.get1DJavaArray(ucar.ma2.DataType.INT); - + Assertions.assertArrayEquals(data, readArr, "Sharded data written in parallel should match"); } @@ -93,7 +93,7 @@ public void testConcurrentWritesDifferentChunks() throws IOException, ZarrExcept // Test external parallelism (multiple threads calling write on same Array instance) Path path = TESTOUTPUT.resolve("concurrent_write_safety"); StoreHandle storeHandle = new FilesystemStore(path).resolve(); - + int chunksX = 10; int chunksY = 10; int chunkSize = 50; @@ -118,20 +118,20 @@ public void testConcurrentWritesDifferentChunks() throws IOException, ZarrExcept int[] chunkData = new int[chunkSize * chunkSize]; int val = cx * chunksY + cy; // Unique value per chunk java.util.Arrays.fill(chunkData, val); - + ucar.ma2.Array ucarArray = ucar.ma2.Array.factory(ucar.ma2.DataType.INT, new int[]{chunkSize, chunkSize}, chunkData); - + // Write to specific chunk offset long[] offset = new long[]{cx * chunkSize, cy * chunkSize}; // Use internal parallelism false to isolate external concurrency test mechanism - array.write(offset, ucarArray, false); + array.write(offset, ucarArray, false); return null; }); } } List> futures = executor.invokeAll(tasks); - + for (Future f : futures) { f.get(); // Check for exceptions } @@ -141,13 +141,13 @@ public void testConcurrentWritesDifferentChunks() throws IOException, ZarrExcept ucar.ma2.Array readData = array.read(); for (int i = 0; i < chunksX; i++) { for (int j = 0; j < chunksY; j++) { - int expectedVal = i * chunksY + j; - int originX = i * chunkSize; - int originY = j * chunkSize; - - // Verify a pixel in the chunk - int val = readData.getInt(readData.getIndex().set(originX, originY)); - Assertions.assertEquals(expectedVal, val, "Value at chunk " + i + "," + j + " mismatch"); + int expectedVal = i * chunksY + j; + int originX = i * chunkSize; + int originY = j * chunkSize; + + // Verify a pixel in the chunk + int val = readData.getInt(readData.getIndex().set(originX, originY)); + Assertions.assertEquals(expectedVal, val, "Value at chunk " + i + "," + j + " mismatch"); } } } diff --git a/src/test/java/dev/zarr/zarrjava/TestUtils.java b/src/test/java/dev/zarr/zarrjava/TestUtils.java index ad02558..2c96840 100644 --- a/src/test/java/dev/zarr/zarrjava/TestUtils.java +++ b/src/test/java/dev/zarr/zarrjava/TestUtils.java @@ -30,7 +30,7 @@ public 
void testInversePermutation() { } @Test - public void testComputeChunkCoords(){ + public void testComputeChunkCoords() { long[] arrayShape = new long[]{100, 100}; int[] chunkShape = new int[]{30, 30}; long[] selOffset = new long[]{50, 20}; @@ -56,7 +56,7 @@ public void testComputeChunkCoords(){ } @Test - public void testComputeProjection(){ + public void testComputeProjection() { // chunk (0,2) contains indexes 34-50 along axis 1 // thus the overlap with selection 32-52 is 34-50 // which is offset 2 in the selection and offset 0 in the chunk @@ -71,8 +71,8 @@ public void testComputeProjection(){ chunkCoords, arrayShape, chunkShape, selOffset, selShape ); Assertions.assertArrayEquals(chunkCoords, projection.chunkCoords); - Assertions.assertArrayEquals(new int[]{0,0}, projection.chunkOffset); - Assertions.assertArrayEquals(new int[]{0,2}, projection.outOffset); + Assertions.assertArrayEquals(new int[]{0, 0}, projection.chunkOffset); + Assertions.assertArrayEquals(new int[]{0, 2}, projection.outOffset); Assertions.assertArrayEquals(new int[]{1, 17}, projection.shape); } diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java index fc945e1..3fb5fd4 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java @@ -348,10 +348,10 @@ public void testUpdateAttributesBehavior() throws IOException, ZarrException { Assertions.assertNotSame(array1, array2); Assertions.assertEquals("val1", array1.metadata().attributes().get("key1")); Assertions.assertNull(array1.metadata().attributes().get("key2")); - + Assertions.assertEquals("val1", array2.metadata().attributes().get("key1")); Assertions.assertEquals("val2", array2.metadata().attributes().get("key2")); - + // Re-opening should show the updated attributes Array array3 = Array.open(storeHandle); Assertions.assertEquals("val2", array3.metadata().attributes().get("key2")); @@ -409,9 +409,7 @@ public void testResizeArrayShrink() throws IOException, ZarrException { ucar.ma2.Array data = array.read(); int[] expectedData = new int[5 * 5]; for (int i = 0; i < 5; i++) { - for (int j = 0; j < 5; j++) { - expectedData[i * 5 + j] = testData[i * 10 + j]; - } + System.arraycopy(testData, i * 10 + 0, expectedData, i * 5 + 0, 5); } Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } @@ -451,9 +449,7 @@ public void testResizeArrayShrinkWithChunkCleanup() throws IOException, ZarrExce ucar.ma2.Array data = array.read(); int[] expectedData = new int[5 * 5]; for (int i = 0; i < 5; i++) { - for (int j = 0; j < 5; j++) { - expectedData[i * 5 + j] = testData[i * 10 + j]; - } + System.arraycopy(testData, i * 10 + 0, expectedData, i * 5 + 0, 5); } Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java index 198f4d6..6570558 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java @@ -785,9 +785,7 @@ public void testResizeArrayShrink() throws IOException, ZarrException { ucar.ma2.Array data = array.read(); int[] expectedData = new int[5 * 5]; for (int i = 0; i < 5; i++) { - for (int j = 0; j < 5; j++) { - expectedData[i * 5 + j] = testData[i * 10 + j]; - } + System.arraycopy(testData, i * 10 + 0, expectedData, i * 5 + 0, 5); } Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } @@ -827,9 +825,7 @@ 
public void testResizeArrayShrinkWithChunkCleanup() throws IOException, ZarrExce ucar.ma2.Array data = array.read(); int[] expectedData = new int[5 * 5]; for (int i = 0; i < 5; i++) { - for (int j = 0; j < 5; j++) { - expectedData[i * 5 + j] = testData[i * 10 + j]; - } + System.arraycopy(testData, i * 10 + 0, expectedData, i * 5 + 0, 5); } Assertions.assertArrayEquals(expectedData, (int[]) data.get1DJavaArray(ma2DataType)); } From cabbf6c371f8420d1fd2c2d13629b3b50dd40577 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 2 Feb 2026 11:39:35 +0100 Subject: [PATCH 06/10] default to parallel read, write diff --git c/src/main/java/dev/zarr/zarrjava/core/Array.java i/src/main/java/dev/zarr/zarrjava/core/Array.java index c450cee..7e41e7c 100644 --- c/src/main/java/dev/zarr/zarrjava/core/Array.java +++ i/src/main/java/dev/zarr/zarrjava/core/Array.java @@ -21,6 +21,7 @@ import java.util.stream.Stream; public abstract class Array extends AbstractNode { protected CodecPipeline codecPipeline; + public static final boolean DEFAULT_PARALLELISM = true; protected Array(StoreHandle storeHandle) throws ZarrException { super(storeHandle); @@ -299,7 +300,7 @@ public abstract class Array extends AbstractNode { * @param array the data to write */ public void write(long[] offset, ucar.ma2.Array array) { - write(offset, array, false); + write(offset, array, DEFAULT_PARALLELISM); } /** @@ -334,7 +335,7 @@ public abstract class Array extends AbstractNode { */ @Nonnull public ucar.ma2.Array read(final long[] offset, final long[] shape) throws ZarrException { - return read(offset, shape, false); + return read(offset, shape, DEFAULT_PARALLELISM); } /** --- src/main/java/dev/zarr/zarrjava/core/Array.java | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/main/java/dev/zarr/zarrjava/core/Array.java b/src/main/java/dev/zarr/zarrjava/core/Array.java index c450cee..7e41e7c 100644 --- a/src/main/java/dev/zarr/zarrjava/core/Array.java +++ b/src/main/java/dev/zarr/zarrjava/core/Array.java @@ -21,6 +21,7 @@ public abstract class Array extends AbstractNode { protected CodecPipeline codecPipeline; + public static final boolean DEFAULT_PARALLELISM = true; protected Array(StoreHandle storeHandle) throws ZarrException { super(storeHandle); @@ -299,7 +300,7 @@ public void write(ucar.ma2.Array array) { * @param array the data to write */ public void write(long[] offset, ucar.ma2.Array array) { - write(offset, array, false); + write(offset, array, DEFAULT_PARALLELISM); } /** @@ -334,7 +335,7 @@ public ucar.ma2.Array read() throws ZarrException { */ @Nonnull public ucar.ma2.Array read(final long[] offset, final long[] shape) throws ZarrException { - return read(offset, shape, false); + return read(offset, shape, DEFAULT_PARALLELISM); } /** From 703523837212ed255dc95e25fa73c76eb1f49a6b Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 2 Feb 2026 12:15:24 +0100 Subject: [PATCH 07/10] resize default to parallel and resizeMetadataOnly --- .../java/dev/zarr/zarrjava/core/Array.java | 52 +++++++++++++++++-- src/main/java/dev/zarr/zarrjava/v2/Array.java | 26 ++++++++-- src/main/java/dev/zarr/zarrjava/v3/Array.java | 26 ++++++++-- 3 files changed, 91 insertions(+), 13 deletions(-) diff --git a/src/main/java/dev/zarr/zarrjava/core/Array.java b/src/main/java/dev/zarr/zarrjava/core/Array.java index 7e41e7c..c126890 100644 --- a/src/main/java/dev/zarr/zarrjava/core/Array.java +++ b/src/main/java/dev/zarr/zarrjava/core/Array.java @@ -185,8 +185,9 @@ public ucar.ma2.Array readChunk(long[] chunkCoords) throws 
ZarrException { * Deletes chunks that are completely outside the new shape and trims boundary chunks. * * @param newShape the new shape of the array + * @param parallel utilizes parallelism if true */ - protected void cleanupChunksForResize(long[] newShape) { + protected void cleanupChunksForResize(long[] newShape, boolean parallel) { ArrayMetadata metadata = metadata(); final int[] chunkShape = metadata.chunkShape(); final int ndim = metadata.ndim(); @@ -201,7 +202,12 @@ protected void cleanupChunksForResize(long[] newShape) { // Iterate over all possible chunk coordinates in the old shape long[][] allOldChunkCoords = IndexingUtils.computeChunkCoords(metadata.shape, chunkShape); - for (long[] chunkCoords : allOldChunkCoords) { + Stream chunkStream = Arrays.stream(allOldChunkCoords); + if (parallel) { + chunkStream = chunkStream.parallel(); + } + + chunkStream.forEach(chunkCoords -> { boolean isOutsideBounds = false; boolean isOnBoundary = false; @@ -231,7 +237,7 @@ protected void cleanupChunksForResize(long[] newShape) { throw new RuntimeException(e); } } - } + }); } /** @@ -434,6 +440,46 @@ public ucar.ma2.Array read(final long[] offset, final long[] shape, final boolea return outputArray; } + /** + * Sets a new shape for the Zarr array. Only the metadata is updated by default. + * This method returns a new instance of the Zarr array class and the old instance + * becomes invalid. + * + * @param newShape the new shape of the Zarr array + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + public Array resize(long[] newShape) throws ZarrException, IOException { + return resize(newShape, true); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. + * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + public Array resize(long[] newShape, boolean resizeMetadataOnly) throws ZarrException, IOException { + return resize(newShape, resizeMetadataOnly, DEFAULT_PARALLELISM); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. 
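+   *
+   * <p>For example, {@code array = array.resize(new long[]{5, 5}, false, true)} shrinks the
+   * array to 5x5 and cleans up stale chunks in parallel, whereas passing
+   * {@code resizeMetadataOnly = true} rewrites only the metadata and leaves chunk data untouched.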
+ * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @param parallel utilizes parallelism if true when cleaning up chunks + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + public abstract Array resize(long[] newShape, boolean resizeMetadataOnly, boolean parallel) throws ZarrException, IOException; + public ArrayAccessor access() { return new ArrayAccessor(this); } diff --git a/src/main/java/dev/zarr/zarrjava/v2/Array.java b/src/main/java/dev/zarr/zarrjava/v2/Array.java index fcadfca..5bbfbdb 100644 --- a/src/main/java/dev/zarr/zarrjava/v2/Array.java +++ b/src/main/java/dev/zarr/zarrjava/v2/Array.java @@ -201,9 +201,7 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException } /** - * Sets a new shape for the Zarr array. Old array data outside the new shape will be deleted. - * If data deletion is not desired, use {@link #resize(long[], boolean)} with - * `resizeMetadataOnly` set to true. + * Sets a new shape for the Zarr array. Only the metadata is updated by default. * This method returns a new instance of the Zarr array class and the old instance * becomes invalid. * @@ -211,8 +209,9 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException * @throws ZarrException if the new metadata is invalid * @throws IOException throws IOException if the new metadata cannot be serialized */ + @Override public Array resize(long[] newShape) throws ZarrException, IOException { - return resize(newShape, false); + return resize(newShape, true); } /** @@ -225,14 +224,31 @@ public Array resize(long[] newShape) throws ZarrException, IOException { * @throws ZarrException if the new metadata is invalid * @throws IOException throws IOException if the new metadata cannot be serialized */ + @Override public Array resize(long[] newShape, boolean resizeMetadataOnly) throws ZarrException, IOException { + return resize(newShape, resizeMetadataOnly, DEFAULT_PARALLELISM); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. 
+ * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @param parallel utilizes parallelism if true when cleaning up chunks + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + @Override + public Array resize(long[] newShape, boolean resizeMetadataOnly, boolean parallel) throws ZarrException, IOException { if (newShape.length != metadata.ndim()) { throw new IllegalArgumentException( "'newShape' needs to have rank '" + metadata.ndim() + "'."); } if (!resizeMetadataOnly) { - cleanupChunksForResize(newShape); + cleanupChunksForResize(newShape, parallel); } ArrayMetadata newArrayMetadata = ArrayMetadataBuilder.fromArrayMetadata(metadata) diff --git a/src/main/java/dev/zarr/zarrjava/v3/Array.java b/src/main/java/dev/zarr/zarrjava/v3/Array.java index 5be901b..6b30e29 100644 --- a/src/main/java/dev/zarr/zarrjava/v3/Array.java +++ b/src/main/java/dev/zarr/zarrjava/v3/Array.java @@ -201,9 +201,7 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException } /** - * Sets a new shape for the Zarr array. Old array data outside the new shape will be deleted. - * If data deletion is not desired, use {@link #resize(long[], boolean)} with - * `resizeMetadataOnly` set to true. + * Sets a new shape for the Zarr array. Only the metadata is updated by default. * This method returns a new instance of the Zarr array class and the old instance * becomes invalid. * @@ -211,8 +209,9 @@ private Array writeMetadata(ArrayMetadata newArrayMetadata) throws ZarrException * @throws ZarrException if the new metadata is invalid * @throws IOException throws IOException if the new metadata cannot be serialized */ + @Override public Array resize(long[] newShape) throws ZarrException, IOException { - return resize(newShape, false); + return resize(newShape, true); } /** @@ -225,14 +224,31 @@ public Array resize(long[] newShape) throws ZarrException, IOException { * @throws ZarrException if the new metadata is invalid * @throws IOException throws IOException if the new metadata cannot be serialized */ + @Override public Array resize(long[] newShape, boolean resizeMetadataOnly) throws ZarrException, IOException { + return resize(newShape, resizeMetadataOnly, DEFAULT_PARALLELISM); + } + + /** + * Sets a new shape for the Zarr array. This method returns a new instance of the Zarr array class + * and the old instance becomes invalid. 
+ * + * @param newShape the new shape of the Zarr array + * @param resizeMetadataOnly if true, only the metadata is updated; if false, chunks outside the new + * bounds are deleted and boundary chunks are trimmed + * @param parallel utilizes parallelism if true when cleaning up chunks + * @throws ZarrException if the new metadata is invalid + * @throws IOException throws IOException if the new metadata cannot be serialized + */ + @Override + public Array resize(long[] newShape, boolean resizeMetadataOnly, boolean parallel) throws ZarrException, IOException { if (newShape.length != metadata.ndim()) { throw new IllegalArgumentException( "'newShape' needs to have rank '" + metadata.ndim() + "'."); } if (!resizeMetadataOnly) { - cleanupChunksForResize(newShape); + cleanupChunksForResize(newShape, parallel); } ArrayMetadata newArrayMetadata = ArrayMetadataBuilder.fromArrayMetadata(metadata) From d3eb00b20e2178af26e8bdfcdb27052fe776ae48 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 2 Feb 2026 16:20:18 +0100 Subject: [PATCH 08/10] fix USERGUIDE.md --- USERGUIDE.md | 317 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 260 insertions(+), 57 deletions(-) diff --git a/USERGUIDE.md b/USERGUIDE.md index c91b8a3..4e330c2 100644 --- a/USERGUIDE.md +++ b/USERGUIDE.md @@ -57,7 +57,7 @@ ucar.ma2.Array data = array.read(); // Read a subset ucar.ma2.Array subset = array.read( new long[]{0, 0, 0}, // offset - new int[]{10, 100, 100} // shape + new long[]{10, 100, 100} // shape ); ``` ### Creating and Writing an Array @@ -166,14 +166,22 @@ ucar.ma2.Array data = array.read(); ```java ucar.ma2.Array subset = array.read( new long[]{10, 20, 30}, // offset - new int[]{50, 60, 70} // shape + new long[]{50, 60, 70} // shape ); ``` -#### Read without Parallelism +#### Read with Parallelism Control +By default, read operations use **parallel processing**. You can disable it for sequential reading: ```java +// Parallel reading (default behavior) ucar.ma2.Array data = array.read( new long[]{0, 0, 0}, - new int[]{100, 100, 100}, + new long[]{100, 100, 100} +); + +// Explicitly disable parallelism if needed +ucar.ma2.Array data = array.read( + new long[]{0, 0, 0}, + new long[]{100, 100, 100}, false // disable parallel processing ); ``` @@ -186,11 +194,13 @@ ucar.ma2.Array data = array.access() ``` ### Writing Data ```java -// Write at origin +// Write at origin (parallel by default) array.write(data); -// Write at offset + +// Write at offset (parallel by default) array.write(new long[]{10, 20, 30}, data); -// Write without parallelism + +// Explicitly disable parallelism if needed array.write(new long[]{0, 0, 0}, data, false); ``` ### Resizing Arrays @@ -248,12 +258,16 @@ Group group = Group.create( ```java // Open a group Group group = Group.open("/path/to/zarr"); -// Get all members -Map members = group.members(); -// Check existence -boolean exists = group.contains("subgroup"); + +// Get all members as a stream +Stream members = group.list(); + +// Convert to array if needed +Node[] memberArray = group.listAsArray(); + // Get specific member Node member = group.get("array"); + // Type-check and cast if (member instanceof dev.zarr.zarrjava.v3.Array) { dev.zarr.zarrjava.v3.Array array = @@ -339,30 +353,81 @@ Array array = Array.create( ); ``` ### ZIP Storage + +ZIP stores provide a convenient way to bundle entire Zarr hierarchies in a single file. + #### Read-only ZIP +**Use when**: Reading data that doesn't fit into memory. 
The read-only store streams data directly from the ZIP file without loading everything into memory. + ```java import dev.zarr.zarrjava.store.ReadOnlyZipStore; + ReadOnlyZipStore store = new ReadOnlyZipStore("/path/to/archive.zip"); Array array = Array.open(store.resolve("myarray")); +ucar.ma2.Array data = array.read(); ``` + #### Buffered ZIP (Read/Write) +**Use when**: You need to write data to a ZIP archive. The buffered store uses an in-memory buffer (by default) but can be configured with alternative buffers for different use cases. + +Always use try-with-resources or explicitly close the store to ensure changes are flushed: + ```java import dev.zarr.zarrjava.store.BufferedZipStore; + +// Using try-with-resources (recommended) +try (BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip")) { + Array array = Array.create( + store.resolve("myarray"), + Array.metadataBuilder() + .withShape(100, 100) + .withDataType(DataType.FLOAT32) + .withChunkShape(10, 10) + .build() + ); + + ucar.ma2.Array data = ucar.ma2.Array.factory( + ucar.ma2.DataType.FLOAT, + new int[]{100, 100} + ); + array.write(data); + // Store automatically closes and flushes at end of try block +} + +// Manual close (if not using try-with-resources) BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip"); +try { + // ... use store +} finally { + store.close(); // Important: flush changes to disk +} +``` +--- +## Compression and Codecs + +Codecs transform array data during storage and retrieval. They are essential for reducing storage size and optimizing I/O performance. Zarr supports various compression algorithms and data transformations. + +**Key concepts:** +- **Codecs** (v3) or **Compressors** (v2) reduce data size +- Applied during writes, reversed during reads +- Can be chained (v3 only) for multiple transformations +- Choice impacts storage size, read/write speed, and compatibility + +### Zarr v3 Codecs + +Zarr v3 uses a flexible codec pipeline. Configure codecs when creating an array using the `withCodecs()` builder method: + +```java Array array = Array.create( - store.resolve("myarray"), + storeHandle, Array.metadataBuilder() - .withShape(100, 100) + .withShape(1000, 1000) .withDataType(DataType.FLOAT32) - .withChunkShape(10, 10) + .withChunkShape(100, 100) + .withCodecs(c -> c.withBlosc("zstd", 5)) // Configure codecs here .build() ); -// Important: Close to flush changes -store.close(); ``` ---- -## Compression and Codecs -### Zarr v3 Codecs #### Blosc Compression ```java // Default settings (zstd, level 5) @@ -412,12 +477,78 @@ Combine multiple chunks into shard files: --- ## Advanced Topics ### Data Types -**Integer**: `INT8`, `INT16`, `INT32`, `INT64`, `UINT8`, `UINT16`, `UINT32`, `UINT64` -**Float**: `FLOAT32`, `FLOAT64` -**Other**: `BOOL`, `COMPLEX64`, `COMPLEX128` + +Zarr arrays store homogeneous typed data. Choose the appropriate data type based on your data range and precision requirements. The data type is specified when creating an array and cannot be changed later. 
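+
+For illustration (the builder call mirrors the array-creation examples above): integer data
+that never exceeds 65,535 fits in `UINT16` at 2 bytes per element, while values that need
+fractional precision are better stored as `FLOAT32`:
+
+```java
+Array.metadataBuilder()
+    .withShape(1000, 1000)
+    .withDataType(DataType.UINT16) // 2 bytes per element, range 0 to 65,535
+    .withChunkShape(100, 100)
+    .build();
+```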
+ +#### Supported Data Types + +Both Zarr v2 and v3 support the same set of data types: + +| Java Enum | v3 Name | v2 Name | Bytes | Range/Description | +|-----------|---------|---------|-------|-------------------| +| `BOOL` | `bool` | `\|b1` | 1 | Boolean: true/false | +| `INT8` | `int8` | `\|i1` | 1 | Signed: -128 to 127 | +| `INT16` | `int16` | `)` - Update attributes - `metadata()` - Get metadata @@ -514,9 +706,10 @@ try { - `Group.create(StoreHandle)` - Create group - `Group.create(StoreHandle, Attributes)` - Create with attributes #### Navigation -- `members()` - Get all members +- `list()` - Get stream of all child nodes +- `listAsArray()` - Get array of all child nodes - `get(String key)` - Get member by key -- `contains(String key)` - Check existence +- `get(String[] key)` - Get member by multi-part key #### Children - `createGroup(String key)` - Create subgroup - `createGroup(String key, Attributes)` - Create subgroup with attributes @@ -619,7 +812,7 @@ public class ReadHttpExample { // Read a subset ucar.ma2.Array data = array.read( new long[]{0, 3073, 3073, 513}, - new int[]{1, 64, 64, 64} + new long[]{1, 64, 64, 64} ); System.out.println("Read " + data.getSize() + " elements"); } @@ -688,29 +881,32 @@ public class ShardingExample { ```java import dev.zarr.zarrjava.v3.*; import dev.zarr.zarrjava.store.FilesystemStore; + public class ParallelIOExample { public static void main(String[] args) throws Exception { Array array = Array.open("/path/to/large/array"); - // Read with parallelism + + // Parallel reading (default behavior) long startTime = System.currentTimeMillis(); ucar.ma2.Array data = array.read( new long[]{0, 0, 0}, - new int[]{1000, 1000, 100}, - true // Enable parallel reading + new long[]{1000, 1000, 100} + // Parallel by default ); long duration = System.currentTimeMillis() - startTime; System.out.println("Read " + data.getSize() + " elements in " + duration + "ms (parallel)"); - // Compare with serial reading + + // Sequential reading (explicitly disable parallelism) startTime = System.currentTimeMillis(); data = array.read( new long[]{0, 0, 0}, - new int[]{1000, 1000, 100}, - false // Serial reading + new long[]{1000, 1000, 100}, + false // Disable parallel ); duration = System.currentTimeMillis() - startTime; System.out.println("Read " + data.getSize() + - " elements in " + duration + "ms (serial)"); + " elements in " + duration + "ms (sequential)"); } } ``` @@ -726,17 +922,18 @@ java -Xmx8g -jar myapp.jar ``` **Problem**: Slow I/O performance **Solution**: -- Enable parallelism: `array.read(offset, shape, true)` -- Adjust chunk sizes (aim for 1-100 MB per chunk) +- Parallelism is enabled by default for better performance +- Ensure chunk sizes are appropriate (aim for 1-100 MB per chunk) - Use appropriate compression (Blosc is fastest) - Check network bandwidth (for HTTP/S3) +- For debugging, you can disable parallelism: `array.read(offset, shape, false)` **Problem**: `IllegalArgumentException: 'offset' needs to have rank...` **Solution**: Ensure offset and shape arrays match the array's number of dimensions ```java // Correct -array.read(new long[]{0, 0, 0}, new int[]{10, 10, 10}); // 3D array +array.read(new long[]{0, 0, 0}, new long[]{10, 10, 10}); // 3D array // Wrong -array.read(new long[]{0, 0}, new int[]{10, 10}); // Wrong rank! +array.read(new long[]{0, 0}, new long[]{10, 10}); // Wrong rank! ``` **Problem**: Data appears corrupted **Solution**: @@ -748,9 +945,9 @@ array.read(new long[]{0, 0}, new int[]{10, 10}); // Wrong rank! 
```java // Array shape: [1000, 1000] // Wrong: offset[0] + shape[0] = 950 + 100 = 1050 > 1000 -array.read(new long[]{950, 0}, new int[]{100, 100}); +array.read(new long[]{950, 0}, new long[]{100, 100}); // Correct -array.read(new long[]{900, 0}, new int[]{100, 100}); +array.read(new long[]{900, 0}, new long[]{100, 100}); ``` **Problem**: S3Store connection errors **Solution**: @@ -760,8 +957,14 @@ array.read(new long[]{900, 0}, new int[]{100, 100}); - Ensure network connectivity **Problem**: ZIP store not writing changes -**Solution**: Always close the store explicitly +**Solution**: Always close the store, preferably using try-with-resources ```java +// Recommended: try-with-resources +try (BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip")) { + // Use store +} // Automatically closed + +// Alternative: manual close BufferedZipStore store = new BufferedZipStore("/path/to/archive.zip"); try { // Use store From 80e9ce671a347dd992e2c961dfd6096c393214a5 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Mon, 2 Feb 2026 19:31:19 +0100 Subject: [PATCH 09/10] update tests with long --- src/main/java/dev/zarr/zarrjava/v2/Group.java | 2 +- src/test/java/dev/zarr/zarrjava/ZarrV2Test.java | 2 +- src/test/java/dev/zarr/zarrjava/ZarrV3Test.java | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/main/java/dev/zarr/zarrjava/v2/Group.java b/src/main/java/dev/zarr/zarrjava/v2/Group.java index 73b8cdf..5c946c2 100644 --- a/src/main/java/dev/zarr/zarrjava/v2/Group.java +++ b/src/main/java/dev/zarr/zarrjava/v2/Group.java @@ -27,7 +27,7 @@ public class Group extends dev.zarr.zarrjava.core.Group implements Node { public GroupMetadata metadata; - protected Group(@Nonnull StoreHandle storeHandle, @Nonnull GroupMetadata groupMetadata) throws IOException { + protected Group(@Nonnull StoreHandle storeHandle, @Nonnull GroupMetadata groupMetadata) { super(storeHandle); this.metadata = groupMetadata; } diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java index 3fb5fd4..c30f01d 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV2Test.java @@ -482,7 +482,7 @@ public void testResizeArrayShrinkWithBoundaryTrimming() throws IOException, Zarr // Now resize to expand again and check that trimmed area has fill value array = array.resize(new long[]{10, 10}, true); - ucar.ma2.Array data = array.read(new long[]{7, 0}, new int[]{3, 10}); + ucar.ma2.Array data = array.read(new long[]{7, 0}, new long[]{3, 10}); // All values in rows 7-9 should be fill value (99) int[] expectedFillData = new int[3 * 10]; Arrays.fill(expectedFillData, 99); diff --git a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java index 6570558..b48611a 100644 --- a/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java +++ b/src/test/java/dev/zarr/zarrjava/ZarrV3Test.java @@ -858,7 +858,7 @@ public void testResizeArrayShrinkWithBoundaryTrimming() throws IOException, Zarr // Now resize to expand again and check that trimmed area has fill value array = array.resize(new long[]{10, 10}, true); - ucar.ma2.Array data = array.read(new long[]{7, 0}, new int[]{3, 10}); + ucar.ma2.Array data = array.read(new long[]{7, 0}, new long[]{3, 10}); // All values in rows 7-9 should be fill value (99) int[] expectedFillData = new int[3 * 10]; Arrays.fill(expectedFillData, 99); From 8d38ea28a99308c538b4f44e538d21af59c08108 Mon Sep 17 00:00:00 2001 From: brokkoli71 Date: Fri, 6 Feb 2026 11:19:16 
+0100 Subject: [PATCH 10/10] small restructuring of userguide --- README.md | 2 +- USERGUIDE.md | 58 ++++++++++++++++++++-------------------------------- 2 files changed, 23 insertions(+), 37 deletions(-) diff --git a/README.md b/README.md index ac51265..faf0cf2 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ For comprehensive documentation, see the [**User Guide**](USERGUIDE.md), which i - Working with arrays and groups - Storage backends (Filesystem, HTTP, S3, ZIP, Memory) - Compression and codecs -- Advanced topics and best practices +- Best practices - Troubleshooting ## Quick Usage Example diff --git a/USERGUIDE.md b/USERGUIDE.md index 4e330c2..539738e 100644 --- a/USERGUIDE.md +++ b/USERGUIDE.md @@ -654,20 +654,31 @@ try { - `"No Zarr array found at the specified location"` - Check path and ensure `.zarray` (v2) or `zarr.json` (v3) exists - `"Requested data is outside of the array's domain"` - Verify that `offset + shape <= array.shape` - `"Failed to read from store"` - Check network connectivity, file permissions, or storage availability +--- + ### Best Practices -1. **Chunk sizes for Best Performance**: - - refer to [Zarr Performance Guide]( - https://zarr.readthedocs.io/en/latest/user-guide/performance/) for recommendations +1. **Chunk sizes for Best Performance**: + - refer to [Zarr Performance Guide]( + https://zarr.readthedocs.io/en/latest/user-guide/performance/) for recommendations 2. **Use compression**: Almost always beneficial for scientific data - - Blosc is fast and effective for most use cases - - Zstd for better compression ratios - - Gzip for compatibility + - Blosc is fast and effective for most use cases + - Zstd for better compression ratios + - Gzip for compatibility 3. **Batch writes**: Write larger chunks at once rather than many small writes 4. **Consider sharding**: For v3 arrays with many small chunks ```java .withCodecs(c -> c.withSharding(new int[]{10, 10, 10}, inner -> inner.withBlosc())) ``` ---- +5. **Access patterns**: Align chunk shape with your access pattern + ```java + // For row-wise access + .withChunkShape(1, 1000, 1000) // Read entire rows efficiently + // For column-wise access + .withChunkShape(1000, 1, 1000) // Read entire columns efficiently + // For balanced 3D access + .withChunkShape(100, 100, 100) // Balanced for all dimensions + ``` + ## API Reference ### Array Methods #### Creation and Opening @@ -915,6 +926,7 @@ public class ParallelIOExample { ### Common Issues **Problem**: `ZarrException: No Zarr array found at the specified location` **Solution**: Check that the path is correct and contains `.zarray` (v2) or `zarr.json` (v3) + **Problem**: `OutOfMemoryError` when reading large arrays **Solution**: Read smaller subsets or increase JVM heap size with `-Xmx` ```bash @@ -927,6 +939,7 @@ java -Xmx8g -jar myapp.jar - Use appropriate compression (Blosc is fastest) - Check network bandwidth (for HTTP/S3) - For debugging, you can disable parallelism: `array.read(offset, shape, false)` + **Problem**: `IllegalArgumentException: 'offset' needs to have rank...` **Solution**: Ensure offset and shape arrays match the array's number of dimensions ```java @@ -939,7 +952,7 @@ array.read(new long[]{0, 0}, new long[]{10, 10}); // Wrong rank! 
**Solution**: - Verify data type matches between write and read - Check compression codec compatibility -- Ensure proper store closing (especially ZIP stores) + **Problem**: `ZarrException: Requested data is outside of the array's domain` **Solution**: Check that `offset + shape <= array.shape` for all dimensions ```java @@ -972,34 +985,7 @@ try { store.close(); // Important! } ``` -### Performance Tips -1. **Chunk size optimization**: - ```java - // Too small (many I/O operations) - .withChunkShape(10, 10, 10) // ~1KB chunks - // Good balance - .withChunkShape(100, 100, 100) // ~1MB chunks (for UINT8) - // May be too large (high memory usage) - .withChunkShape(1000, 1000, 1000) // ~1GB chunks - ``` -2. **Access patterns**: Align chunk shape with your access pattern - ```java - // For row-wise access - .withChunkShape(1, 1000, 1000) // Read entire rows efficiently - // For column-wise access - .withChunkShape(1000, 1, 1000) // Read entire columns efficiently - // For balanced 3D access - .withChunkShape(100, 100, 100) // Balanced for all dimensions - ``` -3. **Compression trade-offs**: - ```java - // Fastest (minimal compression) - .withCodecs(c -> c.withBlosc("lz4", "noshuffle", 1)) - // Balanced (good speed and compression) - .withCodecs(c -> c.withBlosc("zstd", "shuffle", 5)) - // Best compression (slower) - .withCodecs(c -> c.withZstd(22)) - ``` + ### Getting Help - **GitHub Issues**: [github.com/zarr-developers/zarr-java/issues](https://github.com/zarr-developers/zarr-java/issues) - **Zarr Community**: [zarr.dev](https://zarr.dev/)