FileNotFoundError for .zmetadata with MultiZarrToZarr on non-consolidated Zarr sources #553

@marstonsward

Description

When kerchunk.combine.MultiZarrToZarr is used to build a combined reference dataset from multiple source Zarr stores that lack consolidated metadata (i.e., no .zmetadata file exists in the source stores), it fails with a FileNotFoundError for the .zmetadata file of the first source store it processes.

Environment:

kerchunk version: 0.2.8
fsspec version: 2025.2.0
zarr version: 3.0.6
xarray version: 2024.11.0
Python version: 3.11.6
Operating System: Linux Jupyter AWS
Filesystem: EFS mount

Expected Behavior: Based on the fix implemented in PR #50 for kerchunk.zarr.single_zarr (addressing Issue #49), MultiZarrToZarr was expected to handle the absence of .zmetadata in source Zarr stores gracefully by falling back to reading the individual metadata files (.zgroup, .zarray, .zattrs). MultiZarrToZarr should then successfully generate the combined reference dictionary.
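To make the expected fallback concrete, here is a minimal sketch using only the Python standard library. It is not kerchunk's implementation; the function name read_v2_metadata is hypothetical, and it assumes local Zarr v2 stores on a regular filesystem. It reads .zmetadata when present and otherwise collects the individual .zgroup, .zarray, and .zattrs files into the same key layout.

```python
import json
from pathlib import Path

# File names that carry metadata in a Zarr v2 store.
META_NAMES = {".zgroup", ".zarray", ".zattrs"}


def read_v2_metadata(store_path):
    """Return Zarr v2 metadata for a local store as {key: parsed JSON}.

    Hypothetical helper for illustration; not part of kerchunk or zarr.
    """
    root = Path(store_path)
    consolidated = root / ".zmetadata"
    if consolidated.is_file():
        # Consolidated stores keep every metadata document under "metadata".
        return json.loads(consolidated.read_text())["metadata"]

    # Fallback: walk the store and collect the individual metadata files,
    # keyed by their path relative to the store root (e.g. "time/.zarray").
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name in META_NAMES:
            key = path.relative_to(root).as_posix()
            found[key] = json.loads(path.read_text())
    return found
```

Walking every file is slow for stores with many chunks, so a real fallback would more likely list only the expected metadata keys.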

Actual Behavior: The mzz.translate() call fails with a FileNotFoundError traceback, indicating it attempted to open the non-existent .zmetadata file within the first source Zarr store. The process does not appear to fall back to reading individual metadata components in this multi-file scenario.

# MultiZarrToZarr
import kerchunk.combine

mzz = kerchunk.combine.MultiZarrToZarr(
    input_file_list,  # Use the list directly
    concat_dims=[concat_dimension],
    identical_dims=identical_dimensions,
    remote_protocol=None,  # Data is local (EFS)
    remote_options=None,  # Data is local (EFS)
    consolidated=False,
)

Output snippet:

kerchunk_create:kerchunk_create:37 - Received 4488 input Zarr stores.
kerchunk_create:kerchunk_create:38 - First few files: 
['/export/sml-data-lake/datastash/47/gfs/gfs_2024_01_01_t00z_f000.zarr', 
'/export/sml-data-lake/datastash/47/gfs/gfs_2024_01_01_t00z_f001.zarr', 
'/export/sml-data-lake/datastash/47/gfs/gfs_2024_01_01_t00z_f002.zarr']
kerchunk_create:kerchunk_create:42 - Generating Kerchunk index for concatenation along 'time'...
kerchunk_create:kerchunk_create:43 - Identical dimensions: ['latitude', 'longitude']
kerchunk_create:kerchunk_create:98 - An error occurred during Kerchunk processing: [Errno 2] No such file or directory: '/export/sml-data-lake/datastash/47/gfs/gfs_2024_01_01_t00z_f000.zarr/.zmetadata'
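Since the error names a specific missing file, it may help to confirm what metadata layout the source stores actually have before combining them. The sketch below is a diagnostic only, with hypothetical helper names (classify_store, summarize_stores), and assumes local paths. It distinguishes consolidated Zarr v2 stores, unconsolidated v2 stores, and Zarr v3 stores (which use zarr.json rather than .zgroup/.zarray).

```python
from collections import Counter
from pathlib import Path


def classify_store(store_path):
    """Guess the metadata layout of a local Zarr store from its root files.

    Hypothetical diagnostic helper; not part of kerchunk or zarr.
    """
    root = Path(store_path)
    if (root / ".zmetadata").is_file():
        return "v2-consolidated"
    if (root / ".zgroup").is_file() or (root / ".zarray").is_file():
        return "v2-unconsolidated"
    if (root / "zarr.json").is_file():
        return "v3"
    return "unknown"


def summarize_stores(store_paths):
    """Count how many stores fall into each layout category."""
    return dict(Counter(classify_store(p) for p in store_paths))
```

Calling summarize_stores(input_file_list) on the list from the report would show whether all 4488 stores share one layout or whether some differ.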

Reference to PR #50: This issue seems related to the problem solved by PR #50 for single_zarr, but that fix does not appear to apply, or does not work correctly, within the MultiZarrToZarr workflow. Attempts to pass arguments such as zarr_kwargs={'consolidated': False} to MultiZarrToZarr resulted in a TypeError, as its constructor does not support that argument.

Minimal Reproducible Example: (Refer to the code provided in the kerchunk_bug_report_code artifact above)
