Improve inefficient retrieval of zarr chunks from the API #1777

@jjnesbitt

Description

We are receiving many requests to the API, from the CLI, of the following format:

GET /api/zarr/<zarr_id>/files/?prefix=0/0/0/13/14/97&download=true

Often, the path provided (0/0/0/13/14/97) is itself the only file returned, so it could instead be retrieved by querying the "level" above (0/0/0/13/14). More fundamentally, every entry in a response from that endpoint is itself an object, so the CLI should never need to determine whether a returned path is a directory or a file.
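To make the "query the level above" idea concrete, here is a minimal sketch of how a client could group the chunk paths it wants by their parent prefix, so that one listing request per parent replaces one request per chunk. The helper name `group_by_parent` and the sample paths are hypothetical and are not taken from dandi-cli; the sketch only does the path arithmetic and makes no API calls.

```python
from collections import defaultdict
from posixpath import dirname


def group_by_parent(paths: list[str]) -> dict[str, list[str]]:
    """Group entry paths by the prefix one level above each of them.

    Hypothetical helper for illustration only; not part of dandi-cli.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        # dirname("0/0/0/13/14/97") is "0/0/0/13/14"
        groups[dirname(path)].append(path)
    return dict(groups)


# Sample chunk paths, made up for this example.
chunks = ["0/0/0/13/14/97", "0/0/0/13/14/98", "0/0/0/13/15/0"]

for parent, members in group_by_parent(chunks).items():
    # One listing request per parent prefix instead of one per chunk.
    print(f"?prefix={parent} covers {len(members)} chunk(s)")
```

One caveat: the docstring quoted in this issue describes the filter as "path starts with the given prefix", so a parent prefix such as 0/0/0/13/14 could also match unrelated siblings like 0/0/0/13/140/... if they exist. A caller using this approach would likely still need to filter the returned entries against the exact paths it asked for.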

I believe the code generating these requests lives here:

dandi-cli/dandi/dandiapi.py

Lines 1786 to 1795 in 953923a

def iterfiles(self, prefix: str | None = None) -> Iterator[RemoteZarrEntry]:
    """
    Returns a generator of all `RemoteZarrEntry`\\s within the Zarr,
    optionally limited to those whose path starts with the given prefix
    """
    for r in self.client.paginate(
        f"{self.client.api_url}/zarr/{self.zarr}/files", params={"prefix": prefix}
    ):
        data = ZarrEntryServerData.model_validate(r)
        yield RemoteZarrEntry.from_server_data(self, data)

It's possible the requests we're receiving are from a modified version of the CLI, in which case this issue can be closed (if we can truly determine that to be the case).
