Ideas for loading GRIB data as fast as possible #2

Are we confident that it's even possible to go faster than kerchunk when reading a petabyte-scale GRIB dataset from cloud object storage?! (If not, then there's not much point in hypergrib existing!)

In particular, when reading GRIB files, do we think kerchunk would saturate a 200 Gbps network connection on a VM in the same cloud region as the GRIB data? (Saturating a 200 Gbps NIC probably requires a few hundred GET requests to be in flight at any moment.) My understanding is that Zarr-Python version 2 (without David's joblib patch to Zarr) definitely wouldn't saturate a 200 Gbps NIC. But maybe kerchunk combined with Zarr-Python version 3, and/or Zarr-Python version 2 with David's patch, would?
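To make the "few hundred GET requests" figure concrete, here's a rough back-of-envelope sketch. It assumes each in-flight GET sustains roughly 85 to 90 MB/s, which is my recollection of AWS's rule of thumb for S3 rather than anything measured for this dataset, so treat the per-request throughput as an assumption. The function name `requests_in_flight` is made up for illustration.

```python
import math


def requests_in_flight(nic_gbps: float, per_request_mb_per_s: float) -> int:
    """Estimate how many concurrent GETs are needed to fill a NIC.

    nic_gbps: NIC bandwidth in gigabits per second.
    per_request_mb_per_s: ASSUMED sustained throughput of a single GET,
        in megabytes per second (not measured; adjust for your setup).
    """
    # Convert gigabits/s to megabytes/s (1 Gbps = 1000 Mbps = 125 MB/s).
    nic_mb_per_s = nic_gbps * 1000 / 8
    # Round up: a fractional request still needs a whole connection.
    return math.ceil(nic_mb_per_s / per_request_mb_per_s)


if __name__ == "__main__":
    for per_request in (85, 90):
        n = requests_in_flight(200, per_request)
        print(f"{per_request} MB/s per GET -> ~{n} GETs in flight")
```

Under those assumptions a 200 Gbps NIC (25,000 MB/s) needs somewhere just under 300 concurrent requests. If individual GETs are slower (small byte ranges, high time-to-first-byte), the number goes up accordingly.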

And, in terms of latency, how long would kerchunk take to figure out which GRIB to read if it has to look through a huge manifest (let's say 20 years of GRIBs)?
