
@NickGeneva NickGeneva commented Jan 8, 2026

Earth2Studio Pull Request

Description

Switches the HRRR GRIB reads from xarray/cfgrib to pygrib, which is 10x faster and also eliminates a memory leak.

Here's the code I used to check that the output is the same:

Old:
[Image: hrrr_2024-01-01T000000 000000000]

New:
[Image: hrrr_2024-01-01T00_new]

from datetime import datetime
from earth2studio.data import HRRR
import matplotlib.pyplot as plt
import numpy as np


ds = HRRR(verbose=True, cache=False)

da = ds(datetime(2024, 1, 1), ["t2m", "q2m", "z500"])

# Plot all requested variables at the first time in the same figure
time_val = da.coords["time"].values[0]
time_str = str(time_val).replace(":", "").replace(" ", "_")

vars_list = [str(v) for v in da.coords["variable"].values]
nvar = len(vars_list)
fig, axs = plt.subplots(1, nvar, figsize=(6 * nvar, 5))
if nvar == 1:
    axs = [axs]

for i, var in enumerate(vars_list):
    arr = da.sel(time=time_val, variable=var).values
    vmin, vmax = np.nanpercentile(arr, (2, 98))
    im = axs[i].imshow(arr, origin="lower", cmap="turbo", vmin=vmin, vmax=vmax)
    axs[i].set_title(f"{var}")
    fig.colorbar(im, ax=axs[i], fraction=0.046, pad=0.04)

fig.suptitle(f"HRRR fields at {str(time_val)}")
fig.tight_layout()
fig.savefig(f"hrrr_{time_str}.png", dpi=150)
plt.close(fig)
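
A numerical comparison can complement the side-by-side plots. The following is a minimal sketch, assuming the old and new outputs are both available as NumPy arrays (for example via `da.values`); `fields_match` and the sample arrays are hypothetical and not part of Earth2Studio.

```python
import numpy as np


def fields_match(old: np.ndarray, new: np.ndarray, rtol: float = 1e-6) -> bool:
    """Return True if two field arrays agree within rtol, treating NaNs as equal."""
    if old.shape != new.shape:
        return False
    return bool(np.allclose(old, new, rtol=rtol, atol=0.0, equal_nan=True))


# Synthetic stand-ins for the cfgrib and pygrib outputs (hypothetical data).
old = np.array([[280.0, np.nan], [275.5, 290.1]])
new = old.copy()
print(fields_match(old, new))  # True
```

Setting `equal_nan=True` matters here because masked or missing grid points would otherwise make every comparison fail.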

Adds pygrib to the data dependency group (MIT license):
https://github.com/jswhit/pygrib

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR switches HRRR data source from xarray/cfgrib to pygrib for GRIB file reading, delivering a 10x performance improvement and fixing a memory leak. The change is well-tested with visual validation showing identical output between old and new implementations.

Key Changes:

  • Replaced xr.open_dataarray() with pygrib.open() in fetch_array() method
  • Added pygrib to data dependency group (MIT licensed)
  • Uses pygrib's 1-based indexing (grbs[1]) which is correct for accessing the first GRIB message
  • Properly closes pygrib file handle in finally block to prevent resource leaks
  • Updated CHANGELOG appropriately
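
The read pattern described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the actual `fetch_array()` code: `read_first_message` and its `opener` parameter are hypothetical names, and the only pygrib calls assumed are `pygrib.open()`, 1-based message indexing, the message's `.values` attribute, and `.close()`.

```python
from typing import Any, Callable, Optional


def read_first_message(path: str, opener: Optional[Callable[[str], Any]] = None) -> Any:
    """Return the data values of the first GRIB message in `path`.

    `opener` defaults to pygrib.open; it is injectable (a hypothetical
    convenience) so the pattern can be exercised without a real GRIB file.
    """
    if opener is None:
        import pygrib  # imported lazily; requires the optional pygrib package

        opener = pygrib.open
    grbs = opener(path)
    try:
        # pygrib message indices start at 1, not 0.
        return grbs[1].values
    finally:
        # Always release the file handle, even if reading fails.
        grbs.close()
```

Putting `close()` in the `finally` block means the handle is released whether the read succeeds or raises.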

Minor Improvements Suggested:

  • The FutureWarning suppression for cfgrib (lines 61-62) is now obsolete for HRRR and could be removed
  • Exception handling could be more specific than catching bare Exception
  • Consider adding version constraint to pygrib dependency

Confidence Score: 4/5

  • This PR is safe to merge with minor style improvements recommended
  • The core logic is sound - pygrib integration is correct with proper resource cleanup, and the author validated output matches the previous implementation. The byte-range-per-message assumption is valid given the index parsing logic. Minor deductions for: (1) obsolete cfgrib warning suppression, (2) bare exception handling, and (3) missing version constraint on new dependency. These are non-critical style issues that don't affect functionality.
  • No files require special attention - all changes are straightforward
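
To illustrate the byte-range-per-message assumption mentioned above, here is a rough sketch of how one message's byte range can be derived from a GRIB index file. It assumes a wgrib2-style index in which each line reads `message:offset:date:variable:level:forecast:`; the actual Earth2Studio index parsing is not shown in this PR and may differ. The function name, key format, and sample offsets are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_byte_ranges(index_text: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map "variable::level" keys to inclusive (start, end) byte ranges.

    The end of the last message is None, meaning "read to end of file".
    """
    rows = []
    for line in index_text.splitlines():
        parts = line.split(":")
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        rows.append((int(parts[1]), f"{parts[3]}::{parts[4]}"))

    ranges: Dict[str, Tuple[int, Optional[int]]] = {}
    for i, (start, key) in enumerate(rows):
        # A message ends one byte before the next message starts.
        end = rows[i + 1][0] - 1 if i + 1 < len(rows) else None
        ranges[key] = (start, end)
    return ranges


# Made-up index lines for illustration only.
sample = """1:0:d=2024010100:REFC:entire atmosphere:anl:
2:375319:d=2024010100:RETOP:cloud top:anl:
3:517219:d=2024010100:TMP:2 m above ground:anl:
"""
ranges = parse_byte_ranges(sample)
print(ranges["RETOP::cloud top"])  # (375319, 517218)
```

Because each range covers exactly one message, a file downloaded from such a range would hold a single GRIB message, which is why reading message 1 is sufficient.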

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| earth2studio/data/hrrr.py | 4/5 | Switched from xarray/cfgrib to pygrib for 10x faster GRIB reading and memory leak fix. Minor concerns with error handling specificity and obsolete warning suppression. |
| pyproject.toml | 5/5 | Added pygrib dependency to data group without version constraint. |
| CHANGELOG.md | 5/5 | Properly documented changes and new dependency following changelog conventions. |


greptile-apps bot commented Jan 8, 2026

Additional Comments (1)

earth2studio/data/hrrr.py
This comment and warning suppression are now obsolete since HRRR no longer uses cfgrib (switched to pygrib). Consider removing them unless cfgrib warnings can still be raised by other code paths in this file.

@NickGeneva (Collaborator, Author)

/blossom-ci
