-
Notifications
You must be signed in to change notification settings - Fork 89
HRRR grib read and memory leak fix #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR switches HRRR data source from xarray/cfgrib to pygrib for GRIB file reading, delivering a 10x performance improvement and fixing a memory leak. The change is well-tested with visual validation showing identical output between old and new implementations.
Key Changes:
- Replaced
xr.open_dataarray()withpygrib.open()infetch_array()method - Added pygrib to data dependency group (MIT licensed)
- Uses pygrib's 1-based indexing (
grbs[1]) which is correct for accessing the first GRIB message - Properly closes pygrib file handle in finally block to prevent resource leaks
- Updated CHANGELOG appropriately
Minor Improvements Suggested:
- The FutureWarning suppression for cfgrib (lines 61-62) is now obsolete for HRRR and could be removed
- Exception handling could be more specific than catching bare
Exception - Consider adding version constraint to pygrib dependency
Confidence Score: 4/5
- This PR is safe to merge with minor style improvements recommended
- The core logic is sound - pygrib integration is correct with proper resource cleanup, and the author validated output matches the previous implementation. The byte-range-per-message assumption is valid given the index parsing logic. Minor deductions for: (1) obsolete cfgrib warning suppression, (2) bare exception handling, and (3) missing version constraint on new dependency. These are non-critical style issues that don't affect functionality.
- No files require special attention - all changes are straightforward
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| earth2studio/data/hrrr.py | 4/5 | Switched from xarray/cfgrib to pygrib for 10x faster GRIB reading and memory leak fix. Minor concerns with error handling specificity and obsolete warning suppression. |
| pyproject.toml | 5/5 | Added pygrib dependency to data group without version constraint. |
| CHANGELOG.md | 5/5 | Properly documented changes and new dependency following changelog conventions. |
Additional Comments (1)
|
|
/blossom-ci |
Earth2Studio Pull Request
Description
Switches the xarray cfgrib reads to pygrib, which is 10x faster and also eliminates a memory leak.
Here's the code I tested to make sure its the same:
Old:

New:

Add pygrib into the data dep group, MIT license:
https://github.com/jswhit/pygrib
Checklist
Dependencies