-
Notifications
You must be signed in to change notification settings - Fork 3
GEDI Retrieval Code #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This program can be used to download GEDI files using an text file of urls provided by EarthData Search. Combined with the gediFinder.py program, one should be able to gather data from a large bounding box using Earthdata search without a lot of computer storage space and in a timely manner.
crich011
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a handful of comments, one structural one about the order of downloading vs. checking output format but mostly about docs.
I think this all looks good and after you take a look at those, along with adding trailing spaces for some of the argparse help strings (see, e.g. L355 of process_l2a, which should get a space at the end of that string), and make any modifications to docstrings, I think we should merge it and get started on writing tests to port it to the earthshot repo.
| a = np.sort(a, axis =1) | ||
| count = (~np.isnan(a)).sum(axis=1) # count number of non-nans in row | ||
| groups = np.unique(count) # returns sorted unique values | ||
| groups = groups[groups > 0] # only returns groups with at least 1 non-nan value\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| groups = groups[groups > 0] # only returns groups with at least 1 non-nan value\n", | |
| groups = groups[groups > 0] # only returns groups with at least 1 non-nan value |
| Parameters | ||
| ---------- | ||
| file_dir : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring here can be updated to remove file_dir kwarg since the function just takes full paths, instead of a path and filenames, now.
| if args.filetype.lower() == "csv": | ||
| filename = os.path.join(args.dir, args.outfile + ".csv") | ||
| print(f'Writing to file {filename}') | ||
| df.to_csv(filename, index=False) | ||
| elif args.filetype.lower() == "parquet": | ||
| filename = os.path.join(args.dir, args.outfile + ".parquet.gzip") | ||
| print(f'Writing to file {filename}') | ||
| df.to_parquet(filename, compression="gzip") | ||
| elif args.filetype.lower() == "geojson": | ||
| filename = os.path.join(args.dir, args.outfile + ".geojson") | ||
| print(f'Writing to file {filename}') | ||
| df_to_geojson(df, filename) | ||
| else: | ||
| raise ValueError( | ||
| f"Received unsupported file type {args.filetype}. Please provide one of: csv, parquet, or GeoJSON." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to do the file extension handling before the downloading and unpacking, just in case a user provides the wrong file extension (or has a typo), to save the risk of losing that downloading and unpacking. Another option would be to store the output of the intermediate steps, but I think just checking up-front probably makes more sense than that to me.
| None | ||
| """ | ||
| filepath = os.path.join(dir, "granuleData.zip") | ||
| r = requests.get(url, stream = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way this does the checking and prints a descriptive error to the user is useful, especially since we want this function to return a bool, rather than throw the exception outright.
But, for future reference, a helpful tip that someone taught me: in most cases, using raise_for_status (i.e. r.raise_for_status() in this case) does the sensible thing you'd want when checking a response. It raises an error if the request returns an error or times out, and it returns none otherwise. Again, I think that is not actually what we want here, but it's worth having that in your toolbox for the times where it is what you want to do.
| Downloads and unpacks the zip file from the provide url | ||
| Returns | ||
| ------- | ||
| None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| None | |
| bool | |
| True indicates a successful download. False indicates that the download was unsuccessful. |
No description provided.