-
Notifications
You must be signed in to change notification settings - Fork 1
Dynamic Indicators from Downscaled CMIP6 #652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ple returns to documentation
|
Update: Added documentation. Added area queries to this endpoint. Note that these endpoints do not use the fancy WCPS queries that perform the server-side processing of threshold or statistics functions: that seems to be impossible to iterate over years AND grid cells in the same query. Instead, we simply query for a NetCDF of the daily data in a bbox defined by the polygon, perform zonal statistics to get an area mean for each day, then compute the indicators from those means. Even so, the processing is fairly quick for something like a HUC10. http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/area/1908030609/2000/2030/ http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/area/1908030609/2000/2030 http://127.0.0.1:5000/dynamic_indicators/rank/6/lowest/tasmin/area/1908030609/2000/2030/ |
charparr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is an awesome use of Rasdaman via the API. Substantial and significant work here Josh, nice job. I've left quite a few comments, but before your proceed in addressing them I'm going to lay out a few broader points for discussion, knowing that some of this will likely get sorted while I'm gone on leave!
Structural
- I'm not convinced that dynamic indicators is an endpoint (i.e. a resource) rather than a view / filter of a resource. I'm alerted here because the backend data is the same (well, a subset) of the CMIP6 downscaled endpoint. There are probably some exceptions we can find, but the general model of the API thus far is that the endpoints (paths) correspond to discrete data resources (usually Rasdaman coverages) and then we sort and filter them in various ways (sometimes also with paths, like
/point/lat/lon/variable/or with query parameters). The path vs. query parameter is sort of a toss-up in my mind, but I do think that in this case dynamic indicators (filtering on some value, n) would be better considered a view into a resource or resources because... - We can apply this incredible WCPS-fu over multiple endpoints! The most obvious one would be the ERA5-WRF data. This is why I left a comment suggesting that the dynamic indicator logic might perhaps be better off in a new module where multiple endpoints can talk to it. I could imagine sea ice, fire weather, dynamical CMIP6 in the future, all implementing this. Dunking this somewhere else would also...
- Reduce the code footprint. There is a ton of similarity between this endpoint and the CMIP6 downscaled endpoint, which makes sense because they have the same data resources, so the configuration, packaging, validation, postprocessing, etc. will have some degree of similarity.
Request Validation
Request validation needs major revision. Some of this is likely broader than this PR ( XREF #616 ), but there are too many layers here handling the same logic: the Marshmallow QueryParamsScheme in application.py, the imports from validate_request, and the route-specific validation bits here in this PR. Additionally, there are several places where the implementation of the request validation results in rules not being enforced. Here are some examples:
lat-lon guardrails get ignored
latlon_is_numeric_and_in_geodetic_range returns either True or the int 400, but validate_latlon_and_reproject_to_epsg_3338 checks it with a if not syntax but since 400 is "truthy", non-geodetic coordinates are treated as valid and the request proceeds until the downstream WCPS call fails. See this with http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/point/1000/-147.5/2000/2030/
operator validation
http://127.0.0.1:5000/dynamic_indicators/count_days/foobar/25/C/tasmax/point/65/-147.5/2000/2030/
should return a 400, instead we get a 500. I think this happens because the result of the validation function gets assigned to the operator variable! Operator is then the (template, 400) tuple. I think this same pattern happens with...
place IDs bubble up to 500s
http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/area/FOOFOOBARBAR/2000/2030/
This 500s, but should cause a 422 “invalid area”. And I think the root cause is similar to the previous bug.
year range stuff not enforced
validate_year returns True or 400, but nothing gets done with that return value. Empty dicts get returned
http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/point/64/-147/20520/2000/
http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/point/64/-147/2050/2000/
So, there is quite a bit of validation code in this route, but a significant portion of it does not work as intended. Adding tests for know invalid requests may help develop this.
Alright all that being said, before diving in and making changes I'd suggest getting more input from @brucecrevensten and others on the direction here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably an annoying comment, so will ask forgiveness ahead of time, but:
Should the dynamic CMIP6 indicators be wrapped into the existing CMIP6 downscaled route? The reason I ask is because these are two endpoints are fundamentally querying the same underlying "resource" : the downscaled CMIP6 coverages. That'd fit the conventional model in which paths map to specific resources and then query params do the heavy lifting. This might feel like shuffling deck chairs on the Titanic, I dunno, but there is also an opportunity to re-use some of the validation and packaging code, etc, and then pus the indicator logic to a different module which could get then touch other resources (ERA5-WRF, future dynamically downscaled stuff) and those will all have their own validation and packaging that is different from what is in this endpoint.
|
@Joshdpaul I like all of these changes! I'm probably not going to be able to dive much further into this before I leave, but am stoked to see where this goes, especially in the context of a potential Garden Helper reboot. I think you and Bruce will figure out the right path forward, but I do feel pretty strongly that we're going to want to be able to access different resources with this logic: namely, CMIP6-Downscaled, and the ERA5-WRF. Two data peas in a pod 🫛 for something like this. |
This PR adds a dynamic indicators endpoint to the API. The endpoint pulls data from the recently developed downscaled CMIP6 coverages, and is based on preliminary work in this repo.
This PR is currently in draft mode, incomplete for the following reasons:
ansitimestamps in the coverage and the built-in WCPS query iteration over time (JP can explain more about this - BLUF, we might have to reingest the coverages with the timestamp at 0 hour instead of noon 🙃 )Background:
The goal here is to develop dynamically generated indicators that threshold, rank, or otherwise summarize daily CMIP6 data based on user inputs. The processing is largely done server-side, via WCPS queries. We are only implementing indicators that do not rely on sequence or consecutive day aggregations, because our downscaling methodology does not preserve the order of values.
Our list of possible indicators is therefore:
However, when coding this endpoint it became clear that we have 3 general types of queries that could actually be applied to any variable:
So rather than code up specific functions for the existing indicators, I created three granular functions that can be applied in order to recreate those indicators, or develop new ones. My reasoning for this is that not only does it require less code, but it takes into account that allowing user input will necessarily change the definitions of the indicators. For example, if you use 0C as the temperature threshold for the "deep winter days" indicator, are you really talking about "deep winter" anymore? Say you wanted to know how many days were above freezing (to approximate a growing season, perhaps) - the term "summer days" is not a very accurate description.
Note that each function can be used for any coverage variable, allows the use of both metric and imperial units, and provides time slicing options (yearly). It's an open question whether or not we need all this functionality. Note also that the endpoint URL construction is unique when compared to our other endpoints. I think there is something nice about being able to "read" the request URL sort of like a sentence, that describes exactly what you are asking for - but again, this approach is open to discussion, and we may want to make the endpoint less granular.
To test:
Start the API as usual, and test out some of the following endpoints. Try swapping "in" for "mm", or "F" for "C" and check out the results. Or try making up a completely new indicator!
Threshold queries
the "summer days" indicator: http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/64.5/-147.5/2000/2030/
the "deep winter days" indicator: http://127.0.0.1:5000/dynamic_indicators/count_days/below/-30/C/tasmin/64.5/-147.5/2000/2030/
the "days above 10mm precip" indicator: http://127.0.0.1:5000/dynamic_indicators/count_days/above/10/mm/pr/64.5/-147.5/2000/2030/
the "wet days" indicator: http://127.0.0.1:5000/dynamic_indicators/count_days/above/1/mm/pr/64.5/-147.5/2000/2030/
Statistical queries
Ranking queries