Skip to content

Conversation

@Joshdpaul
Copy link
Contributor

@Joshdpaul Joshdpaul commented Oct 22, 2025

This PR adds a dynamic indicators endpoint to the API. The endpoint pulls data from the recently developed downscaled CMIP6 coverages, and is based on preliminary work in this repo.

This PR is currently in draft mode, incomplete for the following reasons:

  • the coverages are still in flux, subject to change
  • there is a known conflict between the ansi timestamps in the coverage and the built-in WCPS query iteration over time (JP can explain more about this - BLUF, we might have to reingest the coverages with the timestamp at 0 hour instead of noon 🙃 )
  • we still need the lat/lon "in extent?" validation (via geotiff or bounding box)
  • the endpoint still needs CSV output
  • we need HTML documentation for this endpoint, which in turn requires an academic reference that does not yet exist (this one might take a while to iron out since there is USGS involvement...)

Background:

The goal here is to develop dynamically generated indicators that threshold, rank, or otherwise summarize daily CMIP6 data based on user inputs. The processing is largely done server-side, via WCPS queries. We are only implementing indicators that do not rely on sequence or consecutive day aggregations, because our downscaling methodology does not preserve the order of values.

Our list of possible indicators is therefore:

  • hd: “Hot day” threshold (degrees C) — the highest observed daily maximum 2 m air temperature such that there are 5 other observations equal to or greater than this value.
  • cd: “Cold day” threshold (degrees C) — the lowest observed daily minimum 2 m air temperature such that there are 5 other observations equal to or less than this value.
  • rx1day: Maximum 1–day precipitation (mm)
  • su: "Summer Days" — Annual number of days with maximum 2 m air temperature above 25 degrees C
  • dw: "Deep Winter days" — Annual number of days with minimum 2 m air temperature below -30 degrees C
  • r10mm: Number of days with precipitation > 10 mm
  • wet days per year: Number of days with precipitation > 1 mm

However, when coding this endpoint it became clear that we have 3 general types of queries that could actually be applied to any variable:

  • Threshold queries: counting days per year above or below a certain value
  • Statistical queries: getting min, mean, or max value per year
  • Ranking queries: finding the nth highest or lowest value per year

So rather than code up specific functions for the existing indicators, I created three granular functions that can be applied in order to recreate those indicators, or develop new ones. My reasoning for this is that not only does it require less code, but it takes into account that allowing user input will necessarily change the definitions of the indicators. For example, if you use 0C as the temperature threshold for the "deep winter days" indicator, are you really talking about "deep winter" anymore? Say you wanted to know how many days were above freezing (to approximate a growing season, perhaps) - the term "summer days" is not a very accurate description.

Note that each function can be used for any coverage variable, allows the use of both metric and imperial units, and provides time slicing options (yearly). It's an open question whether or not we need all this functionality. Note also that the endpoint URL construction is unique when compared to our other endpoints. I think there is something nice about being able to "read" the request URL sort of like a sentence, that describes exactly what you are asking for - but again, this approach is open to discussion, and we may want to make the endpoint less granular.

To test:

Start the API as usual, and test out some of the following endpoints. Try swapping "in" for "mm", or "F" for "C" and check out the results. Or try making up a completely new indicator!

Threshold queries

Statistical queries

Ranking queries

@Joshdpaul
Copy link
Contributor Author

Update:

Added documentation.

Added area queries to this endpoint. Note that these endpoints do not use the fancy WCPS queries that perform the server-side processing of threshold or statistics functions: that seems to be impossible to iterate over years AND grid cells in the same query.

Instead, we simply query for a NetCDF of the daily data in a bbox defined by the polygon, perform zonal statistics to get an area mean for each day, then compute the indicators from those means. Even so, the processing is fairly quick for something like a HUC10.

http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/area/1908030609/2000/2030/

http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/area/1908030609/2000/2030

http://127.0.0.1:5000/dynamic_indicators/rank/6/lowest/tasmin/area/1908030609/2000/2030/

@Joshdpaul Joshdpaul marked this pull request as ready for review November 7, 2025 19:35
Copy link
Member

@charparr charparr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is an awesome use of Rasdaman via the API. Substantial and significant work here Josh, nice job. I've left quite a few comments, but before your proceed in addressing them I'm going to lay out a few broader points for discussion, knowing that some of this will likely get sorted while I'm gone on leave!

Structural

  • I'm not convinced that dynamic indicators is an endpoint (i.e. a resource) rather than a view / filter of a resource. I'm alerted here because the backend data is the same (well, a subset) of the CMIP6 downscaled endpoint. There are probably some exceptions we can find, but the general model of the API thus far is that the endpoints (paths) correspond to discrete data resources (usually Rasdaman coverages) and then we sort and filter them in various ways (sometimes also with paths, like /point/lat/lon/variable/ or with query parameters). The path vs. query parameter is sort of a toss-up in my mind, but I do think that in this case dynamic indicators (filtering on some value, n) would be better considered a view into a resource or resources because...
  • We can apply this incredible WCPS-fu over multiple endpoints! The most obvious one would be the ERA5-WRF data. This is why I left a comment suggesting that the dynamic indicator logic might perhaps be better off in a new module where multiple endpoints can talk to it. I could imagine sea ice, fire weather, dynamical CMIP6 in the future, all implementing this. Dunking this somewhere else would also...
  • Reduce the code footprint. There is a ton of similarity between this endpoint and the CMIP6 downscaled endpoint, which makes sense because they have the same data resources, so the configuration, packaging, validation, postprocessing, etc. will have some degree of similarity.

Request Validation

Request validation needs major revision. Some of this is likely broader than this PR ( XREF #616 ), but there are too many layers here handling the same logic: the Marshmallow QueryParamsScheme in application.py, the imports from validate_request, and the route-specific validation bits here in this PR. Additionally, there are several places where the implementation of the request validation results in rules not being enforced. Here are some examples:

lat-lon guardrails get ignored

latlon_is_numeric_and_in_geodetic_range returns either True or the int 400, but validate_latlon_and_reproject_to_epsg_3338 checks it with a if not syntax but since 400 is "truthy", non-geodetic coordinates are treated as valid and the request proceeds until the downstream WCPS call fails. See this with http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/point/1000/-147.5/2000/2030/

operator validation

http://127.0.0.1:5000/dynamic_indicators/count_days/foobar/25/C/tasmax/point/65/-147.5/2000/2030/

should return a 400, instead we get a 500. I think this happens because the result of the validation function gets assigned to the operator variable! Operator is then the (template, 400) tuple. I think this same pattern happens with...

place IDs bubble up to 500s

http://127.0.0.1:5000/dynamic_indicators/count_days/above/25/C/tasmax/area/FOOFOOBARBAR/2000/2030/

This 500s, but should cause a 422 “invalid area”. And I think the root cause is similar to the previous bug.

year range stuff not enforced

validate_year returns True or 400, but nothing gets done with that return value. Empty dicts get returned
http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/point/64/-147/20520/2000/
http://127.0.0.1:5000/dynamic_indicators/stat/max/pr/mm/point/64/-147/2050/2000/

So, there is quite a bit of validation code in this route, but a significant portion of it does not work as intended. Adding tests for know invalid requests may help develop this.


Alright all that being said, before diving in and making changes I'd suggest getting more input from @brucecrevensten and others on the direction here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably an annoying comment, so will ask forgiveness ahead of time, but:

Should the dynamic CMIP6 indicators be wrapped into the existing CMIP6 downscaled route? The reason I ask is because these are two endpoints are fundamentally querying the same underlying "resource" : the downscaled CMIP6 coverages. That'd fit the conventional model in which paths map to specific resources and then query params do the heavy lifting. This might feel like shuffling deck chairs on the Titanic, I dunno, but there is also an opportunity to re-use some of the validation and packaging code, etc, and then pus the indicator logic to a different module which could get then touch other resources (ERA5-WRF, future dynamically downscaled stuff) and those will all have their own validation and packaging that is different from what is in this endpoint.

@charparr
Copy link
Member

@Joshdpaul I like all of these changes! I'm probably not going to be able to dive much further into this before I leave, but am stoked to see where this goes, especially in the context of a potential Garden Helper reboot. I think you and Bruce will figure out the right path forward, but I do feel pretty strongly that we're going to want to be able to access different resources with this logic: namely, CMIP6-Downscaled, and the ERA5-WRF. Two data peas in a pod 🫛 for something like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants