Cmr latest version #355
@lavanya3k I tested on the following Concept ID: C3173447237-NSIDC_CPRD. It pulled the latest version, 1.18.4. Great work! Just to confirm, this does cross-validate against schema version 1.18.4 (the latest version available)? It's not just adjusting the schema specification value to the latest, right? Also, can we fix the output message here so it says Let @slesaad test this too!

@fb0023 - Thanks for testing the PR. The above fix will retrieve the latest version of the collection ID (e.g., 1.18.4) and validate it against umm-c 1.18.4 in pyQuARC. I'm not sure whether this fully addresses the CMR validation that the ESDIS team requested. I'll let you and @slesaad take over; let me know if anything is missing from the request.
pyQuARC/main.py
Outdated
    def _get_latest_version(self, concept_id):
        """
        Fetches the latest revision version for a given concept_id from CMR
        Args:
            concept_id (str): The concept ID to query
        Returns:
            str: The latest revision number, or None if not found
        """
        try:
            # Construct the CMR metadata URL for the concept
            url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
            headers = get_headers()
            response = requests.get(url, headers=headers)

            if response.status_code == 200:
                # Extract revision-id from response headers
                revision_id = response.headers.get('CMR-Revision-Id')
                return revision_id
            else:
                print(f"Warning: Could not fetch latest version for {concept_id}. Using default.")
                return None
        except Exception as e:
            print(f"Error fetching latest version for {concept_id}: {str(e)}")
            return None

    def _get_collection_version(self, concept_id):
        """
        Fetch the MetadataSpecification.Version of a collection from CMR.
        Args:
            concept_id (str): The concept ID to query.
        Returns:
            str: The collection's MetadataSpecification.Version, or None if not found.
        """
        try:
            url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
            headers = get_headers()
            response = requests.get(url, headers=headers)

            if response.status_code == 200:
                data = response.json()
                # UMM collections have MetadataSpecification.Version
                version = data.get("MetadataSpecification", {}).get("Version")
                return version
            else:
                print(f"Warning: Could not fetch metadata for {concept_id}.")
                return None
        except Exception as e:
            print(f"Error fetching collection version for {concept_id}: {str(e)}")
            return None
The two methods have a lot in common; can you DRY (don't repeat yourself) them up? Also, I'm not sure what the difference between these two versions is. It looks like you're just printing the collection_version and not using it anywhere else; do you even need it?
Agreed. Please take a look at the updated code changes.
@@ -194,24 +260,19 @@
        )
        continue
    content = content.encode()
    cmr_response = self._validate_with_cmr(concept_id, content)
    validation_errors, pyquarc_errors = checker.run(content)
    self.errors.append(
        {
            "concept_id": concept_id,
            "errors": validation_errors,
            "cmr_validation": {
                "errors": cmr_response.json().get("errors", []),
                # TODO: show warnings
                "warnings": cmr_response.json().get("warnings", [])
            },
why did we remove cmr validation?
Do you suggest keeping the cmr_validation? Please feel free to edit the code.
We have a ticket that's asking for cmr validation - why remove?
Agreed.
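One way to keep the `cmr_validation` block from the removed hunk while parsing the CMR response body only once is sketched below. `build_error_entry` is a hypothetical helper, not part of pyQuARC, and it assumes the CMR validation response body is JSON with optional `errors` and `warnings` lists, as the hunk above implies.

```python
from typing import Any, Dict, List, Optional


def build_error_entry(
    concept_id: str,
    validation_errors: List[Any],
    cmr_payload: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble one entry for the errors list (hypothetical helper).

    cmr_payload is the already-parsed JSON body of the CMR validation
    response; None or an empty dict is treated as "no errors, no warnings".
    """
    payload = cmr_payload or {}
    return {
        "concept_id": concept_id,
        "errors": validation_errors,
        "cmr_validation": {
            "errors": payload.get("errors", []),
            "warnings": payload.get("warnings", []),
        },
    }
```

In `main.py` this would presumably be called with the result of a single `cmr_response.json()` call, rather than calling `.json()` once per key.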
pyQuARC/main.py
Outdated
info_type (str): Type of information to fetch.
    Options: "revision" or "metadata_version".
i don't think this is implemented??
It is only the metadata version, and not the revision.
pyQuARC/main.py
Outdated
Returns:
    str: The collection's MetadataSpecification.Version, or None if not found.
    str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
dict: {"revision_id": str | None, "metadata_version": str | None} A dict of the revision ID and metadata version of the collection
pyQuARC/main.py
Outdated
    if response.status_code != 200:
        print(f"Warning: Could not fetch data for {concept_id}. Status: {response.status_code}")
        return {"revision_id": None, "metadata_version": None}

    data = response.json() if response.content else {}
    return {
        "revision_id": response.headers.get("CMR-Revision-Id"),
        "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
    }

except Exception as e:
    print(f"Error fetching collection version for {concept_id}: {str(e)}")
    return None
    # Unified error handling: return dict even on failure
    print(f"Error fetching collection info for {concept_id}: {str(e)}")
Rewrite this method with something like this:
    failure_return_value = {"revision_id": None, "metadata_version": None}
    try:
        url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
        headers = get_headers()
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json() if response.content else {}
        return {
            "revision_id": response.headers.get("CMR-Revision-Id"),
            "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
        }
    except Exception as e:
        # Unified error handling: return dict even on failure
        print(f"Error fetching collection info for {concept_id}: {str(e)}")
        return failure_return_value
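For reference, here is a standalone sketch of the suggestion above that can be exercised without hitting CMR. The function name `fetch_collection_info`, the `http_get` and `headers` parameters, and the 30-second timeout are assumptions made for illustration; the PR itself uses a method on the class with `get_headers()` and `requests.get` directly.

```python
from typing import Any, Callable, Dict, Optional

# Returned whenever the lookup fails, so callers always get the same shape.
FAILURE_RETURN_VALUE: Dict[str, Optional[str]] = {
    "revision_id": None,
    "metadata_version": None,
}


def fetch_collection_info(
    cmr_host: str,
    concept_id: str,
    http_get: Optional[Callable[..., Any]] = None,
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Return the revision ID and UMM metadata version for a concept.

    http_get is a hypothetical injection point: any callable with the
    same call shape as requests.get. It defaults to requests.get.
    """
    if http_get is None:
        import requests  # imported lazily so a stub can be used in tests

        http_get = requests.get

    url = f"{cmr_host}/search/concepts/{concept_id}.umm_json"
    try:
        response = http_get(url, headers=headers or {}, timeout=30)
        response.raise_for_status()  # turn 4xx/5xx into an exception
        data = response.json() if response.content else {}
        return {
            "revision_id": response.headers.get("CMR-Revision-Id"),
            "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
        }
    except Exception as exc:
        # Single failure path: log and return the dict with None values.
        print(f"Error fetching collection info for {concept_id}: {exc}")
        return dict(FAILURE_RETURN_VALUE)
```

Returning a copy of the failure dict means a caller that mutates the result cannot affect later calls, and the injectable `http_get` makes the success and failure branches easy to unit test.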
pyQuARC/main.py
Outdated
if version_to_use:
    print(f"Using latest revision {version_to_use} for {concept_id}")
if metadata_version:
    print(f"Collection {concept_id} schema version: {metadata_version}")
Let's remove these print statements



Description of the code changes: During a pyQuARC run, we want the collection retrieved from the CMR query to be validated against the latest version of the schema (e.g., umm-c). In some cases, particularly for CDDIS, multiple versions of a collection existed in CMR. With this fix, pyQuARC runs against the latest (most recent) version of the collection.
Example: C1000000003-CDDIS (umm-c).
Expected output: Schema version is displayed at the top of the results