@lavanya3k
Collaborator

Description of the code changes: During a pyQuARC run, we want to ensure that the collection retrieved from the CMR query uses the latest version of its schema (e.g., umm-c). In some cases multiple versions of the same collection ID existed in the CMR, particularly for CDDIS. With this fix, pyQuARC runs against the latest (most recent) version of the collection.

Example: C1000000003-CDDIS (umm-c).

Expected output: Schema version is displayed at the top of the results

Screenshot 2025-10-02 at 12 54 42 PM

@lavanya3k lavanya3k changed the base branch from master to dev October 2, 2025 17:59
@FBayat021 FBayat021 requested review from FBayat021 and slesaad October 6, 2025 14:23
@FBayat021

FBayat021 commented Oct 6, 2025

@lavanya3k I tested on the following Concept IDs:

C3173447237-NSIDC_CPRD
C3274606314-NSIDC_CPRD
C1977859380-GHRC_DAAC (umm-c native 1.18.3)

And it pulled the latest version in 1.18.4! Great work! Just to confirm, this does cross-validate against Schema version 1.18.4/latest version available? (It's not just adjusting the schema specification value to the latest, right?)

Also, can we fix the output message here so it says Collection instead of ollection?
image

Let @slesaad test this too!

@lavanya3k
Collaborator Author

@fb0023 - Thanks for testing the PR. The above fix will retrieve the latest version of the collection ID (e.g., 1.18.4) and validate it against umm-c 1.18.4 in pyQuARC. I'm not sure if this addresses the CMR validation completely, which was requested by the ESDIS team. I am going to let @slesaad and you take over, and let me know if anything is missing with the request.
I will fix the typo with the 'Collection' shortly.

@lavanya3k
Collaborator Author

lavanya3k commented Oct 7, 2025

Here is the screenshot after retesting. The typo did not occur while testing.
I also included code changes in main.py for error_prompt and updated lxml to the latest version for Python 3.10+.

Screenshot 2025-10-07 at 2 19 13 PM

pyQuARC/main.py Outdated
Comment on lines 142 to 194
    def _get_latest_version(self, concept_id):
        """
        Fetches the latest revision version for a given concept_id from CMR
        Args:
            concept_id (str): The concept ID to query
        Returns:
            str: The latest revision number, or None if not found
        """
        try:
            # Construct the CMR metadata URL for the concept
            url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
            headers = get_headers()
            response = requests.get(url, headers=headers)

            if response.status_code == 200:
                # Extract revision-id from response headers
                revision_id = response.headers.get('CMR-Revision-Id')
                return revision_id
            else:
                print(f"Warning: Could not fetch latest version for {concept_id}. Using default.")
                return None
        except Exception as e:
            print(f"Error fetching latest version for {concept_id}: {str(e)}")
            return None

    def _get_collection_version(self, concept_id):
        """
        Fetch the MetadataSpecification.Version of a collection from CMR.
        Args:
            concept_id (str): The concept ID to query.
        Returns:
            str: The collection's MetadataSpecification.Version, or None if not found.
        """
        try:
            url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
            headers = get_headers()
            response = requests.get(url, headers=headers)

            if response.status_code == 200:
                data = response.json()
                # UMM collections have MetadataSpecification.Version
                version = data.get("MetadataSpecification", {}).get("Version")
                return version
            else:
                print(f"Warning: Could not fetch metadata for {concept_id}.")
                return None
        except Exception as e:
            print(f"Error fetching collection version for {concept_id}: {str(e)}")
            return None

Member

The two methods have a lot in common. Can you DRY (don't repeat yourself) them up? Also, I'm not sure what the difference between these two versions is, and it looks like you're only printing the collection_version without using it anywhere else. Do you even need it?

Collaborator Author

Agreed. Please take a look at the updated code changes.

Comment on lines 185 to 207
@@ -194,24 +260,19 @@
        )
        continue
    content = content.encode()
    cmr_response = self._validate_with_cmr(concept_id, content)
    validation_errors, pyquarc_errors = checker.run(content)
    self.errors.append(
        {
            "concept_id": concept_id,
            "errors": validation_errors,
            "cmr_validation": {
                "errors": cmr_response.json().get("errors", []),
                # TODO: show warnings
                "warnings": cmr_response.json().get("warnings", [])
            },
Member

Why did we remove the CMR validation?

Collaborator Author

Do you suggest keeping the cmr_validation? Please feel free to edit the code.

Member

We have a ticket that's asking for CMR validation, so why remove it?

Collaborator Author

Agreed.
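
For reference, a minimal sketch of how the `cmr_validation` entry could be kept in each error record. It assumes the CMR response body has already been parsed once into a dict (rather than calling `.json()` twice, as the hunk above does); `build_error_record` and the sample payload are hypothetical and are not part of pyQuARC.

```python
from typing import Any, Dict, List


def build_error_record(
    concept_id: str,
    validation_errors: List[Any],
    cmr_payload: Dict[str, Any],
) -> Dict[str, Any]:
    """Assemble one error entry while preserving the CMR validation results.

    Hypothetical helper: cmr_payload is assumed to be the already-parsed
    JSON body of the CMR validation response.
    """
    return {
        "concept_id": concept_id,
        "errors": validation_errors,
        "cmr_validation": {
            # Missing keys fall back to empty lists, as in the original hunk.
            "errors": cmr_payload.get("errors", []),
            "warnings": cmr_payload.get("warnings", []),
        },
    }


# Placeholder payload for illustration only; not real CMR output.
sample_payload = {"errors": ["placeholder error message"]}
record = build_error_record("C1000000003-CDDIS", [], sample_payload)
print(record["cmr_validation"])
```

Parsing the response once and passing the dict in keeps the record-building step free of network objects, which makes it easy to test.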

pyQuARC/main.py Outdated
Comment on lines 148 to 150
info_type (str): Type of information to fetch.
Options: "revision" or "metadata_version".
Member

I don't think this is implemented?

Collaborator Author

It is only the metadata version, and not the revision.

pyQuARC/main.py Outdated
  Returns:
- str: The collection's MetadataSpecification.Version, or None if not found.
+ str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
Member

Suggested change
- str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
+ dict: {"revision_id": str | None, "metadata_version": str | None } A dict of Revision ID and Metadata Version of the collection
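
One way to make the dict shape in the suggested docstring explicit is a `TypedDict`. This is only a sketch: `CollectionInfo` and `empty_collection_info` are hypothetical names, not something the PR defines.

```python
from typing import Optional, TypedDict


class CollectionInfo(TypedDict):
    """Shape of the value described in the suggested docstring (hypothetical type)."""

    revision_id: Optional[str]
    metadata_version: Optional[str]


def empty_collection_info() -> CollectionInfo:
    """Return the 'nothing found' value with both keys present."""
    return {"revision_id": None, "metadata_version": None}


info = empty_collection_info()
print(sorted(info))
```

With both keys always present, callers can index the result directly instead of checking for `None` first.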

pyQuARC/main.py Outdated
Comment on lines 159 to 171
     if response.status_code != 200:
         print(f"Warning: Could not fetch data for {concept_id}. Status: {response.status_code}")
         return {"revision_id": None, "metadata_version": None}

     data = response.json() if response.content else {}
     return {
         "revision_id": response.headers.get("CMR-Revision-Id"),
         "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
     }

 except Exception as e:
-    print(f"Error fetching collection version for {concept_id}: {str(e)}")
-    return None
+    # Unified error handling — return dict even on failure
+    print(f"Error fetching collection info for {concept_id}: {str(e)}")
Member

Rewrite this method with something like this:

failure_return_value = {"revision_id": None, "metadata_version": None}
try:
    url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
    headers = get_headers()
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    data = response.json() if response.content else {}
    return {
        "revision_id": response.headers.get("CMR-Revision-Id"),
        "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
    }

except Exception as e:
    # Unified error handling — return dict even on failure
    print(f"Error fetching collection info for {concept_id}: {str(e)}")
    return failure_return_value
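
A self-contained sketch of the suggestion above, written so it can be run without network access. It assumes the HTTP call is passed in as a parameter (`http_get`, which could be `requests.get`); `fetch_collection_info`, `FakeResponse`, `fake_get`, and the `https://cmr.example` host are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Optional

FAILURE: Dict[str, Optional[str]] = {"revision_id": None, "metadata_version": None}


def fetch_collection_info(
    concept_id: str,
    cmr_host: str,
    http_get: Callable[..., Any],
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Return the revision ID and metadata version, or a dict of Nones on failure."""
    try:
        url = f"{cmr_host}/search/concepts/{concept_id}.umm_json"
        response = http_get(url, headers=headers or {})
        response.raise_for_status()  # non-2xx responses go to the except branch
        data = response.json() if response.content else {}
        return {
            "revision_id": response.headers.get("CMR-Revision-Id"),
            "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
        }
    except Exception as exc:
        print(f"Error fetching collection info for {concept_id}: {exc}")
        return dict(FAILURE)  # copy, so callers cannot mutate the shared constant


class FakeResponse:
    """Hypothetical stand-in for a requests.Response, used only for this demo."""

    def __init__(self, status=200, headers=None, payload=None):
        self.status_code = status
        self.headers = headers or {}
        self._payload = payload
        self.content = b"{}" if payload is not None else b""

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self):
        return self._payload


def fake_get(url, headers=None):
    # Canned values for illustration; not real CMR output.
    return FakeResponse(
        headers={"CMR-Revision-Id": "7"},
        payload={"MetadataSpecification": {"Version": "1.18.4"}},
    )


print(fetch_collection_info("C1977859380-GHRC_DAAC", "https://cmr.example", fake_get))
```

Because every path returns a dict with the same two keys, the caller never has to handle a bare `None`.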

pyQuARC/main.py Outdated
Comment on lines 223 to 226
if version_to_use:
    print(f"Using latest revision {version_to_use} for {concept_id}")
if metadata_version:
    print(f"Collection {concept_id} schema version: {metadata_version}")
Member

Let's remove these print statements
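
If the messages are still useful while debugging, one option (not something requested in this review) is to route them through the standard `logging` module at debug level so they stay silent by default. The logger name `"pyQuARC"` and the helper `report_versions` are assumptions made for this sketch.

```python
import logging

# Logger name is an assumption; any project-level logger would do.
logger = logging.getLogger("pyQuARC")


def report_versions(concept_id, revision_id, metadata_version):
    """Emit the two messages at DEBUG level instead of printing them."""
    if revision_id:
        logger.debug("Using latest revision %s for %s", revision_id, concept_id)
    if metadata_version:
        logger.debug("Collection %s schema version: %s", concept_id, metadata_version)


if __name__ == "__main__":
    # Opt in to debug output explicitly; without this the calls produce nothing.
    logging.basicConfig(level=logging.DEBUG)
    report_versions("C1977859380-GHRC_DAAC", "7", "1.18.4")
```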

@lavanya3k
Collaborator Author

Tested in the local branch, and it works:

Expected output:
Screenshot 2025-10-16 at 3 32 55 PM

@lavanya3k lavanya3k merged commit fcba356 into dev Oct 16, 2025
1 check passed