Goal
Specify how claims in interrogatory model cards must link to verifiable artifacts, making documentation auditable rather than self-reported.
Core Principle
From v0.1: "Performance claims → dataset version + eval script commit + run hash"
Every claim should trace to evidence. The protocol defines:
- What kinds of evidence are acceptable
- How links are formatted/validated
- When evidence must be public vs attestable
Evidence Types
Code Artifacts
- Git commits (full SHA)
- Git tags/releases
- Container images (digest)
- Package versions (pinned)
Data Artifacts
- Dataset versions (HuggingFace revision, Kaggle version)
- Data Card links
- Croissant metadata files
- DVC references
Execution Artifacts
- Run hashes (W&B run ID, MLflow run ID)
- Logs with timestamps
- Reproducibility seeds
Third-Party Attestations
- Audit reports (linked, dated)
- Certification references (ISO 42001 cert number)
- Red-team evaluation reports
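Taken together, the four evidence types above could be modeled as a small tagged structure. A hypothetical Python sketch (the class names and fields are illustrative, not part of the protocol):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class EvidenceKind(Enum):
    CODE = "code"                # git commit/tag, container digest, pinned package
    DATA = "data"                # dataset revision, data card, Croissant, DVC ref
    EXECUTION = "execution"      # run ID, timestamped logs, reproducibility seeds
    ATTESTATION = "attestation"  # audit report, certification, red-team report

@dataclass
class EvidenceLink:
    kind: EvidenceKind
    uri: str                          # e.g. a repo URL or a wandb:// run ID
    pinned_ref: Optional[str] = None  # full SHA, image digest, or dataset revision

# Example: a code artifact pinned to a (made-up) full 40-character SHA
ev = EvidenceLink(EvidenceKind.CODE,
                  "https://github.com/org/eval-harness",
                  pinned_ref="a" * 40)
```

A tagged structure like this makes the "what kinds of evidence are acceptable" question enumerable, which the schema deliverable below could then encode as JSON Schema `enum` values.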
Link Format Specification
{
  "claim": "Achieves 85% accuracy on MMLU",
  "evidence": [
    {
      "type": "benchmark_result",
      "dataset": {
        "name": "MMLU",
        "version": "1.0.0",
        "source": "https://huggingface.co/datasets/cais/mmlu"
      },
      "eval_script": {
        "repo": "https://github.com/org/eval-harness",
        "commit": "abc123def456..."
      },
      "run": {
        "id": "wandb://org/project/runs/xyz789",
        "seed": 42,
        "timestamp": "2026-01-15T10:30:00Z"
      }
    }
  ]
}
Validation Rules
Strict Mode (for high-risk systems)
- All links must resolve
- Git commits must be in public repos or attested private
- Run artifacts must be retrievable or attested
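Strict mode's requirements imply machine-checkable rules. A minimal stdlib-only sketch that checks the shape of an evidence object and the full-SHA requirement (field names are taken from the example above; everything else is an assumption about what the validator deliverable might do):

```python
import re

# "Full SHA" per the Code Artifacts list: 40 lowercase hex characters
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")

def structural_errors(obj: dict) -> list:
    """Return a list of problems with a claim/evidence object (empty = OK)."""
    errors = []
    if not obj.get("claim"):
        errors.append("missing claim text")
    for i, ev in enumerate(obj.get("evidence", [])):
        if "type" not in ev:
            errors.append(f"evidence[{i}]: missing type")
        commit = ev.get("eval_script", {}).get("commit", "")
        if commit and not FULL_SHA.match(commit):
            errors.append(f"evidence[{i}]: commit is not a full 40-char SHA")
    return errors

# The truncated "abc123def456..." from the example above would be flagged:
print(structural_errors({
    "claim": "Achieves 85% accuracy on MMLU",
    "evidence": [{"type": "benchmark_result",
                  "eval_script": {"commit": "abc123def456..."}}],
}))
# -> ['evidence[0]: commit is not a full 40-char SHA']
```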
Standard Mode
- Links should resolve; broken links produce warnings
- Private repos allowed with attestation statement
- Run artifacts recommended but not required
Minimal Mode (for early-stage/research)
- Links encouraged
- Missing evidence flagged but not blocking
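The three modes above differ only in how a broken link is reported. A sketch of how a link checker might map modes to severities (the mode names come from this section; the resolver interface is hypothetical, so actual network checking, e.g. an HTTP HEAD request, can be injected):

```python
from typing import Callable, List, Tuple

# error blocks publication; warning and info do not
SEVERITY = {"strict": "error", "standard": "warning", "minimal": "info"}

def check_links(urls: List[str], mode: str,
                resolver: Callable[[str], bool]) -> List[Tuple[str, str, str]]:
    """Check each URL with `resolver`; report broken links at the
    severity the chosen validation mode dictates."""
    findings = []
    for url in urls:
        if not resolver(url):
            findings.append((SEVERITY[mode], url, "link did not resolve"))
    return findings

# Example with a stubbed resolver (no network access needed):
known_good = {"https://github.com/org/eval-harness"}
print(check_links(["https://github.com/org/eval-harness",
                   "https://example.org/dead-link"],
                  mode="strict", resolver=lambda url: url in known_good))
# -> [('error', 'https://example.org/dead-link', 'link did not resolve')]
```

Injecting the resolver also leaves room for the link-rot questions below: the same interface could consult an archive or a content hash instead of the live URL.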
Open Questions
- How to handle proprietary/confidential evidence?
  - Attestation by a third party?
  - Cryptographic commitments?
- Link rot: what happens when URLs die?
  - Archive.org / perma.cc recommendations?
  - Hash-based content addressing?
- Incremental disclosure: can evidence be added post-publication?
Deliverables
- schema/evidence-link.schema.json - JSON Schema for evidence objects
- docs/evidence-protocol.md - human-readable specification
- tools/link-validator.py - basic link-checking tool
Related Issues
- Schema: Croissant/schema.org Extension Design
- Tooling: Validation & Generation