Design: Evidence Linking Protocol #6

@PipFoweraker


Goal

Specify how claims in interrogatory model cards must link to verifiable artifacts, making documentation auditable rather than self-reported.

Core Principle

From v0.1: "Performance claims → dataset version + eval script commit + run hash"

Every claim should trace to evidence. The protocol defines:

  1. What kinds of evidence are acceptable
  2. How links are formatted/validated
  3. When evidence must be public vs attestable

Evidence Types

Code Artifacts

  • Git commits (full SHA)
  • Git tags/releases
  • Container images (digest)
  • Package versions (pinned)
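The "full SHA", "digest" and "pinned" requirements above can be checked mechanically before any link is followed. Below is a minimal Python sketch. It makes three assumptions: a full commit ID is 40 hex characters (SHA-1) or 64 (Git's SHA-256 object format), image digests use sha256, and "pinned" means an exact `==` requirement. The function names are hypothetical and do not come from this issue.

```python
import re

# Assumption: a "full SHA" is a complete lowercase hex object ID, either
# 40 chars (SHA-1) or 64 chars (SHA-256 repos). Abbreviated IDs are rejected.
FULL_GIT_SHA = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")

# Container image pinned by digest, e.g. "ghcr.io/org/img@sha256:<64 hex>".
IMAGE_DIGEST = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")

# Assumption: a "pinned" Python package uses an exact "==" specifier.
PINNED_PACKAGE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.+!_-]+")


def is_full_commit(sha: str) -> bool:
    """True only for a complete (unabbreviated) hex commit ID."""
    return FULL_GIT_SHA.fullmatch(sha) is not None


def is_digest_pinned_image(ref: str) -> bool:
    """True if the image reference carries a sha256 digest, not just a tag."""
    return IMAGE_DIGEST.fullmatch(ref) is not None


def is_pinned_package(requirement: str) -> bool:
    """True for exact pins such as 'torch==2.2.1'; ranges like '>=2.0' fail."""
    return PINNED_PACKAGE.fullmatch(requirement.strip()) is not None
```

These checks only establish that a reference is precise. Whether it resolves is a separate question, handled by the validation modes.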

Data Artifacts

  • Dataset versions (HuggingFace revision, Kaggle version)
  • Data Card links
  • Croissant metadata files
  • DVC references

Execution Artifacts

  • Run hashes (W&B run ID, MLflow run ID)
  • Logs with timestamps
  • Reproducibility seeds

Third-Party Attestations

  • Audit reports (linked, dated)
  • Certification references (ISO 42001 cert number)
  • Red-team evaluation reports

Link Format Specification

{
  "claim": "Achieves 85% accuracy on MMLU",
  "evidence": [
    {
      "type": "benchmark_result",
      "dataset": {
        "name": "MMLU",
        "version": "1.0.0",
        "source": "https://huggingface.co/datasets/cais/mmlu"
      },
      "eval_script": {
        "repo": "https://github.com/org/eval-harness",
        "commit": "abc123def456..."
      },
      "run": {
        "id": "wandb://org/project/runs/xyz789",
        "seed": 42,
        "timestamp": "2026-01-15T10:30:00Z"
      }
    }
  ]
}
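A structural check for a claim shaped like the example above could look like the following sketch. It assumes that every field used in the example is required for `benchmark_result` evidence. Other evidence types are checked only for a `type` key. The real requirements are for schema/evidence-link.schema.json to define, and `REQUIRED` / `check_claim` are hypothetical names.

```python
from datetime import datetime

# Assumption: the fields used in the example are the required ones for
# "benchmark_result"; unknown evidence types only need a "type" key.
REQUIRED = {
    "benchmark_result": {
        "dataset": ["name", "version", "source"],
        "eval_script": ["repo", "commit"],
        "run": ["id", "timestamp"],
    }
}


def check_claim(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape is OK."""
    problems = []
    if not doc.get("claim"):
        problems.append("missing 'claim' text")
    evidence = doc.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        problems.append("'evidence' must be a non-empty list")
        return problems
    for i, item in enumerate(evidence):
        etype = item.get("type")
        if not etype:
            problems.append(f"evidence[{i}]: missing 'type'")
            continue
        # Report each required field that is absent from its section.
        for section, fields in REQUIRED.get(etype, {}).items():
            block = item.get(section, {})
            for field in fields:
                if field not in block:
                    problems.append(f"evidence[{i}].{section}: missing '{field}'")
        ts = item.get("run", {}).get("timestamp")
        if ts is not None:
            try:
                # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
                datetime.fromisoformat(ts.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"evidence[{i}].run.timestamp: not ISO 8601")
    return problems
```

For the example as written, `check_claim` returns an empty list. If `eval_script.commit` is removed, it returns `["evidence[0].eval_script: missing 'commit'"]`.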

Validation Rules

Strict Mode (for high-risk systems)

  • All links must resolve
  • Git commits must be in public repos, or in private repos covered by an attestation
  • Run artifacts must be retrievable or attested

Standard Mode

  • Links should resolve; broken links produce warnings
  • Private repos allowed with attestation statement
  • Run artifacts recommended but not required

Minimal Mode (for early-stage/research)

  • Links encouraged
  • Missing evidence flagged but not blocking
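One way to implement the three modes is a single table that maps each kind of problem to a severity per mode. In the sketch below, "error" blocks publication, "warning" is reported but does not block, and "info" is an advisory flag. The finding codes and the `gate` function are hypothetical. Cells that the bullets above do not specify are marked as assumptions in the comments.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    STANDARD = "standard"
    MINIMAL = "minimal"


# Hypothetical finding codes mapped to a severity per mode.
SEVERITY = {
    # Strict: all links must resolve. Standard: warn. Minimal: links only encouraged.
    "broken_link": {Mode.STRICT: "error", Mode.STANDARD: "warning", Mode.MINIMAL: "info"},
    # Strict and Standard require an attestation; Minimal is ASSUMED to just flag it.
    "private_repo_unattested": {Mode.STRICT: "error", Mode.STANDARD: "error", Mode.MINIMAL: "info"},
    # Strict: retrievable or attested. Standard: recommended only. Minimal: ASSUMED same.
    "run_artifact_unavailable": {Mode.STRICT: "error", Mode.STANDARD: "info", Mode.MINIMAL: "info"},
    # Minimal: flagged, not blocking. Strict/Standard: "error" is ASSUMED.
    "missing_evidence": {Mode.STRICT: "error", Mode.STANDARD: "error", Mode.MINIMAL: "warning"},
}


def gate(findings: list[str], mode: Mode) -> tuple[bool, dict[str, list[str]]]:
    """Group findings by severity; the card passes only if there are no errors."""
    report = {"error": [], "warning": [], "info": []}
    for code in findings:
        report[SEVERITY[code][mode]].append(code)
    return (not report["error"], report)
```

Keeping the policy in one table means that adding a mode, or changing how a finding is treated, only requires editing data. The checking code does not change.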

Open Questions

  1. How to handle proprietary/confidential evidence?
    • Attestation by third party?
    • Cryptographic commitments?
  2. Link rot: what happens when URLs die?
    • Archive.org / perma.cc recommendations?
    • Hash-based content addressing?
  3. Incremental disclosure: can evidence be added post-publication?
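Open questions 1 and 2 both point toward hashing. As one possible direction, not a decision, two uses fit here. First, a content hash recorded next to each URL lets an archived copy be verified after the original link dies. Second, a salted hash commitment lets confidential evidence be fixed at publication time and revealed to an auditor later. Below is a sketch using only the Python standard library. The `sha256:<hex>` notation and all function names are assumptions.

```python
import hashlib
import hmac
import secrets


def content_hash(data: bytes) -> str:
    """Content address in a hypothetical 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches(data: bytes, recorded: str) -> bool:
    """True if fetched bytes (e.g. from an archive mirror) match the record."""
    return hmac.compare_digest(content_hash(data), recorded)


def commit(content: bytes) -> tuple[str, bytes]:
    """Return (public commitment, secret salt); keep the salt private until reveal."""
    salt = secrets.token_bytes(32)  # fixed length, so salt + content is unambiguous
    return hashlib.sha256(salt + content).hexdigest(), salt


def verify_reveal(commitment: str, salt: bytes, content: bytes) -> bool:
    """Check a later reveal of (salt, content) against the published commitment."""
    return hmac.compare_digest(hashlib.sha256(salt + content).hexdigest(), commitment)
```

The random salt matters. Without it, a commitment to a short value such as a single accuracy figure could be brute-forced by hashing every plausible value.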

Deliverables

  1. schema/evidence-link.schema.json - JSON Schema for evidence objects
  2. docs/evidence-protocol.md - human-readable specification
  3. tools/link-validator.py - basic link checking tool
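As a starting point for tools/link-validator.py, the sketch below walks an evidence object and collects every URI-like string along with its JSON path. It then sends a HEAD request to each http(s) link. The sketch assumes the nested dict/list shape of the example. Non-HTTP identifiers, such as the example's `wandb://` run ID, are reported as unchecked rather than guessed at. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def collect_links(node, path="$"):
    """Yield (json_path, value) for every string that looks like a URI."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from collect_links(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from collect_links(value, f"{path}[{i}]")
    elif isinstance(node, str) and "://" in node:
        yield path, node


def check_http(url: str, timeout: float = 10.0) -> str:
    """Return 'ok', or a short reason string if the URL does not resolve."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return "ok" if resp.status < 400 else f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}"
    except OSError as err:  # URLError, timeouts, connection resets
        return f"unreachable: {err}"


def validate(doc) -> list[tuple[str, str, str]]:
    """Return (json_path, url, status) for every link found in the document."""
    results = []
    for path, url in collect_links(doc):
        if url.startswith(("http://", "https://")):
            results.append((path, url, check_http(url)))
        else:
            results.append((path, url, "unchecked: non-HTTP scheme"))
    return results
```

Some servers reject HEAD requests, so a fuller tool would probably retry with GET before reporting a link as broken. How a broken link is then treated depends on the validation mode.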

Related Issues

  • Schema: Croissant/schema.org Extension Design
  • Tooling: Validation & Generation
