Goal
Design a set of sharp, binary questions that:
- Minimize compliance overhead for honest actors
- Leave few places to hide for miscreants
- Are falsifiable and auditable
Design Principles
From the v0.1 sketch: "minimizes authoring friction, maximizes falsifiability/comparability"
What Makes a Question "Sharp"?
- Binary answerable - yes/no, not essays
- Falsifiable - can be checked against evidence
- Specific - no weasel room
- Low burden - answering honestly takes minutes, not hours
Anti-patterns to Avoid
- "Describe your limitations" → vague, self-serving
- "Have you considered bias?" → checkbox compliance
- "Is your model safe?" → undefined terms
Better Patterns
- "Does training data include content from domain X? (yes/no, if yes link filter spec)"
- "Has model output been evaluated on [specific benchmark] version [X]? (yes/no, if yes link results)"
- "Were red-team evaluations conducted for [risk domain]? (yes/no, if yes link methodology)"
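The "better patterns" above share one shape: a yes/no value plus a conditional evidence link. As a minimal sketch (the class and field names here are hypothetical, not part of any deliverable yet), that shape could be enforced like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BinaryQuestion:
    """One sharp, binary question (hypothetical structure)."""
    qid: str                        # stable identifier, e.g. "perf.dataset_version"
    text: str                       # the yes/no question shown to authors
    evidence_required_if_yes: bool  # a "yes" answer must link an artifact

@dataclass
class Answer:
    qid: str
    value: bool
    evidence_url: Optional[str] = None

def validate(question: BinaryQuestion, answer: Answer) -> list[str]:
    """Return a list of problems; an empty list means the answer is acceptable."""
    problems = []
    if answer.qid != question.qid:
        problems.append("answer does not match question id")
    if answer.value and question.evidence_required_if_yes and not answer.evidence_url:
        problems.append(f"{question.qid}: 'yes' requires a linked artifact")
    return problems
```

This keeps honest answering cheap (one boolean, one link) while making an unevidenced "yes" mechanically detectable.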
Question Categories (from v0.1 MUST)
Identity & Lineage
- Is there a unique model identifier? (format: ...)
- Is the base model specified? (if fine-tune/derivative)
- Is training date/version documented?
Intended Use
- Are ≥3 concrete out-of-scope uses specified with rationale?
- Are deployment constraints documented?
Performance Claims
- For each claimed benchmark: is dataset version specified?
- For each claimed benchmark: is eval script commit linked?
- For each claimed benchmark: is run hash/seed documented?
Limitations & Failure Modes
- Is at least one "worse than baseline" context documented?
- Are worst-case behaviors that were tested documented?
Data Provenance
- Are data source classes documented?
- Are filtering criteria documented?
- Is a Data Card linked (if one exists)?
Safety Testing
- For each risk domain (jailbreaks, autonomy, persuasion, cyber, bio): was it evaluated? (yes/no)
- If evaluated: is methodology linked?
Deliverables
- prompts/must-questions.md - finalized MUST-level binary questions
- prompts/should-questions.md - SHOULD-level questions
- schema/prompts.schema.json - machine-readable question definitions with validation rules
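The machine-readable definitions could be sanity-checked with a small loader before any richer schema exists. The field names below (id, level, text, evidence_if_yes) are placeholders assumed for illustration, pending the actual prompts.schema.json design:

```python
import json

# Hypothetical entries in the style of schema/prompts.schema.json;
# real field names and levels are still to be decided.
SAMPLE = """
[
  {"id": "lineage.base_model", "level": "MUST", "text": "Is the base model specified?", "evidence_if_yes": true},
  {"id": "use.out_of_scope", "level": "MUST", "text": "Are >=3 out-of-scope uses specified?", "evidence_if_yes": false}
]
"""

REQUIRED = {"id", "level", "text", "evidence_if_yes"}

def check_questions(raw: str) -> list[str]:
    """Return validation errors for a question-definition file; empty means OK."""
    errors = []
    seen = set()
    for i, q in enumerate(json.loads(raw)):
        missing = REQUIRED - q.keys()
        if missing:
            errors.append(f"entry {i}: missing fields {sorted(missing)}")
        if q.get("level") not in ("MUST", "SHOULD"):
            errors.append(f"entry {i}: level must be MUST or SHOULD")
        if q.get("id") in seen:
            errors.append(f"entry {i}: duplicate id {q['id']}")
        seen.add(q.get("id"))
    return errors
```

A check like this could run in CI so the question set stays auditable as it is iterated.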
Workshopping Process
This question set needs iteration. Proposed process:
- Draft initial question set
- Test against existing model cards (what would pass/fail?)
- Gather feedback from potential adopters
- Refine based on edge cases
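Step 2, testing a draft question set against existing model cards, could be as simple as partitioning answers into pass/fail. The function and argument names here are hypothetical:

```python
def audit(answers: dict[str, bool], must_ids: set[str]) -> dict[str, list[str]]:
    """Partition a model card's answers against the MUST-level question ids.

    A question fails if it was answered 'no' or not answered at all.
    """
    failing = [qid for qid in must_ids if not answers.get(qid, False)]
    passing = sorted(must_ids - set(failing))
    return {"pass": passing, "fail": sorted(failing)}
```

Running this over a handful of published model cards would quickly surface which draft questions no existing card can answer, i.e. the edge cases worth refining first.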
Related Issues
- Evidence Linking Protocol (how answers link to artifacts)
- Risk-Tiered Adversarial Framing (escalation for high-risk)