
Design: Sharp Yes/No Questions Specification #5

@PipFoweraker

Description

Goal

Design a set of sharp, binary questions that:

  • Minimize compliance overhead for honest actors
  • Leave few places to hide for miscreants
  • Are falsifiable and auditable

Design Principles

From the v0.1 sketch: "minimizes authoring friction, maximizes falsifiability/comparability"

What Makes a Question "Sharp"?

  1. Binary answerable - yes/no, not essays
  2. Falsifiable - can be checked against evidence
  3. Specific - no weasel room
  4. Low burden - answering honestly takes minutes, not hours

Anti-patterns to Avoid

  • "Describe your limitations" → vague, self-serving
  • "Have you considered bias?" → checkbox compliance
  • "Is your model safe?" → undefined terms

Better Patterns

  • "Does training data include content from domain X? (yes/no, if yes link filter spec)"
  • "Has model output been evaluated on [specific benchmark] version [X]? (yes/no, if yes link results)"
  • "Were red-team evaluations conducted for [risk domain]? (yes/no, if yes link methodology)"

Question Categories (from v0.1 MUST)

Identity & Lineage

  • Is there a unique model identifier? (format: ...)
  • Is the base model specified? (if fine-tune/derivative)
  • Is training date/version documented?

Intended Use

  • Are ≥3 concrete out-of-scope uses specified with rationale?
  • Are deployment constraints documented?

Performance Claims

  • For each claimed benchmark: is dataset version specified?
  • For each claimed benchmark: is eval script commit linked?
  • For each claimed benchmark: is run hash/seed documented?

Limitations & Failure Modes

  • Is at least one "worse than baseline" context documented?
  • Are the worst-case behaviors that were tested for documented?

Data Provenance

  • Are data source classes documented?
  • Are filtering criteria documented?
  • Is a Data Card linked (if one exists)?

Safety Testing

  • For each risk domain (jailbreaks, autonomy, persuasion, cyber, bio): was it evaluated? (yes/no)
  • If evaluated: is methodology linked?

Deliverables

  1. prompts/must-questions.md - finalized MUST-level binary questions
  2. prompts/should-questions.md - SHOULD-level questions
  3. schema/prompts.schema.json - machine-readable question definitions with validation rules
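For deliverable 3, a possible shape for one question definition — a minimal JSON Schema sketch in which every property name is a placeholder to be workshopped, not a finalized format:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Sharp question definition (illustrative sketch only)",
  "type": "object",
  "required": ["id", "text", "level", "answer_type"],
  "properties": {
    "id": { "type": "string" },
    "text": { "type": "string" },
    "level": { "enum": ["MUST", "SHOULD"] },
    "answer_type": { "const": "boolean" },
    "evidence_required_if": { "enum": ["yes", "no"] },
    "category": { "type": "string" }
  },
  "additionalProperties": false
}
```

Making answer_type a constant keeps essay-style answers out of the format by construction, and evidence_required_if encodes the "if yes, link X" pattern from the Better Patterns section.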

Workshopping Process

This needs iteration. Proposed process:

  1. Draft initial question set
  2. Test against existing model cards (what would pass/fail?)
  3. Gather feedback from potential adopters
  4. Refine based on edge cases
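Step 2 could be automated crudely. A hypothetical Python sketch that scores an existing model card's answers against a MUST-level question set (all question IDs and answers below are made up for illustration):

```python
# Hypothetical MUST-level question set, keyed by question ID.
must_questions = {
    "identity-001": "Is there a unique model identifier?",
    "use-001": "Are >=3 concrete out-of-scope uses specified with rationale?",
    "data-001": "Are data source classes documented?",
}

# Answers extracted (by hand or by script) from a model card under review.
# A missing key means the card says nothing either way.
card_answers = {
    "identity-001": "yes",
    "use-001": "no",
}


def score_card(questions, answers):
    """Partition question IDs into passed / failed / unanswered."""
    passed = [q for q in questions if answers.get(q) == "yes"]
    failed = [q for q in questions if answers.get(q) == "no"]
    unanswered = [q for q in questions if q not in answers]
    return passed, failed, unanswered


passed, failed, unanswered = score_card(must_questions, card_answers)
print(f"pass: {passed}  fail: {failed}  no answer: {unanswered}")
```

Running this over a corpus of published model cards would surface which questions existing documentation already answers and which produce mostly "no answer" — useful signal for refining edge cases in step 4.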

Related Issues

  • Evidence Linking Protocol (how answers link to artifacts)
  • Risk-Tiered Adversarial Framing (escalation for high-risk)
