Publication: LessWrong/Alignment Forum Post Preparation #11

@PipFoweraker

Description

Goal

Prepare a compelling post for LessWrong / Alignment Forum that introduces interrogatory model cards to the AI safety and alignment community.

Why LW/AF First?

  1. Audience alignment - the community cares about transparency, evals, and AI governance
  2. Quality feedback - rigorous commenters will stress-test the proposal
  3. Network effects - posts spread into adjacent communities (EA, AI policy, researchers)
  4. Credibility building - establishes an intellectual foundation before a broader push

Post Structure (Draft Outline)

1. The Problem (Hook)

  • Model cards are often PR documents, not technical documentation
  • Selective disclosure, vague provenance, flattering metrics
  • Regulation is coming (EU AI Act) but most cards won't meet the bar
  • Link to "Evals Gap" and "Science of Evals" discourse

2. Existing Approaches (What's Been Tried)

  • Mitchell et al. 2019 Model Cards
  • HuggingFace adoption (widespread but uneven quality)
  • System Cards (Anthropic, OpenAI) - better but still self-reported
  • AI Cards (APF 2024) - machine-readable but not interrogatory
  • Why these aren't enough

3. Interrogatory Model Cards (The Proposal)

  • Design goal: hard to obfuscate, easy to audit, low friction to author
  • CAN/SHOULD/MUST framework
  • Sharp yes/no questions (examples)
  • Evidence linking (demo)
  • Risk-tiered adversarial framing
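The tiered question bank above (CAN/SHOULD/MUST, sharp yes/no questions) could be sketched as data. All question IDs and wording below are hypothetical placeholders for illustration, not the actual spec:

```python
# Illustrative sketch of a tiered question bank for an interrogatory
# model card. Question IDs and wording are invented examples.
QUESTIONS = {
    "MUST": [  # skipping these is only allowed as an explicit, visible non-disclosure
        ("train-data-01", "Does the training set include data scraped after 2023-01-01?"),
        ("eval-01", "Were any reported benchmarks run by a party other than the developer?"),
    ],
    "SHOULD": [
        ("provenance-01", "Is the full pretraining corpus enumerable from the card?"),
    ],
    "CAN": [
        ("compute-01", "Is total training compute (FLOPs) disclosed?"),
    ],
}

# Ternary answers: silence must be recorded explicitly, never implied.
VALID_ANSWERS = {"yes", "no", "not_disclosed"}
```

The ternary answer set is what makes the questions "hard to obfuscate": a card cannot simply omit an awkward question; it has to say `not_disclosed` on the record.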

4. Schema & Tooling (Concrete Artifacts)

  • JSON-LD schema extending Croissant
  • Permissive design: non-disclosure is visible, not blocking
  • Validation tools, card builder
  • Live demo link
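The "permissive design" bullet above could translate into validator behavior like the following sketch: a missing answer to a MUST question is an error, while an explicit `not_disclosed` is surfaced as a warning but never blocks validation. The field names and question IDs here are assumptions for illustration, not the actual schema:

```python
# Sketch of a permissive card validator. Answers are "yes", "no", or
# "not_disclosed". Missing MUST answers are errors; "not_disclosed"
# is visible (a warning) but non-blocking. Field names are illustrative.

REQUIRED = {"MUST": ["train-data-01", "eval-01"], "SHOULD": ["provenance-01"]}

def validate_card(card: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a card's answers dict."""
    errors, warnings = [], []
    answers = card.get("answers", {})
    for tier, qids in REQUIRED.items():
        for qid in qids:
            ans = answers.get(qid)
            if ans is None:
                msg = f"{qid}: no answer recorded ({tier})"
                (errors if tier == "MUST" else warnings).append(msg)
            elif ans == "not_disclosed":
                warnings.append(f"{qid}: explicitly not disclosed ({tier})")
    return errors, warnings

card = {"answers": {"train-data-01": "no", "eval-01": "not_disclosed"}}
errors, warnings = validate_card(card)
# errors is empty: the card validates despite one non-disclosure and
# one unanswered SHOULD question, but both show up in the warnings.
```

The design choice this demonstrates: validation measures *completeness of the record*, not *willingness to disclose*, so a vendor gains nothing by stonewalling the tooling.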

5. What This Enables

  • Comparable disclosure across models
  • Automated compliance checking
  • Third-party verification hooks
  • Regulatory alignment (EU AI Act Annex IV mapping)
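"Comparable disclosure" above follows from every card answering the same question IDs, which means disclosure can be diffed mechanically across models. A sketch, with invented answer data:

```python
# Sketch: shared question IDs make cross-model comparison a mechanical
# join. Answer data below is invented for illustration.
def compare(cards: dict[str, dict]) -> dict[str, dict[str, str]]:
    """Map each question ID to each model's answer ("missing" if absent)."""
    qids = sorted({q for c in cards.values() for q in c})
    return {q: {m: c.get(q, "missing") for m, c in cards.items()} for q in qids}

table = compare({
    "model-a": {"train-data-01": "yes", "eval-01": "no"},
    "model-b": {"train-data-01": "not_disclosed"},
})
# table["eval-01"] == {"model-a": "no", "model-b": "missing"}
```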

6. Limitations & Open Questions

  • Doesn't solve: adversarial non-compliance, or the quality of the capability evaluations themselves
  • Open questions: who assigns risk tiers, link rot in evidence URLs, handling proprietary evidence
  • Request for feedback on specific design choices

7. Call to Action

  • Try the schema on your model
  • Contribute to the spec
  • Discuss: what questions should be MUST?

Writing Considerations

LW/AF Norms

  • Show your work (reasoning, not just conclusions)
  • Acknowledge uncertainty and limitations
  • Engage with likely objections preemptively
  • Concrete examples > abstract claims

Avoid

  • Marketing speak
  • Overclaiming ("this solves AI transparency")
  • Dismissing existing work unfairly

Include

  • Links to all artifacts (schema, tools, examples)
  • Interactive demo if available
  • Explicit request for specific feedback

Pre-Publication Checklist

  • Schema finalized (at least v0.2)
  • At least 2-3 example cards created
  • Validation tooling working
  • Demo website live
  • Post draft reviewed by 2+ people
  • Cross-post strategy (LW vs AF vs both?)

Deliverables

  1. docs/publication/lw-af-post.md - draft post
  2. Supplementary materials (diagrams, examples)
  3. Response plan for comments/feedback

Timeline Considerations

  • Don't rush - quality matters more than speed for this audience
  • Consider posting when you can actively engage with comments (not before travel, etc.)
  • Monday-Wednesday posts tend to get more engagement

Related Issues

  • All other issues feed into this
  • Data: Example Cards (needed for concrete demos)
  • Frontend: Interactive Demo (linked from post)
