## Goal
Prepare a compelling post for LessWrong / Alignment Forum that introduces interrogatory model cards to the AI safety and alignment community.
## Why LW/AF First?
- Audience alignment - the community cares about transparency, evals, and AI governance
- Quality feedback - rigorous commenters will stress-test the proposal
- Network effects - posts spread into adjacent communities (EA, AI policy, researchers)
- Credibility building - establishes an intellectual foundation before a broader push
## Post Structure (Draft Outline)
1. The Problem (Hook)
- Model cards are often PR documents, not technical documentation
- Selective disclosure, vague provenance, flattering metrics
- Regulation is coming (EU AI Act) but most cards won't meet the bar
- Link to "Evals Gap" and "Science of Evals" discourse
2. Existing Approaches (What's Been Tried)
- Mitchell et al. 2019 Model Cards
- HuggingFace adoption (widespread but uneven quality)
- System Cards (Anthropic, OpenAI) - better but still self-reported
- AI Cards (APF 2024) - machine-readable but not interrogatory
- Why these aren't enough
3. Interrogatory Model Cards (The Proposal)
- Design goal: hard to obfuscate, easy to audit, low friction to author
- CAN/SHOULD/MUST framework
- Sharp yes/no questions (examples)
- Evidence linking (demo)
- Risk-tiered adversarial framing
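As a sketch of what a single interrogatory entry might look like, combining the CAN/SHOULD/MUST obligation, a sharp yes/no answer, evidence linking, and a risk tier (all field names and the `@context` URL below are illustrative placeholders, not the finalized schema):

```json
{
  "@context": "https://example.org/interrogatory-card/v0",
  "question": "Was any benchmark test data present in the training corpus?",
  "obligation": "MUST",
  "answer": "no",
  "evidence": "https://example.org/decontamination-report",
  "riskTier": "high"
}
```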
4. Schema & Tooling (Concrete Artifacts)
- JSON-LD schema extending Croissant
- Permissive design: non-disclosure is visible, not blocking
- Validation tools, card builder
- Live demo link
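The "permissive design" point above can be made concrete: a validator treats an unanswered question as a visible non-disclosure rather than a hard failure, and only flags a MUST answer when it lacks linked evidence. This is a minimal sketch of that idea; the field names (`obligation`, `answer`, `evidence`) are hypothetical, not the actual schema.

```python
def validate_card(card):
    """Report answered questions, explicit non-disclosures, and
    MUST answers that lack linked evidence."""
    report = {"answered": [], "non_disclosed": [], "missing_evidence": []}
    for q in card.get("questions", []):
        qid = q["id"]
        answer = q.get("answer")  # expected "yes", "no", or None
        if answer is None:
            # Non-disclosure is visible, not blocking: record it and move on.
            report["non_disclosed"].append(qid)
            continue
        report["answered"].append(qid)
        if q.get("obligation") == "MUST" and not q.get("evidence"):
            # A MUST answer should link supporting evidence.
            report["missing_evidence"].append(qid)
    return report


card = {
    "questions": [
        {"id": "training-data-disclosed", "obligation": "MUST",
         "answer": "yes", "evidence": "https://example.org/datasheet"},
        {"id": "red-team-performed", "obligation": "MUST", "answer": "yes"},
        {"id": "synthetic-data-used", "obligation": "SHOULD", "answer": None},
    ]
}
print(validate_card(card))
```

The key design choice is that the non-disclosing author is never blocked from publishing a card; the gap is simply machine-readable, so comparisons and audits can surface it.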
5. What This Enables
- Comparable disclosure across models
- Automated compliance checking
- Third-party verification hooks
- Regulatory alignment (EU AI Act Annex IV mapping)
6. Limitations & Open Questions
- Doesn't solve: adversarial non-compliance, capability evaluations themselves
- Open: tier assignment, link rot, proprietary evidence
- Request for feedback on specific design choices
7. Call to Action
- Try the schema on your model
- Contribute to the spec
- Discuss: what questions should be MUST?
## Writing Considerations
### LW/AF Norms
- Show your work (reasoning, not just conclusions)
- Acknowledge uncertainty and limitations
- Engage with likely objections preemptively
- Concrete examples > abstract claims
### Avoid
- Marketing speak
- Overclaiming ("this solves AI transparency")
- Dismissing existing work unfairly
### Include
- Links to all artifacts (schema, tools, examples)
- Interactive demo if available
- Explicit request for specific feedback
## Pre-Publication Checklist
- Schema finalized (at least v0.2)
- At least 2-3 example cards created
- Validation tooling working
- Demo website live
- Post draft reviewed by 2+ people
- Cross-post strategy (LW vs AF vs both?)
## Deliverables
- `docs/publication/lw-af-post.md` - draft post
- Supplementary materials (diagrams, examples)
- Response plan for comments/feedback
## Timeline Considerations
- Don't rush - quality matters more than speed for this audience
- Consider posting when you can actively engage with comments (not before travel, etc.)
- Monday-Wednesday posts tend to get more engagement
## Related Issues
- All other issues feed into this
- Data: Example Cards (needed for concrete demos)
- Frontend: Interactive Demo (linked from post)