## Goal
Prepare a compelling post for LessWrong / Alignment Forum that introduces interrogatory model cards to the AI safety and alignment community.
## Why LW/AF First?
- Audience alignment - the community cares about transparency, evals, and AI governance
- Quality feedback - rigorous commenters will stress-test the proposal
- Network effects - posts spread into adjacent communities (EA, AI policy, researchers)
- Credibility building - establishes an intellectual foundation before a broader push
## Post Structure (Draft Outline)
1. The Problem (Hook)
- Model cards are often PR documents, not technical documentation
- Selective disclosure, vague provenance, flattering metrics
- Regulation is coming (EU AI Act) but most cards won't meet the bar
- Link to "Evals Gap" and "Science of Evals" discourse
2. Existing Approaches (What's Been Tried)
- Mitchell et al. 2019 Model Cards
- HuggingFace adoption (widespread but uneven quality)
- System Cards (Anthropic, OpenAI) - better but still self-reported
- AI Cards (APF 2024) - machine-readable but not interrogatory
- Why these aren't enough
3. Interrogatory Model Cards (The Proposal)
- Design goal: hard to obfuscate, easy to audit, low friction to author
- CAN/SHOULD/MUST framework
- Sharp yes/no questions (examples)
- Evidence linking (demo)
- Risk-tiered adversarial framing
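As a sketch of what a single interrogatory entry might look like, combining the CAN/SHOULD/MUST obligation, a sharp yes/no answer, evidence linking, and a risk tier (all field names and the `@context` URL below are illustrative placeholders, not the finalized schema):

```json
{
  "@context": "https://example.org/interrogatory-card/v0",
  "question": "Was any benchmark test data present in the training corpus?",
  "obligation": "MUST",
  "answer": "no",
  "evidence": "https://example.org/decontamination-report",
  "riskTier": "high"
}
```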
4. Schema & Tooling (Concrete Artifacts)
- JSON-LD schema extending Croissant
- Permissive design: non-disclosure is visible, not blocking
- Validation tools, card builder
- Live demo link
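The "permissive design" point above can be made concrete: a validator treats an unanswered question as a visible non-disclosure rather than a hard failure, and only flags a MUST answer when it lacks linked evidence. This is a minimal sketch of that idea; the field names (`obligation`, `answer`, `evidence`) are hypothetical, not the actual schema.

```python
def validate_card(card):
    """Report answered questions, explicit non-disclosures, and
    MUST answers that lack linked evidence."""
    report = {"answered": [], "non_disclosed": [], "missing_evidence": []}
    for q in card.get("questions", []):
        qid = q["id"]
        answer = q.get("answer")  # expected "yes", "no", or None
        if answer is None:
            # Non-disclosure is visible, not blocking: record it and move on.
            report["non_disclosed"].append(qid)
            continue
        report["answered"].append(qid)
        if q.get("obligation") == "MUST" and not q.get("evidence"):
            # A MUST answer should link supporting evidence.
            report["missing_evidence"].append(qid)
    return report


card = {
    "questions": [
        {"id": "training-data-disclosed", "obligation": "MUST",
         "answer": "yes", "evidence": "https://example.org/datasheet"},
        {"id": "red-team-performed", "obligation": "MUST", "answer": "yes"},
        {"id": "synthetic-data-used", "obligation": "SHOULD", "answer": None},
    ]
}
print(validate_card(card))
```

The key design choice is that the non-disclosing author is never blocked from publishing a card; the gap is simply machine-readable, so comparisons and audits can surface it.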
5. What This Enables
- Comparable disclosure across models
- Automated compliance checking
- Third-party verification hooks
- Regulatory alignment (EU AI Act Annex IV mapping)
6. Limitations & Open Questions
- Doesn't solve: adversarial non-compliance, capability evaluations themselves
- Open: tier assignment, link rot, proprietary evidence
- Request for feedback on specific design choices
7. Call to Action
- Try the schema on your model
- Contribute to the spec
- Discuss: what questions should be MUST?
## Writing Considerations
### LW/AF Norms
- Show your work (reasoning, not just conclusions)
- Acknowledge uncertainty and limitations
- Engage with likely objections preemptively
- Concrete examples > abstract claims
### Avoid
- Marketing speak
- Overclaiming ("this solves AI transparency")
- Dismissing existing work unfairly
### Include
- Links to all artifacts (schema, tools, examples)
- Interactive demo if available
- Explicit request for specific feedback
## Pre-Publication Checklist
- Schema finalized (at least v0.2)
- At least 2-3 example cards created
- Validation tooling working
- Demo website live
- Post draft reviewed by 2+ people
- Cross-post strategy (LW vs AF vs both?)
## Deliverables
- `docs/publication/lw-af-post.md` - draft post
- Supplementary materials (diagrams, examples)
- Response plan for comments/feedback
## Timeline Considerations
- Don't rush - quality matters more than speed for this audience
- Consider posting when you can actively engage with comments (not before travel, etc.)
- Monday-Wednesday posts tend to get more engagement
## Related Issues
- All other issues feed into this
- Data: Example Cards (needed for concrete demos)
- Frontend: Interactive Demo (linked from post)