Purpose
Analyze 6 real model cards against interrogatory requirements to:
- Demonstrate the framework works across model types
- Show documentation gaps that the framework exposes
- Provide concrete examples for LW/AF publication
Selected Models
Frontier (2)
- Claude (3.5 Sonnet or 4) - Anthropic system card - expected to score well
- GPT-4 (or 4o) - OpenAI system card - benchmark comparison
Research (2)
- Mistral 7B or Mixtral - European lab, reportedly sparse documentation
- Qwen 2.5 - Alibaba; comparison point for Chinese lab documentation norms
Unusual (2)
- Whisper - Audio transcription, non-generative
- Llama Guard 3 - Safety classifier, meta-level model
Analysis Template
For each model:
- Locate official card/documentation
- Score against 14 MUST questions
- Note SHOULD/CAN elements present
- Calculate disclosure completeness %
- Document specific gaps with quotes
- Write brief narrative assessment
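The issue does not say how "disclosure completeness %" is computed. One minimal sketch, assuming each of the 14 MUST questions gets a hypothetical three-level score (1.0 = fully disclosed, 0.5 = partial, 0.0 = absent) and completeness is the mean MUST score as a percentage. The class name `CardAnalysis`, the question ids `M1`..`M14`, and the partial-credit scale are illustrative assumptions, not part of the framework. SHOULD/CAN elements and quoted gaps are recorded but left out of the percentage.

```python
from dataclasses import dataclass, field

# Hypothetical scoring scale (assumption): 1.0 = fully disclosed,
# 0.5 = partially disclosed, 0.0 = not disclosed.
ALLOWED_SCORES = {0.0, 0.5, 1.0}
NUM_MUST_QUESTIONS = 14  # the template scores against 14 MUST questions


@dataclass
class CardAnalysis:
    model: str
    must_scores: dict[str, float]  # hypothetical question id -> score
    should_can_present: list[str] = field(default_factory=list)  # noted, not scored
    gaps: list[str] = field(default_factory=list)  # quoted gap notes

    def completeness_pct(self) -> float:
        """Mean MUST score as a percentage (assumed definition)."""
        if len(self.must_scores) != NUM_MUST_QUESTIONS:
            raise ValueError(
                f"expected {NUM_MUST_QUESTIONS} MUST scores, got {len(self.must_scores)}"
            )
        for qid, score in self.must_scores.items():
            if score not in ALLOWED_SCORES:
                raise ValueError(f"{qid}: score {score} not in {sorted(ALLOWED_SCORES)}")
        return 100.0 * sum(self.must_scores.values()) / NUM_MUST_QUESTIONS


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real card results.
    scores = {f"M{i}": 1.0 for i in range(1, 15)}
    scores["M3"] = 0.5
    scores["M7"] = 0.0
    analysis = CardAnalysis(model="example-model", must_scores=scores)
    print(f"{analysis.completeness_pct():.1f}%")  # 89.3%
```

Validating the score count up front keeps a card with a skipped question from silently inflating its percentage.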
Output
- data/card-analysis/{model}-analysis.md for each model
- Summary comparison table
- Narrative for LW/AF post
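A possible sketch of the output step, assuming completeness percentages have already been computed per model. The directory pattern data/card-analysis/{model}-analysis.md comes from this issue; the slug rule (lowercase, spaces to hyphens) and the helper names `analysis_path` and `summary_table` are hypothetical.

```python
from pathlib import Path

# Directory taken from the issue; the slug rule below is an assumption.
ANALYSIS_DIR = Path("data/card-analysis")


def analysis_path(model: str) -> Path:
    """Per-model file, e.g. 'Llama Guard 3' -> data/card-analysis/llama-guard-3-analysis.md."""
    slug = model.strip().lower().replace(" ", "-")
    return ANALYSIS_DIR / f"{slug}-analysis.md"


def summary_table(results: dict[str, float]) -> str:
    """Markdown comparison table, sorted by completeness (highest first)."""
    lines = ["| Model | MUST completeness |", "| --- | --- |"]
    for model, pct in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"| {model} | {pct:.1f}% |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(analysis_path("Llama Guard 3").as_posix())
    # Placeholder names and numbers, not real results.
    print(summary_table({"model-a": 50.0, "model-b": 80.0}))
```

Sorting by completeness puts the documentation gaps the post wants to highlight at the bottom of the table.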
Timeline
Target: complete by Feb 7, leaving time for the write-up before mid-February publication