Card Analyses: 6 Models for Publication #13

@PipFoweraker

Purpose

Analyze 6 real model cards against interrogatory requirements to:

  1. Demonstrate the framework works across model types
  2. Show documentation gaps that the framework exposes
  3. Provide concrete examples for LW/AF publication

Selected Models

Frontier (2)

  • Claude (3.5 Sonnet or 4) - Anthropic system card - expected to score well
  • GPT-4 (or 4o) - OpenAI system card - benchmark comparison

Research (2)

  • Mistral 7B or Mixtral - European lab, reportedly sparse documentation
  • Qwen 2.5 - Alibaba; follows Chinese lab documentation norms, a useful point of comparison

Unusual (2)

  • Whisper - Audio transcription, non-generative
  • Llama Guard 3 - Safety classifier, meta-level model

Analysis Template

For each model:

  1. Locate official card/documentation
  2. Score against 14 MUST questions
  3. Note SHOULD/CAN elements present
  4. Calculate disclosure completeness %
  5. Document specific gaps with quotes
  6. Write brief narrative assessment
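
Steps 2 to 4 could be sketched roughly as follows. This is only an illustration: the issue does not say how completeness is calculated, so the three-level scoring (answered / partial / missing, weighted 1 / 0.5 / 0) and the names `Answer` and `completeness_pct` are assumptions, not the framework's definition.

```python
from enum import Enum

MUST_QUESTION_COUNT = 14  # from the template: "Score against 14 MUST questions"


class Answer(Enum):
    """Hypothetical three-level score for a single MUST question."""
    MISSING = 0.0
    PARTIAL = 0.5
    ANSWERED = 1.0


def completeness_pct(scores):
    """Return disclosure completeness as a percentage of the MUST questions.

    Assumes every question carries equal weight, which the issue does not state.
    """
    if len(scores) != MUST_QUESTION_COUNT:
        raise ValueError(
            f"expected {MUST_QUESTION_COUNT} scores, got {len(scores)}"
        )
    return 100.0 * sum(s.value for s in scores) / MUST_QUESTION_COUNT


# Illustrative input only; these are not scores for any real model card.
example = [Answer.ANSWERED] * 7 + [Answer.PARTIAL] * 4 + [Answer.MISSING] * 3
print(f"{completeness_pct(example):.1f}%")  # 9 of 14 points -> 64.3%
```

If partial credit turns out to be unwanted, dropping `PARTIAL` reduces this to a simple answered-over-14 ratio.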

Output

  • data/card-analysis/{model}-analysis.md for each model
  • Summary comparison table
  • Narrative for LW/AF post
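
One possible way to produce the per-model file paths and the summary comparison table is sketched below. The slug rule, the table columns, and the function names `analysis_path` and `summary_table` are all assumptions made for illustration; only the `data/card-analysis/{model}-analysis.md` pattern comes from this issue.

```python
from pathlib import Path

MUST_QUESTION_COUNT = 14  # number of MUST questions named in the template


def analysis_path(model, root="data/card-analysis"):
    """Build the output path for one model (slug rule is an assumption)."""
    slug = model.lower().replace(" ", "-")
    return Path(root) / f"{slug}-analysis.md"


def summary_table(results):
    """Render (model, MUST questions answered) pairs as a Markdown table."""
    lines = [
        "| Model | MUST answered (of 14) | Completeness |",
        "|---|---|---|",
    ]
    for model, answered in results:
        pct = 100 * answered / MUST_QUESTION_COUNT
        lines.append(f"| {model} | {answered} | {pct:.0f}% |")
    return "\n".join(lines)


print(analysis_path("Llama Guard 3").as_posix())
# Placeholder rows only; no real card has been scored here.
print(summary_table([("Example A", 7), ("Example B", 14)]))
```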

Timeline

Target: complete by Feb 7, leaving time for the writeup before the mid-Feb publication.
