Data: Example Cards Collection & Analysis #8

@PipFoweraker

Goal

Collect and analyze existing model cards to:

  1. Understand current disclosure patterns
  2. Test interrogatory questions against real examples
  3. Create reference implementations

Data Collection

High-Profile Model Cards to Analyze

Frontier Labs

Open Models (HuggingFace)

  • Mistral series
  • Qwen series
  • DeepSeek series
  • Stability AI models

Specialized/Applied

  • Medical AI model cards (if public)
  • Autonomous vehicle perception models
  • Financial/risk models

Analysis Framework

For each card, document:

  1. Identity & Lineage

    • Is the model uniquely identified? How?
    • Is base model / training lineage disclosed?
  2. Intended Use

    • Are out-of-scope uses specified?
    • How concrete are the examples?
  3. Performance Claims

    • Are benchmarks versioned?
    • Are eval scripts linked?
    • Are run artifacts available?
  4. Limitations

    • Are failure modes documented?
    • Is there a "worse than baseline" example?
  5. Data Provenance

    • What level of detail is given on training data?
    • Are filters/preprocessing documented?
  6. Safety Testing

    • Which risk domains are covered?
    • Is methodology disclosed?
  7. Disclosure Completeness

    • What % of interrogatory questions could be answered from this card?
    • What's missing?
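
A minimal sketch of how the framework above could be captured as a structured record, so that every card is documented the same way. The `CardAnalysis` class, its field names, the section keys, and the question IDs (`Q1`, `Q2`, ...) are all assumptions for illustration and are not defined anywhere in this issue. Completeness is computed as the share of interrogatory questions flagged as answerable from the card.

```python
from dataclasses import dataclass, field

# Section keys mirror the seven headings of the analysis framework.
SECTIONS = (
    "identity_lineage",
    "intended_use",
    "performance_claims",
    "limitations",
    "data_provenance",
    "safety_testing",
    "disclosure_completeness",
)


@dataclass
class CardAnalysis:
    """Hypothetical per-card record; the layout is an assumption."""

    model_id: str
    source_url: str
    # section key -> {framework question -> analyst's finding}
    notes: dict = field(default_factory=dict)
    # interrogatory question id -> can it be answered from this card?
    answerable: dict = field(default_factory=dict)

    def add_note(self, section: str, question: str, finding: str) -> None:
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        self.notes.setdefault(section, {})[question] = finding

    def completeness_pct(self) -> float:
        """Percent of interrogatory questions answerable from the card."""
        if not self.answerable:
            return 0.0
        return 100.0 * sum(self.answerable.values()) / len(self.answerable)

    def missing(self) -> list:
        """Question ids the card does not answer, in sorted order."""
        return sorted(q for q, ok in self.answerable.items() if not ok)


# Usage with placeholder values (not a real card).
card = CardAnalysis("example-org/example-model", "https://example.org/card")
card.add_note("limitations", "Are failure modes documented?", "Yes, two listed")
card.answerable.update({"Q1": True, "Q2": False, "Q3": True, "Q4": True})
print(card.completeness_pct())  # 75.0
print(card.missing())           # ['Q2']
```

Keeping the free-text findings separate from the yes/no answerability flags means the percentage in item 7 can be recomputed whenever the question set changes.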

Deliverables

  1. data/card-analysis/ - structured analysis of each card
  2. data/card-corpus.json - machine-readable summary
  3. examples/ - reference interrogatory cards based on analysis
  4. Analysis report: "State of Model Card Disclosure (2026)"
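
As a rough illustration of the machine-readable summary, the sketch below collapses per-card analyses into a single JSON document. The schema (`schema_version`, `cards`, `completeness_pct`, and so on) and the input dict layout are assumptions; only the output path `data/card-corpus.json` comes from the deliverables list.

```python
import json
from pathlib import Path


def build_corpus(analyses):
    """Collapse per-card analyses into one summary document.

    Each analysis is assumed to be a dict with "model_id", "source_url",
    and "answerable" (question id -> bool). This schema is hypothetical.
    """
    cards = []
    for a in sorted(analyses, key=lambda a: a["model_id"]):
        flags = a["answerable"]
        total = len(flags)
        answered = sum(flags.values())
        cards.append({
            "model_id": a["model_id"],
            "source_url": a["source_url"],
            "questions_total": total,
            "questions_answered": answered,
            "completeness_pct": round(100 * answered / total, 1) if total else 0.0,
            "missing": sorted(q for q, ok in flags.items() if not ok),
        })
    return {"schema_version": 1, "card_count": len(cards), "cards": cards}


def write_corpus(analyses, path="data/card-corpus.json"):
    """Write the summary as indented JSON, creating the directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(build_corpus(analyses), indent=2, ensure_ascii=False)
    out.write_text(text + "\n", encoding="utf-8")
    return out
```

Sorting cards and missing-question lists keeps the file stable between runs, which makes diffs in version control easier to review.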

Ethical Considerations

  • Only analyze publicly available cards
  • Don't scrape; use official sources
  • Credit original authors appropriately
  • Analysis is for research/improvement, not naming-and-shaming

Tooling Needs

  • Structured analysis template
  • Possibly: retrieval tooling for HF model cards via official sources (respectful rate limits)
  • Comparison visualization
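
For the "respectful rate limits" point, a minimal sketch of a throttle that spaces out requests is shown below. `Throttle`, `fetch_all`, and the `fetch_one` callback are hypothetical names; `fetch_one` stands in for whatever official client call is chosen (for example, something along the lines of `huggingface_hub`'s `ModelCard.load`, though that choice is an assumption and not decided in this issue).

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(model_ids, fetch_one, throttle):
    """Fetch each card in turn, pausing between requests."""
    results = {}
    for model_id in model_ids:
        throttle.wait()
        results[model_id] = fetch_one(model_id)
    return results
```

A fixed interval is the simplest policy; if the chosen source publishes its own rate-limit guidance, that should take precedence over any value picked here.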

Related Issues

  • Design: Sharp Yes/No Questions Specification (test questions against real cards)
  • Publication: LessWrong/AF Post Preparation (analysis feeds into publication)
