Include mapped variant data and README in public data dump

The public data dump script (src/mavedb/scripts/export_public_data.py) currently exports metadata (main.json), score/count CSVs, and a license file. It does not include mapped variant data (VRS alleles, mapped HGVS, etc.), even though this data is available via GET /api/v1/score-sets/{urn}/mapped-variants.

We should include mapped variant JSON in the data dump so that downstream consumers have access to post-mapped VRS representations without needing to call the live API.

Proposed Changes
1. Add mapped variant data to the dump
For each published score set that has completed mapping, export its mapped variant data (the same payload returned by GET /api/v1/score-sets/{urn}/mapped-variants) as a JSON file in the archive, e.g.:

mapped/tmp:00000001-a-1.mapped-variants.json

Each file should contain the current mapped variants for that score set, including pre_mapped and post_mapped VRS allele JSON, HGVS columns, and VRS version metadata.
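
A minimal sketch of the export step, written as standalone Python rather than against MaveDB's models. It assumes score sets have already been loaded into plain dicts. The keys `published`, `mapping_state`, `mapped_variants`, the `"complete"` state value, and the per-variant `current` flag are hypothetical stand-ins for whatever the real query in export_public_data.py returns. Only the `mapped/{urn}.mapped-variants.json` layout comes from this issue.

```python
import io
import json
import zipfile
from typing import Any, Iterable, Mapping


def mapped_variants_path(urn: str) -> str:
    """Archive path for one score set, following the example layout in this issue."""
    return f"mapped/{urn}.mapped-variants.json"


def add_mapped_variants(
    archive: zipfile.ZipFile, score_sets: Iterable[Mapping[str, Any]]
) -> list[str]:
    """Write one JSON file per published score set whose mapping has completed.

    Each score set is a plain dict with hypothetical keys: "urn", "published",
    "mapping_state", and "mapped_variants" (a list of mapped variant dicts).
    """
    written: list[str] = []
    for score_set in score_sets:
        # Skip unpublished score sets and those that have not finished mapping.
        # "complete" is an assumed state name, not necessarily MaveDB's real value.
        if not score_set.get("published") or score_set.get("mapping_state") != "complete":
            continue
        # Keep only current mapped variants; "current" is a hypothetical flag.
        current = [mv for mv in score_set["mapped_variants"] if mv.get("current", True)]
        path = mapped_variants_path(score_set["urn"])
        archive.writestr(path, json.dumps(current, indent=2))
        written.append(path)
    return written


def build_archive(score_sets: Iterable[Mapping[str, Any]]) -> bytes:
    """Build an in-memory zip containing only the mapped variant files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        add_mapped_variants(archive, score_sets)
    return buffer.getvalue()
```

If the dump is written with zipfile, add_mapped_variants would more likely be called on the same ZipFile that already receives main.json and the CSVs; build_archive exists only so the sketch runs on its own.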

2. Add a README to the archive
Add a README.md (or README.txt) to the root of the dump archive that documents:

- What is included in the dump (metadata JSON, score CSVs, count CSVs, mapped variant JSON, license)
- The structure/layout of the archive directory
- A brief description of each file type and its format
- Any caveats (e.g. only CC0-licensed published data is included, only current mapped variants are exported)
- A link back to MaveDB and the API documentation for further reference
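
One way to keep the README in sync with the dump is to render it from a small description of the archive in the export script itself. This is a sketch only: the names `csv/` and `LICENSE.txt` in ARCHIVE_LAYOUT are placeholders (only main.json and mapped/ appear in this issue), and the MaveDB and API documentation URLs are passed in rather than hard-coded because this sketch does not assume their canonical addresses.

```python
from typing import Mapping, Sequence

# Hypothetical top-level layout; the real file and directory names may differ.
ARCHIVE_LAYOUT: dict[str, str] = {
    "main.json": "JSON metadata for the published data in this dump.",
    "csv/": "Score and count CSV files, one of each per score set.",
    "mapped/": "Mapped variant JSON per score set: pre- and post-mapped VRS alleles, "
    "mapped HGVS, and VRS version metadata.",
    "LICENSE.txt": "License text for the exported data.",
}

# Caveats taken from this issue.
CAVEATS: list[str] = [
    "Only published, CC0-licensed data is included.",
    "Only current mapped variants are exported.",
]


def render_readme(
    layout: Mapping[str, str],
    caveats: Sequence[str],
    mavedb_url: str,
    api_docs_url: str,
) -> str:
    """Render a Markdown README describing the archive contents and caveats."""
    lines = ["# MaveDB public data dump", "", "## Contents and layout", ""]
    # One bullet per top-level file or directory, with a short format description.
    lines += [f"- `{name}`: {description}" for name, description in layout.items()]
    lines += ["", "## Caveats", ""]
    lines += [f"- {caveat}" for caveat in caveats]
    lines += [
        "",
        "## Further reference",
        "",
        f"- MaveDB: {mavedb_url}",
        f"- API documentation: {api_docs_url}",
        "",
    ]
    return "\n".join(lines)
```

With zipfile, adding it to the archive would then be a single archive.writestr("README.md", render_readme(...)) call.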

Include mapped variant data and README in public data dump #664
