ELIXIR BioHackathon 2025
Important: Let's make public proteomics data FAIR and re-usable at repository scale.
Mass spectrometry proteomics has generated an enormous global resource: >31,000 datasets in PRIDE alone, representing millions of biological samples. Yet, much of this treasure trove remains underused because quality information is inconsistent or missing.
This project tackles that head-on. Together, we'll build an automated, standardized quality control (QC) framework that can operate directly on public data repositories — producing machine-readable QC summaries (mzQC) linked with rich experimental metadata (SDRF-Proteomics).
By the end of the week, our prototype will:
- Generate mzQC outputs directly from pMultiQC.
- Define a core QC metric ontology adopted across tools.
- Leverage metadata to inform and contextualize QC analyses.
- Pave the way for FAIR, ML-ready proteomics data reuse.
- Provide an ID-free QC module for raw-data assessment.
If you're excited about open science, reproducible bioinformatics, and hands-on development with real impact on the global proteomics community — join us!
Main objective: Build an end-to-end framework to enrich public proteomics datasets with standardized quality control (QC) information.
Key components:
- mzQC: HUPO-PSI JSON format for standardized QC reporting.
- pMultiQC: modular, multi-workflow QC tool for proteomics pipelines.
- SDRF-Proteomics: standardized experimental metadata schema.
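To make the mzQC component more concrete, here is a minimal sketch of how an mzQC-style document could be assembled in Python. The top-level layout (mzQC, runQualities, metadata, qualityMetrics, controlledVocabularies) follows the mzQC 1.0 structure as we understand it. The helper name build_mzqc, the file location, and the metric value are hypothetical. The CV accessions and the vocabulary URI should be checked against the PSI-MS vocabulary before reuse. The sketch also leaves out fields such as analysisSoftware that a complete file would likely need.

```python
import json
from datetime import datetime, timezone


def build_mzqc(run_name, metrics):
    """Assemble a minimal mzQC-style dictionary for a single run.

    `metrics` is a list of (accession, name, value) tuples.
    This is an illustrative helper, not part of pMultiQC or any mzQC library.
    """
    quality_metrics = [
        {"accession": accession, "name": name, "value": value}
        for accession, name, value in metrics
    ]
    return {
        "mzQC": {
            "version": "1.0.0",
            "creationDate": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "runQualities": [
                {
                    "metadata": {
                        "inputFiles": [
                            {
                                "name": run_name,
                                # Hypothetical location, for illustration only.
                                "location": f"file:///data/{run_name}.mzML",
                                "fileFormat": {
                                    "accession": "MS:1000584",
                                    "name": "mzML format",
                                },
                            }
                        ],
                    },
                    "qualityMetrics": quality_metrics,
                }
            ],
            "controlledVocabularies": [
                {
                    "name": "PSI-MS controlled vocabulary",
                    # Placeholder: point this at the CV release actually used.
                    "uri": "https://github.com/HUPO-PSI/psi-ms-CV",
                }
            ],
        }
    }


# Accession assumed to denote an MS1 spectrum count; verify against the CV.
doc = build_mzqc("run_01", [("MS:4000059", "number of MS1 spectra", 1234)])
print(json.dumps(doc, indent=2))
```

Keeping the document a plain dictionary until the final json.dumps call makes it easy to add further runs or metrics before serialization.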
Expected outcomes by the end of the hackathon:
- pMultiQC extended to export results in mzQC format.
- A refined and tiered QC metric ontology.
- Broader workflow coverage via new adapters.
- Enhanced SDRF integration for metadata-driven QC.
- Prototype ID-free QC modules for raw-data assessment.
- Documentation and examples for repository integration.
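As one illustration of what metadata-driven QC could mean in practice, the sketch below groups a per-file QC value by an SDRF column so that runs are compared within the same instrument rather than across the whole dataset. The SDRF content, the metric values, and the function name group_metric_by_column are made up for this example. Real SDRF files carry many more columns and richer cell values, and may repeat column names, which csv.DictReader does not handle.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Toy SDRF content (tab-separated); values are simplified for illustration.
SDRF_TEXT = "\n".join([
    "source name\tcharacteristics[organism]\tcomment[instrument]\tcomment[data file]",
    "sample 1\tHomo sapiens\tQ Exactive\trun_01.raw",
    "sample 2\tHomo sapiens\tQ Exactive\trun_02.raw",
    "sample 3\tHomo sapiens\tOrbitrap Fusion\trun_03.raw",
])

# Hypothetical per-file QC values, e.g. a spectrum count per raw file.
MS2_COUNTS = {"run_01.raw": 100, "run_02.raw": 200, "run_03.raw": 50}


def group_metric_by_column(sdrf_text, metric_by_file, column):
    """Collect metric values per distinct value of one SDRF column."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        data_file = row["comment[data file]"]
        # Skip SDRF rows for which no QC value is available.
        if data_file in metric_by_file:
            groups[row[column]].append(metric_by_file[data_file])
    return dict(groups)


groups = group_metric_by_column(SDRF_TEXT, MS2_COUNTS, "comment[instrument]")
medians = {key: median(values) for key, values in groups.items()}
print(medians)
```

The same grouping works for any other SDRF column, such as organism or a factor value, by changing the column argument.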
- Tasks Overview: Summary of all hackathon tasks and roles
- Task 1 — mzQC Export in pMultiQC: Implement mzQC output generation
- Task 2 — Tiered QC Metrics: Curate and define core/extended metrics
- Task 3 — Workflow Adapters: Add support for new tools
- Task 4 — SDRF Integration: Link sample metadata to QC analyses
- Task 5 — ID-Free QC: Develop raw-level QC modules
- Optional Extensions: Dashboards, benchmark datasets, ML exploration
- Reference Material: Links to mzQC, pMultiQC, and SDRF docs
- Example Outputs: Example .mzQC files + validation tips
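As a starting point for the validation tips, a quick structural check can catch obviously malformed files before full validation. The sketch below is an assumption-laden shortcut, not a replacement for validating against the official mzQC JSON schema: the function name check_mzqc_structure is hypothetical, and the set of keys it inspects reflects our reading of the mzQC 1.0 layout.

```python
def check_mzqc_structure(doc):
    """Return a list of problems; an empty list means the basic shape looks fine.

    This only inspects a handful of keys and is NOT full schema validation.
    """
    body = doc.get("mzQC") if isinstance(doc, dict) else None
    if not isinstance(body, dict):
        return ["missing top-level 'mzQC' object"]

    problems = []
    for key in ("version", "creationDate", "controlledVocabularies"):
        if key not in body:
            problems.append(f"missing 'mzQC.{key}'")

    runs = body.get("runQualities", [])
    sets = body.get("setQualities", [])
    if not runs and not sets:
        problems.append("need at least one of 'runQualities' or 'setQualities'")

    # Every metric should at least carry a CV accession and a name.
    for kind, entries in (("runQualities", runs), ("setQualities", sets)):
        for i, entry in enumerate(entries):
            for j, metric in enumerate(entry.get("qualityMetrics", [])):
                for field in ("accession", "name"):
                    if field not in metric:
                        problems.append(
                            f"{kind}[{i}].qualityMetrics[{j}] lacks '{field}'"
                        )
    return problems


# Deliberately incomplete example document to show the reported problems.
incomplete = {
    "mzQC": {
        "version": "1.0.0",
        "runQualities": [{"qualityMetrics": [{"accession": "MS:4000059"}]}],
    }
}
for problem in check_mzqc_structure(incomplete):
    print(problem)
```

Returning a list of messages instead of raising on the first error lets a contributor fix all reported issues in one pass.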
We will follow the official BioHackathon Europe daily programme: https://biohackathon-europe.org/programme/
Stand-up: every day at 09:00. We'll use Slack for quick updates and alignment; checkpoints and demos follow the event's programme.
- BioHackEU Slack channel: #29-towards-repository-scale-quality-control.
- Daily stand-up: 09:00.
- Ideas or questions? Use Discussions.
- Bugs / progress updates? Open Issues.
- Code changes: via Pull Requests.
We welcome all contributions during the hackathon! To keep collaboration efficient and transparent:
- Open an Issue first — describe your planned feature or task, tag everyone involved, and be detailed so others can follow or join. → This avoids duplicate efforts and keeps everyone aligned.
- Create a branch named feature/<short-description> for your work.
- Commit and reference your issue, e.g. Fixes #12.
- Open a Pull Request (PR) when ready, with a short summary and test results.
- Discuss & merge during the daily sync sessions.
See CONTRIBUTING.md for coding style, testing, and detailed workflow instructions.