
MS-Quality-Hub/biohackathon2025


🧬 Project 29: Repository-Scale Quality Control in Proteomics

ELIXIR BioHackathon 2025

Important

Let's make public proteomics data FAIR and reusable, at repository scale.

Mass spectrometry proteomics has generated an enormous global resource: >31,000 datasets in PRIDE alone, representing millions of biological samples. Yet, much of this treasure trove remains underused because quality information is inconsistent or missing.

This project tackles that head-on. Together, we'll build an automated, standardized quality control (QC) framework that can operate directly on public data repositories, producing machine-readable QC summaries (mzQC) linked to rich experimental metadata (SDRF-Proteomics).

By the end of the week, our prototype will:

  • Generate mzQC outputs directly from pMultiQC.
  • Define a core QC metric ontology that can be adopted across tools.
  • Leverage metadata to inform and contextualize QC analyses.
  • Pave the way for FAIR, ML-ready proteomics data reuse.
  • Provide an ID-free QC module for raw-data assessment.
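To make the mzQC target concrete, here is a minimal sketch of what an exporter could emit for a single run. It is modeled on the general shape of an mzQC 1.0 document (`runQualities`, `metadata`, `inputFiles`, `analysisSoftware`, `qualityMetrics`, `controlledVocabularies`) but is not pMultiQC's exporter and has not been validated against the official schema. The function name `build_mzqc`, the file path, the CV URI, the software entry, and the metric values are all hypothetical; the accessions MS:4000059 and MS:4000060 are assumed to be the PSI-MS terms for MS1/MS2 spectrum counts and should be checked against the current CV.

```python
import json
from datetime import datetime, timezone


def build_mzqc(run_name, file_uri, metrics, software):
    """Assemble a minimal mzQC-style document describing a single run.

    metrics:  list of (accession, name, value) tuples, one per QC metric.
    software: dict describing the tool that produced the metrics.
    """
    return {
        "mzQC": {
            "version": "1.0.0",
            "creationDate": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "runQualities": [
                {
                    "metadata": {
                        "inputFiles": [
                            {
                                "location": file_uri,
                                "name": run_name,
                                # PSI-MS CV term for the mzML file format
                                "fileFormat": {"accession": "MS:1000584", "name": "mzML format"},
                            }
                        ],
                        "analysisSoftware": [software],
                    },
                    # One CV-annotated entry per metric
                    "qualityMetrics": [
                        {"accession": acc, "name": name, "value": value}
                        for acc, name, value in metrics
                    ],
                }
            ],
            "controlledVocabularies": [
                {
                    "name": "Proteomics Standards Initiative Mass Spectrometry Ontology",
                    # Assumed location of the PSI-MS CV; pin the release actually used.
                    "uri": "https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo",
                    "version": "unspecified",  # hypothetical placeholder
                }
            ],
        }
    }


# Illustrative input values only; accessions assumed, verify against the PSI-MS CV.
doc = build_mzqc(
    run_name="run01",
    file_uri="file:///data/run01.mzML",  # hypothetical path
    metrics=[
        ("MS:4000059", "number of MS1 spectra", 13405),
        ("MS:4000060", "number of MS2 spectra", 52731),
    ],
    # Hypothetical entry: a real mzQC file expects a PSI-MS accession for the software here.
    software={"name": "pMultiQC", "version": "dev"},
)
print(json.dumps(doc, indent=2))
```

Keeping the metric list as plain `(accession, name, value)` tuples means any workflow adapter only has to produce CV-annotated numbers; the JSON envelope is shared.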

If you're excited about open science, reproducible bioinformatics, and hands-on development with real impact on the global proteomics community — join us!

BioHackathon 2025 · Contributions welcome · License: MIT


Project Goals

Main objective: Build an end-to-end framework to enrich public proteomics datasets with standardized quality control (QC) information.

Key components:

  • mzQC: HUPO-PSI JSON format for standardized QC reporting.
  • pMultiQC: modular, multi-workflow QC tool for proteomics pipelines.
  • SDRF-Proteomics: standardized experimental metadata schema.
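SDRF-Proteomics files are tab-separated tables with one row per sample/data-file pairing and columns such as `source name`, `characteristics[...]`, `assay name`, `comment[data file]`, and `factor value[...]`. A minimal sketch of how that metadata could feed QC: group raw files by an experimental factor so that runs are compared with their peers rather than with the whole dataset. The SDRF excerpt and the helper `runs_by_factor` are made up for illustration and do not come from any project tool; the parser assumes a well-formed table.

```python
import csv
import io
from collections import defaultdict

# Tiny, made-up SDRF-Proteomics excerpt (tab-separated).
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\tfactor value[disease]\n"
    "sample 1\tHomo sapiens\trun 1\trun01.raw\tnormal\n"
    "sample 2\tHomo sapiens\trun 2\trun02.raw\tnormal\n"
    "sample 3\tHomo sapiens\trun 3\trun03.raw\tcarcinoma\n"
)


def runs_by_factor(sdrf_text, factor_column):
    """Map each value of an SDRF column to the data files measured under it."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    for row in reader:
        # Normalise header case and whitespace defensively (assumes no ragged rows).
        row = {key.strip().lower(): value.strip() for key, value in row.items()}
        groups[row[factor_column]].append(row["comment[data file]"])
    return dict(groups)


groups = runs_by_factor(SDRF_TEXT, "factor value[disease]")
print(groups)  # {'normal': ['run01.raw', 'run02.raw'], 'carcinoma': ['run03.raw']}
```

The resulting groups can then be used to compute QC baselines per condition, so that a genuine biological difference is not mistaken for a technical failure.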

Expected outcomes by the end of the hackathon:

  • pMultiQC extended to export results in mzQC format.
  • A refined and tiered QC metric ontology.
  • Broader workflow coverage via new adapters.
  • Enhanced SDRF integration for metadata-driven QC.
  • Prototype ID-free QC modules for raw-data assessment.
  • Documentation and examples for repository integration.
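As an illustration of what an ID-free module might compute, the sketch below derives a few metrics that need no peptide identifications, only spectrum-level summaries, and then flags runs whose median MS1 total ion current (TIC) deviates strongly from the others using a modified z-score (median/MAD, threshold 3.5). The input format `(ms_level, retention_time_s, total_ion_current)`, the metric names, and the choice of outlier test are assumptions for this example, not the project's design; in practice the tuples would be extracted from mzML by a spectrum reader of your choice, and the comparison would run within the SDRF-defined groups.

```python
from statistics import median


def id_free_metrics(spectra):
    """Compute simple identification-free QC metrics for one run.

    spectra: iterable of (ms_level, retention_time_s, total_ion_current) tuples.
    """
    spectra = list(spectra)  # allow several passes over the input
    ms1_tic = [tic for level, _, tic in spectra if level == 1]
    n_ms2 = sum(1 for level, _, _ in spectra if level == 2)
    rts = [rt for _, rt, _ in spectra]
    return {
        "n_ms1": len(ms1_tic),
        "n_ms2": n_ms2,
        "ms2_per_ms1": n_ms2 / len(ms1_tic) if ms1_tic else float("nan"),
        "rt_span_s": max(rts) - min(rts) if rts else 0.0,
        "median_ms1_tic": median(ms1_tic) if ms1_tic else float("nan"),
    }


def robust_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (0.6745 * (x - median) / MAD) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # no spread: nothing can be flagged
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Made-up runs: two MS1 and two MS2 scans each; run05 has a much lower MS1 signal.
example_tics = {"run01": 2.0e9, "run02": 2.1e9, "run03": 1.9e9, "run04": 2.0e9, "run05": 0.2e9}
runs = {
    name: [(1, 60.0, tic), (2, 61.0, 1.0e7), (2, 62.0, 1.0e7), (1, 63.0, tic)]
    for name, tic in example_tics.items()
}
metrics = {name: id_free_metrics(spectra) for name, spectra in runs.items()}
flags = robust_outliers([m["median_ms1_tic"] for m in metrics.values()])
print([name for name, flag in zip(metrics, flags) if flag])  # ['run05']
```

A median/MAD score is used here instead of mean/standard deviation because a single failed run would otherwise inflate the spread and partly mask itself.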

Documentation & Resources


Schedule

We will follow the official BioHackathon Europe daily programme: https://biohackathon-europe.org/programme/

Stand-up: every day at 09:00. We'll use Slack for quick updates and coordination; checkpoints and demos follow the event programme.

Collaboration & Contribution

  • BioHackEU Slack channel: #29-towards-repository-scale-quality-control.
  • Daily stand-up: 09:00.
  • Ideas or questions? Use Discussions.
  • Bugs / progress updates? Open Issues.
  • Code changes: via Pull Requests.

Contribution Workflow

We welcome all contributions during the hackathon! To keep collaboration efficient and transparent:

  1. Open an Issue first: describe your planned feature or task in enough detail that others can follow or join, and tag everyone involved. This avoids duplicated effort and keeps everyone aligned.
  2. Create a branch named feature/<short-description> for your work.
  3. Reference your issue in your commit messages, e.g. Fixes #12.
  4. Open a Pull Request (PR) when ready, with a short summary and test results.
  5. Discuss & merge during the daily sync sessions.

See CONTRIBUTING.md for coding style, testing, and detailed workflow instructions.

About

ELIXIR BioHackathon 2025 Project 29: Towards Repository-Scale Quality Control in Proteomics
