Skip to content

FASTQC: multiqc_files topic#9964

Open
maxulysse wants to merge 18 commits intonf-core:masterfrom
maxulysse:fastqc
Open

FASTQC: multiqc_files topic#9964
maxulysse wants to merge 18 commits intonf-core:masterfrom
maxulysse:fastqc

Conversation

@maxulysse
Copy link
Member

@maxulysse maxulysse commented Feb 10, 2026

cf:

current implementation in nf-core/rnavar#285

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Copy link
Contributor

@LouisLeNezet LouisLeNezet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small comment 😉

{ assert snapshot(
file(process.out.html[0][3]).name,
file(process.out.zip[0][3]).name,
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i prefer the shorter version tbh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, but the snapshot is much more concise with this.
I am hoping to put this in nft-utils to make clearer assertion such as:

process.out.html.getAllFilenames()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see much difference in the snapshot. what do you mean exactly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now only have the file, not the meta map, the process and the tool name (which is the structure we choose for the multiqc_files topic).
and much more simpler to use since I don't have to figure which index it's located at

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tuple val(meta) , path("*.html") , emit: html
tuple val(meta) , path("*.zip") , emit: zip
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.html"), topic: multiqc_files, emit: html
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.zip"), topic: multiqc_files, emit: zip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.zip"), topic: multiqc_files, emit: zip
tuple val(meta), path("*.zip"), topic: multiqc_files, emit: zip

Why can't we just do this? Why do we need the extra pieces like the versions topic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I need the val('fastqc') to know which tool this topic is coming from in order to publish it or not, and to choose where, and we agreed on slack when I was doing this for ensemblvep and snpeff that maybe having the task.process could be useful too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think maybe we need to separate:

  1. Sending things to multiqc
  2. Publishing

I think the basis for making publishing decisions from topics is wobbly right now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it works well:

https://github.com/maxulysse/nf-core_rnavar/blob/topics_again_again/main.nf#L202-L216

for me, if I can put these reports into a topics for multiQC, then it's much more simpler if I use the same for publishing while I'm at it.
And we can filter out if we don't want to publish reports coming from tools/process...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I'll be controlling publishing separately, I don't think we should be conflating these things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But could I as a lazy bioinformatician just conflate all these things?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinin4fjords what is wobbly at the moment with topics and publishing?Looking at the rnavar PR it does look very clean, so I am inclined to use it, but not if my files don't end up where they need to be

Copy link
Member

@pinin4fjords pinin4fjords Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaking topic-based publishing in via the back door of the MultiQC topic is the wrong precedent to set, IMHO.

I actually think topics would be a great mechanism for publishing. The problem is that modules shouldn't know how their calling workflows will publish their outputs, so topic-based publishing really needs to be initiated from the workflow side. That's why I asked @bentsherman for a channel-to-topic operator (nextflow-io/nextflow#6756), though I'm not sure it's going to happen.

MultiQC is a bit of an exception here, because we do know how calling workflows will use those files - they'll go to MultiQC. But overloading that topic with publishing metadata unrelated to the MultiQC run itself isn't the right approach to my mind.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when you move to typed processes it could become cleaner:

    output:
    record(meta: meta, html: files("*.html"), zip: files('*.zip'))

    topic:
    tuple(meta, task.process, 'fastqc', files("*.html")) >> 'multiqc_files'
    tuple(meta, task.process, 'fastqc', files("*.zip")) >> 'multiqc_files'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bentsherman I now know what I need to do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants