20 changes: 10 additions & 10 deletions modules/nf-core/fastqc/main.nf
@@ -3,28 +3,28 @@ process FASTQC {
label 'process_low'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0' :
'biocontainers/fastqc:0.12.1--hdfd78af_0' }"
container "${workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container
? 'https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0'
: 'biocontainers/fastqc:0.12.1--hdfd78af_0'}"

input:
tuple val(meta), path(reads)

output:
tuple val(meta) , path("*.html") , emit: html
tuple val(meta) , path("*.zip") , emit: zip
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.html"), topic: multiqc_files, emit: html
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.zip"), topic: multiqc_files, emit: zip
Member:

Suggested change
tuple val(meta), val("${task.process}"), val('fastqc'), path("*.zip"), topic: multiqc_files, emit: zip
tuple val(meta), path("*.zip"), topic: multiqc_files, emit: zip

Why can't we just do this? Why do we need the extra pieces like the versions topic?

Member Author:

So I need the val('fastqc') to know which tool this topic is coming from, in order to decide whether to publish it, and where. And we agreed on Slack, when I was doing this for ensemblvep and snpeff, that having the task.process could be useful too.

Member:

Yeah, I think maybe we need to separate:

  1. Sending things to multiqc
  2. Publishing

I think the basis for making publishing decisions from topics is wobbly right now.

Member Author:

I think it works well:

https://github.com/maxulysse/nf-core_rnavar/blob/topics_again_again/main.nf#L202-L216

For me, if I can put these reports into a topic for MultiQC, then it's much simpler to use the same topic for publishing while I'm at it.
And we can filter out reports coming from specific tools/processes if we don't want to publish them...
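For illustration, that workflow-side filtering could look roughly like the sketch below. `channel.topic` is a real Nextflow construct, but the `exclude_tools` list and the downstream publishing step are hypothetical, and the tuple shape assumes the `(meta, task.process, tool, file)` structure used in this PR:

```nextflow
workflow {
    // Hypothetical list of tools whose reports we do not want to publish
    def exclude_tools = ['fastqc']

    // Every module emits (meta, task.process, tool, file) to the same topic
    ch_multiqc_files = channel.topic('multiqc_files')

    // Reuse the topic for publishing, filtering out unwanted tools
    ch_to_publish = ch_multiqc_files
        .filter { meta, process, tool, report -> !(tool in exclude_tools) }
        .map { meta, process, tool, report -> report }
}
```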

Member:

Yeah, I think I'll be controlling publishing separately; I don't think we should be conflating these things.

Member Author:

But could I, as a lazy bioinformatician, just conflate all these things?

Contributor:

@pinin4fjords what is wobbly at the moment with topics and publishing? Looking at the rnavar PR it does look very clean, so I am inclined to use it, but not if my files don't end up where they need to be.

Member (@pinin4fjords), Feb 12, 2026:

Sneaking topic-based publishing in via the back door of the MultiQC topic is the wrong precedent to set, IMHO.

I actually think topics would be a great mechanism for publishing. The problem is that modules shouldn't know how their calling workflows will publish their outputs, so topic-based publishing really needs to be initiated from the workflow side. That's why I asked @bentsherman for a channel-to-topic operator (nextflow-io/nextflow#6756), though I'm not sure it's going to happen.

MultiQC is a bit of an exception here, because we do know how calling workflows will use those files - they'll go to MultiQC. But overloading that topic with publishing metadata unrelated to the MultiQC run itself isn't the right approach to my mind.
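For context, the workflow-initiated pattern being argued for might look like the sketch below, assuming the channel-to-topic operator requested in nextflow-io/nextflow#6756 existed. The `toTopic` name is hypothetical; no such operator ships with Nextflow today:

```nextflow
workflow QC {
    take:
    reads

    main:
    FASTQC(reads)

    // Hypothetical: the calling workflow, not the module, decides which
    // outputs are routed to a publishing topic
    FASTQC.out.html.toTopic('to_publish')
    FASTQC.out.zip.toTopic('to_publish')
}
```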

@bentsherman:

Maybe when you move to typed processes it could become cleaner:

    output:
    record(meta: meta, html: files("*.html"), zip: files('*.zip'))

    topic:
    tuple(meta, task.process, 'fastqc', files("*.html")) >> 'multiqc_files'
    tuple(meta, task.process, 'fastqc', files("*.zip")) >> 'multiqc_files'

Member Author:

Thanks @bentsherman, I now know what I need to do:

tuple val("${task.process}"), val('fastqc'), eval('fastqc --version | sed "/FastQC v/!d; s/.*v//"'), emit: versions_fastqc, topic: versions
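For context, a `versions` topic with that `(task.process, tool, version)` shape is typically aggregated on the workflow side, along these lines (a sketch; the file name and exact YAML layout are up to the pipeline):

```nextflow
workflow {
    channel.topic('versions')
        .unique()
        .map { process, tool, version ->
            // Keep only the process name, dropping the workflow prefix
            "${process.tokenize(':').last()}:\n  ${tool}: ${version}"
        }
        .collectFile(name: 'software_versions.yml', sort: true, newLine: true)
}
```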

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
// Make list of old name and new name pairs to use for renaming in the bash while loop
def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] }
def rename_to = old_new_pairs*.join(' ').join(' ')
def renamed_files = old_new_pairs.collect{ _old_name, new_name -> new_name }.join(' ')
def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[reads, "${prefix}.${reads.extension}"]] : reads.withIndex().collect { entry, index -> [entry, "${prefix}_${index + 1}.${entry.extension}"] }
def rename_to = old_new_pairs*.join(' ').join(' ')
def renamed_files = old_new_pairs.collect { _old_name, new_name -> new_name }.join(' ')
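To make the renaming logic concrete, here is what `rename_to` and `renamed_files` evaluate to for a hypothetical paired-end sample with prefix `test`. The file names and extensions are plain-Groovy stand-ins for the staged `Path` objects (the real code derives the extension via `entry.extension`):

```groovy
def prefix        = 'test'
def old_new_pairs = [
    ['sample_R1.fastq.gz', "${prefix}_1.fastq.gz"],
    ['sample_R2.fastq.gz', "${prefix}_2.fastq.gz"],
]
// Space-separated "old new" pairs consumed by the bash while loop
def rename_to     = old_new_pairs*.join(' ').join(' ')
// Only the new names, passed on the fastqc command line
def renamed_files = old_new_pairs.collect { _old_name, new_name -> new_name }.join(' ')

assert rename_to     == 'sample_R1.fastq.gz test_1.fastq.gz sample_R2.fastq.gz test_2.fastq.gz'
assert renamed_files == 'test_1.fastq.gz test_2.fastq.gz'
```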

// The total amount of RAM allocated by FastQC is equal to the number of threads defined (--threads) times the amount of RAM defined (--memory)
// https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222
45 changes: 43 additions & 2 deletions modules/nf-core/fastqc/meta.yml
@@ -16,7 +16,8 @@ tools:
overrepresented sequences.
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
licence: ["GPL-2.0-only"]
licence:
- "GPL-2.0-only"
identifier: biotools:fastqc
input:
- - meta:
@@ -37,6 +38,12 @@ output:
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- ${task.process}:
type: string
description: The process the files were collected from
- fastqc:
type: string
description: The tool name
- "*.html":
type: file
description: FastQC report
@@ -48,6 +55,12 @@
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- ${task.process}:
type: string
description: The process the files were collected from
- fastqc:
type: string
description: The tool name
- "*.zip":
type: file
description: FastQC report archive
@@ -63,7 +76,6 @@ output:
- fastqc --version | sed "/FastQC v/!d; s/.*v//":
type: eval
description: The expression to obtain the version of the tool

topics:
versions:
- - ${task.process}:
@@ -75,6 +87,35 @@ topics:
- fastqc --version | sed "/FastQC v/!d; s/.*v//":
type: eval
description: The expression to obtain the version of the tool
multiqc_files:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- ${task.process}:
type: string
description: The process the files were collected from
- fastqc:
type: string
description: The tool name
- "*.html":
type: file
description: FastQC report
pattern: "*_{fastqc.html}"
ontologies: []
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- ${task.process}:
type: string
description: The process the files were collected from
- fastqc:
type: string
description: The tool name
- "*.zip":
type: file
description: FastQC report archive
pattern: "*_{fastqc.zip}"
ontologies: []
authors:
- "@drpatelh"
- "@grst"
112 changes: 58 additions & 54 deletions modules/nf-core/fastqc/tests/main.nf.test
@@ -22,15 +22,17 @@ nextflow_process {
}

then {
assert process.success
assertAll (
{ assert process.success },
// NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it.
// looks like this: <div id="header_filename">Mon 2 Oct 2023<br/>test.gz</div>
// https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039
{ assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
{ assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
{ assert path(process.out.html[0][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
Contributor:

I prefer the shorter version, tbh.

Member Author:

I would agree, but the snapshot is much more concise with this.
I am hoping to put this in nft-utils to make clearer assertions such as:

process.out.html.getAllFilenames()

Contributor:

I don't see much difference in the snapshot. What do you mean exactly?

Member Author:

We now only have the file name, not the meta map, the process and the tool name (which is the structure we chose for the multiqc_files topic). And it's much simpler to use, since I don't have to figure out which index the file is located at.
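A sketch of what such an nft-utils helper could look like (hypothetical name and signature, mirroring the inline expression used in these tests):

```groovy
// Hypothetical helper: keep only file paths from a channel output,
// dropping meta maps and bare strings (process name, tool name),
// and return just the file names for stable snapshots
def getAllFilenames(channelOut) {
    channelOut.collect().flatten()
        .findAll { !(it instanceof Map) && it.toString().startsWith('/') }
        .collect { it.toString().tokenize('/').last() }
}

assert getAllFilenames([[[id: 'test'], 'FASTQC', 'fastqc', '/work/ab/test_fastqc.html']]) == ['test_fastqc.html']
```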


process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -50,15 +52,15 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
{ assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
{ assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
{ assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
{ assert path(process.out.html[0][1][0]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][1][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3][0]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][3][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -77,12 +79,14 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
{ assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
{ assert path(process.out.html[0][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -101,12 +105,14 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert process.out.html[0][1] ==~ ".*/test_fastqc.html" },
{ assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" },
{ assert path(process.out.html[0][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -128,21 +134,17 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" },
{ assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" },
{ assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" },
{ assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" },
{ assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" },
{ assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" },
{ assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" },
{ assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" },
{ assert path(process.out.html[0][1][0]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][1][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][1][2]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][1][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3][0]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][3][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][3][2]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert path(process.out.html[0][3][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -161,12 +163,14 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" },
{ assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" },
{ assert path(process.out.html[0][1]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(sanitizeOutput(process.out).findAll { key, val -> key != 'html' && key != 'zip' }).match() }
{ assert path(process.out.html[0][3]).text.contains("<tr><td>File type</td><td>Conventional base calls</td></tr>") },
{ assert snapshot(
process.out.html.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.zip.collect().flatten().findAll { !(it instanceof Map) && it.startsWith("/") }.collect { file(it).name },
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
)
}
}
@@ -186,9 +190,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}
@@ -209,9 +213,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}
@@ -231,9 +235,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}
@@ -253,9 +257,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}
@@ -278,9 +282,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}
@@ -300,9 +304,9 @@
}

then {
assert process.success
assertAll (
{ assert process.success },
{ assert snapshot(process.out).match() }
{ assert snapshot(sanitizeOutput(process.out)).match() }
)
}
}