Add additional inputs to the samplesheet#647
Conversation
|
atrigila
left a comment
There was a problem hiding this comment.
Great work! I wasn't able to run a test on these changes yet but here are a few questions.
docs/usage.md
Outdated
| | `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | :white_check_mark: | | ||
| | `strandedness` | Strandedness: forward or reverse. | :white_check_mark: | | ||
| | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File must exist, has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". It's recommended to always provide the FASTQ file(s) because the pipeline will be able to create any missing files from these. The FASTQ files are required to run `salmon`, `fusioninspector` and `fusioncatcher`. | :grey_question: | | ||
| | `fastq_2` | Full path to FastQ file for Illumina short reads 2. File must exist, has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | :x: | |
There was a problem hiding this comment.
Aren't fastq_2 always required, or at least suggested? I am asking because of this comment: #613 (comment)
There was a problem hiding this comment.
Yes, you are right! I'll change it
There was a problem hiding this comment.
https://nfcore.slack.com/archives/CGB1CL0SC/p1744208002004059?thread_ts=1714404757.300169&cid=CGB1CL0SC
maybe we are closer to single reads than we think?
There was a problem hiding this comment.
In this section:
// Check if the file is not a directory or is a URL and return whether it's empty or not
def is_url = ["https://", "ftp://", "http://"].findAll { it -> path.startsWith(it) }.size() > 0
if(is_url || !path_to_check.toFile().isDirectory()) {
return !path_to_check.isEmpty()
}
```
I think this fails when you use an s3 directory as input file
There was a problem hiding this comment.
I've tested this and it works with s3 directories :). The reason this doesn't work for http(s) and ftp is because of the way these protocols work
There was a problem hiding this comment.
I also tested it but it failed for me? Perhaps I used an incorrect command? Would you mind sharing me how you did?
There was a problem hiding this comment.
I tried supplying a fasta from iGenomes, but it failed in this step.
There was a problem hiding this comment.
Strange... did you get an error?
There was a problem hiding this comment.
Downloading plugin nf-amazon@2.9.2
ERROR ~ Unexpected error [UnsupportedOperationException]
-- Check script 'nf-core-rnafusion/./workflows/../subworkflows/local/build_references.nf' at line: 240 or see '.nextflow.log' file for more details
There was a problem hiding this comment.
Yeah I get that error too now 😓 Thanks for pointing this out! I'm currently adding some tests for this
There was a problem hiding this comment.
I fixed it and added tests, it should be okay now
|
I tested your code with: It failed due to the following error: If you could open an issue about that error it would be great. Running it again with This worked well, with only a final error related to #649. The core functionality introduced in this PR (especially the BAM handling steps) worked as expected, and it's great to see the STAR alignment can now be skipped, this should help speed up CI tests considerably. It might be a good idea to eventually add a BAM/CRAM to FASTQ conversion step to the pipeline to remove the dependency on FASTQs. Would love to see CI tests covering these new changes soon. I would consider this PR approved so it doesn’t block further development, but it’s of course open for comments and feedback from others. |
docs/usage.md
Outdated
| | `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | :white_check_mark: | | ||
| | `strandedness` | Strandedness: forward or reverse. | :white_check_mark: | | ||
| | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File must exist, has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". It's recommended to always provide the FASTQ file(s) because the pipeline will be able to create any missing files from these. The FASTQ files are required to run `salmon`, `fusioninspector` and `fusioncatcher`. | :grey_question: | | ||
| | `fastq_2` | Full path to FastQ file for Illumina short reads 2. File must exist, has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | :x: | |
There was a problem hiding this comment.
https://nfcore.slack.com/archives/CGB1CL0SC/p1744208002004059?thread_ts=1714404757.300169&cid=CGB1CL0SC
maybe we are closer to single reads than we think?
| //TODO: unify as if(tools.contains("fusioninspector")) once nextflow bug fixed | ||
| def run_fusioninspector = tools.contains("fusioninspector") | ||
| if(run_fusioninspector) { | ||
| if(run_fusioninspector && !params.skip_vcf) { |
There was a problem hiding this comment.
Would it make sense to have a dependency matrix in one place instead of spreading out the logic in multiple if statements that are harder to maintain? If you agree, we can open an issue (not for this PR)
There was a problem hiding this comment.
Yes I would love that! But that seems really complicated with this pipeline, so might be better for a next release?
Fixes #644
This PR adds the possibility to supply BAM, CRAM, junctions and splice_junctions files to the samplesheet. Adding these files will make sure that the STAR alignment step will be skipped, thus increasing the speed and efficiency of the pipeline.
It's however still advised to also pass the FASTQ files since some tools depend on these files. Would it make sense to add a
BAM/CRAM -> FASTQconversion to the pipeline to eliminate this requirement?I also fixed the flow a bit more here since
VCF_COLLECTdidn't actually run before