
Bestie

WDL-based hts-analysis for SLURM clusters with environment modules. More scalable than ever before, setting discrete CPU, memory, runtime and disk-space requirements per task based on the input files, for maximum* scheduling efficiency.

*: Always a work in progress due to lfs stability and edge cases.

Goals:

  • Get generic alignment working.
  • Basic variant calling (haplotypecallerGvcf).
  • Somatic variant calling (MuTect2, ...).
  • ichorCNA integration.
  • Variant annotation.
  • Functional filtering of VCFs.
  • Stability with more than 10 samples.
  • Rework the directory structure to be more in line with WARP and other public resources.
  • End-to-end EasyBuild install.

Relevant example resources for writing WDL files:

WARP, BioWDL

How to use

Run the Cromwell validation tool womtool to validate the inputs and to generate a template input file; a sketch of that step follows directly below. For the sampleJson, see ./tests/data/raw/fastq/samples.json. The simplest run example is shown after that.
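A minimal sketch of the womtool step, assuming womtool.jar sits next to cromwell.jar (all paths are placeholders):

 java -jar ./path/to/womtool.jar validate Bestie.wdl
 java -jar ./path/to/womtool.jar inputs Bestie.wdl > inputs_integration.json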

 java -Xmx8g -Dconfig.file=./path/to/cromwell.conf -jar ./path/to/cromwell.jar run Bestie.wdl -i inputs_integration.json
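
The run above points Cromwell at a cromwell.conf. A minimal sketch of a SLURM backend section using Cromwell's shared-filesystem ConfigBackend; the runtime-attribute names and resource defaults here are placeholders, not the values this repository ships with:

    backend {
      default = SLURM
      providers {
        SLURM {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            # Runtime attributes the WDL tasks can set per job (names are illustrative).
            runtime-attributes = """
            Int cpu = 1
            Int memory_mb = 2048
            Int runtime_minutes = 600
            """
            # How Cromwell hands each task to SLURM.
            submit = """
            sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} \
              -t ${runtime_minutes} -c ${cpu} --mem ${memory_mb} \
              --wrap "/bin/bash ${script}"
            """
            kill = "scancel ${job_id}"
            check-alive = "squeue -j ${job_id}"
            job-id-regex = "Submitted batch job (\\d+).*"
          }
        }
      }
    }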

Prepping a samplesheet/json for input

This first command makes a sample CSV with one readgroup per line that can be edited as needed; a rough example of the result is shown after the command.

ls /path/to/raw/fastq/*_R1.fastq.gz | perl -wne 'BEGIN{print "fq1,fq2,sampleName\n"};chomp;print $_;s/_R1\./_R2\./g; print ",$_"; s/.*\/([\w\d]*)_.*/$1/g; print ",$_"; print "\n"' | perl SampleSheetTool.pl reformatmin /dev/stdin > samplesheet.csv
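
For a run with two samples the intermediate CSV would look roughly like this (file and sample names are made up, and SampleSheetTool.pl reformatmin may add or reorder columns):

 fq1,fq2,sampleName
 /path/to/raw/fastq/SAMPLE1_R1.fastq.gz,/path/to/raw/fastq/SAMPLE1_R2.fastq.gz,SAMPLE1
 /path/to/raw/fastq/SAMPLE2_R1.fastq.gz,/path/to/raw/fastq/SAMPLE2_R2.fastq.gz,SAMPLE2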

The next command converts the samplesheet to a JSON file, merging readgroups into samples as needed (note: the output has some text alignment issues).

perl SampleSheetTool.pl jsondump samplesheet.csv > samplesheet.json
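
Optionally sanity-check that the resulting JSON parses before feeding it to the workflow (a generic check, not part of this repository):

 python -m json.tool samplesheet.json > /dev/null && echo "samplesheet.json is valid JSON"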

The next command sets up the environment in /path/to/workflow/output_folder/, copying all the needed files, linking the needed raw fastq folders, and running cromwell.jar to execute the workflow.

  • This needs '-d /path/to/folder/containing/data/', usually the 'apps/' folder for me.
  • This results in outputs below '/path/to/workflow/output_folder/' containing all the (fixed-up) files needed for running the analysis.

Example (based on `tests/integration/run_local.sh`):
(
    set -ex
    bash tests/run_project.sh \
        -i $PWD/tests/integration/json/fastqToVariants/inputs_local.json \
        -s samplesheet.json \
        -w $PWD \
        -r /path/to/workflow/output_folder/ \
        -f /path/to/raw/fastq/ \
        -d /path/to/folder/containing/data/
)
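
Once the run finishes, a quick (generic) way to see which variant files ended up below the run folder:

 find /path/to/workflow/output_folder/ -name '*.vcf.gz' | sort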

How to install

EasyBuild the required modules, or use the future wrapper module; a sketch of the EasyBuild route is shown below.
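
A minimal sketch of the EasyBuild route, assuming a working EasyBuild plus environment-modules setup; the easyconfig and module names are placeholders, not files shipped with this repository:

 eb --robot BWA-0.7.17-GCCcore-12.3.0.eb     # build the tool plus its dependencies
 module avail BWA                            # confirm the generated module is visible
 module load BWA/0.7.17-GCCcore-12.3.0       # load it before running the workflow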

Used tools and databases

Name            | Project website                      | Article
GNU Parallel    | gnu.org                              | doi
FastQC          | bioinformatics.babraham.ac.uk        |
BWA             | github                               | preprint
Picard-tools    | sourceforge                          | instructions below / faq
GATK4 + MuTect2 | toolkit project home                 | instructions here
SAMtools        | project home                         | pubmed
HTSeq           | project home                         | pubmed
Cutadapt        | github                               | doi
bcftools        | github                               | pubmed
fgbio           | github                               |
freebayes       | github                               | arxiv.org
ichorCNA        | github                               | doi
LoFreq          | github                               | pubmed
MarkTrimming    | github                               |
MultiQC         | seqera                               | doi
VEP             | github                               | doi
TrimGalore      | github                               |
GATK Bundle     | human reference                      |
Ensembl         | reference/gtf download               |
UCSC Tools      | format conversion / additional tools |
