
Bestie

WDL-based hts-analysis for SLURM clusters with environment modules. More scalable than ever before, setting discrete CPU, memory, runtime and disk-space requirements per task based on the input files, for maximum* scheduling efficiency.

*: Always a work in progress due to lfs stability and edge cases.

Goals:

  • Get generic alignment working.
  • Basic variant calling (haplotypecallerGvcf).
  • Somatic variant calling (MuTect2, ...).
  • ichorCNA integration.
  • Variant annotation.
  • Functional filtering of VCFs.
  • Stability with more than 10 samples.
  • Rework the directory structure to be more in line with WARP and other public resources.
  • End-to-end EasyBuild install.

Relevant example resources for writing WDL files:

WARP, BioWDL

How to use

Run the Cromwell validation tool womtool to validate the inputs and to generate a template input file; a sketch of that step follows directly below. For the sampleJson, see ./tests/data/raw/fastq/samples.json. The simplest run example is shown after that.
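A minimal sketch of the womtool step, assuming womtool.jar sits next to cromwell.jar (all paths are placeholders):

 java -jar ./path/to/womtool.jar validate Bestie.wdl
 java -jar ./path/to/womtool.jar inputs Bestie.wdl > inputs_integration.json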

 java -Xmx8g -Dconfig.file=./path/to/cromwell.conf -jar ./path/to/cromwell.jar run Bestie.wdl -i inputs_integration.json
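
The run above points Cromwell at a cromwell.conf. A minimal sketch of a SLURM backend section using Cromwell's shared-filesystem ConfigBackend; the runtime-attribute names and resource defaults here are placeholders, not the values this repository ships with:

    backend {
      default = SLURM
      providers {
        SLURM {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            # Runtime attributes the WDL tasks can set per job (names are illustrative).
            runtime-attributes = """
            Int cpu = 1
            Int memory_mb = 2048
            Int runtime_minutes = 600
            """
            # How Cromwell hands each task to SLURM.
            submit = """
            sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} \
              -t ${runtime_minutes} -c ${cpu} --mem ${memory_mb} \
              --wrap "/bin/bash ${script}"
            """
            kill = "scancel ${job_id}"
            check-alive = "squeue -j ${job_id}"
            job-id-regex = "Submitted batch job (\\d+).*"
          }
        }
      }
    }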

Prepping a samplesheet/json for input

This first command makes a sample CSV with one readgroup per line that can be edited as needed; a rough example of the result is shown after the command.

ls /path/to/raw/fastq/*_R1.fastq.gz | perl -wne 'BEGIN{print "fq1,fq2,sampleName\n"};chomp;print $_;s/_R1\./_R2\./g; print ",$_"; s/.*\/([\w\d]*)_.*/$1/g; print ",$_"; print "\n"' | perl SampleSheetTool.pl reformatmin /dev/stdin > samplesheet.csv
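
For a run with two samples the intermediate CSV would look roughly like this (file and sample names are made up, and SampleSheetTool.pl reformatmin may add or reorder columns):

 fq1,fq2,sampleName
 /path/to/raw/fastq/SAMPLE1_R1.fastq.gz,/path/to/raw/fastq/SAMPLE1_R2.fastq.gz,SAMPLE1
 /path/to/raw/fastq/SAMPLE2_R1.fastq.gz,/path/to/raw/fastq/SAMPLE2_R2.fastq.gz,SAMPLE2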

The next command converts the samplesheet to a JSON file, merging readgroups into samples as needed (note: the output has some text alignment issues).

perl SampleSheetTool.pl jsondump samplesheet.csv > samplesheet.json
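
Optionally sanity-check that the resulting JSON parses before feeding it to the workflow (a generic check, not part of this repository):

 python -m json.tool samplesheet.json > /dev/null && echo "samplesheet.json is valid JSON"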

The next command sets up the environment in /path/to/workflow/output_folder/, copying all the needed files, linking the needed raw fastq folders, and running cromwell.jar to execute the workflow.

  • This needs '-d /path/to/folder/containing/data/', usually the 'apps/' folder for me.
  • This results in outputs below '/path/to/workflow/output_folder/' containing all the (fixed-up) files needed for running the analysis.

Example (based on `tests/integration/run_local.sh`):
(
    set -ex
    bash tests/run_project.sh \
        -i $PWD/tests/integration/json/fastqToVariants/inputs_local.json \
        -s samplesheet.json \
        -w $PWD \
        -r /path/to/workflow/output_folder/ \
        -f /path/to/raw/fastq/ \
        -d /path/to/folder/containing/data/
)
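
Once the run finishes, a quick (generic) way to see which variant files ended up below the run folder:

 find /path/to/workflow/output_folder/ -name '*.vcf.gz' | sort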

How to install

EasyBuild the required modules, or use the future wrapper module; a sketch of the EasyBuild route is shown below.
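
A minimal sketch of the EasyBuild route, assuming a working EasyBuild plus environment-modules setup; the easyconfig and module names are placeholders, not files shipped with this repository:

 eb --robot BWA-0.7.17-GCCcore-12.3.0.eb     # build the tool plus its dependencies
 module avail BWA                            # confirm the generated module is visible
 module load BWA/0.7.17-GCCcore-12.3.0       # load it before running the workflow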

Used tools and databases

Name            | Project website                      | Article
GNU Parallel    | gnu.org                              | doi
FastQC          | bioinformatics.babraham.ac.uk        |
BWA             | github                               | preprint
Picard-tools    | sourceforge                          | instructions below / faq
GATK4 + MuTect2 | toolkit project home                 | instructions here
SAMtools        | project home                         | pubmed
HTSeq           | project home                         | pubmed
Cutadapt        | github                               | doi
bcftools        | github                               | pubmed
fgbio           | github                               |
freebayes       | github                               | arxiv.org
ichorCNA        | github                               | doi
LoFreq          | github                               | pubmed
MarkTrimming    | github                               |
MultiQC         | seqera                               | doi
VEP             | github                               | doi
TrimGalore      | github                               |
GATK Bundle     | human reference                      |
Ensembl         | reference/gtf download               |
UCSC Tools      | format conversion / additional tools |
