wexygen is a high-performance, GATK Best Practices-compliant WES/WGS pipeline toolkit built in C++ and Python. It provides fully automated preprocessing, configuration generation, annotation, workflow execution, and logging. The toolkit integrates industry-standard tools including GATK, ANNOVAR, and NIRVANA, and supports both Snakemake and Nextflow workflows with SLURM cluster execution.
This work is ongoing. It is extensively based on, and strongly inspired by, the excellent wes_gatk project, which is considerably more complete than wexygen. If you are looking for a more robust toolkit, please check out wes_gatk.
- Full GATK Best Practices-based workflow for WES and WGS data.
- Supports both Snakemake and Nextflow pipelines.
- SLURM scheduler integration for cluster-based execution.
- Automated generation of workflow files during CMake build:
  - workflow_files/snakemake/
  - workflow_files/nextflow/
- Automated environment setup:
  - GATK installation
  - ANNOVAR installation
  - NIRVANA installation
  - Reference genome downloads
  - Known-sites VCF downloads
- Detailed logging and execution tracing.
- Preprocessing entrypoint (preprocessor.py) for preparing raw FASTQ/inputs.
- Unified runner binary (./wexygen) that detects and executes the selected workflow engine.
- Supports:
  - Variant calling
  - Annotation via ANNOVAR and NIRVANA
  - QC metrics and contamination estimation (SVD)
  - Joint use of multiple known-variant resources
- Reproducible output structure for downstream analysis.
- Built in C++17 and Python 3, portable across Linux systems.
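The engine detection mentioned in the feature list can be pictured with a small Python sketch. This is not the actual wexygen implementation (the runner is a C++ binary); it only illustrates one plausible selection rule. The function name `select_engine`, the `use_nextflow` option, and the fallback order are assumptions made for this example; only `--use-snakemake` appears in the usage example in this README.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping from engine name to the executable expected on PATH.
ENGINE_EXECUTABLES = {"snakemake": "snakemake", "nextflow": "nextflow"}


def select_engine(
    use_snakemake: bool = False,
    use_nextflow: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the engine name to run, or raise if the choice is unusable."""
    if use_snakemake and use_nextflow:
        raise ValueError("choose only one workflow engine")

    if use_snakemake:
        candidates = ["snakemake"]
    elif use_nextflow:
        candidates = ["nextflow"]
    else:
        # No explicit choice: fall back to whichever engine is installed,
        # trying Snakemake first (an arbitrary order chosen for this sketch).
        candidates = ["snakemake", "nextflow"]

    for engine in candidates:
        # `which` returns None when the executable is not found on PATH.
        if which(ENGINE_EXECUTABLES[engine]) is not None:
            return engine

    raise RuntimeError(
        "no usable workflow engine found among: " + ", ".join(candidates)
    )
```

Passing the `which` lookup as a parameter keeps the sketch testable without Snakemake or Nextflow installed; in normal use the default `shutil.which` checks the real PATH.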
wexygen is built using CMake. The build system automatically generates required support scripts and workflow directories.
mkdir build
cd build
cmake ..
make -j
During the build, the following components are generated inside the build/ directory:
- preprocessor.py
- wexygen binary
- workflow_files/snakemake/
- workflow_files/nextflow/
- download_and_setup_data.sh
- run_wexygen.sh
- Auto-installer scripts for:
  - GATK
  - ANNOVAR
  - NIRVANA
  - Reference data
  - Known variants
After building, run the data setup script:
./download_and_setup_data.sh
The preprocessor organizes raw FASTQ files, verifies metadata, and prepares the WES/WGS run directory.
python preprocessor.py \
-i ./test/raw_data \
-o ./test/output \
--overwrite
The following example demonstrates a WES analysis using Snakemake, GATK Best Practices, and ANNOVAR/NIRVANA annotation:
./wexygen WES \
-i ./test/raw_data \
-o ./test/output \
--reference-fasta ./test/tools/broad/Homo_sapiens_assembly38.fasta \
--bed-file ./test/exome_bed/sureSelect_V6_60M.bed \
--gff-file ./test/gff/Homo_sapiens.GRCh38.109.gff3.gz \
--nirvana-path ./test/nirvana \
--annovar-path ./test/annovar \
--haplotype-db-file /home/propenster/src/compbio/gatk_res/Homo_sapiens_assembly38.haplotype_database.txt \
--known-variants-snps ./test/known_variants/Homo_sapiens_assembly38.dbsnp138.vcf \
--known-variants-indels ./test/known_variants/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
--known-variants-indels2 ./test/known_variants/Homo_sapiens_assembly38.known_indels.vcf.gz \
--reference-index ./test/tools/broad/Homo_sapiens_assembly38.fasta \
--svd-prefix ./test/tools/broad/Homo_sapiens_assembly38.contam \
--generate-confs-only \
--use-snakemake \
--threads 12
(c) Faith (propenster) Olusegun 2025