A pipeline to run genome-wide burden tests using MONSTER (v1.3, released on December 17, 2016). MONSTER (Minimum P-value Optimized Nuisance parameter Score Test Extended to Relatives) is a method for detecting association between a set of rare variants and a quantitative trait in samples that contain related individuals. MONSTER is based on a mixed-effects model that includes additive and environmental components of variance and adjusts for covariates. It can handle essentially arbitrary combinations of related and unrelated individuals, from small outbred pedigrees and unrelated individuals to large, complex inbred pedigrees (Duo Jiang and Mary Sara McPeek, DOI: 10.1002/gepi.21775).
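For orientation, the variance structure described above can be written as a standard polygenic mixed model. This is a sketch under the usual assumptions (a pedigree-derived kinship matrix $\Phi$, a covariate matrix $X$ and a genotype matrix $G$ for the tested variant set). It is not a transcription of MONSTER's full score statistic; for that, see the paper cited above.

$$
\mathbf{Y} = X\boldsymbol{\beta} + G\boldsymbol{\gamma} + \mathbf{a} + \mathbf{e},
\qquad \mathbf{a} \sim N\!\left(\mathbf{0},\, 2\sigma_a^2 \Phi\right),
\qquad \mathbf{e} \sim N\!\left(\mathbf{0},\, \sigma_e^2 I\right)
$$

Here $X\boldsymbol{\beta}$ adjusts for covariates, $\sigma_a^2$ and $\sigma_e^2$ are the additive and environmental variance components, and, loosely speaking, the association test asks whether the variant-set effects $\boldsymbol{\gamma}$ are zero.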
Mummy is a robust and highly customizable pipeline that allows users to define which variants are included in the association test based on the overlapping genomic feature (e.g. GENCODE annotation, whether the annotation belongs to a canonical transcript, overlap with an associated regulatory feature, etc.) and on variant features (MAF threshold, missingness threshold), and to apply custom weights (CADD, phred-scaled CADD, Eigen, phred-scaled Eigen or Linsight scores).
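To illustrate what a per-variant weight does, the sketch below shows the classic weighted burden collapse: each individual's minor-allele counts are multiplied by a per-variant weight (for example a phred-scaled CADD score) and summed into one score. This is a generic illustration only; MONSTER's test statistic is more general than a plain weighted sum (see the paper cited above), and the function name and input layout here are hypothetical.

```python
from typing import List, Sequence


def weighted_burden(genotypes: Sequence[Sequence[int]],
                    weights: Sequence[float]) -> List[float]:
    """Collapse each individual's genotypes into one weighted burden score.

    genotypes: one row per individual, one minor-allele count (0/1/2) per variant.
    weights:   one weight per variant, e.g. a phred-scaled CADD score (hypothetical input).
    """
    if any(len(row) != len(weights) for row in genotypes):
        raise ValueError("each genotype row needs one entry per weighted variant")
    # Sum of weight * allele count over the variants in the tested set.
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

With weights `[10.0, 20.0, 5.0]`, an individual carrying one copy of the second variant and two copies of the third gets a score of 30.0, so higher-scoring (more likely deleterious) variants contribute more to the burden.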
WARNING: This pipeline is designed for GRCh38!!
The following programs have to be in the path:
The following items should be available:
- bigWigTools
- Linsight genome-wide scores
- CADD genome-wide scores
- Eigen scores computed genome-wide (the downloaded Eigen scores have to be processed; see details below)
- linked features file (the preparation of the file is detailed below)
Variant filtering is partly based on the annotation found in the VCF files. The following INFO fields have to be present:
- consequence - the most severe consequence assigned to the variant by Ensembl VEP
- lof - LOFTEE loss-of-function annotation (HC and LC for high-confidence and low-confidence variants, respectively)
- AN, AC and AF - used to filter on allele frequency and missingness
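The INFO-based filtering can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes biallelic records and diploid genotypes, so that missingness is estimated as 1 - AN / (2 × number of samples), and the thresholds and function names are hypothetical.

```python
from typing import Dict, Optional, Set


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as 'AC=2;AN=1990;lof=HC' into a dict.
    Flag fields without '=' are stored with an empty string value."""
    fields: Dict[str, str] = {}
    for entry in info.split(";"):
        if entry:
            key, _, value = entry.partition("=")
            fields[key] = value
    return fields


def passes_filters(info: str, n_samples: int,
                   max_maf: float = 0.05,       # hypothetical default
                   max_missing: float = 0.05,   # hypothetical default
                   keep_consequences: Optional[Set[str]] = None,
                   keep_lof: Optional[Set[str]] = None) -> bool:
    """Return True if a biallelic record passes the frequency, missingness
    and annotation filters."""
    fields = parse_info(info)
    an = int(fields["AN"])
    if an == 0:                                  # no called alleles at all
        return False
    af = float(fields["AF"]) if "AF" in fields else int(fields["AC"]) / an
    maf = min(af, 1.0 - af)                      # fold AF into the minor allele frequency
    missing = 1.0 - an / (2 * n_samples)         # assumes diploid genotypes
    if maf > max_maf or missing > max_missing:
        return False
    if keep_consequences is not None and fields.get("consequence") not in keep_consequences:
        return False
    if keep_lof is not None and fields.get("lof") not in keep_lof:
        return False
    return True
```

For example, `passes_filters(info, 1000, keep_lof={"HC"})` keeps only rare, well-called variants that LOFTEE marks as high-confidence loss-of-function.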
The linked features file records which genomic regions can be linked to a gene. For example, a gene is linked to its exons, CDSs and transcripts, to the regulatory features that overlap with the gene, and to other regulatory features that overlap with variants that are known eQTLs of the gene (based on GTEx data). The following script collects this information from several sources (GENCODE, APPRIS, Ensembl Regulation and GTEx) and combines it. Except for GTEx, the data is accessed directly from the web; the GTEx data has to be downloaded by the user, and its location passed to the script.
Usage: ./prepare_regions.sh -G <path to GTEx file> -o <Output folder>
For more information: ./prepare_regions.sh -h
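The linking rule described above can be sketched with simple interval overlaps. This is a hypothetical illustration of the logic, not the implementation in prepare_regions.sh: the `Region` type, the function names and the 0-based half-open coordinates are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "exon", "CDS", "promoter"


def overlaps(a: Region, b: Region) -> bool:
    """Half-open interval overlap on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def link_features(gene: Region, gene_parts: List[Region],
                  regulatory: List[Region],
                  eqtls: List[Tuple[str, int]]) -> List[Region]:
    """Collect the regions linked to one gene: its own parts (exons, CDSs,
    transcripts), regulatory features overlapping the gene, and regulatory
    features containing one of the gene's eQTL positions."""
    linked = list(gene_parts)
    for feature in regulatory:
        hits_gene = overlaps(feature, gene)
        hits_eqtl = any(chrom == feature.chrom and feature.start <= pos < feature.end
                        for chrom, pos in eqtls)
        if hits_gene or hits_eqtl:
            linked.append(feature)
    return linked
```

A distal enhancer is therefore linked to a gene only when it contains one of that gene's eQTLs, while regulatory features inside the gene body are always linked.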
... completing soon...
Update the locations of the score files, the linked features file, the path to bigWigTools, etc. in the config.txt file. The script checks that these files exist and exits if any check fails.
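The existence check can be sketched as below. The format of config.txt is not described here, so this sketch assumes simple `KEY=VALUE` lines with `#` comments; the key names and function names are hypothetical, and the pipeline's own check may differ.

```python
import os
import sys
from typing import Dict, Iterable, List


def parse_config(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical KEY=VALUE lines, skipping blanks and '#' comments."""
    config: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def missing_paths(config: Dict[str, str], required: List[str]) -> List[str]:
    """Describe every required entry that is unset or points to a missing path."""
    problems: List[str] = []
    for key in required:
        value = config.get(key, "")
        if not value:
            problems.append(f"{key}: not set")
        elif not os.path.exists(value):
            problems.append(f"{key}: {value} does not exist")
    return problems


def check_config(path: str, required: List[str]) -> Dict[str, str]:
    """Load the config and exit with status 1 if any required path fails."""
    with open(path) as handle:
        config = parse_config(handle)
    problems = missing_paths(config, required)
    for problem in problems:
        print(f"[Error] {problem}", file=sys.stderr)
    if problems:
        sys.exit(1)
    return config
```

A call such as `check_config("config.txt", ["CADD", "Eigen", "Linsight", "bigWigTools", "linked_features"])` (hypothetical key names) reports every missing entry before exiting, rather than stopping at the first one, so all path problems can be fixed in a single pass.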