A collection of scripts to run genome-wide burden testing using MONSTER.

e-jorsboe/burden_testing

 
 


Mummy (the wrapped MONSTER)

A pipeline to run genome-wide burden tests using MONSTER (v1.3, released on December 17, 2016). MONSTER (Minimum P-value Optimized Nuisance parameter Score Test Extended to Relatives) is a method for detecting association between a set of rare variants and a quantitative trait in samples that contain related individuals. MONSTER is based on a mixed-effects model that includes additive and environmental components of variance and adjustment for covariates. It can handle essentially arbitrary combinations of related and unrelated individuals, including small outbred pedigrees and unrelated individuals, as well as large, complex inbred pedigrees (Duo Jiang and Mary Sara McPeek, DOI: 10.1002/gepi.21775).

Mummy is a robust and highly customizable pipeline. Users can define which variants are included in the association test based on the overlapping genomic feature (e.g. GENCODE annotation, whether the annotation belongs to a canonical transcript, overlap with an associated regulatory feature, etc.) and on variant features (MAF threshold, missingness threshold). The pipeline can also add custom weights (CADD, phred-scaled CADD, Eigen, phred-scaled Eigen or Linsight scores).

WARNING: This pipeline is designed for GRCh38!!

Requirements

The following programs have to be in the path:

The following items should be available:

The filtering of variants is partially based on the annotation found in the VCF files. The following INFO fields have to be present:

  • consequence - the most severe consequence assigned to the variant based on Ensembl VEP
  • lof - loftee loss-of-function annotation (HC and LC for the high confidence and low confidence variants respectively)
  • AN, AC and AF for filtering for allele frequency and missingness.
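
Before starting a run, it can help to confirm that the VCF header declares the INFO fields listed above. The function below is a minimal sketch and not part of the pipeline: `missing_info_fields` is a hypothetical helper name, and it assumes the field IDs are spelled exactly as in the list (`consequence`, `lof`, `AN`, `AC`, `AF`).

```shell
#!/usr/bin/env bash
# Sketch: report which required INFO fields are not declared in a VCF header.
# Reads an uncompressed VCF on stdin and prints the missing field names
# (space separated); prints an empty line if all fields are declared.
# ASSUMPTION: field IDs match the README list exactly, including case.
missing_info_fields() {
    local header field missing=""
    # Collect the INFO declarations and stop at the column header line,
    # so the variant records themselves are never scanned.
    header=$(awk '/^##INFO=/ {print} /^#CHROM/ {exit}')
    for field in consequence lof AN AC AF; do
        if ! grep -q "^##INFO=<ID=${field}," <<< "$header"; then
            missing+="${field} "
        fi
    done
    # Drop the trailing space before printing.
    echo "${missing% }"
}
```

For a compressed file, the header can be piped in, for example `zcat input.vcf.gz | missing_info_fields`, where `input.vcf.gz` is a placeholder name.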

Setting up the pipeline:

Generating linked features file:

This file describes which genomic regions can be linked to a gene. E.g. a gene is linked to its exons, CDS, transcripts and also the regulatory features that overlap with the gene, plus other regulatory features that overlap with variants that are known eQTLs of the gene (based on GTEx data). The following script combines all this information from various sources: GENCODE, APPRIS, Ensembl Regulation and GTEx. Except for GTEx, the data is accessed directly from the web. The GTEx data has to be downloaded by the user, and its location has to be passed to the script.

Usage: ./prepare_regions.sh -G <path to GTEx file> -o <Output folder>

For more information: ./prepare_regions.sh -h
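
Because the GTEx file has to be supplied by the user, a small wrapper can catch a wrong path before the script starts downloading the other sources. This is only a sketch: `run_prepare_regions` and the `PREPARE_REGIONS` override variable are hypothetical names introduced here, and only the `-G` and `-o` options are taken from the usage line above.

```shell
#!/usr/bin/env bash
# Sketch: validate the inputs, then call prepare_regions.sh with the
# documented -G (GTEx file) and -o (output folder) options.
# PREPARE_REGIONS is a hypothetical override for the script location.
run_prepare_regions() {
    local gtex_file=$1 out_dir=$2
    local script=${PREPARE_REGIONS:-./prepare_regions.sh}
    if [[ ! -r "$gtex_file" ]]; then
        echo "ERROR: GTEx file not readable: $gtex_file" >&2
        return 1
    fi
    # Create the output folder if it does not exist yet.
    mkdir -p "$out_dir" || return 1
    "$script" -G "$gtex_file" -o "$out_dir"
}
```

A call would then look like `run_prepare_regions /path/to/gtex_file /path/to/output`, with both paths being placeholders.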

Generating the Phred-scaled Eigen scores:

... completing soon...

Adjusting the config file used by the pipeline:

Update the location of the score files, linked feature file, path to bigWigTools etc. in the config.txt file. The script checks the existence of these files, and if a test fails, the script exits.
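
The existence check described above could look roughly like the sketch below. It is not the pipeline's own code: the layout of config.txt is not documented here, so the example assumes simple `key=/some/path` lines, and `check_config_paths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that every path listed in a config file exists.
# ASSUMPTION: the config uses "key=/some/path" lines and "#" comments;
# the real config.txt layout may differ.
check_config_paths() {
    local config=$1 key value status=0
    while IFS='=' read -r key value; do
        # Skip blank lines and comment lines.
        [[ -z "$key" || "$key" == \#* ]] && continue
        if [[ ! -e "$value" ]]; then
            echo "ERROR: $key points to a missing path: $value" >&2
            status=1
        fi
    done < "$config"
    return "$status"
}
```

Unlike the behaviour described above, this sketch does not stop at the first failed test. It reports every missing path and then returns a non-zero status, which the caller can turn into an exit with `check_config_paths config.txt || exit 1`.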
