Computes per-sample coverage from multiple BAM, CRAM, or bedGraph files, ensuring each genomic position is counted only once per sample. Chromosomes are processed in parallel to maximize throughput, producing a single tabix-indexed bedGraph.gz file.

dpuiu/samplecov

samplecov


Purpose

Compute sample coverage from multiple input BAM, CRAM, or bedGraph files.

Repository: samplecov
Related project: tiebrush


Overview

samplecov is a lightweight toolkit designed to compute per-sample coverage from multiple CRAM/BAM/bedGraph files. It outputs compressed and indexed bedGraph files suitable for further analysis or visualization.
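The idea of counting each genomic position at most once per sample can be illustrated with a toy sketch. This is not samplecov's implementation: the function name `per_sample_coverage` and the four-column input layout (sample ID, chromosome, start, end) are assumptions made for the example, and the per-base loop is only practical for tiny inputs.

```shell
# Toy illustration only: NOT samplecov's actual implementation.
# Input (stdin): sample_id, chrom, start, end per line (half-open intervals, as in bedGraph).
# Output: one line per covered base: chrom, start, end, number of distinct samples.
per_sample_coverage() {
    awk 'BEGIN { OFS = "\t" }
    {
        for (pos = $3; pos < $4; pos++) {
            # Count a (sample, position) pair only the first time it is seen.
            if (!(($1, $2, pos) in seen)) {
                seen[$1, $2, pos] = 1
                count[$2 OFS pos]++
            }
        }
    }
    END {
        for (key in count) {
            split(key, f, OFS)
            print f[1], f[2], f[2] + 1, count[key]
        }
    }' | sort -k1,1 -k2,2n
}

# Two overlapping reads from sample S1 and one read from sample S2.
printf 'S1 chr1 10 13\nS1 chr1 11 14\nS2 chr1 12 15\n' | per_sample_coverage
```

Position 11 is covered by two S1 reads but is reported with a value of 1, while positions 12 and 13 are covered by both samples and are reported with a value of 2.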


Example Usage

# Compute per-sample coverage for Tissue_1 
samplecov.sh            -o Tissue_1.sample.bedGraph.gz Tissue_1/*.cram

# Use a common reference for Tissue_2 (if all samples used the same reference)
samplecov.sh -r ref.ids -o Tissue_2.sample.bedGraph.gz Tissue_2/*.cram

# Merge sample coverage across multiple tissues
samplecov.sh -r ref.ids -o Tissues.sample.bedGraph.gz -p 16 Tissue_*.sample.bedGraph.gz

Input Files

Sample/Tissue files:

 *.bam       - BAM alignment files  
 *.cram      - CRAM alignment files  
 *.gz        - compressed bedGraph coverage files  

ref.ids
A file containing a list of reference regions (e.g., from samtools faidx).
Use this file if all input files were aligned to the same reference.
Each line is either:
  - a chromosome (chr1)
  - a chromosome region (chr1:100000-200000)
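One plausible way to build such a list is to take the sequence names from a FASTA index. The sketch below assumes a `.fai` file as written by `samtools faidx`, whose first tab-separated column is the sequence name. The helper name `fai_to_ref_ids` is hypothetical and not part of samplecov.

```shell
# Hypothetical helper (not part of samplecov): print the sequence names,
# i.e. column 1, of a FASTA index (.fai) as produced by `samtools faidx`.
fai_to_ref_ids() {
    cut -f1 "$1"
}

# Small stand-in .fai so the sketch runs without samtools or a real genome.
demo_fai=$(mktemp)
printf 'chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n' > "$demo_fai"
fai_to_ref_ids "$demo_fai"
rm -f "$demo_fai"
```

With a real reference, the equivalent steps would be `samtools faidx ref.fa` followed by `fai_to_ref_ids ref.fa.fai > ref.ids`. Region lines such as chr1:100000-200000 can then be added or substituted by hand.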

Output Files

Tissues file:
*.gz - Compressed bedGraph file with total sample coverage

Example:

$ zcat Tissues.sample.bedGraph.gz  | head
  chr1	9999	10003	1	# chr1:9999-10003  region covered by a single sample (possibly by multiple reads)  
  chr1	10003	10004	3	# chr1:10003-10004 region covered by 3 samples  
  chr1	10004	10010	5	# chr1:10004-10010 region covered by 5 samples
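Because the fourth column holds a sample count, the output can be summarized with standard text tools. As a minimal sketch, the hypothetical helper `bases_with_min_samples` below (not part of samplecov) sums the lengths of all intervals covered by at least a given number of samples, using the three example lines above as input.

```shell
# Hypothetical helper (not part of samplecov): count bases covered by at
# least MIN samples. Reads bedGraph lines (chrom, start, end, count) on stdin.
bases_with_min_samples() {
    awk -v min="$1" '$4 >= min { n += $3 - $2 } END { print n + 0 }'
}

# The three example intervals shown above; prints 7 (1 base + 6 bases).
printf 'chr1\t9999\t10003\t1\nchr1\t10003\t10004\t3\nchr1\t10004\t10010\t5\n' |
    bases_with_min_samples 3
```

On a real output file, the same helper could be fed with `zcat Tissues.sample.bedGraph.gz | bases_with_min_samples 3`.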

Requirements

The required tools must be installed and available in your system $PATH.

To install most of them on a Debian-based system:

sudo apt update
sudo apt install samtools tabix parallel coreutils pypy3 python3
