Skip to content

Enhance the FORMAT column of merge subcommand. #58

@xiucz

Description

@xiucz

Hi,
I think people are looking for SV merge tools often, and after testing many SV merge tools, such as SURVIVOR, svimmer, SVanalyzer, svtools, truvari, bcftools, and many other tools, I find SVDB is the best tool to merge SV vcf files from different SV callers! It use the set and priority strategy to combine SV events, which is similar to another tool CombineVariants (suitable for small variants).

When I want to merge three vcf from manta, lumpy and svaba, I find the result vcf only contains two field

#svdb merged file
chr13	32913545	MantaDEL:1:279:280:0:0:0:manta|5:lumpy	T	<DEL>	.	MaxDepth	END=32918007;SVTYPE=DEL;SVLEN=-4462;CIPOS=0,2;CIEND=0,2;HOMLEN=2;HOMSEQ=CA;STRANDS=+-:436;CIPOS95=0,0;CIEND95=0,0;SU=436;PE=0;SR=436;VARID=5:lumpy;set=filterInmanta-lumpy;FOUNDBY=2;manta_CHROM=MantaDEL_1_279_280_0_0_0|chr13;lumpy_CHROM=5|chr13;manta_POS=MantaDEL_1_279_280_0_0_0|32913545;lumpy_POS=5|32913547;manta_QUAL=MantaDEL_1_279_280_0_0_0|.;lumpy_QUAL=5|0.00;manta_FILTERS=MantaDEL_1_279_280_0_0_0|MaxDepth;lumpy_FILTERS=5|.;manta_SAMPLE=MantaDEL_1_279_280_0_0_0|QYQ_zuzhi|PR:0:0|SR:0:0;lumpy_SAMPLE=5|QYQ_zuzhi|GT:./.|SU:436|PE:0|SR:436|GQ:.|SQ:.|GL:.|DP:0|RO:0|AO:0|QR:0|QA:0|RS:0|AS:0|ASC:0|RP:0|AP:0|AB:.;manta_INFO=MantaDEL_1_279_280_0_0_0|END:32918007|SVTYPE:DEL|SVLEN:-4462|CIPOS:0:2|CIEND:0:2|HOMLEN:2|HOMSEQ:CA;lumpy_INFO=5|SVTYPE:DEL|SVLEN:-4463|END:32918010|CIPOS:0:0|CIEND:0:0|CIPOS95:0:0|CIEND95:0:0|SU:436|PE:0|SR:436;svdb_origin=manta|lumpy	PR:SR	.,.:.,.	0,0:0,0

#lumpy raw file
chr13	32913547	5	N	<DEL>	0.00	.	SVTYPE=DEL;SVLEN=-4463;END=32918010;STRANDS=+-:436;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=436;PE=0;SR=436	GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB	./.:436:0:436:.:.:.:0:0:0:0:0:0:0:0:0:0:.

#manta raw file
chr13	32913545	MantaDEL:1:279:280:0:0:0	T	<DEL>	.	MaxDepth	END=32918007;SVTYPE=DEL;SVLEN=-4462;CIPOS=0,2;CIEND=0,2;HOMLEN=2;HOMSEQ=CA	PR:SR	0,0:0,0

The SVDB merged the FORMAT field, it is trimmed for some reason.

PR: SR	.,.:.,.	0,0:0,0

, I think

  1. the key of the FORMAT field should be filled by the tags;
  2. the values of the FORMAT field should be the same length as the numbers of the callers(here, 3 callers, 3 columns).
    If one SV event is called by 3 callers, then the FORMAT should contain all the information from the 3 callers? as the INFO field does.
    An example,
GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB:PR:SR ./.:436:0:436:.:.:.:0:0:0:0:0:0:0:0:0:0:.:.:. .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:0,0:0,0 .:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.,.:.,.

Here is the command line:

#version SVDB-2.6.4
~/bin/svdb --merge --vcf tumorSV.vcf:manta lumpy.gt.vcf:lumpy svaba.unfiltered.sv.vcf:svaba  --priority svaba,manta,lumpy > svdb.3.vcf

Best,
xiucz

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions