Long lists of candidate fusions when read length exceeds 100 bp #228

@Rmulet

Hi,

First of all, thanks a lot for creating this awesome tool. It has proven extremely valuable for detecting some difficult fusions, such as IGH::CRLF2 and others involving the IGH locus.

However, FusionCatcher sometimes reports extremely long lists of fusion genes, ranging from 500 to several thousand entries, most of which are false positives. For reference, it normally yields 40-50 results in my hands. To make matters worse, when this happens the program runs extremely slowly, probably because it has to run BLAT for many more fusion candidates. After some experimentation, I've come to the conclusion that this behavior is a consequence of sequencing with 151 bp reads instead of the 101 bp reads we typically use.

Indeed, when I trim these longer reads to 101 bp, the lists go down from thousands of entries to 40-50 again. Apparently, this has nothing to do with adapter content, as removing only the adapters (which FusionCatcher does on its own anyway) has no impact on the results. By default, I supply raw, unprocessed reads, as specified in the user's manual.
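For reference, the hard trimming described above can be sketched roughly as follows. This is only a minimal illustration, assuming plain four-line FASTQ records; the function name `trim_fastq` and the file names in the usage comment are hypothetical, and any standard read trimmer would do the same job.

```python
import gzip


def trim_fastq(in_handle, out_handle, max_len=101):
    """Hard-trim each read and its quality string to at most max_len bases.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    Reads shorter than max_len are written unchanged.
    """
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline()
        qual = in_handle.readline().rstrip("\n")
        out_handle.write(header)
        out_handle.write(seq[:max_len] + "\n")
        out_handle.write(plus)
        out_handle.write(qual[:max_len] + "\n")


# Example usage (hypothetical file names):
# with gzip.open("sample_R1.fastq.gz", "rt") as fin, \
#         gzip.open("sample_R1.trimmed.fastq.gz", "wt") as fout:
#     trim_fastq(fin, fout, max_len=101)
```

Both mates of a pair are trimmed the same way, and the trimmed files are then given to FusionCatcher in place of the raw ones.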

I understand that longer reads should increase sensitivity, but the increase seems disproportionate. Besides, most of the results appear to be false positives: they generally have only a few supporting reads and are not found by any other tool. Is there any option to prevent this sort of behaviour? Is there anything I could be doing wrong? A temporary solution is to trim everything to 101 bp, but it is a bit of a waste to throw away potentially informative data.

Best,

Roger
