Force insertions to align with codon boundaries #379

@donkirkby

In the 09-Dec-2016.M01841 run, sample PNL4-3-1-V3LOOP_S73 reported a strange consensus sequence.

After digging into the problem, we found that a six-base insertion wasn't aligned with the codon boundaries. Here's a portion of the HIV1B-env reference we use, with the usual insertion marked with dashes:

AGTATACATATA------GGACCAGGG

The amino acid equivalent is:

SIHIGPG

Here's a portion of one of the sample's reads as bowtie2 aligned it, with the reference on the line below:

AGTATCCGTATCCAGAGGGGACCAGGG
AGTATACATAT------AGGACCAGGG

You can see that the insertion is one base to the left of its usual position. That pulls the G from the right side of the insertion into the codon on its left (ATG instead of ATC), so the read is translated as the following amino acid sequence:

SIRMGPG
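
To make the effect concrete, here is a minimal Python sketch that translates the read with the insertion removed at each of the two positions. None of this is taken from the pipeline: `translate` and `remove_insertion` are hypothetical helpers, the positions 11 and 12 are 0-based offsets into the excerpt above, and the excerpt is assumed to start on a codon boundary.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate complete codons, starting at the first base of seq."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - 2, 3))


def remove_insertion(read, start, length):
    """Drop `length` inserted bases beginning at 0-based offset `start`."""
    return read[:start] + read[start + length:]


read = "AGTATCCGTATCCAGAGGGGACCAGGG"

# Insertion where bowtie2 placed it, after 11 reference bases.
bowtie_placement = translate(remove_insertion(read, 11, 6))
# Insertion at its usual position, after 12 reference bases (4 codons).
usual_placement = translate(remove_insertion(read, 12, 6))

print(bowtie_placement)  # SIRMGPG
print(usual_placement)   # SIRIGPG
```

Moving the insertion by a single base only changes the fourth amino acid, from I to M, which is the difference that shows up in the consensus.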

One possible solution: if an insertion's length is a multiple of three, then force it to align with the codon boundaries. For now, just choose the closest codon boundary.
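
A rough sketch of that rule in Python, not an implementation from the pipeline: `snap_insertion_to_codon` is a hypothetical function, `start` is assumed to be the 0-based number of reference bases before the insertion, and `frame_offset` is an assumed parameter giving the reference position of the first base of any codon. Shifting the insertion can turn a matching base into a mismatch, which this sketch does not try to score.

```python
def snap_insertion_to_codon(start, length, frame_offset=0):
    """Return the insertion position moved to the closest codon boundary.

    Insertions whose length is not a multiple of three are left alone,
    because they shift the reading frame wherever they are placed.
    """
    if length % 3 != 0:
        return start
    remainder = (start - frame_offset) % 3
    if remainder == 0:
        return start          # already on a codon boundary
    if remainder == 1:
        return start - 1      # one base past a boundary: move left
    return start + 1          # one base before a boundary: move right


# The read above: bowtie2 put the six-base insertion after 11 bases.
print(snap_insertion_to_codon(11, 6))  # 12
```

Because a position is always either one or two bases from the nearest boundary, there is never a tie to break when choosing the closest one.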
