Description
In the 09-Dec-2016.M01841 run, sample PNL4-3-1-V3LOOP_S73 reported a strange consensus sequence.
After digging into the problem, we found that a six-base insertion wasn't aligned with the codon boundaries. Here's a portion of the HIV1B-env reference we use, with the usual insertion marked with dashes:
AGTATACATATA------GGACCAGGG
The amino acid equivalent is:
SIHIGPG
Here's a portion of one of the sample's reads as bowtie2 aligned it, with the aligned reference on the line below it:
AGTATCCGTATCCAGAGGGGACCAGGG
AGTATACATAT------AGGACCAGGG
You can see that the insertion is one base to the left of its usual position. That pulls the G from the right side of the insertion over to the left, where it joins the preceding AT to form ATG, so the read is translated as the following amino acid sequence:
SIRMGPG
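To make the effect easy to reproduce, here is a minimal Python sketch that drops the inserted read bases and translates what remains, once for each gap placement. The helper names `drop_insertions` and `translate` are hypothetical and are not taken from the pipeline's code. The sketch assumes the excerpt starts on a codon boundary and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def drop_insertions(aligned_ref, aligned_read):
    """Keep only the read bases that sit opposite a reference base."""
    return "".join(read_base
                   for ref_base, read_base in zip(aligned_ref, aligned_read)
                   if ref_base != "-")


def translate(seq):
    """Translate complete codons, assuming seq starts on a codon boundary."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - 2, 3))


read = "AGTATCCGTATCCAGAGGGGACCAGGG"
shifted_ref = "AGTATACATAT------AGGACCAGGG"  # gap placement from bowtie2
usual_ref = "AGTATACATATA------GGACCAGGG"    # gap on the codon boundary

print(translate(drop_insertions(shifted_ref, read)))  # SIRMGPG
print(translate(drop_insertions(usual_ref, read)))    # SIRIGPG
```

With the gap on the codon boundary, only the real H to R difference remains, and the fourth codon reads ATC (I) instead of ATG (M).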
One possible solution: if an insertion's length is a multiple of three, then force it to align with the codon boundaries. For now, just choose the closest codon boundary.
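A rough sketch of that idea, not the actual fix: the hypothetical function `snap_insertion` below takes the aligned reference string and moves the gap to the nearest codon boundary. It assumes a single insertion per aligned string, that the string starts on a codon boundary, and that only the gap moves while the read itself is left unchanged.

```python
def snap_insertion(aligned_ref):
    """Move a single run of '-' to the closest codon boundary.

    Returns the input unchanged when there is no gap, when the gap length
    is not a multiple of three, or when the gap is already on a boundary.
    """
    start = aligned_ref.find("-")
    if start < 0:
        return aligned_ref
    end = start
    while end < len(aligned_ref) and aligned_ref[end] == "-":
        end += 1
    length = end - start
    if length % 3 != 0:
        return aligned_ref

    # With only one gap, `start` equals the number of reference bases
    # that precede the insertion.
    offset = start % 3
    if offset == 0:
        return aligned_ref
    # One base past a boundary: move left. Two bases past: move right.
    shift = -1 if offset == 1 else 1

    ref_bases = aligned_ref.replace("-", "")
    new_start = start + shift
    return ref_bases[:new_start] + "-" * length + ref_bases[new_start:]


print(snap_insertion("AGTATACATAT------AGGACCAGGG"))
# AGTATACATATA------GGACCAGGG
```

Moving the gap changes which read bases count as inserted, so a real fix would probably also need to update the alignment's coordinates or CIGAR string, which this sketch does not attempt.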