When to use --no_force_align #2

@sbliven

It's unclear to me when to use the --no_force_align option to ProGraph. The README describes it as:

do not force alignment of initial Methionine

What's the scientific motivation for skipping initial M by default?

I ask because of a potential bug in the interaction with the --repeat option, which matches the sequences to a T-Reks output alignment. T-Reks files reference positions in the original sequence, so every coordinate is off by one if the initial M was stripped.
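To make the off-by-one concrete, here is a minimal Python illustration. It is not ProGraph's code: the function `region_from_treks`, the toy sequence, and the assumption that the default behaviour simply drops a leading M from the stored sequence are all hypothetical.

```python
def region_from_treks(seq: str, start: int, end: int) -> str:
    """Return the residues at 1-based, inclusive positions start..end of seq."""
    return seq[start - 1:end]

original = "MKTAYIAKQR"   # toy sequence, numbered as T-Reks sees it
stripped = original[1:]   # hypothetical stored copy after the initial M is dropped

print(region_from_treks(original, 2, 4))  # KTA (the intended repeat)
print(region_from_treks(stripped, 2, 4))  # TAY (shifted by one residue)
```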

I can think of several possible solutions:

  • Default to --no_force_align when the --repeat option is also specified.
  • For each sequence, store a flag indicating whether it has been truncated. If so, account for that when reading in the repeats file.
  • Be more permissive when verifying the FASTA/T-REKS alignment. Automatically recover from off-by-one errors in the coordinates. (This would have the side benefit of supporting malformed T-Reks files that used 0-based indexes rather than the correct 1-based positions.)
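A rough sketch of how the second and third options could be combined, in Python. Everything here is hypothetical: `StoredSequence`, `to_index`, and `locate_repeat` are illustrative names rather than ProGraph's API, and the repeat string is assumed to be T-Reks alignment text in which '-' marks gaps.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoredSequence:
    residues: str    # sequence as stored, possibly after stripping the initial M
    truncated: bool  # option 2: True if a leading M was removed

def to_index(seq: StoredSequence, pos_1based: int) -> int:
    """Map a 1-based coordinate on the original sequence to a 0-based index into seq.residues."""
    return pos_1based - 1 - (1 if seq.truncated else 0)

def locate_repeat(seq: StoredSequence, start: int, repeat: str) -> Optional[int]:
    """Option 3: check the expected offset first, then tolerate a shift of one residue.

    Returns the 0-based index where the ungapped repeat matches, or None.
    """
    ungapped = repeat.replace("-", "")  # assumed: '-' is the only gap character
    if not ungapped:
        return None
    expected = to_index(seq, start)
    for shift in (0, -1, 1):
        i = expected + shift
        if i >= 0 and seq.residues[i:i + len(ungapped)] == ungapped:
            return i
    return None
```

Trying the expected offset first leaves well-formed files unaffected. The ±1 fallback could pick the wrong position in low-complexity regions, so it may be worth logging a warning whenever a shift is applied.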
