
ABIF file with lowercase PBAS.2 field #8

@project-defiant


Hello,
Thank you for developing and maintaining this package.
I have run into an issue with some specific ABIF files.
The file is read correctly by sangerseqR::read.abif, but converting the resulting abif object to a sangerseq object produces an empty sequence. The cause is the lowercase letters in the PBAS.2 field.

$PBAS.1
[1] "NNNNNNNNNNNNNNNNNNNNNNNNNNTTCGNTNNNTTAATTNAACATAGACCATCAAGATAATCTGGAACTGACACTTTGATTTTTTCGTCCATTCTGTAACGTCCCACAAACAACTGNNCCACGGNGANGCTNNNNNAANNTCTNTTNNNNNCTTNNNNNNNNTGAAGGNANNTGNNNGANGANNNTNNATGANANTGACNNANANNNANNNNCCNGNNANNTCCTGGTANNNNNTTNNNNNNNNNNNNNTTTNCANTNNNNNNNNNNNANTTTCNNANNNNNNNNNGNTGNTNNCNNNANGANCNNNNANNNNANNNNNNNNCNNGNTANTCNNNNNNNNNNNNNNNNNNNNAn"

$PBAS.2
[1] "nnnnnnnnnnnnnnnnnnnnnnnnnnttcgntnnnttaattnaacatagaccatcaagataatctggaactgacactttgattttttcgtccattctgtaacgtcccacaaacaactgnnccacggngangctnnnnnaanntctnttnnnnncttnnnnnnnntgaaggnanntgnnngangannntnnatganantgacnnanannnannnnccngnnanntcctggtannnnnttnnnnnnnnnnnnntttncantnnnnnnnnnnnantttcnnannnnnnnnngntgntnncnnnangancnnnnannnnannnnnnnncnngntantcnnnnnnnnnnnnnnnnnnnna"

These are the PBAS sequences extracted from the sequence file. I could not find much documentation on these fields, apart from the fact that PBAS.1 is the edited sequence and PBAS.2 is the raw sequence.

Could you give some feedback on the assumptions behind using the PBAS.2 field and comparing its characters against the DNA_ALPHABET object in the following line?

basecalls1_new <- basecalls1_old[basecalls1_old %in% DNA_ALPHABET]
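Biostrings' DNA_ALPHABET holds only uppercase IUPAC codes plus "-", "+" and ".", so every lowercase call in PBAS.2 fails the %in% test and is dropped, which would explain the empty sequence. Below is a minimal sketch of one possible workaround, assuming the calls are split into single characters and that uppercasing them before filtering is acceptable. The helper filter_basecalls and the local dna_alphabet vector are hypothetical names used only for illustration (the vector stands in for Biostrings::DNA_ALPHABET so the sketch runs without Bioconductor); this is not sangerseqR's actual code.

```r
# Hypothetical stand-in for Biostrings::DNA_ALPHABET so the sketch runs
# without Bioconductor; like the real object, it holds only uppercase codes.
dna_alphabet <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                  "V", "H", "D", "B", "N", "-", "+", ".")

# Hypothetical helper: split a PBAS string into single characters,
# optionally uppercase them, then keep only characters found in the alphabet.
filter_basecalls <- function(pbas, normalize_case = TRUE) {
  calls <- strsplit(pbas, "")[[1]]
  if (normalize_case) {
    calls <- toupper(calls)
  }
  calls[calls %in% dna_alphabet]
}

raw <- "nnttcgntnn"  # first characters of the PBAS.2 string above
length(filter_basecalls(raw, normalize_case = FALSE))  # 0: all lowercase calls dropped
paste(filter_basecalls(raw), collapse = "")            # "NNTTCGNTNN"
```

Uppercasing discards any information the letter case might carry; whether PBAS.2 uses case to encode anything is not established here, so filtering without normalization (or warning on dropped characters) may be the safer alternative.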

Best regards
Szymon Szyszkowski
