Description
Hello,
Thank you for developing and maintaining this package.
I have run into an issue with some specific ABIF format files.
The file is read correctly by sangerseqR::read.abif, but converting the abif object to a sangerseq object yields an empty sequence. The cause is the lowercase letters in the PBAS.2 field.
$PBAS.1
[1] "NNNNNNNNNNNNNNNNNNNNNNNNNNTTCGNTNNNTTAATTNAACATAGACCATCAAGATAATCTGGAACTGACACTTTGATTTTTTCGTCCATTCTGTAACGTCCCACAAACAACTGNNCCACGGNGANGCTNNNNNAANNTCTNTTNNNNNCTTNNNNNNNNTGAAGGNANNTGNNNGANGANNNTNNATGANANTGACNNANANNNANNNNCCNGNNANNTCCTGGTANNNNNTTNNNNNNNNNNNNNTTTNCANTNNNNNNNNNNNANTTTCNNANNNNNNNNNGNTGNTNNCNNNANGANCNNNNANNNNANNNNNNNNCNNGNTANTCNNNNNNNNNNNNNNNNNNNNAn"
$PBAS.2
[1] "nnnnnnnnnnnnnnnnnnnnnnnnnnttcgntnnnttaattnaacatagaccatcaagataatctggaactgacactttgattttttcgtccattctgtaacgtcccacaaacaactgnnccacggngangctnnnnnaanntctnttnnnnncttnnnnnnnntgaaggnanntgnnngangannntnnatganantgacnnanannnannnnccngnnanntcctggtannnnnttnnnnnnnnnnnnntttncantnnnnnnnnnnnantttcnnannnnnnnnngntgntnncnnnangancnnnnannnnannnnnnnncnngntantcnnnnnnnnnnnnnnnnnnnna"
These are the PBAS sequences extracted from the sequence file. I could not find much specification for these fields, apart from the fact that PBAS.1 is the edited sequence and PBAS.2 is the raw sequence.
Could you provide some feedback on the assumptions behind using the PBAS.2 field and comparing it to the DNA_ALPHABET object?
sangerseqR/R/sangerseqmethods.R
Line 39 in 3664259
    basecalls1_new <- basecalls1_old[basecalls1_old %in% DNA_ALPHABET]
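To illustrate the behaviour, here is a minimal sketch in base R. It assumes the base calls are split into single characters before the filter on the referenced line is applied. `dna_alphabet` is a hand-written stand-in for Biostrings' DNA_ALPHABET (uppercase IUPAC codes plus gap symbols), used so the snippet runs without Biostrings. `pbas2` is a shortened excerpt of the PBAS.2 value above. The `toupper()` step is only one possible workaround, not necessarily how the package should handle it.

```r
# Stand-in for Biostrings::DNA_ALPHABET (assumption: uppercase letters only)
dna_alphabet <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                  "V", "H", "D", "B", "N", "-", "+", ".")

# Shortened lowercase excerpt of the PBAS.2 field shown above
pbas2 <- "nnttcgntnnnttaattnaacatag"

# Assumed preprocessing: split the string into single base calls
calls <- strsplit(pbas2, "")[[1]]

# Same kind of filter as the referenced line: %in% is case-sensitive,
# so every lowercase call is dropped and the result is empty
kept_raw <- paste0(calls[calls %in% dna_alphabet], collapse = "")

# Possible workaround: normalise case before filtering
calls_upper <- toupper(calls)
kept_upper <- paste0(calls_upper[calls_upper %in% dna_alphabet], collapse = "")

print(nchar(kept_raw))    # length after filtering the lowercase calls
print(kept_upper)         # sequence recovered after upper-casing
```

With this input, filtering without case normalisation leaves nothing, while upper-casing first keeps every call.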
Best regards
Szymon Szyszkowski