Hi there
I am using PyBlast to find similar sequences (obviously) in a different genome. None of these are model organisms.
The queries are FASTA files, one per gene, each containing that gene's sequence from several strains. In total I have about 12 genes and 10 strains.
I need to get the length of the entire query, not just the length of the part of the query that aligned to the database. Do you know if that's accessible?
If not, I will have to blast each sequence in each fasta file separately, which will be slower than blasting the whole file at a time.
This is what it looks like now:
for q in qdir.glob('*.fasta'):
    bcl = BCLine6("blastn", query=q,
                  subject=db, word_size=11, evalue=0.01, outfmt="evalue sstrand")
    res = bcl.run(ncore=8, quiet=True)
    print(f'query length = {len(q.seq)}')
But of course q is a file path, not a sequence record, so q.seq does not exist and the last line fails.
And as far as I can tell, qlen is the length of the aligned part of the query, not the length of the whole query sequence.
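The only workaround I can think of is to read the full lengths from each FASTA file myself and look them up by query ID afterwards. Here is a minimal sketch of that idea. fasta_lengths is my own hypothetical helper, not part of PyBlast, and it assumes the results can include a query ID column (such as qseqid) whose values match the first word of each FASTA header.

```python
def fasta_lengths(path):
    """Map each record ID in a FASTA file to its full sequence length.

    Hypothetical helper (not part of PyBlast). The record ID is taken to be
    the first whitespace-delimited token after '>' in the header line.
    """
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                # Sequence data may be wrapped over several lines
                lengths[current] += len(line)
    return lengths
```

Inside the loop I would then call qlens = fasta_lengths(q) once per file and look up qlens[query_id] for each hit, which still lets me blast the whole file in one go. But it feels like a detour if the full length is already available somewhere in the results.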