
Retrieving the entire query sequence from a BLAST search, not just the 'local' aligned HSP #2

@vmkalbskopf


Hi there

I am using PyBlast to find similar sequences (obviously) in a different genome. None of these are model organisms.
The queries are FASTA files, one per gene, each containing that gene's sequence from different strains. So I have about 12 genes and 10 strains.
I need to get the length of the entire query, not just the length of the part of the query that aligned to the database. Do you know if that's accessible?

If not, I will have to blast each sequence in each FASTA file separately, which will be slower than blasting a whole file at once.
This is what it looks like now:

```python
for q in qdir.glob('*.fasta'):
    bcl = BCLine6("blastn", query=q, subject=db, word_size=11,
                  evalue=0.01, outfmt="evalue sstrand")
    res = bcl.run(ncore=8, quiet=True)
    print(f'query length = {len(q.seq)}')
```

But of course q is not the actual sequence record, but a file name.
And qlen is the length of the aligned query, not the length of the whole query sequence.
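A possible workaround, sketched under assumptions rather than as PyBlast's own API: keep blasting each whole FASTA file at once, read the full sequence lengths from the same file separately, and join them to the hits by query ID. The sketch uses only the standard library; `fasta_lengths` and `query_coverage` are hypothetical helper names, and it assumes the query ID in the hit table is the first whitespace-delimited word of each FASTA header (BLAST+ can rewrite some ID styles, so check this against real output). Note also that in BLAST+'s own tabular output, `qlen` is documented as the query sequence length and `length` as the alignment length, so it may be worth checking which field the wrapper's `qlen` actually maps to.

```python
from pathlib import Path


def fasta_lengths(path):
    """Return {record ID: full sequence length} for every record in a FASTA file.

    Hypothetical helper: the ID is the first whitespace-delimited word of the
    header line, which is assumed to match the query ID that BLAST reports.
    """
    lengths = {}
    current_id = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines between records
            if line.startswith(">"):
                words = line[1:].split()
                current_id = words[0] if words else ""
                lengths[current_id] = 0
            elif current_id is not None:
                # sequence lines may be wrapped, so add up every line's length
                lengths[current_id] += len(line)
    return lengths


def query_coverage(qstart, qend, full_length):
    """Fraction of the full query spanned by one HSP (1-based, inclusive coordinates)."""
    return (abs(qend - qstart) + 1) / full_length
```

For example, calling `fasta_lengths(q)` inside the existing `for q in qdir.glob('*.fasta')` loop gives every full query length for that file; if the results expose each hit's query ID, `qstart` and `qend`, `query_coverage` then relates each HSP to the whole query without running one BLAST per sequence.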
