Looking at this notebook post, we concluded that
cbai_transcriptome_v2.0.fasta
could be separated into
cbai_transcriptome_v2.1.fasta + hemat_transcriptome_v2.1.fasta
However, the sequence counts are:
cbai_transcriptome_v2.0.fasta (1.4M)
could be separated into
cbai_transcriptome_v2.1.fasta (237k) + hemat_transcriptome_v2.1.fasta (30k)
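To double-check counts like these, one option is to count FASTA header lines (lines starting with `>`), which is what `grep -c "^>" file.fasta` does. The Python sketch below does the same thing; it assumes each record has exactly one `>` header line and that the three files sit in the working directory (the paths are an assumption, not taken from the post).

```python
from pathlib import Path


def count_fasta_records(lines):
    """Count FASTA records by counting header lines (lines starting with '>').

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    return sum(1 for line in lines if line.startswith(">"))


def count_file(path):
    """Open a FASTA file and count its records."""
    with open(path) as handle:
        return count_fasta_records(handle)


if __name__ == "__main__":
    # File names from the post; assumed to be in the current directory.
    for name in (
        "cbai_transcriptome_v2.0.fasta",
        "cbai_transcriptome_v2.1.fasta",
        "hemat_transcriptome_v2.1.fasta",
    ):
        if Path(name).exists():
            print(f"{name}\t{count_file(name)}")
```

Counting headers rather than lines matters because sequences are often wrapped over several lines, so a plain line count would overstate the number of records.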
Q: Together the two v2.1 files hold ~267k sequences, roughly 1.13M fewer than v2.0. Should there be this big a "loss" of sequences given the approach?
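One way to see where the missing sequences went is to compare sequence IDs between v2.0 and the two v2.1 files. The sketch below assumes (unverified) that the v2.1 files kept the original v2.0 IDs, taken here as the first whitespace-delimited token after `>`. It reports IDs dropped from both v2.1 files, plus any v2.1 IDs absent from v2.0; a large second set would suggest the headers were rewritten and that an ID comparison is not meaningful. The function names are hypothetical helpers, not part of any existing pipeline.

```python
from pathlib import Path


def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def compare_split(parent_ids, *child_id_sets):
    """Return (dropped, unexpected) for a parent FASTA split into children.

    dropped: IDs in the parent that appear in none of the children.
    unexpected: IDs in a child that never appeared in the parent.
    """
    kept = set().union(*child_id_sets)
    return parent_ids - kept, kept - parent_ids


if __name__ == "__main__":
    # Parent first, then the files it was split into; paths are assumed.
    names = [
        "cbai_transcriptome_v2.0.fasta",
        "cbai_transcriptome_v2.1.fasta",
        "hemat_transcriptome_v2.1.fasta",
    ]
    if all(Path(n).exists() for n in names):
        id_sets = []
        for n in names:
            with open(n) as handle:
                id_sets.append(fasta_ids(handle))
        dropped, unexpected = compare_split(id_sets[0], *id_sets[1:])
        print(f"dropped: {len(dropped)}\tunexpected: {len(unexpected)}")
```

If the IDs do match, the dropped set is exactly the sequences that would need to be explained by the filtering steps used to build v2.1.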