Implementation of soft-clipping and CIGAR computation#25
Implementation of soft-clipping and CIGAR computation#25haghshenas wants to merge 23 commits intoCOMBINE-lab:developfrom
Conversation
|
I looked at the four commits and all the changes look great! I tested the soft-clipping with a couple of small examples and I found a case with the For these targets, when running the command Also in the case of running without the I have some fixes which I can commit, but we can also have a discussion before going forward. |
1868d67 to
d7f38a9
Compare
src/PufferfishAligner.cpp
Outdated
| // aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | ||
| // aconf.allowSoftclip = mopts->allowSoftclip; | ||
| aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | ||
| aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
We might need to have endBonus in the aconf struct anymore.
src/PufferfishAligner.cpp
Outdated
| // aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | ||
| // aconf.allowSoftclip = mopts->allowSoftclip; | ||
| aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | ||
| aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
Again, we might need to have endBonus in the aconf struct anymore.
There was a problem hiding this comment.
I have removed it from here as well.
|
Based on my review and the conversations we had about the changes before, everything looks good to me. |
…d overhang cases happening together
85c702c to
cf54093
Compare
Here is a short summary of the changes:
end_bonusvalue from 10 to 5KSW_EZ_EXTZ_ONLYflag for KSW2 extension alignments (otherwise it always backtrack from the end of both sequences)CIGARGenerator's variable type to enable computation of CIGAR for cases with overlapping MEMs (uint32_ttoint32_t)--allowSoftclipand--allowOverhangSoftclipwith--computeCIGAR--debug; if not used, no debug information will be printedWith these changes I compared the output of PuffAligner with Minimap2 for single-end reads. This comparison showed that generally, the alignment is computed correctly in most of the cases.
The next essential step: In some cases the alignment is wrong. The reason is that the extension is stopped (ez->stopped==1) and therefore, the score for reaching the end of query is not calculated properly. As a result, it might be the case that score for reaching end of query is incorrectly higher than the maximum scoring prefix alignment. Here is an example:
It seems that extension stopping was not written with soft-clipping and computation of CIGAR strings in mind, so this requires a fix.
I will check if the original KSW2 extension fixes this issue and if it does, we can discuss about whether it's worth to have the stopping feature or not.
So next items:
end_bonusvalue