Fixed bug: Updated parse_gtdbtk.Snakefile #19

Open
shiraz-shah wants to merge 1 commit into Russel88:main from shiraz-shah:main

Conversation

@shiraz-shah

MAGinator had a bug caused by the default mmseqs gene clustering mode. With that default, gene fragments from incomplete assemblies ended up as their own gene clusters. This inflated the total number of gene clusters, with unforeseen downstream consequences for signature gene selection and abundance estimation.

We have fixed this bug by changing the mmseqs clustering mode to coverage mode 1, so gene fragments do not end up as separate clusters, but instead get merged with their full-length counterparts.
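To illustrate why the coverage mode matters here, the following is a simplified sketch, not MMseqs2's actual implementation. It assumes coverage is just alignment length divided by sequence length, that mode 0 requires both sequences to be covered, and that mode 1 requires only the target (here, the fragment) to be covered. The function name, the lengths, and the 0.8 threshold are all hypothetical.

```python
def passes_coverage(aln_len, query_len, target_len, cov_mode, min_cov=0.8):
    """Simplified coverage check, loosely modelled on mmseqs coverage modes 0 and 1.

    Assumption: coverage = aligned length / sequence length.
    """
    query_cov = aln_len / query_len
    target_cov = aln_len / target_len
    if cov_mode == 0:
        # Both query and target must be covered by the alignment.
        return min(query_cov, target_cov) >= min_cov
    if cov_mode == 1:
        # Only the target must be covered by the alignment.
        return target_cov >= min_cov
    raise ValueError("this sketch only models coverage modes 0 and 1")


# Hypothetical case: a 100-residue fragment aligning end to end
# against its 300-residue full-length counterpart.
full_len, frag_len, aln_len = 300, 100, 100

bidirectional = passes_coverage(aln_len, full_len, frag_len, cov_mode=0)
target_only = passes_coverage(aln_len, full_len, frag_len, cov_mode=1)
```

Under these assumptions the fragment fails the bidirectional check, because it covers only a third of the full-length gene, but passes the target-only check, which is the behavior the fix relies on.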

In addition, the mmseqs clustering workflow has been changed from easy-linclust to easy-cluster. easy-linclust relies on a number of heuristics that trade accuracy for speed, and easy-cluster is fast enough without them (20 minutes for a deep 500-sample metagenome data set).
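As a rough sketch of what such an invocation could look like, the snippet below assembles an `mmseqs easy-cluster` command line with `--cov-mode 1`. The file names, the helper function, and the `-c` and `--min-seq-id` thresholds are hypothetical placeholders, since the description above does not state them; the values in the actual Snakefile may differ.

```python
import shlex


def build_easy_cluster_cmd(genes_fasta, out_prefix, tmp_dir,
                           min_cov=0.8, min_seq_id=0.95):
    """Build an mmseqs easy-cluster command using coverage mode 1.

    The threshold defaults are illustrative assumptions, not values from the PR.
    """
    return [
        "mmseqs", "easy-cluster", genes_fasta, out_prefix, tmp_dir,
        "--cov-mode", "1",                # coverage is required of the target only
        "-c", str(min_cov),               # minimum coverage fraction (assumed)
        "--min-seq-id", str(min_seq_id),  # minimum sequence identity (assumed)
    ]


# Hypothetical input and output names.
cmd = build_easy_cluster_cmd("genes.fna", "gene_clusters", "tmp")
command_line = shlex.join(cmd)
```

Building the command as a list keeps the arguments easy to pass to `subprocess.run` or to paste into a Snakemake `shell:` directive after joining.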

Changed mmseqs gene clustering to coverage mode 1, so gene fragments do not end up as separate clusters.