Fixed bug: Updated parse_gtdbtk.Snakefile by shiraz-shah · Pull Request #19 · Russel88/MAGinator

shiraz-shah · 2025-01-13T21:16:19Z

MAGinator had a bug caused by the default mmseqs gene clustering mode. Because of it, gene fragments from incomplete assemblies ended up as their own gene clusters. This inflated the total number of gene clusters, with unforeseen downstream consequences for signature gene selection and abundance estimation.

We have fixed this bug by changing the mmseqs clustering mode to coverage mode 1, so gene fragments do not end up as separate clusters, but instead get merged with their full-length counterparts.
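To illustrate why the coverage mode matters, here is a simplified Python model of how an alignment-coverage threshold behaves under different `--cov-mode` settings. It is a sketch, not MMseqs2's implementation: the function names, the sequence lengths, and the 0.8 threshold are illustrative assumptions, and the full-length gene is assumed to be the query (cluster representative) with the fragment as the target.

```python
def coverage(aln_len: int, query_len: int, target_len: int, cov_mode: int) -> float:
    """Simplified model of alignment coverage under three coverage modes."""
    if cov_mode == 0:
        # Bidirectional: alignment measured against the longer sequence.
        return aln_len / max(query_len, target_len)
    if cov_mode == 1:
        # Target coverage: alignment measured against the target only.
        return aln_len / target_len
    if cov_mode == 2:
        # Query coverage: alignment measured against the query only.
        return aln_len / query_len
    raise ValueError(f"unsupported cov_mode in this sketch: {cov_mode}")


def joins_cluster(aln_len: int, query_len: int, target_len: int,
                  cov_mode: int, threshold: float = 0.8) -> bool:
    """True if the target would pass the coverage threshold (assumed 0.8)."""
    return coverage(aln_len, query_len, target_len, cov_mode) >= threshold


# Hypothetical example: a 300 nt fragment aligning end to end
# against its 900 nt full-length counterpart.
FULL_LEN, FRAG_LEN, ALN_LEN = 900, 300, 300

fragment_merged_mode0 = joins_cluster(ALN_LEN, FULL_LEN, FRAG_LEN, cov_mode=0)
fragment_merged_mode1 = joins_cluster(ALN_LEN, FULL_LEN, FRAG_LEN, cov_mode=1)
```

In this model the fragment covers only a third of the longer sequence, so it fails the bidirectional check and would form its own cluster, whereas under target coverage it is fully covered and merges with the full-length gene.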

In addition, the mmseqs clustering workflow has been changed from easy-linclust to easy-cluster, because the latter is fast enough (20 minutes for a deep 500-sample metagenome data set), while easy-linclust employs a number of heuristics to improve speed at the cost of accuracy.
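As a rough sketch of what the changed invocation might look like, the Python snippet below assembles an `mmseqs easy-cluster` command line with `--cov-mode 1`. The helper name and the file names (`genes.fna`, `clusters`, `tmp`) are hypothetical, and the actual rule in `parse_gtdbtk.Snakefile` may pass additional options not shown here. The command is only built, not executed.

```python
import shlex


def build_mmseqs_command(input_fasta: str, output_prefix: str, tmp_dir: str,
                         workflow: str = "easy-cluster",
                         cov_mode: int = 1) -> list[str]:
    """Assemble an mmseqs clustering command as an argument list.

    Positional arguments follow the easy-cluster / easy-linclust layout:
    input FASTA, output prefix, temporary directory.
    """
    return [
        "mmseqs", workflow,
        input_fasta, output_prefix, tmp_dir,
        "--cov-mode", str(cov_mode),
    ]


# Hypothetical file names, for illustration only.
cmd = build_mmseqs_command("genes.fna", "clusters", "tmp")
cmd_string = shlex.join(cmd)
print(cmd_string)
```

Building the command as a list keeps the arguments safely separated if it is later passed to `subprocess.run`; in a Snakemake rule the same string would normally sit in the `shell:` directive.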

Changed mmseqs gene clustering to coverage mode 1, so gene fragments do not end up as separate clusters.

Update parse_gtdbtk.Snakefile

0833d3b

shiraz-shah wants to merge 1 commit into Russel88:main from shiraz-shah:main
