Feature: Support LJA for genome size estimation in helper command #59
Closed
MrTomRod wants to merge 1 commit into rrwick:main from
This adds an optional `--assembler` argument to the `autocycler helper genome_size` command, allowing users to choose between Raven (default) and LJA for genome size estimation. LJA offers significantly faster performance for PacBio HiFi reads compared to Raven (approx. 10x faster in testing), making it a valuable alternative for large datasets.
- Modified `src/main.rs` to parse the new `--assembler` flag.
- Updated `src/helper.rs` to dispatch to `genome_size_raven` or `genome_size_lja` based on the argument.
Owner
Hi Thomas, Thanks for this! Some thoughts/observations:
I'm curious why you got much better speed with LJA than I did. What was your read set like? How deep? How long were the reads? How big was the genome?
My inclination is not to merge this PR, since it adds a bit of complexity to the tool and I'm not sure it's needed. I think Raven is a good default choice (often faster than LJA in my tests), and if users really need faster genome size estimation, they can use LRGE. But am I missing something?
Thanks,
Contributor
Author
I didn't realize that LJA is sometimes slower than Raven. I never observed this, but I'm always using PacBio HiFi reads.
Feel free to reject the PR. I also tried LRGE and am using that now, too. Happy holidays!
Description
This PR modifies the `genome_size` helper command to support LJA as an alternative assembler. Previously, the command hardcoded the use of Raven.
Motivation: LJA is significantly faster than Raven (though it may only be used for PacBio HiFi data?)
Usage Example:
Performance
Benchmarking on a test dataset shows LJA is approximately 6x faster in wall-clock time and 9x faster in CPU time compared to Raven:
Note: LJA support is currently limited to PacBio HiFi reads.
Changes
- Added `--assembler` argument to `autocycler helper genome_size`.
- If `--assembler lja` is passed, it calls `genome_size_lja`.
- If `--assembler raven` (or nothing) is passed, it defaults to `genome_size_raven`.
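The dispatch described in the changes above could look roughly like the following. This is a minimal, standard-library-only sketch and not the code from this PR: the `Assembler` enum, the `genome_size` wrapper, and the signatures and string return values of the two stubbed functions are assumptions made for illustration. Only the names `genome_size_raven` and `genome_size_lja` and the accepted values `raven` and `lja` come from the description.

```rust
use std::str::FromStr;

/// Hypothetical enum for the value of `--assembler`; Raven is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Assembler {
    #[default]
    Raven,
    Lja,
}

impl FromStr for Assembler {
    type Err = String;

    /// Accepts exactly "raven" or "lja"; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "raven" => Ok(Assembler::Raven),
            "lja" => Ok(Assembler::Lja),
            other => Err(format!(
                "unknown assembler '{other}' (expected 'raven' or 'lja')"
            )),
        }
    }
}

// Stubs standing in for the real functions in src/helper.rs.
// Their real signatures are not shown in the PR, so these are assumed.
fn genome_size_raven(reads: &str) -> String {
    format!("raven:{reads}")
}

fn genome_size_lja(reads: &str) -> String {
    format!("lja:{reads}")
}

/// Hypothetical wrapper: picks the estimator from the optional flag value,
/// falling back to Raven when the flag is absent.
fn genome_size(reads: &str, assembler: Option<&str>) -> Result<String, String> {
    let assembler = match assembler {
        Some(name) => name.parse::<Assembler>()?,
        None => Assembler::default(),
    };
    Ok(match assembler {
        Assembler::Raven => genome_size_raven(reads),
        Assembler::Lja => genome_size_lja(reads),
    })
}
```

In this sketch, `genome_size("reads.fastq", None)` takes the Raven path, `genome_size("reads.fastq", Some("lja"))` takes the LJA path, and an unrecognised name returns an error instead of silently falling back to the default.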