Updated DNA blocks property in Library class. #4
… the blocks from different parents were being assigned different overhangs when I ran the generate_libraries function if the parents had different amino acids at the breakpoints, leading to incompatible assemblies. The updated function takes into account the amino acids at the breakpoints for all the parents, so all the blocks for all the homologues can be made cross-compatible.
Good catch! Thanks for the PR. I haven't looked at this code for a while, so I'll need a bit to remember how it works. There seem to be a lot of changes that aren't immediately relevant to the bug. Can you explain why these changes are needed in this specific commit? Regardless, it looks like this code needs some TLC with four years of hindsight/experience. I'll get around to that eventually, but let me know if you'd like to chat about how you use this package.
Hiya, thanks for getting back to me so quickly! I appreciate this is a slightly older codebase, but I found the idea of GG-compatible SCHEMA fragments super useful, so I wanted to send over my fix when I noticed the issue with the overhangs.

The main change was making sure the overhangs were always compatible. Basically, the original code was determining the overhang codon based on the amino acid of each parent sequence individually. If the parents had different amino acids at a breakpoint, it generated different, incompatible overhangs for the same junction. I spotted this because it was the case for the assembly I was testing: the output FASTA with the gene blocks contained a lot of fragments that were incompatible for assembly.

The updated function now searches for a single “design” codon that works for that junction across homologues. It checks all possible amino acids from all parent sequences at that specific breakpoint position until it finds one whose codons can satisfy the overhang pattern. This means every DNA block for that junction gets the same, compatible overhang.

Thank you for the kind offer! I'm finding the package very useful. I’m trying to link it through to our original DNA sequences and some scripts we have for designing Golden Gate primers to save on ordering DNA, which is why it’s so useful compared to a lot of SCHEMA tooling that just assumes you want to order whole genes.
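For readers following along, the search described in this comment can be sketched roughly as below. This is an illustration only, not the code in the PR: the function names `matches` and `find_design_codon`, the use of `N` as a wildcard in the overhang pattern, and the three-entry codon table are all assumptions made for the example.

```python
# Hypothetical sketch of the "single design codon per junction" idea.
# Only three amino acids are listed here; a real run would need the full
# standard genetic code.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}


def matches(codon, pattern):
    """Return True if codon fits pattern; 'N' in pattern matches any base."""
    return len(codon) == len(pattern) and all(
        p in ("N", c) for c, p in zip(codon, pattern)
    )


def find_design_codon(parent_aas, pattern, codon_table=CODONS):
    """Pick one codon for a junction, shared by every parent.

    parent_aas: the amino acid each parent has at the breakpoint.
    pattern: the bases the overhang requires from this codon (assumed format).
    Returns (amino_acid, codon) for the first match, or None if no parent
    amino acid has a codon that fits.
    """
    # dict.fromkeys removes duplicate amino acids while keeping parent order.
    for aa in dict.fromkeys(parent_aas):
        for codon in codon_table.get(aa, []):
            if matches(codon, pattern):
                return aa, codon
    return None


if __name__ == "__main__":
    # Parents disagree at the breakpoint (S vs. G); one codon is chosen once
    # for the junction instead of once per parent.
    print(find_design_codon(["S", "G"], "GGN"))
```

The point of the sketch is that the codon is chosen once per junction from the pooled amino acids, rather than once per parent, so every block at that junction is built against the same overhang.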
Thanks for the explanations! IIRC, shouldn't the overhangs already be determined by the time you call …? I'll keep investigating.

If you have the time, could you submit a minimal reproducible example? Either in a bug report issue or attached to this PR.

The extraneous changes make a lot of sense, but they're distracting from the fix itself. Please reduce your PR to the minimum change necessary. I'd be happy to accept your refactoring PRs if you're willing to submit them later, but I'll work on cleaning up the code anyway. Alternatively, feel free to make whatever changes you'd like to your fork, and make sure to double-check the output! The newer version should be much more understandable.
I found issues with some sequences: when I ran the generate_libraries function, blocks from different parents were assigned different overhangs whenever the parents had different amino acids at a breakpoint, leading to incompatible assemblies. This was the case when I ran the example in the Quickstart Guide on my sequences.
The updated version takes into account the amino acids at the breakpoints for all the parents, so all the blocks for all the homologues now have matching overhangs.
I have added an assembly step to the test_dna_blocks unit test to help confirm that, when all the blocks from a single parent are re-assembled, the generated sequence matches that parent.
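As a rough illustration of what such an assembly check can look like (not the actual test in this PR), the sketch below joins blocks whose adjacent ends share an overhang and compares the result with the parent. It assumes the blocks have already been trimmed to their overhangs, that overhangs are 4 nt long, and that the helper name `assemble` and the toy sequences are hypothetical.

```python
OVERHANG_LEN = 4  # assumed overhang length for this sketch


def assemble(blocks, overhang_len=OVERHANG_LEN):
    """Join blocks in order, merging the overhang shared by neighbours.

    Raises ValueError if two adjacent blocks do not carry the same overhang,
    which is the incompatibility this PR is meant to prevent.
    """
    if not blocks:
        return ""
    assembled = blocks[0]
    for i, block in enumerate(blocks[1:], start=1):
        if assembled[-overhang_len:] != block[:overhang_len]:
            raise ValueError(f"overhang mismatch between blocks {i - 1} and {i}")
        # The shared overhang is already present at the end of `assembled`.
        assembled += block[overhang_len:]
    return assembled


if __name__ == "__main__":
    # Toy parent sequence and three blocks cut from it with 4-nt overlaps.
    parent = "ATGGCTAGCGGTTAA"
    blocks = ["ATGGCTAG", "CTAGCGGT", "CGGTTAA"]
    assert assemble(blocks) == parent
```

A check of this shape fails loudly when two neighbouring blocks were designed with different overhangs, so it covers both the round trip back to the parent and the junction compatibility.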