Remove memoise dependency and optimize AllSplitPairings matrix calculation #145
Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
```r
tapply(unevenPairs["nTotal", ], unevenPairs["VoI", ], sum)
unevenPairs <- matrix(
# For i in 2:largestSmallSplit
#TODO: Make faster by not calculating bottom triangle
```
@copilot great job! I'll merge this; could you take a look at this todo, and come up with an approach that only calculates the necessary half of the symmetrical matrix? Please check that any changes still produce the correct results.
I've implemented the optimization to only calculate the upper triangle of the symmetrical matrix. The key insight was that swapping inA and inB produces identical VoI and nTotal values, so we can:
- Calculate only pairs where `inA <= inB` (upper triangle).
- Use a multiplier: 1× for diagonal elements (`inA == inB`), 2× for off-diagonal elements (`inA != inB`).
This cuts the number of calculations by roughly 50% (only the upper triangle, including the diagonal, is evaluated) while producing identical results. Verified with extensive testing comparing the original and optimized implementations: all results match exactly.
Changes in commit 70bf1e0.
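The upper-triangle idea with a 1×/2× multiplier can be sketched in a few lines of R. This is an illustration only, not the code from the commit: `pairStats()` is a hypothetical placeholder for the real per-pair VoI/nTotal calculation inside `AllSplitPairings`, chosen solely because it is symmetric in `inA` and `inB`, and `FullTotals()` / `UpperTotals()` are likewise hypothetical names. The sketch tallies `nTotal` by VoI over the full grid and over the upper triangle, so the two can be compared.

```r
# Hypothetical placeholder for the real per-pair calculation in
# AllSplitPairings(); the only property this sketch relies on is that the
# result is unchanged when inA and inB are swapped.
pairStats <- function(inA, inB) {
  c(VoI = abs(inA - inB) + inA * inB,  # symmetric placeholder "distance"
    nTotal = choose(inA + inB, inA))   # symmetric placeholder count
}

# Original approach: evaluate every (inA, inB) pair in the full grid.
FullTotals <- function(n) {
  grid <- expand.grid(inA = seq_len(n), inB = seq_len(n))
  stats <- mapply(pairStats, grid$inA, grid$inB)  # 2 x n^2 matrix
  tapply(stats["nTotal", ], stats["VoI", ], sum)
}

# Optimized approach: upper triangle only, weighted by a 1x/2x multiplier.
UpperTotals <- function(n) {
  grid <- expand.grid(inA = seq_len(n), inB = seq_len(n))
  grid <- grid[grid$inA <= grid$inB, ]              # keep inA <= inB
  stats <- mapply(pairStats, grid$inA, grid$inB)   # 2 x n(n+1)/2 matrix
  multiplier <- ifelse(grid$inA == grid$inB, 1, 2) # diagonal once, mirrored pairs twice
  tapply(stats["nTotal", ] * multiplier, stats["VoI", ], sum)
}
```

Because every off-diagonal pair has a mirror image with the same VoI and nTotal, doubling its contribution reproduces the full-grid totals. For `n = 6`, the upper-triangle version evaluates 21 pairs instead of 36.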
Just as a heads-up, I was blocked by some firewall rules while working on your feedback.
This PR removes the dependency on the `memoise` package by replacing it with a lightweight, custom R-based memoization implementation in the `AllSplitPairings` function, and optimizes the matrix calculation by leveraging symmetry.

Changes Made
Function Implementation:
- Replaced `memoise(function(n) { ... })` with `local({ cache <- new.env(); function(n) { ... } })`
- Used `exists()`, `get()`, and `assign()` for memoization

Dependencies:
- Removed `memoise` from the DESCRIPTION Imports section
- Removed `@importFrom memoise memoise` from the function documentation
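The `local()` + `new.env()` memoization pattern listed under Function Implementation can be sketched as follows. This is a minimal illustration rather than the actual `AllSplitPairings` code: `SlowComputation()`, `MemoisedComputation()`, and the `callCount` counter are hypothetical names, and the placeholder body stands in for the real calculation.

```r
# Hypothetical stand-in for the expensive body of AllSplitPairings(n);
# the counter only exists so the sketch can show when work is actually done.
callCount <- 0
SlowComputation <- function(n) {
  callCount <<- callCount + 1
  n * (n + 1) / 2  # placeholder result
}

# Environment-based memoization: `cache` is created once, when local() runs,
# and is captured by the returned closure, so it persists between calls.
MemoisedComputation <- local({
  cache <- new.env()
  function(n) {
    key <- as.character(n)  # environment entries are looked up by name
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache, inherits = FALSE))
    }
    result <- SlowComputation(n)
    assign(key, result, envir = cache)
    result
  }
})
```

Passing `inherits = FALSE` stops `exists()` and `get()` from falling back to enclosing environments, since `new.env()` by default has the calling frame as its parent. Creating the cache with `new.env(parent = emptyenv())` would be another way to get the same isolation.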
Technical Details
The optimization leverages the symmetric nature of the calculation, in which swapping `inA` and `inB` produces identical VoI (Variation of Information) and nTotal values. By calculating only the upper triangle (`inA ≤ inB`) and doubling the contributions of off-diagonal elements (`inA ≠ inB`), we achieve an approximately 50% reduction in calculations while maintaining mathematical correctness.

Verification
The implementation has been thoroughly tested:
All existing tests pass, performance characteristics are improved, and the package no longer requires the `memoise` dependency.

Fixes #144.