Skip to content

Remove memoise dependency and optimize AllSplitPairings matrix calculation#145

Merged
ms609 merged 2 commits intomainfrom
copilot/fix-144
Aug 21, 2025
Merged

Remove memoise dependency and optimize AllSplitPairings matrix calculation#145
ms609 merged 2 commits intomainfrom
copilot/fix-144

Conversation

Copy link
Contributor

Copilot AI commented Aug 21, 2025

This PR removes the dependency on the memoise package by replacing it with a lightweight, custom R-based memoization implementation in the AllSplitPairings function, and optimizes the matrix calculation by leveraging symmetry.

Changes Made

Function Implementation:

  • Replaced memoise(function(n) { ... }) with local({ cache <- new.env(); function(n) { ... } })
  • Uses environment-based caching with exists(), get(), and assign() for memoization
  • Optimized matrix calculation to only compute the upper triangle since the matrix is symmetric
  • Added multiplier logic: diagonal elements (inA == inB) use multiplier=1, off-diagonal elements use multiplier=2
  • Maintains identical mathematical results and API behavior

Dependencies:

  • Removed memoise from DESCRIPTION Imports section
  • Removed @importFrom memoise memoise from function documentation
  • NAMESPACE automatically updated to remove memoise import

Benefits

  • Reduced dependencies: Eliminates an external package dependency
  • Improved performance: ~50% fewer calculations by leveraging matrix symmetry
  • Maintained caching: Cache retrieval is 50-100x faster than recomputation
  • Identical behavior: All existing tests pass unchanged with identical mathematical results
  • Memory efficient: Lightweight implementation with isolated cache scope

Technical Details

The optimization leverages the symmetric nature of the calculation where swapping inA and inB produces identical VoI (Variation of Information) and nTotal values. By calculating only the upper triangle (inA ≤ inB) and doubling contributions for off-diagonal elements (inA ≠ inB), we achieve approximately 50% reduction in calculations while maintaining mathematical correctness.

Verification

The implementation has been thoroughly tested:

# Mathematical correctness maintained - identical to original
AllSplitPairings(6)
#>                0                1 1.33333333333333 1.74837083261218 
#>              100              480              360              480 
#> 1.83659166810898 1.91829583405449 
#>              360              720 

# Caching works correctly
system.time(AllSplitPairings(8))  # ~1-2ms first call
system.time(AllSplitPairings(8))  # ~0ms cached call

# Error handling preserved  
AllSplitPairings(3)  # Still throws "No informative splits with < 4 taxa"

All existing tests pass, performance characteristics are improved, and the package no longer requires the memoise dependency.

Fixes #144.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
Copilot AI changed the title [WIP] Remove dependency on memoise in AllSplitPairings Remove memoise dependency by implementing custom memoization in AllSplitPairings Aug 21, 2025
Copilot AI requested a review from ms609 August 21, 2025 14:00
tapply(unevenPairs["nTotal", ], unevenPairs["VoI", ], sum)
unevenPairs <- matrix(
# For i in 2:largestSmallSplit
#TODO: Make faster by not calculating bottom triangle
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot great job! I'll merge this; could you take a look at this todo, and come up with an approach that only calculates the necessary half of the symmetrical matrix? Please check that any changes still produce the correct results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the optimization to only calculate the upper triangle of the symmetrical matrix. The key insight was that swapping inA and inB produces identical VoI and nTotal values, so we can:

  1. Calculate only pairs where inA <= inB (upper triangle)
  2. Use a multiplier: 1x for diagonal elements (inA == inB), 2x for off-diagonal elements (inA != inB)

This achieves ~50% reduction in calculations while maintaining identical mathematical results. Verified with extensive testing comparing original vs optimized implementations - all results match exactly.

Changes in commit 70bf1e0.

@ms609 ms609 merged commit 35768f5 into main Aug 21, 2025
1 check passed
Copy link
Contributor Author

Copilot AI commented Aug 21, 2025

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Copilot AI changed the title Remove memoise dependency by implementing custom memoization in AllSplitPairings Remove memoise dependency and optimize AllSplitPairings matrix calculation Aug 21, 2025
Copilot AI requested a review from ms609 August 21, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove dependency on memoise in AllSplitPairings

2 participants