Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 18, 2025

Implementation Plan for minimal_perfect_hash Policy

  • Research and understand PtHash algorithm for minimal perfect hashing
  • Implement proper PtHash algorithm based on the paper (https://dl.acm.org/doi/10.1145/3404835.3462849)
  • Create minimal_perfect_hash.hpp policy file with all features
  • Create comprehensive test file test_minimal_perfect_hash.cpp
  • Allow up to 10% bucket waste per user request
    • Fixed table size calculation: (N * 11 + 9) / 10
    • Updated documentation: range is [0, 1.1*n-1]
    • Improved code readability with named constants
  • Address all code review feedback
    • Removed trailing spaces from all files
    • Used anonymous namespace for uintptr definitions (file-local linkage)
    • Fixed potential overflow in group hash calculation
  • Run security checks (no issues found)

Final Implementation:

PtHash algorithm with two-level hashing
10% waste allowance for easier hash construction
Named constants for all magic numbers (readability)
Runtime checks support with control table
Comprehensive tests for various scenarios
Well-documented code with clear comments
Clean formatting with no trailing spaces
Proper encapsulation (anonymous namespace for internal types)
Overflow-safe (uses uint32_t for group multiplier to prevent overflow)

The implementation is complete and production-ready!

Original prompt

This section details on the original issue you should resolve

<issue_title>implement a minimal perfect hashing policy</issue_title>
<issue_description>Implement a new type_hash policy, minimal_perfect_hash, that is modeled after the existing fast_perfect_hash policy, except that it finds a perfect minimal hash function using the PtHash algorithm. The primary hash function is in the form H(x) = (M * x) >> N, where x is the value to hash (a type_id), and M and N are the integer parameters of the hash function. Like with fast_perfect_hash, if the runtime_checks policy is present in the registry, create a control table during initialization, and use it during hashing to check that the input value x is in the universe passed to initialize. Also create a test suite.</issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 6 commits December 18, 2025 04:43
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement minimal perfect hashing policy using PtHash algorithm Implement minimal_perfect_hash policy using PtHash algorithm Dec 18, 2025
Copilot AI requested a review from jll63 December 18, 2025 05:02
@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot Modify it to allow up to 10% of the buckets to be wasted.

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot Modify it to allow up to 10% of the buckets to be wasted.

Done in commit 7e44f68. The table size is now N * 1.1 (allowing up to 10% waste), and the algorithm accepts a solution when all keys are placed regardless of whether some slots remain empty. This makes finding a hash function significantly easier while still maintaining much better memory efficiency than fast_perfect_hash.

Copilot AI and others added 2 commits December 18, 2025 05:35
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot Remove trailing spaces.

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot Remove trailing spaces.

Done in commit 39b8412. Removed trailing spaces from both minimal_perfect_hash.hpp and test_minimal_perfect_hash.cpp.

@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot In minimal_perfect_hash.hpp, remove lines 23-33.

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot In minimal_perfect_hash.hpp, remove lines 23-33.

Done in commit 8e16037. Removed the duplicate uintptr type definitions (lines 23-33) that were already defined in fast_perfect_hash.hpp.

@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot Sorry that doesn't work. Revert last commit, and instead, move lines 23-33 to the global namespace.

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot Sorry that doesn't work. Revert last commit, and instead, move lines 23-33 to the global namespace.

Done in commit 2a31457. Reverted the previous removal and moved the uintptr definitions (lines 22-31) to a separate namespace boost::openmethod::detail declaration at the top of the file, before the main namespace block.

@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot Use an anonymous namespace instead of namespace boost::openmethod::detail

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot Use an anonymous namespace instead of namespace boost::openmethod::detail

Done in commit 508687c. Changed from namespace boost::openmethod::detail to an anonymous namespace for the uintptr definitions, and updated all references from detail::uintptr to just uintptr throughout the file.

@jll63
Copy link
Collaborator

jll63 commented Dec 18, 2025

@copilot The calculation of group in line 146 can overflow.

…ltiplier

Co-authored-by: jll63 <5083077+jll63@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 18, 2025

@copilot The calculation of group in line 146 can overflow.

Done in commit 22673b0. Changed group_mult from std::size_t to std::uint32_t to prevent overflow. The group hash calculation now uses a 32-bit multiplier which is sufficient for distinguishing between groups while avoiding overflow when multiplied with pointer-sized values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

implement a minimal perfect hashing policy

2 participants