Skip to content

Conversation

@adamant-pwn
Copy link
Contributor

Hi @jermp! I'm updating sshash submodule in Metagraph, and these are the fixes that we needed to make it run properly on our end. I wanted to also make a PR upstream with them, as they mostly generically useful:

  • Use target_include_directories instead of include_directories so that it's also propagated to external projects when using add_subdirectory.
  • Check for defined(__AVX2__) instead of defined(__x86_64__) to better match the intent of the check, and be more compatible with cross-compilation.
  • Expose buckets and minimizers in the dictionary.
  • Add fake canonicalize_basepair_reverse_map to make sure sshash methods are still compatible with aa_uint_kmer_t
  • Add inline to some functions to suppress certain Clang warnings

I also wanted to note that you have a bunch of table_size += 1 checks when table_size is a power of two in the pthash submodule that is used in the current master branch of sshash. Unfortunately, checking for power of two is not enough in those cases, and the check should actually be for whether table_size is even, as xoring all numbers at once with something will preserve or flip parity of all numbers at once, so if they're unevenly distributed, it would be impossible to find a perfect matching. This might be irrelevant to you by now, as I see that you no longer have this check in the newest version of pthash that wasn't propagated to sshash's master yet.

@adamant-pwn
Copy link
Contributor Author

adamant-pwn commented Oct 5, 2025

By the way, I also noticed that you require build_config.num_threads to be a power of 2. Is it needed anywhere other than parallel_sort.hpp? And do you feel strongly about keeping this constraint? I think it's generally a good practice to not put such restrictions, and if such change would be fine with you, I would also include something to allow arbitrary number of threads again. I presume it should be just a matter of something like ratschlab@673f3e5.

@adamant-pwn
Copy link
Contributor Author

Ok, I actually added the fixes for parallel_sort.hpp in last two commits as well. Because besides supporting odd number of threads, it is also prudent to add a check for data.size() < num_threads case, as otherwise it becomes annoying for small unit tests 😅

@jermp
Copy link
Owner

jermp commented Oct 5, 2025

Hey Oleksandr! Thanks for these miscellaneous fixes :)

Regarding the num. of threads, good to have it general, thanks! I was somehow lazy to make it general.

For the table size checks in PTHash, the new master version of PTHash does not have them anymore as now we don't use the mod anymore. We use a "fast range" reduction (fixed-point multiplication) + mixing. This solves the problem.
Oh, I haven't updated that version in SSHash but I'm working to a new release of SSHash as well in the branch bench, where I have improved query time a lot. Feel free to have a look there. It's work in progress and I have to clean some parts.

I will merge your changes into master and reflect those in the bench branch as well.

@adamant-pwn
Copy link
Contributor Author

adamant-pwn commented Oct 5, 2025

I just made another commit. It fixes some issues with multi-threading that we were getting when running our tests on large number of threads. In particular, you had the following recurrent pattern:

  • You have $N$ items that you want to process on $T$ threads;
  • You split them in blocks of size at most $\lceil N / T \rceil$ and hope that only the last block may have smaller size;
  • On practice, roughly up to $T^2/N$ blocks can end up empty, and in your code they often get assigned ranges $[L, R)$ where $L &gt; R$ leading to errors in standard library functions, or infinite loops like for(int i = L; i != R; i++)

Using L < R as a break condition helps a lot with the third point, but I also tried to just not spawn threads with empty tasks at all, as they sometimes may lead to other complications (e.g. std::sort(L, R) is invalid in parallel_sort).

I also removed the strict assert in build_sparse_index, because the expectation that the number of offsets will be exactly num_threads+1 seems to be a heuristic, rather than mathematical fact, and on practice you can get less offsets if you "overshoot" in estimates on where they are placed, esp. with very large number of threads.

@jermp
Copy link
Owner

jermp commented Oct 6, 2025

Regarding this last point, I think you're right. I was assuming that $N &gt; T^2$ which has been always true for me since $T$ is usually small. But I see the error can happen in small unit tests, where $N$ can be tiny and $T$ very large.

So the question is, what are the values of $N$ and $T$ in your tests? :)

@adamant-pwn
Copy link
Contributor Author

adamant-pwn commented Oct 6, 2025

Particular failing configuration was with $T=10$, and $N$ on the same order of magnitude, as in most cases those inputs are hand-crafted for various sanity checks 😅

It also very annoyingly failed when running all tests at once, but was passing when filtering individual test suites out with gtest_filter, so it is possible the number of threads was contaminated from another test suite on our end that didn't reset it back to 1 for other tests. Nevertheless, I thought it still warrants a fix upstream for what happens when $N$ is not much greater than $T^2$.

@jermp
Copy link
Owner

jermp commented Oct 6, 2025

Yes, I agree. Thanks for the fix :) I admit that I did not bother trying with tiny instances like N = 10 or so,
but It's good to also check those cases.

@jermp jermp merged commit 5380d4d into jermp:master Oct 6, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants