Add OpenMP Union Find #79

Open
WingCode wants to merge 7 commits into quantumgizmos:main from WingCode:add-openmp-union-find

Conversation

@WingCode
WingCode commented Jun 8, 2025

closes #73

@quantumgizmos
Owner

Thank you for the PR. Could you provide benchmarks to show the performance under different numbers of cores for surface code decoding?

Thanks
JR

@WingCode
Author

WingCode commented Jun 9, 2025

@quantumgizmos
I have included a benchmark test case that uses 'python_test/pcms/hx_surface_20.npz'.

Here are the benchmark results:

| Threads | Time (µs) |
|---|---|
| 1 | 48.493 |
| 4 | 41.062 |
| 8 | 39.989 |

@quantumgizmos
Owner

It's reassuring to see that the time is decreasing. However, I don't think timing over a single syndrome provides a realistic benchmark.

Could you time over a number of syndromes derived from randomly generated error patterns? It would also be useful to show how the advantage of parallelisation changes with different physical error rates $p$.

Benchmarking procedure

  1. For p=0.01 sample >1000 bit strings
  2. Use the parity check matrix to calculate syndromes for each of the randomly generated bit strings.
  3. For thread_count={1,2,4,8} time how long it takes to decode all the syndromes using the UnionFind Decoder. Calculate the average time per decode call.
  4. Repeat for values of p between 0.01 and 0.09
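
The four steps above can be sketched as follows. This is a minimal, library-agnostic outline rather than the benchmark committed in this PR: `toy_pcm`, `placeholder_decode`, the function names, the `thread_count` argument and the grid of $p$ values are illustrative assumptions. Swap in the surface-code parity check matrix and the actual UnionFind decoder call.

```python
import random
import time


def sample_errors(n_bits, p, n_samples, rng):
    """Step 1: draw n_samples bit strings, each bit set with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(n_bits)]
            for _ in range(n_samples)]


def syndrome(pcm, error):
    """Step 2: syndrome = H * e (mod 2), with H given as a list of rows."""
    return [sum(h * e for h, e in zip(row, error)) % 2 for row in pcm]


def average_decode_time_us(decode, syndromes, thread_count):
    """Step 3: time all decode calls and return the mean in microseconds."""
    start = time.perf_counter()
    for s in syndromes:
        decode(s, thread_count)
    elapsed = time.perf_counter() - start
    return elapsed / len(syndromes) * 1e6


def run_benchmark(pcm, decode, error_rates, thread_counts, n_samples=1001, seed=0):
    """Step 4: repeat for every error rate; returns {p: {threads: mean_us}}."""
    rng = random.Random(seed)
    results = {}
    for p in error_rates:
        errors = sample_errors(len(pcm[0]), p, n_samples, rng)
        syndromes = [syndrome(pcm, e) for e in errors]
        results[p] = {t: average_decode_time_us(decode, syndromes, t)
                      for t in thread_counts}
    return results


# Hypothetical stand-ins: a 3-bit repetition code and a decoder that does nothing.
# Replace them with the surface-code matrix and the real decoder call.
toy_pcm = [[1, 1, 0], [0, 1, 1]]


def placeholder_decode(syndrome_bits, thread_count):
    return [0] * 3


if __name__ == "__main__":
    table = run_benchmark(toy_pcm, placeholder_decode,
                          error_rates=[0.01, 0.03, 0.05, 0.07, 0.09],
                          thread_counts=[1, 2, 4, 8])
    for p, row in table.items():
        print(p, {t: round(us, 3) for t, us in row.items()})
```

The same syndromes are reused for every thread count at a given $p$, so differences between columns reflect the decoder and not the sampled errors.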

@WingCode
Author

@quantumgizmos I have enhanced the benchmark to incorporate your feedback. Here are the results for p=0.01 and p=0.03. The p=0.05 run on a single thread is taking quite a bit of time... I will update once I have more info.

| p | 1 thread | 2 threads | 4 threads | 8 threads |
|---|---|---|---|---|
| 0.01 | 158 µs | 173 µs | 182 µs | 183 µs |
| 0.03 | 296 µs | 269 µs | 261 µs | 256 µs |

@quantumgizmos
Owner

Thanks for the new benchmarks. It seems like the parallelisation overhead is slowing the decoder. Could you review the sections where you've used the OMP critical directive to see whether further parallelisation is possible?
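
One way to make the overhead explicit is to convert the averages reported earlier in this thread into speedup relative to one thread. The sketch below only does that arithmetic; the helper names `speedup` and `efficiency` are illustrative and not part of the PR.

```python
# Average decode times in microseconds, copied from the table in this thread.
reported_us = {
    0.01: {1: 158, 2: 173, 4: 182, 8: 183},
    0.03: {1: 296, 2: 269, 4: 261, 8: 256},
}


def speedup(times_by_threads):
    """Speedup relative to one thread: T(1) / T(n). Values below 1 are slowdowns."""
    base = times_by_threads[1]
    return {n: base / t for n, t in times_by_threads.items()}


def efficiency(times_by_threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return {n: s / n for n, s in speedup(times_by_threads).items()}


for p, times in reported_us.items():
    print(p, {n: round(s, 2) for n, s in speedup(times).items()})
```

For p=0.01 every multi-thread ratio falls below 1 (158/183 is roughly 0.86 at eight threads), and for p=0.03 eight threads give only about 296/256, roughly 1.16.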

@WingCode
Author

@quantumgizmos
The latest changes have noticeably boosted the benchmark—thanks for that! I’m seeing excellent single-thread gains, but the improvement becomes less clear as we scale to more threads. Whenever you have a moment, could you share where we could focus next to squeeze out additional multithreaded performance? Your seasoned perspective would help me zero in on the right spots :)

| p | 1 thread | 2 threads | 4 threads | 8 threads |
|---|---|---|---|---|
| 0.01 | 61 µs | 58 µs | 62 µs | 55 µs |
| 0.03 | 105 µs | 103 µs | 105 µs | 105 µs |

@WingCode
Author

@quantumgizmos Also, just a gentle reminder that Monday was mentioned as the last day for reviewing open PRs. If there are any specific changes or clarifications you'd like me to make on this one, I’d be happy to take care of them promptly late Sunday. And if it’s helpful to keep iterating, I’d be more than happy to continue working on it if the issue gets assigned to me.

@quantumgizmos
Owner

Hi @WingCode. Thank you for your work on this. So that you can be issued the UnitaryReward, could you comment on issue #73? I can then assign the issue to you and the UnitaryFund will be in touch.

@quantumgizmos
Owner

If you are interested in working on this further, we can set up a call to discuss. I think we will have to rethink the way the algorithm has been implemented from the bottom up. It doesn't look like OpenMP makes much of a difference with the current data structures.
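
For context, here is a textbook union-find (disjoint-set) structure. It is a generic sketch and not the implementation in this repository; the class and method names are illustrative. It shows one plausible reason such structures resist loop-level parallelism: both `find` and `union` write to the same shared arrays, so concurrent calls would need synchronisation.

```python
class DisjointSet:
    """Textbook union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] == i marks a cluster root
        self.size = [1] * n           # size is only meaningful at roots

    def find(self, x):
        # First pass: walk up to the root of x's cluster.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): this writes to the shared array
        # even though the caller only asked a question.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same cluster
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach the smaller tree to the larger
        self.size[ra] += self.size[rb]
        return True


ds = DisjointSet(5)
ds.union(0, 1)
ds.union(1, 2)
print(ds.find(0) == ds.find(2), ds.find(3) == ds.find(0))
```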

@WingCode
Author

> If you are interested in working on this further, we can set up a call to discuss. I think we will have to rethink the way the algorithm has been implemented from the bottom up. It doesn't look like OpenMP makes much of a difference with the current data structures.

I'd be happy to continue working on this! Is there a preferred channel where I can reach out to you?

@quantumgizmos
Owner

@WingCode Great! For future discussions, my email is joschka@roffe.eu

Development

Successfully merging this pull request may close these issues.

Add OpenMP Support to Union-Find Decoder