Dear cuQuantum maintainers, this is a very general question regarding cuQuantum integration with other libraries. It was mentioned by @leofang in an earlier discussion thread that NVIDIA has internal code that integrates cuQuantum with PyTorch, and that the same could be done with JAX. Have any internal benchmarks been done on the speed difference between these two approaches? Is one likely to gain a lot on the speed front from using JAX + cuQuantum compared with PyTorch + cuQuantum, especially with cuStateVec? Or is the main bottleneck of state-vector simulation already alleviated by cuQuantum itself, so that the speed difference between the two approaches would be minimal? Thanks in advance for any insights on this.
Replies: 2 comments 5 replies
Hi @wcqc, can you clarify what is meant by "integration" in this context? Can you provide an example workload utilizing cuQuantum together with one of these libraries? It would be useful to know how elements of cuQuantum's software would collaborate with other libraries, to better steer any discussion regarding expectations around performance.
Sorry for the wrong link; indeed, discussion/72 was the correct thread.
If assumptions (1, 2) above are broken, does that mean we get less benefit from cuQuantum (as it currently stands), due to the extra overhead induced when assumptions (1, 2) no longer hold?
The Qiskit example is really useful here, since AFAIK it doesn't use JAX. Suppose we implement a hybrid model in which all the assumptions (1, 2, 3, 4) hold, once in Qiskit and once in JAX (+ cuQuantum) taking full advantage of JIT, XLA, etc. As rough guidance, do you see major speed advantages in taking the JAX route compared with Qiskit? We fully appreciate the caveat that the best comparison can only be obtained by coding up the models and benchmarking them, but any rough insights would be much appreciated at this stage.
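For concreteness, here is a minimal NumPy sketch (no cuQuantum or JAX; all names are illustrative) of the kind of parameterized-circuit workload we have in mind. The gate-application inner loop is the part a cuStateVec-style backend would offload to the GPU, whether driven from Qiskit, PyTorch, or JAX:

```python
import numpy as np

def apply_1q(state, gate, target, n_qubits):
    # Apply a single-qubit gate to one axis of the state tensor.
    # This inner kernel is what a cuStateVec-style backend would
    # replace with a GPU gate-application call.
    state = state.reshape([2] * n_qubits)
    state = np.tensordot(gate, state, axes=([1], [target]))
    state = np.moveaxis(state, 0, target)
    return state.reshape(-1)

def rx(theta):
    # Single-qubit rotation about X.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

n = 3
thetas = [0.1, 0.2, 0.3]

# Forward pass: |0...0> through a layer of RX gates.
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0
for q, th in enumerate(thetas):
    state = apply_1q(state, rx(th), q, n)

# Expectation of Z on qubit 0 (qubit 0 is the most-significant
# axis in this reshape layout).
probs = np.abs(state) ** 2
z0 = np.where(np.arange(2**n) < 2**(n - 1), 1.0, -1.0)
expval = float(probs @ z0)
```

The framework question then reduces to how efficiently each library can batch and fuse these kernel calls around the rest of the hybrid model.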
Regarding (1, 2): the more intermingled the model is, the more time you will spend in cuQuantum (say, cuStateVec) computing intermediate gradient data in the backward pass. The more the model looks like a traditional circuit, the more opportunities there are to optimize and pipeline the circuit gradient with fewer API calls and memory transactions.
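To make the backward-pass point concrete, here is a minimal NumPy sketch of adjoint-style circuit differentiation (illustrative only; this is not the cuStateVec API). Each gradient entry costs extra gate applications on intermediate state vectors, and those kernel calls are exactly where a GPU statevector backend would spend its time in the backward pass:

```python
import numpy as np

def apply_1q(state, gate, target, n_qubits):
    # Apply a single-qubit gate to one axis of the state tensor.
    state = state.reshape([2] * n_qubits)
    state = np.tensordot(gate, state, axes=([1], [target]))
    state = np.moveaxis(state, 0, target)
    return state.reshape(-1)

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def drx(theta):
    # d/dtheta of rx(theta).
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return 0.5 * np.array([[-s, -1j * c], [-1j * c, -s]])

n = 3
thetas = [0.1, 0.2, 0.3]

# Forward pass.
psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0
for q, th in enumerate(thetas):
    psi = apply_1q(psi, rx(th), q, n)

# Diagonal observable: Z on qubit 0 (most-significant axis here).
z0 = np.where(np.arange(2**n) < 2**(n - 1), 1.0, -1.0)

# Backward (adjoint) pass: every gradient entry costs extra gate
# applications on the two intermediate state vectors lam and b --
# these are the "intermediate gradient data" kernel calls.
lam = z0 * psi
b = psi.copy()
grads = [0.0] * n
for q in reversed(range(n)):
    g = rx(thetas[q])
    b = apply_1q(b, g.conj().T, q, n)        # rewind the circuit
    mu = apply_1q(b, drx(thetas[q]), q, n)   # insert gate derivative
    grads[q] = 2 * np.real(np.vdot(lam, mu))
    lam = apply_1q(lam, g.conj().T, q, n)
```

In a "traditional circuit" model, these rewind/derivative steps can be batched and pipelined with few API calls; the more the circuit is broken up by classical code, the more round trips the backward pass incurs.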