In Sunny.jl, calculating the magnon dynamic structure factor requires three temporary arrays (`corrbufq`, `Avec_prefq`, and `Avecq`). Our CUDA extension currently uses `CuDynamicSharedArray` for this scratch space:
https://github.com/MAIQMag/Sunny.jl/blob/36687014d55e6f6b07dfbc1b067530832e0c802f/ext/CUDAExt/SpinWaveTheory/DispersionAndIntensities.jl#L45-L92
Rather than duplicating this kernel for every JuliaGPU backend, we'd like a portable implementation. Could this be supported, possibly as an extension of `JACC.shared`?
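
For reference, here is a stripped-down sketch of the pattern we mean. It is not the Sunny.jl kernel: `scratch_kernel!`, `a`, `b`, `c`, `n`, and the arithmetic are hypothetical stand-ins. It only shows three scratch arrays being carved out of a single dynamic shared-memory allocation with byte offsets, with the total size chosen at launch time.

```julia
using CUDA

# Hypothetical kernel (not the Sunny.jl one): three scratch arrays share one
# dynamic shared-memory allocation, separated by byte offsets.
function scratch_kernel!(out, n)
    T = ComplexF64
    a = CuDynamicSharedArray(T, n)                   # byte offset 0
    b = CuDynamicSharedArray(T, n, n * sizeof(T))    # starts right after a
    c = CuDynamicSharedArray(T, n, 2n * sizeof(T))   # starts right after b
    i = threadIdx().x
    if i <= n
        # Each thread touches only its own slot, so no sync_threads() is needed here.
        a[i] = i
        b[i] = 2i
        c[i] = a[i] + b[i]
        out[i, blockIdx().x] = c[i]
    end
    return nothing
end

n = 32
out = CUDA.zeros(ComplexF64, n, 4)
shmem = 3 * n * sizeof(ComplexF64)   # total bytes for all three arrays
@cuda threads=n blocks=4 shmem=shmem scratch_kernel!(out, n)
```

The part that seems hard to express portably is the runtime-sized allocation (`shmem=...` at launch) combined with several arrays carved from it by offset.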