On CUDA there are two versions of `rsqrt()`:

- [`rsqrt()`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#standard-functions)
- [`__frsqrt_rn()`](https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/group__CUDA__MATH__INTRINSIC__SINGLE.html#_CPPv411__frsqrt_rnf)

`rsqrt()` is accurate to within 2 ULPs (the bound documented for the single-precision `rsqrtf()`), while `__frsqrt_rn()` is a single-precision intrinsic that should be correctly rounded (round-to-nearest-even).
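To see the difference in practice, the two functions can be run side by side on the same inputs. The following is a minimal sketch, not a benchmark: the kernel name `rsqrt_both` and the sample inputs are made up for illustration, error checking is omitted for brevity, and the double-precision host value `1.0 / sqrt(x)` is only a convenient reference, not an exact one.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates both reciprocal square roots per element.
__global__ void rsqrt_both(const float* x, float* standard, float* rounded, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        standard[i] = rsqrtf(x[i]);       // standard math function, single precision
        rounded[i]  = __frsqrt_rn(x[i]);  // intrinsic, round-to-nearest-even
    }
}

int main()
{
    const int n = 8;
    // Arbitrary sample inputs chosen for illustration.
    float h_x[n] = {0.1f, 0.5f, 1.0f, 2.0f, 3.0f, 10.0f, 123.456f, 1.0e6f};
    float h_standard[n], h_rounded[n];

    float *d_x, *d_standard, *d_rounded;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_standard, n * sizeof(float));
    cudaMalloc(&d_rounded, n * sizeof(float));

    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    rsqrt_both<<<1, n>>>(d_x, d_standard, d_rounded, n);
    cudaMemcpy(h_standard, d_standard, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_rounded, d_rounded, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        // Double-precision host reference for comparison.
        double ref = 1.0 / std::sqrt(static_cast<double>(h_x[i]));
        printf("x=%-10g rsqrtf=%.9g  __frsqrt_rn=%.9g  ref=%.9g\n",
               h_x[i], h_standard[i], h_rounded[i], ref);
    }

    cudaFree(d_x);
    cudaFree(d_standard);
    cudaFree(d_rounded);
    return 0;
}
```

Compiled with `nvcc`, this prints both results next to the reference. For many inputs the two values may well be identical; where they differ, the gap should be in the last bit or two.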