Conversation

@yhmtsai
Contributor

@yhmtsai yhmtsai commented Feb 28, 2025

This PR creates a cuSPARSE backend and adds an SpMV implementation.
It follows the current design of the GPU backend.
Compiling this target requires a node with an available NVIDIA GPU due to Thrust; otherwise it fails with `Illegal instruction (core dumped)`. I have not dug into whether there is a way to force Thrust to build with CUDA support on a CPU-only system.

Should we use CCCL rather than Thrust?
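For context, a backend like this typically wraps the cuSPARSE generic API for SpMV. The sketch below illustrates the usual call sequence (create descriptors, query a workspace size, run `cusparseSpMV`); the function name, variable names, and omitted error handling are assumptions for illustration, not the actual PR code.

```cpp
#include <cuda_runtime.h>
#include <cusparse.h>

// Sketch of CSR SpMV y = alpha * A * x + beta * y via the cuSPARSE generic
// API. d_rowptr/d_colind/d_vals/d_x/d_y are assumed to be device pointers
// already populated by the caller; error checking is omitted for brevity.
void spmv_sketch(int m, int n, int nnz, int* d_rowptr, int* d_colind,
                 float* d_vals, float* d_x, float* d_y) {
  cusparseHandle_t handle;
  cusparseCreate(&handle);

  cusparseSpMatDescr_t matA;
  cusparseCreateCsr(&matA, m, n, nnz, d_rowptr, d_colind, d_vals,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                    CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);

  cusparseDnVecDescr_t vecX, vecY;
  cusparseCreateDnVec(&vecX, n, d_x, CUDA_R_32F);
  cusparseCreateDnVec(&vecY, m, d_y, CUDA_R_32F);

  float alpha = 1.0f, beta = 0.0f;

  // The generic API requires querying an external workspace first.
  size_t buffer_size = 0;
  cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                          matA, vecX, &beta, vecY, CUDA_R_32F,
                          CUSPARSE_SPMV_ALG_DEFAULT, &buffer_size);
  void* d_buffer = nullptr;
  cudaMalloc(&d_buffer, buffer_size);

  cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, vecX,
               &beta, vecY, CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, d_buffer);

  cudaFree(d_buffer);
  cusparseDestroyDnVec(vecX);
  cusparseDestroyDnVec(vecY);
  cusparseDestroySpMat(matA);
  cusparseDestroy(handle);
}
```

The workspace buffer returned by `cusparseSpMV_bufferSize` is the kind of per-operation data the state-object discussion later in this thread is about.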

@yhmtsai yhmtsai self-assigned this Feb 28, 2025
@yhmtsai yhmtsai requested a review from BenBrock March 3, 2025 08:11
@yhmtsai yhmtsai force-pushed the dev/yhmtsai/cusparse_spmv branch from b6bfd8f to d91aac3 Compare March 3, 2025 10:13
@yhmtsai yhmtsai force-pushed the dev/yhmtsai/cusparse_spmv branch 2 times, most recently from 9ef0f35 to f06d675 Compare March 20, 2025 17:28
@yhmtsai
Contributor Author

yhmtsai commented Mar 20, 2025

@BenBrock I have updated and rebased the PR on the latest main branch. It reuses the example and test with Thrust,
and also uses the std::allocator-like cuda_allocator.
If you can add a GitHub Action with an NVIDIA GPU to check this PR, that would be great.

@BenBrock
Collaborator

BenBrock commented Mar 20, 2025

@yhmtsai Thanks, I will try to add this to the PR by your Friday morning, although I'm a bit busy today.

As for using CCCL vs. Thrust, my understanding is that Thrust is now part of CCCL, and the usage is the same (<thrust/foo.h> headers). I believe unless we want to build with the upstream version instead of the release version installed with CUDA, there should be no difference. (I think you're doing the correct thing here.)

@BenBrock BenBrock mentioned this pull request Mar 21, 2025
BenBrock
BenBrock previously approved these changes Apr 17, 2025
Collaborator

@BenBrock BenBrock left a comment

This all looks good to me. I'm happy to merge unless somebody from Nvidia (maybe @fbusato or @essex-edwards) wants to comment.

set(SPBLAS_GPU_BACKEND ON)
find_package(CUDAToolkit REQUIRED)
target_link_libraries(spblas INTERFACE CUDA::cudart CUDA::cusparse CUDA::cublas)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DSPBLAS_ENABLE_CUSPARSE")
Collaborator


Currently, we're just compiling with the host compiler, which is fine, since all of our examples and tests can use the host compiler for CUDA. If a user application compiles with CUDA (nvcc) instead of a host compiler, we also want to add the correct linking and compile flags. I think the only missing piece is adding -DSPBLAS_ENABLE_CUSPARSE to CMAKE_CUDA_FLAGS. Can @fbusato or @essex-edwards comment on what the right thing to do here is?
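One way to address this (a sketch, not necessarily what the project settled on) is to attach the definition to the interface target rather than editing compiler-specific flag variables, since `target_compile_definitions` propagates to consumers regardless of whether they compile with a host compiler or nvcc; mirroring the flag into `CMAKE_CUDA_FLAGS` is the minimal alternative:

```cmake
# Option 1 (sketch): attach the macro to the interface target so any
# consumer, host-compiled or nvcc-compiled, picks it up automatically.
target_compile_definitions(spblas INTERFACE SPBLAS_ENABLE_CUSPARSE)

# Option 2 (minimal): mirror the existing CMAKE_CXX_FLAGS edit for CUDA
# translation units compiled with nvcc.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -DSPBLAS_ENABLE_CUSPARSE")
```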

Contributor Author


I think nvcc does not support C++23, but I am not sure how complete the support in nvhpc is.
If we would like to use nvcc there, we will need to use an earlier standard.

@yhmtsai yhmtsai mentioned this pull request Apr 18, 2025
6 tasks
@yhmtsai
Contributor Author

yhmtsai commented May 7, 2025

@BenBrock I have rebased it and it needs your approval again. I think we can merge this now and continue with the other PR.

@BenBrock BenBrock self-requested a review May 9, 2025 17:09
Collaborator

@BenBrock BenBrock left a comment


Approving this PR.

One thing we do need to reconcile soon is:

  1. Whether passing in a state object is optional. It's currently optional in all the non-GPU vendor implementations, and I think it should be optional, particularly for simple operations like SpMV.

  2. What the naming of the state object is. We agreed (I believe) in our last meeting that we want one state object, so something like operation_state_t. This is what's currently implemented in all the non-GPU vendor implementations. This doesn't mean we can't have an spmv_state_t object if you want to use that in the implementation, but it needs to be type-erased and stored inside an operation_state_t.

The device examples currently use spmv_state_t, so I think we can leave this as-is until it gets changed, probably as part of my Intel GPU PR.

@yhmtsai
Contributor Author

yhmtsai commented May 15, 2025

I think operation_state_t is a collection of several op_state_t objects, mainly for ease of use for the user in the end.
I do not think it is the main part we want to provide through the API. It mixes the data for different operations, which may confuse people if we want to claim some state is reusable. For example, operation_state_t can be reusable for spgemm(a, b, c), but which one of norm(a), norm(b), norm(c) can be reused?

@yhmtsai yhmtsai merged commit 8e4ad01 into main May 15, 2025
12 checks passed
@yhmtsai yhmtsai deleted the dev/yhmtsai/cusparse_spmv branch May 15, 2025 15:39