
@saikubairkota

This is the first attempt at implementing a linear system solver for CUDA cores, i.e., for solving linear systems on NVIDIA GPUs. We have implemented two types of solvers, CUDADenseSolver and CUDASparseSolver, each designed for the purpose its name suggests.

The CUDADenseSolver is built on the cuSolverDN library, while the CUDASparseSolver uses the cuDSS library. Both solvers support 32-bit and 64-bit floating-point data and rely on direct methods (Cholesky or LU) for solving the linear system, which the user must choose according to the matrix type.
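
For reference, a minimal sketch of what the Cholesky path through cuSolverDN looks like; the function name and the `d_A`/`d_b` variables are hypothetical, and error checking is elided:

```cpp
#include <cusolverDn.h>

// solve A x = b via Cholesky; d_A is n x n, column-major, SPD, and d_b holds
// b on entry and x on exit (all already resident on the device)
void cholesky_solve(int n, double * d_A, double * d_b)
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    // query and allocate the factorization workspace
    int lwork = 0;
    cusolverDnDpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, n, &lwork);
    double * d_work = nullptr;
    cudaMalloc(&d_work, lwork * sizeof(double));
    int * d_info = nullptr;
    cudaMalloc(&d_info, sizeof(int));

    // factor A = L * L^T, then solve L * L^T * x = b in place on d_b
    cusolverDnDpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_work, lwork, d_info);
    cusolverDnDpotrs(handle, CUBLAS_FILL_MODE_LOWER, n, /* nrhs = */ 1, d_A, n, d_b, n, d_info);

    cudaFree(d_info);
    cudaFree(d_work);
    cusolverDnDestroy(handle);
}
```
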

Currently, the implementations are verified and tested with a single MPI process (CPU process) and a single GPU instance. Other configurations, such as multiple MPI processes with a single GPU or with multiple GPUs, will require further modifications to handle data exchange and load distribution and to ensure correct solver behavior.

saikubairkota and others added 30 commits June 6, 2025 18:19
…implementation

This is the first attempt at implementing a CUDA-based dense solver. The code doesn't compile yet, since we have not yet made the necessary modifications to the build system to use the CUDA compiler. We will make those changes in the next commit.
… checking for validity

Removed the negative index comparison before adding the element to the stiffness matrix: the size_t type is unsigned anyway and can never be negative, so the comparison is redundant.
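
For illustration, the shape of the redundant check (variable names hypothetical):

```cpp
// before: index is a size_t, so `index >= 0` is always true
if (index >= 0 && index < size) {
    matrix(index) += value;
}

// after: the upper-bound comparison alone suffices
if (index < size) {
    matrix(index) += value;
}
```
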
… compilation error

For some reason, when CUDADenseSolver.cu is compiled with the nvcc compiler, it fails with the error:

```
/usr/local/pyre/include/pyre/tensor/traits.h:30:18: error: expected identifier before ‘>’ token
   30 |         template <int N>
      |                  ^
/usr/local/pyre/include/pyre/tensor/traits.h:41:18: error: expected identifier before ‘>’ token
   41 |         template <int N>
      |                  ^
/usr/local/pyre/include/pyre/tensor/algebra.icc:1232:50: error: expected identifier before ‘>’ token
 1232 | template <pyre::tensor::square_matrix_c matrixT>
      |                                                  ^
```

Apparently, the nvcc compiler is not able to compile some of the template code in the pyre library that we use in CUDADenseSolver.cu to define the real type (the constrained template parameter in algebra.icc, for instance, uses C++20 concepts syntax). This could be because nvcc doesn't support all the features of the C++ standard. We hardcoded mito::real to double for now to avoid this compilation error. We should figure out why this is happening and fix it later.
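
The workaround amounts to something like this (a sketch; the actual alias name in the source may differ):

```cpp
// temporary workaround: nvcc chokes on the pyre template code behind mito::real,
// so hardcode the underlying real type for now
// using real_type = mito::real;   // fails to compile under nvcc
using real_type = double;          // hardcoded until the nvcc issue is resolved
```
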
…DenseSolver instances

Enabled APIs and factories to create CUDADenseSolver instances in the CUDA backend. We also corrected the CUDADenseSolver instantiation in the factories.h file.
…eric support

Replaced the double data type with mito::real in the CUDADenseSolver to support generic types requested by the user.

Note that the code now compiles only with nvcc compatibility changes in the pyre library, as some C++20 functionalities, such as lambda functions, are not supported by the CUDA compiler.
…ent data types

Added the cusolver_traits struct to support double and float data type implementations in the CUDADenseSolver.
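
A sketch of what such a traits struct can look like, here specialized for the Cholesky routines (the member names are hypothetical; the cuSOLVER entry points themselves are real):

```cpp
#include <cusolverDn.h>

template <typename T>
struct cusolver_traits;

// map each supported scalar type to the matching cuSOLVER functions
template <>
struct cusolver_traits<double> {
    static constexpr auto potrf_bufferSize = cusolverDnDpotrf_bufferSize;
    static constexpr auto potrf = cusolverDnDpotrf;
    static constexpr auto potrs = cusolverDnDpotrs;
};

template <>
struct cusolver_traits<float> {
    static constexpr auto potrf_bufferSize = cusolverDnSpotrf_bufferSize;
    static constexpr auto potrf = cusolverDnSpotrf;
    static constexpr auto potrs = cusolverDnSpotrs;
};
```

The solver can then call `cusolver_traits<real_type>::potrf(...)` without branching on the scalar type.
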
…ter in constructor

Introduced the `SolverType` enum to distinguish between different solver types in the CUDA backend. Added solver type as a parameter in the `CUDADenseSolver` constructor so the user can choose the solver type at runtime.
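
A sketch of the shape of this interface (signature details are hypothetical):

```cpp
// the direct methods the user can choose from
enum class SolverType { CHOLESKY, LU };

template <typename realT>
class CUDADenseSolver {
  public:
    // the solver type is picked at construction time
    CUDADenseSolver(int n, SolverType type) : _n(n), _type(type) {}

  private:
    int _n;
    SolverType _type;
};
```
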
… host

We are storing the matrix in column-major order on the host, since the cuSolver library expects matrices in column-major order; this way we avoid doing a transpose later on the GPU.
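
Concretely, for an n x n matrix stored flat, entry (i, j) lands at a column-major offset (names hypothetical):

```cpp
// column-major: entry (row i, column j) of an n x n matrix lives at j * n + i,
// which is the layout cuSolver (like LAPACK) expects
host_matrix[j * n + i] += value;
```
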
Added checks with tolerances, as the results were very close but not exactly equal, which is expected for floating-point operations.
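
I.e., something along these lines (the tolerance value and the `x_computed`/`x_expected` names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// compare against the expected solution with a tolerance rather than exact equality
constexpr double tolerance = 1e-12;
assert(std::abs(x_computed[i] - x_expected[i]) < tolerance);
```
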
… {mito} headers

This is achieved, for example, by ensuring that class {CUDADenseSolver} uses {double} instead of {mito::real} as the underlying {real_type}.
… {real_type}

This ensures that the solver can still be instantiated with a {mito::real} substitution, while keeping mito objects and CUDA objects compiled separately, the former with {gcc} and the latter with {nvcc}.
The template argument of class {CUDADenseSolver} is now required to be a type representing a real value.
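
A sketch of the constraint, here using the standard {std::floating_point} concept (the actual concept in the source may differ):

```cpp
#include <concepts>

// the template argument must be a type representing a real value
template <std::floating_point realT>
class CUDADenseSolver {
  public:
    using real_type = realT;
    // ...
};
```
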
…ds in header to introduce default arguments

Moved the constructors and methods of `CUDADenseSolver` into the header so we could introduce default arguments. There are also a few formatting edits in this commit.
Made the CUDADenseSolver a derived class of CUDASolver to leverage common functionality and improve code reuse.
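
The resulting hierarchy, in sketch form (member details are hypothetical):

```cpp
#include <concepts>
#include <cuda_runtime.h>

// common CUDA resource handling lives in the base class
template <std::floating_point realT>
class CUDASolver {
  protected:
    cudaStream_t _stream;
    // ... shared buffers, handles, utilities ...
};

// the dense solver inherits the shared machinery
template <std::floating_point realT>
class CUDADenseSolver : public CUDASolver<realT> {
    // ... cuSolverDN-specific factorization and solve ...
};
```
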
…ice memory

Similar to the HostArray class, we added a DeviceArray class to make allocating and deallocating device memory easier to manage.
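
A minimal sketch of such an RAII wrapper (the real class likely has more functionality; error checking is elided):

```cpp
#include <cstddef>
#include <cuda_runtime.h>

template <typename T>
class DeviceArray {
  public:
    // allocate device memory on construction
    explicit DeviceArray(std::size_t size) : _size(size) {
        cudaMalloc(&_data, size * sizeof(T));
    }
    // free it on destruction, so no manual cleanup is needed
    ~DeviceArray() { cudaFree(_data); }
    // non-copyable: the wrapper owns the allocation
    DeviceArray(const DeviceArray &) = delete;
    DeviceArray & operator=(const DeviceArray &) = delete;

    T * data() { return _data; }
    std::size_t size() const { return _size; }

  private:
    T * _data = nullptr;
    std::size_t _size = 0;
};
```
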
…of Host, Device Arrays & moved few common members to base class

We made multiple changes in the CUDASolver and CUDADenseSolver classes in this commit:

1. We moved a few common methods and attributes to the base class CUDASolver so we can reuse them in the future for the CUDASparseSolver class.
2. Used the HostArray and DeviceArray classes for managing host and device memory, respectively (see the usage sketch after this list). This change removed a lot of code clutter related to memory management.
3. Removed many of the methods that freed/initialized memory, as the HostArray and DeviceArray classes now manage memory automatically.
4. A few other methods and attributes that are no longer needed were also removed.
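
A usage sketch of the two wrappers together (names hypothetical; assumes a `HostArray` with the same interface as the `DeviceArray` sketched above):

```cpp
// allocate the right-hand side on host and device; both are freed automatically
HostArray<double> rhs(n);
DeviceArray<double> d_rhs(n);

// ... fill rhs on the host ...

// stage the data on the device for the solver
cudaMemcpy(d_rhs.data(), rhs.data(), n * sizeof(double), cudaMemcpyHostToDevice);
```
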
…r implementation

This is the first attempt at implementing a sparse linear solver in CUDA. We used an Eigen sparse matrix to store the system matrix on the host side. The memory arrays we developed for the dense solver were reused to store the right-hand side and solution vectors on both the host and the device. We used the cuDSS library to solve the linear system.
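
As a sketch of how the pieces fit together (names hypothetical; error checking and device transfers elided): a compressed row-major Eigen sparse matrix already exposes the CSR arrays that cuDSS consumes, and cuDSS then solves in three phases.

```cpp
#include <Eigen/Sparse>

// assemble the system matrix on the host, then compress to CSR
int n = 100;  // illustrative system size
Eigen::SparseMatrix<double, Eigen::RowMajor> A_host(n, n);
// ... insert coefficients ...
A_host.makeCompressed();

const int * row_offsets = A_host.outerIndexPtr();   // n + 1 entries
const int * col_indices = A_host.innerIndexPtr();   // nnz entries
const double * values = A_host.valuePtr();          // nnz entries

// after copying these arrays to the device and wrapping them in cuDSS matrix
// objects, the solve proceeds in three phases:
//   cudssExecute(handle, CUDSS_PHASE_ANALYSIS, config, data, A, x, b);
//   cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
//   cudssExecute(handle, CUDSS_PHASE_SOLVE, config, data, A, x, b);
```
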