CUDA solver: First attempt at linear system solver for CUDA cores #89
Open: saikubairkota wants to merge 35 commits into main from cuda-solver
Conversation
…implementation This is the first attempt at implementing a CUDA-based dense solver. The code does not compile yet, since we have not made the necessary build-system modifications to use the CUDA compiler. We will make those changes in the next commit.
… checking for validity Removed the negative index comparison before adding the element to the stiffness matrix: the index is of type size_t, which is unsigned and can never be negative, so the comparison was redundant.
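A minimal illustration of why the removed check was dead code (the helper name here is hypothetical, not from the codebase):

```cpp
#include <cstddef>

// size_t is unsigned, so a guard like (index < 0) can never fire: a "negative"
// value wraps around to a huge positive one instead. Compilers typically warn
// that the comparison is always false.
inline bool is_negative(std::size_t index)
{
    return index < 0;  // always false for an unsigned type
}
```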
… compilation error
For some reason, when CUDADenseSolver.cu is compiled with nvcc, compilation fails with the error:
```
/usr/local/pyre/include/pyre/tensor/traits.h:30:18: error: expected identifier before ‘>’ token
30 | template <int N>
| ^
/usr/local/pyre/include/pyre/tensor/traits.h:41:18: error: expected identifier before ‘>’ token
41 | template <int N>
| ^
/usr/local/pyre/include/pyre/tensor/algebra.icc:1232:50: error: expected identifier before ‘>’ token
1232 | template <pyre::tensor::square_matrix_c matrixT>
| ^
```
Apparently, nvcc is unable to compile some of the template code in the pyre library that CUDADenseSolver.cu uses to define the real type. This could be because nvcc does not yet support all of the C++20 language features the library relies on; the failing line in algebra.icc, for instance, uses a concept-constrained template parameter. We hardcoded mito::real to double for now to avoid this compilation error. We should figure out why this is happening and fix it properly later.
…DenseSolver instances Enabled APIs and factories to create CUDADenseSolver instances in the CUDA backend. We also corrected the CUDADenseSolver instantiation in the factories.h file.
…eric support Replaced the double data type with mito::real in the CUDADenseSolver to support the generic types requested by the user. Note that the code now compiles only with nvcc-compatibility changes in the pyre library, since some of the C++20 functionality it uses, such as certain lambda expressions, is not supported by the CUDA compiler.
…ent data types Added the cusolver_traits struct to support double and float data type implementations in the CUDADenseSolver.
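A sketch of the traits pattern such a struct typically follows: one primary template plus one specialization per supported scalar type, so the solver can write `cusolver_traits<T>::...` and dispatch to the right routine at compile time. The member shown here is an illustrative stand-in; the real struct would reference cuSolverDN entry points such as `cusolverDnSpotrf` (float) and `cusolverDnDpotrf` (double).

```cpp
#include <string>

// Primary template: only the explicit specializations below are usable,
// so instantiating with an unsupported scalar type fails at compile time.
template <typename T>
struct cusolver_traits;

template <>
struct cusolver_traits<float> {
    // stand-in payload; the real struct would expose the routine itself
    static std::string potrf_name() { return "cusolverDnSpotrf"; }
};

template <>
struct cusolver_traits<double> {
    static std::string potrf_name() { return "cusolverDnDpotrf"; }
};
```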
…ter in constructor Introduced the `SolverType` enum to distinguish between different solver types in the CUDA backend. Added solver type as a parameter in the `CUDADenseSolver` constructor so the user can choose the solver type at runtime.
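A hypothetical sketch of what this might look like; the enumerator names and the class shape are illustrative stand-ins, not the PR's exact API:

```cpp
// Stand-in declarations so this sketch compiles on its own.
enum class SolverType { CHOLESKY, LU };

template <typename realT>
class CUDADenseSolver {
public:
    // the solver type is a runtime choice made at construction:
    // Cholesky for symmetric positive definite systems, LU for general ones
    explicit CUDADenseSolver(SolverType type) : _type(type) {}
    SolverType type() const { return _type; }

private:
    SolverType _type;
};
```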
… host We are storing the matrix in column-major order on the host, since the cuSolver library expects matrices in column-major order and this lets us avoid a transpose later on the GPU.
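A small sketch of the column-major indexing convention this implies (the helper name is hypothetical): element (i, j) of an m-row matrix lives at flat index j * m + i, so columns are contiguous in memory, as LAPACK-style libraries expect.

```cpp
#include <cstddef>
#include <vector>

// element (i, j) of a column-major matrix with `rows` rows
inline double col_major_at(const std::vector<double> & a,
                           std::size_t rows, std::size_t i, std::size_t j)
{
    return a[j * rows + i];
}

// the 2x3 matrix  [1 2 3]
//                 [4 5 6]
// laid out column by column
static const std::vector<double> example = {1, 4, 2, 5, 3, 6};
```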
…lving the linear system
Added checks with tolerances, as the results were very close but not bitwise equal, which is expected for floating-point operations.
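A sketch of the kind of mixed absolute/relative comparison such checks typically use; the tolerance values here are illustrative, not the ones in the test suite:

```cpp
#include <cmath>

// true if `computed` matches `expected` up to round-off: the absolute term
// covers values near zero, the relative term covers large magnitudes
inline bool approx_equal(double computed, double expected,
                         double rel_tol = 1e-9, double abs_tol = 1e-12)
{
    return std::fabs(computed - expected)
           <= abs_tol + rel_tol * std::fabs(expected);
}
```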
… {mito} headers
This is achieved, for example, by ensuring that class {CUDADenseSolver} uses {double} instead of {mito::real} as the underlying {real_type}.
… {real_type}
This ensures that the solver can still be instantiated with a {mito::real} substitution, while keeping mito objects and cuda objects compiled separately, the former with {gcc} and the latter with {nvcc}.
The template argument of class {CUDADenseSolver} is now required to be a type representing a real value.
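The split described above matches the standard extern-template pattern, sketched here collapsed into one file for brevity, with illustrative names. In the real layout, the class template and the `extern template` declaration would live in a gcc-compiled header, while the explicit instantiation would live in the nvcc-compiled CUDADenseSolver.cu, so each compiler only sees its own side.

```cpp
template <typename realT>
class CUDADenseSolver {
public:
    realT scale(realT x) const { return realT{2} * x; }
};

// header side: promise that the double instantiation exists in some other
// translation unit, suppressing implicit instantiation here
extern template class CUDADenseSolver<double>;

// CUDADenseSolver.cu side: actually provide that instantiation
template class CUDADenseSolver<double>;
```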
…ds in header to introduce default arguments Added default arguments to the constructor and methods of `CUDADenseSolver` in the header, where default arguments must be declared. There are also a few formatting edits in this commit.
… & methods common to all CUDA solvers
Made the CUDADenseSolver a derived class of CUDASolver to leverage common functionality and improve code reuse.
…memory more easily
…ice memory Similar to the HostArray class, we added a DeviceArray class to make device memory allocation and deallocation easier to manage.
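A hedged sketch of what such an RAII wrapper might look like; the member names are illustrative stand-ins, and this requires the CUDA toolkit to compile:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Allocation in the constructor, release in the destructor, so every early
// return or exception still frees the device buffer.
template <typename T>
class DeviceArray {
public:
    explicit DeviceArray(std::size_t count) : _count(count)
    {
        cudaMalloc(&_data, count * sizeof(T));
    }

    ~DeviceArray() { cudaFree(_data); }

    // forbid copies so two wrappers never free the same buffer
    DeviceArray(const DeviceArray &) = delete;
    DeviceArray & operator=(const DeviceArray &) = delete;

    T * data() { return _data; }
    std::size_t size() const { return _count; }

private:
    T * _data = nullptr;
    std::size_t _count = 0;
};
```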
…of Host, Device Arrays & moved few common members to base class We made multiple changes to the CUDASolver and CUDADenseSolver classes in this commit: 1. Moved a few common methods and attributes to the base class CUDASolver so they can be reused in the future by a CUDASparseSolver class. 2. Used the HostArray and DeviceArray classes to manage host and device memory, respectively; this removed a lot of code clutter related to memory management. 3. Removed many methods for freeing/initializing memory that are unnecessary now that HostArray and DeviceArray manage memory automatically. 4. Removed a few other methods and attributes that were no longer needed.
…ce memory for sparse matrices
…r implementation This is the first attempt at implementing a sparse linear solver in CUDA. We used an Eigen sparse matrix to store the system matrix on the host side. The memory arrays we developed for the dense solver were reused to store the right-hand side and solution vectors on both the host and device sides. We used the cuDSS library to solve the linear system.
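For reference, a minimal sketch of the compressed sparse row (CSR) layout that cuDSS consumes; with a row-major Eigen::SparseMatrix, these three arrays are exposed via `outerIndexPtr()`, `innerIndexPtr()`, and `valuePtr()`. The example matrix is illustrative.

```cpp
#include <vector>

// CSR layout of the 3x3 symmetric matrix   [10  0  2]
//                                          [ 0  5  0]
//                                          [ 2  0  8]
// row i occupies values[row_offsets[i] .. row_offsets[i+1])
static const std::vector<int>    row_offsets = {0, 2, 3, 5};
static const std::vector<int>    col_indices = {0, 2, 1, 0, 2};
static const std::vector<double> values      = {10, 2, 5, 2, 8};
```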
…tem tests for the CUDA sparse solver
…cannot be negative
…lve steps to ensure data integrity & correctness
…etric test to check if solver handles it
…ng all entries so we can test dense solver more thoroughly
This is the first attempt at implementing a linear system solver for CUDA cores, i.e., for solving linear systems on NVIDIA GPUs. We have implemented two types of solvers, CUDADenseSolver and CUDASparseSolver, each designed for the matrix type its name suggests.
The CUDADenseSolver is built on the cuSolverDN library, while the CUDASparseSolver uses the cuDSS library. Both solvers support 32-bit and 64-bit floating-point data and rely on direct solution methods (Cholesky or LU), which the user must choose according to the matrix properties: Cholesky requires a symmetric positive definite matrix, while LU works for any nonsingular matrix.
Currently, the implementations are verified and tested with a single MPI process (CPU process) and a single GPU instance. Other configurations, such as multiple MPI processes sharing a single GPU or multiple GPUs, will require further modifications to handle data exchange and load distribution and to ensure correct solver behavior.