-
Notifications
You must be signed in to change notification settings - Fork 8
Add AOCL-Sparse backend #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| private: | ||
| void release_matrix_handle(aoclsparse_matrix handle) { | ||
| if (handle != nullptr) { | ||
| aoclsparse_destroy(&handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need to set handle to nullptr here after destruction ?
| const aoclsparse_int nnzA = a_base.rowptr().data()[a_nrows]; | ||
| const aoclsparse_int nnzB = b_base.rowptr().data()[b_nrows]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we assuming 0-based indexing here ? I guess you are setting it as such in line 72 , but might be worth adding -indexingA and -indexingB to these for completeness and future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I should have had it in the first place! Thanks for pointing it out.
| if (status != aoclsparse_status_success) { | ||
| fmt::print("\t csr matrix A creation failed\n"); | ||
| } | ||
| __aoclsparse::aoclsparse_create_csr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| __aoclsparse::aoclsparse_create_csr( | |
| status = __aoclsparse::aoclsparse_create_csr( |
| if (status != aoclsparse_status_success) { | ||
| fmt::print("\t csr matrix A creation failed\n"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a need to exit if there is a failure here instead of keeping going ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say the thing to do here is throw an exception. (And also to probably clean up any temporary objects that have been created.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oneMKL SYCL backend will throw exceptions on failure, and that's the only failure mode, right? I think it's probably fine to let those exceptions go through to the user, although I guess we could also try to clean up and re-throw new spblas exceptions. (We haven't talked about specific exceptions to throw, but the standard thing to do is to come up with some inheriting from/based on the standard exception categories.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle failures instead of ignoring it. Also, it would be nice to have consistent handling across vendor backends. Re-throwing new spblas exceptions is a good idea, something to discuss in future meetings. As most of these functions return "void", returning a status is not possible as is, right?
| // Check: csrA and csrB destroyed when the operation_info destructor is | ||
| // called? | ||
|
|
||
| return operation_info_t{ | ||
| index<>{__backend::shape(c)[0], __backend::shape(c)[1]}, c_nnz, | ||
| __aoclsparse::operation_state_t{csrA, csrB, csrC}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so yes, currently the csrA and csrB that are created are being stored in your spgemm targeted but general named __aoclsparse::operation_state_t object ... and that operation_state_t object is being created on the fly and put into the operation_info_t which is being returned ... later on, we are going to have to sort out exactly when it makes sense to store csrA/csrB and or csrC in the info_t object versus when the matrix_opt object is passed in ... I started trying to untangle this a bit in another PR, but it is a little tricky ... :) For now this is fine, but it will require some modification in the future, including how to do things when the operation_info_t object is passed in to this function as well ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, yes, currently the operation_state_t object (inside the operation_info_t object) governs the lifetime of the stored csrA/csrB/csrC objects you are creating here ... That is fine for now, but with matrix_opt objects it gets more complicated :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
| const index_t ldx = x_base.extent(1); | ||
| const index_t ldy = y_base.extent(1); | ||
|
|
||
| const aoclsparse_int nnz = a_base.rowptr().data()[a_nrows]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, using -indexing here might be helpful for future when things switch a bit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
| const index_t a_ncols = __backend::shape(a_base)[1]; | ||
| const aoclsparse_int nnz = a_base.rowptr().data()[a_nrows]; | ||
|
|
||
| status = spblas::__aoclsparse::aoclsparse_create_csr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically you don't need thespblas:: part since you are already in spblas namespace here, but it is up to you whether to remove it or not :)
| __ranges::data(x_base), &beta, | ||
| __ranges::data(y)); | ||
| if (status != aoclsparse_status_success) { | ||
| fmt::print("\t SpMV failed: {}\n", (int) status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what should happen when status is not success ? should we exit cleanly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have same issues in my mkliespblas and mklsycl impls ... just never noticed it until now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed in the other comment, we should make a clean exit with an appropriate message/status/exception.
spencerpatty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it looks really good. I added a few questions and suggestions to consider. Don't take them as required, but rather a conversation about preparing for potential futures. Approved for merge once you have addressed anything you feel might be important from the comments :)
This PR implements the spmv, spmm, spgemm, sptrsv functionality for CPU with AOCL Sparse library as backend for the CSR format.
|
@mshanthagit feel free to review comments and if you want to change anything, go for it :) otherwise, you can merge (or ask one of us to merge it as well! Nice first pass at adding AOCL Sparse backend ! |
|
@spencerpatty @BenBrock thanks for reviewing. I have incorporated the suggestions, will push soon. |
|
@spencerpatty @BenBrock I have addressed most of the comments. Looks like CI is failing. Can you please help me with the merge? |
yeah we have noticed that sometimes armpl CI fails to download the arm stack on first try... I retriggered it and it succeeded second time :) merging now |
spencerpatty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved again after review changes
|
Quick question—does this run all of the tests, or are you currently opting out of certain tests like rocSPARSE does? (It looks to me like you're running everything, which, if so, way to go!) |
|
@BenBrock we are not opting out of tests knowingly, but I know CSC related build fails locally. Also, all gtests were passing before the CSC commit. Are you referring to these tests? |
|
oh good point, it looks like we are only providing CSR input right now ... might need to add another PR with extension to CSC |
|
Gotcha—in that case you might want to check what parts of the reference backend are included, as it's possible some of the tests are actually using the reference backend (as far as I can tell the mixed CSC/CSR tests are not guarded anywhere). Eventually you'll probably want to add (Note that if you can create a CSC handle natively in AOCL, it's probably best to do that—oneMKL doesn't support CSC matrices, so we represent everything as CSRs, then check whether it's transposed [e.g. actually a CSC matrix].) |
|
Oh, I just noticed that this is not in the CI. It's probably actually still broken. My bad, we really need to make sure everything in the CI. I will take a look at adding that. |
|
yeah, I wondered about that actually, why we got a clean CI but hadn't added CSC support ... Now it makes sense. CSC isn't added to CI yet. |
|
AOCL-Sparse is not installable by script (the user needs to click to accept an EULA in the browser), so I asked Geri to install it at ICL. Let me know @mshanthagit if there's some other place I can download it with |
This PR is to add AOCL-Sparse vendor backend. However this code works till the following commit:
commit d868f46
Author: Benjamin Brock <brock@cs.berkeley.edu>
Date: Mon Mar 3 10:20:37 2025 -0500
But fails to build with the latest.
Edit: Ben noticed a small error that caused ARMPL and AOCLSPARSE to intersect and break. After fix, CI passes. Spencer will review content now that it is passing CI.