Skip to content

Conversation

@mshanthagit
Copy link
Contributor

@mshanthagit mshanthagit commented Mar 7, 2025

This PR is to add AOCL-Sparse vendor backend. However this code works till the following commit:
commit d868f46
Author: Benjamin Brock <brock@cs.berkeley.edu>
Date: Mon Mar 3 10:20:37 2025 -0500

But fails to build with the latest.

Edit: Ben noticed a small error that caused ARMPL and AOCLSPARSE to intersect and break. After fix, CI passes. Spencer will review content now that it is passing CI.

@spencerpatty spencerpatty self-requested a review March 14, 2025 22:06
private:
void release_matrix_handle(aoclsparse_matrix handle) {
if (handle != nullptr) {
aoclsparse_destroy(&handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to set handle to nullptr here after destruction ?

Comment on lines 79 to 80
const aoclsparse_int nnzA = a_base.rowptr().data()[a_nrows];
const aoclsparse_int nnzB = b_base.rowptr().data()[b_nrows];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we assuming 0-based indexing here ? I guess you are setting it as such in line 72 , but might be worth adding -indexingA and -indexingB to these for completeness and future...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I should have had it in the first place! Thanks for pointing it out.

if (status != aoclsparse_status_success) {
fmt::print("\t csr matrix A creation failed\n");
}
__aoclsparse::aoclsparse_create_csr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
__aoclsparse::aoclsparse_create_csr(
status = __aoclsparse::aoclsparse_create_csr(

Comment on lines +85 to +87
if (status != aoclsparse_status_success) {
fmt::print("\t csr matrix A creation failed\n");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a need to exit if there is a failure here instead of keeping going ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the thing to do here is throw an exception. (And also to probably clean up any temporary objects that have been created.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oneMKL SYCL backend will throw exceptions on failure, and that's the only failure mode, right? I think it's probably fine to let those exceptions go through to the user, although I guess we could also try to clean up and re-throw new spblas exceptions. (We haven't talked about specific exceptions to throw, but the standard thing to do is to come up with some inheriting from/based on the standard exception categories.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle failures instead of ignoring it. Also, it would be nice to have consistent handling across vendor backends. Re-throwing new spblas exceptions is a good idea, something to discuss in future meetings. As most of these functions return "void", returning a status is not possible as is, right?

Comment on lines +108 to +113
// Check: csrA and csrB destroyed when the operation_info destructor is
// called?

return operation_info_t{
index<>{__backend::shape(c)[0], __backend::shape(c)[1]}, c_nnz,
__aoclsparse::operation_state_t{csrA, csrB, csrC}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so yes, currently the csrA and csrB that are created are being stored in your spgemm targeted but general named __aoclsparse::operation_state_t object ... and that operation_state_t object is being created on the fly and put into the operation_info_t which is being returned ... later on, we are going to have to sort out exactly when it makes sense to store csrA/csrB and or csrC in the info_t object versus when the matrix_opt object is passed in ... I started trying to untangle this a bit in another PR, but it is a little tricky ... :) For now this is fine, but it will require some modification in the future, including how to do things when the operation_info_t object is passed in to this function as well ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, yes, currently the operation_state_t object (inside the operation_info_t object) governs the lifetime of the stored csrA/csrB/csrC objects you are creating here ... That is fine for now, but with matrix_opt objects it gets more complicated :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

const index_t ldx = x_base.extent(1);
const index_t ldy = y_base.extent(1);

const aoclsparse_int nnz = a_base.rowptr().data()[a_nrows];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, using -indexing here might be helpful for future when things switch a bit ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

const index_t a_ncols = __backend::shape(a_base)[1];
const aoclsparse_int nnz = a_base.rowptr().data()[a_nrows];

status = spblas::__aoclsparse::aoclsparse_create_csr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically you don't need thespblas:: part since you are already in spblas namespace here, but it is up to you whether to remove it or not :)

__ranges::data(x_base), &beta,
__ranges::data(y));
if (status != aoclsparse_status_success) {
fmt::print("\t SpMV failed: {}\n", (int) status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what should happen when status is not success ? should we exit cleanly ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have same issues in my mkliespblas and mklsycl impls ... just never noticed it until now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed in the other comment, we should make a clean exit with an appropriate message/status/exception.

spencerpatty
spencerpatty previously approved these changes Mar 17, 2025
Copy link
Contributor

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it looks really good. I added a few questions and suggestions to consider. Don't take them as required, but rather a conversation about preparing for potential futures. Approved for merge once you have addressed anything you feel might be important from the comments :)

This PR implements the spmv, spmm, spgemm, sptrsv functionality for CPU with AOCL Sparse library as backend for the CSR format.

@spencerpatty spencerpatty added the enhancement New feature or request label Mar 17, 2025
@spencerpatty
Copy link
Contributor

@mshanthagit feel free to review comments and if you want to change anything, go for it :) otherwise, you can merge (or ask one of us to merge it as well! Nice first pass at adding AOCL Sparse backend !

@mshanthagit
Copy link
Contributor Author

@spencerpatty @BenBrock thanks for reviewing. I have incorporated the suggestions, will push soon.

@mshanthagit
Copy link
Contributor Author

@spencerpatty @BenBrock I have addressed most of the comments. Looks like CI is failing. Can you please help me with the merge?

@spencerpatty
Copy link
Contributor

@spencerpatty @BenBrock I have addressed most of the comments. Looks like CI is failing. Can you please help me with the merge?

yeah we have noticed that sometimes armpl CI fails to download the arm stack on first try... I retriggered it and it succeeded second time :) merging now

Copy link
Contributor

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved again after review changes

@spencerpatty spencerpatty merged commit 85c65ec into main Mar 20, 2025
27 of 30 checks passed
@spencerpatty spencerpatty deleted the aoclsparse branch March 20, 2025 15:27
@BenBrock
Copy link
Collaborator

BenBrock commented Mar 20, 2025

Quick question—does this run all of the tests, or are you currently opting out of certain tests like rocSPARSE does? (It looks to me like you're running everything, which, if so, way to go!)

@mshanthagit
Copy link
Contributor Author

@BenBrock we are not opting out of tests knowingly, but I know CSC related build fails locally. Also, all gtests were passing before the CSC commit. Are you referring to these tests?

@spencerpatty
Copy link
Contributor

oh good point, it looks like we are only providing CSR input right now ... might need to add another PR with extension to CSC

@BenBrock
Copy link
Collaborator

Gotcha—in that case you might want to check what parts of the reference backend are included, as it's possible some of the tests are actually using the reference backend (as far as I can tell the mixed CSC/CSR tests are not guarded anywhere).

Eventually you'll probably want to add create_matrix_handle and get_transpose functions, which will handle mixed CSR/CSC.

(Note that if you can create a CSC handle natively in AOCL, it's probably best to do that—oneMKL doesn't support CSC matrices, so we represent everything as CSRs, then check whether it's transposed [e.g. actually a CSC matrix].)

@BenBrock
Copy link
Collaborator

Oh, I just noticed that this is not in the CI. It's probably actually still broken. My bad, we really need to make sure everything in the CI. I will take a look at adding that.

@spencerpatty
Copy link
Contributor

yeah, I wondered about that actually, why we got a clean CI but hadn't added CSC support ... Now it makes sense. CSC isn't added to CI yet.

@BenBrock
Copy link
Collaborator

AOCL-Sparse is not installable by script (the user needs to click to accept an EULA in the browser), so I asked Geri to install it at ICL. Let me know @mshanthagit if there's some other place I can download it with wget or a package manager, as it would be simpler to have it self-contained a GitHub-hosted action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants