Skip to content

Conversation

@BenBrock
Copy link
Collaborator

@BenBrock BenBrock commented Feb 26, 2025

Implement csc_view, which supports CSC, as well as transposed, which currently transposes between CSR <-> CSC.

This PR:

  • Implements csc_view.
  • Adds transposed, which transposes between CSR <-> CSC.
  • Adds test for CSC-only SpGEMM
  • Adds test for CSC sparse matrix SpMM, SpMV
  • Adds backend support for CSC to ArmPL, oneMKL.

@upsj upsj mentioned this pull request Feb 27, 2025
@BenBrock BenBrock force-pushed the dev/brock/implement-transposed branch from 3da8d45 to de12da2 Compare February 27, 2025 20:14
@BenBrock BenBrock marked this pull request as ready for review March 3, 2025 19:42
@BenBrock
Copy link
Collaborator Author

BenBrock commented Mar 3, 2025

There are a few important features missing from this PR:

  • Ability to multiply mixed CSR and CSC matrices. The MKL and ArmPL backends (I believe) will currently support this, but I still need to implement these in the reference backend.
  • Ability to use column-major dense matrices (as well as mixed column/row-major matrices) for SpMM. None of the backends currently allow this, but it's an important feature (particularly for MATLAB).
  • Integration with matrix_opt, which will come with Add matrix_opt into oneMKL_SYCL backend #39.

These will come in a follow-up PR. For now, I think merging in GPU implementations is more pressing before adding additional features.

@BenBrock BenBrock requested review from chrwarm and spencerpatty and removed request for spencerpatty March 3, 2025 19:46
Comment on lines +142 to +144
TEST(CscView, SpMV_Ascaled) {
using T = float;
using I = spblas::index_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to test SpMV_xscaled as well ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just added.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we should write a new test to more exhaustively examine all combinations of matrices and views. I think we should allow the vendor backends to mature a little more first, though.


template <matrix M>
requires __detail::is_csr_view_v<M>
oneapi::mkl::sparse::matrix_handle_t create_matrix_handle(sycl::queue& q,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep calling this create_matrix_handle, thinking about future with matrix_opt, it is maybe better as create_or_retrieve_matrix_handle, which at that point, might just be more simply called get_matrix_handle ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to add get_matrix_handle once we merge your matrix_opt PR and the matrix_opt actually might have a matrix handle inside it.

Comment on lines +29 to +43
template <typename T>
armpl_status_t (*create_spmat_csc)(armpl_spmat_t*, armpl_int_t, armpl_int_t,
const armpl_int_t*, const armpl_int_t*,
const T*, armpl_int_t);
template <>
inline constexpr auto create_spmat_csc<float> = &armpl_spmat_create_csc_s;
template <>
inline constexpr auto create_spmat_csc<double> = &armpl_spmat_create_csc_d;
template <>
inline constexpr auto create_spmat_csc<std::complex<float>> =
&armpl_spmat_create_csc_c;
template <>
inline constexpr auto create_spmat_csc<std::complex<double>> =
&armpl_spmat_create_csc_z;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you prefer this kind of template specialization over the kind I have in https://github.com/SparseBLAS/spblas-reference/pull/23/files#diff-6ac61c9e281330753acb230d10581803329208dd22ef5d9fdb5492b845c8acf1R19-R42 for instance ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I actually prefer what you're doing there with IESparseBLAS. I think Chris originally wrote this code, but the two styles are pretty much equivalent.

template <typename M>
requires(__detail::is_csc_view_v<M>)
auto column(M&& m, typename std::remove_cvref_t<M>::index_type column_index) {
using O = typename std::remove_cvref_t<M>::offset_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need to go through and update all the

  using T = tensor_scalar_t<A>;
  using I = tensor_index_t<A>;
  using O = tensor_offset_t<A>;

usages everywhere to include a std::remove_cvref_t<> call similar to what you are doing here ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually tensor_scalar_t<A>, etc. will automatically apply a std::remove_cvref_t. I think the only reason I don't use it here is include order.

Comment on lines +7 to +21
template <matrix M>
requires(__detail::is_csr_view_v<M>)
auto transposed(M&& m) {
return csc_view<tensor_scalar_t<M>, tensor_index_t<M>, tensor_offset_t<M>>(
m.values(), m.rowptr(), m.colind(), {m.shape()[1], m.shape()[0]},
m.size());
}

template <matrix M>
requires(__detail::is_csc_view_v<M>)
auto transposed(M&& m) {
return csr_view<tensor_scalar_t<M>, tensor_index_t<M>, tensor_offset_t<M>>(
m.values(), m.colptr(), m.rowind(), {m.shape()[1], m.shape()[0]},
m.size());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is where the relationship between CSR and CSC through transpose (not conjugate transpose though) comes into play with the switching of nRows/nCols and reinterpretting rowptr <-> colptr and rowind <-> colind ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you had

csr_view<float> a(/* ... */);

// Explicit transpose class
transposed_view<csr_view<float>> a_t = transposed(a);

// Versus csc_view
csc_view<float> a_csc_t = transposed(a);

If a.shape() was (10, 20), wouldn't both a_t.shape() and a_csc_t.shape() both be (20, 10)? It seems to me the two should behave identically (except the explicit one has a .base() and the CSC one doesn't).

// y = A * (alpha * x)
multiply(a, scaled(2.f, x), y);

fmt::print("{}\n", spblas::__backend::values(y));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure we want to print out the 100x10 = 1000 values of y ? maybe we want to have a "print dense matrix" routine that prints the first 4 rows (first 4 columns, ... , last 4 columns) and then ... then last 4 rows (first 4 columns, ... , last 4 columns) in a nicely printed and tab aligned to commas view ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could do similar thing with dense vector, only printing out first 4 elements, then last 4 elements. And also a similar thing for sparse matrix with printing first 4 rows and last 4 rows ...

maybe it could take in a parameter that represents block_size (default = 4) to be printed ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think writing a pretty printer and then using that in the examples would be nice. (We can implement a ostream / fmt formatter.) I think that probably belongs in another PR, though.

Copy link
Contributor

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should talk through some of my suggestions and questions posted in the review before I give full approval.

BenBrock and others added 4 commits March 3, 2025 18:53
Co-authored-by: Spencer Patty <spencer.patty@intel.com>
Co-authored-by: Spencer Patty <spencer.patty@intel.com>
@BenBrock BenBrock merged commit 62fdee3 into main Mar 6, 2025
9 checks passed
@BenBrock BenBrock deleted the dev/brock/implement-transposed branch March 6, 2025 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants