This project implements a high-performance BMM algorithm for sparse matrices. Blocking is introduced as a practical way to accelerate BMM and make it easier to parallelize, while MPI is used to distribute the computation and OpenMP to parallelize it.
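To illustrate the blocking idea, here is a minimal sketch, assuming BMM stands for Boolean matrix multiplication. It is not the project's code: it uses dense, row-major 0/1 byte arrays instead of a sparse format to stay short, and the names `bmm_tile`, `bmm_blocked`, and the block size `bs` are hypothetical. Each tile of C is the OR of ANDs over the matching tiles of A and B, and the outer loop over row blocks is a natural place for an OpenMP `parallel for` because different row blocks write to disjoint rows of C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine one tile of A (rows i0.., cols k0..) with one
   tile of B (rows k0.., cols j0..) and OR the result into the tile of C at
   (i0, j0). Matrices are dense, row-major n-by-n arrays of 0/1 bytes. */
static void bmm_tile(const unsigned char *A, const unsigned char *B,
                     unsigned char *C, size_t n,
                     size_t i0, size_t j0, size_t k0, size_t bs)
{
    size_t i_end = (i0 + bs < n) ? i0 + bs : n;
    size_t j_end = (j0 + bs < n) ? j0 + bs : n;
    size_t k_end = (k0 + bs < n) ? k0 + bs : n;

    for (size_t i = i0; i < i_end; i++) {
        for (size_t j = j0; j < j_end; j++) {
            if (C[i * n + j])
                continue;               /* already 1, nothing left to find */
            for (size_t k = k0; k < k_end; k++) {
                if (A[i * n + k] && B[k * n + j]) {
                    C[i * n + j] = 1;   /* Boolean product: stop at first hit */
                    break;
                }
            }
        }
    }
}

/* Hypothetical driver: C = A * B over the Boolean semiring, tile by tile. */
void bmm_blocked(const unsigned char *A, const unsigned char *B,
                 unsigned char *C, size_t n, size_t bs)
{
    assert(bs > 0);

    for (size_t x = 0; x < n * n; x++)
        C[x] = 0;

    /* Each iteration of the i0 loop writes a disjoint set of rows of C,
       so the row blocks can be processed in parallel without locking. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (size_t i0 = 0; i0 < n; i0 += bs)
        for (size_t j0 = 0; j0 < n; j0 += bs)
            for (size_t k0 = 0; k0 < n; k0 += bs)
                bmm_tile(A, B, C, n, i0, j0, k0, bs);
}
```

With GCC, the pragma only takes effect when the file is compiled with `-fopenmp`; otherwise the same code runs sequentially. A sparse implementation would store only the nonzero entries of each tile and skip empty tiles entirely, which is where blocking is likely to pay off most.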
Compile and run sequential version:

    make

Compile and run parallel version:

    make openmp

Compile and run distributed version:

    make mpi

Compile and run hybrid version:

    make hybrid

* An OpenMP and MPI compatible compiler is required.
** Input datasets can be generated with generator.m and must be placed in the mtx/in folder.
*** For the tester to work, the result matrix C.mtx produced by generator.m must be placed in the mtx/out folder.