Subroutines optimized for sparse matrix-vector multiplication (SpMV) under high memory latency
gcc -O3 -fopenmp spmv.c -o spmv-base
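The kernel inside spmv.c is not reproduced here; for reference, a minimal sketch of the kind of OpenMP-parallel CSR SpMV loop these builds target (hypothetical function and array names, assuming the usual row_ptr/col_idx/val CSR layout) might look like:

/* y = A*x for a CSR matrix: illustrative sketch, not the shipped spmv.c */
void spmv_csr(int n_rows,
              const int *row_ptr,   /* length n_rows+1 */
              const int *col_idx,   /* length nnz      */
              const double *val,    /* length nnz      */
              const double *x,
              double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* irregular, latency-bound gather of x */
        y[i] = sum;
    }
}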
gcc -O3 -ftree-vectorize -funroll-loops -fprefetch-loop-arrays --param prefetch-latency=300 -falign-functions=64 -falign-loops=64 -funroll-all-loops -fopenmp -march=znver4 -mavx512f -fopt-info-vec-optimized spmv_j.c -g -o spmv-znver4-opt
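The -fprefetch-loop-arrays and --param prefetch-latency=300 flags ask GCC to emit software prefetch instructions and to issue them further ahead of the loads, which helps when memory latency is high. The same idea can also be expressed by hand; the sketch below is an assumption about how such a hand-prefetched kernel could look (PF_DIST is a hypothetical tuning knob, not a value taken from spmv_j.c):

#define PF_DIST 64   /* assumed prefetch distance, in nonzeros */

/* CSR SpMV with explicit software prefetching of the value, index and x streams */
void spmv_csr_prefetch(int n_rows,
                       const int *row_ptr, const int *col_idx,
                       const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        int end = row_ptr[i + 1];
        for (int k = row_ptr[i]; k < end; k++) {
            if (k + PF_DIST < end) {
                __builtin_prefetch(&val[k + PF_DIST], 0, 1);        /* value stream    */
                __builtin_prefetch(&col_idx[k + PF_DIST], 0, 1);    /* index stream    */
                __builtin_prefetch(&x[col_idx[k + PF_DIST]], 0, 1); /* gathered vector */
            }
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}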
Note: this currently works only with AOCL v4.2; AOCL v5.0+ appears to have a bug.
gcc -O3 -ftree-vectorize -funroll-loops -fprefetch-loop-arrays -falign-functions=64 -falign-loops=64 -funroll-all-loops -fopenmp -march=znver4 -g spmv-aocl.c -I/<path to AOCL v4.2.0>/include -L/<path to AOCL v4.2.0>/lib -laoclsparse -lm
Note: to ensure correctness, you need to comment out line 200, which contains 'status = aoclsparse_optimize(A)'. The optimization step in AOCL (which includes matrix reordering) is currently not performed correctly. Future versions of AOCL will hopefully resolve this issue.
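For orientation, spmv-aocl.c presumably drives the aoclsparse C interface; the sketch below is an assumption about the rough call sequence for a double-precision CSR SpMV, with the problematic call shown where the note above says to comment it out. The exact signatures should be checked against the AOCL v4.2 headers before relying on them.

#include <aoclsparse.h>

/* Assumed AOCL-Sparse call sequence for y = A*x (CSR, double); verify against the headers. */
void spmv_aocl(aoclsparse_int n_rows, aoclsparse_int n_cols, aoclsparse_int nnz,
               aoclsparse_int *row_ptr, aoclsparse_int *col_idx, double *val,
               const double *x, double *y)
{
    aoclsparse_matrix    A;
    aoclsparse_mat_descr descr;
    double alpha = 1.0, beta = 0.0;

    aoclsparse_create_dcsr(&A, aoclsparse_index_base_zero,
                           n_rows, n_cols, nnz, row_ptr, col_idx, val);
    aoclsparse_create_mat_descr(&descr);

    /* status = aoclsparse_optimize(A);  <-- the call the note above says to comment out */

    aoclsparse_dmv(aoclsparse_operation_none, &alpha, A, descr, x, &beta, y);

    aoclsparse_destroy_mat_descr(descr);
    aoclsparse_destroy(A);
}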
OMP_NUM_THREADS=$(nproc) OMP_PROC_BIND=close <executable> HV15R/HV15R.mtx
Instead of $(nproc), the thread count can be set explicitly, e.g. to 24 or another multiple of 8. A script, `run_test.sh`, is also available to run the executable under Linux perf for profiling.
Export environment variables and source the oneAPI setup script
export OMP_NUM_THREADS=24
export MKL_NUM_THREADS=24
export KMP_AFFINITY=granularity=fine,compact
export MKL_ENABLE_INSTRUCTIONS=AVX512
export OMP_PROC_BIND=close
source /opt/intel/oneapi/setvars.sh # assuming you have MKL installed
Compile code
gcc -O3 -fopenmp spmv_mkl.c -o spmv_mkl_exec \
-I${MKLROOT}/include \
-L${MKLROOT}/lib/intel64 \
-Wl,--start-group \
-lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread \
-liomp5 -lpthread -lm -ldl \
-Wl,--end-group
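spmv_mkl.c presumably uses MKL's inspector-executor Sparse BLAS interface; a minimal sketch of that interface is shown below (the mkl_sparse_* calls are standard MKL, but the surrounding function and variable names are placeholders, not taken from spmv_mkl.c):

#include <mkl_spblas.h>

/* Minimal sketch of y = A*x via MKL's inspector-executor Sparse BLAS (CSR, double). */
void spmv_mkl(MKL_INT n_rows, MKL_INT n_cols,
              MKL_INT *row_ptr, MKL_INT *col_idx, double *val,
              const double *x, double *y)
{
    sparse_matrix_t A;
    struct matrix_descr descr = { .type = SPARSE_MATRIX_TYPE_GENERAL };

    /* The 4-array CSR interface takes separate row-start/row-end pointers;
       a conventional row_ptr array provides both views. */
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, n_rows, n_cols,
                            row_ptr, row_ptr + 1, col_idx, val);
    mkl_sparse_set_mv_hint(A, SPARSE_OPERATION_NON_TRANSPOSE, descr, 1000);
    mkl_sparse_optimize(A);

    mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr, x, 0.0, y);

    mkl_sparse_destroy(A);
}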
Execute
numactl --cpunodebind=0 --membind=0 ./spmv_mkl_exec PR02R/PR02R.mtx
The input datasets can be downloaded from the SuiteSparse Matrix Collection:
https://sparse.tamu.edu/Fluorem/HV15R
https://sparse.tamu.edu/Fluorem/PR02R
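Both matrices are distributed in Matrix Market (.mtx) coordinate format: '%'-prefixed header/comment lines, a "rows cols nnz" line, then one 1-based "row col value" triplet per line. A rough sketch of reading such a file into COO arrays (no error handling, general real matrices only; this is an illustration, not the loader used by the programs above):

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: read a general, real MatrixMarket coordinate file into COO arrays. */
int read_mtx(const char *path, int *n_rows, int *n_cols, int *nnz,
             int **row, int **col, double **val)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;

    char line[256];
    do {                                  /* skip '%' header and comment lines */
        if (!fgets(line, sizeof line, f)) { fclose(f); return -1; }
    } while (line[0] == '%');

    sscanf(line, "%d %d %d", n_rows, n_cols, nnz);
    *row = malloc(*nnz * sizeof **row);
    *col = malloc(*nnz * sizeof **col);
    *val = malloc(*nnz * sizeof **val);

    for (int k = 0; k < *nnz; k++) {      /* entries are 1-based in the file */
        fscanf(f, "%d %d %lf", &(*row)[k], &(*col)[k], &(*val)[k]);
        (*row)[k]--; (*col)[k]--;
    }
    fclose(f);
    return 0;
}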
Compiler: gcc v13.1.0