Description
At the latest commit da104fa, with #4 fixed, there is still a run-time error in examples/SpTRSV_runtime.cpp. The error occurs during the call to HDAGG::partialSparsification in examples/SpTRSV_runtime.h, specifically at this assertion inside partialSparsification:
aggregation/src/hdagg/hdagg.cpp, lines 2083 to 2086 in da104fa:
    //The starting point of the next clique
    col = first + width;
    assert(col <= n);
    clique_ptr_per_thread.push_back(col);
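To make the failing condition concrete, here is a minimal sketch of the same bounds check done explicitly rather than through `assert`. It is not the HDAGG code: `build_clique_ptr` and its `widths` argument are hypothetical names used only for illustration, and the loop is an assumed simplification of how `first`, `width`, and `col` relate. The point is that `assert` is compiled out when `NDEBUG` is defined (which CMake's default Release flags do for GCC), so in a release build an out-of-range `col` would be pushed silently and could plausibly cause out-of-bounds accesses later.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of HDAGG): turns a list of clique widths into
// start offsets, checking that no clique runs past the n columns of the matrix.
std::vector<int> build_clique_ptr(int n, const std::vector<int>& widths) {
    assert(n >= 0);  // debug-only sanity check; disappears under NDEBUG
    std::vector<int> clique_ptr{0};
    int first = 0;
    for (int width : widths) {
        // The starting point of the next clique
        int col = first + width;
        // Explicit check that stays active in release builds as well
        if (col > n) {
            throw std::out_of_range("clique end " + std::to_string(col) +
                                    " exceeds n = " + std::to_string(n));
        }
        clique_ptr.push_back(col);
        first = col;
    }
    return clique_ptr;
}
```

With a check like this, debug and release builds would both stop at the first invalid offset instead of corrupting the heap, which might make the root cause easier to locate.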
Error messages
With a debug build, the error message is:
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.25e-05
Running LL Levelset Code with #core: 20 - The runtime:1.58e-05
Hdagg_SpTRSV: /work_dir/aggregation/src/hdagg/hdagg.cpp:2085: void HDAGG::partialSparsification(int, int, const int*, const int*, std::vector<int>&, std::vector<int>&, bool): Assertion `col <= n' failed.
Aborted
The "LL Serial" and "LL Levelset" cases work fine because they don't call partialSparsification. The "LL Tree + Levelset" and later cases fail at partialSparsification.
With a release build, the same run leads to memory errors instead, likely due to exceeding array bounds:
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.24e-05
Running LL Levelset Code with #core: 20 - The runtime:1.46e-05
malloc_consolidate(): invalid chunk size
Aborted
The runs above use the built-in random matrix generation. Using an external matrix such as G2_circuit gives a different error, which is still memory-related:
Reading matrix /work_dir/matrix_data/G2_circuit/G2_circuit.mtx
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:0.0009317
Running LL Levelset Code with #core: 2 - The runtime:0.0016047
Hdagg_SpTRSV: malloc.c:4118: _int_malloc: Assertion `chunk_main_arena (fwd)' failed.
Aborted
Steps to reproduce
git clone https://github.com/sympiler/aggregation
cd aggregation
# debug mode
cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build_debug
cmake --build build_debug
./build_debug/example/Hdagg_SpTRSV # generates a random matrix; same error when reading an .mtx file
# release mode
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build_release
cmake --build build_release -j 4
./build_release/example/Hdagg_SpTRSV
# read external matrix file and use fewer threads than default
./build_release/example/Hdagg_SpTRSV /work_dir/matrix_data/G2_circuit/G2_circuit.mtx 2
The system environment can be reproduced using this trivial Dockerfile that installs GCC 11.3 and CMake 3.22:
FROM ubuntu:22.04
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
git wget vim \
gcc g++ gfortran \
libnuma-dev \
libhwloc-dev \
libmetis-dev \
libomp-dev \
make cmake \
&& rm -rf /var/lib/apt/lists/*