Assertion error inside HDAGG::partialSparsification for Hdagg_SpTRSV example #5

@learning-chip

At the latest commit da104fa, with #4 fixed, there is still a run-time error in examples/SpTRSV_runtime.cpp. It occurs during the call to HDAGG::partialSparsification in examples/SpTRSV_runtime.h, specifically at this assertion inside partialSparsification:

//The starting point of the next clique
col = first + width;
assert(col <= n);
clique_ptr_per_thread.push_back(col);
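
To narrow down where the pointer array first goes wrong, a small validator can be run over it. The following is only a sketch: `firstInvalidCliquePtr` is a hypothetical helper that does not exist in HDAGG, and it assumes that clique start positions should be non-decreasing and never exceed n, which is what the assertion above suggests.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HDAGG): returns the index of the first
// entry in clique_ptr that is smaller than its predecessor or larger than n,
// or -1 if every entry passes both checks.
int firstInvalidCliquePtr(const std::vector<int>& clique_ptr, int n) {
    int prev = 0;
    for (std::size_t i = 0; i < clique_ptr.size(); ++i) {
        const int col = clique_ptr[i];
        if (col < prev || col > n) {
            return static_cast<int>(i);
        }
        prev = col;
    }
    return -1;
}
```

Calling such a check on clique_ptr_per_thread in a build without the assertion would show which clique boundary first exceeds n, rather than only that one does.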

Error messages

With a debug build, the error message is:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.25e-05
Running LL Levelset Code with #core: 20 - The runtime:1.58e-05
Hdagg_SpTRSV: /work_dir/aggregation/src/hdagg/hdagg.cpp:2085: void HDAGG::partialSparsification(int, int, const int*, const int*, std::vector<int>&, std::vector<int>&, bool): Assertion `col <= n' failed.
Aborted

The "LL Serial" and "LL Levelset" cases work fine because they don't call partialSparsification. The "LL Tree + Levelset" and later cases fail at partialSparsification.

With a release build, the same run instead ends in memory errors, likely because array bounds are exceeded:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.24e-05
Running LL Levelset Code with #core: 20 - The runtime:1.46e-05
malloc_consolidate(): invalid chunk size
Aborted

The runs above use the built-in random matrix generation. Using an external matrix such as G2_circuit gives a different error, but it is still memory-related:

Reading matrix /work_dir/matrix_data/G2_circuit/G2_circuit.mtx
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:0.0009317
Running LL Levelset Code with #core: 2 - The runtime:0.0016047
Hdagg_SpTRSV: malloc.c:4118: _int_malloc: Assertion `chunk_main_arena (fwd)' failed.
Aborted
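
The release-mode crashes are consistent with the assertion being compiled out: with GCC, CMake's default Release flags include -DNDEBUG, which turns assert() into a no-op, so an out-of-range col would be stored and possibly used as an index later. Assuming the project keeps those default flags, a check that stays active in release builds could look like the sketch below. `pushNextCliqueStart` is a hypothetical wrapper for illustration, not HDAGG code.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper (not HDAGG code): computes the start of the next
// clique and records it, but throws instead of relying on assert(), so the
// bounds check is still performed when NDEBUG is defined.
void pushNextCliqueStart(int first, int width, int n,
                         std::vector<int>& clique_ptr) {
    const int col = first + width;
    if (col > n) {
        throw std::out_of_range("clique start " + std::to_string(col) +
                                " exceeds n = " + std::to_string(n));
    }
    clique_ptr.push_back(col);
}
```

With a guard like this, the release build would stop at the first bad clique boundary with a readable message instead of failing later inside malloc.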

Steps to reproduce

git clone https://github.com/sympiler/aggregation
cd aggregation

# debug mode
cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build_debug
cmake --build build_debug
./build_debug/example/Hdagg_SpTRSV  # generates a random matrix; same error when reading an mtx file

# release mode
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build_release
cmake --build build_release -j 4
./build_release/example/Hdagg_SpTRSV
# read an external matrix file and use fewer threads than the default
./build_release/example/Hdagg_SpTRSV /work_dir/matrix_data/G2_circuit/G2_circuit.mtx 2

The system environment can be reproduced using this trivial Dockerfile that installs GCC 11.3 and CMake 3.22:

FROM ubuntu:22.04

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    git wget vim \
    gcc g++ gfortran \
    libnuma-dev \
    libhwloc-dev \
    libmetis-dev \
    libomp-dev \
    make cmake \
    && rm -rf /var/lib/apt/lists/*
