-
Notifications
You must be signed in to change notification settings - Fork 42
Description
The current LSQ generator for the Verilog backend (based on Chisel) does not work correctly.
When executing the kernel_2mm benchmark
void kernel_2mm(in_int_t alpha, in_int_t beta, inout_int_t tmp[NI][NJ],
in_int_t A[NI][NK], in_int_t B[NK][NJ], in_int_t C[NK][NL],
inout_int_t D[NI][NL]) {
for (unsigned i = 0; i < NI; i++) {
for (unsigned j = 0; j < NJ; j++) {
tmp[i][j] = 0;
for (unsigned k = 0; k < NK; ++k)
tmp[i][j] += alpha * A[i][k] * B[k][j];
}
}
for (unsigned i = 0; i < NI; i++) {
for (unsigned l = 0; l < NL; l++) {
D[i][l] *= beta;
for (unsigned k = 0; k < NJ; ++k)
D[i][l] += tmp[i][k] * C[k][l];
}
}
}
with the following script
set-dynamatic-path /home/user/dynamatic
set-src /home/user/dynamatic/integration-test/covariance/covariance.c
set-clock-period 5
compile --buffer-algorithm fpga20 --milp-solver gurobi
write-hdl --hdl vhdl
simulate
exit
the simulation fails. This is due to the incorrect order execution of load and store operations for the LSQ for memory D. In particular, in the second loop, there are the following memory operations:
for (unsigned l = 0; l < NL; l++) {
Load D[i][l] (load1)
Store D[i][l] (store1)
for (unsigned k = 0; k < NJ; ++k){
Load D[i][l] (load2)
Store D[i][l] (store2)
The correct order of operations is load1 -> store1 -> load2 -> store2 when updating the value of l. However, the LSQ does not correctly enforce this order and execute these operations in this order load1 -> load2 -> store1 -> store2.
A possible assumption that causes this bug is the usage of loadOffset parameter in the LSQ Chisel generator. In particular, this loadOffset only ensures that all loads start at the same time. This is what actually happens in this case where load1 and load2 happen in order one after another. However, this parameter does not check when there must be interleaved dependencies (i.e., load->store->load).
On the other hand, this issue is solved in the VHDL generator where another parameter is used ldOrder. This parameter specifies for each load, after how many store operations it should execute.
As a temporary fix, we have replaced the Verilog generator with the VHDL one in #664 . However, this is just a temporary solution to fix the LSQ verilog generator.