Skip to content

[LSQ][Verilog Backend] Incorrect generation of the LSQ in the Verilog backed #665

@Carmine50

Description

@Carmine50

The current LSQ generator for the Verilog backend (based on Chisel) does not work correctly.

When executing the kernel_2mm benchmark

void kernel_2mm(in_int_t alpha, in_int_t beta, inout_int_t tmp[NI][NJ],
                in_int_t A[NI][NK], in_int_t B[NK][NJ], in_int_t C[NK][NL],
                inout_int_t D[NI][NL]) {
  for (unsigned i = 0; i < NI; i++) {
    for (unsigned j = 0; j < NJ; j++) {
      tmp[i][j] = 0;
      for (unsigned k = 0; k < NK; ++k)
        tmp[i][j] += alpha * A[i][k] * B[k][j];
    }
  }

  for (unsigned i = 0; i < NI; i++) {
    for (unsigned l = 0; l < NL; l++) {
      D[i][l] *= beta;
      for (unsigned k = 0; k < NJ; ++k)
        D[i][l] += tmp[i][k] * C[k][l];
    }
  }
}

with the following script

set-dynamatic-path /home/user/dynamatic
set-src /home/user/dynamatic/integration-test/covariance/covariance.c
set-clock-period 5
compile --buffer-algorithm fpga20 --milp-solver gurobi
write-hdl --hdl vhdl
simulate
exit

the simulation fails. This is due to the incorrect order execution of load and store operations for the LSQ for memory D. In particular, in the second loop, there are the following memory operations:

for (unsigned l = 0; l < NL; l++) {
    Load D[i][l] (load1)
    Store D[i][l] (store1)
    for (unsigned k = 0; k < NJ; ++k){
      Load D[i][l] (load2)
      Store D[i][l] (store2)

The correct order of operations is load1 -> store1 -> load2 -> store2 when updating the value of l. However, the LSQ does not correctly enforce this order and execute these operations in this order load1 -> load2 -> store1 -> store2.

A possible assumption that causes this bug is the usage of loadOffset parameter in the LSQ Chisel generator. In particular, this loadOffset only ensures that all loads start at the same time. This is what actually happens in this case where load1 and load2 happen in order one after another. However, this parameter does not check when there must be interleaved dependencies (i.e., load->store->load).

On the other hand, this issue is solved in the VHDL generator where another parameter is used ldOrder. This parameter specifies for each load, after how many store operations it should execute.

As a temporary fix, we have replaced the Verilog generator with the VHDL one in #664 . However, this is just a temporary solution to fix the LSQ verilog generator.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions