@ShangkunLi ShangkunLi commented Jan 5, 2026

This PR completes the following.

The Taskflow Dialect

We introduce the taskflow dialect, which provides the following ops to build a computation abstraction for both scale-out and scale-up spatial architectures:

  1. The taskflow.graph op: wraps computation-intensive workloads in its region for multi-CGRA system acceleration.
  2. The taskflow.task op: wraps a specific operation in its body for a single CGRA (with an affine controller and a tile array).
  3. The taskflow.channel op: carries the data dependencies between two tasks. Resource-binding attributes (e.g., streaming, sequential, coarse-grained pipeline) can be added to this op to denote how data is transferred between two tasks in cooperation with the affine controller.
  4. The taskflow.drive op: carries the control dependencies between two tasks. It is mainly used to partition irregular workloads across multi-CGRA systems.
    For example:
#set = affine_set<(d0, d1) : (d0 - 3 == 0, d1 - 7 == 0)>
module attributes {} {
  func.func @_Z21irregularLoopExample1v() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c2_i32 = arith.constant 2 : i32
    %c8_i32 = arith.constant 8 : i32
    %c0_i32 = arith.constant 0 : i32
    %alloca = memref.alloca() : memref<i32>
    %alloca_0 = memref.alloca() : memref<4x8xi32>
    %0 = affine.for %arg0 = 0 to 5 iter_args(%arg1 = %c0_i32) -> (i32) {
      %2 = arith.index_cast %arg0 : index to i32
      %3 = arith.addi %arg1, %2 : i32
      affine.yield %3 : i32
    }
    // Loop 1: wrapped in task 1, and uses taskflow.drive to control task 2 & 3
    affine.for %arg0 = 0 to 4 { 
      %2 = arith.index_cast %arg0 : index to i32
      %3 = arith.muli %2, %c8_i32 : i32
      // Loop 2: wrapped in task 2, controlled by task 1
      affine.for %arg1 = 0 to 8 { 
        %4 = arith.index_cast %arg1 : index to i32
        %5 = arith.addi %3, %4 : i32
        affine.store %5, %alloca_0[%arg0, %arg1] : memref<4x8xi32>
      }
      // Loop 3: wrapped in task 3, controlled by task 1
      affine.for %arg1 = 0 to 8 {
        %4 = affine.load %alloca_0[%arg0, %arg1] : memref<4x8xi32>
        %5 = arith.addi %4, %0 : i32
        affine.if #set(%arg0, %arg1) {
          affine.store %5, %alloca[] : memref<i32>
          %6 = arith.muli %5, %c2_i32 : i32
          affine.store %6, %alloca[] : memref<i32>
        }
      }
    }
    %1 = affine.load %alloca[] : memref<i32>
    return %1 : i32
  }
}
  5. We introduce a packet data type in the taskflow dialect. This type is carried by the taskflow.drive op and holds per-task metadata (e.g., iteration space, task-level execution conditions).
  6. The taskflow.task ops are the nodes of the taskflow.graph, while the taskflow.channel and taskflow.drive ops are its edges.
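To make the node/edge structure concrete, the irregular-loop example above might lower to taskflow IR along the following lines. This is a hedged sketch only: the assembly format, the `@loop*` symbol names, and the `!taskflow.packet` syntax are illustrative assumptions, not the dialect's actual printed form.

```mlir
// Hypothetical sketch of the taskflow lowering of the example above.
// All op syntax below is illustrative; the real assembly format may differ.
taskflow.graph {
  // Task 1: the outer loop (Loop 1). It produces packets carrying the
  // iteration-space metadata for its two inner loops.
  %pkt2, %pkt3 = taskflow.task @loop1 : !taskflow.packet, !taskflow.packet

  // Control edges: task 1 drives tasks 2 and 3 via their packets.
  taskflow.drive %pkt2 to @loop2 : !taskflow.packet
  taskflow.drive %pkt3 to @loop3 : !taskflow.packet

  // Task 2: the store loop (Loop 2), executed once per received packet.
  taskflow.task @loop2

  // Task 3: the load/affine.if loop (Loop 3), likewise packet-driven.
  taskflow.task @loop3
}
```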

The convert-linalg-to-taskflow Pass

We provide an initial conversion pass that produces the taskflow representation for a simple ResNet block generated by PyTorch.
We implement the linalg-to-taskflow conversion first because almost all ML workloads need no control flow: they only carry inter-task data dependencies.
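As an illustration of what this pass might produce, consider a single linalg op from such a block. The input is standard MLIR, but the taskflow output below is a hedged sketch: the task/channel assembly syntax and the `binding` attribute name are assumptions for illustration.

```mlir
// Input: one op from a PyTorch-generated linalg module.
linalg.matmul ins(%A, %B : memref<4x8xf32>, memref<8x4xf32>)
              outs(%C : memref<4x4xf32>)

// Hypothetical output: the op is wrapped in a taskflow.task, and the
// produced memref becomes a taskflow.channel edge to the consuming task.
// Syntax is illustrative only.
taskflow.graph {
  taskflow.task @matmul {
    linalg.matmul ins(%A, %B : memref<4x8xf32>, memref<8x4xf32>)
                  outs(%C : memref<4x4xf32>)
  }
  // Data-dependency edge to the next task (e.g., a following ReLU),
  // annotated with a resource-binding attribute.
  taskflow.channel %C : memref<4x4xf32> {binding = "streaming"}
}
```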

Features to Support

Compiler Level:

  • Use this dialect to represent the provided irregular workloads
  • Realize taskflow.task-level fusion, enabling multiple kernels to run on a single CGRA

RTL Level:

  • Implement an affine controller in the RTL repo (needs further discussion)

@ShangkunLi ShangkunLi self-assigned this Jan 5, 2026
@ShangkunLi ShangkunLi requested a review from guosran January 6, 2026 04:24
@ShangkunLi ShangkunLi merged commit 68829b7 into coredac:main Jan 6, 2026