Conversation
@greptile
Greptile Overview

Greptile Summary

This PR implements bulk pipeline execution for ONERT's Trix backend, enabling multi-model orchestration with buffer sharing optimization.

Key Changes:

- New BulkPipelineBuffer, BulkPipelineModel, BulkPipelineManager, and BulkPipelineLayer classes in the trix backend
- TrixLoader support for semicolon-delimited model paths via a Bulk operation
- Buffer sharing and asynchronous buffer refill across models with identical program and weight sizes
Critical Issues Found:

- `_initialized` flag never set in `BulkPipelineModel::initialize()`
- Data race on `_buffer_ready` in `startAsyncBufferFill()`
- `ioctl()` return value misused as the dmabuf fd in `BulkPipelineBuffer`
Confidence Score: 1/5
Important Files Changed

File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
  participant User
  participant TrixLoader
  participant KernelGenerator
  participant BulkPipelineLayer
  participant BulkPipelineManager
  participant BulkPipelineModel
  participant BulkPipelineBuffer
  participant NPU
  User->>TrixLoader: loadFromFile
  TrixLoader->>TrixLoader: Parse semicolon-delimited paths
  TrixLoader->>TrixLoader: Load I/O metadata from head and tail models
  TrixLoader->>TrixLoader: Create Bulk operation with vector of paths
  User->>KernelGenerator: visit Bulk operation
  alt Single Model
    KernelGenerator->>KernelGenerator: Create BulkLayer
  else Multiple Models
    KernelGenerator->>BulkPipelineLayer: Create BulkPipelineLayer
  end
  BulkPipelineLayer->>BulkPipelineManager: configure and initialize
  BulkPipelineManager->>BulkPipelineManager: createModels
  loop For each model path
    BulkPipelineManager->>BulkPipelineModel: new BulkPipelineModel
    BulkPipelineModel->>BulkPipelineModel: loadMetadata
  end
  BulkPipelineManager->>BulkPipelineManager: Check buffer sharing eligibility
  Note over BulkPipelineManager: All models must have same<br/>program_size and weight_size
  BulkPipelineManager->>BulkPipelineManager: linkModels
  Note over BulkPipelineManager: Set ownership OWNER vs SHARED<br/>Link models in chain
  BulkPipelineManager->>BulkPipelineManager: prepareModels
  loop For each OWNER model
    BulkPipelineModel->>NPU: openDevice
    BulkPipelineModel->>BulkPipelineBuffer: allocate program buffer
    BulkPipelineModel->>BulkPipelineBuffer: allocate weight buffer
    BulkPipelineBuffer->>NPU: ioctl TRIX_IOCTL_HWMEM_ALLOC
    BulkPipelineBuffer->>BulkPipelineBuffer: mmap DMA buffer
    BulkPipelineModel->>BulkPipelineBuffer: fillBuffers
    BulkPipelineModel->>NPU: registerNPUmodel_ext
  end
  loop For each SHARED model
    BulkPipelineModel->>BulkPipelineModel: shareBuffersFrom owner
    Note over BulkPipelineModel: Reuse buffers device and model_id
  end
  User->>BulkPipelineLayer: run
  BulkPipelineLayer->>BulkPipelineManager: execute with inputs and outputs
  loop For each model in pipeline
    BulkPipelineManager->>BulkPipelineModel: waitForBufferReady
    Note over BulkPipelineModel: Wait on condition variable<br/>until buffer is ready
    BulkPipelineModel->>NPU: runNPU_model
    NPU-->>BulkPipelineModel: execution complete
    alt Not last model
      BulkPipelineManager->>BulkPipelineManager: Chain outputs to next inputs
    end
    alt Buffer sharing enabled
      BulkPipelineModel->>BulkPipelineModel: getNextModel
      BulkPipelineModel->>BulkPipelineModel: startAsyncBufferFill
      Note over BulkPipelineModel: Async refill buffers for<br/>next shared neighbor
    end
  end
  BulkPipelineManager-->>User: Pipeline execution complete
```
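The buffer-sharing eligibility check and the OWNER/SHARED linking step in the diagram are the core of the orchestration. The sketch below illustrates the idea; all names (`ModelSketch`, `canShareBuffers`, `linkModels`'s layout) are hypothetical stand-ins for the PR's actual `BulkPipelineManager` logic, which is not reproduced here.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

enum class Ownership { OWNER, SHARED };

// Hypothetical stand-in for BulkPipelineModel's sharing-related state.
struct ModelSketch
{
  std::size_t program_size = 0;
  std::size_t weight_size = 0;
  Ownership ownership = Ownership::OWNER;
  ModelSketch *next = nullptr; // successor for async buffer refill
};

// Buffer sharing is only legal when every model reports the same
// program and weight buffer sizes as the first one.
bool canShareBuffers(const std::vector<std::unique_ptr<ModelSketch>> &models)
{
  if (models.empty())
    return false;
  for (const auto &m : models)
  {
    if (m->program_size != models.front()->program_size ||
        m->weight_size != models.front()->weight_size)
      return false;
  }
  return true;
}

// The head of the chain owns the buffers; the others are marked SHARED,
// and each model is linked to its successor so that finishing model i
// can kick off the asynchronous refill for model i+1.
void linkModels(std::vector<std::unique_ptr<ModelSketch>> &models)
{
  for (std::size_t i = 0; i < models.size(); ++i)
  {
    models[i]->ownership = (i == 0) ? Ownership::OWNER : Ownership::SHARED;
    models[i]->next = (i + 1 < models.size()) ? models[i + 1].get() : nullptr;
  }
}
```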
```cpp
bool BulkPipelineModel::initialize()
{
  if (_initialized.load())
  {
    return true;
  }

  if (!loadMetadata())
  {
    return false;
  }

  return true;
}
```
**logic:** `_initialized` flag never set to `true` in `initialize()` method. (`runtime/onert/backend/trix/ops/BulkPipelineModel.cc`, lines 41-54)
Suggested change:

```suggestion
bool BulkPipelineModel::initialize()
{
  if (_initialized.load())
  {
    return true;
  }
  if (!loadMetadata())
  {
    return false;
  }
  _initialized = true;
  return true;
}
```
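Note that even with the suggested fix, two threads calling `initialize()` concurrently could both pass the `load()` check and run `loadMetadata()` twice. If concurrent initialization is a real possibility (the atomic flag suggests it may be), a `std::call_once` variant closes that window. This is only a sketch; `_init_once` is a hypothetical `std::once_flag` member that does not exist in the PR.

```cpp
// Sketch only: std::call_once guarantees loadMetadata() runs at most
// once even under concurrent callers. Requires <mutex> and a
// hypothetical std::once_flag member `_init_once`.
bool BulkPipelineModel::initialize()
{
  std::call_once(_init_once, [this] { _initialized = loadMetadata(); });
  return _initialized.load();
}
```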
```cpp
void BulkPipelineModel::startAsyncBufferFill()
{
  _buffer_ready = false;
  _async_fill_future = std::async(std::launch::async, [this] {
    try
    {
      fillBuffers();
      markBufferReady();
    }
    catch (const std::exception &e)
    {
      std::cerr << "Failed to fill buffers asynchronously: " << e.what() << std::endl;
    }
  });
}
```
**logic:** Race condition: `_buffer_ready` is set to `false` before the async task starts, without synchronization. The flag is cleared outside the mutex protection (line 206 of the source), then accessed inside the async lambda. If the async task starts immediately, it can read/write `_buffer_ready` concurrently with the main thread setting it, causing a data race. (`runtime/onert/backend/trix/ops/BulkPipelineModel.cc`, lines 204-218)
Suggested change:

```suggestion
void BulkPipelineModel::startAsyncBufferFill()
{
  {
    std::lock_guard<std::mutex> lock(_buffer_mutex);
    _buffer_ready = false;
  }
  _async_fill_future = std::async(std::launch::async, [this] {
    try
    {
      fillBuffers();
      markBufferReady();
    }
    catch (const std::exception &e)
    {
      std::cerr << "Failed to fill buffers asynchronously: " << e.what() << std::endl;
    }
  });
}
```
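For context on why the unlocked write is a data race: the sequence diagram's `waitForBufferReady`/`markBufferReady` pair presumably follows the standard condition-variable pattern, something like the sketch below (assuming members `_buffer_cv` and `_buffer_ready` guarded by `_buffer_mutex`; the PR's actual implementation is not reproduced here). Under this pattern, every other access to `_buffer_ready` holds the mutex, so the bare write in `startAsyncBufferFill` is the one unsynchronized access.

```cpp
// Sketch of the wait/notify pair implied by the diagram, assuming a
// std::condition_variable member _buffer_cv alongside _buffer_mutex.
// Requires <condition_variable> and <mutex>.
void BulkPipelineModel::waitForBufferReady()
{
  std::unique_lock<std::mutex> lock(_buffer_mutex);
  _buffer_cv.wait(lock, [this] { return _buffer_ready; });
}

void BulkPipelineModel::markBufferReady()
{
  {
    std::lock_guard<std::mutex> lock(_buffer_mutex);
    _buffer_ready = true;
  }
  _buffer_cv.notify_all();
}
```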
syntax: Typo: "devbice" should be "device"
Suggested change:

```suggestion
  // Open the device
```
syntax: Typo: "initilized" should be "initialized"
syntax: Typo: "Shareing" should be "Sharing"
```cpp
_buffer->dmabuf = ioctl(_dev_fd, TRIX_IOCTL_HWMEM_ALLOC, &hwmem);
if (_buffer->dmabuf < 0)
{
  close(_dev_fd);
  _dev_fd = -1;
  throw std::runtime_error("Failed to allocate DMA buffer, size: " + std::to_string(hwmem.size));
}
```
**logic:** Return value from `ioctl()` not checked correctly. The code assigns the return value of `ioctl()` to `_buffer->dmabuf` and checks if it's negative, but `ioctl()` returns `-1` on error, not the fd. The actual dmabuf fd should come from the `hwmem.dbuf_fd` field after a successful call. (`runtime/onert/backend/trix/ops/BulkPipelineBuffer.cc`, lines 85-91)
Suggested change:

```suggestion
int ret = ioctl(_dev_fd, TRIX_IOCTL_HWMEM_ALLOC, &hwmem);
if (ret < 0)
{
  close(_dev_fd);
  _dev_fd = -1;
  throw std::runtime_error("Failed to allocate DMA buffer, size: " + std::to_string(hwmem.size));
}
_buffer->dmabuf = hwmem.dbuf_fd;
```
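After the corrected allocation, the "mmap DMA buffer" step from the sequence diagram maps the dmabuf fd into user space. A minimal sketch under the same assumptions (`hwmem.size` holds the allocation size; `vaddr` is a hypothetical field for the mapped address, not from the PR):

```cpp
// Sketch: map the freshly allocated DMA buffer via the dmabuf fd taken
// from hwmem.dbuf_fd, not the ioctl return value. Requires <sys/mman.h>.
void *addr = mmap(nullptr, hwmem.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                  _buffer->dmabuf, 0);
if (addr == MAP_FAILED)
{
  close(_buffer->dmabuf);
  throw std::runtime_error("Failed to mmap DMA buffer");
}
_buffer->vaddr = addr; // hypothetical member for the mapped address
```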
This commit introduces the BulkPipelineBuffer class to manage the buffers in the TRIX backend. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
…sung#16334) This commit fixes the typing by allowing None to be passed as a value for the input_shapes parameter of the benchmark_inference() function. If input_shapes is None, the function uses the shape retrieved from the initialized session. ONE-DCO-1.0-Signed-off-by: Arkadiusz Bokowy <a.bokowy@samsung.com>
…sung#16333) It adds a new MockSyscallsManager class to provide a configurable hook system for mocking system calls in tests. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
This commit adds Mean_U8_000 to tolerance-based evaluation with an absolute tolerance of 1 to handle precision issues in uint8 operations. ONE-DCO-1.0-Signed-off-by: Hyeongseok Oh <hseok82.oh@samsung.com>
…6338) This commit adds debug print statements to log input data when model execution results differ between interpreter and luci outputs. This change helps diagnose test failures by providing complete context including input data, interpreter output, and luci output. ONE-DCO-1.0-Signed-off-by: Hyeongseok Oh <hseok82.oh@samsung.com>
…#16332) This implements a new BulkPipelineModel class to handle NPU model loading. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
…g#16339) Add new BulkPipelineManager class to coordinate execution of multiple models in sequence with proper resource management. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
Samsung#16342) This commit updates header guard names and nested namespace declarations in bulk pipeline headers. This improves code consistency and readability. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
This commit adds comprehensive comments documenting current restrictions for MX dtypes (MXFP4, MXINT8) in the circle schema. ONE-DCO-1.0-Signed-off-by: Hyeongseok Oh <hseok82.oh@samsung.com>
…g#16343) It adds a new BulkPipelineLayer class to handle bulk pipeline operations in the trix backend. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
This replaces the previous NYI exception with actual pipeline execution functionality for the trix backend. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
This commit adds buffer sharing mechanism to reduce memory usage in bulk pipeline execution. Link models for async buffer preparation and optimize execution performance when models have identical program and weight sizes. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
This commit adds verification step to ensure loaded models have matching input/output counts with the pipeline configuration. ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
77b0f24 to 784f6ea
No description provided.