[onert] Implement buffer sharing optimization for BulkPipeline #16349
Conversation
This commit adds a buffer sharing mechanism to reduce memory usage in bulk pipeline execution. It links models for async buffer preparation and optimizes execution performance when models have identical program and weight sizes.

ONE-DCO-1.0-Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
```cpp
BulkPipelineModel(const std::string &model_path, int device_id);

enum class BufferOwnership
{
  OWNER,
```
Note for reviewers
BufferOwnership shows whether the buffer is shared (= BufferOwnership::SHARED) or maintained directly (= BufferOwnership::OWNER).
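For reference, the two states described in the note amount to something like the following sketch (the actual header may define more; comments are illustrative):

```cpp
// Sketch of the two ownership states described in the note above.
enum class BufferOwnership
{
  OWNER, // the model allocates and maintains its own program/weight buffers
  SHARED // the model reuses the buffers of the OWNER model it is linked to
};
```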
```cpp
void BulkPipelineManager::linkModels()
```
Note for reviewers
The linked models will share program and weight buffers.
For example, if 6 models are executed serially, the linkage between them can look like this (see the sketch after this list):
- 6 models: [model_0, model_1, model_2, model_3, model_4, model_5]
- First linkage: model_0 -> model_2 -> model_4
- Second linkage: model_1 -> model_3 -> model_5

In this case, model_0 and model_1 will have BufferOwnership::OWNER buffers.
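A minimal sketch of this round-robin linking, assuming a hypothetical `Model` node with a `next` pointer (names are illustrative, not the PR's actual types):

```cpp
#include <cstddef>
#include <vector>

struct Model
{
  Model *next = nullptr; // next model in the same linkage (shares this chain's buffers)
  bool is_owner = false; // true for the head of a linkage
};

// Hypothetical sketch of the linking scheme described above: model_i is
// appended to chain (i % n_owners), so for n_owners == 2 the chains are
// [0, 2, 4] and [1, 3, 5], and the head of each chain is an OWNER.
void linkModels(std::vector<Model> &models, std::size_t n_owners)
{
  if (n_owners == 0)
    return;

  std::vector<Model *> tails(n_owners, nullptr);
  for (std::size_t i = 0; i < models.size(); ++i)
  {
    auto &chain_tail = tails[i % n_owners];
    if (chain_tail == nullptr)
      models[i].is_owner = true; // first model of the chain owns the buffers
    else
      chain_tail->next = &models[i]; // later models share the owner's buffers
    chain_tail = &models[i];
  }
}
```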
```cpp
if (model_idx++ < _config.n_owner_models)
{
  // First n_owner_models models are OWNERs
  continue;
```
If _use_buffer_sharing becomes true, it seems that the first model and the other models are all handled the same way.
But if n_owner_models is 2, is it intended to have 2 owners?
I think only the first model should be an owner and the rest should be shared.
(If I misunderstood, please correct me. 😅)
Yes, the first 2 models are OWNERs.
As I wrote below (https://github.com/Samsung/ONE/pull/16349/files#r2710531814), model_0 and model_1 will have OWNER buffers, and the other models in the same linkage will share them.
Thanks for reviewing ;)
```cpp
{
  openDevice();
  allocateBuffers();
  fillBuffers();
```
In fillBuffers(), _fp can be leaked if an exception is thrown.
Could you please add logic to check and close _fp before throwing an exception to avoid a file handle leak?
_fp will be closed by the release() method, which is called in the exception handler and in the destructor.

ONE/runtime/onert/backend/trix/ops/BulkPipelineModel.cc, lines 86 to 90 in cdd7cb6
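The cleanup pattern being described is roughly this (a sketch, assuming _fp is a FILE* and release() is the single cleanup point, as in the linked lines; names and structure are illustrative):

```cpp
#include <cstdio>
#include <stdexcept>

// Sketch of the pattern described above: release() closes _fp, and both the
// exception handler and the destructor call it, so a throw from fillBuffers()
// does not leak the file handle.
class ModelSketch
{
public:
  ~ModelSketch() { release(); }

  void prepare(const char *path)
  {
    try
    {
      _fp = std::fopen(path, "rb");
      if (_fp == nullptr)
        throw std::runtime_error("failed to open model file");
      fillBuffers(); // may throw mid-read
    }
    catch (...)
    {
      release(); // close _fp before propagating
      throw;
    }
  }

private:
  void fillBuffers() { /* reads program/weight data from _fp; may throw */ }

  void release()
  {
    if (_fp != nullptr)
    {
      std::fclose(_fp);
      _fp = nullptr;
    }
  }

  std::FILE *_fp = nullptr;
};
```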
Thanks for the information. I missed release().
But there is no release() call in startAsyncBufferFill(). Could you check it?
IMHO, a failure in startAsyncBufferFill() should be handled on the caller side.
In that case, release() can be called explicitly by BulkPipelineManager, which owns the BulkPipelineModel.
I'll also update the BulkPipelineManager error handling code in another PR.
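The caller-side handling being proposed could look roughly like this (a hypothetical sketch; the actual BulkPipelineManager change is deferred to another PR):

```cpp
#include <exception>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the real types; names mirror the discussion.
struct ModelStub
{
  void startAsyncBufferFill() { /* may throw */ }
  void release() { /* closes _fp and frees buffers */ }
};

// The manager owns the models, so on a failed async fill it can call
// release() explicitly before propagating the error.
void prepareModels(std::vector<std::unique_ptr<ModelStub>> &models)
{
  for (auto &model : models)
  {
    try
    {
      model->startAsyncBufferFill();
    }
    catch (const std::exception &)
    {
      model->release(); // explicit caller-side cleanup
      throw;            // let the pipeline abort or retry
    }
  }
}
```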
```cpp
try
{
  fillBuffers();
  markBufferReady();
}
catch (const std::exception &e)
{
  std::cerr << "Failed to fill buffers asynchronously: " << e.what() << std::endl;
}
```
_buffer_ready can remain permanently false if fillBuffers() throws before markBufferReady() is called. In other words, waitForBufferReady() may block forever.
Yes, that's possibly right. If you don't mind, I'll upload a separate PR to fix it.
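One possible shape of that fix (a hedged sketch; the actual follow-up PR may differ) is to record failure and wake waiters either way, so waitForBufferReady() cannot hang:

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <stdexcept>

// Sketch only: record failure and notify waiters even when fillBuffers()
// throws, so waitForBufferReady() returns instead of blocking forever.
// Names are illustrative, not the PR's actual members.
namespace sketch
{
std::mutex mtx;
std::condition_variable cv;
bool buffer_ready = false;
bool buffer_failed = false;

void fillBuffers() { /* may throw on I/O or allocation failure */ }

void asyncFill()
{
  try
  {
    fillBuffers();
    std::lock_guard<std::mutex> lock(mtx);
    buffer_ready = true;
  }
  catch (const std::exception &e)
  {
    std::cerr << "Failed to fill buffers asynchronously: " << e.what() << std::endl;
    std::lock_guard<std::mutex> lock(mtx);
    buffer_failed = true; // wake waiters with a failure state instead of hanging
  }
  cv.notify_all();
}

bool waitForBufferReady()
{
  std::unique_lock<std::mutex> lock(mtx);
  cv.wait(lock, [] { return buffer_ready || buffer_failed; });
  return buffer_ready; // false means the async fill failed
}
} // namespace sketch
```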
Thanks. I hope it gets fixed in a separate PR.
@ragmani |